- If the URL starts with http, you can use download.file().
- If the URL starts with https, you may have to set method = "curl".
- Check whether a directory exists, and create it if it does not:
- if (!file.exists("data")) { dir.create("data") }
- Example:
- fileUrl <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
- download.file(fileUrl, destfile = "./data/cameras.csv", method = "curl")
- list.files("./data")
- dateDownloaded <- date()
- dateDownloaded
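The download steps above (create the directory, fetch the file, note the date) can be wrapped into one function. This is a minimal sketch, not code from the original notes: `download_dataset()` is a hypothetical helper name, and the demo "downloads" a local CSV through a `file://` URL (assuming a Unix-style temporary path) so that it runs without network access.

```r
# Hypothetical helper (not part of base R or the original notes):
# create the target directory if needed, download the file, and
# return the date it was fetched.
download_dataset <- function(url, destfile, method = "auto") {
  dest_dir <- dirname(destfile)
  if (!file.exists(dest_dir)) {
    dir.create(dest_dir, recursive = TRUE)
  }
  # download.file() returns 0 on success and non-zero on failure
  status <- download.file(url, destfile = destfile, method = method, quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  # Record when the data was downloaded, as in the notes
  date()
}

# Offline demo: write a small local CSV and fetch it via a file:// URL
src <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, street = c("A St", "B Ave", "C Rd")),
          src, row.names = FALSE)
dest <- file.path(tempdir(), "data", "cameras.csv")
dateDownloaded <- download_dataset(paste0("file://", src), dest)
list.files(dirname(dest))
```

With network access, the equivalent call for the Baltimore camera data would be download_dataset(fileUrl, "./data/cameras.csv", method = "curl").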
- Read data from the file:
- cameraData <- read.table("./data/cameras.csv", sep = ",", header = TRUE)
- head(cameraData)
- read.csv() sets sep = "," and header = TRUE by default.
- quote: tells R which characters are used to quote values; quote = "" means no quoting at all.
- na.strings: the string(s) that represent a missing value.
- nrows: the maximum number of rows to read from the file (e.g. nrows = 10 reads 10 rows).
- skip: the number of lines to skip before starting to read.
- The biggest source of trouble when reading flat files is quotation marks (' or ") placed inside data values; setting quote = "" often resolves it.
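To see skip, quote, na.strings and nrows working together, here is a minimal sketch on a small made-up file (not the Baltimore data). The assumptions are that the file starts with two comment lines, contains an apostrophe inside a value ("O'Brien St") and uses "-" as its missing-value marker. It also checks that read.csv() gives the same result as read.table() with sep = "," and header = TRUE.

```r
# Build a small hypothetical CSV: two lines to skip, an apostrophe
# inside a value, and "-" used to mark a missing count.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "# Camera locations (toy data)",
  "# generated for this example",
  "address,direction,count",
  "O'Brien St,N/B,12",
  "Main St,S/B,-",
  "Oak Ave,E/B,7",
  "Elm Rd,W/B,3"
), path)

# read.table() with explicit sep/header; quote = "" stops the apostrophe
# in "O'Brien St" from being treated as the start of a quoted string.
cams <- read.table(path, sep = ",", header = TRUE, skip = 2,
                   quote = "", na.strings = "-")

# read.csv() already defaults to sep = "," and header = TRUE
cams_csv <- read.csv(path, skip = 2, quote = "", na.strings = "-")

# nrows caps how many data rows are read
first_two <- read.csv(path, skip = 2, quote = "", na.strings = "-", nrows = 2)

head(cams)
```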
- Reading Excel files:
- library(xlsx)
- cameraData <- read.xlsx("./data/cameras.xlsx", sheetIndex = 1, header = TRUE)
- head(cameraData)
- Reading specific rows and columns:
- colIndex <- 2:3
- rowIndex <- 1:4
- cameraDataSubset <- read.xlsx("./data/cameras.xlsx", sheetIndex = 1, colIndex = colIndex, rowIndex = rowIndex)
- read.xlsx2() is much faster than read.xlsx(), but it may be slightly unstable when reading subsets of rows.
Friday, September 12, 2014
Downloading and Reading data in R