My experiments with Big Data: Downloading and Reading data in R

Friday, September 12, 2014

fileUrl <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
download.file(fileUrl, destfile = "./data/cameras.csv", method = "curl")
list.files("./data")
dateDownloaded <- date()
dateDownloaded
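
The download step above assumes that the ./data directory already exists. A minimal sketch of a more defensive version follows. The helper name download_if_missing is hypothetical (it is not part of base R), and the sketch leaves the download method at R's default rather than forcing "curl":

```r
# Hypothetical helper: download a file only if it is not already on disk.
download_if_missing <- function(url, destfile) {
  # Create the destination directory (and any parents) if it is missing.
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the bytes unchanged on every platform.
    download.file(url, destfile = destfile, mode = "wb")
  }
  # Return the local path so the call can feed straight into a reader.
  invisible(destfile)
}
```

Calling download_if_missing(fileUrl, "./data/cameras.csv") would then fetch the file on the first run and reuse the local copy afterwards. Recording date() right after the download is still worthwhile, since the remote data may change.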

cameraData <- read.table("./data/cameras.csv", sep = ",", header = TRUE)
head(cameraData)

read.csv is read.table with sep = "," and header = TRUE set by default.
quote - tells R which characters delimit quoted values; quote = "" means no quoting at all.
na.strings - the character string(s) that represent a missing value.
nrows - how many rows of the file to read (e.g. nrows = 10 reads at most 10 data rows).
skip - number of lines to skip before starting to read.
The biggest trouble with reading flat files is quotation marks (' or ") placed in data values; setting quote = "" often resolves these.
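
To see these arguments working together, here is a small self-contained sketch. The file contents and the name cameraSample are made up for illustration; they are not the Baltimore camera data.

```r
# Build a tiny CSV in a temporary file so the example runs anywhere.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "exported by a hypothetical tool",  # junk first line we want to skip
  "name,speed",
  "O'Brien Rd,35",                    # apostrophe inside a data value
  "Main St,NA",
  "Park Ave,-",                       # "-" used as a missing-value marker
  "Elm St,25"
), csv_path)

cameraSample <- read.table(
  csv_path,
  sep = ",",
  header = TRUE,
  quote = "",                  # treat ' and " as ordinary characters
  na.strings = c("NA", "-"),   # both strings mean "missing"
  skip = 1,                    # skip the junk line before the header
  nrows = 3                    # read only the first 3 data rows
)

print(cameraSample)
```

Without quote = "", the apostrophe in O'Brien Rd would be treated as the start of a quoted string under read.table's defaults, which is exactly the kind of problem described above.
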
Reading Excel files:

library(xlsx)
cameraData <- read.xlsx("./data/cameras.xlsx", sheetIndex = 1, header = TRUE)
head(cameraData)

colIndex <- 2:3
rowIndex <- 1:4
cameraDataSubset <- read.xlsx("./data/cameras.xlsx", sheetIndex = 1, colIndex = colIndex, rowIndex = rowIndex)

read.xlsx2 is much faster than read.xlsx, but it may be slightly unstable when reading subsets of rows.
