
Friday, September 12, 2014

Downloading and Reading data in R


  1. If the URL starts with http, you can use download.file() directly.
  2. If the URL starts with https, you may have to set method = "curl".
  3. Check whether a directory exists, and create it if it does not:
    1. if (!file.exists("data")) {dir.create("data")}
  4. Example : 
    1. fileUrl <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"
    2. download.file(fileUrl, destfile = "./data/cameras.csv", method = "curl")
    3. list.files("./data")
    4. dateDownloaded <- date()
    5. dateDownloaded
  5. Read data from the file
    1. cameraData <- read.table("./data/cameras.csv", sep = ",", header = TRUE)
    2. head(cameraData)
  6. read.csv() sets sep = "," and header = TRUE by default.
  7. quote - tells R which characters, if any, mark quoted values; quote = "" means no quoting.
  8. na.strings - sets the character string that represents a missing value.
  9. nrows - how many rows to read of the file (e.g. nrows=10 reads 10 lines).
  10. skip - number of lines to skip before starting to read
  11. The biggest trouble with reading flat files is usually quotation marks ' or " placed in data values; setting quote = "" often resolves this.
  12. Reading excel files :
    1. library(xlsx)
    2. cameraData <- read.xlsx("./data/cameras.xlsx", sheetIndex = 1, header = TRUE)
    3. head(cameraData)
  13. Reading specific rows and columns
    1. colIndex <- 2:3
    2. rowIndex <- 1:4
    3. cameraDataSubset <- read.xlsx("./data/cameras.xlsx", sheetIndex = 1, colIndex = colIndex, rowIndex = rowIndex)
  14. read.xlsx2 is much faster than read.xlsx, but it may be slightly unstable when reading subsets of rows.
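
To see how the read.table() options from the notes above behave together, here is a small sketch that needs no download. The file name sample.csv, its contents, and the "NA?" missing-value code are all made up for illustration; they are not part of the Baltimore camera data.

```r
# Work in a temporary directory so the sketch leaves no files behind.
dataDir <- file.path(tempdir(), "data")
if (!file.exists(dataDir)) {
  dir.create(dataDir)
}

# Hypothetical sample file: one note line, a header, and three records.
# The second record codes a missing value as "NA?" and the third
# contains an apostrophe, which read.table() treats as a quote by default.
csvPath <- file.path(dataDir, "sample.csv")
writeLines(c(
  "exported for illustration only",
  "id,street,speed",
  "1,Main St,35",
  "2,Oak Ave,NA?",
  "3,O'Donnell St,25"
), csvPath)

# skip = 1 drops the note line, quote = "" turns off quote handling,
# and na.strings marks "NA?" as a missing value.
sampleData <- read.table(csvPath, sep = ",", header = TRUE, skip = 1,
                         quote = "", na.strings = "NA?",
                         stringsAsFactors = FALSE)

# nrows = 2 reads only the first two data rows after the header.
firstTwo <- read.table(csvPath, sep = ",", header = TRUE, skip = 1,
                       quote = "", nrows = 2, stringsAsFactors = FALSE)

print(sampleData)
print(nrow(firstTwo))
```

If you leave out quote = "" here, the lone apostrophe in the last record is read as the start of a quoted value, which is the kind of trouble described in the notes.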

