My experiments with Big Data: Reading data in R

Tuesday, September 16, 2014

Reading data in R

If reading a large table into R is slow, there are a few simple things to try, whether you use read.table or scan.

Set nrows to the number of records in your data (nmax in scan).
Set comment.char="" to turn off interpretation of comments.
Explicitly define the classes of each column using colClasses in read.table.
Setting multi.line=FALSE may also improve performance in scan.
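Here is a minimal sketch of those settings applied to a small generated file. The file, its three columns and the variable names are hypothetical, chosen only so the example runs on its own. The same arguments carry over to a real file once you know its row count and column types.

```r
# Generate a hypothetical whitespace-separated file so the sketch is self-contained.
set.seed(1)
path <- tempfile(fileext = ".txt")
n <- 1000
demo_data <- data.frame(id = seq_len(n),
                        value = round(runif(n), 4),
                        group = sample(c("a", "b", "c"), n, replace = TRUE))
write.table(demo_data, path, row.names = FALSE, quote = FALSE)

# read.table: state the row count, switch off comment scanning,
# and declare every column's class up front so R does not have to guess.
df <- read.table(path,
                 header = TRUE,
                 nrows = n,
                 comment.char = "",
                 colClasses = c("integer", "numeric", "character"))

# scan: when `what` is a list, nmax is the maximum number of records;
# multi.line = FALSE means each record must sit on a single line.
recs <- scan(path,
             what = list(id = integer(), value = numeric(), group = character()),
             skip = 1,            # skip the header line
             nmax = n,
             comment.char = "",
             multi.line = FALSE,
             quiet = TRUE)        # suppress the "Read N records" message
```

scan returns a named list of column vectors rather than a data frame; wrap it in as.data.frame() if you need one.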

If none of these things work, use a profiler, such as R's built-in Rprof, to find out which parts of the code are slowing things down. Perhaps you can then write a cut-down version of read.table based on the results.
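A rough sketch of profiling a read with Rprof and summaryRprof, both from base R's utils package. The generated file and the sampling interval are arbitrary assumptions. By default Rprof reports time per function (line-level output needs line.profiling = TRUE and code with source references), which is usually enough to see which internals of read.table dominate.

```r
# Hypothetical test file, large enough for the profiler to take samples.
set.seed(1)
path <- tempfile(fileext = ".txt")
n <- 100000
write.table(data.frame(x = seq_len(n), y = rnorm(n)), path,
            row.names = FALSE, quote = FALSE)

# Sample the call stack every 5 ms while read.table runs.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.005)
df <- read.table(path, header = TRUE)
Rprof(NULL)  # stop profiling

# Summarise time spent in each function. A very fast read may record
# no samples, so treat an empty profile as "nothing to report".
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))
```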

Another alternative is to filter your data before you read it into R.
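Filtering is often done outside R with a command-line tool, but it can also be approximated inside R by streaming the file in chunks with readLines and parsing only the lines you keep. The sketch below assumes a hypothetical two-column file where only rows in group "b" are wanted; the chunk size and the regular expression are illustrative choices. Separately, setting an entry of colClasses to "NULL" makes read.table skip that column entirely.

```r
# Hypothetical input file: id and group columns, space-separated.
set.seed(2)
path <- tempfile(fileext = ".txt")
n <- 10000
write.table(data.frame(id = seq_len(n),
                       group = sample(c("a", "b", "c"), n, replace = TRUE)),
            path, row.names = FALSE, quote = FALSE)

# Stream the file a chunk at a time, keeping only lines that end in " b",
# so the full table is never built as a data frame.
con <- file(path, open = "r")
header <- readLines(con, n = 1)
kept <- list()
repeat {
  chunk <- readLines(con, n = 2000)
  if (length(chunk) == 0) break
  kept[[length(kept) + 1]] <- grep(" b$", chunk, value = TRUE)
}
close(con)

# Parse only the surviving lines (plus the header) into a data frame.
subset_df <- read.table(text = c(header, unlist(kept)), header = TRUE,
                        colClasses = c("integer", "character"))
```

Collecting chunks in a list and calling unlist once avoids repeatedly copying a growing vector.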

Or, if the problem is that you have to read the data in regularly, use these methods to read it once, save the data frame as a binary blob with save, and then retrieve it faster next time with load.
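A small sketch of that read-once, cache, reload pattern. The file contents and the cache location are hypothetical; in practice the cache would live next to your data rather than in a temporary directory.

```r
# First run: parse the text file once, with the faster read.table settings.
path <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:5, value = c(2.5, 3.1, 4.7, 1.2, 9.9)),
            path, row.names = FALSE, quote = FALSE)
df <- read.table(path, header = TRUE, nrows = 5, comment.char = "",
                 colClasses = c("integer", "numeric"))

# Cache the data frame in R's binary format (compressed by default).
cache <- tempfile(fileext = ".RData")
save(df, file = cache)

# Later: drop the object and restore it from the cache without re-parsing.
rm(df)
load(cache)  # recreates `df` under its original name
```

load restores objects under the names they were saved with. For a single object, saveRDS and readRDS are a similar option that lets you choose the name on reading.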

http://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r
