Tuesday, August 5, 2014

Introduction to R

Concepts 
  1. Data types: character, numeric, integer, complex, logical
  2. A vector can only contain objects of the same class
  3. A list is represented as a vector but can contain objects of different classes
  4. Numbers in R are generally represented as numeric objects
  5. If you explicitly want an integer, you need to add the L suffix
  6. For example, 1 is treated as a numeric object and 1L is treated as an integer
  7. R objects can have attributes: names, dimensions, class, and other user-defined attributes
  8. The : operator is used to create integer sequences, e.g. x <- 1:20
  9. The function c() is used to create vectors of objects
  10. Objects can be coerced from one class to another using the as.* functions
    1. x <- 1:20; as.character(x)
  11. Nonsensical coercion results in NA
  12. Matrices are vectors with a dimension (dim) attribute
  13. Matrices are filled column-wise: entries start at the upper-left corner and run down each column
  14. Matrices can also be created from vectors by adding the dimension attribute
    1. x <- 1:10; dim(x) <- c(2, 5)
  15. Matrices can also be created by column binding or row binding with cbind() and rbind()
    1. x <- 2:4; y <- 10:12; m <- cbind(x, y); rbind(x, y)
  16. Factors represent categorical data
  17. Missing values: NA and NaN. NA values have a class too, so there is an integer NA, a character NA, and so on
  18. A NaN value is also NA, but the converse is not true: is.na(NaN) is TRUE, while is.nan(NA) is FALSE
  19. Data Frames 
    1. A special type of list in which every element of the list has to be the same length
    2. Each element of the list can be thought of as a column, and the length of each element is the number of rows
    3. Data frames can store a different class of object in each column, whereas in a matrix all elements have to be of the same class
    4. Data frames also have a special attribute called row.names
    5. Data frames are usually created by read.table() or read.csv(); data.frame() creates one directly
  20. R Objects can have names
    1. m <- matrix(1:4, nrow = 2, ncol = 2)
    2. dimnames(m) <- list(c("a", "b"), c("c", "d"))
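The concepts above can be checked in a short R session. This is a minimal sketch; the variable names (x_num, mixed, lst, m2, and so on) are illustrative only, and the comments show what each expression returns.

```r
# Numbers default to numeric (double); the L suffix makes an integer
x_num <- 1
x_int <- 1L
class(x_num)                  # "numeric"
class(x_int)                  # "integer"

# A vector holds a single class, so mixing values coerces them
mixed <- c(1.7, "a")
class(mixed)                  # "character"

# A list can hold objects of different classes
lst <- list(1, "a", TRUE, 1 + 4i)
sapply(lst, class)            # "numeric" "character" "logical" "complex"

# Explicit coercion with as.*; nonsensical coercion gives NA
as.character(0:3)             # "0" "1" "2" "3"
bad <- suppressWarnings(as.numeric(c("a", "b")))   # NA NA

# Matrices are filled column-wise, starting at the upper-left corner
m <- matrix(1:6, nrow = 2, ncol = 3)
m[1, 2]                       # 3, because the first column holds 1 and 2

# A vector becomes a matrix once it gets a dim attribute
v <- 1:10
dim(v) <- c(2, 5)

# Column binding and row binding
a <- 2:4
b <- 10:12
cb <- cbind(a, b)             # 3 rows, 2 columns
rb <- rbind(a, b)             # 2 rows, 3 columns

# Factors store categorical data; levels are sorted alphabetically by default
f <- factor(c("yes", "no", "yes"))
levels(f)                     # "no" "yes"

# NaN counts as NA, but NA is not NaN
is.na(NaN)                    # TRUE
is.nan(NA)                    # FALSE

# Data frame columns share one length but may differ in class
df <- data.frame(id = 1:3, flag = c(TRUE, FALSE, TRUE))
nrow(df)                      # 3

# Naming the dimensions of a matrix
m2 <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m2) <- list(c("a", "b"), c("c", "d"))
m2["b", "c"]                  # 2
```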
Examples
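  1. How do you get a list of available packages in R?
    available.packages() returns the packages available from the configured repositories; installed.packages() lists the ones already installed.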
  2. How do you install packages?
    install.packages("slidify")
    install.packages(c("slidify", "ggplot2"))
    # Bioconductor packages (biocLite was the 2014 installer; current releases use BiocManager::install())
    source("http://www.bioconductor.org/biocLite.R")
    biocLite()
    # Place the names of the packages in a vector
    biocLite(c("GenomicFeatures", "AnnotationDbi"))
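  3. What happens when you load an R package?
    After a package is loaded with library(), its functions are attached to the search list (by default at position 2, right after the global environment), so they can be called by name. search() shows the current search list.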
  4. How do you load a package in R? library("slidify")
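A common pattern built on install.packages() is to install only the packages that are missing. install_if_missing() below is a hypothetical helper, not part of any package; it uses requireNamespace() only to test whether a package is installed. The example call uses base packages, so nothing is downloaded.

```r
# Hypothetical helper: install only the packages that are not yet available
install_if_missing <- function(pkgs) {
  have <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!have]
  if (length(missing) > 0) install.packages(missing)
  invisible(missing)
}

# Base packages are always installed, so this returns character(0)
to_install <- install_if_missing(c("stats", "tools"))

# library() attaches the package's exports to the search list (pos = 2 by default)
library(tools)
search()
```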

1) How do you read a CSV file in R?
data <- read.csv(filename, header = TRUE)
2) How do you display the first n rows of the data?
head(data, n): the default value of n is 6.
3) How do you display the last n rows of the data?
tail(data, n): the default value of n is also 6.
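R's built-in airquality dataset has the same columns as the output shown further down, so the sketches from here on use it as a stand-in for data (an assumption; the original CSV file is not shown). A negative n drops rows from the other end.

```r
data <- airquality          # stand-in for the data read from the CSV file

first_six <- head(data)     # default n = 6
first_three <- head(data, 3)
last_two <- tail(data, 2)
all_but_last_150 <- head(data, -150)   # negative n drops the last 150 rows
```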
4) How do you count the missing values in every column of the data set?
colSums(is.na(data))
Other functions that can be used for this purpose are sapply() and apply(), e.g. sapply(data, function(x) sum(is.na(x))).
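The three approaches give the same counts. This sketch again assumes the built-in airquality data as a stand-in for the CSV file; is.na() returns a logical matrix, and summing TRUE values counts the NAs.

```r
data <- airquality   # stand-in for the CSV data (assumption)

# Count NAs per column
na_counts <- colSums(is.na(data))
na_counts            # Ozone 37, Solar.R 7, all other columns 0

# Equivalent alternatives with sapply() and apply()
na_counts_sapply <- sapply(data, function(col) sum(is.na(col)))
na_counts_apply <- apply(is.na(data), 2, sum)
```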
5) How do you calculate column means while ignoring the missing values?
colMeans(data, na.rm = TRUE)
     Ozone    Solar.R       Wind       Temp      Month        Day
 42.129310 185.931507   9.957516  77.882353   6.993464  15.803922
Without na.rm = TRUE, any column that contains an NA has an NA mean:
colMeans(data)
    Ozone   Solar.R      Wind      Temp     Month       Day
       NA        NA  9.957516 77.882353  6.993464 15.803922
For a single column:
colMeans(data["Ozone"], na.rm = TRUE)
   Ozone
42.12931
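For a single column, mean() with na.rm = TRUE does the same job as colMeans(). A minimal sketch, again assuming airquality as a stand-in for the CSV data; the manual version divides by the number of non-missing values, which is what na.rm = TRUE amounts to.

```r
data <- airquality   # stand-in for the CSV data (assumption)

# mean() on one column; na.rm = TRUE drops the NAs before averaging
ozone_mean <- mean(data$Ozone, na.rm = TRUE)

# Equivalent: average only the non-missing values explicitly
ozone_mean_manual <- sum(data$Ozone, na.rm = TRUE) / sum(!is.na(data$Ozone))
```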
6) Extract the subset of rows of the data frame where Ozone is above 31 and Temp is above 90. What is the mean of Solar.R in this subset?
colMeans(subset(data, Ozone > 31 & Temp > 90))
  Ozone Solar.R    Wind    Temp   Month     Day
   89.5   212.8     5.6    93.4     8.2    14.5
The mean of Solar.R in this subset is 212.8.
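subset() silently drops rows where the condition evaluates to NA (here, rows with a missing Ozone value). The sketch below, assuming airquality as a stand-in for the CSV data, shows that behaviour and the equivalent bracket indexing, where the NA rows must be excluded explicitly; otherwise data[cond, ] returns all-NA rows for them.

```r
data <- airquality   # stand-in for the CSV data (assumption)

# subset() keeps rows where the condition is TRUE and drops rows where it is NA
hot <- subset(data, Ozone > 31 & Temp > 90)
mean(hot$Solar.R)

# Bracket indexing: exclude rows with a missing Ozone value explicitly
keep <- with(data, !is.na(Ozone) & Ozone > 31 & Temp > 90)
hot2 <- data[keep, ]
```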
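7) Find the mean temperature in month n (output shown for n = 6, i.e. June):
colMeans(subset(data, Month == n))
    Ozone   Solar.R      Wind      Temp     Month       Day
       NA 190.16667  10.26667  79.10000   6.00000  15.50000
The Temp column gives the answer, 79.1. Ozone is NA because na.rm = TRUE was not set.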
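When only the temperature is needed, it is enough to average the Temp column for the matching rows. mean_temp_in_month() is a hypothetical helper, and airquality is again assumed as a stand-in for the CSV data; tapply() computes the mean of every month in one call.

```r
data <- airquality   # stand-in for the CSV data (assumption)

# Hypothetical helper: mean temperature for month number n
mean_temp_in_month <- function(df, n) {
  mean(df$Temp[df$Month == n], na.rm = TRUE)
}
june <- mean_temp_in_month(data, 6)

# Mean temperature of every month, named by month number
by_month <- tapply(data$Temp, data$Month, mean)
```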
Additional Resources:
1) Filling in NAs with column medians in R
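The linked article is not reproduced here. As a rough sketch of the idea, assuming every numeric column should use its own median, fill_na_with_median() (a hypothetical helper) replaces each NA with the median of the non-missing values in that column. Integer columns become double when the median is fractional.

```r
# Hypothetical helper: replace NAs in each numeric column with that column's median
fill_na_with_median <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      med <- median(df[[col]], na.rm = TRUE)
      df[[col]][is.na(df[[col]])] <- med
    }
  }
  df
}

filled <- fill_na_with_median(airquality)
colSums(is.na(filled))   # all zero
```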
