In this prerequisite module, we give an overview of R, the open-source statistical software that we will use in this course. To learn more about using R with healthcare and medical data, we recommend the book R for Medicine and Biology.

About R

R is one of the leading tools for statistics, data analysis, and machine learning, and it can be downloaded for a variety of platforms from the R-Project website. "R is an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme. S was created by John Chambers while at Bell Labs. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand." R is more than a statistical package: it is a programming language, so you can create your own objects, functions, and packages. There are over 2,000 user-contributed packages (extensions) available on CRAN (not to mention Bioconductor and Omegahat). To get an idea of what packages are available, take a look at the CRAN Task Views. Many packages are submitted by prominent members of their respective fields.
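To make the point about creating your own objects and functions concrete, here is a minimal sketch. The patient data and the `bmi` function are hypothetical and made up for illustration; they are not part of the course materials.

```r
# Hypothetical example data: heights (m) and weights (kg) for three patients.
patients <- data.frame(
  id        = c("p1", "p2", "p3"),
  height_m  = c(1.60, 1.75, 1.82),
  weight_kg = c(64, 70, 95)
)

# A user-defined function: body mass index = weight / height^2.
bmi <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

# Functions in R are vectorized over columns, so this computes one BMI per row.
patients$bmi <- bmi(patients$weight_kg, patients$height_m)
print(patients)
```

The data frame and the function are both ordinary R objects, so they can be stored, passed to other functions, or bundled into a package later.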

Because R is a programming language, R programs explicitly document the steps of your analysis and make the analysis easy to reproduce or update, which means you can quickly evolve ideas and correct issues. R is platform-independent, so you can use it on all major operating systems (Windows, macOS, Linux). And it is free, so you can use it at any employer without having to persuade someone to purchase a license.
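As a rough illustration of how a script documents and reproduces an analysis, the sketch below uses `ToothGrowth`, a small data set that ships with R. The choice of data set, seed, and sample size is an assumption made only for this example.

```r
# Fix the random seed so the random sample below is the same on every run.
set.seed(42)

# Step 1: draw 10 random rows from the built-in ToothGrowth data set.
sampled <- ToothGrowth[sample(nrow(ToothGrowth), 10), ]

# Step 2: compute the mean tooth length for each supplement type.
mean_len_by_supp <- aggregate(len ~ supp, data = ToothGrowth, FUN = mean)

print(sampled)
print(mean_len_by_supp)
```

Anyone who runs this script gets the same sample and the same summary table, and changing the analysis means editing one line and rerunning it.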

Not only is R free, but it is also open-source. That means anyone can, if they wish, examine the source code to see exactly what it is doing. It also means that you, or anyone, can fix bugs or add features, rather than waiting for a vendor to find and fix the bug or add the feature, at its discretion, in a future release.

R allows you to integrate with other languages (C/C++, Java, Python) and enables you to interact with many data sources: ODBC-compliant data sources (Excel, Access) and other statistical packages (SAS, Stata, SPSS, Minitab).

Explicit parallelism is straightforward in R (see the High Performance Computing Task View): several packages allow you to take advantage of multiple cores, either on a single machine or across a network. You can also build R with a custom BLAS.

R has a large, active, and growing community of users. The mailing lists provide access to many users and package authors who are experts in their respective fields. Additionally, there are several R conferences every year. The most prominent and general is useR!. Finance-related conferences include the Rmetrics Workshop on Computational Finance and Financial Engineering in Meielisalp, Switzerland, and R/Finance: Applied Finance with R in Chicago, USA.
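To make the earlier point about explicit parallelism concrete, here is a minimal sketch using the `parallel` package that is distributed with R. The bootstrap task, the data values, and the choice of two workers are assumptions for illustration only; other packages listed in the High Performance Computing Task View take different approaches.

```r
library(parallel)

# Hypothetical task: one bootstrap replicate of the mean of a small sample.
# The first argument is the replicate index and is not used in the body.
boot_mean <- function(i, values) {
  mean(sample(values, replace = TRUE))
}

x <- c(5.1, 4.9, 6.2, 5.8, 5.5, 6.0)

cl <- makeCluster(2)          # start two worker processes; adjust as needed
clusterSetRNGStream(cl, 123)  # give the workers reproducible random streams
boot <- parLapply(cl, 1:200, boot_mean, values = x)
stopCluster(cl)               # always shut the workers down when finished

boot <- unlist(boot)
print(summary(boot))
```

`makeCluster` with worker processes is used here because it behaves the same way on Windows, macOS, and Linux.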

RStudio is an excellent interface for R. It provides a graphical front end and various other features.

How to get started with R

There are two ways to use R. You can install it on your own machine (Mac, PC, Linux) and use it from the command line; if you prefer a graphical interface, you can also download RStudio. Alternatively, you can use R and RStudio from any computer through the IU Anywhere virtual machine and save your files in your IU Box or Google Drive account.

The following video shows how to use R and RStudio from IU Anywhere. To save files, you will also need to link your Box account in Oncourse (instructions for doing this are also given in the video).

This 20-minute video shows the basic functionality of R and gives instructions on how to download and install R on your own machine.

A series of online lectures is also available. For deeper, more practical learning, check out the book R for Medicine and Biology.

References and Cheat Sheets:

1. R-bloggers website
2. Using R
3. Cheat Sheet 1
4. Cheat Sheet 2
5. Cheat Sheet 3
6. Plotting and Graphics Cheat Sheet
7. One-page survival guide in R

R Resources
These external resources can help you get started working with R.
Download and Install R
Download and Install RStudio

R Help and Examples
Quick R
R Cookbook
Stack Overflow About R
Stack Overflow R FAQ

R Basics
Titanic: Getting Started with R by Trevor Stephens (4-part blog series)
R Video Tutorials by Google Developers (short videos on YouTube)
R Two Minute Tutorials by Anthony Damico (website)
Debugging with RStudio (website)
Import Data into R (blog post)
Sample random rows from a data frame (Stack Overflow post)

R Style Guide
R Style Guide by Hadley Wickham (pdf)
Google's R Style Guide (website)

ggplot2 Resources
A Simple Intro to the Graphing Philosophy of ggplot2 by Tom Hopper (blog post)
Grammar of Graphics by Hadley Wickham (pdf)
Grammar of Graphics: Past, Present, and Future by Hadley Wickham (pdf)
ggplot2 official documentation (website)
ggplot2 tutorial by Ramon Saccilotto (pdf)
ggplot2: Cheatsheet for Visualizations (website)
ggplot2: Scales and Themes (website)
ggplot2: geom quick reference (website)

Intro to Stats Using R
Open Intro to Statistics Labs (website)

Supplemental Materials
Create a Heatmap using the Base Graphics (blog post)
Converting Between Long and Wide Format (website)
Melt Data Frames (blog post)
Predict Movie Ratings using IMDB and R (blog post)
A Visual Guide to Correlation (jpg)

The following texts are optional for the course. These texts can enhance your learning, but they are not required to succeed in this course.

Exploratory Data Analysis by John Tukey
Visualizing Data by William S. Cleveland
ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham

Continue Learning
The R Meta Book, a collection of links and resources by Joseph Rickert (GREAT blog post)
Fitting and Interpreting Linear Models in R (blog post)
Analyze your social network on Facebook using R (blog post)
Multivariate Display about Movies: Genres (blog post)
Multivariate Display about Movies: Comparing Movie Sequels to their Originals (blog post)
Predict Movie Ratings using your IMDB Data and R (blog post)
Read Large Data Sets: The Iterator Package by Flavio Barros (blog post)
Where can I find large data sets that are open to the public? (Quora post)