Data Analysis with R — Part 1 (Getting Started)
The secret of getting ahead is getting started — Mark Twain
My posts on Data Analysis with R are an adaptation of a workshop that I conducted a couple of years ago. This series will cover topics ranging from installing R/RStudio and the basics of R programming all the way to running a machine learning model in R. It should suit anyone who wants to learn R. I will try my best to convert the workshop materials and teachings into a set of lucid blog posts with examples and code snippets.
This is the first part of the series. Here is what you can expect from this post:
- What is R?
- Installing R and RStudio
- Orientation to RStudio
- Installation of packages (with & without internet connectivity)
What is R?
R is an open-source software environment for statistical computing and graphics. It was created in 1991 by Ross Ihaka and Robert Gentleman in New Zealand. For a detailed account of how R was developed from another language, ‘S’, do watch the lecture by Prof. Roger Peng of Johns Hopkins University, Overview and History of R.
Installing R and RStudio
- Download and install R from https://www.r-project.org/
- RStudio is a popular IDE for R that makes developing code and projects easier. Download RStudio from https://www.rstudio.com/products/rstudio/download/
- Note: Install R before you install RStudio
Orientation to RStudio IDE
Upon opening RStudio for the first time, you should see the following window
- Console: commands and computations can be executed here.
- Environment: all the variables currently in use and their data types (integer/character/factor) can be seen here.
- Files/Plots/Help: this section is mostly used to view plots or to get help on the syntax of any command or package.
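What the Environment pane shows can also be checked from the Console. The snippet below is a small sketch. The variables count, name, and grade are made up for illustration and are not part of the workshop material.

```r
# Hypothetical variables, created only for illustration
count <- 5L                         # integer (the L suffix makes it an integer)
name  <- "R"                        # character (R's term for a string)
grade <- factor(c("a", "b", "a"))   # factor with two levels, "a" and "b"

class(count)    # "integer"
class(name)     # "character"
class(grade)    # "factor"
levels(grade)   # "a" "b"
```

After running this, the same three variables should appear in the Environment pane with their types.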
Useful commands/shortcuts
- To clear the console, click anywhere in the console and press Ctrl+L
- To remove a selected variable (say x), type in the console: rm(x)
- To remove all the variables from memory: rm(list = ls())
- To find out which directory we are in: getwd()
- To set/change the working directory to Downloads: setwd("C:/Users/my_user/Downloads")
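The commands above can be tried safely with the sketch below. It wraps the rm() calls in local(), which runs code in a throwaway environment, so nothing in your real workspace is deleted. The variables x and y, and the use of tempdir() instead of a real folder, are assumptions made for the example.

```r
# local() evaluates the code in a temporary environment, so your real
# workspace is left alone. At the console you would type rm() directly.
after_rm <- local({
  x <- c(1, 2, 3)   # hypothetical variables
  y <- "hello"
  rm(x)             # remove just x
  ls()              # list what is left
})
after_rm            # "y"

after_rm_all <- local({
  x <- c(1, 2, 3)
  y <- "hello"
  rm(list = ls())   # remove everything in this environment
  ls()
})
length(after_rm_all)  # 0

# Working directory: remember the current one, switch, then switch back.
old_dir <- getwd()
setwd(tempdir())      # tempdir() always exists, unlike a hard-coded path
getwd()               # now points at the temporary folder
setwd(old_dir)        # restore the original working directory
```

Saving the old directory before calling setwd() is a habit worth keeping, because it makes it easy to return to where you started.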
Packages
Packages are collections of functions (plus their documentation and sometimes data) that let you perform common actions by calling a function instead of writing the code yourself. Functions like mean, sum, and cor are examples. As seen in the above screenshot, mean is a function that computes the mean of a vector of numbers. It is part of base R, so we don’t need to install or load anything to use it.
Historically, R was developed as statistical software. So, typical statistical functions like mean, median, t.test, and anova ship with R and don’t require installation. Other packages need to be installed once and then loaded with library() in every new R session in which they are used.
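To see where a built-in function lives, you can ask R for the environment it was defined in. This is a minimal sketch using only functions that ship with R. The numbers in the vector are made up.

```r
# Which package does each function come from?
environmentName(environment(mean))     # "base"
environmentName(environment(median))   # "stats"
environmentName(environment(t.test))   # "stats"

# Both packages are loaded automatically, so no library() call is needed
values <- c(2, 4, 6)   # hypothetical data
mean(values)           # 4
median(values)         # 4

# search() lists the packages attached in the current session
search()
```

The base and stats packages come with every R installation, which is why these functions work in a fresh session.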
Installing packages — with internet
This scenario assumes a stable internet connection for installing the packages. Let us consider installing the package called tidyr as an example. There are two ways to install packages in R.
- On the RStudio menu bar, go to Tools → Install Packages. Here, type the name of the package (tidyr) and click Install
- On the RStudio Console, execute the following command
install.packages("tidyr")
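A package only needs to be installed once, so a script can check first and install only when needed. The helpers below, is_installed and ensure_package, are hypothetical names written for this post, not functions provided by R. Treat this as a sketch rather than the one right way to do it.

```r
# TRUE if the package can be found in any library folder.
# system.file() returns an empty string when the package is missing.
is_installed <- function(pkg) {
  nzchar(system.file(package = pkg))
}

# Install the package only if it is missing, then load it.
ensure_package <- function(pkg) {
  if (!is_installed(pkg)) {
    install.packages(pkg)   # this step needs an internet connection
  }
  library(pkg, character.only = TRUE)
}

is_installed("stats")   # TRUE, because stats ships with R

# Usage (commented out, because it may download from the internet):
# ensure_package("tidyr")
```

The character.only = TRUE argument tells library() to treat pkg as a variable holding the package name rather than as the name itself.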
Installing packages — without internet
During my last job, I had to install R packages on a computer with no internet connection. All our notebooks and desktops were cut off from the internet for security purposes. So, if you ever face this situation, please follow these slightly longer steps.
1. On an internet-enabled computer, search for the package name plus "cran" (for example, "tidyr cran").
2. On the corresponding CRAN webpage, download the r-release binary for either Windows or macOS, depending on the OS of the computer without internet.
3. In addition to the tidyr package, we also need to download the packages it depends on. On the same webpage, you will find Imports (as seen in the screenshot below), and Depends if the package has any. We need to open each of these packages (dplyr, ellipsis, glue, lifecycle, etc.) and download each of them and their own dependencies as well.
4. Extract all the downloaded files and transfer the resulting package folders (via flash drive or email) into the R library folder on the notebook without internet.
5. To find out the location of R’s library folder, type the following command in the console.
.libPaths()
6. Finally, load the package with the following command. If there is no error, then the package is installed properly.
library(tidyr)
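The manual steps above can also be scripted. The sketch below is one possible approach, not the method used in the original workshop. The function names fetch_with_dependencies, package_files, and install_from_folder are hypothetical, and the CRAN mirror URL is an assumption that you can swap for any mirror. It also assumes both computers run the same OS and R version, because download.packages() fetches files for the machine it runs on by default.

```r
# --- On the internet-enabled computer ---
# Download a package and everything it depends on into one folder.
fetch_with_dependencies <- function(pkg, dest_dir,
                                    repo = "https://cloud.r-project.org") {
  db <- available.packages(repos = repo)
  deps <- tools::package_dependencies(
    pkg, db = db,
    which = c("Depends", "Imports", "LinkingTo"),
    recursive = TRUE
  )[[pkg]]
  # Packages that ship with R itself are not on CRAN, so skip them
  base_pkgs <- rownames(installed.packages(priority = "base"))
  needed <- setdiff(c(pkg, deps), base_pkgs)
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  download.packages(needed, destdir = dest_dir, repos = repo)
}

# --- On the computer without internet ---
# List the package archives that were copied over.
package_files <- function(src_dir) {
  list.files(src_dir, pattern = "\\.(zip|tgz|tar\\.gz)$", full.names = TRUE)
}

# Install every archive in the folder; repos = NULL means "use local files".
install_from_folder <- function(src_dir) {
  install.packages(package_files(src_dir), repos = NULL)
}

# Usage (commented out; the folder names are placeholders):
# fetch_with_dependencies("tidyr", "tidyr_offline")   # online computer
# install_from_folder("tidyr_offline")                # offline computer
# library(tidyr)
```

With this route there is no need to unzip anything by hand, since install.packages() unpacks the files into the library folder reported by .libPaths(). If the downloads are source packages (.tar.gz) rather than binaries, the install order can matter, so binaries are the simpler choice.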
Conclusion
This post is like setting the dining table for a yummy three-course meal: we are only getting started. In the next post, let’s get started with the basics of R. In the meantime, do play around with RStudio’s Tools → Global Options → Appearance to change the appearance of the RStudio IDE. You can even choose a dark theme if you’re a fan of it. 😄