Data Analysis with R — Part 4 (Reading data from Excel/CSV)

Srinath Sridharan
3 min readJan 15, 2022

“We are surrounded by data, but starved for insights” — Jay Baer

Photo by Carlos Muza on Unsplash

This is the fourth part of a series of posts on the topic of Data Analysis with R. Here are the links to the previous parts.

Data Analysis with R — Part 1 (Getting Started)

Data Analysis with R — Part 2 (Basics of R Programming)

Data Analysis with R— Part 3 (Control Flow)

These are the things you can expect out of this post.

  • Reading data from CSV files
  • Reading data from Excel files

Introduction

The majority of data these days is stored in databases, either on the cloud or on-premises. For purposes of analysis, sharing, etc, subsets of the data are stored in MS Excel or in Comma Separated Value (CSV) files. So, it’s quintessential to learn to read data from these files into R and eventually write our R script’s output back into Excel/CSV files.

CSV files

The base packages in R are capable of reading CSV files into R. There is no need to download any package for this. This is how we import a CSV file named “my_data.csv” into R.

#1. Simple
csv_data = read.csv("data_file_name.csv")
# Notes- the variable csv_data will be stored as a data frame data type.
# 2. Additional arguments
csv_data1 = read.csv("my_data.csv", sep = ",", na.strings = c("", "Missing"), skip = 1)
# Notes
# a. sep - this argument is to tell R what's the field separator character in our data. The default separator is comma
# b. na.strings - (character or character vector) telling R how is missing data coded in the data. Here in my data misssing data can be either a blank cell ("") or can be coded as "Missing".
# c. skip - (integer) tells R how lines to skip at the start before reading the data. Some people practice leaving a few blank lines at the top before the data.

Excel files

Unlike CSV files, R cannot read in Excel files without downloading additional packages. There are quite a few packages to read data Excel files into R — openxlsx, readxl, xlsx, etc.). My personal favorite is openxlsx and we will be seeing that in detail in this post and beyond.

First, install openxlsx package by invoking the install.packages command

install.packages("openxlsx")

Then invoke the library and read the excel file, as follows.

library(openxlsx)# 1. Simple
excel_data = read.xlsx("my_data.xlsx")
# 2. With additional arguments
excel_data1 = read.xlsx("my_data.xlsx", detectDates = T, sheet = "Data", startRow = 3, na.strings = "Missing", skipEmptyRows = True)
# Notes
# a. detectDates- (True or False) this argument will tell R to detect dates in the data and store it in "Date" format
# b. sheet- (character) tells R the name of Sheet in the Excel file where the data is stored.You can only import data one sheet at a time.
# c. na.strings - (character or character vector) telling R how is missing data coded in the data. Here in my data misssing data can be either a blank cell ("").
# c. skipEmptyRows - (True/False) an integer that tells R to skip empty rows after the 1st row of data. If there are any blank rows in the data, they will be skipped, thereby making data cleaning easier.

Conclusions

In this post we have seen how to load data from CSV and Excel files into R. This is a crucial aspect of data analysis that every analyst needs to know. Here, I’ve explained one package (openxlsx) that helps to import data in Excel file into R. Feel free to explore the other packages, mentioned above, and choose the one that works best for you.

The post is like setting up a tent for camping — the actual fun starts only now. In the next post, we will be learning about data processing via a popular package called DPLYR. I’m excited about that post. 😃

In the meanwhile, just create a sample Excel/CSV file or use any existing files on your computer and try loading it into R. I will reiterate the reading of files Excel/CSV files in the next post.

--

--

Srinath Sridharan

Data Enthusiast | Healthcare Aficionado | Digital Consultant