How to read csv files in Python (without Pandas)
Reading CSV files is a basic and important first step in getting data. In this article I will go over the basic Python read functions. This example will use csv.reader from Python’s csv library. This story assumes you know how to access files using relative or absolute file paths. If you don’t, please see my story here:
https://medium.com/@AIWatson/understanding-file-path-with-vscode-a7e92e62d7ec
Goal
I am a data scientist and I want to read in the iris dataset. I have an iris.csv in my directory. Currently I have a folder called read_write_csv. Inside the read_write_csv folder there are 2 files: iris.csv and read_write_csv.py
The Iris Dataset
Inside my read_write_csv.py, I would like to read in the data of iris.csv. This data set is made up of 3 flower species (setosa, versicolor, and virginica), with four measurements (sepal_length, sepal_width, petal_length, petal_width) recorded for 50 flowers of each species. There is a total of 150 rows of measurements in this csv file (50 flowers x 3 flower species), plus a header row. The goal now is to analyze this data in Python.
You can find the dataset here:
https://gist.github.com/CXWatson/576ee708b0c9f7004bf4b152f84607bb
Getting the csv file in Python
First, we open up the py file in an editor of your choice (mine is VS Code) and import csv. This is a built-in Python library that gives us the functions for reading csv files.
import csv
There are a few things that we have to tell our computer in order to get the data:
- Where the csv file is
- The name of the csv file
- If you want to read or write from the file
Explanation of these steps:
- Where the csv file is
You have to specify where your csv file is, whether in the current folder or in a subfolder. Your computer holds many folders and files, so you need to specify which folder and file you want. In our scenario, we saved the csv file in the same directory (folder) as our read_write_csv.py file. This helps to simplify our file path when retrieving the csv file.
- The name of the csv file
The next thing your computer needs is the name of the csv file. Your folder can contain multiple csv files with many different names. You have to differentiate the files for your computer by explicitly typing in the name of the file. NOTE: This is also the reason you can’t have two files with the same name in the same folder.
- If you want to read or write from the file
When you open a file you also have to say how you plan to use it. Here that means choosing between reading ("r") and writing ("w"). Since we only want to get data out of iris.csv, we will open it for reading.
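As a minimal sketch of the three pieces of information above (where the file is, what it is called, and whether to read or write), the snippet below builds a file path and opens it in both modes. The temporary folder and the two sample rows are assumptions made only so the example runs on its own. In this story's setup the path would simply be "./iris.csv".

```python
import tempfile
from pathlib import Path

# Assumption: a throwaway folder stands in for the read_write_csv folder
with tempfile.TemporaryDirectory() as folder:
    # Where the file is (folder) + the name of the file (iris.csv)
    csv_path = Path(folder) / "iris.csv"

    # "w" opens the file for writing and creates it if it does not exist
    with open(csv_path, "w") as csvfile:
        csvfile.write("sepal_length,sepal_width,petal_length,petal_width,species\n")
        csvfile.write("5.1,3.5,1.4,0.2,setosa\n")

    # "r" opens the same file for reading
    with open(csv_path, "r") as csvfile:
        contents = csvfile.read()

print(contents)
```

Leaving a with block closes the file automatically, which is why the same path can be opened again right away in a different mode.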
Reading the CSV in Python
Here is a snippet combining the things we just talked about:
import csv

with open("./iris.csv", "r") as csvfile:
    reader_variable = csv.reader(csvfile, delimiter=",")
    for row in reader_variable:
        print(row)
Description of the code:
open, used here inside a with statement, takes 2 arguments: the file location and name (together called the file path) and how you want to open the file ("r" for read, "w" for write).
as csvfile means the opened file is given the name csvfile, which is then passed as an argument to csv.reader(csvfile, delimiter=",") on the following line. csvfile is the name we gave to the opened file. You can name this whatever you want, as long as you pass the same name into csv.reader().
csv.reader(csvfile, delimiter=",") takes in the file that you named, in this case csvfile, and it also needs to know how the values are separated. Typically they are separated by commas, because csv stands for comma separated values.
reader_variable is just a variable we are using to store the reader that csv.reader gives back. We loop over this variable to print the information out.
Now we can write a for loop to print out all the rows of the reader_variable.
The first 5 lines of the output should look like this:
['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
['5.1', '3.5', '1.4', '0.2', 'setosa']
['4.9', '3.0', '1.4', '0.2', 'setosa']
['4.7', '3.2', '1.3', '0.2', 'setosa']
['4.6', '3.1', '1.5', '0.2', 'setosa']
Notice that every iteration gives back one row as a list of strings. The loop works its way through the whole csv file and stops at the end of the csv file.
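Because every value comes back as a string, a calculation needs two extra steps: setting the header row aside and converting the numbers. The sketch below shows one way to do that. It assumes a few rows copied from the output above and uses io.StringIO as a stand-in for the opened iris.csv, so it runs without the file. The names sample, header, and sepal_lengths are only illustrative.

```python
import csv
import io

# Assumption: three data rows copied from the output above, held in memory
sample = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3.0,1.4,0.2,setosa\n"
    "4.7,3.2,1.3,0.2,setosa\n"
)

reader_variable = csv.reader(sample, delimiter=",")

# next() pulls the first row (the column names) off the reader
header = next(reader_variable)

sepal_lengths = []
for row in reader_variable:
    # row[0] is the sepal_length column, still a string at this point
    sepal_lengths.append(float(row[0]))

average = sum(sepal_lengths) / len(sepal_lengths)
print(header)
print(sepal_lengths)
print(round(average, 2))
```

With the real file, the same loop would go inside the with open block in place of the print(row) line.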
Here is the final code from github.
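The delimiter argument described above matters when a file uses something other than commas. As a small illustration, the sketch below reads a made-up semicolon-separated version of the first iris row, once with the matching delimiter and once with the wrong one. The semicolon data is hypothetical and is not part of the iris.csv used in this story.

```python
import csv
import io

# Hypothetical semicolon-separated copy of the header and first data row
sample = io.StringIO(
    "sepal_length;sepal_width;petal_length;petal_width;species\n"
    "5.1;3.5;1.4;0.2;setosa\n"
)

# Matching delimiter: each value becomes its own list item
rows = list(csv.reader(sample, delimiter=";"))
print(rows[1])

# Rewind and read again with a comma: the whole line stays one item
sample.seek(0)
wrong_rows = list(csv.reader(sample, delimiter=","))
print(wrong_rows[1])
```

If a row ever prints as a list with a single long string in it, a mismatched delimiter is a likely cause.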
Other ways to read csv files
So far this story has shown only one way of reading in a csv file, using the csv library and its reader function. There is another way to read in csv files using just with open. This way doesn’t require any imports.
CODE — READLINE:
with open("./iris.csv", "r") as csvfile:
    first_line = csvfile.readline()
    print(first_line)
OUTPUT:
sepal_length,sepal_width,petal_length,petal_width,species
Explanation: This way only requires two lines of code. It uses with open paired with as csvfile. At this point csvfile is the opened file, and all you have to do to get the information is call the method readline(). Notice that readline() only gives back 1 line, which in our case is the first line of our csv file. I saved the result of csvfile.readline() in a variable called first_line because it holds just the first line.
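readline() also remembers its place in the file, so calling it again gives back the next line instead of the first one. The sketch below shows that behaviour. It assumes three lines of iris data held in an io.StringIO object so it runs without iris.csv. An opened file behaves the same way.

```python
import io

# Assumption: in-memory stand-in for the opened iris.csv
csvfile = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3.0,1.4,0.2,setosa\n"
)

first_line = csvfile.readline()    # the header line
second_line = csvfile.readline()   # continues where the last call stopped

# Looping over the file gives back whatever lines are left
remaining = [line for line in csvfile]

# At the end of the file, readline() gives back an empty string
end = csvfile.readline()

print(first_line, second_line, remaining, repr(end))
```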
To read all the lines you simply call readlines().
CODE — READLINES():
with open("./iris.csv", "r") as csvfile:
    all_lines = csvfile.readlines()
    print(all_lines)
OUTPUT:
['sepal_length,sepal_width,petal_length,petal_width,species\n', '5.1,3.5,1.4,0.2,setosa\n', '4.9,3.0,1.4,0.2,setosa\n', '4.7,3.2,1.3,0.2,setosa\n', '4.6,3.1,1.5,0.2,setosa\n', '5.0,3.6,1.4,0.2,setosa\n', '5.4,3.9,1.7,0.4,setosa\n', '4.6,3.4,1.4,0.3,setosa\n', '5.0,3.4,1.5,0.2,setosa\n', '4.4,2.9,1.4,0.2,setosa\n', '4.9,3.1,1.5,0.1,setosa\n', '5.4,3.7,1.5,0.2,setosa\n', '4.8,3.4,1.6,0.2,setosa\n', '4.8,3.0,1.4,0.1,setosa\n', '4.3,3.0,1.1,0.1,setosa\n', '5.8,4.0,1.2,0.2,setosa\n', '5.7,4.4,1.5,0.4,setosa\n', '5.4,3.9,1.3,0.4,setosa\n',
'5.1,3.5,1.4,0.3,setosa\n', '5.7,3.8,1.7,0.3,setosa\n', '5.1,3.8,1.5,0.3,setosa\n', '5.4,3.4,1.7,0.2,setosa\n', '5.1,3.7,1.5,0.4,setosa\n', '4.6,3.6,1.0,0.2,setosa\n', '5.1,3.3,1.7,0.5,setosa\n', '4.8,3.4,1.9,0.2,setosa\n', '5.0,3.0,1.6,0.2,setosa\n', '5.0,3.4,1.6,0.4,setosa\n', '5.2,3.5,1.5,0.2,setosa\n', '5.2,3.4,1.4,0.2,setosa\n', '4.7,3.2,1.6,0.2,setosa\n', '4.8,3.1,1.6,0.2,setosa\n', '5.4,3.4,1.5,0.4,setosa\n', '5.2,4.1,1.5,0.1,setosa\n', '5.5,4.2,1.4,0.2,setosa\n', '4.9,3.1,1.5,0.1,setosa\n', '5.0,3.2,1.2,0.2,setosa\n', '5.5,3.5,1.3,0.2,setosa\n', '4.9,3.1,1.5,0.1,setosa\n', '4.4,3.0,1.3,0.2,setosa\n', '5.1,3.4,1.5,0.2,setosa\n', '5.0,3.5,1.3,0.3,setosa\n', '4.5,2.3,1.3,0.3,setosa\n', '4.4,3.2,1.3,0.2,setosa\n', '5.0,3.5,1.6,0.6,setosa\n', '5.1,3.8,1.9,0.4,setosa\n', '4.8,3.0,1.4,0.3,setosa\n', '5.1,3.8,1.6,0.2,setosa\n', '4.6,3.2,1.4,0.2,setosa\n', '5.3,3.7,1.5,0.2,setosa\n', '5.0,3.3,1.4,0.2,setosa\n', '7.0,3.2,4.7,1.4,versicolor\n', '6.4,3.2,4.5,1.5,versicolor\n', '6.9,3.1,4.9,1.5,versicolor\n', '5.5,2.3,4.0,1.3,versicolor\n', '6.5,2.8,4.6,1.5,versicolor\n', '5.7,2.8,4.5,1.3,versicolor\n', '6.3,3.3,4.7,1.6,versicolor\n', '4.9,2.4,3.3,1.0,versicolor\n', '6.6,2.9,4.6,1.3,versicolor\n', '5.2,2.7,3.9,1.4,versicolor\n', '5.0,2.0,3.5,1.0,versicolor\n', '5.9,3.0,4.2,1.5,versicolor\n', '6.0,2.2,4.0,1.0,versicolor\n', '6.1,2.9,4.7,1.4,versicolor\n', '5.6,2.9,3.6,1.3,versicolor\n', '6.7,3.1,4.4,1.4,versicolor\n', '5.6,3.0,4.5,1.5,versicolor\n', '5.8,2.7,4.1,1.0,versicolor\n', '6.2,2.2,4.5,1.5,versicolor\n', '5.6,2.5,3.9,1.1,versicolor\n', '5.9,3.2,4.8,1.8,versicolor\n', '6.1,2.8,4.0,1.3,versicolor\n', '6.3,2.5,4.9,1.5,versicolor\n', '6.1,2.8,4.7,1.2,versicolor\n', '6.4,2.9,4.3,1.3,versicolor\n', '6.6,3.0,4.4,1.4,versicolor\n', '6.8,2.8,4.8,1.4,versicolor\n', '6.7,3.0,5.0,1.7,versicolor\n', '6.0,2.9,4.5,1.5,versicolor\n', '5.7,2.6,3.5,1.0,versicolor\n', '5.5,2.4,3.8,1.1,versicolor\n', '5.5,2.4,3.7,1.0,versicolor\n', '5.8,2.7,3.9,1.2,versicolor\n',
'6.0,2.7,5.1,1.6,versicolor\n', '5.4,3.0,4.5,1.5,versicolor\n', '6.0,3.4,4.5,1.6,versicolor\n', '6.7,3.1,4.7,1.5,versicolor\n', '6.3,2.3,4.4,1.3,versicolor\n', '5.6,3.0,4.1,1.3,versicolor\n', '5.5,2.5,4.0,1.3,versicolor\n', '5.5,2.6,4.4,1.2,versicolor\n', '6.1,3.0,4.6,1.4,versicolor\n', '5.8,2.6,4.0,1.2,versicolor\n', '5.0,2.3,3.3,1.0,versicolor\n', '5.6,2.7,4.2,1.3,versicolor\n', '5.7,3.0,4.2,1.2,versicolor\n', '5.7,2.9,4.2,1.3,versicolor\n', '6.2,2.9,4.3,1.3,versicolor\n', '5.1,2.5,3.0,1.1,versicolor\n', '5.7,2.8,4.1,1.3,versicolor\n', '6.3,3.3,6.0,2.5,virginica\n', '5.8,2.7,5.1,1.9,virginica\n', '7.1,3.0,5.9,2.1,virginica\n', '6.3,2.9,5.6,1.8,virginica\n', '6.5,3.0,5.8,2.2,virginica\n', '7.6,3.0,6.6,2.1,virginica\n', '4.9,2.5,4.5,1.7,virginica\n', '7.3,2.9,6.3,1.8,virginica\n', '6.7,2.5,5.8,1.8,virginica\n', '7.2,3.6,6.1,2.5,virginica\n', '6.5,3.2,5.1,2.0,virginica\n', '6.4,2.7,5.3,1.9,virginica\n', '6.8,3.0,5.5,2.1,virginica\n', '5.7,2.5,5.0,2.0,virginica\n', '5.8,2.8,5.1,2.4,virginica\n', '6.4,3.2,5.3,2.3,virginica\n', '6.5,3.0,5.5,1.8,virginica\n', '7.7,3.8,6.7,2.2,virginica\n', '7.7,2.6,6.9,2.3,virginica\n', '6.0,2.2,5.0,1.5,virginica\n', '6.9,3.2,5.7,2.3,virginica\n', '5.6,2.8,4.9,2.0,virginica\n', '7.7,2.8,6.7,2.0,virginica\n', '6.3,2.7,4.9,1.8,virginica\n', '6.7,3.3,5.7,2.1,virginica\n', '7.2,3.2,6.0,1.8,virginica\n',
'6.2,2.8,4.8,1.8,virginica\n', '6.1,3.0,4.9,1.8,virginica\n', '6.4,2.8,5.6,2.1,virginica\n', '7.2,3.0,5.8,1.6,virginica\n', '7.4,2.8,6.1,1.9,virginica\n', '7.9,3.8,6.4,2.0,virginica\n', '6.4,2.8,5.6,2.2,virginica\n', '6.3,2.8,5.1,1.5,virginica\n', '6.1,2.6,5.6,1.4,virginica\n', '7.7,3.0,6.1,2.3,virginica\n', '6.3,3.4,5.6,2.4,virginica\n', '6.4,3.1,5.5,1.8,virginica\n', '6.0,3.0,4.8,1.8,virginica\n', '6.9,3.1,5.4,2.1,virginica\n', '6.7,3.1,5.6,2.4,virginica\n', '6.9,3.1,5.1,2.3,virginica\n', '5.8,2.7,5.1,1.9,virginica\n', '6.8,3.2,5.9,2.3,virginica\n', '6.7,3.3,5.7,2.5,virginica\n', '6.7,3.0,5.2,2.3,virginica\n', '6.3,2.5,5.0,1.9,virginica\n', '6.5,3.0,5.2,2.0,virginica\n', '6.2,3.4,5.4,2.3,virginica\n', '5.9,3.0,5.1,1.8,virginica\n']
Explanation: This prints out a raw output with everything in the csvfile. readlines() gives back a list with one string per line. It is raw because each line is still a single piece of text: the values are not split apart and there is no special spacing or format to it. You can also notice there is a \n at the end of every line. This \n is a special character that stands for new line. It marks the point where one line of the file ends and the next one begins.
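To turn these raw lines into separate values by hand, one option is to strip the trailing \n and split each line on the comma. The sketch below assumes the first three lines from the output above, typed in directly so it runs without iris.csv. The name rows is only illustrative.

```python
# Assumption: the first three items of the readlines() output above
all_lines = [
    "sepal_length,sepal_width,petal_length,petal_width,species\n",
    "5.1,3.5,1.4,0.2,setosa\n",
    "4.9,3.0,1.4,0.2,setosa\n",
]

rows = []
for line in all_lines:
    cleaned = line.rstrip("\n")       # drop the new line character
    rows.append(cleaned.split(","))   # cut the text at every comma

print(rows[0])
print(rows[1])
```

This simple split works for the iris data, but it would break on a value that itself contains a comma inside quotes, which is a case csv.reader handles.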
Conclusion
We looked at 2 major approaches to reading csv files. One uses csv.reader() from the Python csv library. The other uses only with open and as csvfile, together with readline() or readlines(). csv.reader() splits each row into a list of values that you can step through with a for loop. This is more helpful for making calculations and doing analysis. Using with open and as csvfile on its own returns the raw lines as text. This method might be better if you just want a block of data to be read in.