Installing pandas

Reading CSV and XLSX files in the simplest way

Nov 14, 2023

Python and pandas are two different things, and the difference sometimes confuses beginners in the Python ecosystem (Abba, 2023). Python is a general-purpose programming language used in many fields, while pandas is one of its best-known libraries, mainly used for data manipulation and analysis. pandas is an open-source Python package built on top of another package named NumPy, which provides support for multi-dimensional arrays, and it works well with many other data science modules in the Python ecosystem (Suhani, 2020). It provides two main data structures for manipulating data: Series and DataFrame (nikhilaggarwal3, 2023). There are also alternative Python packages that offer similar functionality, e.g. Polars and Vaex (Andersen, 2023). If you are considering becoming a data scientist, learning pandas is a great first step (Mitchell, 2023). This story briefly shows how to install pandas and how to use it to read CSV and XLSX files.
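To make the two data structures a little more concrete, here is a minimal sketch with made-up numbers (they are only for illustration): a Series is a one-dimensional labeled array, and a DataFrame is a table whose columns are Series.

```python
import pandas as pd

# A Series is a one-dimensional labeled array of values
s = pd.Series([0, 1, 4, 9], name="y")

# A DataFrame is a two-dimensional table; each of its columns is a Series
df = pd.DataFrame({"x": [0, 1, 2, 3], "y": [0, 1, 4, 9]})

print(s)
print(df)
print(type(df["y"]))  # selecting one column gives back a Series
```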

Package installation

According to the pandas website, the easiest and recommended way to set up an environment for pandas is Anaconda. Here, however, pandas is installed via pip from PyPI, where the latest version is 2.1.3, released three days ago on Nov 11, 2023. So it is still fresh 😊.

Open your working directory, which in my case is D:\Python. The folder already contains python.lnk, a shortcut to the Windows command prompt (cmd). Double-click the shortcut to open the command prompt.

Type

pip show pandas

to check whether the pandas package is already installed on your system.

D:\python>pip show pandas
WARNING: Package(s) not found: pandas

It shows that the pandas package has not been installed. Then type

pip install pandas

to install it.

D:\python>pip install pandas
Collecting pandas
  Downloading pandas-2.1.3-cp312-cp312-win_amd64.whl.metadata (18 kB)
Requirement already satisfied: numpy<2,>=1.26.0 in c:\users\sparisoma viridi\appdata\local\programs\python\python312\lib\site-packages (from pandas) (1.26.0)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\sparisoma viridi\appdata\local\programs\python\python312\lib\site-packages (from pandas) (2.8.2)
Collecting pytz>=2020.1 (from pandas)
  Downloading pytz-2023.3.post1-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.1 (from pandas)
  Downloading tzdata-2023.3-py2.py3-none-any.whl (341 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 341.8/341.8 kB 1.6 MB/s eta 0:00:00
Requirement already satisfied: six>=1.5 in c:\users\sparisoma viridi\appdata\local\programs\python\python312\lib\site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)
Downloading pandas-2.1.3-cp312-cp312-win_amd64.whl (10.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.5/10.5 MB 2.9 MB/s eta 0:00:00
Downloading pytz-2023.3.post1-py2.py3-none-any.whl (502 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 502.5/502.5 kB 2.6 MB/s eta 0:00:00
Installing collected packages: pytz, tzdata, pandas
Successfully installed pandas-2.1.3 pytz-2023.3.post1 tzdata-2023.3

Three packages are installed:

pandas-2.1.3: released on Nov 11, 2023,
pytz-2023.3.post1: released on Sep 5, 2023,
tzdata-2023.3: released on Mar 29, 2023.

Check it again using pip as previously shown.

D:\python>pip show pandas
Name: pandas
Version: 2.1.3
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author:
Author-email: The Pandas Development Team <pandas-dev@python.org>
License: BSD 3-Clause License

Copyright (c) 2008-2011, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
All rights reserved.

Copyright (c) 2011-2023, Open source contributors.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
  contributors may be used to endorse or promote products derived from  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Location: C:\Users\Sparisoma Viridi\AppData\Local\Programs\Python\Python312\Lib\site-packages
Requires: numpy, python-dateutil, pytz, tzdata
Required-by:
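The same check can also be done from inside Python. The sketch below relies only on the standard library function importlib.metadata.version(); the helper name installed_version() is my own choice for illustration. It looks packages up in the interpreter that runs the script, which matters when several Python versions are installed: the pip on your PATH might belong to a different one, and running py -m pip show pandas ties pip to the same interpreter as py.

```python
import importlib.metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    The lookup happens in the environment of the interpreter running this
    script, similar in spirit to `pip show <dist_name>`.
    """
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


# pandas itself plus the requirements listed by `pip show pandas`
for name in ("pandas", "numpy", "python-dateutil", "pytz", "tzdata"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```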

Input files

Before we can test how pandas reads CSV and XLSX files, the files have to be created first. Right-click with the mouse inside the folder and do the following.

Create a new text document, e.g. data_1.txt, and add the following lines

x,y
0,0
1,1
2,4
3,9
4,16
5,25
6,36
7,49
8,64
9,81
10,100
11,121
12,144
13,169
14,196
15,225

using a plain text editor, e.g. Notepad++.

Finish typing the remaining lines, save the file with CTRL+S, and close the editor.

Rename it to data_1.csv.

Press Yes to confirm the change of the file extension.
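If you prefer not to type the sixteen rows by hand, a short script can write the same data_1.csv. This is a sketch of an alternative, not the procedure used above; it assumes pandas is already installed and writes into the current working directory.

```python
import pandas as pd

# Same table as typed by hand: x runs from 0 to 15 and y = x squared
x = list(range(16))
df = pd.DataFrame({"x": x, "y": [v ** 2 for v in x]})

# index=False stops pandas from writing its row index as an extra first column
df.to_csv("data_1.csv", index=False)
```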

The next step is to create the XLSX file. Right-click with the mouse again and create a new Excel worksheet.

Name the file data_2.xlsx.

Double-click its icon to open it in Microsoft Excel.

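Enter the headers x and y in the first row, starting at cell A1, and below them the values 0 to 5 for x and their cubes (0, 1, 8, 27, 64, 125) for y. Then save the file and close the application.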
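Similarly, data_2.xlsx can be written from Python. This sketch assumes the data shown in the read_excel output later in this story (x from 0 to 5 and y = x³). Writing .xlsx needs an optional Excel engine such as openpyxl, which pip install pandas does not install, so the sketch catches the ImportError instead of crashing.

```python
import pandas as pd

# Same content as entered in Excel: x runs from 0 to 5 and y = x cubed
x = list(range(6))
df = pd.DataFrame({"x": x, "y": [v ** 3 for v in x]})

try:
    # Writing .xlsx goes through an optional Excel engine such as openpyxl
    df.to_excel("data_2.xlsx", index=False)
    written = True
except ImportError:
    print("No Excel writer found; install one with: pip install openpyxl")
    written = False
```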

Now there are two files, data_1.csv and data_2.xlsx, ready to be read with pandas.

Reading CSV file

If you only want to see the content, pandas needs just three lines:

import pandas as pd
df = pd.read_csv('data_1.csv')
print(df)

Saved as read_csv.py, the script produces

$ py .\read_csv.py
     x    y
0    0    0
1    1    1
2    2    4
3    3    9
4    4   16
5    5   25
6    6   36
7    7   49
8    8   64
9    9   81
10  10  100
11  11  121
12  12  144
13  13  169
14  14  196
15  15  225
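pd.read_csv() accepts a file-like object as well as a file name, which makes it easy to experiment without a file on disk. The sketch below reads a few rows in the same format as data_1.csv from an in-memory string via io.StringIO, then inspects the result with shape, dtypes, and head().

```python
import io

import pandas as pd

# The first rows of data_1.csv, held in memory instead of in a file
text = "x,y\n0,0\n1,1\n2,4\n3,9\n"

# read_csv accepts a file-like object as well as a file name
df = pd.read_csv(io.StringIO(text))

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # column types inferred from the text
print(df.head(2))  # only the first two rows
```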

You can also convert each column of the DataFrame to a list, as in the following read_csv_column.py

import pandas as pd
df = pd.read_csv('data_1.csv')

h = [*df.columns]
print('h =', h)

col0 = [*df.get(h[0])]
print(h[0], '=', col0)

col1 = [*df.get(h[1])]
print(h[1], '=', col1)

which gives

$ py .\read_csv_column.py
h = ['x', 'y']
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
y = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225]

The resulting lists can then be used for further analysis and plotting.
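The [*...] unpacking works because iterating over a Series yields its values. An equivalent and perhaps more explicit way is Series.tolist(). The sketch below uses a small stand-in DataFrame instead of data_1.csv so that it runs on its own. It selects columns with df[name]; the difference from df.get(name) is that get() returns None for a missing column, while df[name] raises a KeyError.

```python
import pandas as pd

# A small stand-in for data_1.csv so the sketch runs without the file
df = pd.DataFrame({"x": [0, 1, 2, 3], "y": [0, 1, 4, 9]})

h = df.columns.tolist()   # column names: ['x', 'y']
col0 = df[h[0]].tolist()  # 1st column (index 0) as a plain Python list
col1 = df[h[1]].tolist()  # 2nd column (index 1)

print('h =', h)
print(h[0], '=', col0)
print(h[1], '=', col1)

# df.get() returns None for a missing column, whereas df['z'] raises KeyError
print(df.get('z'))
```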

Reading XLSX file

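The syntax is almost the same as for reading the CSV file, as in the following read_xlsx_column.py. Note that pandas reads .xlsx files through the optional openpyxl package, which is not installed together with pandas; if it is missing, pd.read_excel() raises an ImportError, and it can be installed with pip install openpyxl.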

import pandas as pd
df = pd.read_excel('data_2.xlsx')

h = [*df.columns]
print('h =', h)

col0 = [*df.get(h[0])]
print(h[0], '=', col0)

col1 = [*df.get(h[1])]
print(h[1], '=', col1)

which produces

$ py .\read_xlsx_column.py
h = ['x', 'y']
x = [0, 1, 2, 3, 4, 5]
y = [0, 1, 8, 27, 64, 125]

Notice that the only changes are the file name and the reader function: pd.read_excel() instead of pd.read_csv().
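Since only the reader function changes, the choice can be made from the file extension. read_table_file() below is a hypothetical helper written for this sketch, not part of pandas. Keep in mind that pd.read_excel() relies on optional packages: openpyxl for .xlsx and xlrd for the older .xls format.

```python
from pathlib import Path

import pandas as pd


def read_table_file(filename):
    """Read a CSV or Excel file into a DataFrame, choosing the reader by extension.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    suffix = Path(filename).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(filename)
    if suffix in (".xlsx", ".xls"):
        # .xlsx is read through openpyxl and .xls through xlrd (optional packages)
        return pd.read_excel(filename)
    raise ValueError(f"unsupported file type: {suffix!r}")
```

With it, df = read_table_file('data_1.csv') and df = read_table_file('data_2.xlsx') both return a DataFrame.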

From this story it can be summarized that:

The pandas package is imported with import pandas as pd.
An input file filename is read and stored in a DataFrame df:
for a CSV file, df = pd.read_csv(filename),
for an XLSX file, df = pd.read_excel(filename).
The 2nd column of a DataFrame df is converted to a list with col = [*df.get(df.columns[1])]; notice that the 1st column has index 0.