#2 Handling table-like data in Python with DataFrame (Python Financial Analysis)

wsh
Python Financial Analysis
5 min readJul 31, 2021

#2 DataFrame Basics

Python Financial Analysis | Home

Introduction

In the last story, we saw how to setup the environments for writing Python programs, and how to write a Python code to read a CSV file “meta.csv”, which is company information including ticker, sector, industry, exchange, and market caps. Our motivation for financial analysis in Python is that we want to automate things, witch can’t be done with Excel.

Following the last story, we are going to see how to use DataFrame in Python. Remember, the DataFrame is a data type in Python for handling table-like data. In our example, it is the table read from the CSV file “meta.csv”. DataFrame is one of the most important tool that data scientists use. So I have to introduce it before actually doing financial analysis.

You can download the data set from this link
https://drive.google.com/drive/folders/1Ux2u1s5mctYiywS08sv7_3_PbnWd8v0G?usp=sharing

PDF document of this story is also available
https://drive.google.com/drive/folders/15q64knsDYErXiPyy-jK9ViSVN4ysDaoR?usp=sharing

List of articles

1. Python Financial Analysis

1 Read fundamental data from a CSV in Python
2 Handling table like data in Python with DataFrame
3 Make graphs of stock price in Python
4.1 Make custom market index — prerequisites
4.2 Make custom market index — make your own index
4.3 Make custom market index — market cap based index
5.1 Analyze COVID-19 Impacts by Sector in Python — compare weighted average prices
5.2 Analyze COVID-19 Impacts by Market Caps in Python — compare weighted average prices
5.3 Find companies that lost or gained from the COVID19 pandemic

2. Python Data Analysis Basics (easiest ways)

Python “datetime” in the easiest way (how to handle dates in data science with Python)
Python DataFrame slicing in the easiest way (How to find a company from 5000 companies)

DataFrame Basics

0. Import packages

We need to import packages to use their tools. I introduced how to install Python packages, but if you don’t know how to install a package, you can use the following command.

> pip install package_name

It’s common to assign aliases to package names (after “as”), because we don’t type them every time we use that packages.

Full Python code is available below

1. Read a CSV file

As we did in the last story, we first read the CSV file “meta.csv”. The returned object is a DataFrames that has several columns, like “ticker” or “market_cap”.

Full Python code is available below

2. Get a column

The DataFrame “meta” has several columns. But I we want to access just one column of the table, we can do it just like the following example. We give the name of ticker as a string to the DataFrame “meta” with square brackets.

Full Python code is available below

3. Get the 10-th row

The DataFrame “meta” has several rows, where each corresponds to one company. The are about 5200 companies listed on the US markets. This means the table “meta” has that number of rows.
If we want to get the 10-th row of the table, for example, we use the “.iloc” of “meta”. We just give the number of row (index) to that. But note here that the index stats from 0. Thus the 10-th row has an index 9.

Full Python code is available below

4. Get the 10-th ticker

This is example of a combination of two operations. In practical data analysis, it’s quite common to combine multiple DataFrame operations to extract data we want.
If we want to get the 10-th ticker, for example, we fist extract the “ticker” column out of “meta”, which makes another DataFrame that contains just the “ticker” column. Then we get the 10-th row of the table.

Full Python code is available below

5. Find companies with market cap > $1T

This is a little bit practical example. If we want to fine companies whose market caps are greater than $1T dollars, we give meta a condition (booleans) to the DataFrame. We first get the column “market_cap” and specify that these values must be greater than 1e12. It return True for entries (rows) that satisfy the condition, while it returns False to those that don’t. This is also a DataFrame.
Then we give it to “meta” with square brackets. It returns rows that satisfy the condition, which in turn, are companies with market cap over $1T.

You can see the results on the video above.

Full Python code is available below

6. Sort the table

This is another practical example. We can sort the table by a column. In this example, we sort the table by the market cap in descending order. It means that most valuable companies comes at the top of the table. We use the built-in function “sort_values()” of DataFrame. We pass “market_cap” to the argument “by”, and “False” to “ascending”. You can see the result on the video.

Full Python code is available below

7. Get most 5 valuable companies

This is another example of a combination of multiple operations. Using the results of the last example, we can find the most 5 valuable companies by clipping the first 5 row from the table sorted by the market cap. For this kind operations, we usually use the semicolon “:” in Python. If we specify “n:m”, for example, it means we clips the table from the n-th row to (m-1)-th row. Note, the last m-th row is excluded. This is common in programming. Also, we can omit 0 if it begins from 0. Thus “0:n” is equivalent to “:n”.

Full Python code is available below

8. Mathematical operations (and adding a column)

It’s common in data analysis to apply mathematical operations on numerical values. I said in the first story that we use NumPy for math things. Although I give some examples of NumPy functions, you can pass this section, because we would see this kind of things in the future stories. You don’t need to remember them.

This is an example of applying square root on the market caps (I know it’s meaningless, just as an example). We pull off the function “sqrt()” from “np”, which is an alias to the “numpy” package.

In addition to that, we save the results as a new column of “meta”. There is no operation needed to make a new column. You can just access it as like it already exists. The new column name is “sqrt_market_cap”.

Full Python code is available below

The other examples are the “sum()” function, “mean()” function, and “std()” function. We use them if we want to apply statistical analysis. Again, you don’t need to remember them.

Full Python code is available below

9. Save a DataFrame as CSV

Because we added a column to the DataFrame “meta”, we want to save it as another CSV file. Let’s name it “meta_new.csv”. To save it, we use the “to_csv()” function of DataFrame.

Full Python code is available below

Full Python code

Other Links

Python Financial Analysis | Home
Python Data Analysis | Home

--

--