#2 Handling table-like data in Python with DataFrame (Python Financial Analysis)
#2 DataFrame Basics
Python Financial Analysis | Home
Introduction
In the last story, we saw how to setup the environments for writing Python programs, and how to write a Python code to read a CSV file “meta.csv”, which is company information including ticker, sector, industry, exchange, and market caps. Our motivation for financial analysis in Python is that we want to automate things, witch can’t be done with Excel.
Following the last story, we are going to see how to use DataFrame in Python. Remember, the DataFrame is a data type in Python for handling table-like data. In our example, it is the table read from the CSV file “meta.csv”. DataFrame is one of the most important tool that data scientists use. So I have to introduce it before actually doing financial analysis.
You can download the data set from this link
https://drive.google.com/drive/folders/1Ux2u1s5mctYiywS08sv7_3_PbnWd8v0G?usp=sharing
PDF document of this story is also available
https://drive.google.com/drive/folders/15q64knsDYErXiPyy-jK9ViSVN4ysDaoR?usp=sharing
List of articles
1. Python Financial Analysis
1 Read fundamental data from a CSV in Python
2 Handling table like data in Python with DataFrame
3 Make graphs of stock price in Python
4.1 Make custom market index — prerequisites
4.2 Make custom market index — make your own index
4.3 Make custom market index — market cap based index
5.1 Analyze COVID-19 Impacts by Sector in Python — compare weighted average prices
5.2 Analyze COVID-19 Impacts by Market Caps in Python — compare weighted average prices
5.3 Find companies that lost or gained from the COVID19 pandemic
2. Python Data Analysis Basics (easiest ways)
Python “datetime” in the easiest way (how to handle dates in data science with Python)
Python DataFrame slicing in the easiest way (How to find a company from 5000 companies)
DataFrame Basics
0. Import packages
We need to import packages to use their tools. I introduced how to install Python packages, but if you don’t know how to install a package, you can use the following command.
> pip install package_name
It’s common to assign aliases to package names (after “as”), because we don’t type them every time we use that packages.
1. Read a CSV file
As we did in the last story, we first read the CSV file “meta.csv”. The returned object is a DataFrames that has several columns, like “ticker” or “market_cap”.
2. Get a column
The DataFrame “meta” has several columns. But I we want to access just one column of the table, we can do it just like the following example. We give the name of ticker as a string to the DataFrame “meta” with square brackets.
3. Get the 10-th row
The DataFrame “meta” has several rows, where each corresponds to one company. The are about 5200 companies listed on the US markets. This means the table “meta” has that number of rows.
If we want to get the 10-th row of the table, for example, we use the “.iloc” of “meta”. We just give the number of row (index) to that. But note here that the index stats from 0. Thus the 10-th row has an index 9.
4. Get the 10-th ticker
This is example of a combination of two operations. In practical data analysis, it’s quite common to combine multiple DataFrame operations to extract data we want.
If we want to get the 10-th ticker, for example, we fist extract the “ticker” column out of “meta”, which makes another DataFrame that contains just the “ticker” column. Then we get the 10-th row of the table.
5. Find companies with market cap > $1T
This is a little bit practical example. If we want to fine companies whose market caps are greater than $1T dollars, we give meta a condition (booleans) to the DataFrame. We first get the column “market_cap” and specify that these values must be greater than 1e12. It return True for entries (rows) that satisfy the condition, while it returns False to those that don’t. This is also a DataFrame.
Then we give it to “meta” with square brackets. It returns rows that satisfy the condition, which in turn, are companies with market cap over $1T.
You can see the results on the video above.
6. Sort the table
This is another practical example. We can sort the table by a column. In this example, we sort the table by the market cap in descending order. It means that most valuable companies comes at the top of the table. We use the built-in function “sort_values()” of DataFrame. We pass “market_cap” to the argument “by”, and “False” to “ascending”. You can see the result on the video.
7. Get most 5 valuable companies
This is another example of a combination of multiple operations. Using the results of the last example, we can find the most 5 valuable companies by clipping the first 5 row from the table sorted by the market cap. For this kind operations, we usually use the semicolon “:” in Python. If we specify “n:m”, for example, it means we clips the table from the n-th row to (m-1)-th row. Note, the last m-th row is excluded. This is common in programming. Also, we can omit 0 if it begins from 0. Thus “0:n” is equivalent to “:n”.
8. Mathematical operations (and adding a column)
It’s common in data analysis to apply mathematical operations on numerical values. I said in the first story that we use NumPy for math things. Although I give some examples of NumPy functions, you can pass this section, because we would see this kind of things in the future stories. You don’t need to remember them.
This is an example of applying square root on the market caps (I know it’s meaningless, just as an example). We pull off the function “sqrt()” from “np”, which is an alias to the “numpy” package.
In addition to that, we save the results as a new column of “meta”. There is no operation needed to make a new column. You can just access it as like it already exists. The new column name is “sqrt_market_cap”.
The other examples are the “sum()” function, “mean()” function, and “std()” function. We use them if we want to apply statistical analysis. Again, you don’t need to remember them.
9. Save a DataFrame as CSV
Because we added a column to the DataFrame “meta”, we want to save it as another CSV file. Let’s name it “meta_new.csv”. To save it, we use the “to_csv()” function of DataFrame.