Calculate graph similarities with Root Mean Square (RMS)

wsh
Python Data Analysis
4 min read · Aug 20, 2021

Python Data Analysis (PDA5)

Python Financial Analysis | Home

List of articles

1. Python Financial Analysis

1 Read fundamental data from a CSV in Python
2 Handling table like data in Python with DataFrame
3 Make graphs of stock price in Python
4.1 Make custom market index — prerequisites
4.2 Make custom market index — make your own index
4.3 Make custom market index — market cap based index
5.1 Analyze COVID-19 Impacts by Sector in Python — compare weighted average prices
5.2 Analyze COVID-19 Impacts by Market Caps in Python — compare weighted average prices
5.3 Find companies that lost or gained from the COVID19 pandemic

2. Python Data Analysis (easiest ways)

Python “datetime” in the easiest way (how to handle dates in data science with Python)
Python DataFrame slicing in the easiest way (How to find a company from 5000 companies)
Linear regression on time series data like stock price (fit a line on data)

What is Root Mean Square (RMS)

In data science we often want to measure how close two graphs are in position and shape. Two common tools for this are Root Mean Square (RMS) and the Fourier Transform. RMS is the more straightforward of the two, because it calculates the difference between two graphs directly from their data points. I’ll introduce the Fourier Transform in a future article.

RMS measures the average error between the graphs. Here, “average” is the key point. If we simply took the sum of the errors, the result would grow as the x range of the graphs grows, even if the graphs are equally close at every point.
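To see why the average matters, here is a minimal sketch that compares the summed error with RMS on a short and a long x range. The helper name rms_error and the sample data are my own assumptions for illustration; RMS here means the usual "square, average, square root" of the point-wise differences.

```python
import numpy as np

def rms_error(f, g):
    """Root mean square of the point-wise difference between two arrays (hypothetical helper)."""
    diff = np.asarray(f, dtype=float) - np.asarray(g, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Two graphs that differ by a constant 0.5 everywhere,
# sampled over a short x range (10 points) and a long one (1000 points).
results = []
for n_points in (10, 1000):
    x = np.arange(n_points)
    f = 2.0 * x
    g = f + 0.5
    total_error = float(np.sum(np.abs(f - g)))  # grows with the number of points
    rms = rms_error(f, g)                       # stays the same
    results.append((n_points, total_error, rms))
    print(f"{n_points} points: summed error = {total_error}, RMS = {rms}")
# 10 points: summed error = 5.0, RMS = 0.5
# 1000 points: summed error = 500.0, RMS = 0.5
```

The two graphs are equally close at every point in both cases, and only RMS reports that.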

Because RMS measures how different two graphs are, we can turn it into a measure of similarity by taking its reciprocal, so a smaller RMS gives a larger similarity. In this story, we define the similarity as 1/RMS.
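The definition above can be sketched as a small function. The name similarity and the sample arrays are assumptions for illustration. Returning infinity for two identical graphs (RMS of 0) is also my own choice, since the definition 1/RMS does not say what to do in that case.

```python
import numpy as np

def similarity(f, g):
    """Similarity defined as 1/RMS of the point-wise difference (hypothetical helper)."""
    diff = np.asarray(f, dtype=float) - np.asarray(g, dtype=float)
    rms = float(np.sqrt(np.mean(diff ** 2)))
    if rms == 0.0:
        # Identical graphs: 1/0 is undefined, so this sketch reports infinity (an assumption).
        return float("inf")
    return 1.0 / rms

f = np.array([0.0, 1.0, 2.0, 3.0])
near = f + 0.1   # almost on top of f
far = f + 2.0    # shifted far away from f

print(similarity(f, near))  # large value: the graphs are close
print(similarity(f, far))   # small value: the graphs are far apart
```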

How RMS works

The image above is an example of two graphs defined by functions f(x) and g(x). In programming, graphs are always discrete, meaning they are just arrays of data points f1, f2, … and g1, g2, … RMS first subtracts one graph from the other point by point and then squares each difference. One reason for squaring is that a negative error must count as a positive one; otherwise positive and negative errors would cancel each other out. After taking the average of the squared errors, we apply the square root to undo the effect of squaring, which brings the result back to the same units as the data.
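The steps described above (subtract, square, average, square root) can be followed one at a time on a tiny example. The four data points are made up for illustration and are small enough to check by hand.

```python
import numpy as np

# Two tiny "graphs" as arrays of data points (made-up values).
f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([2.0, 2.0, 1.0, 4.0])

diff = f - g                      # step 1: subtract point by point -> [-1, 0, 2, 0]
squared = diff ** 2               # step 2: square each difference  -> [1, 0, 4, 0]
mean_squared = np.mean(squared)   # step 3: average the squares     -> 1.25
rms = np.sqrt(mean_squared)       # step 4: square root, roughly 1.118

# Without squaring, the -1 and the +2 partly cancel each other out.
plain_mean = np.mean(diff)        # 0.25, which understates the error

print("differences:", diff)
print("mean of raw differences:", plain_mean)
print("RMS:", rms)
```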

Root Mean Square (Wikipedia)

RMS Error (Stanford Univ Lecture Note)

Let’s do it in Python

1. Import packages and read dataset

As in the other articles, we use “numpy”, “matplotlib”, and “pandas”. If you don’t know how to set up the Python environment and how to read a CSV file, the following story will help you:

PFA1 Read fundamental data from a CSV in Python (Python Financial Analysis)

Before writing the code, download the example dataset from here (Google Drive):
https://drive.google.com/file/d/18E8_9Vaq-0ca062C_etE9RQAYOAhhUtZ/view?usp=sharing
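A minimal sketch of this step could look like the following. The file name “example_data_rms_demo.csv” is the one given for the dataset; that it sits in the current working directory is an assumption, and load_dataset is a hypothetical helper. The sketch prints the column names rather than assuming them.

```python
from pathlib import Path

import matplotlib.pyplot as plt  # imported here for the plotting step later on
import numpy as np               # imported here for the RMS calculation later on
import pandas as pd

# Assumption: the downloaded file was saved next to this script.
CSV_PATH = Path("example_data_rms_demo.csv")

def load_dataset(path=CSV_PATH):
    """Read the demo CSV into a pandas DataFrame (hypothetical helper)."""
    return pd.read_csv(path)

if CSV_PATH.exists():
    df = load_dataset()
    print(df.head())            # first rows of the table
    print(df.columns.tolist())  # check the real column names before using them
else:
    print(f"{CSV_PATH} not found; download it from the link above first.")
```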

2. Preprocess data

We use four graphs f, g, h, and i as examples. The first function “f” is the base function, which is a straight line, and the other three functions are noisy, swinging graphs. The operation “.values” is necessary to convert a Series into a numpy array. If you don’t understand the following code, please read these articles first:
Handling table-like data in Python with DataFrame (Python Financial Analysis)
Python DataFrame slicing in the easiest way (How to find a company from 5000 companies)
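As a rough sketch of the preprocessing, the code below builds a synthetic DataFrame as a stand-in for the downloaded CSV and pulls each column out as a numpy array with “.values”. The column names “x”, “f”, “g”, “h”, “i” and all of the numbers are assumptions made up for illustration; the real file may be laid out differently.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV (assumed column names, made-up data).
rng = np.random.default_rng(0)
x = np.arange(50)
line = 0.5 * x
df = pd.DataFrame({
    "x": x,
    "f": line,                                      # baseline: a straight line
    "g": line + rng.normal(0.0, 1.0, x.size),       # noisy
    "h": line + 3.0 * np.sin(x / 3.0),              # swinging
    "i": line + 6.0 * np.sin(x / 3.0)
              + rng.normal(0.0, 1.0, x.size),       # noisy and swinging
})

# df["f"] is a pandas Series; ".values" gives the underlying numpy array.
f = df["f"].values
g = df["g"].values
h = df["h"].values
i = df["i"].values

print(type(df["f"]).__name__, "->", type(f).__name__)  # Series -> ndarray
print(f.shape, g.shape, h.shape, i.shape)
```

The pandas documentation recommends “.to_numpy()” over “.values” in new code; for plain numeric columns like these, both return the same array.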

3. Calculate RMS and similarities

Finally, we calculate the RMS and the similarity for each pair of functions. We have three pairs: (f, g), (f, h), and (f, i). Remember, the function f is the baseline linear function, and the other three functions are the targets of evaluation.
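One way to sketch this calculation is a loop over the three targets. The arrays below are synthetic stand-ins (the same wiggle added to f with growing amplitude), not the values from the dataset, and rms_error is a hypothetical helper.

```python
import numpy as np

def rms_error(a, b):
    """Root mean square of the point-wise difference (hypothetical helper)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Synthetic stand-ins: f is the baseline line, the targets swing around it.
x = np.linspace(0.0, 10.0, 101)
f = 2.0 * x + 1.0
wiggle = np.sin(3.0 * x)
targets = {
    "g": f + 1.0 * wiggle,   # small swing
    "h": f + 2.0 * wiggle,   # medium swing
    "i": f + 4.0 * wiggle,   # large swing
}

results = {}
for name, values in targets.items():
    rms = rms_error(f, values)
    results[name] = {"rms": rms, "similarity": 1.0 / rms}
    print(f"(f, {name}): RMS = {rms:.4f}, similarity = {1.0 / rms:.4f}")
```

With this stand-in data, the larger the swing around f, the larger the RMS and the smaller the similarity, so g comes out as the most similar to f and i as the least.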

4. “financialanalysis” does everything for you

If you don’t want to write the code from scratch, install the package “financialanalysis” and use its function “RMSError()”. You can install it with the command “pip install financialanalysis”.

5. Display graphs and results

If you don’t understand the code block below, please read this story:
Make graphs of stock price in Python (Python Financial Analysis)
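A possible way to display the graphs together with their results is to put each RMS and similarity into the legend. This is only a sketch: the data is synthetic, and the names rms_error and plot_against_baseline are hypothetical helpers, not part of any package.

```python
import matplotlib.pyplot as plt
import numpy as np

def rms_error(a, b):
    """Root mean square of the point-wise difference (hypothetical helper)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def plot_against_baseline(x, baseline, targets, baseline_name="f"):
    """Plot the baseline and each target, labeling targets with RMS and 1/RMS."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot(x, baseline, color="black", linewidth=2, label=baseline_name)
    for name, values in targets.items():
        rms = rms_error(baseline, values)
        label = f"{name}: RMS = {rms:.3f}, similarity = {1.0 / rms:.3f}"
        ax.plot(x, values, label=label)
    ax.set_xlabel("x")
    ax.set_ylabel("value")
    ax.set_title("Graphs compared with the baseline f")
    ax.legend()
    return fig

# Synthetic stand-in data (made up for illustration).
x = np.linspace(0.0, 10.0, 101)
f = 2.0 * x + 1.0
targets = {
    "g": f + 1.0 * np.sin(3.0 * x),
    "h": f + 2.0 * np.sin(3.0 * x),
    "i": f + 4.0 * np.sin(3.0 * x),
}
fig = plot_against_baseline(x, f, targets)
```

Call plt.show() to open the figure in a window, or fig.savefig("rms_demo.png") to write it to a file (the file name is just an example).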

Full Python code

You can download the dataset “example_data_rms_demo.csv” from here:
https://drive.google.com/file/d/18E8_9Vaq-0ca062C_etE9RQAYOAhhUtZ/view?usp=sharing

Other Links

Python Financial Analysis | Home
Python Data Analysis | Home
