Predictive Power Score Implementation in Python

@lee-rowe
Published in Geek Culture
5 min read · Sep 7, 2021

Photo by Rhett Wesley on Unsplash

The Predictive Power Score (PPS), implemented in the ppscore library, is an asymmetric, data-type-agnostic score that can detect linear or non-linear relationships between two columns. The score ranges from 0 (no predictive power) to 1 (perfect predictive power), and it can be used as an alternative to correlation (or the correlation matrix). In this blog I will discuss how to use the library and give my thoughts on its functions. To get started, install the library with the command below.

pip install -U ppscore
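
Before running the library, it may help to see what an asymmetric score looks like in practice. The sketch below is not ppscore's implementation. It is a simplified stand-in that assumes the score for a numeric target is 1 minus the ratio of a model's mean absolute error to the error of always predicting the median, and it swaps in a crude binned-median predictor evaluated on the same data it was fit on. The helper name `rough_power_score` and the bin count are hypothetical, chosen only for illustration.

```python
import numpy as np

def rough_power_score(feature, target, bins=20):
    """Illustrative stand-in for a predictive power score (not ppscore's code)."""
    feature = np.asarray(feature, dtype=float)
    target = np.asarray(target, dtype=float)

    # Naive baseline: always predict the median of the target.
    naive_mae = np.mean(np.abs(target - np.median(target)))

    # Crude model: split the feature into equal-width bins and predict
    # the median target value observed within each bin.
    edges = np.linspace(feature.min(), feature.max(), bins + 1)
    bin_ids = np.digitize(feature, edges[1:-1])  # bin index in 0..bins-1
    predictions = np.empty_like(target)
    for b in np.unique(bin_ids):
        mask = bin_ids == b
        predictions[mask] = np.median(target[mask])
    model_mae = np.mean(np.abs(target - predictions))

    # Normalize against the baseline and clip at 0 (no predictive power).
    return max(0.0, 1.0 - model_mae / naive_mae)

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 10_000)
y = x * x + rng.uniform(-0.5, 0.5, 10_000)

score_x_to_y = rough_power_score(x, y)  # knowing x narrows down y
score_y_to_x = rough_power_score(y, x)  # knowing y does not reveal the sign of x
print(f"x -> y: {score_x_to_y:.2f}")
print(f"y -> x: {score_y_to_x:.2f}")
```

Because the relationship is a parabola, the first score should come out much higher than the second, which is the asymmetry that a correlation coefficient cannot express. Since this sketch scores the model on its own training data, the second value may sit slightly above zero.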

Next, let’s generate some sample data to work with so we have something to visualize.

import pandas as pd
import numpy as np
import ppscore as pps

# Build a quadratic relationship: y = x^2 plus uniform noise
df = pd.DataFrame()
df["x"] = np.random.uniform(-2, 2, 1_000_000)  # feature
df["error"] = np.random.uniform(-0.5, 0.5, 1_000_000)  # noise term
df["y"] = df["x"] * df["x"] + df["error"]  # target

We can use the pandas .head() method to look at what the dataframe we’re using contains.
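
A minimal sketch of that preview is below. It rebuilds a smaller version of the dataframe so the block runs on its own. The fixed seed and the 1,000-row sample are assumptions added here for speed and repeatability, not part of the original code.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed and far fewer rows than the article's 1,000,000,
# used only so this preview is quick and repeatable.
np.random.seed(42)
n_rows = 1_000

df = pd.DataFrame()
df["x"] = np.random.uniform(-2, 2, n_rows)
df["error"] = np.random.uniform(-0.5, 0.5, n_rows)
df["y"] = df["x"] * df["x"] + df["error"]

# .head() returns the first five rows by default
preview = df.head()
print(preview)
```

The preview should show three columns (x, error and y), with each y value equal to the square of x plus the error in the same row.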

Now that we’ve done that, we can begin to dig into the functions of this library. First, we will test the x and y variables.
