Distribution Fitting with Python SciPy

Amirarsalan Rajabi
5 min read · Jun 2, 2020

You have a dataset, repeated measurements of a variable, and you want to know which probability distribution this variable might come from. Fitting your data to a suitable distribution is valuable and can give you some insight into the process that generated it. SciPy is a Python library with many mathematical and statistical tools that you can apply directly to your data. You can find the whole code here.
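To make the idea concrete before touching real data, here is a minimal sketch of what fitting a distribution with SciPy can look like. The synthetic normal sample, the seed, and the variable names are assumptions chosen for illustration and are not part of the original walkthrough. Note also that a Kolmogorov-Smirnov p-value tends to be optimistic when the parameters were estimated from the same sample, so treat it as a rough indication only.

```python
import numpy as np
from scipy import stats

# Synthetic sample standing in for real measurements (an assumption for illustration).
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=10.0, scale=2.0, size=1000)

# Maximum-likelihood estimates of the normal distribution's location and scale.
mu_hat, sigma_hat = stats.norm.fit(sample)

# Kolmogorov-Smirnov test of the sample against the fitted normal distribution.
ks_stat, p_value = stats.kstest(sample, "norm", args=(mu_hat, sigma_hat))

print(f"mu={mu_hat:.2f}, sigma={sigma_hat:.2f}, KS={ks_stat:.3f}, p={p_value:.3f}")
```

The same pattern should carry over to other continuous distributions in scipy.stats, since they expose a fit method as well; only the number of returned parameters changes.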

To get started, we take the Dow Jones Index Data Set as an example, which is publicly available here¹. After downloading the dataset, you can import the data with pandas:

import urllib.request
import zipfile

import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats

# Download the zipped dataset from the UCI repository and extract it locally
urllib.request.urlretrieve("http://archive.ics.uci.edu/ml/machine-learning-databases/00312/dow_jones_index.zip", "file.zip")
zipfile.ZipFile("file.zip").extractall()

# Load the extracted file and preview the first rows
df = pd.read_csv("dow_jones_index.data")
df.head()

This is a weekly dataset of the Dow Jones Index. The dollar signs in front of the values are getting on my nerves, so I delete them by applying a function to my pandas columns:

def omit_s(x):
    # Drop the first character, i.e. the leading dollar sign
    return x[1:]
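As a sketch of how such a function might be applied, the example below runs it over a tiny hypothetical DataFrame. The frame, its values, and the column names "open" and "close" are assumptions for illustration; with the real data you would list whichever columns carry the dollar sign.

```python
import pandas as pd

def omit_s(x):
    """Drop the first character (the leading dollar sign) of a string."""
    return x[1:]

# Tiny hypothetical frame mimicking price columns stored as "$..." strings.
prices = pd.DataFrame({"open": ["$15.82", "$16.71"], "close": ["$16.42", "$15.97"]})

# Strip the dollar sign from each value, then convert the column to floats.
for col in ["open", "close"]:
    prices[col] = prices[col].apply(omit_s).astype(float)

print(prices.dtypes)
```

Converting to float right after stripping the symbol matters, because the statistical functions in SciPy expect numeric input rather than strings.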
