Using Patsy for Statistical Modeling

Mark Mummert
2 min read · Apr 10, 2017


Python has a very useful library called Patsy that lets programmers write formulas to use with statsmodels regressions. These formulas let us be specific about how we want statsmodels to build a model, and they make it easy to create interaction terms. Patsy will also efficiently generate dummy variables for regression models.

Interaction Terms

Data sets often contain variables that are dependent: two or more features describe the same thing and are therefore not independent of each other. Unfortunately, linear regression works best when the X variables are independent of each other. If we want to keep both variables in our model, including an interaction term can account for some of the way they act together.

We can create an interaction term by multiplying the two variables and adding the result as another column to the pandas DataFrame we’re using for our model, but that can quickly get complicated if we want to account for multiple variables and interactions.
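As a minimal sketch of the manual approach, assuming a small made-up DataFrame (the column names A, B, y, and A_x_B are placeholders chosen for illustration, not taken from a real data set):

```python
import pandas as pd

# Hypothetical data; the column names are placeholders for illustration.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [0.5, 1.5, 2.5, 3.5],
    "y": [2.0, 4.1, 6.2, 8.1],
})

# The interaction term is the element-wise product of the two columns,
# stored as a new column alongside the originals.
df["A_x_B"] = df["A"] * df["B"]

print(df)
```

Every additional interaction needs another hand-built column like this one, which is where the bookkeeping starts to pile up.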

Creating an interaction term with pandas

This is where Patsy comes in to make our job much easier. For a linear regression we can represent the relationship between two variables (A and B) and what we’re trying to predict (y) with the formula y ~ A + B + A:B.

In the case of the example above, the result would look like:

A:B is the interaction term, and we can add as many of these as we like for multiple variables. Below is how this actually looks in Python:
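The original code is not reproduced here, so what follows is a sketch of one way to do it, assuming the statsmodels formula interface and synthetic data (the column names, coefficients, and random seed are all invented for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only; the true coefficients are made up.
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.normal(size=50), "B": rng.normal(size=50)})
df["y"] = (
    1.0
    + 2.0 * df["A"]
    - 1.0 * df["B"]
    + 0.5 * df["A"] * df["B"]          # the interaction effect
    + rng.normal(scale=0.1, size=50)   # a little noise
)

# Patsy parses the formula string and builds the A:B column for us.
model = smf.ols("y ~ A + B + A:B", data=df).fit()

# One fitted coefficient per term: Intercept, A, B, and A:B.
print(model.params)
```

Patsy also accepts the shorthand y ~ A*B, which expands to the same A + B + A:B formula.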

Generating Matrices

Patsy can also be used to generate matrices that describe the relationships between variables. A simple y = mx + b linear model will produce two matrices: one with the values of y, and one with the variables on the right side of the equation plus an intercept that is just a column of 1s. Patsy includes the intercept by default, and it can be left out if we don't want it.
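A short sketch of this using Patsy's dmatrices function, assuming a tiny made-up DataFrame with columns x and y:

```python
import pandas as pd
from patsy import dmatrices

# Hypothetical data following y = 2x + 1; the names are placeholders.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [3.0, 5.0, 7.0]})

# dmatrices returns the left-hand side and right-hand side matrices.
# return_type="dataframe" gives labeled pandas DataFrames.
y, X = dmatrices("y ~ x", data=df, return_type="dataframe")
print(y)  # a single column holding the values of y
print(X)  # an Intercept column of 1s plus the x column

# Adding "- 1" to the formula drops the intercept column.
y_no, X_no = dmatrices("y ~ x - 1", data=df, return_type="dataframe")
print(X_no)
```

These matrices can be passed to an estimator that expects arrays rather than a formula string.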

For more information…

Patsy has surprisingly good documentation, which can be found at http://patsy.readthedocs.io/en/latest/index.html if you’d like more information about its other uses.
