Modeling in Python
I’ve been working on learning the linear modeling process in Python, so I made this quick guide to help myself, and possibly others, see how the process works in code.
Building the model
First, we start with a data frame:
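As a minimal sketch, here is one way to set up a data frame to work with. The data is made up for illustration: the column names `x1` and `x2`, the coefficients, and the noise level are all assumptions, not the data set the guide was originally written against.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: y depends linearly on two features, plus some noise.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.normal(size=100),
    "x2": rng.normal(size=100),
})
y = 3.0 * df["x1"] - 2.0 * df["x2"] + rng.normal(scale=0.5, size=100)

# Peek at the first few rows of the features.
print(df.head())
```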
Then we’ll use scikit-learn to split out a testing set for later:
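A sketch of the split using scikit-learn's `train_test_split`. The toy data is re-created so the snippet runs on its own, and the 80/20 split and `random_state=0` are my assumptions, chosen only to make the example repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data (same made-up setup as before).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
y = 3.0 * df["x1"] - 2.0 * df["x2"] + rng.normal(scale=0.5, size=100)

# Hold back 20% of the rows as a testing set; the rest is the training set.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)
```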
We will build our model using the training set. First we establish that we are using a linear model:
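In scikit-learn this step probably looks something like the following. The variable name `lm` matches the `lm.fit` call mentioned in this guide; using `LinearRegression` (ordinary least squares) is my assumption about which linear model is meant.

```python
from sklearn import linear_model

# Create an unfitted ordinary least squares model.
# Nothing has been learned yet; this only picks the kind of model.
lm = linear_model.LinearRegression()
```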
We use lm.fit to create the model. Since we are setting aside our testing set for later, we will only use the training set to actually build the model:
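A sketch of the fitting step, again on the made-up toy data so that it runs on its own. Note that `fit` returns the estimator itself, so `model` and `lm` end up referring to the same fitted object.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Hypothetical toy data and split (assumptions, as before).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
y = 3.0 * df["x1"] - 2.0 * df["x2"] + rng.normal(scale=0.5, size=100)
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, random_state=0
)

# Fit on the training rows only; the testing rows stay untouched.
lm = linear_model.LinearRegression()
model = lm.fit(X_train, y_train)

# The learned coefficients should land near the 3.0 and -2.0 used to build y.
print(model.coef_, model.intercept_)
```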
Now that we have our model, assigned to ‘model’, we can do some testing to measure its effectiveness at predicting y values for new x values.
We will tell it to use our X testing data to predict our y data.
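The prediction step can be sketched like this. The name `predictions` is my own choice, and the data is the same hypothetical toy set.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Hypothetical toy data, split, and fitted model (assumptions, as before).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
y = 3.0 * df["x1"] - 2.0 * df["x2"] + rng.normal(scale=0.5, size=100)
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, random_state=0
)
model = linear_model.LinearRegression().fit(X_train, y_train)

# Predict y for rows the model has never seen.
predictions = model.predict(X_test)

# One predicted value per testing row.
print(predictions[:5])
```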
Now we can plot our results to see how the model performs. This will show the true values of y on the x-axis and the predicted values on the y-axis.
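One way to draw that plot is with matplotlib, which is my assumption about the plotting library. The `Agg` backend line is only there so the sketch also runs without a display; in a notebook you would likely drop it and call `plt.show()` instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Hypothetical toy data, split, and fitted model (assumptions, as before).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
y = 3.0 * df["x1"] - 2.0 * df["x2"] + rng.normal(scale=0.5, size=100)
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, random_state=0
)
model = linear_model.LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

# True values on the x-axis, predicted values on the y-axis.
fig, ax = plt.subplots()
ax.scatter(y_test, predictions)
ax.set_xlabel("True values")
ax.set_ylabel("Predictions")
```

If the model predicts well, the points should fall close to a straight diagonal line.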
Finally, we can score our model:
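A sketch of the scoring step. For a `LinearRegression`, `score` returns the coefficient of determination (R²) on the data you pass in, so values closer to 1 mean better predictions. The toy data here is still hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Hypothetical toy data, split, and fitted model (assumptions, as before).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
y = 3.0 * df["x1"] - 2.0 * df["x2"] + rng.normal(scale=0.5, size=100)
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, random_state=0
)
model = linear_model.LinearRegression().fit(X_train, y_train)

# R^2 measured on the held-out testing set.
r2 = model.score(X_test, y_test)
print("Score:", r2)
```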
Another form of testing we can do to evaluate our model is k-fold testing.
K-fold testing (also called k-fold cross-validation) splits the data frame into k groups, or folds. Each fold takes one turn as the test set while the model is trained on the other k - 1 folds.
The code for this is fortunately very simple, and only takes one line to do such a large amount of work.
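With scikit-learn, that one line is most likely a call to `cross_val_score`. In this sketch, `cv=5` (five folds) and the toy data are my assumptions; the function refits a fresh copy of the model for each fold, so it takes the full data set rather than our earlier split.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import cross_val_score

# Hypothetical toy data (assumption, as before).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
y = 3.0 * df["x1"] - 2.0 * df["x2"] + rng.normal(scale=0.5, size=100)

lm = linear_model.LinearRegression()

# The one line that does the work: train and score the model five times.
scores = cross_val_score(lm, df, y, cv=5)

print("Cross-validated scores:", scores)
```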
This returns a numeric score for each of the k folds we run.
By using these techniques we can generate models that will help us predict information in the real world, provided that the assumptions of the linear model have been met.