Solving the famous “House Prices” Competition from Kaggle using SQL Queries on MindsDB

Eren Akbulut
Geeks-of-Data
Published in
4 min readMar 11, 2023

Before starting the article, I want to mention our “Geeks of Data” Discord channel. You can join and say hello, and exchange ideas about data science, engineering, or analysis fields.🚀 Link

In this tutorial, we’ll show how can you have a model ready to use to make predictions on house prices trained on the “House Prices — Advanced Regression Techniques” competition from Kaggle.

Throughout this process, we’ll use MindsDB. MindsDB allows you to leverage your database’s valuable data to make predictions faster by integrating machine learning directly into the database. You can build, train, optimize, and deploy ML models without using additional platforms. Simply query your data to get forecasts/predictions. This tutorial doesn’t cover the installation part but you can use this link to learn how to start using MindsDB.

House Prices — Advanced Regression Techniques Banner from Kaggle

We are going to use the training data from the competition page, to summarize the goal of the project, let’s say you ask a home buyer about their ideal home, they likely won’t mention factors like the basement ceiling height or the distance from a railroad. However, this dataset from a playground competition demonstrates that many other factors besides bedroom count or white-picket fences impact price negotiations. The dataset consists of 79 features that describe nearly every aspect of residential homes in Ames, Iowa. The challenge is to predict the final price of each home using this information. These features consist of information like MSSubClass (The building class), LotArea, Street, Alley, etc.

To begin using our data it’s enough to upload it as a file to our MindsDB instance, if we had our data on another platform we could very well integrate it using MindsDB’s data-adding features.

3rd Party Import
File Upload

Once the file upload is done we can access our data by merely using an SQL query. Since we uploaded it as a file it’s automatically placed under the “files” directory.

How to inspect our data

Training a model is equally trivial, yes you heard it right “training a model is as trivial as running a basic SELECT query”. :)

Train the Model

We use a special query called CREATE MODEL but CREATE PREDICTOR would also work for the same purpose, here what we’re doing is creating a model called “house_price_predictor” under the directory “mindsdb”, then we show the file that we want to train our model on and finally we specify what feature to predict, in our case it’s “SalePrice”.

We can use several other queries to check what models we have and we can get a detailed overview of the properties they carry.

Select Models
Describe a Model
Description of Our Model

As can be seen above we have an r2 score of .912 for this model, it also gives a score to each column based on their importance to the final results.

Finally, to get predictions we can basically query our model and see the result.

Prediction Query

Above is an example query and isn’t too different than what one would expect, 2 special things to pay attention to here are;

We can query our model to get a prediction and the reasoning behind it using a syntax like

SELECT [target], [target_explain]

Another important aspect is to query our model directly rather than querying the data table we initially introduced.

Prediction Result

As can be seen above our prediction looks reasonable and explains how it came to this conclusion.

Thank you very much for reading and following along, friends. If you want to access content like this and spend time with curious, intelligent, and hardworking colleagues, we also welcome you to our Discord server. 🚀 Link

--

--