Data Analysis Project With Python To Determine The House Sales Price In King County, USA: Part 1

Alphonse Brandon
Mar 18, 2022


King County, Washington State, USA: aerial view

Let’s determine the market price of a house given a set of its features. We will analyze and predict housing prices using attributes such as square footage, number of bedrooms, number of floors, and so on.

In this project, I will walk you through the different stages of predictive analysis with Python. The steps are:

  1. Data collection
  2. Data wrangling
  3. Exploratory data analysis
  4. Model development
  5. Model evaluation and refinement

Okay! I guess we’re set and ready to go. In this first part, we will cover data collection: downloading the dataset, loading it into a pandas DataFrame, and getting to know the data through summary statistics.

Let’s dive right in! We begin by importing the Python packages and modules we will be using.
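The exact import list is not reproduced in the text. A minimal sketch, assuming only pandas and NumPy are needed for this first stage, might look like this:

```python
# An assumed, typical toolkit for the data collection stage; the article's
# exact import list is not reproduced here.
import numpy as np   # numerical arrays and math helpers
import pandas as pd  # DataFrame loading, inspection and summaries

# Recording library versions makes the analysis easier to reproduce later.
print("pandas version:", pd.__version__)
print("numpy version:", np.__version__)
```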

Next, let’s download the data, load it into a data frame, and view the first 5 rows of the data frame.
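Here is a rough sketch of the loading step. The dataset's real location is not given in the text, so the file name `kc_house_data.csv` below is a hypothetical placeholder, and a few made-up rows (not real sales records) stand in for the file so the snippet runs on its own. The column names are assumed to follow the public King County house sales dataset.

```python
import io

import pandas as pd

# Made-up toy rows standing in for the real King County CSV. The column
# names are assumed from the public dataset; the values are not real sales.
csv_text = """id,date,price,bedrooms,bathrooms,sqft_living,floors,zipcode
1,20141013T000000,250000.0,3,1.00,1200,1.0,98178
2,20141209T000000,475000.0,3,2.25,2100,2.0,98125
3,20150225T000000,610000.0,4,3.00,2600,1.0,98028
4,20141209T000000,320000.0,2,1.00,900,1.0,98136
5,20150218T000000,780000.0,4,2.50,3000,2.0,98074
6,20140627T000000,410000.0,3,1.75,1700,1.5,98053
"""

# pd.read_csv accepts a file path, a URL, or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
# df = pd.read_csv("kc_house_data.csv")  # hypothetical local file name

# head() returns the first 5 rows by default; head(n) returns n rows.
print(df.head())
```

With the real file, only the `read_csv` argument changes. Everything after it stays the same.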

Now, let’s get to know the dataset by exploring the data types of the columns in the data frame.
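A small sketch of inspecting column types with the `dtypes` attribute. The toy values are made up, and the column names are assumed from the King County dataset.

```python
import pandas as pd

# Toy data with made-up values; column names assumed from the dataset.
df = pd.DataFrame({
    "date": ["20141013T000000", "20141209T000000", "20150225T000000"],
    "price": [250000.0, 475000.0, 610000.0],
    "bedrooms": [3, 3, 4],
    "floors": [1.0, 2.0, 1.0],
})

# dtypes lists one dtype per column: numbers appear as int64/float64, while
# text usually appears as object (newer pandas versions may show a
# dedicated string dtype instead).
print(df.dtypes)
```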

It is important to know the data type of each column, because it tells us which operations make sense on which columns of the dataset.
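One way to act on this is to split the columns by dtype with `select_dtypes`. The sketch below, again on made-up toy rows, applies arithmetic only to the numeric columns and string operations only to the text column.

```python
import pandas as pd

# Made-up toy rows; column names are assumed from the King County dataset.
df = pd.DataFrame({
    "date": ["20141013T000000", "20141209T000000", "20150225T000000"],
    "price": [250000.0, 475000.0, 610000.0],
    "bedrooms": [3, 3, 4],
})

# Split the columns by dtype so each operation only touches columns it
# makes sense for.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
text_cols = df.select_dtypes(exclude="number").columns.tolist()

# Arithmetic such as a mean is only meaningful on the numeric columns.
print(df[numeric_cols].mean())

# Text columns call for string operations instead, e.g. taking the year
# prefix of the date strings.
print(df["date"].str[:4])
```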

To wrap up this data collection stage, let’s view the statistical summary of the data frame.
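The statistical summary is typically produced with `DataFrame.describe()`. A minimal sketch on made-up toy rows:

```python
import pandas as pd

# Made-up toy rows; column names are assumed from the King County dataset.
df = pd.DataFrame({
    "date": ["20141013T000000", "20141209T000000", "20150225T000000"],
    "price": [250000.0, 475000.0, 610000.0],
    "bedrooms": [3, 3, 4],
    "floors": [1.0, 2.0, 1.0],
})

# With default arguments, describe() summarizes only the numeric columns:
# count, mean, std, min, the 25/50/75% quartiles and max.
summary = df.describe()
print(summary)
```

The text column `date` does not appear in this summary.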

As you might have noticed, by default this outputs the statistical summary of the numerical columns only. Let me also show you how to retrieve summaries for columns that contain string/text data.
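Tutorials often write `df.describe(include="object")` for this. The sketch below uses a slightly different form, `exclude=["number"]`, which selects every non-numeric column regardless of which dtype pandas assigned to the text. It is not necessarily the form the original snippet used. The toy rows are made up.

```python
import pandas as pd

# Made-up toy rows; one date is repeated so "top" and "freq" are visible.
df = pd.DataFrame({
    "date": ["20141013T000000", "20141013T000000", "20150225T000000"],
    "price": [250000.0, 475000.0, 610000.0],
    "bedrooms": [3, 3, 4],
})

# Excluding numeric dtypes leaves the text columns. For these, describe()
# reports count, unique, top (most frequent value) and freq (its count).
text_summary = df.describe(exclude=["number"])
print(text_summary)
```

To get numeric and text summaries in one table, `df.describe(include="all")` is another option.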

This gives the categorical description for the entire data frame. There is also a way to get summaries for only one or more selected columns, as we will see soon.
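One straightforward approach is to select the columns first and then call `describe()` on the result. A sketch on made-up toy rows:

```python
import pandas as pd

# Made-up toy rows; column names are assumed from the King County dataset.
df = pd.DataFrame({
    "date": ["20141013T000000", "20141013T000000", "20150225T000000"],
    "price": [250000.0, 475000.0, 610000.0],
    "bedrooms": [3, 3, 4],
    "floors": [1.0, 2.0, 1.0],
})

# Select a list of columns (a DataFrame), then describe only those.
subset_summary = df[["price", "bedrooms"]].describe()
print(subset_summary)

# A single column (a Series) can be described on its own as well; for text
# data this gives count, unique, top and freq.
date_summary = df["date"].describe()
print(date_summary)
```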

This concludes the first part of this project, which dealt with the data collection stage of the data science methodology.

In the next article, I will walk you through the data preparation and preprocessing stages, which will involve cleaning our data and storing it in the desired format.

Please follow me and subscribe to my Medium newsletter to be notified when part 2 of this project is published.

Don’t forget to leave a clap and a comment if this article was helpful to you. You can access the source code in my GitHub repository using this link.
