Are you willing to be a Data Scientist?
Let's begin with data analysis using the CRISP-DM methodology
I am a data analyst on my way to becoming a data scientist. While extending my skills along this path, I would like to share some of the main topics I have worked on, and this post is the first one.
Definition of CRISP-DM
CRISP-DM stands for Cross-Industry Standard Process for Data Mining. This methodology lets a project progress, and be understood, step by step.
The steps are not sharply separated. While making sense of the data, new business questions may arise and send you back to the business understanding phase. Likewise, while evaluating the outputs, you may go back to business understanding because the business needs have changed.
We will go through the steps one by one while analysing Seattle and Boston Airbnb data from 2016–2017.
Business Understanding
First of all, we need to understand the business case and its requirements. In this case, there are some questions about Airbnbs that need to be answered:
- How does the availability of Airbnbs in Seattle change from month to month?
- How does the price of Seattle Airbnbs change city by city?
- How do prices compare between Seattle and Boston?
Data Understanding
We need to understand what we have and how we can answer our questions with this data. Do we need more data, or can we answer more questions than we planned?
While trying to understand data, new business questions may arise.
In the data understanding phase, we look at the big picture and plan the work to be done. The first item on that plan is preparing the data for analysis.
Data Preparation
I think this is the most challenging phase, and the one where you spend most of your time, because the data has to be made ready for modelling and analysis. You may need descriptive or inferential statistics, or various kinds of data visualization. Typical steps include:
1. Checking data types and nulls (making decisions about handling nulls in the data)
2. Using descriptive or inferential statistics
3. Column based transformations (convert, rename, remove, split)
4. Removing unnecessary data with simple or complex filters
and so on.
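The steps above can be sketched with pandas. This is only a minimal illustration, not the code from my project: the tiny `listings` table, its column names (`city`, `price`, `host_since`) and the 500 price cut-off are all assumptions made up for the example, and dropping rows with a missing price is just one possible decision about nulls.

```python
import pandas as pd

# Hypothetical sample standing in for a listings file.
# All column names and values are assumptions for illustration.
listings = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["Seattle", "Seattle", None, "Seattle"],
    "price": ["$85.00", "$1,200.00", "$60.00", None],
    "host_since": ["2014-05-01", "2015-11-20", "2016-01-15", "2013-07-04"],
})

# Step 1: check data types and count nulls per column.
print(listings.dtypes)
null_counts = listings.isnull().sum()
print(null_counts)

# One possible decision about nulls: drop rows that have no price.
prepared = listings.dropna(subset=["price"]).copy()

# Step 3: column-based transformations (convert, rename, split).
# Convert a price string such as "$1,200.00" into a float.
prepared["price"] = (
    prepared["price"].str.replace(r"[$,]", "", regex=True).astype(float)
)
prepared = prepared.rename(columns={"host_since": "host_since_date"})
prepared["host_since_date"] = pd.to_datetime(prepared["host_since_date"])
# Split the year out of the date into its own column.
prepared["host_since_year"] = prepared["host_since_date"].dt.year

# Step 2: descriptive statistics on the cleaned column.
price_summary = prepared["price"].describe()
print(price_summary)

# Step 4: remove unnecessary rows with a simple filter
# (the 500 threshold is an arbitrary example value).
affordable = prepared[prepared["price"] < 500]
print(affordable)
```

In a real project the order of these steps is rarely fixed; for example, you often need to convert a column before its descriptive statistics make sense, as with `price` here.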
Data Modelling
If you need to answer questions with machine learning algorithms, such as predicting price, you need to model the data and decide which model to use to answer the question.
While modelling, you may also need to go back to the data preparation phase.
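To make the idea of predicting price a bit more concrete, here is a very small baseline sketch. It is not a model I used in this project: the `bedrooms` feature, the invented numbers and the choice of a straight-line fit with NumPy's `polyfit` are all assumptions for illustration. The point is only the workflow of fitting on one part of the data and checking the error on rows the model has not seen.

```python
import numpy as np

# Hypothetical feature and target; the numbers are invented for illustration.
bedrooms = np.array([1, 1, 2, 2, 3, 3, 4, 4], dtype=float)
price = np.array([80, 90, 120, 130, 170, 160, 210, 220], dtype=float)

# Hold out two rows for testing and train on the rest.
test_idx = np.array([1, 5])
train_idx = np.array([0, 2, 3, 4, 6, 7])

# Fit a straight line: price = slope * bedrooms + intercept.
slope, intercept = np.polyfit(bedrooms[train_idx], price[train_idx], deg=1)

# Predict the held-out rows and measure the mean absolute error.
predicted = slope * bedrooms[test_idx] + intercept
mae = np.mean(np.abs(predicted - price[test_idx]))

print(f"slope={slope:.2f}, intercept={intercept:.2f}, MAE={mae:.2f}")
```

If the error on the held-out rows is too large, that is usually a sign to go back to data preparation, for example to add features or handle outliers, before trying a more complex model.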
Evaluation of the Results
After the data understanding and preparation phases, we can answer these questions without modelling, using only descriptive statistics and data visualizations.
In the data preparation phase we use exploratory visualizations, but when evaluating the outputs with an audience we need explanatory visualizations.
My answers to the questions are below.
How is the availability of Airbnbs in Seattle changing by month?
Availability appears to be above 50% in every month of the year, but the availability rate decreases gradually, which seems to be due to the summer season.
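A monthly availability rate like this can be computed with a simple group-by. The sketch below assumes a calendar table with one row per listing per day and an `available` flag stored as `"t"` or `"f"`. The column names and the handful of rows are made up for the example, so the resulting numbers are not my results.

```python
import pandas as pd

# Hypothetical calendar data: one row per listing per day.
# Column names and the "t"/"f" flag format are assumptions.
calendar = pd.DataFrame({
    "listing_id": [1, 1, 2, 2, 1, 2],
    "date": ["2016-01-04", "2016-01-05", "2016-01-04",
             "2016-01-05", "2016-07-01", "2016-07-01"],
    "available": ["t", "f", "t", "t", "f", "t"],
})

calendar["date"] = pd.to_datetime(calendar["date"])
# Turn the text flag into a boolean so that its mean is a rate.
calendar["is_available"] = calendar["available"] == "t"

# Share of listing-days that are available, per calendar month.
monthly_availability = (
    calendar.groupby(calendar["date"].dt.month)["is_available"].mean()
)
monthly_availability.index.name = "month"
print(monthly_availability)
```

Because the mean of a boolean column is the fraction of `True` values, each entry of `monthly_availability` can be read directly as an availability rate between 0 and 1.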
How does the price of Seattle Airbnbs change city by city?
Looking at the average monthly prices of Airbnbs in Seattle, the city center appears to have the highest values.
How do prices compare between Seattle and Boston Airbnbs?
Boston prices are clearly higher than Seattle prices, especially in September.
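A comparison like this one comes down to averaging price per month and per city and putting the two cities side by side. Here is a minimal sketch with a pivot table. The `prices` table, its column names and its values are invented for illustration and are not the real Seattle and Boston figures.

```python
import pandas as pd

# Hypothetical combined price data for both cities.
# Column names and values are assumptions, not the real results.
prices = pd.DataFrame({
    "city": ["Seattle", "Seattle", "Boston", "Boston", "Seattle", "Boston"],
    "date": pd.to_datetime(["2016-09-01", "2016-09-02", "2016-09-01",
                            "2016-09-02", "2016-10-01", "2016-10-01"]),
    "price": [100.0, 120.0, 200.0, 220.0, 90.0, 150.0],
})

prices["month"] = prices["date"].dt.month

# Average price per month, with one column per city.
avg_price = prices.pivot_table(
    index="month", columns="city", values="price", aggfunc="mean"
)

# A positive difference means Boston is more expensive in that month.
avg_price["difference"] = avg_price["Boston"] - avg_price["Seattle"]
print(avg_price)
```

With one column per city, the same table can feed a line chart directly, which makes it easy to spot a month where the gap widens.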
Deployment
This phase is also important, because communication is the most important part of a data scientist's role. By sharing our experiences and results, we gain more knowledge and motivation for our work.
As I have learned, being a data scientist is not just about analysing data or designing a machine learning model. It requires adding business value to the work you are doing, so let's keep improving our skills.
Thank you for reading to the end :)
I shared the Python code and data set for this project on GitHub.