Complete Life Cycle/Pipeline of a Data Science Project for a Real Business Problem Statement

Ranjan Sharma
Published in Analytics Vidhya
3 min read · May 3, 2020

Whenever we face a business problem at the office or in production, we generally convert the business problem statement into a data science problem.

  1. Business and Data Understanding: Build an understanding of the business and of the data by analyzing both at a granular level.
  2. Integrate and Collect Data:
    i) Preprocess the data
    ii) Data transformation
    iii) Discretization
    iv) Scaling
    v) Handling categorical variables
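
To make the preprocessing sub-steps above concrete, here is a minimal, library-free sketch of scaling, discretization, and categorical encoding. The toy columns (`ages`, `cities`), the bin edges, and the helper names are assumptions made up for illustration, not part of the original pipeline; in practice, scikit-learn's `MinMaxScaler`, `KBinsDiscretizer`, and `OneHotEncoder` cover the same ground.

```python
def min_max_scale(values):
    """Rescale numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, edges, labels):
    """Map each value to the first bin whose upper edge it does not exceed.

    `labels` needs one more entry than `edges`; the last label catches
    everything above the final edge.
    """
    out = []
    for v in values:
        for edge, label in zip(edges, labels):
            if v <= edge:
                out.append(label)
                break
        else:
            out.append(labels[-1])
    return out


def one_hot(values):
    """Encode a categorical column as 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical toy data, for illustration only.
ages = [22, 35, 58, 41]
cities = ["Pune", "Delhi", "Pune", "Mumbai"]

scaled = min_max_scale(ages)
age_bands = discretize(ages, edges=[30, 50], labels=["young", "middle", "senior"])
categories, encoded = one_hot(cities)

print(scaled)
print(age_bands)
print(categories, encoded)
```

Whatever tool is used, the scaling range and the category list should be learned from the training data only and then reused on new data, so that both are transformed the same way.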

  3. Feature Engineering:
    i) Create new features
    ii) Handle time series data
    iii) Perform statistical and graphical data analysis (EDA)
    iv) Handle imbalanced datasets
    v) Train and test split
    vi) Extract features from text in the case of NLP: bag of words, TF-IDF, n-grams, Word2vec, topic extraction
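
Two of the feature engineering steps above, the train and test split and the text features (bag of words, n-grams, TF-IDF), can be sketched in plain Python as follows. The sample sentences and the helper names `shuffle_split`, `tokenize`, and `tfidf` are hypothetical. The weighting is the plain textbook form tf × log(N / df); libraries such as scikit-learn's `TfidfVectorizer` use a smoothed variant, so their numbers will differ.

```python
import math
import random
from collections import Counter


def shuffle_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and hold out a fraction as the test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]  # (train, test)


def tokenize(text, n=1):
    """Lowercase, split on whitespace, and return word n-grams."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def tfidf(docs, n=1):
    """Return the vocabulary and one {term: weight} dict per document."""
    bags = [Counter(tokenize(d, n)) for d in docs]      # bag of words per document
    df = Counter(term for bag in bags for term in bag)  # number of documents containing each term
    n_docs = len(docs)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in bag.items()}
        )
    return sorted(df), weights


# Hypothetical review snippets, for illustration only.
docs = ["the delivery was late", "the delivery was fast", "great product"]

vocab, weights = tfidf(docs)          # unigram features
bigrams = tokenize(docs[0], n=2)      # word 2-grams of the first document
train, test = shuffle_split(range(10), test_fraction=0.2)

print(vocab)
print(weights[0])
print(bigrams)
```

Terms that appear in most documents, such as "the", receive a lower weight than terms that single out one document, such as "late". Note that a shuffled split like this one is not suitable for time series data, where the test rows should come after the training rows in time.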

  4. Model Creation: Based on the patterns observed during EDA, we select the ML algorithm.
     No one can tell just by looking at a dataset which model will work best for it, so here we prefer ensemble techniques.
     So definitely try at least 5 or 6 models and compare how well each one performs.
     Regression problem: Linear Regression, Lasso, Random Forest, AdaBoost Regressor, XGBoost Regressor.
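
One way to try several regressors side by side is a small cross-validation loop. The sketch below assumes scikit-learn is installed and uses a synthetic dataset from `make_regression` as a stand-in for real business data; the hyperparameters are arbitrary illustrative choices. XGBoost is left out because it lives in the separate `xgboost` package, but its `XGBRegressor` could be added to the same dictionary.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption; replace with the real feature matrix and target).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

models = {
    "Linear": LinearRegression(),
    "Lasso": Lasso(alpha=0.1),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
}

scores = {}
for name, model in models.items():
    # Mean R^2 over 5 cross-validation folds; higher is better.
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()

# Print the candidates from best to worst.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} mean R^2 = {score:.3f}")
```

Cross-validated scores are used here instead of a single train/test score so that the ranking depends less on one lucky or unlucky split.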
