Complete Life Cycle/Pipeline of a Data Science Project for a Real Business Problem Statement
Whenever we have a business problem in the office or in production, we generally convert the business problem statement into a data science problem.
1. Business and Data Understanding: Build an understanding of the business and of the data by analyzing both at a granular level.
2. Integrate and collect data:
i) Preprocess data
ii) Data transformation
iii) Discretization
iv) Scaling
v) Encoding categorical variables
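
To make the discretization, scaling, and categorical-variable steps concrete, here is a minimal dependency-free sketch. The function names and the toy `ages` and `cities` columns are hypothetical and chosen only for illustration. In a real project we would more likely rely on a library such as pandas or scikit-learn for the same operations.

```python
def min_max_scale(values):
    """Scaling: rescale numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Discretization: map each number to a bin index in 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def one_hot(values):
    """Categorical encoding: one 0/1 indicator column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical toy columns, not taken from any real dataset.
ages = [20, 30, 40, 60]
cities = ["Pune", "Delhi", "Pune", "Mumbai"]

scaled_ages = min_max_scale(ages)
age_bins = equal_width_bins(ages, n_bins=4)
city_names, city_matrix = one_hot(cities)

print(scaled_ages)
print(age_bins)
print(city_names, city_matrix)
```

Equal-width binning and min-max scaling are only one choice each. Quantile bins or standardization (zero mean, unit variance) may suit skewed data better.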
3. Feature Engineering:
i) Create new features
ii) Handle time series data
iii) Perform statistical and graphical data analysis (EDA)
iv) Handle imbalanced datasets
v) Split the data into train and test sets
vi) For NLP, extract features from text: bag of words, TF-IDF, n-grams, Word2vec, topic extraction
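
Two of the steps above, the train/test split on an imbalanced dataset and text feature extraction, can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names, the toy labels, and the toy documents are hypothetical, and the TF-IDF formula used here is the plain `tf * log(N / df)` variant (libraries often apply smoothed variants, so their numbers may differ).

```python
import math
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Split row indices so each class keeps a similar share in train and test."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one test row per class, even for a rare class.
        # (A class with a single row therefore ends up only in the test set.)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def tokenize(text, n=1):
    """Lowercase, split on whitespace, and return word n-grams."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def bag_of_words(docs, n=1):
    """Bag of words: one term-count mapping per document."""
    return [Counter(tokenize(doc, n)) for doc in docs]


def tf_idf(docs):
    """TF-IDF with tf = count / doc length and idf = log(N / doc frequency)."""
    counts = bag_of_words(docs)
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts for term in c)
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append(
            {t: (k / total) * math.log(n_docs / doc_freq[t]) for t, k in c.items()}
        )
    return weights


# Hypothetical imbalanced labels: eight rows of class 0, two rows of class 1.
labels = [0] * 8 + [1] * 2
train_idx, test_idx = stratified_split(labels, test_fraction=0.25, seed=42)

# Hypothetical toy documents.
docs = ["good product", "bad product", "good good service"]
counts = bag_of_words(docs)
bigrams = tokenize("very good product", n=2)
weights = tf_idf(docs)

print(train_idx, test_idx)
print(bigrams)
print(weights[1])
```

In the second document, the rarer word "bad" receives a higher weight than "product", which appears in two documents. That is the behaviour TF-IDF is meant to provide.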
4. Model Creation: Based on the patterns observed during EDA, we select candidate ML algorithms.
No one can tell which model will work best just by looking at the dataset, so here we prefer ensemble techniques.
So try at least 5 or 6 models and compare how good the accuracy of each one is.
Regression problem: Linear, Lasso, Random Forest, AdaBoost Regressor, XGBoost Regressor.
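
The idea of trying several regression models and comparing them can be sketched with scikit-learn as below. This is a minimal sketch, not a definitive recipe. The synthetic dataset from `make_regression`, the hyperparameter values, and the choice of 5-fold cross-validated R^2 as the score are all assumptions made for illustration. XGBoost is left out so the example depends only on scikit-learn; if the separate `xgboost` package is installed, its `XGBRegressor` could be added to the same dictionary.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data stands in for a real business dataset (an assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Candidate models; the hyperparameters are illustrative, not tuned.
models = {
    "Linear": LinearRegression(),
    "Lasso": Lasso(alpha=0.1),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
}

scores = {}
for name, model in models.items():
    # Mean R^2 over 5 cross-validation folds; higher is better.
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()

# Print the models from best to worst mean score.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean R^2 = {score:.3f}")
```

Cross-validation is used here instead of a single train/test split so that the comparison depends less on one lucky or unlucky split. The ranking on real data may look quite different from the ranking on this synthetic example.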