Data science process

Zuhailinasir
5 min read · Jan 14, 2023

What is Data science?

Data science is the study of analysing large datasets with modern tools and methods to reveal hidden patterns, extract useful insights, and guide strategic business decisions. It combines domain knowledge, programming skill, and an understanding of mathematics and statistics. The data can come from many sources and take many forms, including numbers, text, photos, video, and audio. Data scientists apply machine learning algorithms to this data to build forecasting models and, more broadly, AI systems capable of performing tasks that would otherwise require human intelligence.
Phases of a Data Science Project’s Lifecycle
Continuing with our definition of data science, consider the various stages of the data science process. There are several phases in the data science lifecycle, each with its own set of responsibilities:

1. Conceiving Issues and Gaining Perspective on the Business World

During this initial stage, stakeholders typically analyse company trends, study comparable data analytics cases, and research the business domain. The team assesses the available tools and infrastructure, the time commitment, and the technological prerequisites. Once these analyses are finished, stakeholders can formulate a working hypothesis for addressing the business problem in light of the present market situation.

• Be specific about the issue you’re trying to tackle.
• Determine the worth of the project in its current form.
• Identify the risks of the project, including any ethical issues that may arise.
• Create and share an adaptable, high-level project plan.

2. Data Preparation

The next stage is data preparation, which is usually carried out by a data scientist or a business/data analyst. It is the longest and most laborious part of the process, and also one of the most crucial: a model can only be as good as the data it is given.

• Selecting relevant data
• Integrating disparate data sources into a unified dataset
• Cleaning the data
• Handling missing values by removing them or imputing estimates
• Handling inaccurate values by removing them
• Detecting and handling outliers, for example with box plots
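
Two of the steps above, imputing missing values and screening outliers with the box-plot rule, can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the `ages` column, the choice of median imputation, and the conventional 1.5 × IQR whisker factor are assumptions made for the example, not recommendations from the original process.

```python
from statistics import median, quantiles


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) whisker limits used by a standard box plot."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the box-plot whiskers."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


# Hypothetical column with two missing entries and one implausible value.
ages = [22, 25, None, 24, 28, 26, 120, None, 30]

filled = impute_missing(ages)
cleaned = drop_outliers(filled)

print(filled)   # missing entries replaced by the median (26)
print(cleaned)  # 120 falls outside the whiskers and is removed
```

Whether to drop, cap, or keep an outlier depends on the problem; removing it, as done here, is only one option.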

3. Exploratory Data Analysis (EDA)

Before building the actual model, it is necessary to gain some understanding of the target (the answer we want to predict) and the variables that influence it. To analyse the data and its characteristics, we generate charts, plots, and heat maps. While exploring, check that the data is free of redundancy, missing values, and nulls. It is also essential to isolate the most relevant features and exclude extraneous ones that might compromise the reliability of the results.
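
A heat map in EDA is usually a picture of a correlation matrix. As a rough sketch of the numbers behind it, the snippet below computes Pearson correlations between columns and flags feature pairs that look redundant. The column names, the sample values, and the 0.999 threshold are all made up for illustration, and the code assumes no column is constant.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict: matrix[a][b] is the correlation of column a with column b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def redundant_pairs(matrix, names, threshold=0.999):
    """Feature pairs whose absolute correlation reaches the threshold."""
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(matrix[a][b]) >= threshold:
                pairs.append((a, b))
    return pairs


# Hypothetical dataset: "sales" is the target, the rest are candidate features.
data = {
    "ad_spend": [10, 20, 30, 40, 50],
    "ad_spend_cents": [1000, 2000, 3000, 4000, 5000],  # same information, other unit
    "visits": [110, 190, 320, 390, 510],
    "sales": [12, 19, 33, 41, 52],
}

matrix = correlation_matrix(data)
features = ["ad_spend", "ad_spend_cents", "visits"]
redundant = redundant_pairs(matrix, features)

for name in features:
    print(f"{name:15s} vs sales: {matrix[name]['sales']:.3f}")
print("redundant pairs:", redundant)
```

A pair flagged this way is a candidate for dropping one of the two columns, subject to domain judgement.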

4. Data Modeling

In this stage, we decide whether the problem at hand is best tackled with a classification, regression, or clustering approach. After settling on a model family, we must make an informed decision about which algorithms within that family to apply. The term “data modelling” is also used for the method by which a company examines and describes the types of data it generates and gathers, as well as the relationships between them. That work requires data to be structured by data description, data semantics, and consistency constraints; the resulting abstract model, known as a data model, supports the construction of conceptual models and the establishment of links between data objects. Good data quality, in turn, lets companies set benchmarks, baselines, and long-term objectives as stepping stones for further development.

5. Model Evaluation

We have now built a model, but does it actually work? To improve its performance, we first need to measure how it currently performs. Data scientists typically use either the hold-out or the cross-validation technique to assess a model. Hold-out evaluation tests the model on data that was not used during training, which gives an unbiased estimate of how well the model has learned. Cross-validation repeats this over several train/test partitions of the data and averages the results.
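
The two splitting strategies can be sketched as follows. This is a simplified, standard-library illustration; the function names, the 20% test fraction, and the fixed seed are assumptions, and in practice a library routine would normally be used instead.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle once, then reserve a fraction of the rows for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_splits(rows, k=5, seed=0):
    """Yield k (train, test) pairs; every row lands in a test fold exactly once."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test


rows = list(range(10))  # stand-in for ten labelled examples

train, test = holdout_split(rows, test_fraction=0.2)
print(len(train), len(test))

for fold_train, fold_test in k_fold_splits(rows, k=5):
    # A real workflow would fit on fold_train and score on fold_test here,
    # then average the k scores.
    print(sorted(fold_test))
```
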
Commonly used metrics for model evaluation:

  1. Brier score
  2. False positive rate | Type-I error
  3. False negative rate | Type-II error
  4. True negative rate | Specificity
  5. Negative predictive value
  6. False discovery rate
  7. True positive rate | Recall | Sensitivity
  8. Positive predictive value | Precision
  9. F beta score
  10. Accuracy
  11. F1 score
  12. F2 score
  13. Cohen's kappa
  14. Matthews correlation coefficient
  15. ROC curve
  16. ROC AUC score
  17. Precision-Recall curve
  18. PR AUC | Average precision
  19. Log loss
  20. Confusion matrix
  21. MSPE
  22. MSAE
  23. R-squared
  24. Adjusted R-squared
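
Many of the classification metrics in this list derive from the four cells of a binary confusion matrix. The sketch below shows how a handful of them are computed; the label vectors are invented for the example, and returning 0.0 when a denominator is zero is a convention chosen here, not a universal rule.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Assumed convention: report 0.0 when a metric is undefined."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred, beta=1.0):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)        # positive predictive value
    recall = safe_div(tp, tp + fn)           # true positive rate, sensitivity
    specificity = safe_div(tn, tn + fp)      # true negative rate
    accuracy = safe_div(tp + tn, tp + fp + fn + tn)
    f_beta = safe_div((1 + beta ** 2) * precision * recall,
                      beta ** 2 * precision + recall)
    mcc = safe_div(tp * tn - fp * fn,
                   sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,  # Type-I error rate
        "accuracy": accuracy,
        "f_beta": f_beta,                        # F1 when beta == 1, F2 when beta == 2
        "mcc": mcc,                              # Matthews correlation coefficient
    }


# Hypothetical ground truth and predictions for ten examples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name:20s} {value:.3f}")
```

Curve-based metrics such as ROC AUC or PR AUC need predicted scores rather than hard labels, so they are outside this sketch.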

6. Model Deployment

Deployment is the final stage of the lifecycle. In this phase, the means of delivering the model to end users or to another system are developed. Depending on the task at hand, this stage can mean a variety of things. The output of your model might be as simple as a Tableau dashboard, or deployment might involve bringing the model to the cloud and making it accessible there. Engineering-focused team members, such as data engineers, cloud engineers, machine learning engineers, application developers, and quality assurance engineers, often carry it out.
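
At its simplest, handing a model to another system means saving what was learned and loading it where predictions are served. The sketch below illustrates that idea with a toy linear model stored as JSON; the parameter values, the file name `model.json`, and the helper functions are hypothetical, and a real deployment would add versioning, validation, and monitoring.

```python
import json
import tempfile
from pathlib import Path


def save_model(params, path):
    """Persist the learned parameters so another process can load them."""
    Path(path).write_text(json.dumps(params))


def load_model(path):
    """Read the parameters back on the serving side."""
    return json.loads(Path(path).read_text())


def predict(params, features):
    """Linear prediction: intercept plus the weighted sum of the inputs."""
    return params["intercept"] + sum(
        params["weights"][name] * value for name, value in features.items()
    )


# Hypothetical parameters produced by an earlier training step.
trained = {"intercept": 1.5, "weights": {"ad_spend": 0.5, "visits": 0.25}}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.json"
    save_model(trained, model_path)   # training side
    served = load_model(model_path)   # serving side

prediction = predict(served, {"ad_spend": 10, "visits": 4})
print(prediction)  # 1.5 + 0.5 * 10 + 0.25 * 4 = 7.5
```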

