Phases of Data Analytics Life Cycle

Akanksha Menon
4 min read · Oct 29, 2021


The Data Analytics Life Cycle defines how data is handled across the various phases of a project for the professionals working on it. It is a step-by-step procedure arranged in a circular structure, and each phase has its own characteristics and importance.

Different Phases of Data Analytics Life Cycle

Phase 1: Discovery

This initial phase defines the purpose of the data and how the rest of the data analytics life cycle will be carried out. The team first identifies the critical objectives and works to understand the business domain, then accumulates the necessary resources by analyzing the models that are intended to be developed and evaluating the data sources that will be needed.

The critical activities of this phase include framing the business problem, formulating initial hypotheses that can later be tested with the data, and beginning to learn the data.

Phase 2: Data Preparation

This phase involves collecting, processing, and cleansing the data prior to modeling and analysis. One of the main concerns is ensuring that data is available for processing: the team identifies the various data sources and estimates how much data can be accumulated within the available time frame. Data collection methods used in this phase are:

  • Data acquisition: Collecting data from external sources.
  • Data entry: Creating data points through manual entry or digital systems.
  • Signal reception: Capturing data from digital devices such as IoT devices and control systems.

Common tools used are — Hadoop, Spark, OpenRefine, etc.
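As a small illustration of the cleansing step, here is a minimal Python sketch using pandas; the file names and column names are hypothetical, and a real pipeline would apply whatever business rules the project requires:

```python
import pandas as pd

# Load the raw, collected data (hypothetical file and columns)
raw = pd.read_csv("sales_raw.csv")

# Basic cleansing before modeling and analysis:
# drop exact duplicates, normalize column names, and handle missing values
clean = (
    raw.drop_duplicates()
       .rename(columns=str.lower)
)
clean["amount"] = pd.to_numeric(clean["amount"], errors="coerce")
clean = clean.dropna(subset=["amount"])       # remove rows with unusable amounts
clean["region"] = clean["region"].fillna("unknown")

clean.to_csv("sales_clean.csv", index=False)  # hand off to the analytic sandbox
```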

Phase 3: Model Planning

In this phase, the team analyzes the quality of the data and determines an appropriate model for the project. An analytic sandbox is used to work with the data and to perform analytics for the duration of the project.

Data can be loaded into the sandbox in three ways:

  • Extract, Transform, Load (ETL) — The data is transformed based on a set of business rules and then loaded into the sandbox.
  • Extract, Load, Transform (ELT) — The data is loaded into the sandbox and then transformed according to a set of business rules.
  • Extract, Transform, Load, Transform (ETLT) — A combination of ETL and ELT with two levels of transformation.
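To make the ETL/ELT distinction concrete, here is a rough Python sketch that uses pandas with an in-memory SQLite database as a stand-in for the sandbox; the table names, column names, and business rule are made up for illustration:

```python
import sqlite3
import pandas as pd

sandbox = sqlite3.connect(":memory:")  # in-memory SQLite as a stand-in for the analytic sandbox
extracted = pd.DataFrame({"price": ["10", "20", "bad"], "qty": [1, 2, 3]})  # hypothetical extract

# ETL: apply the business rule (numeric prices only) first, then load into the sandbox
transformed = extracted.assign(price=pd.to_numeric(extracted["price"], errors="coerce")).dropna()
transformed.to_sql("orders_etl", sandbox, index=False)

# ELT: load the raw extract as-is, then transform inside the sandbox with SQL
extracted.to_sql("orders_raw", sandbox, index=False)
sandbox.execute(
    "CREATE TABLE orders_elt AS "
    "SELECT CAST(price AS REAL) AS price, qty FROM orders_raw WHERE price GLOB '[0-9]*'"
)
# ETLT would combine the two: a light transformation before loading and a second one afterwards
```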

After cleaning the data, the team determines the techniques, methods, and workflow it will follow to build the model in the next phase. The team first explores the data, identifies the relations between variables in order to select the key ones, and eventually devises a suitable model.

Common tools used are — R, SAS/ACCESS, SQL Analysis Services, etc.
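The exploration described above can start as simply as a correlation check to shortlist key variables. A rough Python sketch, assuming the cleaned file from the previous phase and a hypothetical target column:

```python
import pandas as pd

df = pd.read_csv("sales_clean.csv")  # output of the data preparation phase (hypothetical file)

# Summary statistics and pairwise correlations between the numeric variables
print(df.describe())
correlations = df.select_dtypes("number").corr()["amount"].sort_values(ascending=False)
print(correlations)

# Shortlist variables that correlate strongly with the target as candidates for the model
key_variables = [c for c in correlations.index if abs(correlations[c]) > 0.3 and c != "amount"]
print("Candidate key variables:", key_variables)
```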

Phase 4: Model Building

The team develops training, testing, and production datasets in this phase. Once this is done, it builds and executes the models: various statistical models such as regression and decision trees are fitted to the data and tested to determine how well they correspond to the datasets. Although the modeling techniques and logic required to develop models can be highly complex, the actual duration of this phase can be short compared to the time spent preparing the data and defining the approaches. Once the data science team can articulate whether the model is sufficiently robust to solve the problem or whether it has failed, it can move to the next phase.

Common commercial tools used here are — Matlab, STATISTICA, Alpine Miner, etc.

Common Free tools used are — R, Octave, WEKA, Python, etc.
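As a small sketch of this phase using the free tools listed above (Python with scikit-learn), the snippet below splits the data, fits a regression and a decision tree, and compares them on the held-out test set. The dataset, feature columns, and target are hypothetical:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

df = pd.read_csv("sales_clean.csv")
X, y = df[["qty", "discount"]], df["amount"]  # hypothetical features and target

# Split into training and test sets (a production set would be held separately)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit candidate models and compare them on the held-out test data
for model in (LinearRegression(), DecisionTreeRegressor(max_depth=4)):
    model.fit(X_train, y_train)
    score = r2_score(y_test, model.predict(X_test))
    print(type(model).__name__, "R^2 on test set:", round(score, 3))
```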

Phase 5: Communicate Results

This phase determines whether the results are a success or a failure. The team evaluates the results of the analysis and considers how best to formulate the findings and outcomes for the various team members and stakeholders, taking caveats and assumptions into account. The team identifies the key findings, quantifies the business value, and develops a narrative to summarize and convey the findings to stakeholders, along with recommendations for future work or improvements to existing processes. Stakeholders must understand how the model affects their processes.

Phase 6: Operationalize

In the final phase, the team presents the full in-depth report, with briefings, code, key findings, and all the technical documents and papers, to the stakeholders. The data is also moved out of the sandbox and run in a live environment on a small scale. This pilot helps the team learn about the performance and constraints of the model in a live environment and make the necessary adjustments before full deployment. The results are closely monitored to ensure they match the expected goals. If the findings fit the objectives, the report is finalized, and the model can then be deployed and integrated into the business.
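One common way to move a model out of the sandbox for such a small-scale pilot is to persist it and load it in a lightweight scoring script. A minimal sketch, assuming the model was built with scikit-learn in the previous phase and using hypothetical file and column names:

```python
import joblib
import pandas as pd

# In the sandbox: persist the approved model from the model building phase
# (`model` is the fitted estimator from the previous phase; hypothetical file name)
# joblib.dump(model, "sales_model.joblib")

# In the live/pilot environment: load the model and score a small batch of new records
model = joblib.load("sales_model.joblib")
new_records = pd.read_csv("incoming_batch.csv")  # hypothetical incoming data
new_records["prediction"] = model.predict(new_records[["qty", "discount"]])

# Monitor the scored results against the expected goals before full deployment
print(new_records["prediction"].describe())
```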

Conclusion

The Data Analytics life cycle is a circular process consisting of six primary stages that define how information is created, collected, processed, used, and analyzed. It starts by mapping out the business objectives and works towards achieving them through the rest of the stages.
