Data Analytics 101 Series — The ‘Analyze’ Phase

Adith Narasimhan Kumar
Analytics Vidhya
Published in
5 min readOct 26, 2023

--

Now that we have asked the right questions, prepared the data, and processed it squeaky clean, it's time to dive into the heart of the entire process. The Analysis phase! But what is analysis? want to know? Let’s get right into it!

A graph showing trends and patterns

What is Analysis?

The formal definition of analysis is

Analysis is the process used to make sense of the data collected.

Data once collected and cleaned has to be made use of. This process of making use of the data by comprehending patterns and meanings from it is called analysis.

What is the goal of the analysis?

The goal of the analysis is to identify trends and relationships within data so you can accurately answer the question you’re asking.

There are 4 phases of analysis:

  1. Organize data
  2. Format and adjust data
  3. Get input on the data from others
  4. Transform and analyze the data

1. Organize data

The organization of data sets is really important for data analysts. Most of the data is organized into tables. Tables are the way to go as they allow you to categorize and manipulate data. Data organized into tables can be found in spreadsheets and SQL Databases.

The importance of categorizing data is seen when there is a shift in focus during the analysis phase. This shift requires a data analyst to focus on a particular section of the data set often narrow in scope.

Common ways to organize data are by sorting and filtering.

Image from Google Data Analytics

Sorting is when you arrange data into a particular order by ranking it based on specific criteria. this makes the data easier to understand and visualize.

Filtering on the other hand gives you only the data that you want and abstracts the data that doesn't meet the criteria fed. Filtering is useful when there is lots of data and when one has to focus on a narrow or specific section of the entire dataset. Depending on the scenario, both tools are quite handy to organize a huge data set.

Another handy option is to use pivot tables to summarize the data and help you organize it by giving a quick glance at the basic info.

2. Format and adjust data

Datasets consist of information collected from multiple sources and it's the data analyst’s job to make sure that the format of the data is aligned and consistent throughout.

image showing data conversion
Image from Google Data Analytics

Analysis of data is easier when the format of the data is clean and consistent. This formatting includes the conversion of data types such as string to int, int to string, etc. But, it can also mean converting real-world data into percentages, dates, strings, or currency.

For example, when you want to sort the dates column of a data set in a spreadsheet, it's important to have that column set to the ‘date’ data type or the spreadsheet may sort it alphabetically and not date-wise.

also, it's possible for a data set to have incorrect units of measurement irrelevant to the analysis. incorrectly formatted data often leads to mistakes in the analysis, time consumption to identify and rectify those mistakes, and affects stakeholders’ decision-making.

Tools such as conditional formatting and data validation help us verify the formatting we’ve performed in spreadsheets.

3. Get input on the data from others

This part of the analysis phase is pretty straightforward and happens very frequently (lol). When you constantly look at the same data over and over, you sort of become immune to the mistakes right in front of you. This is the time when an independent perspective helps.

4. Transform and analyze the data

This is pretty much the heart of the entire analysis phase. Transformation of data to your use case pretty much locks the section of data you want to analyze and hence is the most vital. There are several tools and techniques that can be applied to achieve this. let’s go through each of them.

Data Aggregation:

Till now we have been discussing individual pieces of data. To perform our analysis we have to do something called data aggregation. This is the process of collecting and combining data from multiple sources into one. Data can also be aggregated over a given time period to provide statistics such as sum, average, minimum, and maximums.

Data aggregation helps identify trends, make comparisons, and gain insights.

Data aggregation is made possible by functions.

One of the most important and useful tools for data aggregation is vlookup. vlookup searches a data range for a value and returns the corresponding information associated with the search value.

The syntax for vlookup is

=VLOOKUP(search_key, range, index, [is_sorted])

Here,

search_key: It is the value to search for

range: range to consider for the search

index: column index of the value to be returned. The first column is always numbered 1. If the index is not between 1 and the number of columns in the range, #VALUE! is returned.

is_sorted: This indicates whether the column to be searched is sorted. It’s true by default. It is recommended to set it to false, and if so, an exact match is returned.

Data Calculations

Calculations are one of the more common tasks that data analysts complete during analysis. They help us perform various operations and manipulations on fundamental values to arrive at derived ones. Tools such as functions and formulas help us achieve this in spreadsheets.

Some common formulas in spreadsheets include addition, subtraction, multiplication, and division.

Some common functions include:

sum

= sum(range)

sumif

=SUMIF(range, criterion, sum_range)

sumifs

=SUMIFS(sum_range, criteria_range1, criterion1, [criteria_range2, criterion2, ...])countif, countifs

countif

=COUNTIF(range, criterion)

countifs

=COUNTIFS(criteria_range1, criterion1, [criteria_range2, criterion2, ...])

Another powerful tool for analysis is the pivot table. You can perform a wide range of analysis tasks with your pivot tables. The functionalities range right from quickly drawing meaningful insights from your data, to performing calculations, sorting, and filtering.

Conclusion

To summarize, we have covered the various aspects of the analysis phase, the tools, and the techniques associated with it. Although this article covers the high-level intention and process of the phase, the real-world use case might not be perfect and can be a bit different than the ideal scenario. However, the techniques and the tools used should remain the same.

Hope you liked my article! Do share your thoughts and inputs and I’ll try to have them answered to my knowledge in future articles!

Happy Learning!

Check out my other articles on Blockchain and Machine Learning/Deep Learning. Let me know about any other topics to cover in the future!

Catch my previous article here 👇

--

--