Data Analysis

Alex Souza
Published in blog do zouza
6 min read · Apr 18, 2022

Version in Portuguese…

The goal here is to show, step by step, the day-to-day work of a professional in Data Analysis. Like the other posts on this blog, this is a living publication that will be constantly updated and improved (with everyone’s help).

Let’s go… but before we start, I suggest reading the article below, which discusses the soft skills a good data analyst needs:

  • Critical Thinking
  • Problem Solving
  • Effective Communication
  • Presentation Skills
  • Networking and ability to work in a team
  • Emotional Intelligence and Empathy
  • Lifelong Learner

Now, on to the hard skills.

What is Data Analysis?

Data analysis is the art of transforming data into relevant knowledge and insights. In other words, it means comparing or aggregating raw information to understand what the data is telling us.
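To make “comparing or aggregating” concrete, here is a minimal sketch in plain Python (standard library only). The sales records and the `region`/`amount` fields are made up for illustration: the code aggregates the raw rows into a total per region, then compares regions by their share of the overall total.

```python
from collections import defaultdict

# Hypothetical raw records: one row per sale.
sales = [
    {"region": "North", "amount": 120.0},
    {"region": "South", "amount": 80.0},
    {"region": "North", "amount": 50.0},
    {"region": "South", "amount": 150.0},
]


def total_by_region(rows):
    """Aggregate: sum the amount of every sale, grouped by region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def share_of_total(totals):
    """Compare: express each region's total as a fraction of the grand total."""
    grand_total = sum(totals.values())
    return {region: value / grand_total for region, value in totals.items()}


totals = total_by_region(sales)
shares = share_of_total(totals)
print(totals)  # {'North': 170.0, 'South': 230.0}
print({region: round(share, 3) for region, share in shares.items()})  # {'North': 0.425, 'South': 0.575}
```

In day-to-day work this would usually be a `GROUP BY` in SQL or a group-by in a dataframe library; the reasoning is the same.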

How do you “apply” Data Analysis?

Now that we know a little more about data analysis and its types, I will show you how I apply it in my day-to-day work…

CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It is a data mining process model, organized into six phases, that describes the approaches data mining experts commonly use to tackle problems.

I organize my thinking and my projects around the CRISP-DM methodology, following the steps described below:

Business Understanding

Here, besides understanding the customer’s problem and their real need (being very curious, asking the right questions, putting yourself in the customer’s shoes), I try to understand the company’s business as a whole and how it works (this will help you a lot). Document your understanding of the business.

Understanding the data

Now it’s about knowing where to find the data that will support you in solving the problem. Here you will collect data from a wide variety of sources: data migrations, ETL processes, aggregated data, and so on. Document where the data can be found.
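One lightweight way to keep the “where to find the data” documentation is to record it while you load each source. The sketch below is an illustration under assumptions, not a prescribed tool: a CSV export (inlined as text so the example runs) and an in-memory SQLite table stand in for real sources, and the names `SourceEntry`, `load_csv`, `load_sqlite`, the datasets and their locations are all hypothetical.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass


@dataclass
class SourceEntry:
    """One line of a hypothetical data-source inventory."""
    name: str        # logical name of the dataset
    location: str    # where it lives (file path, database/table, ...)
    row_count: int
    columns: list


def load_csv(name, location, text):
    """Read CSV text into a list of dicts and describe where it came from."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []
    return rows, SourceEntry(name, location, len(rows), columns)


def load_sqlite(name, location, conn, table):
    """Read a whole SQLite table into a list of dicts and describe it.

    `table` is interpolated into the SQL, so it must be a trusted name.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cur.description]
    rows = [dict(zip(columns, values)) for values in cur.fetchall()]
    return rows, SourceEntry(name, location, len(rows), columns)


# Source 1: a CSV export (inlined here so the example runs as-is).
csv_text = "customer_id,city\n1,Recife\n2,Natal\n"
customers, customers_entry = load_csv("customers", "exports/customers.csv", csv_text)

# Source 2: an operational database table (an in-memory SQLite stand-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 99.9), (11, 2, 45.0), (12, 1, 12.5)])
orders, orders_entry = load_sqlite("orders", "sales.db, table orders", conn, "orders")

# The inventory itself is the documentation of where the data lives.
inventory = [customers_entry, orders_entry]
for entry in inventory:
    print(entry)
```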

Data preparation

This is where you deal with null values, empty fields, data quality and data standardization; Exploratory Data Analysis is very useful here to understand the data you have to work with. Document every point that was standardized or adjusted (in the best-case scenario, these points are fixed at the source).
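A minimal sketch of two of these preparation steps, using only the Python standard library: profiling null and empty values per column, and standardizing text fields while logging every change so the adjustments are documented. The records, column names and rules (trim whitespace, upper-case state codes, treat blank strings as missing) are hypothetical examples.

```python
# Hypothetical raw customer records with typical quality problems.
raw = [
    {"name": "  Ana Lima ", "state": "pe", "email": "ana@example.com"},
    {"name": "Bruno", "state": "PE ", "email": ""},
    {"name": None, "state": "Rn", "email": "carla@example.com"},
]


def profile_missing(rows):
    """Count null (None) and empty/blank string values per column."""
    report = {}
    for row in rows:
        for col, value in row.items():
            stats = report.setdefault(col, {"null": 0, "empty": 0})
            if value is None:
                stats["null"] += 1
            elif isinstance(value, str) and not value.strip():
                stats["empty"] += 1
    return report


def standardize(rows):
    """Trim strings, upper-case state codes and turn blanks into None.

    Returns the cleaned rows plus a log of (row index, column, old, new)
    for every value that changed, i.e. the documentation of the adjustments.
    """
    cleaned, changes = [], []
    for i, row in enumerate(rows):
        new_row = {}
        for col, value in row.items():
            new_value = value
            if isinstance(value, str):
                new_value = value.strip() or None  # blank string -> missing
                if col == "state" and new_value is not None:
                    new_value = new_value.upper()
            if new_value != value:
                changes.append((i, col, value, new_value))
            new_row[col] = new_value
        cleaned.append(new_row)
    return cleaned, changes


print(profile_missing(raw))
cleaned, changes = standardize(raw)
for change in changes:
    print(change)
```

The `changes` log is the raw material for the documentation of adjusted points; ideally each recurring entry also becomes a request to fix the problem at the source.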

Modeling

Here we can take several paths. Some possible scenarios:

  • Sometimes the customer just needs a database containing the information they need to carry out their own analysis and draw their own insights (self-service). Here the deliverable can be an SQL script.
  • Sometimes the need is for something more structured for analysis (Business Intelligence), that is, a Data Warehouse. Here we would have to build tables, or add them to the existing DW, as requested by the client, and the client could then do the analysis themselves (self-service). Here, too, the deliverable can be an SQL script.
  • Still within Business Intelligence, sometimes the need is for a specific report or dashboard (Data Visualization), where the deliverable is a ready-made analysis: for example, a Power BI dashboard built as requested by the customer. Keep in mind that in most cases we will also need to create or improve the databases or DW tables to meet that need.
  • In some cases, the need may simply be to understand what the company’s database looks like: how good the data quality is, how complete the data is, and so on. Here the deliverable can be an Exploratory Data Analysis, in Python for example.
  • The need may be something more predictive, which would involve machine learning, for example. Here the deliverable can be a Python or R notebook, or even an application that delivers what was requested.
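For the first two scenarios, the deliverable is often an SQL script on top of DW-style tables. The sketch below assumes a hypothetical star schema (`dim_customer`, `fact_sales`) and a hypothetical “sales by state” request; it uses Python’s built-in `sqlite3` only so the example is self-contained. In practice the same SQL would run against whatever database or warehouse the company uses, possibly with dialect changes.

```python
import sqlite3

# Hypothetical star schema: one dimension table and one fact table.
SCHEMA = """
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    state        TEXT
);
CREATE TABLE fact_sales (
    sale_id      INTEGER PRIMARY KEY,
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    sale_date    TEXT NOT NULL,   -- ISO 8601 date
    amount       REAL NOT NULL
);
"""

# The kind of SQL script that could be handed over for self-service analysis:
# number of sales and total amount per state, largest total first.
SALES_BY_STATE = """
SELECT c.state,
       COUNT(*)      AS n_sales,
       SUM(f.amount) AS total_amount
FROM fact_sales AS f
JOIN dim_customer AS c ON c.customer_key = f.customer_key
GROUP BY c.state
ORDER BY total_amount DESC;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                 [(1, "Ana", "PE"), (2, "Bruno", "PE"), (3, "Carla", "RN")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 1, "2022-04-01", 100.0), (2, 2, "2022-04-02", 50.0),
                  (3, 3, "2022-04-02", 200.0), (4, 1, "2022-04-03", 25.0)])

rows = conn.execute(SALES_BY_STATE).fetchall()
print(rows)  # [('RN', 1, 200.0), ('PE', 3, 175.0)]
```

Keeping descriptive attributes (state) in a dimension and measurable events (sales) in a fact table is what lets the client write their own `GROUP BY` queries later without the analyst in the loop.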

Evaluation / Validation

In this phase we send the customer what they requested so they can evaluate it and confirm whether we are on the right path. Based on that feedback, we either go back to an earlier step of the process or move on to the next and final phase: deployment.
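The “go back or move on” decision is what makes CRISP-DM iterative. A minimal sketch, assuming hypothetical `after_evaluation` and `run` helpers: the six phases come from the methodology, the evaluation outcomes are supplied by hand, and rejected work returns to Business Understanding by default (mirroring the arrow in the usual CRISP-DM diagram) unless another earlier phase is given.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def after_evaluation(approved, back_to=Phase.BUSINESS_UNDERSTANDING):
    """Decide the next phase once the customer has reviewed the deliverable.

    approved: whether the customer confirmed we are on the right path.
    back_to: the earlier phase to revisit when the work is not approved.
    """
    if approved:
        return Phase.DEPLOYMENT
    if back_to.value >= Phase.EVALUATION.value:
        raise ValueError("back_to must be a phase before EVALUATION")
    return back_to


def run(review_outcomes):
    """Walk the phases in order, looping back whenever a review is rejected.

    review_outcomes: iterable of (approved, back_to) pairs, one per evaluation.
    Returns the list of phases visited, ending at DEPLOYMENT.
    """
    visited = []
    phase = Phase.BUSINESS_UNDERSTANDING
    outcomes = iter(review_outcomes)
    while True:
        visited.append(phase)
        if phase is Phase.DEPLOYMENT:
            return visited
        if phase is Phase.EVALUATION:
            try:
                approved, back_to = next(outcomes)
            except StopIteration:
                raise ValueError("ran out of review outcomes") from None
            phase = after_evaluation(approved, back_to)
        else:
            phase = Phase(phase.value + 1)
```

For example, `run([(False, Phase.DATA_PREPARATION), (True, Phase.BUSINESS_UNDERSTANDING)])` revisits Data Preparation and Modeling once before reaching Deployment, visiting nine phases in total.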

Put into production (Final Delivery)

Put very simply, this is the final version: the one we deliver to the client, who will use it in their analyses, whatever kind they may be.
