From data to decision making…

Alex Souza
Published in blog do zouza
9 min read · Jan 20, 2022

Hey guys, the idea of this article is to build a path from data to decision making, based on posts already published on the blog. Hope you like it!

What are Data?

Data are codes that constitute the raw material of information; that is, data are untreated information. On their own, data represent one or more meanings but cannot convey a message or represent knowledge.

In an electoral survey, for example, data are collected: each survey participant provides their opinions and choices about certain candidates, but each opinion alone does not mean much in the context of the election. Only after being integrated with the other opinions will we have anything significant.

Another example is a police investigation. Initially, testimonies are collected, clues are analyzed and any data that may be useful are gathered. However, these data alone will not tell who the criminal is.

What is Information?

Data endowed with relevance and purpose. It has meaning and is organized for some purpose.
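To make the step from data to information concrete, here is a minimal sketch of the electoral survey example. The responses and the `summarize` helper are made up for illustration; the point is only that a single answer says little, while the aggregated shares start to mean something.

```python
from collections import Counter

# Hypothetical raw survey responses: each one alone is just data.
responses = ["Candidate A", "Candidate B", "Candidate A", "Undecided", "Candidate A"]


def summarize(votes):
    """Aggregate individual answers into vote shares (percentages)."""
    counts = Counter(votes)
    total = len(votes)
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# The aggregated shares are information: organized and with a purpose.
shares = summarize(responses)
print(shares)
```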

What is Knowledge?

Information that can be used. “Knowledge, on the other hand, refers to the ability to create a mental model that describes the object and indicates the actions to implement, the decisions to make.”

What is Wisdom?

Ability to use the acquired knowledge in a simple and dynamic way.

Data Classification

Structured data:

  • Much of the data that, once related, generates information (as we saw above) is stored in databases
  • Well organized to be processed by computers
  • SQL: Relational Databases
  • Examples: SQL Server, Oracle, PostgreSQL, etc.
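As a small illustration of structured data, the sketch below uses SQLite through Python's built-in `sqlite3` module. SQLite is not one of the databases listed above; it is used here only because it runs without a server. The `customers` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ana", "Recife"), ("Bruno", "Natal"), ("Carla", "Recife")],
)

# Because every row follows the same schema, SQL can group and count directly.
rows = conn.execute(
    "SELECT city, COUNT(*) FROM customers GROUP BY city ORDER BY city"
).fetchall()
print(rows)
conn.close()
```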

Unstructured data:

Such as texts created by humans (documents, emails), and therefore not prepared to be understood/consumed by computers

  • NoSQL: Non-Relational Databases
  • Example: MongoDB

Semi-structured data:

  • JSON (row-oriented), Parquet and ORC (columnar), XML, HTML…

What are Data sets?

In a very summarized way, they are where we can consume data from, so they can be spreadsheets, databases, or .csv, .xml, .txt and .json files, etc.

Here are some data sets for use in Machine Learning…

Most of the datasets we find for study are already treated, pre-processed and ready for analysis. This does not happen in the “real world”, where all of the treatment, pre-processing, cleaning, enrichment and data quality work still has to be done.

What is Data Governance?

Data Governance is defined as the joint management of policies, processes, people and technologies, which aims to structure and manage information assets, with the objective of supporting decision-making, improving operational efficiency and promoting business profitability.

One of the most important areas of data governance is Data Quality…

Data Quality: for data to generate accurate information and add value to the business, it must be of good quality, that is:

  • employees must know the importance of their work and of everything they enter into a system;
  • nomenclature must be standardized and Portuguese language rules applied: no slang, no abbreviations, correct accentuation, etc.;
  • a term of responsibility for the data is often used, that is, the person who typed the data is responsible for it (this may appear as an item in your company’s Information Security Policy).
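The standardization rule above can be partly automated. The following is only a sketch under assumptions: the `ABBREVIATIONS` table and the `standardize` helper are hypothetical, and a real rule set would come from the company's own data standards. Note that accents are preserved rather than stripped.

```python
import re

# Hypothetical abbreviation table; a real one would come from the company's data standards.
ABBREVIATIONS = {"st.": "Street", "av.": "Avenue", "dept.": "Department"}


def standardize(value):
    """Trim, collapse whitespace, expand known abbreviations and apply title case."""
    words = re.sub(r"\s+", " ", value.strip()).split(" ")
    expanded = [ABBREVIATIONS.get(w.lower(), w) for w in words]
    # Capitalize the first letter only, so accented characters are kept as typed.
    return " ".join(w[:1].upper() + w[1:].lower() for w in expanded)


print(standardize("  av.   paulista "))
print(standardize("são  JOÃO"))
```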

But we cannot forget about Data Security, and here I want to highlight the General Data Protection Law. This law determines a series of precautions and procedures for any individual or legal entity, under public or private law, that collects, treats, classifies, stores, disposes of, transfers or shares personal information and data.

What is Data-Driven?

Analytical culture is not about creating tools, it is about making business decisions based on data.

The basis of Analytical Culture cannot be tools, it needs to be people and process.

Tools must be used to solve well-specified problems, while culture must run in the corridors and ensure that business questions are being answered with the support of information and strategic analysis, that is, data-driven.

With data well-maintained and a data-driven culture in mind, let’s move on to Business Intelligence.

And after all, what is Business Intelligence?

Business Intelligence (BI) refers to the process of collecting, organizing, analyzing, sharing and monitoring information that supports business management. It is the set of theories, methodologies, processes, structures and technologies that transform a large amount of raw data into useful information for strategic decision making.

A very important step in BI is ETL (Extract, Transform, Load). ETL tools are software whose function is to extract data from various systems (the data sources explained above), transform this data according to rules (the Data Quality rules defined above) and finally load it, usually into a Data Mart and/or Data Warehouse (DW), although nothing prevents the data from also being sent to another system in the organization.
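The three ETL steps can be sketched in a few lines of Python. This is an illustration, not how a tool such as PDI works internally: the CSV content, the `sales` table and the quality rule (reject non-numeric amounts) are all assumptions, and an in-memory SQLite database stands in for the Data Warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from an operational system.
SOURCE_CSV = """customer,amount
 ana ,100.50
BRUNO,abc
carla,20
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: apply data quality rules and drop rows that fail them."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows whose amount is not numeric
        clean.append((row["customer"].strip().title(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), warehouse)
total = warehouse.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)
```

Keeping the three steps as separate functions mirrors how ETL tools model a job as a chain of steps, and makes each rule easy to test on its own.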

An example of ETL software is PDI (Pentaho Data Integration).

Due to the growing market demand for increasingly advanced analysis (using machine learning, etc.), Data Warehouses have been evolving and a new concept has entered the market, Lakehouses, which address some limitations of the DW when organizations begin to use unstructured information in their analyses. Check out more at: Lakehouses.

Now that our data is loaded into the Data Warehouse and properly treated, we are going to use Business Intelligence tools to generate analyses and insights, always looking for a clean and objective view (using Storytelling).

“Storytelling with Data is admirably written, a masterful display of rare art in the business world. Cole Nussbaumer Knaflic possesses a unique skill, a gift, in telling stories using data. At JPMorgan Chase, she helped improve our ability to explain complicated analyses to executive management and the regulators we work with. Cole’s book brings together her talents in an easy-to-read guide, with excellent examples anyone can learn from to spur smarter decision-making.” ― Mark R. Hillis, Chief Mortgage Risk Officer at Bank of America JPM Chase

Self-Service BI Tools

It’s software that allows anyone to connect to data with just a few clicks and then view and create interactive, shareable dashboards with just a few more clicks. It’s simple enough for any Excel user to learn, yet advanced enough to solve even the most complex analytical problems. Sharing your findings securely with others takes just seconds.

Currently, companies are going beyond Business Intelligence (descriptive and diagnostic analytics) toward Artificial Intelligence (predictive and prescriptive analytics). Hence the name Data Science.

What is Data Science?

Data science is a term that escapes any single complete definition, which makes it difficult to use, especially if the goal is to use it correctly. Most articles and publications use the term loosely, with the assumption that it is universally understood. However, data science — its methods, goals, and applications — evolves with time and technology. Data science 25 years ago referred to collecting and cleaning data sets and applying statistical methods to that data. In 2018, data science has grown into a field that encompasses data analytics, predictive analytics, data mining, business intelligence, machine learning, and more.

The professional on the rise in the market is the Data Scientist, who lives in three worlds: business, mathematics/statistics and IT. Their function is to transform the available data into guidance for the decisions to be taken. Working with data requires IT skills to access and process the data efficiently and in a timely manner, mathematical and statistical skills to understand the implications of the models used, and business knowledge to translate all of this into reports that enable assertive decisions.

Exploratory Data Analysis is used before building models, to discover patterns, correlations, null and anomalous values, multicollinearity, biases in the data, data balancing issues, outliers, etc.
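A few of these checks can be sketched with the Python standard library alone. The `ages` and `incomes` lists are invented, and the 1.5 × IQR rule is just one common convention for flagging outliers; in practice a library such as pandas would usually do this work.

```python
import statistics

# Hypothetical dataset: None marks a missing value.
ages = [23, 25, None, 31, 29, 27, 95, 24]
incomes = [2300, 2500, 2600, 3100, 2900, 2700, 9000, 2400]


def count_missing(values):
    """Number of null entries in a column."""
    return sum(v is None for v in values)


def iqr_outliers(values):
    """Flag values more than 1.5 * IQR below Q1 or above Q3."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [v for v in present if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mx = statistics.fmean([x for x, _ in pairs])
    my = statistics.fmean([y for _, y in pairs])
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sum((x - mx) ** 2 for x, _ in pairs) ** 0.5
    sy = sum((y - my) ** 2 for _, y in pairs) ** 0.5
    return cov / (sx * sy)


print("missing ages:", count_missing(ages))
print("age outliers:", iqr_outliers(ages))
print("age/income correlation:", round(pearson(ages, incomes), 3))
```

Two columns that correlate this strongly are a hint of multicollinearity: a model would probably need only one of them.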

What is Machine Learning?

Machine Learning is a data analysis method that automates the development of analytical models. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed to look for something specific.

Here are some widely publicized examples of Machine Learning applications that you may already be familiar with:

  • Google’s self-driving cars? The essence of machine learning.
  • Online recommendation offers like Amazon and Netflix? Machine learning applications in everyday life.
  • Know what customers are saying about you on Twitter? Machine learning combined with linguistic rule creation.
  • Fraud detection? One of the most obvious and important uses in our world today.

TYPES OF ALGORITHMS

Supervised (The computer is presented with examples of desired inputs and outputs, provided by a “teacher”. The goal is to learn a general rule that maps inputs to outputs.)
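As a minimal sketch of supervised learning, the example below fits a straight line to labelled examples with ordinary least squares and then predicts the output for an unseen input. The data (hours studied and exam scores) and the `fit_line` helper are made up for illustration; real projects would normally use a library rather than hand-written formulas.

```python
# Hypothetical labelled examples: hours studied (input) -> exam score (output).
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# The "general rule" learned from the teacher's examples.
slope, intercept = fit_line(hours, scores)
predicted = slope * 5.0 + intercept  # prediction for an input not seen in training
print(slope, intercept, predicted)
```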

Unsupervised (No kind of label is given to the learning algorithm, leaving it alone to find structure in the given inputs. Unsupervised learning can be a goal in itself (discovering new patterns in the data) or a means to an end.)
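To illustrate finding structure without labels, here is a tiny one-dimensional k-means sketch. The purchase amounts are invented and `kmeans_1d` is a simplified helper written for this example (it assumes k is at least 2 and uses a deterministic starting point), not a production clustering routine.

```python
# Hypothetical unlabelled data: monthly purchase amounts for eight customers.
amounts = [10.0, 12.0, 11.0, 13.0, 95.0, 100.0, 98.0, 102.0]


def kmeans_1d(values, k=2, iterations=10):
    """Tiny k-means (k >= 2): assign each value to its nearest centre, then recompute centres."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centres across the sorted values.
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Keep the old centre if a group ends up empty.
        centres = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
    return centres, groups


centres, groups = kmeans_1d(amounts)
print(centres)
print(groups)
```

Nobody told the algorithm that there are "small" and "large" buyers; the two groups emerge from the distances between the values alone.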

Semi-supervised (Where the teacher provides an incomplete training signal: a training dataset with some (often many) of the desired outputs missing. Transduction is a special case of this principle, where the entire set of problem instances is known at the time of learning, but with part of the targets missing.)

Reinforcement learning (A computer program interacts with a dynamic environment, in which the program must perform a certain objective (e.g., driving a vehicle). The program is provided with feedback in the form of rewards and punishments as it navigates the problem space. Another example of reinforcement learning is learning to play a certain game just by playing against an opponent.)
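The reward and punishment loop can be sketched with tabular Q-learning on a toy problem. Everything here is an assumption made for illustration: the environment is a line of five positions with the goal at the right end, each step costs 0.1 (the punishment), reaching the goal pays 1.0 (the reward), and the hyperparameters are arbitrary.

```python
import random

N_STATES = 5        # positions 0..4 on a line; position 4 is the goal
ACTIONS = (-1, +1)  # action 0 steps left, action 1 steps right


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table q[state][action] of expected returns by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore: try a random action
            else:
                action = 0 if q[state][0] > q[state][1] else 1  # exploit best known action
            nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
            reward = 1.0 if nxt == N_STATES - 1 else -0.1  # reward or punishment
            # Q-learning update: move the estimate toward reward + discounted future value.
            q[state][action] += alpha * (reward + gamma * max(q[nxt]) - q[state][action])
            state = nxt
    return q


q_table = train()
policy = ["right" if right > left else "left" for left, right in q_table[:-1]]
print(policy)
```

The program is never told that "right" is the correct direction; it only receives feedback after each move and gradually prefers the actions that led to the reward.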

More details in: Types of Learning, and more here!

Projects using Machine Learning

Here’s how to create your first Machine Learning project, step by step;

And if you prefer, this one using Deep Learning;

And this one using NLP (News Recommendation).

  • And while we are at it, what is NLP (Natural Language Processing)? It holds great promise for helping to find deep insights in company content, allowing users to express their information needs more freely and providing accurate answers to increasingly complex questions. However, enterprise NLP systems are often challenged by a number of factors, which include understanding heterogeneous information silos, dealing with incomplete data, training accurate models from small amounts of data, and navigating a changing environment in which new content, products, terms and other information are continually being added.

More on Data Science: check out this post, which has a lot of very detailed information by topic (Learning Data Science).

How to get started in the Data Science area?

Many people ask me how to get started in the Data Science area. See this material, which explains it step by step based on questions from several people.

Questions like these:

  • What do I need to know, and what skills do I need, to be a Data Scientist?
  • Do I need to know English? Have a degree? Have a master’s degree?
  • Where can I find study materials, courses, etc.?
  • What course should I take to become a data scientist?
  • Are there vacancies in the market?
  • How do I become more visible and get a job in data science?
  • How do I earn BRL 20,000.00, or more, per month as I saw on television?

Project Management

A good project management methodology is important (PMBOK, for example), and for product delivery (a dashboard or a specific AI model for a sector) the Scrum framework can be used.

Conclusion

That’s all folks! I hope you liked the content, and if you have any questions, feel free to leave a comment. Thanks!!!
