Unleashing the Power of Data: A Step-by-Step Guide to Crafting a Winning Data Plan
DATA VISUALIZATION
Back story: The world is full of data, and exploring, cleaning, and visualizing it are crucial steps in working with data. Gaining a deep understanding of the data and the ability to analyze it effectively, however, requires additional skills and knowledge, which can be acquired by taking a course or participating in the Data Career Jumpstart #21DaystoData Challenge, as I did. In this article, you will find a detailed guide to the steps involved in visualizing data, with the assistance of Coach Avery.
For more information about Data Career Jumpstart, click here.
Introduction:
The “circle of life” for data refers to the various stages involved in the complete lifecycle of data, from its creation to its eventual archiving or deletion. The stages typically include the following (a minimal code sketch of these stages appears after the list):
1. Data creation: This is the stage where data is first created. The lesson suggests keeping the following questions in mind when imagining/planning the data:
a. Why do you need this data? What problem are you trying to solve? What fields will you need?
2. Data collection: This stage involves gathering the data from its various sources and bringing it into a centralized repository, such as a data warehouse. “Most data must be consciously collected, whether automated or not.”
3. Data cleaning: In this stage, the data is cleaned and transformed to ensure that it is in a consistent format and free of errors.
b. To prevent errors in your data pipeline, it’s important to validate the data before storing it. Making sure the data is uniform and free of mistakes will simplify the processing and analysis later on.
4. Data storage: This stage involves storing the cleaned data in a secure and accessible location, such as a database or data lake.
c. Data should be stored somewhere accessible, cleaned, and structured in a document.
5. Data exploration: This stage involves analyzing and understanding the data to uncover hidden patterns, trends, and relationships.
d. Getting to know your data involves spending time with it. By becoming familiar with the data, you can better understand its characteristics and how it can be used to meet your needs.
6. Data visualization: This stage involves presenting the data in a visual format, such as graphs, charts, or dashboards, to make it easier to understand and communicate the insights.
e. Visualizing your data is key to human understanding.
7. Data modeling: This stage involves creating a representation of the data and its relationships, using statistics and machine learning algorithms, in order to design and structure the data in a database.
8. Data communication: This stage involves conveying the data, the analysis, and the resulting insights to the people who need them.
f. Communicating with stakeholders to discuss the process, findings, and results of your data analysis is crucial.
9. Data deployment: This stage involves delivering or moving data from one environment to another.
g. Share your findings and insights with relevant stakeholders to promote transparency and facilitate decision-making.
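To make these stages concrete, here is a minimal sketch in Python using pandas and matplotlib. It assumes a hypothetical orders.csv file; the file name, column names, and the chart are illustrative rather than part of the lesson.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Data collection: read raw data from a (hypothetical) CSV export.
orders = pd.read_csv("orders.csv")  # assumed columns: order_date, region, amount

# Data cleaning: enforce types, drop bad rows, and validate before storing.
orders["order_date"] = pd.to_datetime(orders["order_date"], errors="coerce")
orders = orders.dropna(subset=["order_date", "amount"]).drop_duplicates()
assert (orders["amount"] >= 0).all(), "order amounts should not be negative"

# Data storage: persist the cleaned data somewhere accessible and structured.
orders.to_parquet("orders_clean.parquet")

# Data exploration: spend time with the data to learn its shape and quirks.
print(orders.describe())
print(orders["region"].value_counts())

# Data visualization: a simple chart is often the fastest path to understanding.
monthly = orders.groupby(orders["order_date"].dt.to_period("M"))["amount"].sum()
monthly.plot(kind="bar", title="Monthly order amounts")
plt.tight_layout()
plt.show()
```

The later stages (modeling, communication, and deployment) build on exactly this kind of cleaned, explored dataset.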
Process when Solving a Problem
Types of Analytics:
- Descriptive, which answers the question, “What happened?”
- Diagnostic, which answers the question, “Why did this happen?”
- Predictive, which answers the question, “What might happen in the future?”
- Prescriptive, which answers the question, “What should we do next?”
Definitions:
Source: Career Foundry
- Descriptive Analytics: looks at what has happened in the past. As the name suggests, the purpose of descriptive analytics is simply to describe what has happened; it doesn’t try to explain why this might have happened or to establish cause-and-effect relationships. The aim is solely to provide an easily digestible snapshot. Example: Google Analytics is a good example of descriptive analytics in action; it provides a simple overview of what’s been going on with your website, showing you how many people visited in a given time period, for example, or where your visitors came from.
- Diagnostic Analytics: seeks to delve deeper in order to understand why something happened. The main purpose of diagnostic analytics is to identify and respond to anomalies within your data. For example: if your descriptive analysis shows that there was a 20% drop in sales for the month of March, you’ll want to find out why. The next logical step is to perform a diagnostic analysis (a short code sketch of this descriptive-then-diagnostic sequence follows these definitions).
- Predictive Analytics: seeks to predict what is likely to happen in the future. Based on past patterns and trends, data analysts can devise predictive models which estimate the likelihood of a future event or outcome. Example: Perhaps you own a restaurant and want to predict how many takeaway orders you’re likely to get on a typical Saturday night. Based on what your predictive model tells you, you might decide to get an extra delivery driver on hand.
- Prescriptive Analytics: looks at what has happened, why it happened, and what might happen in order to determine what should be done next. In other words, prescriptive analytics shows you how you can best take advantage of the future outcomes that have been predicted. What steps can you take to avoid a future problem? What can you do to capitalize on an emerging trend? Example: a good example of prescriptive analytics in action is maps and traffic apps. When figuring out the best way to get you from A to B, Google Maps will consider all the possible modes of transport (e.g. bus, walking, or driving), the current traffic conditions, and possible roadworks in order to calculate the best route.
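As a rough illustration of the first two types, here is a minimal pandas sketch. The sales table, regions, months, and numbers are hypothetical, made up purely for the example:

```python
import pandas as pd

# Hypothetical sales records; columns and values are illustrative only.
sales = pd.DataFrame({
    "month":  ["Feb", "Feb", "Mar", "Mar"],
    "region": ["North", "South", "North", "South"],
    "amount": [120_000, 80_000, 118_000, 42_000],
})

# Descriptive: what happened? Total sales per month.
monthly = sales.groupby("month")["amount"].sum()
print(monthly)    # Feb 200,000 vs Mar 160,000 -- a 20% drop

# Diagnostic: why did it happen? Break the drop down by region.
by_region = sales.pivot_table(index="region", columns="month",
                              values="amount", aggfunc="sum")
print(by_region)  # the South region accounts for almost the entire drop
```

Predictive and prescriptive analytics would then build on the same data, for example by fitting a forecasting model and recommending an action based on its output.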
Quantitative vs. Qualitative
Classifying your data as qualitative or quantitative is a critical step in identifying and understanding the type of data you are working with. This classification will also help determine the appropriate approach to take when analyzing and modeling your data.
Quantitative research is a type of research that focuses on numerical data and mathematical analysis. It often involves large sample sizes and uses statistical methods to make generalizations about a population based on the findings from the sample.
Quantitative: In mathematics and statistics, “discrete” refers to a set of distinct, separate values or categories, while “continuous” refers to a set of values that are connected, without interruption or gaps. (Discrete: think counting people. Continuous: think measurements such as a person’s height, where decimals are possible.)
For example, a discrete variable might be the number of children in a family (which can only take on certain values such as 0, 1, 2, 3, etc.), while a continuous variable might be a person’s height (which can take on any value within a certain range).
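Here is a small sketch of how this distinction tends to show up in code, using hypothetical values:

```python
import pandas as pd

# Discrete: whole counts only (number of children per family).
children = pd.Series([0, 1, 2, 3, 2])
# Continuous: any value within a range (height in metres).
height_m = pd.Series([1.62, 1.75, 1.80, 1.68])

print(children.dtype)   # int64  -- counts, no values between 1 and 2
print(height_m.dtype)   # float64 -- measurements, decimals are meaningful
print(height_m.mean())  # a continuous summary: 1.7125
```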
Qualitative research, on the other hand, focuses on understanding the experiences, perceptions, and behaviors of individuals and small groups, often through interviews, observations, and other non-numerical methods. It aims to gain insight into the social and cultural context in which the research is taking place, and the reasons why people act in certain ways.
Qualitative: Nominal data refers to data that consists of named categories with no inherent order or ranking. Nominal data is often used to categorize data based on characteristics such as gender, color, or nationality. For example, hair color (blonde, brown, black, red, etc.) is a nominal variable. (Nominal: general categories with no order, such as red, blue, green.)
Ordinal data refers to data that consists of categories that have an inherent order or ranking, but the differences between the categories are not necessarily equal. For example, educational degrees (high school, associate’s degree, bachelor’s degree, master’s degree, etc.) are ordinal data. (Ordinal: ordered categories with a direction, such as small, medium, large.)
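In pandas, this distinction can be made explicit with categorical types. A minimal sketch, using hypothetical hair colors and shirt sizes:

```python
import pandas as pd

# Nominal: named categories with no inherent order.
hair = pd.Categorical(["blonde", "brown", "black", "brown"])
print(hair.categories)          # categories exist, but no ranking is implied

# Ordinal: categories with an inherent order.
size = pd.Categorical(["small", "large", "medium"],
                      categories=["small", "medium", "large"],
                      ordered=True)
print(size.min(), size.max())   # the order makes comparisons meaningful
```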
Individual Records: It refers to a set of information that pertains to a single entity, such as a person, an object, or an event. An individual record is typically composed of multiple fields, each of which contains specific information about that entity. For example, an individual record for a person might contain fields for the person’s name, address, date of birth, and other relevant information.
In a database or spreadsheet, individual records are often organized into rows, with each row representing a single entity and each column representing a field of information. When working with data, individual records are often used to build larger datasets, which can then be analyzed to gain insights and make decisions.
Aggregate Data (Summary Data): Aggregate data refers to a summary of a larger dataset, often obtained by calculating summary statistics or grouping individual records into categories, such as count, maximum age, or average IQ.
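To tie the two together, here is a minimal sketch with a hypothetical table of people: each row is an individual record, and a groupby produces the aggregate (summary) data.

```python
import pandas as pd

# Individual records: each row describes one entity (one person).
people = pd.DataFrame({
    "name": ["Ana", "Ben", "Cho", "Dee"],
    "city": ["Cebu", "Cebu", "Davao", "Davao"],
    "age":  [29, 41, 35, 52],
})

# Aggregate (summary) data: many records collapsed into a few summary numbers.
summary = people.groupby("city")["age"].agg(["count", "max", "mean"])
print(summary)   # count of people, maximum age, and average age per city
```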
In conclusion, data visualization plays a crucial role in the data lifecycle. It allows us to see the data and insights in a way that is easy to understand and communicate to others. The types of analytics, descriptive, diagnostic, predictive, and prescriptive, all have their own unique benefits and are used to answer different questions about the data. Understanding the difference between quantitative and qualitative research is also important as they each have their own strengths and limitations. Ultimately, the goal of data visualization is to help us make sense of the data and make informed decisions based on the insights it provides.
If you are interested in Data Structure and Algorithms check my previous blog “The Zigzag Challenge: Solving the Conversion Problem with Algorithms and Data Structures.”
Do comment below and share your thoughts. I hope you enjoyed this step-by-step guide to crafting a winning data plan.