DATA ANALYSIS 101

Boffin
6 min read · Jan 3, 2023

Image: data being interpreted on a whiteboard. Caption: The first step to analyzing data is interpreting data.

Introduction

Data analysis is the process of collecting, cleaning, and exploring data in order to draw insights and make informed decisions. It is a critical skill in today’s data-driven world, and it applies across many industries, including healthcare, finance, retail, manufacturing, and government.

Other industries where data analytics is applied include telecommunications, transportation, energy, education, entertainment, sports, and non-profits.

There are many different types of data that can be analyzed, including numerical data (such as age, income, or quantity), categorical data (such as gender or job title), and text data (such as reviews or comments). Data can be collected in a variety of ways, including surveys, experiments, and observations. Before it can be analyzed, data often needs to be cleaned and prepared, which may involve correcting errors, removing duplicates, or transforming the data into a usable format.

There are many different tools and software programs available for data analysis, such as Excel, R, Python, and SQL. Choosing the right tool depends on the needs of the project and the skills of the analyst. Once the data is prepared and the appropriate tools are chosen, data exploration and visualization can help identify trends, patterns, and relationships in the data.

The importance of data analysis in decision-making and problem-solving cannot be overstated: it allows organizations to make better decisions, optimize processes, and improve outcomes.

Types of data

I. Numerical data: Numerical data is data that is represented by numbers. It can be continuous (such as a person’s age or income) or discrete (such as the number of children in a family). Numerical data is often used to measure quantities or amounts.

II. Categorical data: Categorical data is data that is divided into categories or groups. It is often used to describe characteristics or attributes of a group or individual. Examples of categorical data include gender, occupation, or educational level.

III. Text data: Text data is data that is in the form of words or sentences. It is often used to represent qualitative information, such as customer reviews or comments. Text data can be challenging to analyze, as it requires techniques such as natural language processing or text mining to extract meaning.
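The three types above can be told apart with a rough heuristic. The sketch below is only an illustration: the function name `classify_column`, the `max_categories` threshold, and the sample records are all assumptions made for this example, and real datasets usually need a closer look than this.

```python
def classify_column(values, max_categories=5):
    """Guess whether a column holds numerical, categorical, or text data."""
    non_missing = [v for v in values if v not in ("", None)]
    if not non_missing:
        return "unknown"

    # Numerical: every value can be parsed as a number.
    try:
        for v in non_missing:
            float(v)
        return "numerical"
    except (TypeError, ValueError):
        pass

    # Categorical: a small number of distinct, short labels.
    distinct = set(non_missing)
    if len(distinct) <= max_categories and all(
        len(str(v).split()) <= 3 for v in distinct
    ):
        return "categorical"

    # Anything longer and free-form is treated as text.
    return "text"


# Hypothetical records, invented for this example.
records = [
    {"age": "34", "occupation": "nurse",
     "review": "Quick delivery and the packaging was intact."},
    {"age": "27", "occupation": "teacher",
     "review": "The product stopped working after two weeks."},
    {"age": "45", "occupation": "nurse",
     "review": "Good value for the price, would buy again."},
]

column_types = {
    name: classify_column([r[name] for r in records]) for name in records[0]
}
print(column_types)
```

Here `age` is reported as numerical, `occupation` as categorical, and `review` as text. The thresholds are arbitrary, so treat the result as a first guess to confirm by hand.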

Collecting and sourcing data

Data sourcing involves gathering data either directly, through first-hand checks or reports, or from sources that already exist. The reliability of data depends on the reputation of its source; data published or confirmed by government or international organizations is usually treated as verified.

I. Surveys: Surveys are a common method of collecting data. They involve asking a group of people a set of questions and recording their responses. Surveys can be administered in person, over the phone, or online. Surveys are useful for gathering data from a large number of people and can be used to gather both numerical and categorical data.

II. Experiments: Experiments are a method of collecting data that involves manipulating one or more independent variables and observing the effects on a dependent variable. Experiments are useful for identifying cause-and-effect relationships and are often used in scientific research.

III. Observations: Observations involve collecting data by watching and recording the behavior of individuals or groups. Observations can be either structured (following a predetermined set of rules) or unstructured (allowing for flexibility and creativity). Observations are useful for collecting data in naturalistic settings and can be used to gather both numerical and categorical data.
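Once data has been collected, a first summary is often only a few lines of code. The sketch below uses Python's standard library, with made-up survey responses and experiment measurements (every value and field name here is hypothetical). It tallies a categorical survey answer, averages a numerical one, and compares the mean outcome of a control group and a treatment group.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey: one categorical and one numerical answer per person.
survey_responses = [
    {"satisfaction": "satisfied", "visits_per_month": 4},
    {"satisfaction": "neutral", "visits_per_month": 2},
    {"satisfaction": "satisfied", "visits_per_month": 6},
    {"satisfaction": "dissatisfied", "visits_per_month": 1},
]

# Categorical answers are counted; numerical answers are averaged.
satisfaction_counts = Counter(r["satisfaction"] for r in survey_responses)
average_visits = mean(r["visits_per_month"] for r in survey_responses)

# Hypothetical experiment: the same outcome measured in two groups.
experiment = {
    "control": [12.0, 11.5, 12.5, 12.0],
    "treatment": [10.0, 10.5, 9.5, 10.0],
}

# Difference in group means: a simple estimate of the treatment effect.
effect = mean(experiment["treatment"]) - mean(experiment["control"])

print(satisfaction_counts)
print(average_visits)
print(effect)
```

A difference in means is only a starting point. Deciding whether the effect is real rather than noise calls for a proper statistical test, which this sketch does not attempt.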

Cleaning and preparing data

I. Correcting errors: Data often contains errors, such as typos, incorrect values, or missing values. Correcting errors is an important step in the data cleaning process, as errors can lead to incorrect or biased results. There are several methods for detecting and correcting errors, including manual checking, validation checks, and using automated tools.

II. Removing duplicates: Duplicate data is data that appears more than once in a dataset. Removing duplicates is important because it can reduce bias and improve the accuracy of the analysis. There are several methods for identifying and removing duplicates, including using tools or functions in software programs or manually reviewing the data.

III. Transforming data into a usable format: Data often needs to be transformed into a usable format in order to be analyzed. This may involve restructuring the data, such as pivoting or reshaping it, or converting the data from one format to another (such as from a PDF to a spreadsheet). Transforming data into a usable format is an important step in the data preparation process as it allows the data to be properly analyzed and visualized.
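The three cleaning steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the helper names `clean_rows` and `pivot_long_to_wide`, the 0 to 120 age range used as a validation check, and all sample rows are invented for this example.

```python
def clean_rows(rows):
    """Normalize names, validate ages, and drop duplicate rows."""
    cleaned, seen = [], set()
    for row in rows:
        # Correct formatting errors: stray spaces and inconsistent case.
        name = row["name"].strip().title()

        # Validation check: age must be an integer from 0 to 120;
        # anything else is recorded as missing (None).
        try:
            age = int(row["age"])
        except (TypeError, ValueError):
            age = None
        if age is not None and not 0 <= age <= 120:
            age = None

        # Remove duplicates: keep only the first row for each (name, age).
        key = (name, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


def pivot_long_to_wide(long_rows, index, column, value):
    """Reshape long rows into a nested dict: one entry per index value."""
    wide = {}
    for row in long_rows:
        wide.setdefault(row[index], {})[row[column]] = row[value]
    return wide


# Hypothetical raw data with a duplicate and two invalid ages.
raw_rows = [
    {"name": " jane doe ", "age": "36"},
    {"name": "Jane Doe", "age": "36"},     # duplicate after normalization
    {"name": "john smith", "age": "four"}, # not a number
    {"name": "Mary Major", "age": "850"},  # outside the valid range
]
cleaned = clean_rows(raw_rows)

# Hypothetical long-format sales data reshaped to one entry per store.
long_rows = [
    {"store": "A", "month": "Jan", "sales": 100},
    {"store": "A", "month": "Feb", "sales": 120},
    {"store": "B", "month": "Jan", "sales": 90},
]
wide = pivot_long_to_wide(long_rows, "store", "month", "sales")

print(cleaned)
print(wide)
```

Marking invalid values as missing, rather than guessing a replacement, is a deliberate choice here: it keeps the correction visible so that it can be reviewed later.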

Choosing the right tools

The tools used for data analysis are software programs that help with cleaning and analyzing data. They generally fall into three groups:

I. Statistical tools: Statistical tools are software programs or applications that are used to perform statistical analysis and modeling. Statistical analysis involves using statistical methods to describe and summarize data, test hypotheses, and make inferences about a population based on a sample. Statistical modeling involves using statistical techniques to build predictive models that can be used to make predictions or forecasts.

· Excel is a spreadsheet program that is widely used for statistical work in business and other fields. It has a range of built-in statistical functions and features, such as pivot tables, charts, and pivot charts, that can be used to analyze and visualize data.

· R is a programming language and software environment specifically designed for statistical computing and graphics. It has a large and active community of users and developers, and is widely used in academia and industry.

II. Programming tools: Programming tools are software programs or applications that are used for programming and coding. Programming involves writing instructions in a specific language (such as Python or SQL) that can be executed by a computer to perform tasks. Programming tools provide an interface for writing and testing code, as well as debugging and deploying applications.

· Python is a popular programming language that is widely used in data analysis and other fields. It has a large and active community of users and developers, and has a range of libraries and tools for data manipulation, visualization, and machine learning.

· SQL (Structured Query Language) is a programming language specifically designed for managing data in relational database management systems (RDBMS). It is widely used for tasks such as querying databases, creating tables and views, and updating data.

III. Visualization tools: Visualization tools are software programs or applications that are used to create charts, graphs, and other visualizations of data. Visualization is an important part of data analysis because it makes data more accessible and easier to understand. It also helps to identify trends, patterns, and relationships that might not be apparent from the raw data.

· Tableau is a popular data visualization tool that is widely used in business and other fields. It has a range of features for creating interactive dashboards, maps, and charts, and has a user-friendly interface that is suitable for both beginners and experts.

· Power BI is a business intelligence and data visualization tool developed by Microsoft. It has a range of features for creating interactive dashboards, reports, and charts, and has a user-friendly interface that is suitable for both beginners and experts.

· Flow is primarily used to create interactive 3D graphics and data visualizations that can be shared and explored online. It is a powerful tool for communicating data insights and creating immersive experiences, and is suitable for a wide range of applications, including data visualization, scientific simulation, and interactive graphics.
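To give a feel for how the three groups of tools work together, here is a small sketch that uses only Python's standard library: `sqlite3` for a SQL query, `statistics` for summary statistics, and a plain-text bar chart standing in for a visualization tool. The `sales` table, its columns, and its values are hypothetical, and a real project would more likely use one of the dedicated tools listed above.

```python
import sqlite3
from statistics import mean, median

# Programming tools: load hypothetical sales data into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("north", 120.0),
        ("north", 80.0),
        ("south", 200.0),
        ("south", 100.0),
        ("south", 150.0),
    ],
)

# SQL query: total sales per region.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Statistical tools: describe the individual sale amounts.
amounts = [row[0] for row in conn.execute("SELECT amount FROM sales")]
summary = {"mean": mean(amounts), "median": median(amounts)}
conn.close()

# Visualization: one '#' per 50 units of total sales.
for region, total in totals:
    print(f"{region:<6} {'#' * int(total // 50)} {total:.0f}")

print(summary)
```

The same pattern (query, summarize, chart) carries over to larger setups, where the database, the statistics package, and the charting tool are separate products.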

Conclusion

In conclusion, data analysis is a powerful tool for understanding and improving the world around us. It involves collecting, cleaning, and preparing data, and using statistical, programming, and visualization tools to explore and visualize the data. Data analysis is used in a variety of fields to inform decisions, optimize processes, and solve problems.

As we have seen, there are many different types of data, including numerical, categorical, and text data. There are also many different methods for collecting data, such as surveys, experiments, and observations. And there are many different tools and techniques available for analyzing and visualizing data, including Excel, R, Python, SQL, Tableau, Flow, and Power BI.

By learning the basics of data analysis, you can gain valuable skills that are in high demand in today’s data-driven world. Whether you are a student, a professional, or just curious about data, there are many resources available to help you get started with data analysis. So why not take the first step and learn more about data analysis today?
