Data Analytics Essentials

Ritchie Pulikottil
Published in The Startup
13 min read · Feb 8, 2021

This article aims to provide foundational knowledge of Data Analytics. After reading it, you will be able to explain what Data Analytics actually is, list the key tools and technologies within the data ecosystem, describe the different roles associated with the modern data ecosystem, and have a conceptual understanding of the data analysis lifecycle.

Modern Data Ecosystem

A modern data ecosystem is a whole network of interconnected, independent, and continually evolving entities. It includes data that has to be integrated from disparate sources, the analysis that generates insights from that data, and the active stakeholders who collaborate to present and act on those insights.

Let’s start with the data sources. Data is available in a variety of structured and unstructured datasets, residing in text, images, videos, user conversations, social media platforms, Internet of Things (IoT) devices, real-time events that stream data, legacy databases, and data sourced from professional data providers and agencies.

When you're working with so many different sources of data, the first step is to pull a copy of the data from the original sources into a data repository. At this stage, you're only looking at acquiring the data you need, working with the data formats, sources, and interfaces through which this data can be pulled in. The reliability, security, and integrity of the data being acquired are some of the challenges you work through at this stage.

Once the raw data is accumulated in a common place, it needs to be organized, cleaned up, and optimized for access by end-users. The key challenges at this stage could involve data management and working with data repositories that provide high availability, flexibility, accessibility, and security.

Finally, we have our business stakeholders: applications, programmers, analysts, and data science use cases, all pulling this data from the enterprise data repository. The key challenges at this stage could include the interfaces, APIs, and applications that can get this data to the end-users in line with their specific needs. For example, data analysts may need the raw data to work with. Business stakeholders may need reports and dashboards. Applications may need custom APIs to pull this data.

It’s important to note the influence of some of the new and emerging technologies that are shaping today’s data ecosystem and its possibilities, for example, cloud computing, machine learning, and big data, to name a few. Thanks to cloud technologies, enterprises today have access to virtually limitless storage, high-performance computing, open-source technologies, machine-learning technologies, and the latest tools and libraries. Data scientists are creating predictive models by training machine learning algorithms on past data. Then there is big data: today, we’re dealing with datasets that are so massive and so varied that traditional tools and analysis methods are no longer adequate, paving the way for new tools and techniques and also new knowledge and insights.

Key Players in the Data Ecosystem

It all starts with a data engineer. Data engineers are people who develop and maintain data architectures and make data available for business operations and analysis. Data engineers work within the data ecosystem to extract, integrate, and organize data from disparate sources; clean, transform, and prepare data; and design, store, and manage data in data repositories. They make data accessible in formats and systems that the various business applications, as well as stakeholders like data analysts and data scientists, can utilize. A data engineer must have good knowledge of programming, sound knowledge of systems and technology architectures, and an in-depth understanding of relational databases and non-relational data stores.

Now let’s look at the role of a data analyst. In short, a data analyst translates data and numbers into plain language so organizations can make decisions. Data analysts inspect and clean data to derive insights, identify correlations, find patterns, apply statistical methods to analyze and mine data, and visualize data to interpret and present the findings of data analysis.

Analysts are the people who answer questions such as: Are users' experiences with the search functionality on our site generally good or bad? What is the popular perception of our rebranding initiatives? Is there a correlation between the sales of one product and another? Data analysts require good knowledge of spreadsheets, writing queries, and using statistical tools to create charts and dashboards. Modern data analysts also need to have some programming skills, along with strong analytical and storytelling skills.

And now let’s look at the role data scientists play in this ecosystem. Data scientists analyze data for actionable insights and build machine learning or deep learning models that train on past data to make predictions. Data scientists are people who answer questions such as: How many new social media followers am I likely to get next month? What percentage of my customers am I likely to lose to competition in the next quarter? Is this financial transaction unusual for this customer? Data scientists require knowledge of mathematics and statistics, and a fair understanding of programming languages, databases, and building data models. They also need to have domain knowledge.

Then we also have business analysts and BI analysts. Business analysts leverage the work of data analysts and data scientists to look at possible implications for their business and the actions they need to take or recommend. BI analysts do the same, except their focus is on the market forces and external influences that shape their business. They provide business intelligence solutions by organizing and monitoring data on different business functions and exploring that data to extract insights and actionables that improve business performance.

To summarize, in simple terms, data engineering converts raw data into usable data. Data analytics uses this data to generate insights. Data scientists use data analytics and data engineering to predict the future using data from the past. Business analysts and business intelligence analysts use these insights and predictions to drive decisions that benefit and grow their business. Interestingly, it’s not uncommon for data professionals to start their career in one of the data roles and transition to another role within the data ecosystem by upskilling.

What is Data Analysis?

Data analysis is the process of gathering, cleaning, analyzing, and mining data, interpreting results, and reporting the findings. With data analysis, we find patterns within data and correlations between different data points. It is through these patterns and correlations that insights are generated and conclusions are drawn. Data analysis helps businesses understand their past performance and informs their decision-making for future actions. Using data analysis, businesses can validate a course of action before committing to it, saving valuable time and resources and improving the chances of success. We will explore four primary types of data analysis, each with a different goal and place in the data analysis process.

Descriptive analytics helps answer questions about what happened over a given period of time by summarizing past data and presenting the findings to stakeholders. It helps provide essential insights into past events. Examples include tracking past performance based on the organization’s key performance indicators, or cash flow analysis.
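As a rough illustration of summarizing past data, here is a minimal Python sketch. The monthly revenue figures and the describe helper are made up for this example; a real report would pull its numbers from the organization's own KPI data.

```python
from statistics import mean, median

# Hypothetical monthly revenue figures (illustrative data only).
monthly_revenue = {"Jan": 120, "Feb": 135, "Mar": 128, "Apr": 150}


def describe(values):
    """Summarize past data with a few basic descriptive statistics."""
    return {
        "total": sum(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


summary = describe(list(monthly_revenue.values()))
print(summary)
```

The summary only describes what already happened; it says nothing about why, which is where the next type of analysis comes in.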

Diagnostic analytics helps answer the question, Why did it happen? It takes the insights from descriptive analytics and digs deeper to find the cause of the outcome. For example, it can investigate a sudden change in traffic to a website without an obvious cause, or an increase in sales in a region where there has been no change in marketing.
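One simple way to start digging, sketched below in Python, is to break a total down by segment and see which segment changed the most. The traffic numbers, channel names, and the biggest_change helper are assumptions made up for this example, and a large change in one segment is only a lead to investigate, not a proven cause.

```python
# Hypothetical website visits per channel for two consecutive weeks.
last_week = {"search": 500, "social": 300, "email": 200}
this_week = {"search": 510, "social": 120, "email": 205}


def biggest_change(before, after):
    """Return the segment with the largest absolute change and its delta."""
    segments = set(before) | set(after)
    deltas = {s: after.get(s, 0) - before.get(s, 0) for s in segments}
    segment = max(deltas, key=lambda s: abs(deltas[s]))
    return segment, deltas[segment]


segment, delta = biggest_change(last_week, this_week)
print(f"Largest change: {segment} ({delta:+d} visits)")
```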

Predictive analytics helps answer the question, What will happen next? Historical data and trends are used to predict future outcomes. Some of the areas in which businesses apply predictive analysis are risk assessment and sales forecasts. It’s important to note that the purpose of predictive analytics is not to say what will happen in the future; its objective is to forecast what might happen. All predictions are probabilistic in nature.
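To make the idea of forecasting from historical trends concrete, here is a minimal Python sketch that fits a straight line to past sales with ordinary least squares and extends it one month ahead. The sales figures and the fit_line helper are invented for illustration, and a straight-line trend is only one of many possible models.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sales for months 1 to 5 (illustrative data only).
months = [1, 2, 3, 4, 5]
sales = [100, 110, 120, 130, 140]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept
print(f"Forecast for month 6: {forecast_month_6:.1f}")
```

The result is an estimate of what might happen if the past trend continues, not a statement of what will happen.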

Prescriptive analytics helps answer the question, What should be done about it? By analyzing past decisions and events, the likelihood of different outcomes is estimated, and a course of action is decided on that basis. Self-driving cars are a good example of prescriptive analytics. They analyze the environment to make decisions regarding speed, changing lanes, which route to take, and so on. Another example is airlines automatically adjusting ticket prices based on customer demand, gas prices, the weather, or traffic on connecting routes.
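A very simplified way to picture choosing an action from estimated likelihoods is to compare expected values, as in the Python sketch below. The actions, probabilities, and payoffs are entirely made up for illustration; real prescriptive systems use far richer models and constraints.

```python
# Hypothetical pricing actions, each with (probability, payoff) outcomes.
# The probabilities for each action sum to 1.
actions = {
    "raise_price": [(0.6, 50), (0.4, -30)],
    "hold_price": [(1.0, 20)],
    "discount": [(0.7, 30), (0.3, 5)],
}


def expected_value(outcomes):
    """Probability-weighted average payoff of one action."""
    return sum(probability * payoff for probability, payoff in outcomes)


best_action = max(actions, key=lambda name: expected_value(actions[name]))
print(best_action, round(expected_value(actions[best_action]), 2))
```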

Key steps in the data analysis process

Now let’s look at some of the key steps in any data analysis process.

Data analysis begins with understanding the problem that needs to be solved and the desired outcome that needs to be achieved. Where you are and where you want to end up need to be clearly defined before the analysis process can begin.

Setting a clear metric is another stage of the process that includes deciding what will be measured and how it will be measured.

The next step would be gathering data. Once you know what you’re going to measure and how you’re going to measure it, you identify the data you require, the data sources you need to pull this data from, and the best tools for the job.

Having gathered the data, the next step is to clean it, that is, to fix quality issues in the data that could affect the accuracy of the analysis. This is a critical step because the accuracy of the analysis depends on the data being clean. You will check the data for missing or incomplete values and for outliers. For example, in customer demographics data, an age field with a value of 150 is an outlier.
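The cleaning step described above can be sketched in a few lines of Python. The customer records, the plausible age range of 0 to 120, and the clean helper are assumptions for this example; what counts as an outlier depends on the dataset and the question being asked.

```python
# Hypothetical customer records (illustrative data only).
records = [
    {"name": "A", "age": 34},
    {"name": "B", "age": None},   # missing value
    {"name": "C", "age": 150},    # implausible outlier
    {"name": "D", "age": 27},
]

# Assumed plausible range for a human age; adjust to the data at hand.
MIN_AGE, MAX_AGE = 0, 120


def clean(rows):
    """Split rows into usable ones and ones set aside for review."""
    cleaned, rejected = [], []
    for row in rows:
        age = row.get("age")
        if age is None or not (MIN_AGE <= age <= MAX_AGE):
            rejected.append(row)
        else:
            cleaned.append(row)
    return cleaned, rejected


cleaned, rejected = clean(records)
print(len(cleaned), "kept;", len(rejected), "set aside for review")
```

Setting rejected rows aside, rather than silently deleting them, keeps a record of what was excluded and why.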

Once the data is clean, you will extract and analyze the data from different perspectives. You may need to manipulate your data in several different ways to understand the trends, identify correlations and find patterns and variations.
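As one example of identifying a correlation, the Python sketch below computes the Pearson correlation coefficient between two made-up series. The ad spend and sales numbers and the pearson helper are hypothetical, and a high coefficient indicates a linear association, not causation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical weekly ad spend and sales (illustrative data only).
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r = pearson(ad_spend, sales)
print(f"Pearson r = {r:.3f}")
```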

After analyzing your data and possibly conducting further research, which can be an iterative loop, it’s time to interpret your results. As you interpret your results, you need to evaluate whether your analysis is defensible against objections and whether there are any limitations or circumstances under which it may not hold true.

Ultimately, the goal of any analysis is to impact decision making. The ability to communicate and present your findings in clear and impactful ways is as important a part of the data analysis process as the analysis itself. Reports, dashboards, charts, graphs, maps, and case studies are just some of the ways in which you can present your data.

Responsibilities of a Data Analyst

While the role of a Data Analyst varies depending on the type of organization and the extent to which it has adopted data-driven practices, there are some responsibilities that are typical of a Data Analyst role in today’s organizations. These include:

  1. Acquiring data from primary and secondary data sources.
  2. Creating queries to extract required data from databases and other data collection systems.
  3. Filtering, cleaning, standardizing, and reorganizing data in preparation for data analysis.
  4. Using statistical tools to interpret data sets.
  5. Using statistical techniques to identify patterns and correlations in data.
  6. Analyzing patterns in complex data sets and interpreting trends.
  7. Preparing reports and charts that effectively communicate trends and patterns.
  8. Creating appropriate documentation to define and demonstrate the steps of the data analysis process.

Valuable skills for a Data Analyst

The data analysis process requires a combination of technical, functional, and soft skills. Let’s first look at some of the technical skills that you need in your role as a Data Analyst. These include:

  1. Expertise in using spreadsheets such as Microsoft Excel or Google Sheets.
  2. Proficiency in statistical analysis and visualization tools and software such as Oracle Visual Analyzer, Microsoft Power BI, SAS, and Tableau, or libraries like matplotlib.
  3. Proficiency in at least one of the programming languages such as R, Python, and in some cases C++, Java, and MATLAB.
  4. Good knowledge of SQL, and the ability to work with data in relational and NoSQL databases.
  5. The ability to access and extract data from data repositories such as data marts, data warehouses, data lakes, and data pipelines.
  6. Familiarity with Big Data processing tools such as Hadoop, Hive, and Spark.

Now we’ll look at some of the functional skills that you require for the role of Data Analyst. These include:

  1. Proficiency in Statistics to help you analyze your data, validate your analysis, and identify fallacies and logical errors.
  2. Analytical skills that help you research and interpret data, theorize, and make forecasts.
  3. Problem-solving skills, because ultimately, the end-goal of all data analysis is to solve problems.
  4. Probing skills that are essential for the discovery process, that is, for understanding a problem from the perspective of varied stakeholders and users — because the data analysis process really begins with a clear articulation of the problem statement and desired outcome.
  5. Data Visualization skills help you decide on the techniques and tools that present your findings effectively based on your audience, type of data, context, and end-goal of your analysis.
  6. Project Management skills to manage the process, people, dependencies, and timelines of the initiative.

That brings us to your soft skills as a Data Analyst. Data Analysis is both a science and an art. You can ace the technical and functional expertise, but one of the key differentiators for your success is going to be soft skills. This includes:

  1. Your ability to work collaboratively with business and cross-functional teams.
  2. Your ability to communicate effectively to report and present your findings.
  3. Your ability to tell a compelling and convincing story, and gather support and buy-in for your work.
  4. Above all, being curious is at the heart of data analysis.

Types of Data

Data is unorganized information that is processed to make it meaningful. Generally, data comprises facts, observations, perceptions, numbers, characters, symbols, and images that can be interpreted to derive meaning. One of the ways in which data can be categorized is by its structure. Data can be structured, semi-structured, or unstructured.

Structured data has a well-defined structure, can be stored in well-defined schemas such as databases, and in many cases can be represented in a tabular manner with rows and columns. Structured data consists of objective facts and numbers that can be collected, exported, stored, and organized in typical databases. Some of the sources of structured data could include SQL databases and Online Transaction Processing (OLTP) systems that focus on business transactions; spreadsheets such as Excel and Google Sheets; online forms; sensors such as Global Positioning Systems (GPS) and Radio Frequency Identification (RFID) tags; and network and web server logs. You can typically store structured data in relational or SQL databases. You can also easily examine structured data with standard data analysis methods and tools.
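Here is a minimal Python sketch of structured data in a relational table, using the standard library's sqlite3 module with an in-memory database. The orders table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A fixed schema: every row has the same, typed columns.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 60.0)],
)

# Because the structure is known up front, a standard SQL query can aggregate it.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region"
).fetchall()
totals = dict(rows)
conn.close()

print(totals)
```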

Semi-structured data is data that has some organizational properties but lacks a fixed or rigid schema. Semi-structured data cannot be stored in the form of rows and columns as in databases. It contains tags and elements, or metadata, which are used to group data and organize it in a hierarchy. Some of the sources of semi-structured data could include e-mails, XML and other markup languages, binary executables, TCP/IP packets, zipped files, and data integrated from different sources. XML and JSON allow users to define tags and attributes to store data in a hierarchical form and are used widely to store and exchange semi-structured data.
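The hierarchical nature of JSON can be seen in the short Python sketch below, which parses a small document and flattens it into path and value pairs. The customer document and the flatten helper are invented for this example; the dotted path notation is just one possible convention.

```python
import json

# Hypothetical JSON document with nested, self-describing fields.
raw = """
{
  "customer": {
    "name": "Ada",
    "contacts": [{"type": "email", "value": "ada@example.com"}]
  },
  "tags": ["vip"]
}
"""


def flatten(node, prefix=""):
    """Walk the hierarchy and yield (path, value) pairs for leaf values."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{prefix}[{index}]")
    else:
        yield prefix, node


flat = dict(flatten(json.loads(raw)))
for path, value in flat.items():
    print(path, "=", value)
```

Different documents can carry different keys, which is why this kind of data does not fit a fixed table without some flattening step like this one.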

Unstructured data is data that does not have an easily identifiable structure and, therefore, cannot be organized in a mainstream relational database in the form of rows and columns. It does not follow any particular format, sequence, semantics, or rules. Unstructured data comes from heterogeneous sources and has a variety of business intelligence and analytics applications. Some of the sources of unstructured data could include web pages, social media feeds, images in varied file formats (such as JPEG, GIF, and PNG), video and audio files, documents and PDF files, PowerPoint presentations, media logs, and surveys. Unstructured data can be stored in files and documents (such as a Word doc) for manual analysis or in NoSQL databases that have their own analysis tools for examining this type of data.
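Because free text has no rows or columns, some structure has to be extracted before it can be analyzed. The Python sketch below counts word frequencies in a made-up customer review; the review text and the tiny stop-word list are assumptions for this example, and real text analysis usually needs much more careful processing.

```python
import re
from collections import Counter

# Hypothetical free-text customer review (illustrative data only).
review = "Great product, great price. Delivery was slow, but support was great."

# Tiny, assumed stop-word list; real lists are much longer.
STOP_WORDS = {"was", "but"}

# Lowercase the text and pull out runs of letters as words.
words = re.findall(r"[a-z']+", review.lower())
counts = Counter(word for word in words if word not in STOP_WORDS)

top_terms = counts.most_common(1)
print(top_terms)
```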

And with that, we have covered the essential basics you need to get started with Data Analytics. It’s definitely not enough, so make sure to explore more. We might have further articles on this topic. Until then, take care :)
