Data Analyst Ready Reckoner

Published in 10XTD · 6 min read · Nov 2, 2021

Curated by Nitesh Mishra

Introduction to the role

Data analysis is a process of inspecting, cleansing, transforming, and modelling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today’s business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.

Data mining is a particular data analysis technique that focuses on statistical modelling and knowledge discovery for predictive rather than purely descriptive purposes, while business intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business information. In statistical applications, data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA). EDA focuses on discovering new features in the data while CDA focuses on confirming or falsifying existing hypotheses. Predictive analytics focuses on the application of statistical models for predictive forecasting or classification, while text analytics applies statistical, linguistic, and structural techniques to extract and classify information from textual sources, a species of unstructured data. All of the above are varieties of data analysis.

Responsibilities

  • Designing and maintaining data systems and databases; this includes fixing coding errors and other data-related problems
  • Mining data from primary and secondary sources, then reorganizing said data in a format that can be easily read by either human or machine
  • Using statistical tools to interpret data sets, paying particular attention to trends and patterns that could be valuable for diagnostic and predictive analytics efforts
  • Demonstrating the significance of their work in the context of local, national, and global trends that impact both their organization and industry
  • Preparing reports for executive leadership that effectively communicate trends, patterns, and predictions using relevant data
  • Collaborating with programmers, engineers, and organizational leaders to identify opportunities for process improvements, recommend system modifications, and develop policies for data governance
  • Creating appropriate documentation that allows stakeholders to understand the steps of the data analysis process and duplicate or replicate the analysis if necessary

Technology it entails

1) Predictive Analytics

Predictive analytics is one of the prime tools businesses use to reduce risk in decision making. Predictive analytics hardware and software solutions can be used to discover, evaluate, and deploy predictive scenarios by processing big data. The resulting forecasts help companies prepare for what is to come and solve problems by analyzing and understanding them.
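As a minimal sketch of what a predictive model does, the snippet below fits a straight line to past observations by ordinary least squares and extrapolates one step ahead. The monthly sales figures and the `fit_line` helper are invented for illustration; real predictive analytics tools use far richer models.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical data: units sold in months 1 to 5.
months = [1, 2, 3, 4, 5]
units = [100, 110, 120, 130, 140]

slope, intercept = fit_line(months, units)
forecast_month_6 = slope * 6 + intercept
print(f"Forecast for month 6: {forecast_month_6:.1f}")  # Forecast for month 6: 150.0
```

A straight line is only a reasonable choice when the history is roughly linear, which is an assumption of this sketch rather than a property of sales data in general.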

2) NoSQL Databases

These databases are used for reliable and efficient data management across a scalable number of storage nodes. Instead of relational database tables, NoSQL databases store data as JSON documents, key-value pairs, wide columns, or graphs.
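To make the key-value and JSON document models concrete, here is a small sketch that uses plain Python dictionaries as stand-ins for a real NoSQL store. The key format `user:42` and the record contents are hypothetical.

```python
import json

# Key-value model: an opaque value looked up by a single key.
kv_store = {}
kv_store["user:42"] = "Asha"

# Document model: one self-describing, nested JSON document per record.
doc_store = {}
doc_store["user:42"] = json.dumps({
    "name": "Asha",
    "skills": ["SQL", "Python"],
    "address": {"city": "Pune"},
})

# A document can be queried by its internal fields once decoded.
user = json.loads(doc_store["user:42"])
print(kv_store["user:42"], user["address"]["city"], user["skills"])
```

Neither model requires a fixed schema up front, so two documents in the same store may carry different fields.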

3) Knowledge Discovery Tools

These tools allow businesses to mine big data (structured and unstructured) stored across multiple sources, such as different file systems, APIs, DBMSs, or similar platforms. With search and knowledge discovery tools, businesses can isolate the information they need and use it to their benefit.

4) Stream Analytics

Sometimes the data an organization needs to process arrives from multiple platforms and in multiple formats. Stream analytics software is useful for filtering, aggregating, and analyzing such data as it flows in. It also allows connection to external data sources and their integration into the application flow.
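The filtering and aggregation steps can be sketched with a plain Python generator. This is an illustrative toy, not how any particular stream analytics product works: the event tuples, the fixed-size (tumbling) time window, and the `filter_and_aggregate` name are all assumptions.

```python
from collections import defaultdict

def filter_and_aggregate(events, window_seconds, min_value):
    """Yield (window_start, key, total) for each tumbling window.

    Events are (timestamp, key, value) tuples assumed to arrive in time order.
    """
    current_window = None
    totals = defaultdict(float)
    for timestamp, key, value in events:
        if value < min_value:              # filtering step
            continue
        window = timestamp - timestamp % window_seconds
        if current_window is not None and window != current_window:
            for k in sorted(totals):       # emit the window that just closed
                yield current_window, k, totals[k]
            totals.clear()
        current_window = window
        totals[key] += value               # aggregation step
    for k in sorted(totals):               # flush the last open window
        yield current_window, k, totals[k]

# Hypothetical events: (seconds since start, source, value).
events = [
    (0, "web", 5.0), (3, "app", 1.0), (7, "web", 2.5),
    (12, "web", 4.0), (14, "app", 0.2), (21, "app", 3.0),
]
results = list(filter_and_aggregate(events, window_seconds=10, min_value=0.5))
print(results)
```

Because the function is a generator, each window's totals are emitted as soon as the window closes, without waiting for the whole input to be available.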

5) In-memory Data Fabric

This technology distributes large quantities of data across system resources such as dynamic RAM, flash storage, or solid-state drives, which in turn enables low-latency access to and processing of big data on the connected nodes.

6) Distributed Storage

A way to counter independent node failures and loss or corruption of big data sources, distributed file stores contain replicated data. Sometimes the data is also replicated for low latency quick access on large computer networks. These are generally non-relational databases.
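The idea of surviving node failures through replication can be illustrated with a toy in-memory store. The `ReplicatedStore` class below is hypothetical and ignores everything a real distributed file store must handle (networking, consistency, re-replication); it only shows why a read still succeeds after some nodes are lost.

```python
class ReplicatedStore:
    """Toy store that writes every value to several in-memory 'nodes'."""

    def __init__(self, node_count=3):
        self.nodes = [dict() for _ in range(node_count)]
        self.alive = [True] * node_count

    def put(self, key, value):
        # Write a copy to every node that is currently up.
        for node, up in zip(self.nodes, self.alive):
            if up:
                node[key] = value

    def fail(self, index):
        """Simulate an independent node failure that loses the node's data."""
        self.alive[index] = False
        self.nodes[index].clear()

    def get(self, key):
        # Read from the first surviving node that holds the key.
        for node, up in zip(self.nodes, self.alive):
            if up and key in node:
                return node[key]
        raise KeyError(key)

store = ReplicatedStore(node_count=3)
store.put("report.csv", b"q1,q2\n10,12\n")
store.fail(0)
store.fail(1)
print(store.get("report.csv"))  # still readable from the third node
```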

7) Data Virtualization

It enables applications to retrieve data without being constrained by technical details such as data formats or the physical location of the data. Used with Apache Hadoop and other distributed data stores for real-time or near real-time access to data stored on various platforms, data virtualization is one of the most used big data technologies.

8) Data Integration

A key operational challenge for most organizations handling big data is to process terabytes (or petabytes) of data in a way that can be useful for customer deliverables. Data integration tools allow businesses to streamline data across a number of big data solutions such as Amazon EMR, Apache Hive, Apache Pig, Apache Spark, Hadoop, MapReduce, MongoDB and Couchbase.

9) Data Preprocessing

These software solutions are used to manipulate data into a consistent format that can be used for further analysis. Data preparation tools accelerate the data sharing process by formatting and cleansing unstructured data sets. A limitation of data preprocessing is that not all of its tasks can be automated; some require human oversight, which can be tedious and time-consuming.
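A minimal sketch of such a preprocessing step is shown below. The field names (`name`, `amount`) and cleaning rules are hypothetical; the point is that some records can be normalized automatically while others have to be set aside for a person to review.

```python
def preprocess(rows):
    """Return (clean_rows, needs_review) from raw dictionaries."""
    clean, needs_review, seen = [], [], set()
    for row in rows:
        name = row.get("name", "").strip().title()
        raw_amount = str(row.get("amount", "")).replace(",", "").strip()
        try:
            amount = float(raw_amount)
        except ValueError:
            needs_review.append(row)   # cannot be fixed automatically
            continue
        if not name:
            needs_review.append(row)   # missing name needs a human decision
            continue
        record = (name, amount)
        if record in seen:             # drop duplicates after normalization
            continue
        seen.add(record)
        clean.append({"name": name, "amount": amount})
    return clean, needs_review

raw = [
    {"name": "  alice smith ", "amount": "1,200.50"},
    {"name": "ALICE SMITH", "amount": "1200.5"},
    {"name": "bob", "amount": "n/a"},
    {"name": "", "amount": "10"},
]
clean, needs_review = preprocess(raw)
print(clean)              # [{'name': 'Alice Smith', 'amount': 1200.5}]
print(len(needs_review))  # 2
```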

10) Data Quality

Data quality is an important parameter for big data processing. Data quality software can cleanse and enrich large data sets by using parallel processing. These tools are widely used to get consistent and reliable outputs from big data processing.
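The sketch below shows one way rule-based quality checks might be spread over chunks of a data set and merged afterwards. The rules, field names, and chunk size are assumptions made for illustration, and a thread pool is used only as a simple stand-in for the parallel processing a real data quality tool would perform.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def check_chunk(records):
    """Count rule violations in one chunk of records."""
    issues = Counter()
    for rec in records:
        if not rec.get("email") or "@" not in rec["email"]:
            issues["invalid_email"] += 1
        if rec.get("age") is None:
            issues["missing_age"] += 1
        elif not 0 <= rec["age"] <= 120:
            issues["age_out_of_range"] += 1
    return issues

def quality_report(records, chunk_size=2, workers=2):
    """Check chunks concurrently and merge the per-chunk counts."""
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for issues in pool.map(check_chunk, chunks):
            total.update(issues)
    return dict(total)

# Hypothetical customer records.
records = [
    {"email": "a@example.com", "age": 31},
    {"email": "not-an-email", "age": 45},
    {"email": "c@example.com", "age": None},
    {"email": "", "age": 150},
    {"email": "e@example.com", "age": 28},
]
report = quality_report(records)
print(report)
```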

Desired persona

Strong analytical and numerical skills are a must for a good data analytics professional. Beyond that, one needs a thorough understanding of tools such as query languages (SQL, Hive, Pig), scripting languages (Python, MATLAB), statistical languages and packages (R, SAS, SPSS), and Excel. Data analytics professionals must also possess good interpretive and problem-solving skills to explain the process of data analysis and its outcome.

Below is a list of some of the many different roles that you may encounter when searching for or considering a career in data analysis.

  • Business analyst: analyzes business-specific data.
  • Management reporting: reports data analytics to management on business functions.
  • Corporate strategy analyst: focuses on analyzing company-wide data and advising management on strategic direction. This role may also be focused on mergers and acquisitions.
  • Compensation and benefits analyst: usually part of a human resources department that analyzes employee compensation and benefits data.
  • Budget analyst: focuses on the analysis and reporting of a specified budget.
  • Insurance underwriting analyst: analyzes individual, company, and industry data for decisions on insurance plans.
  • Actuary: analyzes mortality, accident, sickness, disability, and retirement rates to create probability tables, risk forecasting, and liability planning for insurance companies.
  • Sales analytics: focuses on sales data that helps to support, improve, or optimize the sales process.
  • Web analytics: analyzes a dashboard of analytics around a specific page, topic focus, or website comprehensively.
  • Fraud analytics: monitors and analyzes fraud data.
  • Credit analytics: the credit market offers a wide need for analytics and information science in the areas of credit reporting, credit monitoring, lending risk, lending approvals, and lending analysis.
  • Business product analyst: focuses on analyzing the attributes and characteristics of a product as well as responsibility for advising management on the optimal pricing of a product based on market factors.
  • Social media data analyst: social media and growing tech companies rely on data to build, monitor, and advance the technology and offerings that social media platforms rely on.
  • Machine learning analyst: machine learning is a developing technology that involves programming machines and feeding them data so that they can make decisions. Machine learning analysts may work on a variety of aspects including data preparation, data feeds, analysis of results, and more.

Guidelines for drafting or reviewing the JD for a data analyst

Data analysts perform quantitative testing that converts data into actionable insights, and they provide data-driven analytical support for critical projects. A strong candidate is well versed in R, skilled in data visualization, and able to extract and use the relevant data. A data analyst assessment test helps you evaluate:

  • Candidates’ expertise in R language
  • Fluency in using data science-based R libraries
  • Understanding of MS SQL Server fundamentals
  • The skills of writing MS SQL database queries
  • R programming skills

Basic assessment

To address these responsibilities, data analysts draw on many different skills. Some examples include:

  • Structured Query Language (SQL)
  • Microsoft Excel
  • Critical Thinking
  • R or Python (statistical programming)
  • Data Visualization
  • Presentation Skills
  • Machine Learning

These skill sets should enable a candidate to carry out the following steps:

  • Identify the data you want to analyze
  • Collect the data
  • Clean the data in preparation for analysis
  • Analyze the data
  • Interpret the results of the analysis
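The five steps above can be sketched end to end using only the Python standard library. Everything here is hypothetical: the question being asked, the inline CSV that stands in for a real data export, and the table and column names. SQL is used for the analysis step because it appears in the skills list above.

```python
import csv
import io
import sqlite3

# 1. Identify: which region has the highest average order value?
# 2. Collect: an inline CSV stands in for a real export.
RAW_CSV = """region,order_value
North,120
North,80
South,
South,200
south,100
"""
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# 3. Clean: drop rows with missing values and normalize region names.
cleaned = [
    (r["region"].strip().title(), float(r["order_value"]))
    for r in rows
    if r["order_value"].strip()
]

# 4. Analyze: load into an in-memory SQLite table and aggregate with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, order_value REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", cleaned)
result = conn.execute(
    "SELECT region, AVG(order_value), COUNT(*) FROM orders "
    "GROUP BY region ORDER BY region"
).fetchall()
conn.close()

# 5. Interpret: turn the numbers into a statement a stakeholder can act on.
best_region, best_avg, _ = max(result, key=lambda row: row[1])
print(f"{best_region} has the highest average order value ({best_avg:.2f}).")
```

With the sample data this prints that South has the highest average order value (150.00), based on two orders per region after cleaning.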

Sources:
1. https://www.mastersindatascience.org/careers/data-analyst/
2. https://targetjobs.co.uk/careers-advice/job-descriptions/454089-data-analyst-job-description
3. https://www.rasmussen.edu/degrees/technology/blog/what-does-a-data-analyst-do/
4. https://towardsdatascience.com/10-key-technologies-that-enable-big-data-analytics-for-businesses-d82703891e2f
