Process of Data Analysis

apurv jain
3 min read · Jun 2, 2020

For more, subscribe to my blog @naivedatascientist.co.in

Analysis refers to breaking a whole into its separate components for individual examination. Data analysis is a process for obtaining raw data and converting it into information useful for decision-making by users. Data is collected and analyzed to answer questions, test hypotheses or disprove theories.

Data requirements.

The data needed as input to the analysis is specified according to the requirements of whoever is directing the analysis, or of the customers (the end users of the finished product). What data is required depends on the question being asked or the experiment being run. Data may be numerical or categorical (i.e., a label that places an observation in a group, such as a city name).
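To make the numerical/categorical distinction concrete, here is a minimal Python sketch. The records and the `column_kind` helper are invented for illustration; a real project would more likely rely on a schema or a data-frame library than on this simple "does it parse as a number" check.

```python
def column_kind(values):
    """Return 'numerical' if every value parses as a number, else 'categorical'."""
    try:
        for value in values:
            float(value)
    except (TypeError, ValueError):
        return "categorical"
    return "numerical"


# Hypothetical records, made up for this example.
records = [
    {"age": "34", "city": "Pune"},
    {"age": "29", "city": "Delhi"},
]

# Classify each column by looking at all of its values.
kinds = {name: column_kind([r[name] for r in records]) for name in records[0]}
print(kinds)
```

Here `age` comes out as numerical and `city` as categorical, which in turn decides which statistics and charts make sense for each column.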

Data Collection.

Data collection is the process of gathering information on the targeted variables identified in the data requirements. Careful collection helps ensure that the gathered data is accurate, so that decisions based on it are valid. Data is collected from a variety of sources. Analysts may communicate the requirements to the custodians of the data, such as information technology personnel within an organization.

Data Processing.

The collected data must be processed or organized for analysis. This includes structuring the data into the format required by the relevant analysis tools. For instance, this may involve organizing the data into rows and columns in a table (i.e., structured data) for further analysis.
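As a rough illustration of structuring data into rows and columns, the sketch below parses raw CSV text with Python's standard `csv` module. The sample text, the column names and the `to_table` helper are assumptions made up for this example.

```python
import csv
import io

# Hypothetical raw export, made up for this example.
raw = """name,age,city
Asha,34,Pune
Ravi,29,Delhi
"""


def to_table(text, converters):
    """Parse CSV text into a list of row dicts, converting the listed columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, convert in converters.items():
            row[column] = convert(row[column])
        rows.append(dict(row))
    return rows


# Every value arrives as text, so 'age' is converted to an integer.
table = to_table(raw, {"age": int})
for row in table:
    print(row)
```

Each row is now a dictionary keyed by column name, which is a convenient shape for the later cleaning and analysis steps.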

Data cleaning.

Once processed and organized, the data may still be incomplete or contain duplicates or errors. Data cleaning is the process of preventing and correcting these errors. It includes identifying inaccurate records, assessing the overall quality of the existing data, removing duplicate entries, and column segmentation. Quantitative methods can be used to detect outliers and null values, which are then treated or excluded from the analysis.
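The cleaning steps above can be sketched in a few lines of Python. This is only one possible approach: the sample rows are invented, and the 1.5 × IQR rule used here to flag outliers is a common convention chosen for the example, not something this post prescribes.

```python
import statistics


def clean(rows, column):
    """Drop duplicate rows, rows with a null in `column`, and IQR outliers."""
    # 1. Remove exact duplicates, keeping the first occurrence.
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # 2. Exclude rows where the value is missing.
    present = [r for r in unique if r.get(column) is not None]

    # 3. Exclude outliers with the 1.5 * IQR rule (an assumed convention).
    q1, _, q3 = statistics.quantiles([r[column] for r in present], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [r for r in present if low <= r[column] <= high]


# Hypothetical rows: one duplicate, one null and one extreme value.
rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 12},
    {"id": 2, "amount": 12},
    {"id": 3, "amount": None},
    {"id": 4, "amount": 11},
    {"id": 5, "amount": 13},
    {"id": 6, "amount": 500},
    {"id": 7, "amount": 9},
    {"id": 8, "amount": 14},
]

cleaned = clean(rows, "amount")
print([r["id"] for r in cleaned])
```

Whether an outlier should be dropped, capped or kept depends on the question being asked, so the exclusion step is a judgment call rather than a fixed rule.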

Exploratory data analysis.

Once the data is cleaned, it can be analyzed. Analysts may use a variety of methods and techniques, collectively referred to as exploratory data analysis (EDA), to understand the patterns and messages contained in the data. Data visualization may also be used to examine the data in graphical form and gain additional insight into those patterns. Descriptive statistics, such as the mean or median, may be generated to help understand the data.
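A small sketch of what a first exploratory pass might look like, using only Python's standard library. The `ages` and `cities` values are invented sample data, and the text bar chart is just a stand-in for a proper plotting tool.

```python
import statistics
from collections import Counter

# Hypothetical sample data, made up for this example.
ages = [34, 29, 41, 29, 52]
cities = ["Pune", "Delhi", "Pune", "Mumbai", "Pune"]

# Descriptive statistics for a numerical column.
summary = {
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "stdev": statistics.stdev(ages),
    "min": min(ages),
    "max": max(ages),
}
print(summary)

# Frequency counts for a categorical column, drawn as a crude text bar chart.
city_counts = Counter(cities)
for city, count in city_counts.most_common():
    print(f"{city:<8}{'#' * count}")
```

Comparing the mean with the median is a quick way to spot skew: here the mean sits above the median because of the single larger value.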

Modeling and algorithms.

Mathematical formulas or processes called algorithms may be applied to the data to identify relationships among the variables, such as correlation or causation. Generally, a model is developed to estimate a particular variable (the target variable) from other variables in the data (the independent or predictor variables), with some residual error that depends on the model's accuracy.
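As one simple instance of the idea, the sketch below computes a Pearson correlation and fits a straight line by ordinary least squares, then lists the residual error for each point. The `hours` and `score` values are invented, and a straight-line model is just an assumption for the example. Note that this measures correlation only; it says nothing about causation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Hypothetical data: predictor (hours studied) and target (exam score).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

r = pearson(hours, score)
slope, intercept = fit_line(hours, score)

# Residual = observed target minus the model's prediction.
residuals = [y - (slope * x + intercept) for x, y in zip(hours, score)]

print(f"correlation r = {r:.3f}")
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
print("residuals:", [round(e, 2) for e in residuals])
```

The smaller the residuals are relative to the spread of the target variable, the better the line describes the data.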

Communication.

After the analysis, the results are reported in the format the users require to support their decisions and further action. Feedback from the users may lead to additional analysis or refinement of the existing analysis. The analyst may use data visualization techniques, such as tables and charts, to communicate the message clearly and efficiently to the audience.

Data product.

A data product is a computer application that takes data inputs and generates outputs, feeding them back into the environment. It may be based on a model or algorithm.
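To illustrate, a data product can be as small as a class that wraps a fitted model and answers requests. The `ScorePredictor` name, its coefficients and the input format below are all hypothetical placeholders, not taken from any real application.

```python
class ScorePredictor:
    """Hypothetical data product wrapping a fitted linear model."""

    def __init__(self, slope, intercept):
        self.slope = slope
        self.intercept = intercept

    def predict(self, hours):
        """Take a data input and generate an output for the caller to act on."""
        if hours < 0:
            raise ValueError("hours must be non-negative")
        return self.slope * hours + self.intercept


# Placeholder coefficients, assumed for this example only.
product = ScorePredictor(slope=4.1, intercept=47.7)

# The output is fed back to the user or to another system.
prediction = product.predict(6)
print(f"predicted score for 6 hours: {prediction:.1f}")
```

In practice the same idea usually sits behind a web service or a scheduled job, but the shape is the same: data in, model-based output back out.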

apurv jain

Graduated as a Marine Engineer from one of India’s most prestigious engineering colleges (under IIT-JEE), with an inclination toward Quantitative and Statistical