Kick Off in the Data Industry?

Harshita Singh
Towards Data Engineering
3 min read · Mar 26, 2021

Shunya se shuru karte hai !! (translation: Starting from zero) :)


“In God we trust; all others must bring data.” (William Edwards Deming)

After I wrote my first article on Medium, one of my friends asked me to add a prerequisites section that could help absolute beginners. That was a fairly good point, so I thought of writing it up separately and in more detail here.

Target Audience —

  1. Students.
  2. Freshers.
  3. Entry Level People.
  4. Noob Data Person.

The most important thing to understand is the set of major roles we have in the data industry:


Data Analysts: People who work on analyzing and visualizing the prepared data, i.e. dashboarding, reporting, charts, etc., on the different platforms available (Power BI, Izenda, Tableau, QlikView, etc.).

Data Engineers: People who are mainly responsible for working on the raw data and giving it a proper format/structure (ETL) after applying all the transformations, so that it can be used efficiently. They are also responsible for building end-to-end (E2E) data pipelines.

Data Scientists: People who work on the transformed data to make predictions and recommendations, typically using statistics and equations to identify patterns and trends for business requirements. They are also responsible for building model pipelines.

Secondly,

Understand the data file formats that are out there. Tables are just the beginning, you see. Let's look at the most commonly used data file formats (structured + semi-structured + unstructured):

  1. CSV (Comma Separated Values): the easiest and most important one to start with.
  2. Plain Text
  3. JSON
  4. Parquet
  5. Avro
  6. ORC (Optimized Row Columnar)
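To get a feel for the difference between the first and third formats in the list, here is a minimal sketch that writes the same records as CSV and as JSON using only Python's standard library. The sample records and variable names are made up for illustration. Parquet, Avro and ORC need third-party libraries, so they are left out here.

```python
import csv
import io
import json

# Hypothetical sample records, made up for illustration.
records = [
    {"id": 1, "name": "Asha", "city": "Pune"},
    {"id": 2, "name": "Ravi", "city": "Delhi"},
]

# CSV: flat rows under a single header line; every value is stored as text.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["id", "name", "city"])
writer.writeheader()
writer.writerows(records)
csv_text = csv_buffer.getvalue()

# JSON: semi-structured; it keeps basic types (id stays an integer)
# and would also allow nested objects and lists.
json_text = json.dumps(records, indent=2)

# Reading both back: CSV hands us strings, JSON hands us the original types.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
json_rows = json.loads(json_text)

print(csv_text)
print(json_text)
print(type(csv_rows[0]["id"]).__name__, type(json_rows[0]["id"]).__name__)
```

The last line shows why CSV data usually needs a type-casting step after loading, while JSON already carries basic types.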

Thirdly,

The tech stack we need to know before we begin working as a Data Analyst:

  1. Language: Python (basics).
  2. Query Language: SQL (very important).
  3. Platform: Tableau, Power BI.
  4. Others: DBMS (strongly recommended), core CS subject knowledge, DWH concepts (facts & dimensions).
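As a small taste of SQL together with the facts & dimensions idea, here is a hedged sketch using Python's built-in sqlite3 module. The table names (fact_sales, dim_product), columns and rows are hypothetical and only meant to show the pattern: the fact table holds the numbers, the dimension table holds the descriptive attributes, and a join plus GROUP BY produces a typical report figure.

```python
import sqlite3

# In-memory database so the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical star-schema fragment: one dimension table, one fact table.
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Toys');
INSERT INTO fact_sales VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5);
""")

# Join facts to their dimension and aggregate per category.
query = """
SELECT d.category, SUM(f.amount) AS total_amount
FROM fact_sales AS f
JOIN dim_product AS d ON d.product_id = f.product_id
GROUP BY d.category
ORDER BY d.category
"""
totals = conn.execute(query).fetchall()
print(totals)
conn.close()
```

A dashboard tool such as Tableau or Power BI would typically sit on top of a query result like this one.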

The tech stack (most fundamental) as a Data Engineer:

  1. Language: Python (most important), Java.
  2. Query Language: SQL (very important), pandas (fundamentals).
  3. Platform Knowledge: Jupyter Notebook, Sublime Text, VS Code.
  4. Knowledge of using MS Excel (fundamentals).
  5. Others: DBMS concepts (strongly recommended), core CS subject knowledge, DWH concepts (facts & dimensions).
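To show how Python and SQL come together in the ETL work mentioned earlier, here is a minimal sketch of an extract, transform, load step using only the standard library. The raw data, the function names (extract, transform, load) and the payments table are all hypothetical; a real pipeline would read from files or source systems and load into a proper warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: inconsistent casing, stray spaces, one missing amount.
RAW_CSV = """name,city,amount
 asha ,pune,100
RAVI,Delhi,
meera,MUMBAI,250
"""


def extract(text):
    """Extract: read raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and normalize text, cast amount, skip rows without one."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # drop rows with a missing amount
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        cleaned.append((name, city, float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (name TEXT, city TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT name, city, amount FROM payments ORDER BY name"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions makes each one easy to test and replace on its own, which is the same idea larger pipeline tools build on.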

The tech stack (most fundamental) as a Data Scientist:

  1. Language: Python (most important), Scala (not fundamental).
  2. Query Language: SQL (very important), pandas (fundamentals).
  3. Most important part: Machine Learning + AI + Statistics.
  4. Knowledge of using MS Excel (intermediate level).
  5. Others: DBMS concepts (strongly recommended), core CS subject knowledge, DWH concepts (facts & dimensions).
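As a tiny illustration of the statistics side, here is a sketch that fits a straight trend line to a series using ordinary least squares and then extends it one step ahead. The monthly sales numbers and the fit_line helper are made up for illustration; in practice you would likely reach for a statistics or machine learning library rather than hand-rolling this.

```python
from statistics import mean

# Hypothetical monthly sales figures, made up for illustration.
months = [1, 2, 3, 4, 5, 6]
sales = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar = mean(xs)
    y_bar = mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(months, sales)

# Extend the fitted trend to month 7.
forecast = slope * 7 + intercept
print(slope, intercept, forecast)
```

The slope is the "trend" in its simplest form: how much the value changes per month on average.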

Together, these points make up the most basic skill set needed to begin working in the data industry in any role.

I will cover the data engineering track in detail in the next story.

Constructive feedback is appreciated!!

Keep Reading…. :)


Harshita Singh
Towards Data Engineering

Understanding Data Everyday || Data Engineer || Graduate Student in CS