Declassified Secret Docs of Data Science

Aamod Garg
6 min read · Oct 26, 2018


Are you “Sentient” enough?

The danger skull of data science can be pretty intimidating, considering that data scientists are usually trained at the master’s level and have significant programming skills. Anyone (including me!) therefore faces a high barrier to entry into the massive world of data science.

Now, data science is an extremely broad term and should not be taken lightly. But since we are talking in such a grim light, let me lighten your mood with this cute duckling gif.

Now that your mood is aligned, let’s delve into data science. There is a mammoth number of terms to wade through, and it is hard to know where to begin. Hopefully this will help create a data structure (a stacked Venn diagram, for the detail-oriented out there) in your mind:

Who doesn’t like these?!

As you can see, data science has a lot going on for itself, which is why you don’t have data science going for you (don’t worry, neither do I ;-) ). There are many branches we could delve into here, but let’s concentrate on a few in order to get a handle on the data stuff.

The big 3 distinctions are:

  1. Data Analysis
  2. Data Modeling
  3. Data Engineering

Data Analysis

This is the first type, and it comes first only because this is usually where the data has already been received and we try to make something of it. It is like having the names and scores of all the kids in a class: data analysis would tell us the story of who is the most studious and who flunks the most.
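To make the classroom example concrete, here is a minimal sketch in plain Python. The names, the scores and the pass mark of 40 are all made up for illustration, and "most studious" is simply assumed to mean "highest average score".

```python
from statistics import mean

# Hypothetical class records: student name -> test scores out of 100.
scores = {
    "Asha": [92, 88, 95],
    "Ben": [35, 42, 38],
    "Chen": [70, 65, 80],
}
PASS_MARK = 40  # assumed passing threshold, not from any real grading scheme

# Average score per student (our stand-in for "studious").
averages = {name: mean(marks) for name, marks in scores.items()}

# Number of tests each student scored below the pass mark.
fail_counts = {
    name: sum(mark < PASS_MARK for mark in marks)
    for name, marks in scores.items()
}

top_student = max(averages, key=averages.get)
most_flunks = max(fail_counts, key=fail_counts.get)

print("Highest average:", top_student)
print("Most failed tests:", most_flunks)
```

The story only appears once the raw scores are summarised; the same idea scales from three kids to millions of rows.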

Data Analysis starts with the collection of data. This is the first step in the complex world of data science. Data can come in many forms, some of which are:

  1. surveys
  2. experiments
  3. records
  4. uploads
  5. manual filing
  6. transcriptions

...and the list goes on and on.

There might be millions of data points, and you can store them on your personal computer. But when this crosses into billions and trillions (and usually more than 10 TB), there needs to be some sort of storage infrastructure to handle this gargantuan (I like this word a lot, because it has my last name, Garg, in it) amount of data. This is where Big Data fits in, and that’s a story for another day and another night.

Further, Data Analysis is the part where we need to know our bases. We need domain expertise in order to interpret data and convert it into a language a computer understands. This conversion is analogous to language translation. When a translator hears French, they listen to it, understand it, convert the meaning to English and then speak it out. The same process applies to data.

Also, the last part, speaking out in English, works for data too and can be handled more aesthetically. Translating the data into a visual format makes it much more manageable and gives us insight more easily than a bunch of words.
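As a rough sketch of that "translation", the snippet below turns a few numbers into a text bar chart using nothing but plain Python. The `bar_chart` helper and the scores are hypothetical, and it assumes all values are positive.

```python
def bar_chart(data, width=20):
    """Return one text bar per entry, scaled so the largest value fills `width`."""
    top = max(data.values())  # assumes at least one entry and positive values
    lines = []
    for name, value in data.items():
        bar = "#" * round(value / top * width)
        lines.append(f"{name:<6}{bar} {value}")
    return lines

# Made-up average scores for three students.
averages = {"Asha": 90, "Ben": 45, "Chen": 72}

chart = bar_chart(averages)
for line in chart:
    print(line)
```

Even this crude picture makes it obvious at a glance that one bar is half the length of another, which is harder to spot in a column of numbers.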

Data Modeling

This category is more popularly known as machine learning. It works by building a description (a model) of the information we already have, which can then be used to estimate values of the same kind for new data as and when it arrives.

How I see Machine Learning

Machine learning (ML) can be done in a supervised or unsupervised manner. Supervised learning involves labels, so we know in advance how the data will be categorised. This cannot be said for unsupervised ML, where the outcome is unknown and the goal is to discover the distribution of the data.
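Here is a toy sketch of the difference, in plain Python. The data is made up, and the two techniques (a 1-nearest-neighbour classifier and a one-dimensional 2-means clustering) are just simple stand-ins I picked for illustration, not the only ways to do either kind of learning.

```python
# Supervised: every example comes with a label (hours studied -> outcome).
labelled = [(1.0, "fail"), (2.0, "fail"), (6.0, "pass"), (7.0, "pass")]

def predict_nearest(hours):
    """1-nearest-neighbour: copy the label of the closest known example."""
    _, label = min(labelled, key=lambda pair: abs(pair[0] - hours))
    return label

# Unsupervised: the same numbers with no labels; we only look for groups.
def two_means(values, iterations=10):
    """Split 1-D values into two groups around two moving centres."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups

prediction = predict_nearest(5.5)
clusters = two_means([1.0, 2.0, 6.0, 7.0])

print("Supervised prediction for 5.5 hours:", prediction)
print("Unsupervised groups:", clusters)
```

In the supervised half the answer has a name ("pass" or "fail") because we supplied the labels; in the unsupervised half the algorithm only tells us that there seem to be two groups, and it is up to us to interpret them.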

Data modeling/ML is also used in cases where the data we have is very limited or too complex. In these cases, we use custom algorithm development: one has to come up with a model based on assumptions because there are so many variables involved.

Data Engineering

This is the third pillar of the data science world. Data management is basically the collection of data and how one can create, read, update and delete it. It is all about handling data and knowing where it is stored.
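Create, read, update and delete (often shortened to CRUD) can be sketched with Python's built-in `sqlite3` module and a throwaway in-memory database. The `scores` table and its contents are hypothetical, chosen only to show the four operations.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT PRIMARY KEY, score INTEGER)")

# Create: add a new record.
conn.execute("INSERT INTO scores VALUES (?, ?)", ("Asha", 92))

# Read: fetch it back.
created = conn.execute(
    "SELECT score FROM scores WHERE name = ?", ("Asha",)
).fetchone()

# Update: change the stored value.
conn.execute("UPDATE scores SET score = ? WHERE name = ?", (95, "Asha"))
updated = conn.execute(
    "SELECT score FROM scores WHERE name = ?", ("Asha",)
).fetchone()

# Delete: remove the record.
conn.execute("DELETE FROM scores WHERE name = ?", ("Asha",))
remaining = conn.execute("SELECT COUNT(*) FROM scores").fetchone()[0]

conn.close()
print(created, updated, remaining)
```

Real data engineering swaps the toy database for production storage, but the four verbs stay the same.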

Data Production is basically what the editing department does in a movie studio. After the actors’ shots have been done, the editor takes all the scenes and compiles them into a beautiful movie; otherwise the movie would just be a series of clips without cohesiveness. Similarly, after the code has been written, it is modified so that anybody can use it without glitches.

Code Production

And last is Software Engineering. As you all know, it is very important and is currently one of the most sought-after jobs in the world; therefore, it is a highly paid position too.

Stereotypes

The Beginner

This “data scientist” knows only the fundamentals of all three types of data science. They are beginners and can potentially go deeper into any of the types of DS.

Hired for entry level positions.

The Diva

This is the type of data scientist who is proficient in all three fields but thinks they are above the task of doing the data mechanics and getting their hands dirty with data formatting.

Nobody hires them.

The Detective

They are focused on finding the right kind of data and drawing the best conclusion from the data they have in front of them. They are good at sniffing out anomalies and are masters of data analysis. Therefore, they have to be very perceptive.

Hired in finance and business development.

The Psychic

They are proficient in data modeling and have all the tools in the ML toolkit. They are guesstimators. That’s definitely so Raven!

Hired by AI/ML companies.

The Coder

This is your techie from Silicon Valley, a master of data engineering and programming. They have the highest salaries today!

Hired by everybody.

The Unicorn

They are supposed to be a master of all things data science.

This is not impossible but highly improbable, because it is like having 3 careers stacked on top of one another.

Hired by nobody because they don’t exist.

Look for more content on 🎉https://www.aamod.info/🎉

This was just a high-level interpretation of data science. There are many more complexities at play in this vast world.

Visit my LinkedIn to see me as a Product Manager!

This was kind of a fun take and should in no way be used to demarcate any job titles.

Special thanks to Brandon Rohrer.

Aamod Garg aka me.
