Who is a Data Scientist?

Sahiti Kappagantula
Published in Edureka
8 min read, Jun 5, 2013

The other day, I read an article on venturebeat.com that revealed how advanced data analytics helped Obama win the 2012 presidential election. Stories like this, along with Bank of America benefiting from its data-intensive technologies, Wipro investing $30 million in a US-based data science firm, and PayPal hiring data scientists, clearly suggest why Harvard Business Review called Data Scientist the sexiest job of the 21st century.

After hearing so much about Data Science, let’s get into some basics!

What is Data Science all about?

Some call it a Civil Engineering of data, and others call it a Discipline in itself; after all, what is Data Science all about?

Data Science is a term popularized by EMC². It is the process of extracting valuable insights from data.

As we live in the Big Data era, Data Science is becoming a very promising field for harnessing and processing the huge volumes of data generated from various sources. It is a vast discipline that draws on specialized skill sets such as statistics, mathematics, programming, and computer science, and on techniques including predictive analysis, data modeling, data engineering, data mining, and visualization.

The discipline of data science did not evolve overnight. In fact, it has existed for years in the form of business analytics or competitive intelligence, but only now is its true potential being realized. The main purpose of Data Science is to extract and interpret data effectively and to present the findings to end users in simple, non-technical language.

Thus, Data Science is all about constructing useful information from data and converting it into data-driven products!

Is a Data Scientist someone who struggles with data all day and night, or someone who experiments in a laboratory with complex mathematics? After all, who is a Data Scientist?

There are several definitions of a Data Scientist. In simple words, a Data Scientist is one who practices the art of Data Science. The highly popular term ‘Data Scientist’ was coined by DJ Patil and Jeff Hammerbacher. Data scientists crack complex data problems using their strong expertise in certain scientific disciplines. They work with elements of mathematics, statistics, computer science, and related fields (though they may not be experts in all of them).

Data Scientists are Business Analysts or Data Analysts, with a difference!

Though the initial training or basic requirements are similar for all these disciplines, Data Scientists require:

Whether an agricultural scientist wants to know the percentage increase in this year’s wheat yield compared with last year’s (and the reasons behind it), a financial company wants to classify its customers by creditworthiness before granting loans, or a retail organization wants to award extra points to its loyal customers, all of them need data scientists to process large volumes of both structured and unstructured data in order to make crucial business decisions.
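To make the first of these questions concrete, here is a minimal sketch of the year-over-year percentage calculation. The function name `percent_increase` and the yield figures are assumptions made up for illustration; they do not come from any real dataset.

```python
def percent_increase(previous: float, current: float) -> float:
    """Return the percentage change from `previous` to `current`."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100.0


# Hypothetical wheat yields in tonnes per hectare (illustrative numbers only).
last_year_yield = 3.2
this_year_yield = 3.6

change = percent_increase(last_year_yield, this_year_yield)
print(f"Yield change: {change:.1f}%")
```

The arithmetic is the easy part; explaining the reasons behind the change is where the data scientist's real work begins.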

The main challenge that today’s Data Scientists face is not to find solutions to the existing business problems but to identify the problems that are most crucial to the organization and its success.

Why are Data Scientists called ‘Data Scientists’?

The term “Data Scientist” reflects the fact that a Data Scientist draws heavily on scientific fields and their applications, whether statistics or mathematics. Data Scientists make extensive use of the latest technologies to find solutions and reach conclusions that are crucial for an organization’s growth and development. They present data in a far more useful form than the raw structured and unstructured data available to them.

Just as in any other scientific discipline, data scientists always need to ask questions of the data available to them and find the answers. They are required to make a clearly defined plan and work towards achieving results with limited time, effort, and money.

Three components of Data Science:

1. Organizing the data:
Organizing is where the physical storage and structure of the data are planned and executed, following best practices in data handling.

2. Packaging the data:
Packaging is where the prototypes are created, the statistics are applied, and the visualizations are developed. It involves modifying and combining the data, logically as well as aesthetically, into a presentable form.

3. Delivering the data:
Delivering is where the story is narrated and the value is received. It makes sure that the final outcome reaches the people concerned.
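The three components can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the function names `organize`, `package`, and `deliver`, and the sales records, are hypothetical and simply mirror the three steps above.

```python
from statistics import mean, median

# Hypothetical raw records: (store, daily_sales). Illustrative data only.
RAW_RECORDS = [
    ("north", 120.0),
    ("south", 95.5),
    ("north", 130.0),
    ("south", 101.5),
    ("north", 110.0),
]


def organize(records):
    """Organize: group the raw records by store into a structured mapping."""
    grouped = {}
    for store, sales in records:
        grouped.setdefault(store, []).append(sales)
    return grouped


def package(grouped):
    """Package: apply simple statistics to each group."""
    return {
        store: {"count": len(values), "mean": mean(values), "median": median(values)}
        for store, values in grouped.items()
    }


def deliver(summary):
    """Deliver: turn the summary into plain-language lines for end users."""
    return [
        f"{store}: {stats['count']} days, "
        f"average sales {stats['mean']:.1f}, median {stats['median']:.1f}"
        for store, stats in sorted(summary.items())
    ]


report = deliver(package(organize(RAW_RECORDS)))
for line in report:
    print(line)
```

In a real project each stage would be far richer (databases for organizing, models and charts for packaging, dashboards or presentations for delivering), but the flow from raw data to a readable result stays the same.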

What skills does a Data Scientist possess?

The role of a Data Scientist is indeed a challenging one! Although the skill sets and competencies that Data Scientists employ vary widely, an efficient Data Scientist should:

  1. Be innovative and distinctive in approach, applying various techniques intelligently to extract data and gain useful insights for solving business problems and challenges.
  2. Have the ability to locate and interpret rich data sources.
  3. Have hands-on experience in data mining techniques such as graph analysis, pattern detection, decision trees, clustering, or statistical analysis.
  4. Develop operational models, systems, and tools by applying experimental and iterative methods and techniques.
  5. Analyze data from a variety of sources and perspectives and uncover hidden insights.
  6. Perform data conditioning, that is, convert data into a useful form by applying statistical and mathematical tools and predictive analysis.
  7. Research, analyze, execute, and present statistical methods to gain practical insights.
  8. Manage large amounts of data even under hardware, software, and bandwidth limitations.
  9. Create visualizations that help anyone understand the trends in the data with ease.
  10. Be a team leader and communicate effectively with business analysts, product managers, and engineers.

A Data Scientist is like a webmaster, who needs to be not only a jack of all trades but also a master of at least one of the above fields.
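As a small illustration of the data conditioning mentioned in the list above, the sketch below rescales a column of numbers to zero mean and unit standard deviation (z-scores). Standardization is only one of many possible conditioning steps and is chosen here as an assumption; the `standardize` helper and the sample ages are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize constant data")
    return [(v - mu) / sigma for v in values]


# Hypothetical customer ages (illustrative only).
ages = [20, 30, 40, 50, 60]
z_scores = standardize(ages)
print([round(z, 3) for z in z_scores])
```

Once values are on a common scale, columns measured in different units (age, income, number of purchases) can be compared or fed into the same model more easily.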

So, what does a Data Scientist do?

A data scientist has a dual role: that of an “Analyst” as well as that of an “Artist”! Data scientists are very curious people who love large amounts of data and, more than that, love to play with that data to reach important inferences and spot trends! This is what distinguishes a Data Scientist from a traditional Data Analyst. A Data Scientist does not rely on one particular source, such as a social media site or a log file, but draws on many sources with the aim of finding a hidden insight that can prove very significant for the organization. They perform “what if” analysis, ask questions, look at the data from different angles, and transform big data into the next big idea!

Conway Diagram:

The Conway Venn diagram, drawn by the well-known data scientist Drew Conway, presents Data Science as a combination of three much-in-demand skill areas: hacking skills, math and statistics knowledge, and substantive expertise.

Data Science is also an Art!

Data science is not only a science or a technique; it is also an ‘Art’. Data Science is the art of listening to your intuition while facing a huge amount of data, classifying it, evaluating it, and reaching conclusions. Not everyone is blessed with this art! Data scientists need to be really creative in visualizing data in various graphical forms and in presenting highly complex data in a simple and friendly way! A data scientist who can convert terrifying petabytes of structured as well as unstructured data (images, videos, log files, etc.) into an easy and simple format is an ‘Artist’!

After all, only a skillful Data Scientist can manage McDonald’s database, the videos uploaded to YouTube, Tesco’s huge volume of data, GE’s healthcare data, the data from thousands of patients’ blood samples at Apollo, or the unstructured data generated from X-rays!

The US faces a shortage of 140,000 to 190,000 people “with deep analytical skills, as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.” (McKinsey Global Institute)

As Data Science is an emerging field, there is a plethora of opportunities available worldwide. Just browse through any of the job portals; you will be taken aback by the number of openings for Data Scientists in different industries, whether IT, healthcare, retail, government, academia, life sciences, or oceanography. Venture capitalists have never shown as much excitement about investing as they have for data-driven start-ups.

Whether you call them Data Scientists, Data Gurus, or some other fancy name, the fundamentals remain the same! The world is in acute need of smart and creative people who can dive deep into the ocean of Big Data, save the world from ignorance, provide valuable insights to businesses, and help the world economy grow!

Originally published at https://www.edureka.co on June 5, 2013.

Sahiti Kappagantula
Edureka

A Data Science and Robotic Process Automation Enthusiast. Technical Writer.