A beginner's guide to Data Science.

Manish Sahu
Published in AI Space
3 min read · Jul 28, 2020
Image Source - https://unsplash.com/

Why Data Science?

We live in exciting, even revolutionary, times. As our daily activities move from the physical to the digital world, we generate tremendous amounts of data. In fact, we are surrounded by data. Websites track every click of every user, smartphones keep track of location and speed, smart cars collect driving habits, and smart homes collect living habits. Companies collect purchasing habits, Netflix recommends movies based on a user’s viewing history, and so on. Sensors and machines collect, store, and process information about the environment around us. In simple terms, data science is the art of collecting data, wrangling and transforming it, and finally using models to predict the behavior of new instances. A data scientist is a person who can leverage existing data sources and create new ones in order to extract meaningful and actionable insights.

Skills needed for Data Science.

There are loads of data science libraries, frameworks, modules, and toolkits that efficiently implement the most common data science techniques. The most commonly used libraries are pandas, NumPy, seaborn, matplotlib, and scikit-learn; frameworks include Keras and TensorFlow, among others. To become a good data scientist, we have to be good in the following areas.

● Business domain knowledge — Business domain knowledge helps data scientists with feature engineering, that is, extracting or creating new features from the existing ones.

● Web scraping to collect the data — Every data scientist must know web scraping techniques as there is no data science without data.

● Statistics and probability — Statistics and probability help data scientists better understand the patterns and behavior of data.

● Programming knowledge — Programming is the backbone of data science. Without programming knowledge, you can’t put the theory into practice. Hence, it’s a must-have skill for every data scientist.

● Machine Learning — Machine learning is a collection of algorithms that help to make sense of data. In machine learning we generally deal with structured data, such as tables of rows and columns. Humans are generally not good at spotting patterns in large amounts of structured data.

● Deep Learning — Deep learning is a part of machine learning where we generally deal with unstructured data, such as images, audio, and text. Humans are quite good at understanding unstructured data. Deep learning generally requires huge amounts of data.

● Natural Language Processing — Natural Language Processing (NLP) helps to deal with text data and make sense of it. For example, Google uses NLP for text translation.

● Structured Query Language (SQL) — SQL helps to query data stored in databases and shape it for preprocessing.

● Model deployment — Deploying a trained model so that it can be used in production.

● Data Storytelling — Storytelling is more of an art, and one that every data scientist should know.

Data science is not limited to these skills, however, and there is no limit to how good you can become at it. The more you learn, the better you get.
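
As a small illustration of the SQL skill listed above, here is a minimal sketch using Python's built-in sqlite3 module. The `purchases` table, its columns, the `spend_per_customer` helper, and the sample rows are all hypothetical, made up for this example. The query drops rows with a missing amount and totals the spending per customer, which is the kind of shaping that often comes before preprocessing.

```python
import sqlite3


def spend_per_customer(rows):
    """Load (customer, amount) rows into an in-memory database and
    return the total spend per customer, skipping missing amounts."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE purchases (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
        cursor = conn.execute(
            """
            SELECT customer, SUM(amount) AS total
            FROM purchases
            WHERE amount IS NOT NULL
            GROUP BY customer
            ORDER BY customer
            """
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample data; None stands for a missing value.
sample_rows = [("alice", 20.0), ("bob", 5.5), ("alice", 10.0), ("bob", None)]
totals = spend_per_customer(sample_rows)
print(totals)
```

In a real project the same query would more likely run against an existing database, with the result loaded into a pandas DataFrame for the next steps.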

Real-world Applications of Data Science.

To understand how this technology works, one must understand its applications and deliverables. In the data science space, we usually solve real-world problems that exist around us. Some of the application areas are as follows.

● Customer sentiment analysis: determining whether opinion on a particular product or service is positive, negative, or neutral, which helps companies improve their product or service and increase revenue.

● Knowing your customers well, which helps to grow the existing customer base.

● Automation of processes and decision-making.

● Movie and video recommendations based on user behavior.

● Sales forecasting.

● Anomaly detection.

● Traffic control.

● Pattern finding.

Apart from the above, there are endless applications of data science. The data science process may vary depending upon the end requirements.
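
To give a feel for one of the applications listed above, anomaly detection, here is a minimal sketch based on z-scores, using only Python's standard library. This is just one simple approach among many; the `find_anomalies` function, the threshold of 2.0, and the `daily_visits` numbers are assumptions chosen for illustration, not a recommended production setup.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical daily website visits with one unusual spike.
daily_visits = [100, 102, 98, 101, 99, 100, 500]
print(find_anomalies(daily_visits))
```

Note that a single extreme value inflates both the mean and the standard deviation, so with very small samples a z-score test can miss outliers; more robust measures, such as the median, are often preferred in practice.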

Data Science Pipelines.

Generally, we follow a traditional path to build an end-to-end data science pipeline.

  1. Problem definition.
  2. Hypothesis generation.
  3. Data Collection and storage.
  4. Exploratory data analysis (EDA).
  5. Data cleaning and preprocessing.
  6. Model building.
  7. Model optimization.
  8. Fine-tuning of the model.
  9. Model deployment.
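
A few of the steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not a real pipeline: the `(ad_spend, sales)` records are invented, the function names are hypothetical, a simple least-squares line stands in for "the model", and steps such as hypothesis generation, EDA, and model optimization are skipped. "Deployment" is reduced here to calling `predict` on a new value.

```python
def clean(records):
    """Data cleaning: drop records with a missing feature or target."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def fit_line(points):
    """Model building: least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx  # assumes the x values are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new instance."""
    slope, intercept = model
    return slope * x + intercept


def mean_absolute_error(model, points):
    """Evaluation: average absolute gap between prediction and target."""
    return sum(abs(predict(model, x) - y) for x, y in points) / len(points)


# Data collection: hypothetical (ad_spend, sales) pairs, some incomplete.
raw = [(1.0, 3.0), (2.0, 5.0), (None, 4.0), (3.0, 7.0), (4.0, 9.0), (5.0, None)]

data = clean(raw)
train, holdout = data[:3], data[3:]  # keep one record aside for evaluation
model = fit_line(train)

print("model:", model)
print("holdout error:", mean_absolute_error(model, holdout))
print("prediction for 6.0:", predict(model, 6.0))
```

In practice, libraries such as pandas and scikit-learn would replace most of these hand-written functions.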

Data science can have a major impact on any business's success. Hence, a good data scientist can help a business grow.

Thanks for reading.

Manish Sahu, Editor for AI Space
Data Science Practitioner | Data Storyteller