Baby Steps into Data Science 01 — Introduction

Editor — Ishmael Njie & Sulayman Saleem

DataRegressed Team
DataRegressed
Mar 5, 2018


In this series, we aim to map out a pathway into the field of Data Science.

First of all, what exactly is Data Science?

Data Science is often described as a field that combines and unifies concepts from statistics, data analysis and computer science, with the objective of understanding and interpreting data in order to make decisions. Put simply, it is about drawing conclusions from a given dataset and using them to make profitable decisions. Computational techniques and areas that fall under Data Science include Machine Learning, Natural Language Processing, Deep Learning and Big Data.

Many concepts are considered part of Data Science, but in this series we will outline the ‘main’ ones that we feel will give you an initial idea of what the field involves. We will touch on the following aspects:

Mathematics: With a bit of Statistics, can we use Maths to draw inferences about our data?

Programming: Why programming languages are essential in the field.

Big Data: What is ‘Big Data’, and how can it be useful in a Data Science project?

Data Visualisation: How can we represent our data in a way that is clear and simple?

Machine Learning: The real driving force behind Data Science. What can a Machine Learning algorithm actually tell us?

With an understanding of these concepts, we hope the field of Data Science will become clearer to you, and that this may help you decide whether to pursue a career in Data Science or Data Analytics.

What is our problem?

Every business has objectives it hopes to achieve using the data at its disposal.

Data Science concepts have many applications, including:

  • Recommending a movie for you to watch on Netflix.
  • Forecasting company profits.
  • Predicting the price of a house from features such as the number of rooms and the square footage.
  • Suggesting a song to add to your Spotify playlist.
  • Analysing product reviews.
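The house-price example is a regression problem. As a minimal sketch, assuming a single feature (square footage) and a straight-line relationship between size and price, ordinary least squares can be written in a few lines of plain Python. The function names and all of the numbers below are hypothetical and made up for illustration; a real project would usually use several features and a library such as scikit-learn.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimising squared error for y ≈ slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope = covariance(x, y) / variance(x) (the 1/n factors cancel).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("x values must not all be identical")
    slope = cov_xy / var_x
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predict y for a new x using the fitted straight line."""
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical toy data: (square footage, price). Not real housing data.
    square_feet = [850, 1000, 1250, 1500, 1900]
    prices = [170_000, 195_000, 255_000, 290_000, 385_000]
    slope, intercept = fit_simple_linear_regression(square_feet, prices)
    print(f"price ≈ {slope:.1f} * sqft + {intercept:.1f}")
    print("predicted price for 1300 sqft:", round(predict(slope, intercept, 1300)))
```

Fitting a straight line is only the simplest case; adding more features (such as the number of rooms) leads to multiple linear regression, which follows the same least-squares idea.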

Most of the examples above can be framed as prediction problems, i.e. predicting which movies a user may like, or which songs you may like based on the songs already on your playlist. In upcoming chapters, we will cover the algorithms behind such systems, such as Regression and K-Means Clustering.

When it comes to tackling a problem, a good place to start is to obtain some data. The UCI Machine Learning Repository and Kaggle are great places to find all sorts of datasets, and KDnuggets also suggests many more sources.
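Datasets from these sites are often distributed as CSV files. A minimal sketch of loading one with Python's built-in csv module follows; it assumes the file has a header row and that every column is numeric. The column names and rows are made up, and an in-memory string stands in for a downloaded file so the example runs on its own.

```python
import csv
import io

# Made-up stand-in for a downloaded CSV file. With a real file you would pass
# open("houses.csv", newline="") instead, where "houses.csv" is a hypothetical name.
SAMPLE_CSV = """rooms,square_feet,price
3,850,170000
4,1250,255000
5,1900,385000
"""


def load_rows(file_obj):
    """Read a CSV with a header row into a list of dicts, converting every field to float."""
    rows = []
    for record in csv.DictReader(file_obj):
        # Assumes all columns are numeric; text columns would need different handling.
        rows.append({key: float(value) for key, value in record.items()})
    return rows


if __name__ == "__main__":
    rows = load_rows(io.StringIO(SAMPLE_CSV))
    print(len(rows), "rows; columns:", list(rows[0].keys()))
```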

In the workplace, companies will usually store their data in databases from which it can be retrieved when needed. Retrieval is typically done with a query language such as SQL (Structured Query Language). We will cover the use of programming languages in an upcoming chapter.
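To make this concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a company database. The `houses` table, its columns and its rows are hypothetical; a real workplace database (PostgreSQL, MySQL, SQL Server and so on) would need its own driver and connection details, although the SQL query itself would look much the same.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database with a hypothetical 'houses' table of made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE houses (rooms INTEGER, square_feet INTEGER, price INTEGER)")
    conn.executemany(
        "INSERT INTO houses VALUES (?, ?, ?)",
        [(3, 850, 170000), (4, 1250, 255000), (5, 1900, 385000), (2, 600, 120000)],
    )
    conn.commit()
    return conn


def fetch_houses_with_min_rooms(conn, min_rooms):
    """Return (rooms, square_feet, price) rows for houses with at least min_rooms rooms."""
    # The ? placeholder lets sqlite3 insert the value safely instead of string formatting.
    cursor = conn.execute(
        "SELECT rooms, square_feet, price FROM houses WHERE rooms >= ? ORDER BY price",
        (min_rooms,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = build_demo_database()
    for row in fetch_houses_with_min_rooms(conn, 4):
        print(row)
    conn.close()
```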

Have a Go

At the end of every chapter, we will present a tip or idea to help you build your profile and skills.

For this first “Have a Go”, we introduce Kaggle, ‘The Home of Data Science & Machine Learning’.

Kaggle, acquired by Google in 2017, is a platform that hosts data science competitions, many of which centre on predictive analysis.

Kaggle Homepage

It is a site for people of all experience levels, from beginner to expert. Datasets are shared by various parties on everything from health and science to video games and website analytics. One of the best parts of Kaggle is Kernels, which are scripts of code that users share. For a given dataset, you can view other users’ scripts along with their results, which helps you understand the steps each user took to complete a task. It is a great place to learn new techniques and tools that can be applied to other analytical problems.

Kaggle is also a great place to start building a portfolio of work that you can show off. You can begin by learning from others and then share your own work for others to view and learn from.

The following chapters will go into specific topics within Data Science and how each one helps in completing a Data Science project.

Thank you for reading and keep an eye out for the next chapter!
