Data Science for Business - Part 1: 5 basic terms

Zahid Parvez
Published in Convergence Tech
5 min read · May 1, 2020

These days, terms like artificial intelligence, machine learning, and data science (or anything related to ‘data’) are thrown around a lot. It is crucial to be able to work cooperatively and communicate with data personnel to ensure you get the best outcome.

Chances are there is a data team in your organisation; if not, there is almost certainly one in your industry. Let’s start at the root: what is data anyway?

Data, Flickr @justgrimes

Data

In its most basic form, data is any unit of information. In fact, by reading this article, you are producing data. Anything and everything that has happened, is happening, or will happen creates some form of data. For example:

  • You just read this word = data
  • Turtles have a shell = data
  • The sun will rise tomorrow = data

Data does not need to be captured, stored, or observed to exist. However, we can’t do much with unobserved data, so it’s essential for us to capture it in some form. You are reading this article and storing it in your brain, and at some point you will recall what you have learnt and use it. Similarly, data has to be observed and stored before it can be used in data science.

Data capture can be manual, e.g. a person typing it into a computer, or automated, e.g. taking a photo on your phone. In both cases, the data is stored on a storage device (e.g. your phone’s storage). Stored data can be analysed as it is, but it is not easy. To take full advantage of data, it should be stored in a database.

An example of data captured using a mobile phone

Database

You have probably heard the term ‘database’ long before you heard of ‘data science’ or ‘artificial intelligence’. That’s because databases existed long before computers were a thing. Have you ever heard of a library?

While a library does not meet the Oxford dictionary definition of a database as something “stored in a computer”, it is “an organised set of data”. A computer database is simply something that stores an organised set of data on a computer.

Databases generally fall into 2 categories: relational or non-relational (sorry, graph databases). A database is relational when a schema (or design) specifies the structure of its tables and the relationships between those tables. Data that doesn’t conform to the schema cannot be stored in the database without changing the schema first. Non-relational databases do not require schemas or tables to store data.
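To make the difference concrete, here is a minimal sketch using Python’s built-in `sqlite3` module with a throwaway in-memory database. The `customers` and `invoices` tables, their columns, and the sample values are all hypothetical. The relational part shows a schema rejecting data that doesn’t fit it; the non-relational part is only approximated with plain Python dictionaries, not a real document database.

```python
import json
import sqlite3

# In-memory SQLite database, used purely for illustration (hypothetical tables).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces relationships only when this is on

# The schema: the structure of each table and the relationship between them.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE invoices ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"  # the relationship
    " amount REAL NOT NULL)"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Customer A')")
conn.execute("INSERT INTO invoices (customer_id, amount) VALUES (1, 120.5)")


def try_insert(sql):
    """Run an INSERT and report whether the schema accepted it."""
    try:
        conn.execute(sql)
        return "stored"
    except sqlite3.Error as exc:
        return f"rejected: {exc}"


# A column the schema does not define: rejected until the schema is changed.
unknown_column = try_insert(
    "INSERT INTO invoices (customer_id, amount, discount) VALUES (1, 10.0, 0.1)"
)
print(unknown_column)

# An invoice pointing at a customer that does not exist: the relationship is enforced.
missing_customer = try_insert("INSERT INTO invoices (customer_id, amount) VALUES (99, 10.0)")
print(missing_customer)

# Related tables can be combined through their relationship.
rows = conn.execute(
    "SELECT customers.name, invoices.amount "
    "FROM invoices JOIN customers ON customers.id = invoices.customer_id"
).fetchall()
print(rows)  # only the invoice that fit the schema was stored

# Non-relational sketch: "documents" with no shared schema; each carries its own fields.
documents = [
    {"customer": "Customer A", "amount": 120.5},
    {"customer": "Customer B", "amount": 10.0, "discount": 0.1, "tags": ["repeat"]},
]
print(json.dumps(documents[1]))
```

Both invalid inserts should report `rejected`, and the join should return only the one valid invoice. The second document simply gains new fields, which the relational table would only accept after its schema was changed.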

Portion of the ‘Invoice Data’ table in the Northwind Database

The caveat here is that spreadsheets, such as Microsoft Excel files, and CSV files are not databases; however, they work similarly to one (they hold tables, just outside a database). They are often used for ad-hoc or day-to-day data science tasks because they are a lot more convenient than setting up a database.
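As a rough illustration of a table that lives outside a database, this sketch reads CSV data with Python’s standard `csv` module. The column names and values are hypothetical, and the data is held in a string via `io.StringIO` only to keep the example self-contained; a real file would typically be opened with something like `open("invoices.csv", newline="")` (a hypothetical filename).

```python
import csv
import io

# Hypothetical CSV contents standing in for a file on disk.
csv_text = """customer,amount
Customer A,120.50
Customer B,10.00
Customer A,35.25
"""

# DictReader uses the header row as column names, much like a table's columns.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Every value in a CSV is plain text: there is no schema, so types are converted by hand.
total_by_customer = {}
for row in rows:
    amount = float(row["amount"])
    total_by_customer[row["customer"]] = total_by_customer.get(row["customer"], 0.0) + amount

print(total_by_customer)
```

Nothing stops a malformed row from ending up in the file; unlike a relational database, checking the data is left to whoever reads it.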

Data Science

Data science is the use of the scientific method, mathematics, and computer science to draw knowledge from data.

Example

For a month, you record the time you leave home and the time you arrive at work each day. At the end of the month, you use mathematics to work out the correlation between the time you leave home and the time you get to work. You can now use this knowledge to lower your commute time. You have just done data science, congratulations!

The term ‘data science’ is often used interchangeably with ‘data mining’ or ‘data analysis’ as they are similar concepts.

Someone who practises data science is known as a ‘data scientist’ (sometimes a ‘data analyst’).

Artificial intelligence

Artificial intelligence (AI) refers to a field of study that attempts to mimic human intelligence using machines (such as computers). Computers are very dumb things; they cannot do anything a human did not design them to do. In fact, what people perceive as AI is merely the result of mathematical expressions that humans have designed to mimic intelligence.

There is a range of methods that allow machines to mimic intelligence. Data scientists primarily focus on a technique called machine learning to apply AI. Machine learning (ML) uses algorithms whose output changes based on previously observed data. These algorithms are trained on historical data and then used to make predictions about the future based on current data.

One category of ML algorithms loosely mimics how neurons in our brain behave when presented with data; these algorithms are known as artificial neural networks. They organise artificial neurons in layers: the first layer receives the input data, and the last layer makes the prediction. Networks with more than one hidden layer (a layer between the input and output layers) are known as deep neural networks, and the technique of training them is called deep learning.
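To show what “layers” means in practice, here is a minimal sketch of a forward pass through a tiny network: two inputs, one hidden layer of two neurons, and one output neuron. The weights are hand-picked (hypothetical) so that the network computes XOR, i.e. it outputs 1 when exactly one input is 1. In a real network the weights would be learned from historical data during training, for example with backpropagation. With a single hidden layer, this network is not “deep”.

```python
def step(x):
    """Activation: the neuron 'fires' (1) if its weighted input is positive, else 0."""
    return 1 if x > 0 else 0


def layer(inputs, weights, biases):
    """One layer of artificial neurons: weighted sum of inputs plus a bias, then activation."""
    return [
        step(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hand-picked, hypothetical weights; a trained network would learn these from data.
HIDDEN_WEIGHTS = [[1, 1], [1, 1]]
HIDDEN_BIASES = [-0.5, -1.5]  # first neuron behaves like OR, second like AND
OUTPUT_WEIGHTS = [[1, -1]]
OUTPUT_BIASES = [-0.5]  # fires for "OR but not AND", which is XOR


def network(x1, x2):
    hidden = layer([x1, x2], HIDDEN_WEIGHTS, HIDDEN_BIASES)  # input feeds the hidden layer
    output = layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)  # last layer makes the prediction
    return output[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", network(a, b))
```

The four printed predictions read 0, 1, 1, 0. The whole “intelligence” here is a few multiplications and additions, which echoes the point that AI is built from mathematical expressions designed by humans.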

Using an artificial neural network to find players in a RoboCup match

Data literacy

Data literacy refers to the ability to work with, analyse, and draw knowledge from data. It’s the same thing as literacy with a focus on data. By reading this article, you became a little bit more data literate. By clapping this article and following me, you will open yourself to becoming more data literate in the future.

‘Data Science for Business’ is a series where I will explore complex data and computer science-related concepts and explain them in everyday terms. Please follow me to get notified when the next post is up!

I am an analyst with a passion for data, software, and integration. In my free time, I also like to dabble in design, photography, and philosophy.