Everything you need to know about key differences between AI, Data Science, Machine Learning and Big Data

Shemmy Majewski
DLabs.AI
12 min read · Aug 29, 2018

All around the world, robot imports have increased from around 100,000 in 2000 to roughly 250,000. International Data Corporation expects AI spending to accelerate, reaching $230 billion in 2021 at a Compound Annual Growth Rate (CAGR) of 22.8%.

Wouldn’t it be a waste not to have a slice of that cake for yourself?

You came here to find out something about AI. You’ve heard it’s efficient and helps companies run better, or even brings them back from the brink of collapse. The thing is, the vast and complicated terminology surrounding the topic may be a bit overwhelming (to say the least). Truth be told, the fields overlap a lot, but they are not interchangeable.

“How can I know whether I should hire someone who deals with machine learning or with AI? Or perhaps I need a data science specialist?”

There is no single, magical fix for solving problems and optimizing your business.

You need the expertise of someone who knows how to work around those issues: someone who has the knowledge and skill to use data-driven solutions, understands the differences between the fields and applies each of them when needed.

In this article we’ll do everything we can to give you an insight into those differences.

WHAT IS AI?

AI is changing the world as we know it; there’s no denying it. But how does it work? When people ask the team at DLabs what exactly we do, they probably imagine us constructing and bringing to life semi-conscious, silicon-based life forms straight from Westworld.

But the reality is completely different.

Artificial intelligence is by far the oldest and most widely recognized technical term associated with robotics and automation. In short, it refers to the simulation of human brain functions by machines. It often relies on artificial neural networks, among other techniques, to mimic logical reasoning, learning and self-correction.

Artificial Intelligence is all about action and decision making based on available data. Whether it’s driving cars, examining medical samples or calculating investment risks, AI is doing tasks previously done by humans, often faster and with a reduced error rate.

The fields of AI.

Artificial intelligence solutions are not limited to IT only.

According to a recent Deloitte survey, 83 percent of the most aggressive adopters of AI and cognitive technologies said their companies have already achieved either moderate (53%) or substantial (30%) benefits.

The spectrum of AI applications is so vast that it’s quickly becoming one of the most influential digital fields in the world. It includes, but is not limited to:

  • Planning and decision making,
  • Machine learning,
  • Multi-agent systems,
  • Swarm intelligence, bio-inspired artificial systems,
  • Optical Character Recognition (OCR),
  • Automation of routine decisions,
  • Knowledge representation and reasoning, semantic web,
  • Data mining and information retrieval,
  • Computational neuroscience,
  • Robotics,
  • Human-computer interfaces.

Did you know? Consumers love AI solutions. According to PointSource’s study, one-third of shoppers will spend more money online if they are exposed to tactically-deployed artificial intelligence algorithms. 49% of them claim they are willing to shop more frequently when they are presented with data-driven solutions that are able to suggest and assist them during decision-making processes.

WHAT IS DATA SCIENCE?

Data Science is an interdisciplinary field focusing on processes that derive knowledge and patterns from existing data. It’s an umbrella term, covering a wide range of technologies, such as SQL, Hadoop, statistical analysis, data visualization, dashboards or distributed architecture.

In layman’s terms, Data Science is the general practice of analyzing business-oriented data, finding meaning in it and communicating the results effectively. There’s a reason why it is often said that a good data scientist needs to be a mathematician, an IT specialist and a businessperson at the same time.

“All right, but what’s the difference between data science and AI? Aren’t they one and the same thing?”

Not necessarily.

Think of it this way: Artificial Intelligence is a tool that helps Data Science get results and solutions for specific problems.

Look at the infographic below:

Data Science uses AI algorithms and statistical methods to extract valuable insights from available information and to help with making business-focused decisions.

Did you know? According to IBM, demand for data scientists will increase by 28% by the end of 2020. This means that businesses able to train their workers and analysts in data science will be well placed to get ahead of their competition.

3 most influential Data Science applications:

The goal here is to debunk the notion that data science is some type of obscure black magic, unreachable for anyone without a PhD in IT. We’ll give you concrete examples of how it is applied in the real world.

1. Recommender systems

If you have ever wondered how Netflix is able to suggest new movies or series based on what you have already watched, it’s recommender systems that do all the heavy lifting.

They are a subclass of information filtering systems that cut out unnecessary “noise” and show you only the most appealing options. The filtered items can range from products on e-commerce sites and search results to dating matches that help you find your other half.

Recommender systems are more advanced than search algorithms. They take a far more intelligent approach to information filtering by introducing users to items they might not have otherwise discovered. Usually, they rely on one of two approaches:

  • Collaborative filtering: makes suggestions based on past behavior, by comparing either users with similar tastes (user-user) or items that tend to be liked together (item-item).
  • Content-based filtering: makes suggestions based on the characteristics of the items themselves.
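
To make collaborative filtering a bit less abstract, here is a minimal item-item sketch in Python. Everything in it is made up for illustration: the show titles, the ratings and the function names (`item_vector`, `cosine`, `recommend`). Real recommender systems are far more elaborate, so treat this as a toy under those assumptions, not as how any particular service works.

```python
from math import sqrt

# Hypothetical ratings: user -> {show: rating on a 1-5 scale}
ratings = {
    "ann":  {"Show A": 5, "Show B": 4, "Show C": 1},
    "bob":  {"Show A": 4, "Show B": 5},
    "cara": {"Show C": 5, "Show D": 4},
    "dave": {"Show A": 5, "Show B": 4, "Show D": 2},
}


def item_vector(item):
    """Ratings given to one item, keyed by user."""
    return {user: r[item] for user, r in ratings.items() if item in r}


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, top_n=2):
    """Rank unseen items by their similarity to items the user already rated."""
    seen = ratings[user]
    all_items = {item for r in ratings.values() for item in r}
    scores = {}
    for candidate in all_items - set(seen):
        cand_vec = item_vector(candidate)
        # Weight each similarity by how much the user liked the seen item.
        scores[candidate] = sum(
            cosine(cand_vec, item_vector(item)) * rating
            for item, rating in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("bob"))
```

With this toy data, `recommend("bob")` puts “Show D” ahead of “Show C”, because its rating pattern is closer to that of the shows Bob already rated highly.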

2. Credit scoring

You may not be aware of this, but whenever you apply for a credit card or a bank loan, it triggers a set of decision management rules evaluating how likely you are to repay debts in the future.

Scoring models built with data science capture factors and relationships that traditional loan scorecards miss. For example, they can take into account monthly cash flows or whether any friends or family members would endorse the applicant.
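
One common way to turn such factors into a decision is a logistic scoring model. The sketch below is only an illustration: the three features, the weights, the bias and the 0.7 threshold are hypothetical values picked by hand. A real lender would learn the weights from historical loan data and use many more inputs.

```python
from math import exp

# Hypothetical, hand-picked weights; a real model would learn them from data.
WEIGHTS = {
    "monthly_cash_flow_k": 0.8,  # net monthly cash flow, in thousands
    "missed_payments": -1.2,     # missed payments in the last 12 months
    "has_endorser": 0.5,         # 1 if a friend or family member vouches
}
BIAS = -1.0


def repayment_probability(applicant):
    """Logistic score: estimated probability that the applicant repays."""
    z = BIAS + sum(weight * applicant[name] for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + exp(-z))


def decide(applicant, threshold=0.7):
    """Approve when the estimated probability clears the threshold."""
    return "approve" if repayment_probability(applicant) >= threshold else "review"


steady = {"monthly_cash_flow_k": 3, "missed_payments": 0, "has_endorser": 1}
risky = {"monthly_cash_flow_k": 1, "missed_payments": 2, "has_endorser": 0}

print(decide(steady), decide(risky))
```

Under these assumed weights the first applicant is approved and the second is sent for manual review.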

3. Dynamic pricing

Imagine booking a plane ticket for your next flight. You’re about to finalize the transaction, but something distracts you, so you end up making a few important phone calls instead. When you finally come back to get the ticket, you find its price has almost doubled! That’s a bummer. Welcome to your first lesson on dynamic pricing.

Businesses all over the world use data science software to model supply, demand, competitor pricing and other seemingly unpredictable factors, such as weather or time. Dynamic pricing is used in many fields in order to maximize expected revenue. Many of the strategies rely on linear models and classification trees that estimate the right price (be it the highest or the lowest) that consumers are willing to pay for a specific product or service.
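
As a rough illustration of the linear-model approach, the Python sketch below fits a straight demand line to a handful of price observations and then picks the price at which revenue peaks. The observations and the function names (`fit_demand`, `best_price`) are hypothetical, and the sketch ignores competitors, weather, time and everything else a production system would model.

```python
# Hypothetical observations: (ticket price, tickets sold per day)
observations = [(80, 120), (100, 100), (120, 80), (140, 60)]


def fit_demand(points):
    """Least-squares fit of demand = intercept + slope * price."""
    n = len(points)
    mean_price = sum(p for p, _ in points) / n
    mean_sold = sum(q for _, q in points) / n
    covariance = sum((p - mean_price) * (q - mean_sold) for p, q in points)
    variance = sum((p - mean_price) ** 2 for p, _ in points)
    slope = covariance / variance
    intercept = mean_sold - slope * mean_price
    return intercept, slope


def best_price(intercept, slope):
    """Price that maximizes revenue = price * (intercept + slope * price)."""
    if slope >= 0:
        raise ValueError("expected demand to fall as the price rises")
    # Setting the derivative of the revenue to zero gives -intercept / (2 * slope).
    return -intercept / (2 * slope)


intercept, slope = fit_demand(observations)
price = best_price(intercept, slope)
print(f"demand ~ {intercept:.0f} {slope:+.0f} * price; revenue peaks near {price:.0f}")
```

For these made-up numbers the fitted line is demand = 200 - price, so revenue peaks at a price of 100.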

WHAT IS MACHINE LEARNING?

Machine learning is a field of AI that is currently driving its development forward.

If you type “machine learning” into Google, there’s a high chance you’ll open a true Pandora’s box of random bits of information scattered across academic papers, YouTube guides, subreddits and sci-fi forums. Finding something trustworthy in this mess would be a miracle of its own. That’s why in this part of the article we’ll try to give you the most accurate information on the subject that we can.

There are dozens of different definitions of machine learning. Dr. Yoshua Bengio from the University of Montreal, often called one of the “godfathers” of modern AI, puts it this way:

“Machine learning is a part of research on artificial intelligence, seeking to provide knowledge to computers through data, observations and interacting with the world. The acquired knowledge allows computers to correctly generalize to new settings”.

In short: THE ULTIMATE GOAL of Machine learning is to reach beyond available pieces of training information and interpret data that it has never encountered before.

Machine learning concepts.

The world of ML algorithms is quickly expanding, with new algorithms and variations appearing all the time. According to the McKinsey Global Institute, the total annual external investment in Machine learning reached 5 to 7 billion dollars in 2016.

ML algorithms are usually grouped either by learning style (e.g. supervised, unsupervised or semi-supervised) or by similarity in form and function (for example classification, regression, clustering, decision trees, deep learning and so on).

3 steps of creating an ML algorithm:

Regardless of style or function, every Machine Learning algorithm combines three components:

  • Representation: choosing a model and a data input format that the computer can work with.
  • Evaluation: choosing metrics that validate the model both internally and externally.
  • Optimization: finding the model settings that produce the best output according to those metrics.
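
The three components can be seen side by side in a deliberately tiny example. The Python sketch below is an illustration under stated assumptions: the data points are made up and lie exactly on y = 2x + 1, the representation is a straight line, the evaluation metric is mean squared error and the optimizer is plain gradient descent. The function names are ours, not part of any library.

```python
# Made-up training data that lies exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def predict(w, b, x):
    """Representation: the model is a straight line."""
    return w * x + b


def mse(w, b):
    """Evaluation: mean squared error on the training data."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(steps=2000, learning_rate=0.05):
    """Optimization: gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train()
print(f"w = {w:.3f}, b = {b:.3f}, error = {mse(w, b):.6f}")
```

After training, `w` and `b` end up very close to 2 and 1, so the model also gives sensible answers for inputs it has never seen, such as x = 10.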

The algorithms need to be trained in order to function autonomously. They learn from many examples of training data and are able to identify subtle correlations between variables. The infographic below represents the Machine learning design process.

Did you know? Contrary to popular belief, the 1st step is the most crucial. Choosing the wrong data sets leads to ineffective algorithms.

Applications of Machine learning

Forbes states that Amazon’s current ML algorithm has sped up the “click-to-ship” cycle by 225%. The system helped with developing same-day shipping options.

The need for ML solutions is becoming more and more apparent. The following list covers some of the many ML applications:

1. Data security

We don’t need to convince anyone that malware is a growing problem. Kaspersky published statistics on computer security threats in the 1st quarter of 2018. According to the report, malware designed to steal money through online access to bank accounts was detected on the computers of 204,448 users.

Deep Instinct, an institutional intelligence company, says that different versions of malware share most of their code. Its Machine learning system is able to spot the small differences between them and flag potentially dangerous files with great accuracy.

2. Financial trading

ML algorithms are getting better at predicting how stock markets might behave on any given day. Numerous trading companies use proprietary systems to execute trades at high speed and in high volume. Although the predictions are mere probabilities, even a trade with a small edge, repeated at a high enough volume, might turn into a huge profit.

3. Marketing personalization

The core of any marketing endeavor is to understand your customers and target the right ones. You have almost certainly been on the receiving end of such strategies: after browsing an online store and visiting a product’s page without buying it, you come across an ad for the EXACT same product a few days later somewhere on the web or on Facebook. This is only one of many uses of marketing personalization and ML algorithms in action.

Modern businesses are able to personalize and decide which emails customers receive, what they see specifically and what kind of promotions they are offered — all leading potential buyers toward the end of a sales funnel.

4. Natural Language Processing (NLP)

NLP is a branch of machine learning that analyzes human language. It combines a series of coded grammar rules with statistical ML methods in order to determine the context of what someone said.

SEO specialists can use NLP to analyze texts and pull out keywords associated with a certain product. Language systems of this kind also power Google’s Assistant and Amazon’s Alexa, which are making their way into the homes of private consumers and into big corporations.
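
The simplest possible take on pulling keywords out of a text is to count words. The Python sketch below does only that: it lowercases a made-up product review, drops a tiny hand-picked stop-word list and returns the most frequent words. Both the review and the stop-word list are assumptions for illustration, and plain counting is far cruder than the grammar-aware, statistical NLP described above.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "this", "for",
              "with", "of", "to", "in", "my", "i"}


def keywords(text, top_n=3):
    """Return the most frequent words that are not stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


# Hypothetical product review
review = ("The battery is great. Battery life lasts two days, "
          "and the camera is sharp. I love the camera and the battery.")

print(keywords(review, top_n=2))
```

For this review the two top keywords are “battery” and “camera”, which is roughly what a person skimming the text would pick as well.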

WHAT IS BIG DATA?

The term gained a lot of traction back in the early 2000s, but today its meaning has become somewhat unclear and it is often used interchangeably with AI or Data Science.

So what exactly is Big data?

As the name suggests, Big data refers to collecting and processing large volumes of data so that useful hidden patterns can be discovered. The information may involve customer choices or market trends that can help businesses make informed, customer-oriented decisions.

Big data can be characterized by 3 “Vs”:

  • Extreme volume of data,
  • Wide variety of data types,
  • Velocity at which big data must be processed.

The need to process large chunks of data is growing rapidly. According to Forbes, by 2020 the world’s business data universe will grow from 4.4 zettabytes to 44 zettabytes. If that doesn’t blow your mind, here’s another piece of news: we’ll be creating 1.7 megabytes of information every second for every human being on the planet.

How big is a zettabyte? To give you a better understanding of the scope of data we are talking about, here’s an analogy for you.

Let’s say you can measure data by grains of rice, where 1 grain represents 1 byte.

Seems like a lot of information that can be harvested and put to use, doesn’t it?
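
For a rough sense of scale, the figures quoted above can be turned into plain numbers. The short Python sketch below assumes decimal (SI) prefixes, so 1 zettabyte is 10^21 bytes, or 10^21 grains of rice in the analogy. The function names are only illustrative.

```python
ZETTABYTE = 10**21  # bytes (decimal SI prefix); one byte = one grain of rice
MEGABYTE = 10**6
GIGABYTE = 10**9
SECONDS_PER_DAY = 24 * 60 * 60


def growth_factor(start_zb=4.4, end_zb=44.0):
    """How many times larger the data universe becomes."""
    return end_zb / start_zb


def grains_of_rice(zettabytes):
    """Number of bytes (grains) in the given number of zettabytes."""
    return zettabytes * ZETTABYTE


def daily_gb_per_person(mb_per_second=1.7):
    """Gigabytes created per person per day at the quoted rate."""
    return mb_per_second * MEGABYTE * SECONDS_PER_DAY / GIGABYTE


print(f"Growth: about {growth_factor():.0f}x")
print(f"44 ZB = {grains_of_rice(44):.1e} grains of rice")
print(f"Per person per day: about {daily_gb_per_person():.0f} GB")
```

In other words, the quoted growth is tenfold, and 1.7 megabytes per second adds up to roughly 147 gigabytes per person per day.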

Types of data that can be processed with “Big Data”:

  • Unstructured data — social networks, blogs, tweets, emails, internet traffic, digital images, audio/video feeds, mobile data, sensor data, web pages and many more.
  • Semi-structured — XML files, system log files, text files.
  • Structured data — transaction data, spreadsheets, OLTP, RDBMS databases and other structured data formats.

“All that sounds really similar to Data Science, or even AI. Is there any difference between those three?”

  • AI focuses on mimicking decision making processes.
  • Data science combines various methods and data of diverse volumes in order to derive useful, mostly business-oriented, insights through both structural and predictive analysis.
  • Big data doesn’t analyze, but focuses on processing (with high velocity) extreme volumes and a wide variety of data types.

In short: data science intersects with big data.

Applications of Big Data:

According to a 2015 Gartner survey, more than 75% of companies are investing or planning to invest in big data in the next few years.

It’s important to note that when discussing the various applications of Big Data, we’re talking about data processing. Big Data’s function is to prepare the gathered data for methods based on Data Science, AI and Machine Learning, which then process it for specific tasks.

Here are some of the most impactful “connections” and data processing methods concerning Big Data:

1. Banking and securities

A study conducted by STAC (Securities Technology Analysis Center) of the top 10 investment and retail banks shows that challenges in the banking industry include: fraud and security warnings, card fraud detection, trade visibility, IT operation analytics, archival of audit information and much more.

Numerous banks and financial enterprises are already using big data to monitor financial market activity. The Securities and Exchange Commission (SEC) monitors the financial markets for illegal trading activity by employing natural language processing.

The industry relies on big data for risk analytics, such as enterprise risk management, fraud mitigation or anti-money laundering. Furthermore, retail traders, banks and hedge funds use the data in high-frequency trading, sentiment measurement and pre-trade decision-support analytics.

2. Healthcare providers

Generally speaking, healthcare databases are often riddled with errors and plagued by failures. Such inefficient systems make it difficult to link data that could reveal patterns useful in the medical field.

Some hospitals are using big data collected from the phone apps of millions of patients. It allows doctors to practice evidence-based medicine instead of wasting time on random and expensive medical and lab tests.

The University of Florida uses free public health data and Google Maps to create visual data and track the spread of chronic diseases. The system allows faster communication and efficient analysis of healthcare information.

3. Insurance

According to research conducted by Marketforce, 82% of the underwriters surveyed believe that insurers who fail to capture the potential of Big Data will become uncompetitive.

Lack of personalized services and the lack of targeted services to new segments are some of the greatest insurance challenges.

Big data provides companies with customer insights. It makes it possible to analyze and predict users’ behavior patterns based on their social media accounts or GPS tracking, boosting customer retention.

All 4 terms (AI, Data Science, Machine Learning and Big Data) are interlinked, but not interchangeable. Hopefully this article helped you tell them apart and gave you some insight into the numerous ways they are influencing the world around you.

Shemmy Majewski
DLabs.AI

Business. Technology. Life and such. Opinions expressed here are my own. Founder at DLabs.ai — an R&D software house focused on AI.