Data science and AI: 10 terms you need to know

Is your data big enough to be called “big data”? Are you confused about how artificial intelligence and machine learning relate to each other?

We’ve collected some definitions to help you navigate through the maze of today’s tech buzzwords.


Data is at the very heart of the fourth industrial revolution happening today. The Cambridge Dictionary defines data as “information, especially facts or numbers, collected to be examined and considered and used to help decision-making, or information in an electronic form that can be stored and used by a computer.” Data sits at the core of most of today’s businesses, and it is the most essential component of data science and artificial intelligence.

Data and information security

Data security, also known as information security, refers to the protective measures applied to guard digital data against accidental or malicious corruption, destruction and unauthorised access. Encryption (where digital data is turned into a secret code), authentication (where data is accessible only through passwords or biometric data) and data masking (which makes part of the data invisible to unauthorised viewers, e.g. covering certain digits of a credit card number) are a few examples of data security tools. Data security measures can be applied to individual devices, databases, websites and entire organisations. Data breaches, where data is accessed by unauthorised individuals or groups, have been plaguing databases around the world, eroding consumer trust and coming at great expense to the groups and companies affected.
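The credit card example above can be sketched in a few lines of Python. This is a minimal illustration of the masking idea, not a production security tool; the function name and output format are our own assumptions.

```python
def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Hide all but the last `visible` digits of a card number behind '*'."""
    digits = card_number.replace(" ", "").replace("-", "")
    return "*" * (len(digits) - visible) + digits[-visible:]

# An unauthorised viewer sees only the last four digits.
print(mask_card_number("4111 1111 1111 1234"))  # ************1234
```

Real systems mask data at the database or display layer, but the principle is the same: the sensitive portion never reaches the viewer.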

Big data

Datasets too large to analyse with traditional software tools on a single computer? We think that is a simple and fair definition. After all, computing power and data volumes have both been growing relentlessly, roughly in line with Moore’s law.

Arguably, real “big data” is determined by more than just the amount you have: the volume, variety and velocity all count. Volume can be measured in terabytes (one terabyte is one million million bytes), exabytes (one exabyte is one quintillion bytes) or yottabytes (one yottabyte is one septillion bytes). Variety refers to the number of types of data, and velocity to the speed of data processing. Additional V’s are occasionally added to the criteria, such as veracity, variability, validity, vulnerability and volatility.
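To make those unit names concrete, here is a small Python sketch of the decimal (powers-of-ten) byte prefixes mentioned above; the dictionary layout is just for illustration.

```python
# Decimal (SI) byte prefixes, as powers of ten.
UNITS = {
    "terabyte": 10**12,   # one million million bytes
    "exabyte": 10**18,    # one quintillion bytes
    "yottabyte": 10**24,  # one septillion bytes
}

def bytes_in(unit: str) -> int:
    """Return the number of bytes in one of the named units."""
    return UNITS[unit]

# One exabyte is a million terabytes; a yottabyte is a million exabytes.
print(f"{bytes_in('exabyte') // bytes_in('terabyte'):,} terabytes per exabyte")
```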

There is no single threshold at which data qualifies as “big”. One thing we know for sure is that data just keeps on growing: the amount of data available in the world appears to be doubling every two to three years.


Data science

Data science, yet to be formally professionalised, is rapidly emerging as an interdisciplinary field covering all activities associated with creating, handling, interpreting, connecting and communicating insights from data. Doing data science ‘right’ is also an essential component of artificial intelligence. Data is, after all, the fuel feeding the machines and algorithms (defined procedures that allow a computer to solve a problem or complete a task) which increasingly make decisions all around us. Using the best data, showing where that data comes from and stating the confidence associated with its accuracy and reliability are all key, and just one part of any data science ethics checklist.

Artificial intelligence (AI)

According to Andrew Moore, the Dean of Computer Science at Carnegie Mellon University, “Artificial intelligence is the science and engineering of making computers behave in ways that, until recently, we thought required human intelligence.”

Only a few decades ago, tasks such as spell checking a document, basic calculation or positioning oneself on a map would have required human brain power, and a machine performing them would have seemed like science-fiction-level artificial intelligence; no one would call them that now. Likewise, things we perceive as AI today may well be viewed as basic tasks for future computers and machines.

Artificial intelligence solutions are all around us these days, from “smart devices” regulating lights, temperature and home entertainment to algorithm-based solutions matching supply to demand in near real time — think Uber, Lyft, TaskRabbit and other consumer platforms.

Augmented Intelligence

Augmented intelligence, sometimes referred to as intelligence augmentation or intelligence amplification, is an alternative conceptualisation of artificial intelligence. Augmented intelligence is not technically different from artificial intelligence; the term’s advocates argue that machines and technology are not meant to replace humans but to work in concert with us, increasing our potential and capabilities. Augmented intelligence denotes a much more positive and optimistic view of how modern technology affects, and will continue to affect, humanity. A great example of augmented intelligence in action is in the legal profession. Machines can mine and analyse text and contract clauses and seek correlations between them — at a pace no human can match! Instead of an army of paralegals carrying out such work, lawyers can focus on advising and consulting their clients, informed by evidence generated by machines.

AI has largely been perceived as a threat to the legal profession. In fact, AI and analytics are helping attorneys become much more knowledgeable, efficient and productive than ever before. That said, we believe the industry will move away from the term, embracing “augmented intelligence” instead. More than just semantics, the shift reinforces the idea that technology exists to help legal professionals perform complex, data-intensive work more efficiently, not replace them.

Machine learning

Machine learning, a subset of artificial intelligence, helps create software that can change and improve its performance without humans having to spell out how to accomplish each task. The goal of machine learning algorithms is to develop programmes that access data and use it to learn for themselves. Machine learning typically requires a very large dataset to train on, and in turn it can analyse datasets far too large for manual inspection. Both the quality and the quantity of the data are crucial during the learning process. Machine learning algorithms are now widely used in many industries, including in medical diagnostics and in finance, where they predict spending patterns and market movements.
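The core idea — a programme improving from data rather than from hand-written rules — can be shown with one of the simplest learning algorithms, fitting a straight line by gradient descent. This is a toy sketch in plain Python; the tiny dataset and learning rate are our own illustrative choices.

```python
# Learn y ≈ w*x + b from example pairs alone; no one tells the
# programme that the underlying rule is y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated by y = 2x + 1

w, b = 0.0, 0.0   # start with no knowledge at all
lr = 0.02          # learning rate: how big each correction step is

for _ in range(5000):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # close to 2.0 and 1.0
```

Each pass nudges the parameters in the direction that reduces the error on the examples — which is why both the quality and quantity of those examples matter so much.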

Deep learning (or deep structured learning)

Deep learning, a subset of machine learning, is loosely modelled on how the human brain works. A machine trains itself to process large volumes of data (e.g. images, sound samples or written text) and categorises new inputs based on previous experience. For example, it can determine that a picture it is fed contains the face of a certain person, or that a short sound sample was the word “Hello.” The larger the dataset fed into the algorithm, the more accurate the results will be. Image classification and facial recognition in photo apps are good examples of deep learning: modern photo apps recognise friends, family members and previously visited locations to allow quick searches and easy filing.
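Real deep learning systems have many layers and millions of parameters, but the mechanism — layers of simple artificial neurons adjusted by backpropagation — can be sketched at miniature scale. The network below, our own toy construction, learns the XOR function, a classic problem that needs a hidden layer.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyNet:
    """2 inputs -> 3 hidden sigmoid units -> 1 sigmoid output."""
    def __init__(self, rng):
        self.w_h = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(3)]
        self.b_h = [rng.uniform(-1, 1) for _ in range(3)]
        self.w_o = [rng.uniform(-1, 1) for _ in range(3)]
        self.b_o = rng.uniform(-1, 1)

    def forward(self, x):
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b)
             for w, b in zip(self.w_h, self.b_h)]
        o = sigmoid(sum(wo * hj for wo, hj in zip(self.w_o, h)) + self.b_o)
        return h, o

    def train_step(self, x, t, lr=0.5):
        h, o = self.forward(x)
        d_o = (o - t) * o * (1 - o)  # output-layer error signal
        for j in range(3):
            # Error signal backpropagated to hidden unit j.
            d_h = d_o * self.w_o[j] * h[j] * (1 - h[j])
            self.w_o[j] -= lr * d_o * h[j]
            self.w_h[j][0] -= lr * d_h * x[0]
            self.w_h[j][1] -= lr * d_h * x[1]
            self.b_h[j] -= lr * d_h
        self.b_o -= lr * d_o

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
net = TinyNet(random.Random(0))
loss_before = sum((net.forward(x)[1] - t) ** 2 for x, t in data)
for _ in range(20000):
    for x, t in data:
        net.train_step(x, t)
loss_after = sum((net.forward(x)[1] - t) ** 2 for x, t in data)
print(f"loss: {loss_before:.3f} -> {loss_after:.3f}")
```

Recognising a face in a photo app works on the same principle, just with far deeper networks and images instead of two-bit inputs — which is why larger training sets give more accurate results.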

Computer vision (CV)

Computer vision is a multidisciplinary subfield of computer science, artificial intelligence and machine learning. Its goal is to “see”: to identify, classify and process images in a way similar to how the human eye and brain perform this task.

An example of computer vision in practice is Fujitsu’s new Judging Support System, which makes it possible for computers to evaluate and score gymnasts’ routines in real time, without any human input during the process.

Natural language processing (NLP) and natural language generation (NLG)

Natural language processing combines aspects of computer science, linguistics and artificial intelligence. Its main objective is to have a computer, machine or IoT (Internet of Things) device understand and interpret human language, spoken or written, and turn it into data in order to perform a particular task. IBM’s Translator Program in the 1960s was one of the first significant uses of NLP; today’s everyday examples of voice input include Amazon’s Alexa, Apple’s Siri and Google Home, devices that humans can control with spoken commands.
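The step from language to actionable data can be sketched in miniature. Production assistants use statistical and neural models, but this toy pattern-matcher, with intents and phrasings entirely of our own invention, shows the shape of the task: free text in, structured “intent” out.

```python
import re

# Hypothetical smart-home intents; the patterns are illustrative only.
INTENT_PATTERNS = [
    ("set_temperature", re.compile(r"set .*temperature to (\d+)")),
    ("play_music", re.compile(r"play (.+)")),
    ("lights_off", re.compile(r"turn off the lights")),
]

def parse_command(text: str) -> dict:
    """Map a free-text command to a structured intent a device can act on."""
    text = text.lower().strip()
    for intent, pattern in INTENT_PATTERNS:
        m = pattern.search(text)
        if m:
            return {"intent": intent,
                    "argument": m.group(1) if m.groups() else None}
    return {"intent": "unknown", "argument": None}

print(parse_command("Please set the temperature to 21"))
# {'intent': 'set_temperature', 'argument': '21'}
```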

Natural language generation has often been regarded as a subset of NLP, as it turns the process around while using the same components, but it has now developed into a discipline in its own right: machines convert data into written or spoken language. Current uses include weather forecasting systems that turn weather data into written forecasts, machine translation (between machines, or between people and machines, e.g. a written engine-check warning in a car), chatbots, and text authoring or summarisation.
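The weather example above can be sketched with the simplest form of NLG, template filling. Commercial systems choose words and sentence structure far more flexibly; the field names and wording here are our own assumptions.

```python
def weather_report(data: dict) -> str:
    """Turn structured weather data into a readable forecast sentence."""
    sentence = (
        f"Expect {data['condition']} in {data['city']} today, "
        f"with a high of {data['high_c']}°C and a low of {data['low_c']}°C."
    )
    # A simple rule adds advice only when the data warrants it.
    if data.get("rain_chance", 0) >= 50:
        sentence += " Take an umbrella."
    return sentence

print(weather_report({
    "city": "London", "condition": "light rain",
    "high_c": 14, "low_c": 9, "rain_chance": 70,
}))
```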

If you really want to learn more about the tools, techniques, methods and opportunities for data science and AI to impact your life and your organisation, why not dig a little deeper and book one of the online courses we offer in collaboration with the Southampton Data Science Academy? Reference “infoNation10datascienceandAIterms” to receive a 20% discount off the course booking fee — a £300 saving! Click here to get in touch.

Bogi Szalacsi is a Senior Associate with infoNation, based in London. You can contact her at and follow her on Twitter: @infoNation5.