Take a step back in history with the archives of PragPub magazine. The Pragmatic Programmers hope you’ll find that learning about the past can help you make better decisions for the future.

FROM THE ARCHIVES OF PRAGPUB MAGAZINE, FEBRUARY 2019

Does Machine Learning Really Involve Data? Refining Our Terms

PragPub, published in The Pragmatic Programmers, Feb 11, 2022

by Frances Buontempo

⏲ The history of machine learning can teach us a lot about its essential features. They may not be what we think.

Many definitions of machine learning start by proclaiming that it uses data to learn. I want to challenge this, or at least remind us where the term originally came from, and consider why the meaning has shifted.

For a long time, machine learning seemed to be a new technology, but I notice we’re starting to say AI and machine learning interchangeably. Job postings often sneak the word scientist in there too. What is a data scientist? What do any of these words mean?

Image by mikemacmarketing, CC BY 2.0 via Wikimedia Commons

Current trends often come with an air of mystery. I suspect a lot of data science roles involve data entry, in order to clean input data. Not as appealing as the headline role suggests. Several day-to-day techniques being described as machine learning could also be described as statistics.

In fact, look at the table of contents of a statistics book, such as An Introduction to Statistical Learning. Here is a small selection of the topics:

• accuracy

• k-means clustering

• making predictions

• cross-validation

• support vector machines, SVM

• principal component analysis, PCA

Most, if not all, of these topics are covered in an average machine learning course and included in ML software packages. Yet to many people, statistics doesn’t sound as exciting as machine learning.
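To make one of the listed topics concrete, here is a minimal sketch of k-means clustering on one-dimensional points, using only Python’s standard library. The function name `k_means_1d`, the starting centroids, and the sample points are assumptions made for illustration; they are not taken from the book mentioned above or from any ML package.

```python
from statistics import mean


def k_means_1d(points, centroids, iterations=10):
    """Cluster 1-D points around the given starting centroids.

    Illustrative sketch only: real packages handle many dimensions,
    smarter initialisation, and convergence tolerances.
    """
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its old centroid.
        new_centroids = [mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has settled
        centroids = new_centroids
    return centroids, clusters
```

Calling `k_means_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])` settles on two groups, one around the low values and one around the high values. Nothing here goes beyond repeatedly taking averages, which is rather the point.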

Wikipedia defines statistics as “a branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation.” No mention of learning, though each of these activities forms an essential part of data science. The article goes on to discuss descriptive and inferential statistics. Inference involves making predictions: many people use the term machine learning to mean the very same. Can you spot patterns in purchases automatically and suggest other items a customer might be interested in? Can you detect unusual or anomalous behavior, indicating fraud or similar? Again, these are now labeled as AI or machine learning, but usually rely on well-established statistical techniques. Admittedly, today’s faster machines mean number crunching can happen quickly. This has contributed to the resurgence of machine learning.
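As a small illustration of how far plain statistics can go, the sketch below flags unusual values using nothing more than a mean and a standard deviation (a z-score test). The function name `find_anomalies`, the threshold of 2.0, and the idea of treating the values as purchase amounts are assumptions for the example; real fraud detection is considerably more involved.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` sample standard
    deviations away from the mean.

    Illustrative sketch: the threshold is an arbitrary choice, and a
    single extreme value also inflates the standard deviation itself.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation; needs 2+ values
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]
```

Given amounts such as `[20, 22, 19, 21, 20, 23, 18, 21, 500]`, only the 500 is flagged. Whether you call that machine learning, AI, or descriptive statistics is largely a matter of labeling.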

Many problem-solving algorithms are not about numbers. Some techniques, such as evolutionary computing, including genetic algorithms, don’t fit comfortably into a data-driven view of learning. Do these methods count as machine learning? I’ll leave that for you to think about. My book explores genetic algorithms and several other areas that do not need numbers to learn.
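For a flavor of what a genetic algorithm looks like, here is a minimal sketch that evolves random strings toward a target phrase. There is no training data set: candidates are scored by a fitness function, and the better ones are recombined and mutated. Every name and parameter here (`TARGET`, `fitness`, `mutate`, `crossover`, `evolve`, the mutation rate, the population size) is an assumption for illustration, not code from the book.

```python
import random

TARGET = "let the games continue"  # hypothetical goal for the example
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def fitness(candidate):
    """Count the positions where the candidate matches the target."""
    return sum(1 for a, b in zip(candidate, TARGET) if a == b)


def mutate(candidate, rng, rate=0.05):
    """Replace each character with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch
                   for ch in candidate)


def crossover(mum, dad, rng):
    """Splice the start of one parent onto the end of the other."""
    cut = rng.randrange(1, len(TARGET))
    return mum[:cut] + dad[cut:]


def evolve(generations=200, population_size=100, seed=0):
    """Return the best string found and the best fitness per generation."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in TARGET)
                  for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        if population[0] == TARGET:
            break
        # Breed the next generation from the fitter half.
        parents = population[: population_size // 2]
        children = [mutate(crossover(rng.choice(parents),
                                     rng.choice(parents), rng), rng)
                    for _ in range(population_size - 1)]
        # Elitism: carry the current best forward unchanged, so the
        # best fitness never gets worse from one generation to the next.
        population = [population[0]] + children
    return max(population, key=fitness), history
```

Calling `best, history = evolve()` returns the fittest string found along with the best score at each generation. The only “knowledge” supplied is a way to tell better from worse, which is why techniques like this sit awkwardly in a data-driven definition of learning.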

Arthur Samuel coined the phrase “machine learning,” by which he meant something along the lines of a “field of study that gives computers the ability to learn without being explicitly programmed.” The abstract of his 1959 paper, “Some Studies in Machine Learning Using the Game of Checkers,” states:

Two machine-learning procedures have been investigated in some detail using the game of checkers. Enough work has been done to verify the fact that a computer can be programmed so that it will learn to play a better game of checkers than can be played by the person who wrote the program. Furthermore, it can learn to do this in a remarkably short period of time (8 or 10 hours of machine-playing time) when given only the rules of the game, a sense of direction, and a redundant and incomplete list of parameters which are thought to have something to do with the game, but whose correct signs and relative weights are unknown and unspecified. The principles of machine learning verified by these experiments are, of course, applicable to many other situations.

AI and machine learning are both very old terms. I think they encompass a much broader field than data analysis. As a final thought, Alan Turing designed an algorithm to play chess. In effect, he was trying to make an artificial brain, before the term AI was invented or computers, in their modern sense, existed.

I think machine learning is much broader than investigating data. Its history involves attempting to get computers to learn, and specifically to learn to play games. Let the games continue.

Read my book and see what you think.

Frances Buontempo shares her wisdom in her book with The Pragmatic Bookshelf, Genetic Algorithms and Machine Learning for Programmers:

Genetic Algorithms and Machine Learning for Programmers

Self-driving cars, natural language recognition, and online recommendation engines are all possible thanks to Machine…

pragprog.com

About Frances

Frances Buontempo is the editor of ACCU’s Overload magazine. She has published articles and given talks centered on technology and machine learning. With a Ph.D. in data mining, she has been programming professionally since the 1990s. During her career as a programmer, she has championed unit testing, mentored newer developers, deleted quite a bit of code, and fixed a variety of bugs.
