The Scouts of Oompa Kicchu: Part 5.3

Objective: Explain input to AI for lay persons

Sandeep Jain
Mar 26, 2018

Before the advent of deep learning, there were shallow neural networks. For now, let’s just say that shallow neural networks were a lot less sophisticated.

Continuing with the allegory from Part 5.2, this part explains the evolution beyond perceptrons and the difference between two forms of input to AI: structured data and unstructured data.

The allegory is carefully mapped to neural networks, and it is a good way to develop your intuition without partial differential equations or probability.

The Tower of Wisdom

Tower of Wisdom (source)

Given the daily threat of beasts from the forest, the tribe needed a better (more flexible) way to classify threats than the Council of Perceptrons.

If one council wasn’t working, they wondered, why not try two councils of elders?

A simple idea, and it worked.

The independence of each council as well as their collaboration enabled greater flexibility and more minds to learn and classify beasts and their threatening characteristics.

And so, the Tower of Wisdom was founded. Each council had its own floor. Unfortunately, they couldn’t get more than a few councils to collaborate effectively. For some reason, the ability to learn distinguishing characteristics diminished if they added additional floors.

With just a few floors, there came a major limitation. Given the wide range of beasts and their subtle characteristics indicating threat, there were way too many snapshots for the low tower to develop an accurate intuition and remain flexible. The council decided to change the services of the courageous scouts from taking snapshots to conducting a deeper study of the beasts.

It would take another invention to add more floors to the tower. That invention led to the founding of the High Tower of Propagated Wisdom. That story is for later.

The Scouts

Scouts of Oompa Kicchu (source)

The indomitable scouts were instructed to venture deep into the forest to identify lurking beasts, grade their distinguishing features, run back to the council, and read out their reports. These reports contained a lot less information than snapshots, enabling the two-storey Tower to learn and classify flexibly.

Some of the distinguishing features were:

  • Beast type (an index from 1 to 500),
  • Sharpness of teeth (1–5),
  • Length of tail (1–10),
  • Penchant for eating people (0 or 1),
  • Likelihood of rampaging the village (0–10).
Scout investigating a beast
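The scouts’ reports are what a data scientist would call a structured feature vector: each distinguishing feature becomes one number in a fixed-length list. A minimal sketch, with hypothetical values for one beast:

```python
# A scout's report as structured data: each distinguishing feature
# becomes one number in a fixed-length vector. Values are hypothetical.
beast_report = {
    "beast_type": 137,        # index from 1 to 500
    "teeth_sharpness": 4,     # 1-5
    "tail_length": 7,         # 1-10
    "eats_people": 1,         # 0 or 1
    "rampage_likelihood": 8,  # 0-10
}

# The network only sees the ordered numbers, not the feature names.
feature_vector = list(beast_report.values())
print(feature_vector)  # [137, 4, 7, 1, 8]
```

Every beast gets reduced to the same five numbers in the same order, which is exactly what makes the data "structured".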

The Tower of Wisdom improved accuracy in classifying beastly threats over perceptrons. However, the scouts’ observations emerged from a process that was more art than science. This trial and error proved costly. Too many clues indicating a threat were missed.

Sadly, many scout lives were lost to misidentification and misclassification.

Allegory Unveiled

Connecting the dots

  • Tower of Wisdom = Neural networks (the early, shallow kind). Deep learning later improved upon them.
  • Council member = A neuron. Neurons represent patterns that glean unique features from the input. Patterns are captured in weights (Part 4.1)
  • Councils/Floors of the Tower = Layers of Neurons. Each layer collaborates with the next layer. Non-linear collaboration is the key difference from perceptrons. Non-linearity was discussed in Part 4.2.
  • Scouts = Product managers and data scientists. Data is the new oil. Product managers with domain expertise are ideally responsible for hauling the oil to their company. This means identifying content sources and annotating features. Together with data scientists, they refine it before feeding it to AI.
  • Forest = Domain of Interest (beastly threats)
  • Features of Beasts = Structured Data
  • Snapshot = Unstructured Data

Feature Engineering

Prior to deep learning, only structured data with human-scrutinized, hand-designed features could be classified by neural networks.

Domain Expert Identifying Features of the input

Scan this real estate example.

Structured data: Residential Listing’s features

For example, the exterior paint color would be an index into a list of 100 possible colors. Could the color of the house affect price? The AI would uncover a relationship if the data contained it.

Data scientists spend an enormous amount of time circling back to domain experts to identify features, improve and augment third-party datasets, filter out wrongly identified or incomplete records, and normalize those features for input into the learning system.

This is called feature engineering, or more colloquially, ‘massaging the data’.
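One small taste of that massaging is normalization: rescaling each feature so that features with large ranges (square footage) don’t drown out features with small ranges (number of bathrooms). A minimal min-max scaling sketch, with illustrative listing values:

```python
# Min-max normalization: rescale a feature to the range [0, 1] so no
# single feature dominates the learning system just because its raw
# numbers are bigger. Values are illustrative.
def normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

square_feet = [850, 1200, 2400, 3100]
scaled = normalize(square_feet)
print(scaled[0], scaled[-1])  # 0.0 1.0
```

The smallest listing maps to 0, the largest to 1, and everything else falls in between.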

Costly Adventures into the Forest

Time is Money (source)

Typical A.I. specialists, including both Ph.D.s fresh out of school and people with less education and just a few years of experience, can be paid from $300,000 to $500,000 a year or more in salary and company stock, according to nine people who work for major tech companies or have entertained job offers from them. — source: NYT

For many companies, data massaging is one expensive operation, and more of an art than a science.

With the advent of modern deep learning, classifying unstructured data became more viable. Data scientists provide raw data (like images, text, or audio) with minor tweaks. The machine does a much better job at identifying the distinguishing features automagically as they pertain to the classification.

For images (i.e. for machines to see what people see), the input would be an array of numbers representing color intensities.
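To make that concrete, here is a toy 3×3 grayscale "image" as an array of intensities (0–255), flattened into the flat list of numbers a simple network would receive. Real images add color channels and vastly more pixels:

```python
# An image as raw, unstructured input: a grid of pixel intensities
# (0-255), flattened into a plain list of numbers for the network.
image = [
    [  0, 128, 255],
    [ 64, 192,  32],
    [255,   0, 128],
]

flat = [pixel for row in image for pixel in row]
print(len(flat))  # 9
```

Note that nobody hand-designed these nine numbers as "features"; the deep network discovers the distinguishing patterns itself.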

If nasal scents could be digitized, I have no doubt that neural networks could learn the sense of smell too.

Feature engineering is still common where structured data is involved, such as predicting weather using numeric measurements from say, atmospheric pressure sensors, though satellite imagery would be used ‘raw’.

We just discussed the input provided to a neural network that has already been trained. In the next part (coming soon), we will visit the High Tower of Propagated Wisdom where we will learn more about layered neurons and how the input becomes the output, i.e. the prediction.
