Artificial Subconscious: Part 4.2

Sandeep Jain
6 min read · Feb 17, 2018


Framing the complexity of AI patterns for the layperson

Massive Scale of Neural Networks

AI in the Real World

In Part 4.1, you saw a simple function with 2 weights.

f(x) = 2x - 5, or more generally, f(x) = wx + b

Neural networks are a form of AI that can approximate any function, and they are currently all the rage in the industry. In the real world, every aspect of the simple, linear function above scales up by many orders of magnitude. In simple terms, there are a whole lot more functions (f), weights (w), inputs (x), and outputs (y) involved. But the basic concepts from Part 4.1 remain the same.
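In code, that function and its two weights look like this (a minimal sketch):

```python
# The linear function from Part 4.1: f(x) = wx + b.
# The "pattern" the machine knows is entirely captured by w and b.
def f(x, w=2, b=-5):
    return w * x + b

print(f(3))   # 2*3 - 5 = 1
print(f(10))  # 2*10 - 5 = 15
```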

The fact remains that patterns are stored in weights such as 2 and -5, strewn across all these sub-functions. Some neural networks store their intuition in 155 million weights in total. HUGE. One great power of neural networks is that they distill millions of data records into rich gists: a very high-dimensional function beyond the level of human perception. Instead of a straight line, an N-dimensional hyper-surface carves a path through the data, like a snowboarder on fresh snow. It can take weeks to train such neural networks, in spite of hardware advances.

Neural networks use ‘gradient descent’ to discover the function governing the system

Once the learning is complete, these weights are portable. They can be easily transferred from a server farm of a gazillion machines to a smartphone. Applying these weights to make predictions is much, much faster than learning them.
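The gradient-descent idea can be sketched in a few lines (a toy illustration, not the article's code): starting from arbitrary weights, the machine nudges w and b a little after each mistake until it recovers the governing weights, here 2 and -5.

```python
# "Labeled" training pairs generated from the true function y = 2x - 5.
data = [(x, 2 * x - 5) for x in range(-10, 11)]

w, b = 0.0, 0.0          # start with arbitrary weights
lr = 0.01                # learning rate: how big each correction step is
for _ in range(2000):    # repeat many small corrections
    for x, y in data:
        err = (w * x + b) - y   # how wrong the current guess is
        w -= lr * err * x       # nudge each weight "downhill"
        b -= lr * err

print(round(w, 2), round(b, 2))  # approaches 2.0 and -5.0
```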

Sense of Scale (optional read)

Review this to get a sense of scale.

Linear sub-functions inside blocks inside non-linear sub-functions
  1. There are 1000s of linear sub-functions, like f(x) from Part 4.1, each with a large number of weights and a bias (like 2 and -5). The sub-functions are chained together to approximate the overall function governing the system.
  2. Each linear sub-function can take 1000s of inputs (like x), like the 1600 pixel values of a 40x40 digital photo, and there can be millions of photos in the training set.
  3. The linear sub-functions are organized into 100s of blocks, with many sub-parts called layers.
  4. Between each block, there is typically a non-linear sub-function g(z) that takes as input, the output of the previous block.
  5. For the final output of the governing function, there can be 100s of output values (like the rich or poor neighborhoods in Part 4.1), one for each classification. Like, whose face it is amongst the 10,000 employees.
  6. All of these sub-aspects of a neural network combine to make the overarching function that governs the system such as computer vision for detecting objects in a driverless car.
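The pieces listed above (linear sub-functions grouped into a layer, with a non-linear g(z) applied to each output) can be sketched at toy scale. The sizes and weights here are hypothetical; a real network learns its weights during training:

```python
import math
import random

random.seed(0)

def linear(inputs, weights, bias):
    # One linear sub-function: a weighted sum plus a bias,
    # just like f(x) = wx + b but with many inputs.
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def g(z):
    # The non-linear activation applied between blocks (here: a sigmoid).
    return 1 / (1 + math.exp(-z))

def layer(inputs, weight_rows, biases):
    # A tiny layer: each output comes from its own linear sub-function,
    # followed by the non-linear g(z).
    return [g(linear(inputs, w, b)) for w, b in zip(weight_rows, biases)]

x = [0.5, -1.0, 2.0]   # e.g. 3 pixel values (hypothetical)
W1 = [[random.uniform(-1, 1) for _ in x] for _ in range(2)]
b1 = [0.0, 0.0]
out = layer(x, W1, b1)
print(out)  # two numbers squashed between 0 and 1
```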

The non-linear function, g(z), is particularly important for giving neural networks the power to use a curve to more accurately separate the data, rather than a rigid, unchanging line. Curves handle gray areas with more flexibility. g(z) is formally called the ‘activation function’.
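A quick way to see why g(z) matters: without it, chaining linear sub-functions buys nothing, because the chain collapses back into a single straight line (a toy illustration with made-up weights):

```python
# Two linear sub-functions chained WITHOUT an activation in between.
def f1(x):
    return 2 * x - 5     # first linear sub-function

def f2(x):
    return 3 * x + 1     # second linear sub-function

def chained(x):
    # f2(f1(x)) = 3*(2x - 5) + 1 = 6x - 14 ... still just a straight line
    return f2(f1(x))

def collapsed(x):
    return 6 * x - 14    # a single linear function does the same job

print(chained(4), collapsed(4))  # both print 10
```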

Segue

We have just learnt what a machine learns. The next step is to understand how it learns. To give context to how a machine learns, let’s compare:

  • AI software vs traditional software, and
  • supervised vs unsupervised learning

AI vs Traditional Software

Traditional software and Post-Training AI

AI software handles previously unseen data (for its specialized task). Example: it can recognize a handwritten 7 that it has never been trained on, with an error rate as low as 0.2%.

In traditional software, unforeseen input typically results in an error or a system fault.

AI software can learn a function governing a system for which a human being cannot prescribe all the rules and logic — because there are too many, or the gray area of exceptions is too large.

All traditional software engineering involves implementing a function that governs a system: it takes input and produces output. When implementing the function, the human being determines the weights (e.g. 2 and -5 from the function in Part 4.1). The human engineer foresees all the possible inputs (x) and hand-designs weights that represent the rules and logic of the system. Then, given any input (x), the machine applies the weights to produce the output (y).
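As a toy illustration of "the human hand-designs the weights" (made-up thresholds, not numbers from Part 4.1):

```python
# Traditional software, sketched: the HUMAN picks the weights and rules.
def classify_neighborhood(price, lot_size):
    # The engineer foresees the inputs and hard-codes the boundary.
    score = 2 * price - 5 * lot_size   # hand-designed weights, not learned
    if score > 100:
        return "rich"
    return "poor"

print(classify_neighborhood(price=120, lot_size=10))  # "rich"
print(classify_neighborhood(price=10, lot_size=5))    # "poor"
```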

Training Data (x, y) .. x is the image; y is the identity of the object in the image
Data scientist

AI software engineering is exactly the opposite. Most of the effort goes into training the machine. During training, the human data scientist provides the data (x, y), and the machine figures out the weights (2 and -5). After training, the machine is deterministic: for each input (x), it will always output the same prediction (y).

The number of software statements required to implement the AI portion of a solution is usually not more than 100–200 lines, as a guesstimate. Traditional software has many more statements involved to implement the rules and logic.

Unsupervised Learning

These two words, supervised and unsupervised learning, are part of some jargon that you might as well know, in case you get caught off guard in a dinner party conversation. Why you would keep such company in the first place, is beyond me. :)

A couple has their first child. After 6–8 months, they decide to have a party to introduce her to their closest friends and family.

So, imagine the baby sitting on the high chair, looking across the living room at all her guests.

Baby at her first pahty

After a little while of getting used to the crowd, she starts noticing that there are some visitors who have a feminine form like mommy, and some that are more masculine, like daddy. This is a baby. She knows not gender. But, using similarities and dissimilarities, ON HER OWN, she can CLASSIFY people into “like mommy” and “like daddy”. And, then, she finds some outliers (teenagers, other babies, and toddlers). All of this is called Unsupervised Learning.

In AI, this means that during training, the machine receives lots of input, but no output. So, it cannot learn from its mistakes… since there is no answer key. Such a training set is called "unlabeled data". The machine groups the input by similarities and creates gists/models that describe 'belonging' in the various groups. After the training, given a previously unseen data point, the AI can quickly identify whether it belongs in one of the normal groups. An outlier (an anomaly at a distance from the gists) might suggest a problem, such as a hack in the Internet of Things.
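At toy scale, this grouping-by-similarity can be sketched as a two-group clustering pass over made-up guest heights (a hypothetical illustration; real unsupervised learning uses many more features than one):

```python
# Unsupervised learning, sketched: only inputs, no answer key.
# Hypothetical guest heights in cm -- roughly "like mommy" vs "like daddy".
heights = [158, 162, 165, 160, 178, 183, 180, 185]  # unlabeled data

# A tiny 1-D k-means: two group centers, refined by averaging.
c1, c2 = min(heights), max(heights)
for _ in range(10):
    g1 = [h for h in heights if abs(h - c1) <= abs(h - c2)]
    g2 = [h for h in heights if abs(h - c1) > abs(h - c2)]
    c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)

print(sorted(g1), sorted(g2))  # the shorter group and the taller group
```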

Supervised Learning

Now, the younger brother of the father comes up to the baby. He is just so happy, playful and all cheers to meet his niece. The niece is also totally delighted to meet her uncle.

Happy Baby

She gets these familiar feelings from her uncle, and says… "Daada". Her mother says… "No, no. That's not your daada. This other person is your daada." So, the baby looks back and forth between her father and her uncle to learn the difference. The mother plays the supervisory role, providing the answer key. This is called Supervised Learning.

Neural networks approximate functions. A function by definition has an output. During training, the machine learns from making mistakes…the training set has an answer key for each record. For the real estate example in Part 4.1, the training set would contain (price, lot size) as the input(x) and (neighborhood) as the output (y). Such a training set is called “labeled data”. The training set is labeled with the neighborhood.
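A minimal sketch of learning from labeled data (hypothetical numbers; a simple nearest-neighbor lookup stands in here for a trained neural network):

```python
# Supervised learning, sketched: the training set carries an answer key.
labeled = [
    ((900, 2.0), "rich"),   # ((price in $k, lot size in acres), neighborhood)
    ((850, 1.8), "rich"),
    ((200, 0.2), "poor"),
    ((250, 0.3), "poor"),
]

def predict(price, lot_size):
    # 1-nearest-neighbor: answer with the label of the closest training record.
    def dist(record):
        (p, l), _ = record
        return (p - price) ** 2 + (l - lot_size) ** 2
    return min(labeled, key=dist)[1]

print(predict(880, 1.9))   # "rich"
print(predict(220, 0.25))  # "poor"
```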

Learning from mistakes is the foundation of neural networks. We will deep dive into supervised learning for neural networks in the next few articles.
