Series: Artificial Intelligence

Fastai Course Chapter 1 Q&A on Mac

An answer key for the questionnaire at the end of the chapter

David Littlefield
Mac O’Clock

--

The 1st chapter of the textbook provides a broad overview of artificial intelligence. It covers some prerequisites, applications, milestones, terminology, and mechanics of the subject. It also demonstrates the code that’s used for loading datasets, training models, and making predictions.

We’ve spent many weeks writing the questionnaires. And the reason for that, is because we tried to think about what we wanted you to take away from each chapter. So if you read the questionnaire first, you can find out which things we think you should know before you move on, so please make sure to do the questionnaire before you move onto the next chapter.

— Jeremy Howard, Fast.ai

1. Do you need these for deep learning?

Lots of math?

No, advanced mathematics isn't needed to get started with deep learning. Most deep learning libraries handle the math in the background. It only becomes necessary for fine-tuning models, which does require working with linear algebra, multivariable calculus, and probability and statistics.

Lots of data?

No, small datasets can produce results similar to large datasets when the data is high-quality. Success depends on whether the data is accurate, complete, relevant, valid, timely, and consistent. It also helps to use semi-supervised learning, data augmentation, fine-tuning, and differential learning rates.

Lots of expensive computers?

No, it doesn’t take an expensive computer to get started with deep learning. It can be learned for free using cloud services such as Gradient, Colab, and Kernels, which provide access to high-performance graphics cards. These services might have delays and varying performance due to fluctuating availability.

A PhD?

No, a PhD isn’t required to learn and apply deep learning. A PhD trains students to be productive scholars and researchers in an academic environment. It also focuses largely on improving the theory and/or tools that are used for deep learning, which does require a strong understanding of mathematics.

2. Name five areas where deep learning is now the best in the world:

Natural Language Processing (NLP) is a branch of artificial intelligence that’s concerned with giving computers the ability to understand text and spoken words. It combines computational linguistics with statistical, machine learning, and deep learning models. It also enables computers to process human language in the form of text or voice data and to understand its full meaning, complete with the intent and sentiment of the speaker or writer.

Computer Vision is a branch of artificial intelligence that’s concerned with giving computers the ability to derive meaningful information from digital images, videos, and other visual inputs. It enables computers to extract, analyze, and understand the useful information from an image or sequence of images. It also enables computers to use that information to automate tasks such as taking actions and making decisions or recommendations.

Image Generation is a branch of artificial intelligence that’s concerned with giving computers the ability to create new material such as images, music, speech, and text. It enables computers to use separate neural networks that compete with each other to create new material and detect its authenticity. The networks continuously learn to generate better material and make better detections until the new material passes as authentic, which produces the final output.

Medicine is an area where artificial intelligence is used to analyze, understand, and present complex medical data. It enables computers to assist with diagnosis processes, treatment protocol development, drug development, precision medicine, and patient monitoring and care. It also enables computers to analyze large amounts of data from electronic health records for disease prevention and diagnosis.

Biology is an area where artificial intelligence is used to survey, analyze, and classify biological data. It enables computers to assist with cellular image classification, genomic medicine, biomarker development, and drug discovery. It also enables computers to use comprehensive biological information about a person to predict and diagnose diseases and identify or develop the best therapies.

3. What was the name of the first device that was based on the principle of the artificial neuron?

The Mark I Perceptron is a device that’s considered the first artificial neural network ever created. It was developed in 1957 by Frank Rosenblatt for the purpose of image classification. It was also physically made up of an array of sensory units that were randomly connected to association units whose weights were recorded in potentiometers and adjusted by electronic motors.

[Image: the Mark I Perceptron, from its owner's manual]

4. Based on the book of the same name, what are the requirements for parallel distributed processing (PDP)?

Parallel Distributed Processing (PDP) is a model that’s used in psychology to explain cognitive processes. It proposes that cognitive processes arise from interactions between a large number of neurons via synaptic connections. It also proposes that the knowledge that governs processing is stored in the strengths of the connections and is acquired gradually through experience. The book lists eight requirements:

  • A Set of Processing Units
  • A State of Activation
  • An Output Function for each unit
  • A Pattern of Connectivity among units
  • A Propagation Rule for propagating patterns of activities through the network of connectivities
  • An Activation Rule for combining the inputs impinging on a unit with the current state of that unit to produce a new level of activation for the unit
  • A Learning Rule whereby patterns of connectivity are modified by experience
  • An Environment within which the system must operate

5. What were the two theoretical misunderstandings that held back the field of neural networks?

The first theoretical misunderstanding was that neural networks couldn’t classify non-linear patterns. This was based on the book Perceptrons, which proved that a single-layer perceptron couldn’t classify non-linear patterns. The book also speculated that a multilayer perceptron might overcome this limitation, but it was still mistaken as evidence against neural networks in general.

The second theoretical misunderstanding was that neural networks were too big and slow to be useful. This was based on their reputation for being hard to work with and on the two-layer networks of the time, which theory said were sufficient but which performed poorly in practice. The misunderstanding persisted after multilayer perceptrons had been proven effective in theory, because it wasn’t yet known that neural networks were limited by the insufficient datasets, computing power, and algorithms of that era.

6. What is a GPU?

The Graphics Processing Unit (GPU) is a specialized processor that’s used to accelerate graphics rendering. It can handle thousands of small tasks in parallel, like displaying three-dimensional environments in computer games. It can also run neural networks hundreds of times faster than central processing units.
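
If you’re curious whether PyTorch can see a graphics card on your machine, here’s a quick check (a sketch, assuming PyTorch is installed; on Apple silicon Macs the MPS backend stands in for CUDA):

    import torch

    # CUDA (NVIDIA) availability; False on Macs without NVIDIA hardware.
    print(torch.cuda.is_available())

    # Apple silicon GPUs are exposed through the MPS backend instead
    # (available in PyTorch 1.12 and later).
    print(torch.backends.mps.is_available())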

7. Open a notebook and execute a cell containing: 1+1. What happens?

Answer: Runs the expression and prints the result, which is 2.

8. Follow through each cell of the stripped version of the notebook for this chapter. Before executing each cell, guess what will happen.

The stripped version of this notebook contains executable code without the text from the textbook. It can be viewed in the appendix of the article which contains screenshots that display the code and results separately. It can also be found in the 01_intro.ipynb file that’s located in the clean subdirectory.

9. Complete the Jupyter Notebook online appendix.

Jupyter Notebook Online Appendix is a notebook file that’s used in Fast.ai to introduce students to Jupyter Notebook. It has explanations, instructions, and screenshots that show how to use the user interface, markdown format, keyboard shortcuts, and code capabilities. It also has two versions that are stored as app_jupyter.ipynb in the root directory and clean subdirectory.

10. Why is it hard to use a traditional computer program to recognize images in a photo?

Developing a traditional computer program to recognize images in a photo would be very hard. It would require identifying every step that’s involved in the process and translating the steps into code. It also isn’t clear how the human brain recognizes images which makes it harder to identify the steps.

11. What did Samuel mean by “weight assignment”?

Weight Assignment is a term that Arthur Samuel used to describe the values that are currently assigned to the weights in an artificial neural network. It can be considered another kind of input because the values affect the model and determine how it operates. It also produces a measurable performance which can be automatically tested and improved by adjusting the values.

12. What term do we normally use in deep learning for what Samuel called “weights”?

The Model Parameter is a variable that’s used in machine learning to store a weight or bias that the model uses to make predictions. It contains a value that’s automatically estimated by an optimization algorithm during the training process and saved as part of the trained model. Its values, which depend on the dataset, determine how well the model performs on unseen data.

13. Draw a picture that summarizes Samuel’s view of a machine learning model.

[Image: the detailed training loop — inputs and weights feed a model, the results are measured for performance, and the performance is used to update the weights]

14. Why is it hard to understand why a deep learning model makes a particular prediction?

Deep learning models can have hundreds of layers, which makes it hard to identify which factors are important in determining the final output. They can also have millions or even billions of model parameters that interact with each other, which makes it even harder to understand why a deep learning model makes a particular prediction.

15. What is the name of the theorem that shows that a neural network can solve any mathematical problem to any level of accuracy?

The Universal Approximation Theorem is a theorem which implies that an artificial neural network can represent a wide range of interesting functions when given the appropriate weights. It states that a feedforward neural network with one hidden layer that contains a sufficient but finite number of neurons can approximate any continuous function to any desired accuracy.
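
Stated informally (a sketch of the classic one-hidden-layer version, where σ is a sigmoid-like activation): for any continuous function f on a compact set K and any tolerance ε > 0, there exist a finite number of neurons N with weights wᵢ, biases bᵢ, and output coefficients αᵢ such that

    \left| f(x) - \sum_{i=1}^{N} \alpha_i \,\sigma\left(w_i^{\top} x + b_i\right) \right| < \epsilon \quad \text{for all } x \in K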

16. What do you need in order to train a model?

Training is a process in machine learning that’s used to build a model that can make accurate predictions on unseen data. It requires an architecture, a dataset, hyperparameters, a loss function, and an optimizer. It also involves splitting the dataset into training, validation, and testing data, making predictions about the data, calculating the loss, and updating the weights.
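
All of these pieces appear in the chapter’s cat-recognizer example; a minimal sketch of that code (fastai v2, where newer releases rename cnn_learner to vision_learner):

    from fastai.vision.all import *

    # Dataset: download the Oxford-IIIT Pet images.
    path = untar_data(URLs.PETS)/'images'

    # Labels: in this dataset, cat filenames start with an uppercase letter.
    def is_cat(x): return x[0].isupper()

    # Split the data 80/20 and resize every image to 224 pixels.
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2, seed=42,
        label_func=is_cat, item_tfms=Resize(224))

    # Architecture + pretrained weights + metric, then train.
    learn = cnn_learner(dls, resnet34, metrics=error_rate)
    learn.fine_tune(1)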

17. How could a feedback loop impact the rollout of a predictive policing model?

The predictive policing model is created using historical crime data, which represents the crimes that were documented rather than all the crimes that were committed. It contains biases from the historical crime data, which heavily influence predictions about where to focus policing activity. This leads to more arrests in those areas, which perpetuates the positive feedback loop.

18. Do we always have to use 224×224-pixel images with the cat recognition model?

No, but there are limits to how much the image dimensions can be changed. Training produces an error if the images are too small, which occurs when the neural network runs out of data while reducing the dimensions during forward propagation. It can also fail to reach reasonable accuracy if the images are too large, which occurs when there aren’t enough layers to learn distinct filters.
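
For example, swapping the Resize value in the chapter’s cat-recognizer code works within those limits (a sketch, reusing path and is_cat from the training example above):

    # 128-pixel inputs instead of 224: smaller and faster to train,
    # at some cost in accuracy; going too small eventually errors out.
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2, seed=42,
        label_func=is_cat, item_tfms=Resize(128))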

19. What is the difference between classification and regression?

Classification is a subcategory of supervised learning that’s used to classify unseen data as one of two or more categories. It applies classification algorithms to the training dataset to identify the shared characteristics of each category. It also compares the characteristics to the unseen data and predicts the probability that the data belongs to each of the categories.

Regression is a subcategory of supervised learning that’s used to predict a continuous value. It applies regression algorithms to the training dataset to identify the line or curve that best represents the relationship between the variables. It also predicts how the dependent variable changes in relation to an independent variable when other independent variables are held fixed.

20. What is a validation set? What is a test set? Why do we need them?

Validation Set is a dataset that’s used in machine learning to provide an unbiased evaluation of the model and tune the model hyperparameters. It contains around 10–20% of the total dataset that’s used for determining whether the model is correctly identifying new data or overfitting. It also includes labels and annotations that are used for supervised learning.

Testing Set is a dataset that’s used in machine learning to provide a final unbiased evaluation of the model after it’s been fully trained. It contains around 10–20% of the total dataset that’s used for further determining whether the model is correctly identifying new data or overfitting. It also includes labels and annotations that are used for supervised learning.

The training process needs the validation set to prevent overfitting. It tests the model’s performance after each epoch to identify where performance drops from overfitting and to tune the hyperparameters to further improve performance. It also needs the test set to test the model’s performance on unseen data to provide an estimate of how it will perform in production.

21. What will fastai do if you don’t provide a validation set?

fastai automatically creates a validation set from 20% of the training data; the valid_pct parameter of its DataLoaders factory methods defaults to 0.2.

22. Can we always use a random sample for a validation set? Why or why not?

No, time series forecasting doesn’t work well with random samples. It needs the data to be split into different time periods, where most of the data is used as historical data for training and only the most recent data is used as future data for validation. Random sampling leads to overly optimistic results that generalize poorly because it mixes past and future data in the validation set.
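
A sketch of a time-based split with pandas, using a hypothetical sales.csv with a date column and an arbitrary cutoff date:

    import pandas as pd

    # Hypothetical time series data.
    df = pd.read_csv('sales.csv', parse_dates=['date'])

    # Train on the past, validate on the most recent period,
    # instead of sampling rows at random.
    cutoff = pd.Timestamp('2023-01-01')
    train_df = df[df['date'] < cutoff]
    valid_df = df[df['date'] >= cutoff]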

23. What is overfitting? Provide an example.

Overfitting is an error in machine learning that occurs when the model performs well on the training data but doesn’t generalize well to unseen data. It can be the result of overtraining, a lack of validation or improper validation, excessive weight adjustments, or too many optimization attempts. It can also be the result of using training data that contains meaningless information. For example, a model that memorizes its training images can reach near-perfect training accuracy yet misclassify new photos it has never seen.

24. What is a metric? How does it differ from “loss”?

The Evaluation Metric is a function that’s used in machine learning to test the performance of the model. It can measure the accuracy, precision, and recall of models with balanced datasets, or the Area Under the Receiver Operating Characteristic curve (AUC-ROC) for models with imbalanced datasets. A single metric can also return misleading results, which is why multiple metrics are often used.

The Loss Function is a function that’s used in machine learning to evaluate how well an algorithm performs on the given data. It calculates the loss of each training iteration which measures the mathematical distance between the predicted value and the actual value. It also gets used to calculate the gradient during the training process which is needed to update the weights.
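
The difference is easy to see in a small PyTorch sketch with hypothetical predictions and labels: the loss is a smooth, differentiable number the optimizer can follow downhill, while the metric is the human-readable score.

    import torch
    import torch.nn.functional as F

    # Hypothetical logits for four examples and their true labels.
    logits = torch.tensor([[2.0, 0.5], [0.1, 1.5], [1.2, 0.3], [0.2, 2.1]])
    labels = torch.tensor([0, 1, 0, 0])

    # Loss: differentiable, drives the weight updates.
    loss = F.cross_entropy(logits, labels)

    # Metric: readable, but its hard steps make it unusable for gradients.
    accuracy = (logits.argmax(dim=1) == labels).float().mean()

    print(loss.item(), accuracy.item())  # roughly 0.70, 0.75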

25. How can pretrained models help?

The Pretrained Model is a model that’s used in machine learning to perform a specific task. It has already been trained on a large dataset, and its weights and biases represent the features that were learned from that dataset. It can also be retrained to perform a similar task using transfer learning, which produces a model with greater accuracy that requires less data, time, and resources.

26. What is the “head” of a model?

The Head is an analogy in machine learning that’s used to represent the output layer of an artificial neural network. It interprets the model as a backbone plus a head where the backbone refers to the architecture of the model and the head refers to the final layer of the architecture. It also interprets transfer learning as replacing the head of a pretrained backbone.
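
In fastai you can peek at this split directly: cnn_learner typically builds the model as a two-part Sequential, with the pretrained backbone first and the newly created head last (a sketch, assuming the dls from the earlier training example):

    from fastai.vision.all import *

    learn = cnn_learner(dls, resnet34, metrics=error_rate)

    # model[0] is the pretrained backbone; model[-1] is the new head
    # that fine-tuning trains from scratch for your categories.
    print(learn.model[-1])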

27. What kinds of features do the early layers of a CNN find? How about the later layers?

The earlier layers of a convolutional neural network are used to detect the low-level features in an image. The network learns to detect edges and colors in the first layer, which become the building blocks for detecting textures made from combinations of edges and colors in the second layer. It continues to learn to detect more sophisticated features with each additional layer.

The later layers of a convolutional neural network are used to detect the high-level features in an image. The network learns to detect sophisticated patterns that resemble textures found in objects such as eyes, ears, and noses. It eventually learns to detect objects such as humans and animals, which become the building blocks for detecting the specific objects in the dataset.

28. Are image models only useful for photos?

No, non-image data can be classified with image models to achieve high accuracy as long as the data is transformed into an image. This can involve plotting data by placing similar features together and dissimilar features further apart, which groups the neighboring features. It can also uncover hidden patterns and/or relationships between sets of features in the data.

29. What is an “architecture”?

The Architecture is a template that’s used in machine learning to build neural networks. It defines the number, size, and type of layers that are used in the neural network which represents the mathematical function that’s used to train the model. It can also represent any type of neural network for supervised, unsupervised, hybrid, or reinforcement learning.

30. What is segmentation?

Image Segmentation is a process in machine learning that’s used to partition an image into distinct regions that contain pixels with similar attributes. It can locate the objects in an image and color-code each pixel that represents a particular object. It can also color-code each pixel that belongs to a certain class or color-code each instance of an object that belongs to the same class.
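
Chapter 1 demonstrates this with a small street-scene dataset; roughly:

    from fastai.vision.all import *

    path = untar_data(URLs.CAMVID_TINY)

    # Each image has a label file that color-codes every pixel's class.
    dls = SegmentationDataLoaders.from_label_func(
        path, bs=8, fnames=get_image_files(path/'images'),
        label_func=lambda o: path/'labels'/f'{o.stem}_P{o.suffix}',
        codes=np.loadtxt(path/'codes.txt', dtype=str))

    learn = unet_learner(dls, resnet34)
    learn.fine_tune(8)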

31. What is y_range used for? When do we need it?

y_range is a parameter that’s used in fastai to instruct the framework to predict numerical values instead of categorical values. It limits the values that are predicted for the dependent variable when performing regression. It also lets you manually specify the minimum and maximum values, which forces the model to output values within that range.
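
Chapter 1 uses it in the collaborative-filtering example, where predicted movie ratings are clamped to the valid range; roughly:

    from fastai.collab import *

    path = untar_data(URLs.ML_SAMPLE)
    dls = CollabDataLoaders.from_csv(path/'ratings.csv')

    # Ratings run from 0.5 to 5 stars; the upper bound is set slightly
    # higher because the underlying sigmoid never quite reaches it.
    learn = collab_learner(dls, y_range=(0.5, 5.5))
    learn.fine_tune(10)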

32. What are “hyperparameters”?

The Hyperparameter is a variable that’s used in machine learning to tune the model to make accurate predictions. It sets a value for choices like the learning rate, number of epochs, number of hidden layers, and activation functions, which control the training process. It also must be set manually before training begins, and it significantly impacts the performance of the model.
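
For example, reusing the learner from the earlier sketches, both arguments to fine_tune below are hyperparameters: you choose them before training rather than the model learning them from data.

    # Epoch count and base learning rate are set by hand, not learned.
    learn.fine_tune(3, base_lr=1e-3)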

33. What’s the best way to avoid failures when using AI in an organization?

The best way to avoid failure when introducing artificial intelligence into an organization is to understand and use validation and test sets. It can greatly reduce the risk of failure by setting aside some data that’s separate from the data that’s given to the external vendor or service provider. It also lets the organization evaluate the true performance of the model with the test data.

“Hopefully, this article helped you get the 👯‍♀️🏆👯‍♀️. Remember to subscribe to get more content 🏅”

Next Steps:

This article is part of a series that helps you set up everything you need to complete the Fast.ai course from start to finish. It contains guides that provide answers to the questionnaire at the end of each chapter from the textbook. It also contains guides that walk through the code step-by-step using definitions of terms and commands, instructions, and screenshots.

Mac:
01. Install the Fastai Requirements
02. Fastai Course Chapter 1 Q&A
03. Fastai Course Chapter 1
04. Fastai Course Chapter 2 Q&A
05. Fastai Course Chapter 2
06. Fastai Course Chapter 3 Q&A
07. Fastai Course Chapter 3
08. Fastai Course Chapter 4 Q&A

Additional Resources:

This article is part of a series that helps you set up everything you need to start using artificial intelligence, machine learning, and deep learning. It contains expanded guides that provide definitions of terms and commands to help you learn what’s happening. It also contains condensed guides that provide instructions and screenshots to help you get the outcome faster.

Linux:
01. Install and Manage Multiple Python Versions
02. Install the NVIDIA CUDA Driver, Toolkit, cuDNN, and TensorRT
03. Install the Jupyter Notebook Server
04. Install Virtual Environments in Jupyter Notebook
05. Install the Python Environment for AI and Machine Learning
WSL2:
01. Install Windows Subsystem for Linux 2
02. Install and Manage Multiple Python Versions
03. Install the NVIDIA CUDA Driver, Toolkit, cuDNN, and TensorRT
04. Install the Jupyter Notebook Server
05. Install Virtual Environments in Jupyter Notebook
06. Install the Python Environment for AI and Machine Learning
07. Install Ubuntu Desktop With a Graphical User Interface (Bonus)
Windows 10:
01. Install and Manage Multiple Python Versions
02. Install the NVIDIA CUDA Driver, Toolkit, cuDNN, and TensorRT
03. Install the Jupyter Notebook Server
04. Install Virtual Environments in Jupyter Notebook
05. Install the Python Environment for AI and Machine Learning
MacOS:
01. Install and Manage Multiple Python Versions
02. Install the Jupyter Notebook Server
03. Install Virtual Environments in Jupyter Notebook
04. Install the Python Environment for AI and Machine Learning

Glossary:

Deep Learning (DL) is a subcategory of machine learning that uses special algorithms to learn how to perform a specific task with increasing accuracy. It has four learning methods, which include supervised, semi-supervised, unsupervised, and reinforcement learning. It also produces models based on an artificial neural network that contains two or more hidden layers.
[Return]

The Artificial Neural Network (ANN) is a machine learning algorithm that’s used to imitate the way the human brain processes information. It contains interconnected nodes that are organized into layers which include an input layer, zero or more hidden layers, and an output layer. It also processes data by sending it through the layers using inputs, weights, biases, and outputs.
[Return]

Linear Algebra is a subfield of mathematics that’s concerned with vectors, matrices, and linear transforms. It works with datasets and performs data preprocessing, data transformation, dimensionality reduction, and model evaluation. It has also been used heavily in fields such as machine learning, deep learning, natural language processing, and recommender systems.
[Return]

Multivariable Calculus is an extension of calculus that’s concerned with derivatives, integrals, and gradients. It works with multiple variables and determines the rate of change, area under a curve, and slope of a function. It has also been used heavily in machine learning for optimization which improves the performance of a model by minimizing the loss function.
[Return]

Probability is a subfield of mathematics that’s concerned with numerical descriptions of how likely an event will occur and how likely a proposition is true. It involves quantifying, managing, and harnessing uncertainty. It has also been used in machine learning for pattern recognition, model training, algorithm creation, hyperparameter optimization, and model evaluation.
[Return]

Statistics is a field that’s concerned with summarizing data and drawing conclusions from samples of data. It works with large amounts of data and involves data collection, analysis, interpretation, and presentation. It has also been used in machine learning for developing models that understand and quantify their own prediction accuracy on unseen data in the future.
[Return]

Semi-Supervised Learning is a category of machine learning algorithms that uses small amounts of labeled data and large amounts of unlabeled data to teach itself to perform a task better than it could’ve otherwise. It produces a model from the labeled data which it uses to predict the unlabeled data. It then uses the labeled and predicted data to produce a more accurate model.
[Return]

Data Augmentation is a technique that’s used in machine learning to artificially increase the size of a training dataset by creating modified versions of the images in the dataset. It can involve flipping, rotating, scaling, padding, cropping, translating, and transforming images. It can also help prevent overfitting when training machine learning models.
[Return]
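
In fastai, a standard set of augmentations can be added with aug_transforms; a sketch reusing path and is_cat from the training example above:

    # batch_tfms applies random flips, rotations, zooms, and lighting
    # changes to each batch, so the model rarely sees the exact same
    # image twice.
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2, seed=42,
        label_func=is_cat, item_tfms=Resize(224),
        batch_tfms=aug_transforms())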

Fine-Tuning is a technique that’s used in transfer learning to reduce training time and increase accuracy. It reuses parts of a previously trained model to perform a different but similar task by removing its final layer and replacing it with a new layer to make predictions. It can also use a smaller dataset to train the entire neural network, the last few layers, or the final layer alone.
[Return]

Differential Learning Rates is a technique that’s used in transfer learning to reduce training time and increase accuracy. It splits the network into groups of layers and sets different learning rates for each group. It also trains the initial layers using the lowest learning rate and gradually increases the rate in later layers based on how similar the dataset is to the pre-trained model.
[Return]
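
In fastai this is done by passing a slice as the learning rate; a sketch assuming a Learner named learn from a transfer-learning setup:

    # Unfreeze the whole network, then train the earliest layer group
    # at 1e-5, the head at 1e-3, and the groups in between on a scale
    # between the two.
    learn.unfreeze()
    learn.fit_one_cycle(3, lr_max=slice(1e-5, 1e-3))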

Gradient is a cloud service that’s hosted by Paperspace that lets developers develop, train, and deploy their deep learning models at scale. It uses a specialized version of Jupyter Notebook that comes preinstalled with deep learning libraries, cloud storage, and version control management. It also has free graphics cards that can run notebooks for up to 12 hours at a time.
[Return]

Colaboratory (Colab) is a cloud service that’s hosted by Google that allows developers to write and execute Python code through their web browser. It uses a specialized version of Jupyter Notebook that comes preinstalled with deep learning libraries, cloud storage, and GitHub integration. It also has free graphics cards that can run notebooks for up to 12 hours at a time.
[Return]

Kernels is a cloud service that’s hosted by Kaggle that allows developers to compete in competitions and participate in its data science community. It uses a specialized version of Jupyter Notebook that comes preinstalled with deep learning libraries, cloud storage, and datasets. It also has free graphics cards that can run notebooks for up to 6 hours at a time or 30 hours a week.
[Return]

Jupyter Notebook is a program that’s used to create, modify, and distribute notebooks that contain code, equations, visualizations, and narrative text. It provides an interactive coding environment that runs in the web browser. It also has become a preferred tool for machine learning and data science.
[Return]

The Weight is a number that’s used in machine learning to identify how much influence an input has on the prediction. It starts out random, and it gets multiplied by the inputs to transform data that’s passed between two nodes in the hidden layers of the artificial neural network. It also gets updated and optimized during the training process to increase the accuracy of future predictions.
[Return]

The Dataset is a collection of data that’s used in machine learning to create a model that performs a specialized task with very high accuracy. It usually contains annotated text, audio, images, or videos that are split into training, validation, and test datasets during the training process. It also determines the performance of the model based on the quality and quantity of its data.
[Return]

The Optimizer is a function that’s used in machine learning to update the model parameters to minimize the loss function. It calculates the gradient which determines whether the weights can be increased or decreased to further reduce the loss function. It also links the model parameters and loss function by adjusting the weights in response to the output of the function.
[Return]

Training Data is a dataset that’s used in machine learning to fit the model parameters for making accurate predictions. It contains around 70–80% of the total dataset that’s used exhaustively to build the model and gets used across multiple training cycles to improve the accuracy of the model. It can also include labels and annotations that are used for supervised learning.
[Return]

Predictive Policing is a model that’s used in machine learning to predict potential future crimes. It analyzes historical crime data to help decide where to deploy police and identify individuals who are more likely to commit or be a victim of crime. It also raises civil rights and civil liberties concerns about reinforcing racial biases in the criminal justice system.
[Return]

The Positive Feedback Loop is a circumstance in machine learning where the predictions produce outcomes that reinforce the same predictions. It can affect biases by creating more data that skews those biases even further. It can also affect decision-making systems that may even reshape populations in ways that are more detrimental to the historically disadvantaged groups.
[Return]

Random Sampling is a sampling technique that’s used in machine learning to collect an unbiased sample. It involves selecting observations from the population in a way where each observation has an equal chance of being selected. It can also cause errors if the sample doesn’t end up reflecting the population so it’s mostly used when little is known about the population.
[Return]

Sampling is a process that’s used in machine learning to predict properties about a population based on the results from a subset of the population. It can increase the accuracy of the model and decrease the time, cost, and complexity that’s needed to build it. It can also unintentionally decrease the accuracy of the model if the sample isn’t representative of the population.
[Return]

The Output Layer is the final layer in the artificial neural network that’s used to make the prediction. It receives the inputs from the previous layers, performs the calculations through its neurons, and computes the output. It also uses different activation functions based on the type of problem that’s being solved which includes various classification and regression problems.
[Return]

Appendix:

1. Answer: Downloads and installs the Fast AI library and dependencies. [Return]
2. Answer: Imports the entire Fast AI library. [Return]
3. Answer: Downloads a dataset and pre-trained model, and retrains the model. [Return]
4. Answer: Prints the result of the equation. [Return]
5. Answer: Loads and displays an image of a cat. [Return]
6. Answer: Displays a widget that can select an image to upload. [Return]
7. Answer: Replaces the selected image with a predefined image. [Return]
8. Answer: Loads the predefined image, makes a prediction, and prints the result. [Return]
9. Answer: Downloads a dataset and pre-trained model, and retrains the model. [Return]
10. Answer: Makes a prediction and displays the results. [Return]
11. Answer: Downloads a dataset and pre-trained model, and retrains the model. [Return]
12. Answer: Makes a prediction. [Return]
13. Answer: Loads a dataset, and specifies the column names and preprocessing steps. [Return]
14. Answer: Trains the model from scratch. [Return]
15. Answer: Loads a dataset, specifies the value range, and retrains the model. [Return]
16. Answer: Makes a prediction and displays the results. [Return]
