DEEP LEARNING

Fastai Chapter 1: Questions & Answers

The answer key for the questionnaire at the end of the chapter

David Littlefield
Geek Culture

Chapter Summary:

Chapter 1 provides a broad overview of artificial intelligence. It covers some history, prerequisites, theories, applications, milestones, terminology, and mechanics of the subject. It also demonstrates some code that’s used to load the dataset, train the model, and make predictions using different models.

01. Do you need these for deep learning?

  1. Lots of Math: The mathematics are usually handled in the background by most deep learning libraries. More advanced math only becomes necessary to fine-tune the model and exceed state-of-the-art performance, which does require linear algebra, multivariable calculus, and probability and statistics.
  2. Lots of Data: Small datasets can achieve results similar to large datasets when the data is high-quality. The data needs to be accurate, complete, relevant, valid, timely, and consistent. Results can also be improved through data augmentation, fine-tuning, and differential learning rates.
  3. Expensive Computers: Deep learning can be learned for free using high-performance graphics cards through cloud service providers like Paperspace, Google, and Kaggle. The free tiers come with some time restrictions, slight delays, and varying performance due to fluctuating availability.
  4. A PhD: A degree isn’t necessary to use deep learning technology to solve real world problems. It would be needed to further develop the existing theories and tools that are used for deep learning. It also trains students to be productive as scholars and researchers in an academic setting.

02. Name five areas where deep learning is now the best in the world:

  1. Natural Language Processing (NLP): A branch of artificial intelligence that gives machines the ability to process human language in the form of text and voice data. It can learn to understand the meaning, intent, and sentiment of the writer or speaker. It also combines computational linguistics with statistical, machine learning, and deep learning models.
  2. Computer Vision: A branch of artificial intelligence that gives machines the ability to derive meaningful information from digital images, videos, and other forms of visual input. It can learn to extract, analyze, and understand the meaningful information to automate tasks that involve taking actions, making decisions, and/or making recommendations.
  3. Content Generation: A branch of artificial intelligence that gives machines the ability to create new material such as images, music, speech, and text. It enables machines to use separate neural networks that compete with each other to create new material and detect its authenticity. It also continuously learns to create better material and make better detections until the new material passes as authentic.
  4. Medicine: A branch of artificial intelligence that gives machines the ability to analyze, understand, and present complex medical data. It enables machines to assist with diagnosis processes, treatment protocol development, drug development, precision medicine, and patient monitoring and care. It also enables machines to analyze large amounts of data from electronic health records to diagnose and prevent diseases.
  5. Biology: A branch of artificial intelligence that gives machines the ability to survey, analyze, and classify biological data. It enables machines to assist with cellular image classification, genomic medicine, biomarker development, and drug discovery. It also enables machines to use comprehensive biological information about a person to predict and diagnose diseases and identify or develop customized therapies.

03. What was the name of the first device that was based on the principle of the artificial neuron?

Mark I Perceptron: A device that’s considered the first artificial neural network ever created. It was developed in 1957 by Frank Rosenblatt for the purpose of image classification. It was also physically made up of various “sensory” units that were randomly connected to “association” units whose weights were recorded in potentiometers and adjusted by electric motors.

Image from Owners Manual

04. Based on the book of the same name, what are the requirements for parallel distributed processing (PDP)?

Parallel Distributed Processing (PDP): A model in Psychology that explains cognitive processes. It proposes that cognitive processes arise from interactions between a large number of neurons via synaptic connections. It also proposes that the knowledge that governs processing is stored in the strength of the connections and is acquired gradually through experience.

  1. A set of processing units
  2. A state of activation
  3. An output function for each unit
  4. A pattern of connectivity among units
  5. A propagation rule for propagating patterns of activities through the network of connectivities
  6. An activation rule for combining the inputs impinging on a unit with the current state of that unit to produce a new level of activation for the unit
  7. A learning rule whereby patterns of connectivity are modified by experience
  8. An environment within which the system must operate

05. What were the two theoretical misunderstandings that held back the field of neural networks?

First Theoretical Misunderstanding: Neural networks couldn’t classify non-linear patterns. It was based on the book, “Perceptrons,” which proved that a single-layered perceptron couldn’t classify non-linear patterns. It also speculated that a multi-layered perceptron might overcome this limitation but that was mistaken by the public as evidence against neural networks.
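
As a minimal sketch of the limitation described above, the following Python trains a single-layer perceptron on two tiny truth tables. The function names, zero initialization, and epoch count are illustrative assumptions, not code from the book or the Fastai library. AND can be separated with a straight line, so the perceptron learns it. XOR cannot, so at least one of its four cases is always misclassified.

```python
def train_perceptron(samples, epochs=100):
    """Train a single-layer perceptron with the classic update rule."""
    w1, w2, bias = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            prediction = 1 if w1 * x1 + w2 * x2 + bias > 0 else 0
            error = target - prediction
            # Nudge the weights toward the correct answer when wrong.
            w1 += error * x1
            w2 += error * x2
            bias += error
    return w1, w2, bias


def accuracy(samples, weights):
    """Fraction of samples the trained perceptron classifies correctly."""
    w1, w2, bias = weights
    correct = sum(
        (1 if w1 * x1 + w2 * x2 + bias > 0 else 0) == target
        for (x1, x2), target in samples
    )
    return correct / len(samples)


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_accuracy = accuracy(AND_DATA, train_perceptron(AND_DATA))
xor_accuracy = accuracy(XOR_DATA, train_perceptron(XOR_DATA))
print(and_accuracy, xor_accuracy)
```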

Second Theoretical Misunderstanding: Neural networks were too big and slow to be useful. It arose after theory showed that adding just one extra layer of neurons was enough to approximate any function, which led researchers to rely on two-layered networks that were often too big and too slow in practice. It also took time to understand that neural networks were actually limited by the insufficient datasets, computing power, and algorithms at the time, and that more layers would improve their practical performance.

06. What is a GPU?

Graphics Processing Unit (GPU): A specialized processor that accelerates graphics rendering. It can handle thousands of single tasks in parallel, like displaying three-dimensional environments in computer games. It can also run neural networks hundreds of times faster than central processing units.

07. Open a notebook and execute a cell containing: 1+1. What happens?

The result, 2, is displayed in the output area directly below the code cell in the notebook.

08. Follow through each cell of the stripped version of the notebook for this chapter. Before executing each cell, guess what will happen.

The stripped down version of the notebook contains the code without the text from the textbook. It can be found in the clean subdirectory in the fastbook directory. It can also be viewed in the appendix of this article.

09. Complete the Jupyter Notebook online appendix.

The Jupyter Notebook online appendix is a notebook that shows students how to use Jupyter Notebook. It contains explanations, instructions, and screenshots about the user interface, markdown format, code capabilities, and keyboard shortcuts. It can also be found in the fastbook directory.

10. Why is it hard to use a traditional computer program to recognize images in a photo?

Developing a traditional computer program to recognize objects in a photo would be hard. It would require identifying each step that’s involved in the process and translating the steps into code. It also isn’t clear how the human brain recognizes images which makes it even harder to identify the steps.

11. What did Samuel mean by “weight assignment”?

Weight Assignment: A term that describes the values that are currently assigned to the weights in the model. It represents the values that decide how the model will operate. It also produces a measurable performance that can be automatically tested and improved by adjusting the values.

12. What term do we normally use in deep learning for what Samuel called “weights”?

Parameter: A variable that stores a weight or bias value in the model that the model uses to make predictions. It contains a value that’s automatically initialized by the model and updated by the optimizer function during the training process. It also decides what the model does and how well it performs.

13. Draw a picture that summarizes Samuel’s view of a machine learning model.

The detailed view of a machine learning model according to Arthur Samuel.
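
In text form, Samuel’s view runs: inputs and weights go into the model, the model produces results, the results are scored for performance, and the score is used to update the weights. The loop below is a rough Python sketch of that cycle. The one-weight model, the made-up data, and the keep-it-if-it-scores-better update rule are all illustrative assumptions rather than anything Samuel or the Fastai library prescribes.

```python
def model(x, weight):
    """The program: its behaviour depends on the input and the weight."""
    return weight * x


def performance(weight, inputs, labels):
    """Automatic test of a weight assignment (higher is better)."""
    return -sum((model(x, weight) - y) ** 2 for x, y in zip(inputs, labels))


inputs = [1.0, 2.0, 3.0]
labels = [2.0, 4.0, 6.0]  # made-up data generated by y = 2x

weight, step = 0.0, 0.5
for _ in range(20):
    current = performance(weight, inputs, labels)
    # Mechanism for altering the weight assignment: keep a change only
    # when it improves the measured performance.
    for candidate in (weight + step, weight - step):
        if performance(candidate, inputs, labels) > current:
            weight = candidate
            break

print(weight)
```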

14. Why is it hard to understand why a deep learning model makes a particular prediction?

Deep learning models can have hundreds and even thousands of layers, which makes it hard to identify what factors determine the final output. They can also have millions and even billions of parameters that interact with each other, which makes it even harder to identify these factors.

15. What is the name of the theorem that shows that a neural network can solve any mathematical problem to any level of accuracy?

Universal Approximation Theorem: A theorem that shows an artificial neural network can represent a wide range of functions when given the appropriate weights. It states that a feedforward neural network with one hidden layer that contains a sufficient but finite number of neurons can approximate any continuous function on a bounded input range to any desired accuracy.
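
The idea can be illustrated, though not proven, with a small Python sketch. It picks the weights of one hidden ReLU layer by hand so that the network traces a piecewise-linear copy of a target function, then measures how the worst-case error shrinks as hidden units are added. The helper names and the choice of sine on the range 0 to pi are assumptions made for the example. No training is involved.

```python
import math


def relu(z):
    return max(0.0, z)


def build_relu_network(func, low, high, hidden_units):
    """Hand-pick weights so one hidden ReLU layer interpolates func."""
    width = (high - low) / hidden_units
    knots = [low + i * width for i in range(hidden_units + 1)]
    slopes = [
        (func(knots[i + 1]) - func(knots[i])) / width
        for i in range(hidden_units)
    ]
    # Each hidden unit switches on at its knot and changes the slope there.
    coefficients = [slopes[0]] + [
        slopes[i] - slopes[i - 1] for i in range(1, hidden_units)
    ]
    offset = func(low)

    def network(x):
        hidden = [relu(x - knot) for knot in knots[:-1]]
        return offset + sum(c * h for c, h in zip(coefficients, hidden))

    return network


def max_error(func, network, low, high, points=1000):
    """Largest gap between func and the network on an evenly spaced grid."""
    xs = [low + (high - low) * i / points for i in range(points + 1)]
    return max(abs(func(x) - network(x)) for x in xs)


sine_4 = build_relu_network(math.sin, 0.0, math.pi, hidden_units=4)
sine_64 = build_relu_network(math.sin, 0.0, math.pi, hidden_units=64)
error_4 = max_error(math.sin, sine_4, 0.0, math.pi)
error_64 = max_error(math.sin, sine_64, 0.0, math.pi)
print(error_4, error_64)
```

The error with 64 hidden units is far smaller than with 4, which matches the "sufficient number of neurons" condition in the theorem.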

16. What do you need in order to train a model?

Training Process: A process that teaches the model how to make accurate predictions on unseen data. It requires an architecture, a labeled dataset, hyperparameters, a loss function, and an optimization function. It also involves splitting the dataset into training, validation, and test sets, making the predictions, calculating the loss, calculating the gradients of the loss, and updating the weights and biases.
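
The core of that loop can be sketched in plain Python. This is a simplified illustration under stated assumptions: a one-feature linear model, made-up data from the rule y = 3x + 1, mean squared error as the loss, and full-batch gradient descent as the optimizer. It skips the train, validation, and test split for brevity, and it is not how the Fastai library implements training.

```python
# Dataset: inputs with labels from the made-up rule y = 3x + 1.
xs = [i / 10 for i in range(10)]
ys = [3 * x + 1 for x in xs]

# Hyperparameters (chosen by hand).
learning_rate = 0.5
epochs = 500

# Parameters, initialised to zero.
weight, bias = 0.0, 0.0

for _ in range(epochs):
    # Make the predictions and calculate the loss.
    predictions = [weight * x + bias for x in xs]
    errors = [p - y for p, y in zip(predictions, ys)]
    loss = sum(e ** 2 for e in errors) / len(xs)  # mean squared error
    # Gradients of the loss with respect to each parameter.
    grad_weight = sum(2 * e * x for e, x in zip(errors, xs)) / len(xs)
    grad_bias = sum(2 * e for e in errors) / len(xs)
    # Optimizer step: move each parameter against its gradient.
    weight -= learning_rate * grad_weight
    bias -= learning_rate * grad_bias

print(round(weight, 3), round(bias, 3))
```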

17. How could a feedback loop impact the rollout of a predictive policing model?

Positive Feedback Loop: A circumstance where the predictions produce outcomes that reinforce the same predictions. It can exaggerate biases by creating more data that skews those biases even further. It can also affect decision-making systems that may even negatively reshape populations in ways that are more detrimental to the historically disadvantaged groups.

Predictive Policing: A model that’s trained on historical crime data, which represents the crimes that were documented rather than the crimes that were actually committed. It contains historical biases that heavily influence the predictions about where to focus policing activity.

  • It leads to more arrests in those areas, which perpetuates the feedback loop.
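
A toy simulation makes the loop concrete. Everything in it is a made-up assumption for illustration: two districts with identical true crime rates, a slightly skewed arrest history, and a "model" that simply sends 70% of patrols to whichever district has more recorded crime. It is not a real predictive policing system.

```python
def simulate_policing(rounds=10, hot_spot_share=0.7):
    """Return district A's share of recorded crime after each round."""
    true_crime_rate = {"A": 0.1, "B": 0.1}  # identical by construction
    recorded = {"A": 55.0, "B": 45.0}       # slightly skewed history
    patrols_per_round = 100
    shares = [recorded["A"] / (recorded["A"] + recorded["B"])]
    for _ in range(rounds):
        # The "model": send most patrols wherever more crime was recorded.
        hot_spot = max(recorded, key=recorded.get)
        for district in recorded:
            if district == hot_spot:
                patrols = patrols_per_round * hot_spot_share
            else:
                patrols = patrols_per_round * (1 - hot_spot_share)
            # More patrols record more crime, even though rates are equal.
            recorded[district] += patrols * true_crime_rate[district]
        shares.append(recorded["A"] / (recorded["A"] + recorded["B"]))
    return shares


shares = simulate_policing()
print(shares[0], shares[-1])
```

District A’s share of recorded crime climbs every round even though both districts have the same underlying crime rate, because the records drive the patrols and the patrols drive the records.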

18. Do we always have to use 224×224-pixel images with the cat recognition model?

No, but there are limits to how much the image dimensions can be changed.

  • It produces an error when the image is too small, which occurs because the neural network runs out of data while reducing the dimensions during the forward propagation.
  • It can produce a low accuracy when the image is too large, which occurs because there aren’t enough layers in the model to recognize the distinct patterns in the image.

19. What is the difference between classification and regression?

Classification: A subcategory of supervised learning that classifies the unseen data as one or more category values. It applies a classification algorithm to the training set to identify the different categories. It also predicts the probability the unseen data belongs to each of the categories.

  • Predicts a category value based on the features in the unseen data.

Regression: A subcategory of supervised learning that predicts the dependent variable as a continuous value. It applies a regression algorithm to the training set to identify the line that best represents the relationship between the dependent and independent variables. It also predicts the dependent variable based on the independent variables in the unseen data.

  • Predicts a continuous value based on the values in the unseen data.
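
The difference shows up in the shape of the output. In this small Python sketch, the classifier turns raw scores into probabilities and returns a category, while the regressor returns a number. The scores, weights, and category names are invented for the example and no trained model is involved.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, categories):
    """Classification: return the most probable category."""
    probabilities = softmax(scores)
    best = max(range(len(categories)), key=lambda i: probabilities[i])
    return categories[best], probabilities[best]


def regress(features, weights, bias):
    """Regression: return a continuous value from a linear model."""
    return sum(w * x for w, x in zip(weights, features)) + bias


label, confidence = classify([2.0, 0.5], ["cat", "dog"])
price = regress([3.0, 2.0], weights=[10.0, 5.0], bias=1.0)
print(label, round(confidence, 2), price)
```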

20. What is a validation set? What is a test set? Why do we need them?

Validation Set: A subset of the dataset that isn’t used to train the model. It can be used to produce an unbiased evaluation of the model because the model isn’t allowed to learn or memorize the data in the subset. It can also be used to prevent underfitting and overfitting by fine-tuning the model.

  • The validation set is needed to prevent underfitting and overfitting. It identifies when performance drops which helps fine-tune the model.

Test Set: A subset of the dataset that isn’t used to train or validate the model. It can be used to produce the last unbiased evaluation of the model because the model hasn’t seen or been fine-tuned with the data in the subset. It also only gets used to evaluate the trained and fine-tuned model.

  • The test set is needed to measure the true performance of the model. It estimates how well the model performs on unseen data in production.
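
A three-way split can be sketched with the Python standard library. The function name, the 20% and 10% proportions, and the fixed seed are assumptions chosen for the example. Libraries such as Fastai provide their own splitters.

```python
import random


def split_dataset(items, valid_pct=0.2, test_pct=0.1, seed=42):
    """Shuffle items and split them into train, validation, and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_pct)
    n_valid = int(len(shuffled) * valid_pct)
    test_set = shuffled[:n_test]
    valid_set = shuffled[n_test:n_test + n_valid]
    train_set = shuffled[n_test + n_valid:]
    return train_set, valid_set, test_set


train_set, valid_set, test_set = split_dataset(range(100))
print(len(train_set), len(valid_set), len(test_set))
```

Fixing the seed keeps the split identical between runs, so a change in the validation score reflects a change in the model rather than a change in the data.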

21. What will fastai do if you don’t provide a validation set?

It automatically creates a validation set using 20% of the total training data.

22. Can we always use a random sample for a validation set? Why or why not?

Sampling: A process that predicts properties about a population based on a small subset of the population. It can increase the accuracy of the model and reduce the time, cost, and complexity to build it. It can also reduce the accuracy of the model when the sample doesn’t represent the population.

Random Sampling: A sampling technique that collects an unbiased sample. It selects observations from the population by giving each observation an equal chance of being selected. It can also cause errors when the random sample doesn’t end up representing the population.

  • Time series forecasting doesn’t work with random samples. It produces overly optimistic results that generalize poorly because random samples use past and future data to train and validate the model. It needs the data to be split into time periods where past data is used to train the model and recent data is used as future data to validate the model.
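
A chronological split can be sketched as follows. The function name and the (day, sales) records are hypothetical. The point is only that the data is sorted by time and cut once, so every validation record is newer than every training record.

```python
def time_series_split(records, valid_pct=0.2):
    """Train on the oldest records and validate on the most recent ones."""
    ordered = sorted(records, key=lambda record: record[0])
    n_valid = int(len(ordered) * valid_pct)
    cutoff = len(ordered) - n_valid
    return ordered[:cutoff], ordered[cutoff:]


# Hypothetical (day, sales) records, deliberately out of order.
records = [(3, 12.0), (1, 10.0), (5, 15.0), (2, 11.0), (4, 13.0)]
past, recent = time_series_split(records)
print(past, recent)
```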

23. What is overfitting? Provide an example.

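Overfitting: An error that occurs when the model performs well on the training set but performs poorly on the validation set. It can be caused by overtraining, lack of validation, and improper fine-tuning. It can also be caused by training data that contains too much meaningless information. For example, a model that memorizes the exact photos in its training set can score near-perfect accuracy on them but fail on new photos of the same objects.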
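
An extreme case of overfitting is a "model" that only memorizes. The sketch below is a deliberately artificial illustration with a made-up task (is the number even?) and hypothetical helper names: it recalls every training example perfectly but has learned no rule, so it does no better than a coin flip on unseen numbers.

```python
def train_memorizer(samples):
    """'Train' by storing every training example verbatim."""
    return dict(samples)


def predict(memory, x, default=0):
    # Seen inputs are recalled perfectly; unseen inputs get a blind guess.
    return memory.get(x, default)


def accuracy(memory, samples):
    """Fraction of samples whose prediction matches the label."""
    return sum(predict(memory, x) == y for x, y in samples) / len(samples)


# Made-up task: the label is 1 when the number is even.
train_samples = [(x, int(x % 2 == 0)) for x in range(0, 30)]
valid_samples = [(x, int(x % 2 == 0)) for x in range(30, 60)]

memory = train_memorizer(train_samples)
train_accuracy = accuracy(memory, train_samples)
valid_accuracy = accuracy(memory, valid_samples)
print(train_accuracy, valid_accuracy)
```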

24. What is a metric? How does it differ from “loss”?

Metric: A function that evaluates the performance of the model. It usually measures performance differently based on the type of model and dataset. It measures the accuracy, precision, or recall for classification models with balanced datasets. It also measures the area under the curve of receiver operating characteristics for classification models with imbalanced datasets.

  • It provides an interpretation of the performance of the model that’s easy for humans to understand which helps give meaning to the numbers in the context of the goals of the overall project and project stakeholders.

Loss: A measure of how incorrect the predictions are. It calculates the distance between the predicted values and label values using one of the loss functions. It can be used to prevent underfitting and overfitting. It can also be used to calculate the gradient to update the weights and biases.

  • It provides an interpretation of the performance of the model that’s easy for the computer to understand which helps to minimize the loss value and monitor for things like overfitting, underfitting, and convergence.
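
The distinction can be seen by scoring two sets of predictions both ways. In this sketch, accuracy stands in for the metric and binary cross-entropy for the loss. The probabilities are invented for the example.

```python
import math


def accuracy(probabilities, labels):
    """Metric: fraction of predictions on the correct side of 0.5."""
    hits = sum((p > 0.5) == bool(y) for p, y in zip(probabilities, labels))
    return hits / len(labels)


def cross_entropy(probabilities, labels):
    """Loss: average negative log-probability given to the true label."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        total += -math.log(p if y == 1 else 1 - p)
    return total / len(labels)


labels = [1, 0, 1, 0]
hesitant = [0.6, 0.4, 0.6, 0.4]   # barely right each time
confident = [0.9, 0.1, 0.9, 0.1]  # right with high confidence

print(accuracy(hesitant, labels), accuracy(confident, labels))
print(cross_entropy(hesitant, labels), cross_entropy(confident, labels))
```

Both sets of predictions have the same accuracy, but the loss is lower for the confident set. That sensitivity to small changes is what lets the optimizer use the loss to update the weights, while the metric stays easier for people to read.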

25. How can pretrained models help?

Pretrained Model: A model that’s already been trained on a dataset to perform a specific task. It includes the architecture of the model and weights and biases that were acquired from the training process. It can also be trained further to perform a similar task using transfer learning.

  • It prevents users from having to train the model from scratch which can take days or even weeks depending on the size of the model and dataset.
  • It can be trained to perform a similar task that produces higher accuracy using less data, time, and resources than it would require otherwise.

26. What is the “head” of a model?

Head: An analogy that references the output layer in an artificial neural network. It views the model as a backbone with a head where backbone refers to the architecture of the model and head refers to the final layer. It also views transfer learning as replacing the head of a pretrained backbone.

27. What kinds of features do the early layers of a CNN find? How about the later layers?

Early Layers: Detect the low-level features in an image. The first layer learns to detect edges and colors, which become the building blocks for detecting textures made from combinations of edges and colors in the second layer. Each subsequent layer continues to detect more sophisticated features.

Later Layers: Detect high-level features in an image. They learn to detect sophisticated patterns that resemble textures and object parts like eyes, ears, and noses. They also eventually learn to detect objects like humans and animals, which become the building blocks for detecting the specific objects in the dataset.

28. Are image models only useful for photos?

Image models can classify non-image data as long as the data is converted to an image. The conversion plots the data so that similar features are placed together and dissimilar features are placed further apart. The model can then discover hidden patterns and relationships between the features in the data.

29. What is an “architecture”?

Architecture: A template that defines the structure of a neural network. It represents the number, size, and types of layers that are used to build the neural network. It also represents different types of neural networks for supervised, unsupervised, semi-supervised, and reinforcement learning.

30. What is segmentation?

Image Segmentation: A process that partitions an image into distinct regions that contain pixels with similar attributes. It can locate objects in an image and color-code each pixel that represents a particular object. It can also color-code each pixel that belongs to the same class or different classes.
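
The final labeling step can be sketched in a few lines of Python. The per-pixel scores below are hypothetical stand-ins for what a trained segmentation model would output, and the class names are invented. The sketch only shows how each pixel ends up with exactly one class.

```python
def segment(scores, class_names):
    """Label each pixel with its highest-scoring class."""
    mask = []
    for row in scores:
        mask_row = []
        for pixel_scores in row:
            best = max(range(len(class_names)), key=lambda i: pixel_scores[i])
            mask_row.append(class_names[best])
        mask.append(mask_row)
    return mask


# Hypothetical per-pixel scores for a 2x2 image: [road, car] at each pixel.
scores = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
]
mask = segment(scores, ["road", "car"])
print(mask)
```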

31. What is y_range used for? When do we need it?

y_range: A parameter that’s passed to various functions in the Fastai library to specify the minimum and maximum values that can be predicted for the dependent variable when performing regression. It’s needed when the target is a number that falls within a known range.

  • It tells the Fastai library what range the target values fall within.
  • It constrains the predictions to fall between the minimum and maximum value.
  • It prevents wasted computation on values that are too large or small.
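
One common way to keep predictions inside a range is to pass the raw output through a sigmoid and rescale it. The plain-Python function below is an illustrative stand-in for that idea, not the Fastai library’s tensor implementation, and the 0.5 to 5.5 range is an assumed example for ratings on a 1 to 5 scale.

```python
import math


def sigmoid_range(x, low, high):
    """Squash any real number into the interval (low, high)."""
    return 1 / (1 + math.exp(-x)) * (high - low) + low


# Raw model outputs mapped into a 0.5 to 5.5 rating range.
ratings = [sigmoid_range(x, 0.5, 5.5) for x in (-10.0, 0.0, 10.0)]
print(ratings)
```

However extreme the raw output is, the rescaled value stays strictly between the two bounds, and a raw output of zero lands at the midpoint.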

32. What are “hyperparameters”?

Hyperparameter: A parameter that controls the training process. It includes parameters like learning rate, number of epochs, batch size, number of hidden layers, and activation function. It also must be set manually and significantly impacts the performance of the model.

33. What’s the best way to avoid failures when using AI in an organization?

The best way to avoid failures when introducing artificial intelligence into an organization is to know how to use validation and test sets. It greatly reduces the risk of failure by setting aside some data that’s separate from the data that’s given to the external vendor or service provider. It also lets the organization evaluate the true performance of the model with the data.


Appendix:

1. Answer: Downloads and installs the Fastai library and dependencies.
2. Answer: Imports the entire Fastai library.
3. Answer: Downloads a dataset and pre-trained model, and retrains the model.
4. Answer: Prints the result of the equation.
5. Answer: Loads and displays an image of a cat.
6. Answer: Displays a widget that can select an image to upload.
7. Answer: Replaces the selected image with a predefined image.
8. Answer: Loads the predefined image, makes a prediction, and prints the result.
9. Answer: Downloads a dataset and pre-trained model, and retrains the model.
10. Answer: Makes a prediction and displays the results.
11. Answer: Downloads a dataset and pre-trained model, and retrains the model.
12. Answer: Makes a prediction.
13. Answer: Loads a dataset, and specifies the column names and preprocessing steps.
14. Answer: Trains the model from scratch.
15. Answer: Loads a dataset, specifies the value range, and retrains the model.
16. Answer: Makes a prediction and displays the results.
