Huawei ML FAQ — Advanced

Even More annotation from Huawei Mock Exam — AISeries — Episode #04

J3
Jungletronics
12 min read · May 14, 2021


Hi, I am back!

This time I took a very hard test from my Huawei AI Course and wrote these annotations down ♫ For the people who are still alive ♫ ♪ ♩ :)

The idea is to go even deeper into AI theory.

I hope these study materials can help you (and me :) get the Huawei HCIA certificate easily.

We might get a good grade!

Let’s get it on!

Note: ( ) Single choice  [ ] Multiple choice


Index:
01# Regarding (GD) Optimizers, please analyze these: True or False?
02# Regarding Support Vector Machines (SVM): True or False?
03# Which of the following are real-world applications of the SVM?
04# Regarding Gated Recurrent Units (GRU): True or False?
05# Regarding Recurrent Neural Networks (RNN) possible application: True or False?
06# Regarding GANs: True or False?
07# Again, regarding GANs: True or False?
08# Regarding Huawei MindSpore ME: True or False?
09# Regarding Ascend Computing Language (AscendCL): True or False?
10# Regarding MindSpore in GE (Graph Engine): True or False?

01# Regarding (GD) Optimizers, please analyze these: True or False?


a. [ ] The Momentum method uses the first moment with a decay rate to gain speed.
b. [ ] RMSProp uses the second moment with a decay rate to speed up from AdaGrad.
c. [ ] AdaGrad uses the second moment with no decay to deal with sparse features.
d. [ ] Two common tools to improve gradient descent are the sum of gradient (first moment) and the sum of the gradient squared (second moment).

Solution: T, T, T, T


First, read Lili Jiang's article "A Visual Explanation of Gradient Descent Methods (Momentum, AdaGrad, RMSProp, Adam)" [1].
Now, let's take the statements one by one:
a. The Momentum algorithm (or Momentum for short) borrows an idea from physics: it keeps a decayed running sum of past gradients (the first moment), so it builds up speed in directions where the gradients agree and has a shot at escaping local minima. So yes, Momentum uses the first moment with a decay rate to gain speed.
b. Just look at the pictures in the article above: RMSProp (Root Mean Square Propagation) solves AdaGrad's problem of becoming incredibly slow by adding a decay factor to the sum of squared gradients. So yes, RMSProp uses the second moment with a decay rate to speed up from AdaGrad.
c. The average gradient for sparse features is usually small, so such features get trained at a much slower rate. AdaGrad addresses this problem by dividing each step by the square root of the cumulative sum of squared gradients (the second moment, never decayed), so rarely updated parameters take relatively larger steps. In Gif 1, the learning rate of AdaGrad is set higher than that of gradient descent. So yes, AdaGrad uses the second moment with no decay to deal with sparse features, that is, the cumulative sum of squared gradients instead of the sum of gradients (first moment).
d. And the last one: two common tools to improve gradient descent are the sum of gradients (first moment) and the sum of squared gradients (second moment). True \o/ There you have it!
Gif 1. AdaGrad (white) is somewhat faster than gradient descent (cyan) [1].
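To make statements a to d concrete, here is a minimal NumPy sketch of the three update rules, written in the spirit of Lili Jiang's article [1]: a decayed sum of gradients for Momentum, an undecayed sum of squared gradients for AdaGrad, and a decayed one for RMSProp. The toy loss, the starting point, and the learning rates are my own assumptions chosen for illustration, not values from the exam or the article.

```python
import numpy as np

def loss(w):
    # Toy elongated bowl (assumed for illustration): steep along w[1], shallow along w[0]
    return w[0] ** 2 + 10.0 * w[1] ** 2

def grad(w):
    # Analytic gradient of the toy loss
    return np.array([2.0 * w[0], 20.0 * w[1]])

def momentum(w, lr=0.01, decay=0.9, steps=100):
    s = np.zeros_like(w)  # decayed sum of gradients (first moment)
    for _ in range(steps):
        s = decay * s + grad(w)
        w = w - lr * s
    return w

def adagrad(w, lr=0.5, steps=100, eps=1e-8):
    s2 = np.zeros_like(w)  # cumulative sum of squared gradients, never decays
    for _ in range(steps):
        g = grad(w)
        s2 = s2 + g ** 2
        w = w - lr * g / (np.sqrt(s2) + eps)
    return w

def rmsprop(w, lr=0.05, decay=0.9, steps=100, eps=1e-8):
    s2 = np.zeros_like(w)  # decayed sum of squared gradients (second moment)
    for _ in range(steps):
        g = grad(w)
        s2 = decay * s2 + (1.0 - decay) * g ** 2
        w = w - lr * g / (np.sqrt(s2) + eps)
    return w

w0 = np.array([3.0, 2.0])
for name, opt in [("momentum", momentum), ("adagrad", adagrad), ("rmsprop", rmsprop)]:
    print(name, loss(opt(w0)))
```

The only difference between `adagrad` and `rmsprop` here is the `decay` applied to `s2`: without it, AdaGrad's denominator can only grow, which is why it keeps slowing down.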

02# Regarding Support Vector Machines (SVM): True or False?


a. [ ] By using kernel functions, SVMs can build nonlinear decision surfaces in the input data space.
b. [ ] After an SVM model is trained, only the support vectors are necessary to classify new samples.
c. [ ] For a linearly separable dataset, an SVM will look for the hyperplane that separates the classes with maximum margin.
d. [ ] One of the biggest advantages of SVMs is that it has no hyperparameters.

Solution: T, T, T, F


First, please read "Support Vector Machines(SVM) — An Overview" by Rushikesh Pupale on Towards Data Science.
a. SVMs can solve both linear and non-linear problems and work well for many practical problems. The idea of SVM is simple: the algorithm creates a line or a hyperplane that separates the data into classes. When the data is not linearly separable, a kernel function implicitly maps it into a higher-dimensional space where it becomes linearly separable; the flat hyperplane found there corresponds to a curved decision surface in the original input space. So yes, SVMs can build non-linear decision surfaces too.
b. Support vectors are the data points closest to the hyperplane, and they determine its position and orientation. Using these support vectors, we maximize the margin of the classifier; deleting them would change the position of the hyperplane, while the other training points play no role in the final decision function. So yes, after an SVM model is trained, only the support vectors are necessary to classify new samples.
c. See the previous items: for a linearly separable dataset, the SVM picks, among all separating hyperplanes, the one with the maximum margin. So yes.
d. Parameters are learned from the data during training, while hyperparameters are external configurations of the model that are usually fixed before training begins. How are they chosen? Broadly speaking, by training several models with different hyperparameter values and keeping the ones that perform best when tested. Hyperparameters are often specified by the practitioner and can be set using heuristics. Common model hyperparameters include the learning rate for training a neural network, the C and sigma hyperparameters for support vector machines, the k in k-nearest neighbors, and the number of trees in a Random Forest. In SVM, the hyperparameters are the soft margin constant C, gamma, and any parameters the kernel function depends on (the width of a Gaussian kernel or the degree of a polynomial kernel). So no, SVMs do have hyperparameters, and this statement is false.
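A quick way to check statements a, b and d in code is scikit-learn's `SVC`. This sketch assumes scikit-learn is installed; the dataset (`make_circles`) and the values of `C` and `gamma` are arbitrary choices for illustration. It trains an RBF-kernel SVM on data that no straight line can separate, then rebuilds the decision function from the support vectors alone, using the fitted `support_vectors_`, `dual_coef_` and `intercept_` attributes.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Toy non-linear data: one class forms a small circle inside the other
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# C and gamma are hyperparameters: they are chosen before training (statement d)
clf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors:", len(clf.support_vectors_), "of", len(X))

# Rebuild the decision function using ONLY the support vectors (statement b):
# f(x) = sum_i dual_coef_i * K(sv_i, x) + intercept
X_new = np.array([[0.0, 0.0], [1.0, 0.0]])
K = rbf_kernel(clf.support_vectors_, X_new, gamma=1.0)
manual = clf.dual_coef_ @ K + clf.intercept_
print(np.allclose(manual.ravel(), clf.decision_function(X_new)))
```

The RBF kernel lets a linear separator in feature space become a circular boundary in the input space (statement a).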

03# Which of the following are real-world applications of the SVM?


a. ( ) Text and Hypertext Categorization
b. ( ) Image Classification
c. ( ) Clustering of News Articles
d. ( ) All of the above

Solution: d


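SVMs are versatile models that have been applied to a wide range of real-world problems: classification tasks such as text and hypertext categorization, image classification and handwriting recognition, as well as regression (support vector regression) and even clustering (support vector clustering). So all three options are valid applications.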
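As a small taste of text categorization (option a), here is a sketch using scikit-learn's `TfidfVectorizer` and `LinearSVC`, assuming scikit-learn is installed. The six-sentence corpus and its labels are made up for illustration; a real categorizer would need far more data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical tiny corpus with two categories
docs = [
    "the team won the football match",
    "a great goal in the final match",
    "the striker scored a late goal",
    "the new laptop has a fast processor",
    "this phone ships with more memory",
    "the processor and memory benchmark results",
]
labels = ["sports", "sports", "sports", "tech", "tech", "tech"]

# TF-IDF turns each text into a sparse vector; the linear SVM separates the classes
model = make_pipeline(TfidfVectorizer(), LinearSVC(C=1.0))
model.fit(docs, labels)

pred = model.predict(["a goal in the football final", "fast processor with more memory"])
print(pred)
```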

04# Regarding Gated Recurrent Units (GRU): True or False?


a. [ ] It only uses sigmoid activation functions.
b. [ ] The final memory at current time step is a combination between the output from the previous layer and the current memory content.
c. [ ] The reset gate is the only gate with tanh.
d. [ ] The reset and update gates do not employ the output from the previous layer.

Solution: F, T, F, F

The basic workflow of a Gated Recurrent Unit network (GRU) is similar to that of a basic Recurrent Neural Network (RNN), as illustrated below (Gifs 2 and 3, and Fig 1). The main difference between the two is the internal working of each recurrent unit: GRU networks contain gates that modulate the current input and the previous hidden state.

GRU uses gating and a hidden state to control the flow of information. To solve the short-term memory (vanishing gradient) problem that comes up in plain RNNs, GRU uses two gates: the update gate and the reset gate. LSTM consists of three gates: the input gate, the forget gate, and the output gate (Fig 1). Unlike LSTM, GRU does not have an output gate, and it combines the input and forget gates into a single update gate.

a. GRU uses two activation functions: the sigmoid function (in the gates) and the hyperbolic tangent (for the current memory content). So no, sigmoid is not the only activation function used by GRU.
b. Yes, the final memory at the current time step is a combination of the output from the previous step (the previous hidden state) and the current memory content, weighted by the update gate.
c. Both the reset gate and the update gate use the sigmoid activation function (Gif 3); tanh is used only to compute the current memory content (Fig 1). So no, the reset gate is not the only gate with tanh; it has no tanh at all.
d. The reset and update gates do employ the output from the previous step, that is, the previous hidden state (Gif 3). So this statement is false.

LSTMs and GRUs were created as a method to mitigate short-term memory using mechanisms called gates. Gates are just small neural networks that regulate the flow of information passed from one time step to the next. LSTMs and GRUs are used in state-of-the-art deep learning applications such as speech recognition, speech synthesis, and natural language understanding.

Please develop your intuition about LSTM and GRU with the animations below, and leverage it to make wiser, more soul-inspired decisions in your academic life :)
Gif 2. Simple RNN Cell; Source: [5]
Gif 3. Here’s how the GRU looks in action; Source: [5]
Fig 1. GRU uses two gates: the update gate and the reset gate
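To connect statements a to d with actual computations, here is a minimal NumPy sketch of a single GRU step. The weight names (`Wz`, `Uz`, and so on) and the random initialization are hypothetical. Papers and libraries also differ on whether `z` or `1 - z` multiplies the previous state; this sketch uses one common convention.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_cell(x, h_prev, p):
    """One GRU step; p is a dict of hypothetical weight matrices and biases."""
    # Both gates use sigmoid AND the previous hidden state h_prev
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])  # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])  # reset gate
    # Current memory content: the only tanh; r decides how much of h_prev it sees
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["bh"])
    # Final memory: an update-gate blend of the previous state and the candidate
    return (1.0 - z) * h_prev + z * h_tilde

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
p = {}
for gate in ("z", "r", "h"):
    p["W" + gate] = rng.normal(scale=0.5, size=(n_hid, n_in))
    p["U" + gate] = rng.normal(scale=0.5, size=(n_hid, n_hid))
    p["b" + gate] = np.zeros(n_hid)

h = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):  # a sequence of 5 input vectors
    h = gru_cell(x, h, p)
print(h)
```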

05# Regarding Recurrent Neural Networks (RNN) possible application: True or False?


a. [ ] Many-to-many: video classification where we wish to label each frame of the video.
b. [ ] Many-to-many: reading a sentence in English and then outputting a sentence in French (translation).
c. [ ] One-to-many: image captioning takes an image and outputs a sentence of words.
d. [ ] Many-to-one: sentiment analysis where a given sentence is classified as expressing positive or negative sentiment.

Solution: All True:)


Recurrent neural networks (RNNs) are more complex than plain feedforward networks. They save the output of processing nodes and feed the result back into the model, so information does not flow in one direction only. This feedback loop lets the model use what it has seen at earlier steps of a sequence when predicting the next output. Each node in the RNN acts as a memory cell, carrying its state forward through the computation. If the network's prediction is incorrect, the error is propagated back through the time steps during backpropagation (backpropagation through time), and the weights are adjusted toward the correct prediction.
A recurrent neural network (RNN) is a type of Artificial Neural Network (ANN) commonly used in speech recognition, DNA sequence analysis, Natural Language Processing (NLP), sentiment analysis, image captioning, etc. RNNs are designed to recognize the sequential characteristics of data and use the learned patterns to predict the next likely scenario.
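To make the many-to-many vs many-to-one distinction of question 05 concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass. All sizes and weights are hypothetical; the point is only that the same recurrence can give one output per step (like labeling each video frame) or a single output from the last hidden state (like the sentiment of a whole sentence).

```python
import numpy as np

def rnn_forward(xs, Wx, Wh, b):
    """Run a vanilla RNN over xs of shape (T, n_in); return all hidden states."""
    h = np.zeros(Wh.shape[0])
    hs = []
    for x in xs:
        # The previous hidden state h is fed back in: this is the recurrence
        h = np.tanh(Wx @ x + Wh @ h + b)
        hs.append(h)
    return np.stack(hs)  # shape (T, n_hid)

rng = np.random.default_rng(1)
T, n_in, n_hid, n_out = 6, 3, 5, 2
Wx = rng.normal(scale=0.5, size=(n_hid, n_in))
Wh = rng.normal(scale=0.5, size=(n_hid, n_hid))
b = np.zeros(n_hid)
Wy = rng.normal(scale=0.5, size=(n_out, n_hid))  # hypothetical output layer

hs = rnn_forward(rng.normal(size=(T, n_in)), Wx, Wh, b)
per_step = hs @ Wy.T     # many-to-many: one output per step, shape (T, n_out)
last_only = Wy @ hs[-1]  # many-to-one: one output per sequence, shape (n_out,)
print(per_step.shape, last_only.shape)
```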

06# Regarding GANs: True or False?


In GANs, the Generative model learns the joint probability distribution p(x, y), predicting the conditional probability with the help of Bayes' theorem.
In contrast, a Discriminative model learns the conditional probability distribution p(y|x)?
Choose option:
a. ( ) True
b. ( ) False

Solution: True
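A tiny sketch may help with the generative idea in this question: model the class prior p(y) and the class-conditional p(x|y), which together give the joint p(x, y) = p(x|y) p(y), and then recover p(y|x) with Bayes' theorem. The 1-D Gaussian class model, the class names and the data points are all hypothetical choices for illustration.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution N(mean, std**2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical 1-D feature values for two classes
data = {"cat": [2.0, 2.5, 3.0, 2.2], "dog": [6.0, 5.5, 7.0, 6.5]}
n_total = sum(len(xs) for xs in data.values())

# Generative step: estimate p(y) and the parameters of p(x|y) for each class
params = {}
for label, xs in data.items():
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((v - mean) ** 2 for v in xs) / len(xs))
    params[label] = (len(xs) / n_total, mean, std)

def posterior(x):
    """Bayes' theorem: p(y|x) = p(x, y) / p(x), with p(x, y) = p(x|y) p(y)."""
    joint = {label: prior * gaussian_pdf(x, mean, std)
             for label, (prior, mean, std) in params.items()}
    evidence = sum(joint.values())  # p(x) = sum over y of p(x, y)
    return {label: j / evidence for label, j in joint.items()}

print(posterior(2.4))
print(posterior(4.3))
```

A discriminative model would instead fit p(y|x) directly, without ever modeling how x itself is distributed.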


Generative adversarial networks (GANs) are an exciting recent innovation in machine learning.
GANs are generative models: they create new data instances that resemble your training data. For example, GANs can create images that look like photographs of human faces, even though the faces don't belong to any real person.
GANs consist of two networks: a Generator G(z), which turns random noise z into fake samples, and a Discriminator D(x), which estimates the probability that a sample x is real. They play an adversarial game: the generator tries to fool the discriminator by generating data similar to the training set, while the discriminator tries not to be fooled by telling fake data apart from real data.
GANs have a number of common failure modes. All of them are areas of active research, and while none has been completely solved, here are some that I annotated from the HCIA-AI V3.0 Mock Exam:
An overly high learning rate leads to model divergence;
If the discriminator is trained too well, the generator cannot be trained;
Mode collapse results in images that are not diverse enough;
If the data set is small, overfitting is the issue :/
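For reference, the adversarial game described above is usually written as the minimax objective from the original GAN paper (Goodfellow et al., 2014), where D(x) is the discriminator's estimate that x is real and G(z) is the sample the generator produces from noise z:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator tries to maximize V, and the generator tries to minimize it.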

07# Again, regarding GANs: True or False?

For a GAN, convergence is often a fleeting, rather than stable, state?
Choose option:
a. ( ) True
b. ( ) False

Solution: True

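The fact that GANs are composed of two networks, each with its own loss function, makes them inherently unstable. As the generator improves, the discriminator can no longer tell real from fake and ends up guessing (about 50% accuracy), so its feedback becomes less meaningful; if training continues past this point, the generator keeps training on that less meaningful feedback and its quality can degrade. That is why convergence is often fleeting. Diving a bit deeper, the generator (G) loss itself can cause instability: when the discriminator is too confident, the original generator loss saturates and its gradient vanishes.
How to improve GAN stability:
Change the cost function for a better optimization goal;
Add additional penalties to the cost function to enforce constraints;
Avoid overconfidence and overfitting;
Use better ways of optimizing the model;
Add labels.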
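The vanishing gradient mentioned above can be checked with a few lines of arithmetic. This sketch assumes D(G(z)) = sigmoid(logit) and compares the original (saturating) generator loss log(1 - D(G(z))) with the non-saturating alternative -log D(G(z)), which the original GAN paper also suggests. The logit value of -10 is an arbitrary example of an overconfident discriminator.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def generator_gradients(logit):
    """Gradients of two generator losses w.r.t. the discriminator logit on a fake sample.

    With D(G(z)) = sigmoid(logit):
      saturating loss      log(1 - D(G(z)))  ->  d/dlogit = -sigmoid(logit)
      non-saturating loss  -log(D(G(z)))     ->  d/dlogit = -(1 - sigmoid(logit))
    """
    d = sigmoid(logit)
    return -d, -(1.0 - d)

# An overconfident discriminator: it gives the fake sample a tiny probability of being real
sat, non_sat = generator_gradients(-10.0)
print(f"saturating: {sat:.6f}, non-saturating: {non_sat:.6f}")
```

With the saturating loss the gradient magnitude is about 0.00005, so the generator barely learns; the non-saturating loss keeps a gradient close to 1.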

08# Regarding Huawei MindSpore ME: True or False?


Which of the following features are provided by MindSpore in ME (Mind Expression)?
a. [ ] Auto diff: operator-level automatic differential
b. [ ] Semi-auto labeling: semi-automatic data labeling
c. [ ] Auto tensor: automatic generation of operators
d. [ ] Auto parallel: automatic parallelism

Solution: All True, see Fig 4 below :)

Fig 3. Huawei open-sources AI framework MindSpore to rival Google’s TensorFlow. Graph High-Level Optimization (GHLO), Graph Low-Level Optimization (GLLO), and quantization; Intermediate Representation (IR); Graph Storage Path (Graph) Execution. For hardware, see Q&A#11 below :)
Fig 4. MindSpore in ME (Mind Expression); Tensor Boost Engine (TBE) enables custom operator development based on Tensor Virtual Machine (TVM). You can develop neural network operators using TBE APIs on a dedicated GUI; Source: Huawei
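To give some intuition for the "Auto diff: operator-level automatic differential" item, here is a toy, pure-Python sketch of reverse-mode automatic differentiation in which every operator knows its own local derivative. It only illustrates the general idea; it is not MindSpore's implementation or API.

```python
class Var:
    """A scalar node in a computation graph (toy reverse-mode autodiff)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # pairs of (parent node, local derivative)

    def __add__(self, other):
        # d(a + b)/da = 1, d(a + b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a * b)/da = b, d(a * b)/db = a
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    def backward(self):
        # Sort nodes topologically, then apply the chain rule from the output back
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad

x, y = Var(3.0), Var(4.0)
f = x * y + x  # f = xy + x, so df/dx = y + 1 and df/dy = x
f.backward()
print(x.grad, y.grad)
```

Calling `backward()` applies the chain rule from the output back to the inputs, which is what a framework automates for every operator in a network.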

09# Regarding Ascend Computing Language (AscendCL): True or False?


Which of the following is true regarding the Ascend Computing Language (AscendCL):
a. [ ] It has a Framework Adapter to run some AI frameworks.
b. [ ] It has a Framework Adapter to run all AI frameworks.
c. [ ] It can run TensorFlow/PyTorch/Caffe/MXNet.
d. [ ] It can run MindSpore.

Solution: T,F,T,T

Fig 5. Source: https://support.huawei.com/enterprise/en/doc/EDOC1100155021/d63e3d89/ascendcl-overview

Ascend Computing Language (AscendCL) provides a collection of C language APIs for users to develop deep neural network apps for object recognition and image classification, ranging from device management, context management, stream management, memory management, to model loading and execution, operator loading and execution, and media data processing. You can call AscendCL APIs through a third-party framework to utilize the compute capability of the Ascend AI Processor, or encapsulate AscendCL into third-party libraries to provide the runtime and resource management capabilities of the Ascend AI Processor.

10# Regarding MindSpore in GE (Graph Engine): True or False?

Which of the following features are provided by MindSpore in GE (Graph Engine)?
a. [ ] On-device execution
b. [ ] Cross-layer memory overcommitment
c. [ ] Deep graph optimization
d. [ ] Device-edge-cloud synergy (including online compilation)

Solution: All True:) — see Fig 4 above

GE (Graph Engine) is the graph compilation and execution layer, designed for high performance, software/hardware co-optimization, and full-scenario application:
Cross-layer memory overcommitment;
Deep graph optimization;
On-device execution;
Device-edge-cloud synergy (including online compilation).
① On a par with other open-source frameworks in the industry, MindSpore preferentially serves Huawei's self-developed chips and cloud services.
② It supports upward interconnection with third-party frameworks and can connect to third-party ecosystems through Graph IR, including training frontends and inference models, so developers can extend MindSpore's capabilities.
③ It also supports interconnection with third-party chips, helping developers broaden MindSpore's application scenarios and expand the AI ecosystem.

And that’s it!

Please rest assured that I do my best to bring these annotations to you.

This series has been built with passion on my own while I was enrolled in the HCIA-AI Course, preparing for the HCIA-AI V3.0 Mock Exam and, very soon I hope, the Huawei Certification Exam.

Take it! Use it! Learn it!

Thanks! o/

References & Credits

Introdução a Machine Learning para Certificação HCIA-AI (Introduction to Machine Learning for the HCIA-AI Certification) by crateus.ufc.br

[1] A Visual Explanation of Gradient Descent Methods (Momentum, AdaGrad, RMSProp, Adam) by Lili Jiang

[2] Support Vector Machines Part 1 (of 3): Main Ideas!!! by StatQuest with Josh Starmer

[3] 25 Questions to test a Data Scientist on Support Vector Machines by Ankit Gupta

[4] Neural networks by 3Blue1Brown

[5] Ghost Writing with TensorFlow by Wezley Sherman

Related Posts

00#Episode — AISeries — ML — Machine Learning Intro — What Is It and How It Evolves Over Time?

01#Episode — AISeries — Huawei ML FAQ — How do I get an HCIA certificate?

02#Episode — AISeries — Huawei ML FAQ Again — More annotation from Huawei Mock Exam

03#Episode — AISeries — AI In Graphics — Getting Intuition About Complex Math & More

04#Episode — AISeries —Huawei ML FAQ — Advanced — Even More annotation from Huawei Mock Exam (this one:)

05#Episode — AISeries — SVM — Credit Card — Start to Finished — A Complete Colab Notebook Using the Default of Credit Card Clients Data Set from UCI

06#Episode — AISeries — SVM — Breast Cancer — Start to Finished — A Complete Colab Notebook Using the Default of Credit Card Clients Data Set from UCI

07#Episode — AISeries — SVM — Cupcakes or Muffins? — Start To Finished — Based on Alice Zhao post

Cut somebody some slack!

Don’t be so critical of others as we all are under the same pressures and in this together!

I Feel FANTASTIC and I Am Still alive!

Still Alive by Jonathan Coulton


Hi, Guys o/ I am J3! I am just a hobby-dev, playing around with Python, Django, Ruby, Rails, Lego, Arduino, Raspy, PIC, AI… Welcome! Join us!