Akira’s Machine Learning News — Issue #34

Akihiro FUJII
Published in Analytics Vidhya
7 min read · Nov 13, 2021


Featured Paper/News This Week

  • Past studies have shown that ViT classifies in a more human-like manner than CNNs, but a new study shows that ViT still classifies correctly even when images are perturbed patch by patch. CNNs depend on texture, ViT depends on patches, and both appear to differ from human perception.
  • A method is proposed that speeds up matrix products by a factor of 100. Since matrix products account for a large share of the total computation in deep learning, this technique may make it easier to deploy large-scale models in practice.

— — — — — — — — — — — — — — — — — — –

In the following sections, I introduce various articles and papers, not only on the above contents but also on the following five topics.

  1. Featured Paper/News This Week
  2. Machine Learning Use Case
  3. Machine Learning Papers
  4. Technical Articles
  5. Other Topics

— — — — — — — — — — — — — — — — — — –

1. Featured Paper/News This Week

ViT does not lose accuracy even when patches are rotated or shuffled (arxiv.org)

[2110.07858] Understanding and Improving Robustness of Vision Transformers through Patch-based Negative Augmentation
Since ViT classifies images by dividing them into patches, the authors found that the accuracy of ViT does not deteriorate even when the patches of an image are shuffled or rotated. To prevent this, they proposed a data augmentation method that penalizes correct classification on such patch-perturbed data.
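The perturbation studied here is easy to reproduce. Below is a minimal NumPy sketch (the function name, patch size, and seed are my own choices, not from the paper) that cuts an image into patches and shuffles their positions:

```python
import numpy as np

def shuffle_patches(img, patch=16, seed=0):
    """Patch-based negative augmentation: cut the image into
    non-overlapping patches and shuffle their positions.
    A ViT often still classifies such images 'correctly',
    which the paper's augmentation penalizes."""
    rng = np.random.default_rng(seed)
    H, W, C = img.shape
    gh, gw = H // patch, W // patch
    # (gh*gw, patch, patch, C): one row per patch
    patches = (img[:gh * patch, :gw * patch]
               .reshape(gh, patch, gw, patch, C)
               .transpose(0, 2, 1, 3, 4)
               .reshape(gh * gw, patch, patch, C))
    patches = patches[rng.permutation(gh * gw)]  # shuffle positions
    # reassemble the shuffled patches into an image
    return (patches.reshape(gh, gw, patch, patch, C)
            .transpose(0, 2, 1, 3, 4)
            .reshape(gh * patch, gw * patch, C))
```

The pixel multiset is unchanged and only patch positions move, which is why a model that relies on patch-local cues can still score such an image as its original class.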

Speeding up the matrix product using addition and a lookup table (arxiv.org)

[2106.10860] Multiplying Matrices Without Multiplying
Research to speed up matrix multiplication. A vector is divided into sub-vectors, and a lookup table is built over those sub-vectors. By finding the nearest neighbors of the target vector, the matrix product is approximated by summing table entries instead of multiplying. This is up to 100 times faster than the usual matrix product, with little loss of accuracy.
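The lookup-table idea can be sketched with a product-quantization-style approximate matrix multiply. This is a simplified toy, not the paper's method: the paper's MADDNESS encoder uses learned hash functions and integer summation, while this sketch picks prototypes by random sampling and uses plain NumPy for clarity:

```python
import numpy as np

def pq_matmul(A, B, n_subspaces=4, n_prototypes=16, seed=0):
    """Approximate A @ B: split the columns of A into subspaces,
    snap each sub-vector to its nearest prototype, and replace
    per-row dot products with table lookups and additions.
    Assumes A.shape[1] is divisible by n_subspaces."""
    rng = np.random.default_rng(seed)
    N, D = A.shape
    d = D // n_subspaces
    out = np.zeros((N, B.shape[1]))
    for c in range(n_subspaces):
        sub = A[:, c * d:(c + 1) * d]                 # (N, d) sub-vectors
        # crude prototypes: random rows (the real method learns an encoder)
        protos = sub[rng.choice(N, n_prototypes, replace=False)]
        # lookup table of prototype × B-subblock dot products, built once
        table = protos @ B[c * d:(c + 1) * d, :]      # (K, M)
        # encode each row as its nearest prototype, then sum table rows
        dists = ((sub[:, None, :] - protos[None]) ** 2).sum(-1)
        out += table[dists.argmin(1)]
    return out
```

Once `B` is fixed, multiplying a new `A` costs only nearest-prototype encoding plus additions, which is where the speedup comes from.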

— — — — — — — — — — — — — — — — — — –

2. Machine Learning Use Case

AI used in Photoshop (thenextweb.com)

This article introduces the machine learning features in Adobe's Photoshop, where machine learning is used to automatically mask complex shapes and transform backgrounds with style transfer. While many companies struggle to adopt machine learning, Adobe's example may serve as a good case study.

— — — — — — — — — — — — — — — — — — –

3. Machine Learning Papers

AST: a single model that works even when audio lengths vary greatly (arxiv.org)

[2104.01778] AST: Audio Spectrogram Transformer
The authors propose AST (Audio Spectrogram Transformer), a Transformer-based model for audio. The spectrogram is split into patches and fed to the Transformer. The same model can be used even when audio lengths differ greatly (1–10 s), and SotA performance was achieved on three datasets.
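Because the input is just a sequence of patch tokens, longer audio simply yields more tokens, which is why one model handles very different lengths. A rough NumPy sketch of the patch extraction (the 16×16 patch size and overlapping stride are in the spirit of the paper; the function name and defaults are my own assumptions):

```python
import numpy as np

def patchify_spectrogram(spec, patch=16, stride=10):
    """Split a (freq, time) spectrogram into overlapping
    patch tokens, each flattened to a vector, as the input
    to a Transformer. Variable-length audio just produces
    a variable number of tokens."""
    F, T = spec.shape
    tokens = []
    for f in range(0, F - patch + 1, stride):
        for t in range(0, T - patch + 1, stride):
            tokens.append(spec[f:f + patch, t:t + patch].ravel())
    return np.stack(tokens)  # (num_tokens, patch * patch)
```

In the real model each flattened patch is then linearly projected and given a positional embedding before entering the Transformer.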

Survey of Activation Functions (arxiv.org)

[2109.14545] A Comprehensive Survey and Performance Analysis of Activation Functions in Deep Learning
This is a comprehensive survey of activation functions: ReLU variants such as LReLU work well in residual networks, and parameterized activation functions converge quickly.

Curriculum Learning Improves the Efficiency of Pre-Training Language Models (arxiv.org)

[2108.06084] Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training
A paper claiming that curriculum learning is effective in pre-training large-scale autoregressive language models such as GPT-2. Using sequence length as a measure of difficulty, the model is trained on gradually longer sequences. This stabilizes training and improves training efficiency.
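The core schedule is simple to state in code. Below is a minimal sketch of a sequence-length curriculum; the linear ramp and the hyperparameter values (`start_len`, `max_len`, `warmup_steps`) are illustrative assumptions on my part, not the paper's tuned settings:

```python
def curriculum_seq_len(step, start_len=64, max_len=2048,
                       warmup_steps=10_000):
    """Sequence-length curriculum: train on short sequences
    first, then grow linearly to the full context length.
    Lengths are rounded down to a multiple of 8 to stay
    hardware-friendly."""
    if step >= warmup_steps:
        return max_len
    length = int(start_len + (step / warmup_steps) * (max_len - start_len))
    return max(8, length - length % 8)
```

At each training step the data loader would truncate (or pack) batches to `curriculum_seq_len(step)` tokens, so early steps see only short, easy sequences.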

Achieving 80% top-1 accuracy on ImageNet with a depth of only 12 layers (arxiv.org)

[2110.07641] Non-deep Networks
A study that improves accuracy by adding parallel subnetworks instead of going deeper. Using sub-networks that share only their input and output, the authors achieve 80% top-1 accuracy on ImageNet with a depth of only 12 layers.

Providing a dataset of math word problems (arxiv.org)

[2110.14168] Training Verifiers to Solve Math Word Problems
The authors provide GSM8K, a dataset of 8,500 grade-school math word problems, and develop a method to solve them. Since simple fine-tuning can lead to overfitting, they train a verifier: a generator model produces intermediate calculations up to the final answer, and the verifier then checks candidate solutions.

— — — — — — — — — — — — — — — — — — –

4. Technical Articles

Explanation of Automatic Differentiation (pytorch.org)

An article by PFN explaining automatic differentiation, with easy-to-understand explanations, code, and diagrams.
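To illustrate what automatic differentiation computes, here is a minimal forward-mode sketch using dual numbers. This toy class is my own and supports only `+` and `*`; the linked article covers the reverse mode used by deep learning frameworks, which propagates derivatives in the opposite direction:

```python
class Dual:
    """Carry (value, derivative) through arithmetic so the
    derivative is computed alongside the value — forward-mode
    automatic differentiation in its simplest form."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def _wrap(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = self._wrap(other)  # sum rule: derivatives add
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__

    def __mul__(self, other):
        o = self._wrap(other)  # product rule for the derivative
        return Dual(self.val * o.val,
                    self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

# d/dx (x*x + 3*x) at x = 2 is 2x + 3 = 7
x = Dual(2.0, 1.0)  # dot=1.0 marks x as the differentiation variable
y = x * x + 3 * x
```

Every intermediate value carries its derivative along, so `y.dot` holds the exact derivative with no symbolic manipulation or finite differences.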

Videos on Reinforcement Learning by DeepMind (www.youtube.com)

A list of videos explaining reinforcement learning by DeepMind. 13 videos, each about 1 to 2 hours long, are available.

— — — — — — — — — — — — — — — — — — –

5. Other Topics

GPT-3 now available to more people (blogs.microsoft.com)

Microsoft has announced that it will offer GPT-3 as an Azure OpenAI Service, extending the limited access provided by OpenAI to more people. Pricing is yet to be determined.

— — — — — — — — — — — — — — — — — — –

Other Blogs

Machine Learning 2020 summary: 84 interesting papers/articles | by Akihiro FUJII | Towards Data Science (towardsdatascience.com)
In this article, I present a total of 84 papers and articles published in 2020 that I found particularly interesting. For the sake of clarity, I divide them into 12 sections. My personal summary for…

Recent Developments and Views on Computer Vision x Transformer | by Akihiro FUJII | Towards Data Science (towardsdatascience.com)
This article discusses some of the interesting research and insights in Transformer x Computer Vision research since the advent of Vision Transformer. The four themes of this article are as follows…

Reach and Limits of the Supermassive Model GPT-3 | by Akihiro FUJII | Analytics Vidhya | Medium (medium.com)
Reach and Limits of the Supermassive Model GPT-3. In this blog post, I give a technical explanation of GPT-3, what GPT-3 has achieved, and what GPT-3 could not do.

Do Vision Transformers See Like Convolutional Neural Networks? (Paper Explained) | by Akihiro FUJII | Oct, 2021 | Towards Data Science (towardsdatascience.com)
Vision Transformer (ViT) has been gaining momentum in recent years. This article will explain the paper “Do Vision Transformers See Like Convolutional Neural Networks?” (Raghu et al., 2021) published…

— — — — — — — — — — — — — — — — — — –

🌟I post weekly newsletters! Please subscribe!🌟

— — — — — — — — — — — — — — — — — — –

About Me

Manufacturing Engineer/Machine Learning Engineer/Data Scientist / Master of Science in Physics / http://github.com/AkiraTOSEI/

LinkedIn profile

Twitter, where I post one-sentence paper commentaries.