Akira’s Machine Learning news — #issue 31

Akihiro FUJII
Published in Analytics Vidhya · 6 min read · Oct 11, 2021

Featured Paper/News in This Week.

  • A published study reports a sudden improvement in generalization performance starting from random-level predictions: overfitting sets in at about 10² steps, while the jump in generalization is reported at about 10⁶ steps. Weight decay appears to be the key to this generalization. Yannic Kilcher proposed the hypothesis that “weight decay may enable the models to draw a smooth line for generalization while suppressing abrupt changes,” which I found very interesting.
  • Researchers proposed that the strong results of transformer models on image tasks, such as ViT, may be due to patching rather than to the transformer itself. Reaching 96% accuracy on CIFAR-10 trained from scratch is an outstanding achievement for a ViT-style architecture, which normally requires a large amount of data.

— — — — — — — — — — — — — — — — — — –

In the following sections, I will introduce various articles and papers, covering not only the items above but also the following five topics.

  1. Featured Paper/News in This Week
  2. Machine Learning Use Case
  3. Papers
  4. Articles related to machine learning technology
  5. Other Topics

— — — — — — — — — — — — — — — — — — –

1. Featured Paper/News in This Week

After some time, the neural net suddenly generalizes. mathai-iclr.github.io

[GROKKING: GENERALIZATION BEYOND OVERFITTING ON SMALL ALGORITHMIC DATASETS]
They found that the smaller the dataset, the longer it takes the neural net to generalize. While overfitting occurs in about 10² steps, generalization to the validation set requires about 10⁵ steps, at which point accuracy suddenly jumps from random-level results. Weight decay was essential for this generalization.
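As a rough illustration of the role weight decay plays here, a single SGD update with decoupled weight decay shrinks every weight slightly toward zero on top of the gradient step (a minimal NumPy sketch, not the paper's actual training code; the function name and hyperparameters are my own):

```python
import numpy as np

def sgd_step_with_weight_decay(w, grad, lr=0.1, wd=1e-2):
    """One SGD step with decoupled weight decay: besides following the
    gradient, every weight is shrunk slightly toward zero each step."""
    return w - lr * grad - lr * wd * w

w = np.array([1.0, -2.0])
grad = np.zeros(2)
# with a zero gradient, weight decay alone shrinks the weights
w_next = sgd_step_with_weight_decay(w, grad)
print(w_next)  # [ 0.999 -1.998]
```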

Is patching more critical than the transformer? openreview.net

[Patches Are All You Need? | OpenReview]
This is a study of a Transformer-Encoder-like mechanism built from convolutions; it can be implemented in six lines of PyTorch, is more efficient than ViT or MLP-Mixer, and achieves 96% accuracy even on small datasets such as CIFAR-10. From this result, the authors suggest that patching the image is more important than the transformer itself.
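Since the paper's central claim is about patching, here is a minimal NumPy sketch of the non-overlapping "patchify" step shared by ViT-style models (my own illustration, not the paper's six-line PyTorch implementation):

```python
import numpy as np

def extract_patches(img, p):
    """Split an (H, W, C) image into non-overlapping p x p patches,
    each flattened to a vector, as in ViT/ConvMixer-style patch embedding."""
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0
    patches = img.reshape(H // p, p, W // p, p, C)
    patches = patches.transpose(0, 2, 1, 3, 4)   # (H/p, W/p, p, p, C)
    return patches.reshape(-1, p * p * C)        # (num_patches, patch_dim)

img = np.random.rand(32, 32, 3)                  # CIFAR-sized image
patches = extract_patches(img, 4)
print(patches.shape)  # (64, 48)
```

In the actual models, each flattened patch is then mapped to an embedding by a single linear layer (equivalently, a strided convolution).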

— — — — — — — — — — — — — — — — — — –

2. Machine Learning Use Case

China will continue to be the “world’s factory” with AI technology. kaifulee.medium.com

This article discusses China’s application of AI technology. China is often called the world’s factory, and the author argues this will remain true into the 2020s: as labor costs rise with slowing population growth, China is using AI technology to innovate in manufacturing and other areas.

— — — — — — — — — — — — — — — — — — –

3. Papers

Finding candidates for gravitational lensing with self-supervised learning. arxiv.org

[2110.00023] Mining for strong gravitational lenses with self-supervised learning
This research finds candidate gravitational-lens images via self-supervised learning. First, a model pre-trained with self-supervised learning is used to find candidates by similarity to known lens images. A classification model is then built on top using linear regression and other methods. The authors state that this approach could significantly lower the barrier to entry when dealing with survey data and open up many avenues for collaboration.
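The candidate-mining step can be pictured as ranking survey images by embedding similarity to known lenses. A minimal NumPy sketch, where the function name, shapes, and the use of cosine similarity are my own assumptions:

```python
import numpy as np

def rank_by_similarity(known_embs, candidate_embs, k=3):
    """Rank candidate images by cosine similarity of their (self-supervised)
    embeddings to the mean embedding of known lens examples."""
    query = known_embs.mean(axis=0)
    query /= np.linalg.norm(query)
    cand = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = cand @ query
    top = np.argsort(-sims)[:k]       # indices of the k most similar candidates
    return top, sims[top]

rng = np.random.default_rng(0)
known = rng.normal(size=(5, 16))      # embeddings of known lenses
candidates = rng.normal(size=(100, 16))
idx, scores = rank_by_similarity(known, candidates)
print(idx.shape)  # (3,)
```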

Transformer optimized with an evolutionary algorithm. arxiv.org

[2109.08668] Primer: Searching for Efficient Transformers for Language Modeling
This is a study applying NAS with an evolutionary algorithm to the Transformer for language modeling. The search found MDHA, which applies convolution across attention heads, and Squared ReLU, which squares the ReLU output; Primer, equipped with both, reduces training time to 1/3 to 1/4 of the original.
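Of the two discovered components, Squared ReLU is simple enough to sketch directly (a minimal NumPy version; in the paper it replaces the activation inside the Transformer's feed-forward blocks):

```python
import numpy as np

def squared_relu(x):
    """Primer's Squared ReLU activation: relu(x) squared."""
    return np.maximum(x, 0.0) ** 2

out = squared_relu(np.array([-1.0, 0.5, 2.0]))
print(out)  # negative inputs -> 0, positive inputs -> x**2
```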

Re-evaluating ResNet and re-establishing the baseline training procedure. arxiv.org

[2110.00476] ResNet strikes back: An improved training procedure in timm
This study re-evaluates ResNet using the latest regularization and data augmentation techniques, improving Top-1 accuracy from 75.3% to 80.4%. ResNet had been scored differently by different papers; the authors have made their training procedure public through timm and shared it as a new baseline.

ViT can learn semantic segmentation information through self-supervised learning. arxiv.org

[2104.14294] Emerging Properties in Self-Supervised Vision Transformers
This is a study of self-supervised learning for ViTs. The authors propose DINO, which trains ViTs with a distillation-like mechanism so that their output distributions are consistent across multiple cropped views of the same image.
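The distillation-like objective can be sketched as a cross-entropy between a centered, sharpened teacher distribution and the student distribution for another crop. A minimal NumPy sketch; the temperatures, centering term, and names follow my reading of the paper and are illustrative only:

```python
import numpy as np

def softmax(z, temp):
    z = z / temp
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dino_loss(student_logits, teacher_logits, center, t_student=0.1, t_teacher=0.04):
    """Cross-entropy between a sharpened, centered teacher distribution and
    the student distribution for a different crop of the same image."""
    p_teacher = softmax(teacher_logits - center, t_teacher)  # centering + sharpening
    log_p_student = np.log(softmax(student_logits, t_student) + 1e-12)
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())

rng = np.random.default_rng(0)
s = rng.normal(size=(4, 8))   # student outputs for 4 crops
t = rng.normal(size=(4, 8))   # teacher outputs for matching crops
loss = dino_loss(s, t, center=np.zeros(8))
print(loss > 0)  # True
```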

Achieving high performance with both less and more data. arxiv.org

[2106.04803] CoAtNet: Marrying Convolution and Attention for All Data Sizes
This is research on combining Transformer and CNN. Self-attention is first equipped with relative positional encoding; then, at each stage, either a CNN or a Transformer layer is chosen, and the stages are stacked. The resulting model achieves SotA performance on ImageNet and performs well with both small and large amounts of data.
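The relative-positional-encoding ingredient can be sketched as an additive bias on the attention logits (a minimal NumPy illustration; in CoAtNet the bias is a learned, translation-invariant term rather than the random values used here):

```python
import numpy as np

def attention_with_relative_bias(q, k, v, rel_bias):
    """Scaled dot-product attention with an additive relative-position bias,
    giving attention a convolution-like, position-aware ingredient."""
    scores = q @ k.T / np.sqrt(q.shape[-1]) + rel_bias   # (n, n) logits
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    return weights @ v

rng = np.random.default_rng(0)
n, d = 4, 8
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
bias = rng.normal(size=(n, n))   # stands in for a learned relative-position bias
out = attention_with_relative_bias(q, k, v, bias)
print(out.shape)  # (4, 8)
```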

— — — — — — — — — — — — — — — — — — –

4. Articles related to machine learning technology

Background and Foreground Separation. ai.googleblog.com

This is a blog post by Google on “Omnimatte: Associating Objects and Their Effects in Video” (CVPR 2021). By letting CNNs learn correlations such as those between people and their shadows, the method can separate even weakly correlated parts, splitting a video into background and foreground layers.

Differences between ViTs and CNNs. syncedreview.com

A commentary article on [Do Vision Transformers See Like Convolutional Neural Networks?], which discusses the differences between ViT and CNNs. It states that ViT’s skip connections are more influential for representation propagation than ResNet’s and may substantially impact performance and representation similarity.
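The underlying paper compares layer representations with centered kernel alignment (CKA). A minimal linear-CKA sketch in NumPy (my own illustration of the measure, not the paper's code):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA similarity between two representations of the same inputs,
    each shaped (n_samples, n_features); 1.0 means identical up to rotation/scale."""
    X = X - X.mean(axis=0)   # center each feature
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, 'fro') ** 2
    den = np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro')
    return float(num / den)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
print(round(linear_cka(X, X), 3))  # 1.0: a representation is identical to itself
```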

— — — — — — — — — — — — — — — — — — –

5. Other Topics

TensorFlow Similarity. blog.tensorflow.org

[Introducing TensorFlow Similarity — The TensorFlow Blog]
An introduction to TensorFlow Similarity, which supports nearest-neighbor search over learned embeddings and can be used in about 20 lines of code.

— — — — — — — — — — — — — — — — — — –

Other blogs

About Me

Manufacturing Engineer/Machine Learning Engineer/Data Scientist / Master of Science in Physics / http://github.com/AkiraTOSEI/

On Twitter, I post one-sentence paper commentaries.
