Akira’s Machine Learning News — Issue #33

Akihiro FUJII
Published in Analytics Vidhya · Nov 10, 2021


Featured Paper/News in This Week.

— — — — — — — — — — — — — — — — — — –

In the following sections, I will introduce various articles and papers on the following five topics.

  1. Featured Paper/News in This Week
  2. Machine Learning Use Case
  3. Machine Learning Papers
  4. Technical Articles
  5. Other Topics

— — — — — — — — — — — — — — — — — — –

1. Featured Paper/News in This Week

Pre-training performance does not necessarily match the performance of the downstream task.

[2109.10686] Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
This is a study of the relationship between model scale and downstream-task accuracy. Pre-training performance improves as the model gets larger, but it does not necessarily match the performance of the downstream task. The authors propose the DeepNarrow strategy, which makes the model narrower and deeper, and achieve a 40% speed-up in training while maintaining downstream performance.

A dataset of 1.4 million images that avoids problems such as copyright and portrait rights

[2109.13228] PASS: An ImageNet replacement for self-supervised pretraining without humans
Huge datasets such as ImageNet have problems with licensing and with using photos of people taken without their consent. To address this, the authors collected images available under the CC-BY license and released PASS, a dataset for self-supervised pretraining that excludes images of people. They confirmed that it avoids problems such as copyright and can be used to train MoCo, DINO, etc.

— — — — — — — — — — — — — — — — — — –

2. Machine Learning Use Case

Sustainable AI Systems

An article discussing how to achieve sustainable AI systems. While the amount of computation keeps increasing, the article suggests using smaller models, shifting computation toward regions with lower-carbon electricity, and optimizing the energy use of both software and hardware.

— — — — — — — — — — — — — — — — — — –

3. Machine Learning Papers

A computationally efficient anomaly detection method using pre-trained models

[2106.08265] Towards Total Recall in Industrial Anomaly Detection
Proposes PatchCore, which uses pre-trained models for anomaly detection. It is characterized by a coreset memory bank that aggregates patch-level feature information from the training samples. It achieved SotA performance on the MVTec AD dataset.
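The core scoring idea — compare each test patch against a memory bank of normal patch features and take the worst patch as the image-level score — can be sketched in a few lines of NumPy. The random features, bank size, and dimensions here are illustrative stand-ins; in PatchCore the features come from a frozen pre-trained CNN and the bank is subsampled with a greedy coreset method, both omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical memory bank: patch features extracted from normal
# training images (random stand-ins for pre-trained CNN features).
memory_bank = rng.normal(size=(200, 64))

def anomaly_score(patch_features, bank):
    """Score each test patch by its distance to the nearest bank entry;
    the image-level score is the maximum patch distance."""
    # Pairwise Euclidean distances: (n_patches, n_bank)
    dists = np.linalg.norm(patch_features[:, None, :] - bank[None, :, :], axis=-1)
    patch_scores = dists.min(axis=1)  # nearest-neighbour distance per patch
    return patch_scores.max()         # image-level anomaly score

normal_patches = rng.normal(size=(16, 64))
anomalous_patches = rng.normal(loc=3.0, size=(16, 64))  # shifted distribution
```

A defect shows up as patches far from every bank entry, so `anomaly_score(anomalous_patches, memory_bank)` comes out well above the score for in-distribution patches.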

Contrastive learning with text and video

[2109.14084] VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Proposes VideoCLIP, which performs contrastive learning on text and video. Video clips are sampled around the timestamp of the sampled text, and contrastive learning is performed with hard samples obtained by clustering. The proposed method outperforms supervised approaches on zero-shot inference in downstream tasks.
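The contrastive objective behind this family of methods can be illustrated with a minimal NumPy sketch of a symmetric InfoNCE loss over paired video/text embeddings. The batch layout and temperature here are illustrative assumptions, and the paper's clustering-based hard-sample mining is omitted:

```python
import numpy as np

def info_nce(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each matrix is assumed
    to be a matching video/text pair."""
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature  # (batch, batch) cosine similarities
    idx = np.arange(len(logits))

    def ce(l):
        # Cross-entropy treating the diagonal (matching pair) as the target.
        l = l - l.max(axis=1, keepdims=True)
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_p[idx, idx].mean()

    # Average the video-to-text and text-to-video directions.
    return 0.5 * (ce(logits) + ce(logits.T))
```

Minimizing this loss pulls matched video/text pairs together and pushes mismatched pairs apart, so the loss for correctly paired embeddings is lower than for shuffled ones.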

A 3D object detection method without manually tuned parameters

[2109.08141] An End-to-End Transformer Model for 3D Object Detection
The authors propose 3DETR, a 3D object detection method that can be trained end-to-end. Like DETR, 3DETR treats object detection on point clouds as a set-to-set prediction problem, but unlike DETR it uses only Transformers and eliminates parameters that need to be tuned by hand.

Self-supervised learning for medical images

[2101.05224] Big Self-Supervised Models Advance Medical Image Classification
This is a study showing that self-supervised learning on ImageNet, followed by further self-supervised learning on medical images, improves the performance of the subsequent classification task. Since medical images are often taken from multiple angles, the authors propose Multi-Instance Contrastive Learning, which treats images of the same patient as positive pairs of the same data.

Fine-tuning CLIP by adding small networks with residual connections

[2110.04544] CLIP-Adapter: Better Vision-Language Models with Feature Adapters
They propose CLIP-Adapter, which fine-tunes CLIP with less data. A small network is added after the final layer of each of the image and language branches and fine-tuned. Another feature of the structure is a residual connection that makes it easy to retain the information of the original final layer. Good performance can be achieved with less data.
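The adapter-plus-residual structure can be sketched as a tiny bottleneck MLP blended back into the frozen feature. This is a minimal NumPy sketch; the dimensions, bottleneck size, and blend ratio are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class Adapter:
    """Tiny two-layer bottleneck MLP inserted after a branch's final
    layer, blended with the original feature via a residual connection."""
    def __init__(self, dim=512, hidden=128, ratio=0.2):
        self.w1 = rng.normal(scale=0.02, size=(dim, hidden))
        self.w2 = rng.normal(scale=0.02, size=(hidden, dim))
        self.ratio = ratio  # how much of the adapted feature to mix in

    def __call__(self, x):
        adapted = relu(x @ self.w1) @ self.w2
        # Residual blend: most of the original (frozen) feature survives.
        return self.ratio * adapted + (1.0 - self.ratio) * x

features = rng.normal(size=(4, 512))  # stand-in for frozen CLIP features
out = Adapter()(features)
```

Because only the two small weight matrices are trained while the CLIP backbone stays frozen, very few parameters need to be fit, which is why the approach works with little data.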

Combining a Transformer and a CNN to build a network that runs fast on mobile devices

[2110.02178] MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
The authors propose MobileViT, a fast network for mobile devices that combines a Transformer and a CNN. Local information is first captured by the CNN, and global information is then processed by the Transformer. It is 5.7% more accurate than MobileNetV3 and can be used for classification, object detection, and segmentation.

— — — — — — — — — — — — — — — — — — –

4. Technical Articles

PyTorch implementations of famous algorithms (nn.labml.ai)

This is a website introducing PyTorch implementations of the core techniques of many papers, including newer ones such as gMLP, as well as GANs and reinforcement learning. If you are interested in a particular technique, it is worth checking out.

— — — — — — — — — — — — — — — — — — –

5. Other Topics

20 AI People to Watch (aijourn.com)

This is an article introducing 20 influential people in the AI field, with short descriptions and links to their Twitter and LinkedIn accounts.
