Akira’s Machine Learning News — Issue #33
Featured Papers/News This Week
- A new dataset for self-supervised learning has been released that is commercially usable and respects portrait rights. As someone working in industry, I am very grateful for such a dataset, since large-scale datasets such as ImageNet, which are widely used in academia, are usually not available for commercial use!
- It seems that higher pre-training accuracy does not necessarily translate into higher accuracy on downstream tasks. The scaling strategy for downstream tasks therefore needs to be considered separately from the scaling strategy for pre-training.
— — — — — — — — — — — — — — — — — — –
In the following sections, I will introduce articles and papers covering not only the contents above but also the following five topics.
- Featured Papers/News This Week
- Machine Learning Use Case
- Machine Learning Papers
- Technical Articles
- Other Topics
— — — — — — — — — — — — — — — — — — –
1. Featured Papers/News This Week
Pre-training performance does not necessarily match the performance of the downstream task. — arxiv.org
[2109.10686] Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
This study examines the relationship between model scale and downstream-task accuracy. Pre-training performance improves as the model grows larger, but it does not necessarily match performance on the downstream task. The authors propose the DeepNarrow strategy, which makes the model narrower and deeper, and succeed in speeding up training by 40% while maintaining downstream performance.
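To make the DeepNarrow idea concrete, here is a minimal PyTorch sketch comparing a wide-shallow encoder with a deep-narrow one of a roughly similar parameter budget. The layer counts and widths are my own illustrative choices, not the paper's T5 configurations.

```python
# Minimal sketch of the DeepNarrow idea using PyTorch's TransformerEncoder.
# The layer counts and widths below are illustrative, not the paper's configs.
import torch.nn as nn

def encoder(d_model, n_layers, n_heads, d_ff):
    layer = nn.TransformerEncoderLayer(d_model, n_heads, d_ff)
    return nn.TransformerEncoder(layer, n_layers)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

wide_shallow = encoder(d_model=768, n_layers=6,  n_heads=12, d_ff=3072)
deep_narrow  = encoder(d_model=512, n_layers=16, n_heads=8,  d_ff=1536)

# Roughly comparable budgets; DeepNarrow spends parameters on depth, not width.
print(f"wide-shallow: {n_params(wide_shallow)/1e6:.1f}M params")
print(f"deep-narrow:  {n_params(deep_narrow)/1e6:.1f}M params")
```

The point of the comparison is only that, for a fixed parameter budget, the DeepNarrow strategy prefers depth over width.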
Dataset of 1.4 million images that avoids copyright and portrait-rights problems — arxiv.org
[2109.13228] PASS: An ImageNet replacement for self-supervised pretraining without humans
Huge datasets such as ImageNet have problems with licensing and with using photos of people taken without their consent. To address this, the authors collected images available under the CC-BY license and released PASS, a dataset for self-supervised learning that excludes people entirely. They confirmed that it avoids copyright issues and can be used for training with MoCo, DINO, and similar methods.
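Since PASS is distributed as plain image files without labels, plugging it into a standard two-view self-supervised pipeline (the input format MoCo and DINO expect) is straightforward. The sketch below assumes a hypothetical local directory layout, and the augmentations are generic examples, not the exact recipes of those methods.

```python
# Minimal sketch of feeding PASS into a two-view self-supervised pipeline.
# The local path is hypothetical; PASS ships as plain unlabeled image files,
# so a generic image-folder loader is enough.
import torch
from torchvision import datasets, transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.ToTensor(),
])

class TwoViews:
    """Return two independent augmentations of the same image."""
    def __init__(self, t):
        self.t = t
    def __call__(self, img):
        return self.t(img), self.t(img)

# Hypothetical directory layout: ./pass/<shard>/<image>.jpg
dataset = datasets.ImageFolder("./pass", transform=TwoViews(augment))
loader = torch.utils.data.DataLoader(dataset, batch_size=256, shuffle=True)
```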
— — — — — — — — — — — — — — — — — — –
2. Machine Learning Use Case
An article discussing how to build sustainable AI systems. Although the amount of computation keeps increasing, the article suggests using smaller models, distributing computation toward regions with lower carbon emissions, and optimizing the energy use of both software and hardware.
— — — — — — — — — — — — — — — — — — –
3. Machine Learning Papers
A computationally efficient anomaly detection method using pre-trained models — arxiv.org
[2106.08265] Towards Total Recall in Industrial Anomaly Detection
Proposes PatchCore, which uses pre-trained models for anomaly detection. It is characterized by a coreset memory bank that aggregates patch-level feature information from normal training samples. It achieved SotA performance on the MVTec AD dataset.
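Here is a minimal sketch of the PatchCore idea: build a memory bank of patch-level features of normal images from a pre-trained backbone, subsample it, and score test patches by their distance to the nearest stored feature. Note that the paper uses greedy coreset selection and specific backbone layers; the random subsampling and layer choice below are my simplifications.

```python
# Minimal sketch of the PatchCore idea: a memory bank of patch features from
# normal images, subsampled into a coreset, with the anomaly score given by
# the distance to the nearest stored feature. The paper uses greedy coreset
# selection; random subsampling here is a simplification.
import torch
import torchvision

backbone = torchvision.models.wide_resnet50_2(pretrained=True).eval()

def patch_features(x):
    # Use an intermediate layer so each feature vector describes a local patch.
    with torch.no_grad():
        f = backbone.conv1(x); f = backbone.bn1(f); f = backbone.relu(f)
        f = backbone.maxpool(f)
        f = backbone.layer1(f); f = backbone.layer2(f)   # (B, C, H, W)
    B, C, H, W = f.shape
    return f.permute(0, 2, 3, 1).reshape(-1, C)          # one row per patch

# Build the memory bank from normal training images, then subsample a coreset.
normal = torch.randn(8, 3, 224, 224)                     # stand-in for MVTec data
bank = patch_features(normal)
coreset = bank[torch.randperm(len(bank))[: len(bank) // 10]]

# Image-level anomaly score: max over patches of nearest-neighbor distance.
test = torch.randn(1, 3, 224, 224)
d = torch.cdist(patch_features(test), coreset).min(dim=1).values
print("anomaly score:", d.max().item())
```

Because everything is built on frozen pre-trained features, no training loop is needed at all, which is where the computational efficiency comes from.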
Contrastive learning with text and video — arxiv.org
[2109.14084] VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Proposes VideoCLIP, which performs contrastive learning on text and video. Video clips are sampled so that they overlap in time with the sampled text, and contrastive learning is performed with hard samples obtained by clustering. The proposed method outperforms supervised approaches in zero-shot inference on downstream tasks.
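The backbone of this kind of method is a symmetric video-text contrastive (InfoNCE) objective, which can be sketched in a few lines. The encoders below are stand-ins; VideoCLIP's actual contributions (temporally overlapping pairs and clustering-based hard examples) are omitted here.

```python
# Minimal sketch of a symmetric video-text contrastive (InfoNCE) objective.
# The embeddings are stand-ins for the outputs of video and text transformers.
import torch
import torch.nn.functional as F

def contrastive_loss(video_emb, text_emb, temperature=0.07):
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature              # (B, B) similarity matrix
    labels = torch.arange(len(v))               # i-th video matches i-th text
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2

video_emb = torch.randn(32, 512)   # stand-in video encoder output
text_emb = torch.randn(32, 512)    # stand-in text encoder output
print(contrastive_loss(video_emb, text_emb).item())
```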
3D object detection method without parameters to be adjusted manually — arxiv.org
[2109.08141] An End-to-End Transformer Model for 3D Object Detection
The authors propose 3DETR, a 3D object detection method that can be trained end-to-end. Like DETR, 3DETR treats object detection on point clouds as a set prediction problem, but unlike DETR it uses only transformers and eliminates parameters that would need to be tuned by hand.
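As a rough illustration of the set-prediction formulation, here is a toy DETR-style detector on raw point clouds. Real 3DETR samples its queries from the point cloud itself and is trained with bipartite matching; the learned queries and output heads below are stand-ins.

```python
# Toy sketch of DETR-style set prediction on point clouds, in the spirit of
# 3DETR. Real 3DETR samples queries from the cloud and predicts oriented 3D
# boxes; here queries are learned embeddings and the heads are stand-ins.
import torch
import torch.nn as nn

class TinySetDetector3D(nn.Module):
    def __init__(self, d=256, n_queries=64, n_classes=10):
        super().__init__()
        self.embed = nn.Linear(3, d)                      # lift xyz to d dims
        self.transformer = nn.Transformer(
            d_model=d, nhead=8, num_encoder_layers=3,
            num_decoder_layers=3, batch_first=True)
        self.queries = nn.Parameter(torch.randn(n_queries, d))
        self.cls_head = nn.Linear(d, n_classes + 1)       # +1 for "no object"
        self.box_head = nn.Linear(d, 6)                   # center + size

    def forward(self, points):                            # (B, N, 3)
        x = self.embed(points)
        q = self.queries.expand(points.size(0), -1, -1)
        h = self.transformer(x, q)                        # one output per query
        return self.cls_head(h), self.box_head(h)

model = TinySetDetector3D()
logits, boxes = model(torch.randn(2, 1024, 3))            # 2 clouds, 1024 pts
print(logits.shape, boxes.shape)                          # (2,64,11) (2,64,6)
```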
Self-supervised learning for medical images — arxiv.org
[2101.05224] Big Self-Supervised Models Advance Medical Image Classification
This study shows that self-supervised learning on ImageNet, followed by further self-supervised learning on medical images, improves performance on the subsequent classification task. Since medical images of the same patient are often taken from multiple angles, the authors propose Multi-Instance Contrastive Learning, which treats such images as views of the same data.
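The core of Multi-Instance Contrastive Learning is simply how positive pairs are built: two different images of the same patient or case, rather than two augmentations of one image. Here is a minimal sketch, with an illustrative data layout of my own.

```python
# Minimal sketch of the Multi-Instance Contrastive idea: two different images
# of the same patient/case form a positive pair. The record format and the
# grouping step are illustrative, not the paper's data pipeline.
import random
import torch
from torch.utils.data import Dataset

class MultiInstancePairs(Dataset):
    """records: list of (image_tensor, case_id); images of one case are positives."""
    def __init__(self, records):
        self.by_case = {}
        for img, case_id in records:
            self.by_case.setdefault(case_id, []).append(img)
        self.cases = list(self.by_case)

    def __len__(self):
        return len(self.cases)

    def __getitem__(self, i):
        imgs = self.by_case[self.cases[i]]
        # Sampling with replacement: a single-image case pairs with itself
        # (the paper falls back to augmentations in that situation).
        a, b = random.choices(imgs, k=2)
        return a, b                        # feed into a SimCLR-style loss

records = [(torch.randn(3, 224, 224), i // 2) for i in range(8)]  # 2 images/case
pairs = MultiInstancePairs(records)
print(len(pairs), "cases")
```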
Fine-tuning CLIP by adding small networks and residual connections — arxiv.org
[2110.04544] CLIP-Adapter: Better Vision-Language Models with Feature Adapters
The authors propose CLIP-Adapter, which fine-tunes CLIP with less data. It adds a small network after the final layer of each of the image and language branches and fine-tunes only those additions. Thanks to a residual connection, the structure easily retains the information of the original final layer. Good performance can be achieved with less data.
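The adapter itself is tiny, so it is easy to sketch. Below is a minimal version of the image-side adapter with the residual blend; the bottleneck size and blend ratio are illustrative, not the paper's exact values.

```python
# Minimal sketch of a CLIP-Adapter-style feature adapter: a small bottleneck
# MLP on top of frozen CLIP features, blended with the original feature via a
# residual connection. Layer sizes and the blend ratio are illustrative.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim=512, bottleneck=128, alpha=0.2):
        super().__init__()
        self.alpha = alpha
        self.mlp = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(),
            nn.Linear(bottleneck, dim), nn.ReLU(),
        )

    def forward(self, clip_feat):
        # The residual blend keeps most of the frozen CLIP feature intact.
        return self.alpha * self.mlp(clip_feat) + (1 - self.alpha) * clip_feat

adapter = Adapter()
feat = torch.randn(4, 512)            # stand-in for frozen CLIP image features
print(adapter(feat).shape)            # torch.Size([4, 512])
```

Only the adapter's parameters are updated during fine-tuning, which is why so little data is needed.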
Combining Transformer and CNN to build a network that runs at high speed — arxiv.org
[2110.02178] MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
The authors propose MobileViT, a fast network for mobile devices that combines a Transformer and a CNN. Local information is first captured by the CNN, and global information is then processed by the Transformer. It is 5.7% more accurate than MobileNetv3 and can be used for classification, object detection, and segmentation.
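The local-then-global pattern can be sketched with standard PyTorch modules. The real MobileViT block unfolds the feature map into patches and fuses the result with the input; the version below is a simplification that treats every spatial position as a token.

```python
# Minimal sketch of the MobileViT pattern: convolutions capture local structure,
# then a transformer mixes information globally. This is a simplification of
# the actual MobileViT block, not the paper's architecture.
import torch
import torch.nn as nn

class LocalGlobalBlock(nn.Module):
    def __init__(self, channels=64, n_heads=4):
        super().__init__()
        self.local = nn.Sequential(                       # local: depthwise conv
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1), nn.SiLU(),
        )
        layer = nn.TransformerEncoderLayer(
            channels, n_heads, channels * 2, batch_first=True)
        self.global_ = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):                                 # (B, C, H, W)
        x = self.local(x)
        B, C, H, W = x.shape
        seq = x.flatten(2).transpose(1, 2)                # (B, H*W, C) tokens
        seq = self.global_(seq)                           # global mixing
        return seq.transpose(1, 2).reshape(B, C, H, W)

block = LocalGlobalBlock()
print(block(torch.randn(1, 64, 32, 32)).shape)            # (1, 64, 32, 32)
```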
— — — — — — — — — — — — — — — — — — –
4. Technical Articles
PyTorch implementations of famous algorithms — nn.labml.ai
This website presents PyTorch implementations of the core techniques behind many papers, including newer ones such as gMLP, as well as GANs and reinforcement learning. If you are interested in a particular technique, it is worth checking out.
— — — — — — — — — — — — — — — — — — –
5. Other Topics
20 AI People to Watch — aijourn.com
This article introduces 20 influential people in the AI field, with links to their Twitter and LinkedIn accounts.
— — — — — — — — — — — — — — — — — — –
Other Blogs
— — — — — — — — — — — — — — — — — — –
About Me
Manufacturing Engineer/Machine Learning Engineer/Data Scientist / Master of Science in Physics / http://github.com/AkiraTOSEI/
On Twitter, I post one-sentence paper commentaries.