
In a new paper, a research team led by Geoffrey Hinton combines the strengths of five advances in neural networks — Transformers, Neural Fields, Contrastive Representation Learning, Distillation and Capsules — to propose the idea of an imagined vision system, “GLOM,” that enables neural networks with fixed architectures to parse an image into a part-whole hierarchy with a different structure for each image.
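
The paper contains the full picture, but the flavor of the idea can be sketched in code. Below is a minimal, hypothetical Python sketch of one GLOM-style update: every image location hosts a column of embeddings, one per part-whole level, and each embedding is updated from the level below, the level above, its own previous state, and an attention-weighted consensus with the same level in other columns. The networks, shapes, and equal weighting here are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def glom_step(levels, bottom_up, top_down, w=(0.25, 0.25, 0.25, 0.25)):
    """One illustrative GLOM-style update (a sketch, not the paper's math).

    levels: (num_columns, num_levels, dim) -- a column of part-whole
            embeddings for every image location.
    """
    cols, num_levels, d = levels.shape
    new_levels = torch.empty_like(levels)
    for l in range(num_levels):
        prev = levels[:, l]                                  # previous state
        up = bottom_up[l](levels[:, l - 1]) if l > 0 else prev
        down = top_down[l](levels[:, l + 1]) if l < num_levels - 1 else prev
        # Attention-weighted average of the same level across columns lets
        # neighbouring columns settle into "islands of agreement".
        att = F.softmax(prev @ prev.T / d ** 0.5, dim=-1)
        consensus = att @ prev
        new_levels[:, l] = w[0]*prev + w[1]*up + w[2]*down + w[3]*consensus
    return new_levels

# Toy usage: 5 levels, 64-dim embeddings, a 14x14 grid of columns.
L, d = 5, 64
bottom_up = [nn.Linear(d, d) for _ in range(L)]
top_down = [nn.Linear(d, d) for _ in range(L)]
levels = torch.randn(196, L, d)
for _ in range(10):                  # iterate so the embeddings can settle
    levels = glom_step(levels, bottom_up, top_down)
```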



Transformer architectures have shown great success across machine learning (ML) tasks in natural language processing and beyond, but they have mostly been limited to tasks from a single domain or specific multimodal domains. For example, ViT handles vision-related tasks exclusively, BERT focuses on language tasks, and ViLBERT-MT works only on related vision-and-language tasks.

A question naturally arises: Could we build a single transformer capable of handling a wide range of applications in different domains over multiple modalities? …



Attention architectures are pushing the frontier in many machine learning (ML) tasks and have become a building block in many modern neural networks. Our conceptual and theoretical understanding of their power and inherent limitations, however, remains nascent. Researchers from Microsoft and Université de Montréal set out to capture the essential mathematical properties of attention, proposing a new mathematical framework that uses measure theory and integral operators to model attention and quantify the regularity of attention operations.
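
The framework's specifics are in the paper, but the basic move can be hedged roughly as follows: standard attention is a finite softmax-weighted sum over keys, and the measure-theoretic view replaces that sum with an integral operator against a measure over the key space. The notation below is illustrative, not the authors' own:

```latex
% Discrete attention over n key-value pairs (q a query, d the key dimension):
\mathrm{Att}(q) \;=\; \sum_{i=1}^{n}
  \frac{\exp\!\left(\langle q, k_i\rangle/\sqrt{d}\right)}
       {\sum_{j=1}^{n}\exp\!\left(\langle q, k_j\rangle/\sqrt{d}\right)}\, v_i

% Continuum analogue: an integral operator against a measure \mu on the
% key space, with v(\cdot) a value map (illustrative notation):
\mathrm{Att}_{\mu}(q) \;=\;
  \frac{\int \exp\!\left(\langle q, k\rangle/\sqrt{d}\right) v(k)\,\mathrm{d}\mu(k)}
       {\int \exp\!\left(\langle q, k\rangle/\sqrt{d}\right) \mathrm{d}\mu(k)}
```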



Today’s large language models have greatly improved their task-agnostic, few-shot performance, with top models like GPT-3 competitive with state-of-the-art finetuning approaches when provided only a few examples in a natural language prompt. This few-shot, “in-context” learning approach is gaining traction in large part because it learns without parameter updates. Compared to traditional finetuning methods, few-shot learning enables practitioners to prototype NLP models more quickly, allows non-technical users to create NLP systems, and efficiently reuses models, reducing system memory requirements and complexity.
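
To make “in-context” concrete: the entire training procedure is just string formatting, with the few labelled examples concatenated into a prompt that the model completes, and no gradient steps at all. A minimal sketch follows, in which the template, examples, and the `lm_complete` call are hypothetical stand-ins for any completion API. The arbitrary choices of labels, format, and example order are exactly where the instability discussed next comes from.

```python
# Minimal sketch of few-shot, in-context classification via a prompt.
train_examples = [
    ("The movie was a delight.", "Positive"),
    ("Two hours I will never get back.", "Negative"),
]

def build_prompt(examples, query):
    """Concatenate labelled examples and the query into one text prompt."""
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_prompt(train_examples, "An instant classic.")
# prediction = lm_complete(prompt)  # hypothetical call to a language model
print(prompt)
```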

GPT-3’s accuracy, however, can be highly unstable across different prompts (the choice of training examples, their ordering, and the prompt format). To address…



Imagine an autonomous vehicle traffic sign detector whose accuracy plummets when dealing with rain or unexpected inputs. With machine learning (ML) an increasingly integral part of our daily lives, it is crucial that developers identify such potentially dangerous scenarios before real-world deployment. The rigorous performance evaluation and testing of models has thus become a high priority in the ML community, where an understanding of how and why ML system failures might occur can help with reliability, model refinement, and identifying appropriate human oversight and engagement actions.

The process of identifying and characterizing ML failures and shortcomings is, however, extremely complex…



From predictive text to smart voice control, human-machine interfaces have improved significantly in recent years. Many scientists envision the next frontier as brain-computer interfaces (BCIs): direct neural connections that leverage the electrical activity in the brain, captured via electroencephalography (EEG) signals.

In a bid to develop deep neural networks (DNNs) that can better leverage the massive EEG datasets now publicly available for downstream BCI applications, a trio of researchers from the University of Toronto has proposed a BERT-inspired training approach as a self-supervised pretraining step for BCI/EEG DNNs.
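
The paper spells out the actual objective; what follows is only a minimal sketch of the general recipe it belongs to: mask parts of a feature sequence, run a transformer over the corrupted input, and train the contextual vector at each masked position to pick out the original feature there from distractors. All shapes, the tiny encoder, the masking scheme, and the temperature are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative contrastive masked-prediction step for EEG-like sequences.
B, T, D = 8, 128, 64                      # batch, timesteps, feature dim
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True),
    num_layers=2,
)
mask_token = nn.Parameter(torch.zeros(D))

features = torch.randn(B, T, D)           # stand-in for downsampled EEG
mask = torch.rand(B, T) < 0.15            # mask ~15% of positions
# (real setups typically mask contiguous spans; simplified here)
inputs = torch.where(mask.unsqueeze(-1), mask_token.expand(B, T, D), features)

context = encoder(inputs)                 # contextualized representations

# Contrastive loss: at each masked position, the contextual vector should
# match the original (unmasked) feature there, against other timesteps.
z = F.normalize(context, dim=-1)
targets = F.normalize(features, dim=-1)
logits = torch.einsum("btd,bsd->bts", z, targets) / 0.1   # temperature 0.1
labels = torch.arange(T).expand(B, T)
loss = F.cross_entropy(logits[mask], labels[mask])  # masked positions only
```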

In the paper BENDR: Using Transformers and a Contrastive Self-Supervised Learning Task to…



Apple has revealed system design details for its federated task processing system, demonstrating the potential of federated evaluation and tuning for on-device ML system personalization.

Federated Learning (FL) enables model training on a large corpus of decentralized data, a de-identification approach that addresses public concerns and other issues regarding data privacy, ownership and locality. FL is seeing significant interest from both research and application perspectives, and its potential deployment on end devices such as smartphones and computers — where it can safeguard privacy and improve user experience — is highly desirable.
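
Apple's task-processing system is what the paper details, but the core loop such systems build on, federated averaging, is easy to sketch. The following is the generic algorithm, not Apple's implementation; the model, loss, and client data loaders are toy assumptions.

```python
import copy
import torch
import torch.nn.functional as F

def fedavg_round(global_model, client_loaders, lr=0.01, local_epochs=1):
    """One FedAvg round: clients train locally, the server averages weights."""
    client_states = []
    for loader in client_loaders:
        local = copy.deepcopy(global_model)  # raw data never leaves the client
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        for _ in range(local_epochs):
            for x, y in loader:
                opt.zero_grad()
                F.mse_loss(local(x), y).backward()
                opt.step()
        client_states.append(local.state_dict())
    # Server aggregation: unweighted average of the clients' parameters.
    avg = {k: torch.stack([s[k] for s in client_states]).mean(0)
           for k in client_states[0]}
    global_model.load_state_dict(avg)
    return global_model
```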

Now, a research team from Apple has created a…



It is widely believed in the deep learning community that growing training sets and model sizes should improve performance. Developing a deeper understanding of the relationships between training set size, computational scale and model accuracy is therefore valuable for advancing the state of the art.

Scaling issues are critical, as the test loss of neural networks scales as a power law with model and dataset size. Why these power laws emerge, and which features of the data and models determine the values of the power-law exponents, are questions of considerable theoretical and practical importance, as these components…
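
As a concrete reading of that claim: if test loss follows L(D) ≈ a · D^(−α) for dataset size D, then the exponent α is simply the slope of the loss curve in log-log space, which a one-line fit can recover. The measurements below are invented purely for illustration.

```python
import numpy as np

# Hypothetical (dataset size, test loss) measurements on a log grid.
D = np.array([1e4, 1e5, 1e6, 1e7])
loss = np.array([0.90, 0.52, 0.30, 0.17])

# If loss ~ a * D**(-alpha), then log(loss) = log(a) - alpha * log(D),
# so the power-law exponent is the negated slope in log-log space.
slope, intercept = np.polyfit(np.log(D), np.log(loss), 1)
alpha, a = -slope, np.exp(intercept)
print(f"alpha ~= {alpha:.2f}, a ~= {a:.2f}")
```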



A transformer-based model that achieves state-of-the-art performance on unsupervised protein structure learning is making waves, with esteemed AI researcher Yann LeCun and others in the machine learning and biology communities celebrating the new study.

The development of protein biomolecules, or protein engineering, requires a holistic understanding of protein structure. Because sequence variation within a protein family conveys information about that structure, approaches to learning protein structure have tended to fit separate models to each family of sequences. …



Researchers from UC Berkeley and Google Research have introduced BoTNet, a “conceptually simple yet powerful” backbone architecture that boosts performance on computer vision (CV) tasks such as image classification, object detection and instance segmentation.

In the paper Bottleneck Transformers for Visual Recognition, researchers describe BoTNet as a deep learning architecture that enables hybrid models to use both convolutions and self-attention. The design’s key innovation is replacing the spatial 3 × 3 convolution layers in the final three bottleneck blocks of a residual neural network (ResNet) with Multi-Head Self-Attention (MHSA). …
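
A hedged sketch of that substitution follows, using torch.nn.MultiheadAttention over flattened spatial positions as a stand-in for the paper's MHSA layer. The channel sizes are arbitrary, and the relative position encodings BoTNet uses are omitted for brevity.

```python
import torch
import torch.nn as nn

class BoTBlock(nn.Module):
    """Simplified bottleneck block: 1x1 conv -> self-attention -> 1x1 conv.

    Sketches BoTNet's idea of swapping the 3x3 spatial convolution for
    multi-head self-attention over all spatial positions.
    """
    def __init__(self, channels, bottleneck=64, heads=4):
        super().__init__()
        self.reduce = nn.Conv2d(channels, bottleneck, kernel_size=1)
        self.mhsa = nn.MultiheadAttention(bottleneck, heads, batch_first=True)
        self.expand = nn.Conv2d(bottleneck, channels, kernel_size=1)

    def forward(self, x):                     # x: (B, C, H, W)
        b, _, h, w = x.shape
        z = self.reduce(x)                    # (B, bottleneck, H, W)
        seq = z.flatten(2).transpose(1, 2)    # (B, H*W, bottleneck)
        att, _ = self.mhsa(seq, seq, seq)     # global self-attention
        z = att.transpose(1, 2).reshape(b, -1, h, w)
        return x + self.expand(z)             # residual connection

# Usage: drop-in over a ResNet-style feature map.
feats = torch.randn(2, 256, 14, 14)
print(BoTBlock(256)(feats).shape)             # torch.Size([2, 256, 14, 14])
```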

Synced

AI Technology & Industry Review — syncedreview.com | Newsletter: http://bit.ly/2IYL6Y2 | Share My Research http://bit.ly/2TrUPMI | Twitter: @Synced_Global
