Since their introduction three years ago, transformer architectures have become the de facto standard for natural language processing (NLP) tasks and are now also seeing application in areas such as computer vision. Although many transformer architecture modifications have been proposed, these have not proven as easily transferable across implementations and applications as hoped, and that has limited their wider adoption.

In a bid to understand why most widely-used transformer applications shun these modifications, a team from Google Research comprehensively evaluated them in a shared experimental setting, where they were surprised to discover that most architecture modifications they looked at do not…


Content provided by Zhuoran Shen, co-author of the paper Fast Video Object Segmentation using the Global Context Module.

We developed a real-time, high-quality semi-supervised video object segmentation algorithm. Its accuracy is on par with the most accurate, time-consuming online-learning model, while its speed is similar to the fastest template-matching method with sub-optimal accuracy. The core component of the model is a novel global context module that effectively summarizes and propagates information through the entire video. Compared to previous approaches that only use one frame or a few frames to guide the segmentation of the current frame, the global context module…
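To make the idea concrete, here is a minimal PyTorch-style sketch of a global-context-style mechanism: it pools each processed frame's features into a compact running summary and uses that summary to condition the features used to segment the current frame. The module name, tensor shapes, and pooling rule below are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch of a "global context" style module (illustrative only;
# names, shapes, and the update rule are assumptions, not the paper's design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalContextSketch(nn.Module):
    """Toy global-context mechanism: one compact summary for the whole video."""

    def __init__(self, channels: int, context_dim: int = 256):
        super().__init__()
        self.key_proj = nn.Conv2d(channels, context_dim, kernel_size=1)  # per-pixel keys
        self.val_proj = nn.Conv2d(channels, context_dim, kernel_size=1)  # per-pixel values
        self.out_proj = nn.Conv2d(channels + context_dim, channels, kernel_size=1)
        self.register_buffer("context", torch.zeros(1, context_dim))     # running video summary
        self.register_buffer("count", torch.zeros(1))

    @torch.no_grad()
    def update(self, feat: torch.Tensor) -> None:
        """Fold one processed frame's features into the running global context."""
        k = self.key_proj(feat).flatten(2)                   # (B, D, H*W)
        v = self.val_proj(feat).flatten(2)                   # (B, D, H*W)
        w = F.softmax(k.mean(dim=1, keepdim=True), dim=-1)   # coarse spatial weighting
        frame_summary = (v * w).sum(dim=-1).mean(dim=0, keepdim=True)  # (1, D)
        self.context = (self.context * self.count + frame_summary) / (self.count + 1)
        self.count += 1

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        """Condition the current frame's features on the accumulated video context."""
        b, _, h, w = feat.shape
        ctx = self.context.expand(b, -1)[:, :, None, None].expand(-1, -1, h, w)
        return self.out_proj(torch.cat([feat, ctx], dim=1))

# Usage: condition the current frame on the summary, then absorb the frame into it.
gc = GlobalContextSketch(channels=64)
frame_feat = torch.randn(1, 64, 60, 100)
out = gc(frame_feat)      # conditioned features for the current frame
gc.update(frame_feat)     # fold this frame into the global summary
```

The appeal of this kind of design is that the per-video summary has a fixed, small memory footprint, so the cost of guiding each new frame does not grow with the number of past frames, unlike approaches that match the current frame against one or more stored reference frames.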


Understanding and generalization beyond the training distribution are regarded as huge challenges in modern machine learning (ML) — and Yoshua Bengio argues it’s time to look at causal learning for possible solutions. In the paper Towards Causal Representation Learning, Turing Award honoree Bengio and his research team make an effort to unite causality and ML research approaches, delineate some implications of causality for ML, and propose critical areas for future research.


The COVID-19 pandemic has sent shockwaves around the planet and continues to affect many aspects of our everyday lives. Amid the disruptions and uncertainties, however, and despite a retail and e-commerce slowdown caused by lockdowns and economic distress, a Research and Markets report identifies a burgeoning sector — noting the “strong order intake of warehouse automation” throughout 2020.

Leading Chinese autonomous mobile robot (AMR) manufacturer and AI research hub Quicktron is among the warehouse automation solution providers meeting this growing demand across a variety of industries.

Founded in 2014, Quicktron is headquartered in Shanghai, China. The company mainly focuses on…


In a new paper, a research team led by Geoffrey Hinton combines the strengths of five advances in neural networks — Transformers, Neural Fields, Contrastive Representation Learning, Distillation and Capsules — to imagine a vision system, “GLOM,” that enables neural networks with fixed architectures to parse an image into a part-whole hierarchy with a different structure for each image.


Transformer architectures have shown great success across machine learning (ML) tasks in natural language processing and beyond, but have mostly been limited to tasks from a single domain or specific multimodal domains. For example, ViT is designed exclusively for vision-related tasks, BERT focuses on language tasks, and ViLBERT-MT works only on related vision-and-language tasks.

A question naturally arises: Could we build a single transformer capable of handling a wide range of applications in different domains over multiple modalities? …


Attention architectures are pushing the frontier in many machine learning (ML) tasks and have become a building block in many modern neural networks. However, our conceptual and theoretical understanding of their power and inherent limitations remains nascent. Researchers from Microsoft and Université de Montréal set out to capture the essential mathematical properties of attention, proposing a new mathematical framework that uses measure theory and integral operators to model attention and quantify the regularity of attention operations.
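For orientation, a common way to express attention as an integral operator is to replace the softmax-weighted sum over discrete keys with an integral against a measure on the key space. The form below is a standard continuous generalization given for illustration, not necessarily the paper's exact construction.

```latex
% Discrete attention: Att(q_i) = \sum_j softmax_j(<q_i, k_j>) v_j.
% Measure-theoretic form over a key space Y with measure \mu (illustrative):
\[
\mathrm{Att}(q) \;=\; \int_{Y}
  \frac{\exp\!\big(\langle q,\, k(y)\rangle\big)}
       {\int_{Y} \exp\!\big(\langle q,\, k(y')\rangle\big)\, d\mu(y')}
  \; v(y)\, d\mu(y)
\]
```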


Today’s large language models have greatly improved their task-agnostic, few-shot performance, with top models like GPT-3 competitive with state-of-the-art finetuning approaches when provided only a few examples in a natural language prompt. This few-shot, “in-context” learning approach is gaining traction in large part due to its ability to learn without parameter updates. Compared to traditional finetuning methods, few-shot learning enables practitioners to prototype NLP models more quickly, allows non-technical users to create NLP systems, and reuses models efficiently, reducing system memory usage and complexity.
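As a rough illustration of what “in-context” learning looks like in practice, the sketch below assembles a few labeled examples into a text prompt for a frozen language model; the task, examples, and template are invented for this illustration, and nothing here updates model weights. It also shows how reordering or reformatting the same examples yields a different prompt, which is exactly the kind of variation discussed next.

```python
# Illustrative sketch of few-shot, "in-context" prompting. The task, examples,
# and template are made up for demonstration; no model parameters are updated.
import itertools

examples = [
    ("The movie was wonderful.", "Positive"),
    ("I wasted two hours.", "Negative"),
    ("A masterpiece of quiet storytelling.", "Positive"),
]

def build_prompt(train_examples, query, template="Review: {x}\nSentiment: {y}\n"):
    """Concatenate labeled examples, then append the unlabeled query."""
    shots = "".join(template.format(x=x, y=y) for x, y in train_examples)
    return shots + f"Review: {query}\nSentiment:"

query = "The plot made no sense."
prompt = build_prompt(examples, query)
print(prompt)  # this text would be fed directly to a frozen model such as GPT-3

# The same examples in a different order (or with a different template) produce
# a different prompt -- the kind of variation that can swing few-shot accuracy.
for perm in itertools.permutations(examples):
    _ = build_prompt(list(perm), query)
```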

GPT-3’s accuracy, however, can be highly unstable across different prompts (the choice of training examples, their permutation, and the format). To address…


Imagine an autonomous vehicle traffic sign detector whose accuracy plummets when dealing with rain or unexpected inputs. With machine learning (ML) an increasingly integral part of our daily lives, it is crucial that developers identify such potentially dangerous scenarios before real-world deployment. The rigorous performance evaluation and testing of models has thus become a high priority in the ML community, where an understanding of how and why ML system failures might occur can help with reliability, model refinement, and identifying appropriate human oversight and engagement actions.

The process of identifying and characterizing ML failures and shortcomings is, however, extremely complex…


From predictive text to smart voice control, human-machine interfaces have improved significantly in recent years. Many scientists envision the next frontier as brain-computer interfaces (BCIs): direct neural connections that leverage the brain’s electrical activity, captured via electroencephalography (EEG) signals.

In a bid to develop deep neural networks (DNNs) that can better leverage newly and publicly available massive EEG datasets for downstream BCI applications, a trio of researchers from the University of Toronto has proposed a BERT-inspired training approach as a self-supervised pretraining step for BCI/EEG DNNs.
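The general recipe behind this kind of pretraining can be sketched as follows: encode the raw signal into a latent sequence, mask some latent steps, let a transformer fill them in from context, and train with a contrastive loss that distinguishes the true latent from negatives. The dimensions, masking rate, and loss details below are assumptions for illustration, not the authors' exact BENDR configuration.

```python
# Hypothetical sketch of contrastive self-supervised pretraining on raw EEG-like
# sequences: mask latent steps, have a transformer predict them from context, and
# score predictions against the true latents versus in-batch negatives.
# Dimensions, masking rate, and temperature are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastivePretrainSketch(nn.Module):
    def __init__(self, in_channels=20, dim=256, mask_prob=0.15):
        super().__init__()
        self.encoder = nn.Conv1d(in_channels, dim, kernel_size=25, stride=10)  # raw signal -> latents
        self.mask_token = nn.Parameter(torch.randn(dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.context = nn.TransformerEncoder(layer, num_layers=4)
        self.mask_prob = mask_prob

    def forward(self, x):                       # x: (batch, channels, time)
        z = self.encoder(x).transpose(1, 2)     # (batch, steps, dim) latent targets
        mask = torch.rand(z.shape[:2], device=z.device) < self.mask_prob
        z_in = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(z), z)
        c = self.context(z_in)                  # contextualized predictions
        # Contrastive loss at masked steps: the true latent is the positive,
        # other masked latents in the batch act as negatives.
        pred = F.normalize(c[mask], dim=-1)     # (num_masked, dim)
        targ = F.normalize(z[mask], dim=-1)
        logits = pred @ targ.t() / 0.1          # similarity to all candidates
        labels = torch.arange(len(pred), device=z.device)
        return F.cross_entropy(logits, labels)

model = ContrastivePretrainSketch()
loss = model(torch.randn(4, 20, 3000))          # e.g. 4 clips, 20 electrodes, 3000 samples
loss.backward()
```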

In the paper BENDR: Using Transformers and a Contrastive Self-Supervised Learning Task to…

Synced

AI Technology & Industry Review — syncedreview.com | Newsletter: http://bit.ly/2IYL6Y2 | Share My Research http://bit.ly/2TrUPMI | Twitter: @Synced_Global
