The Best of AI: New Articles Published This Month (April 2019)

10 data articles handpicked by the Sicara team, just for you

Published in

Sicara's blog

7 min read · May 9, 2019

Welcome to the April edition of our best and favorite articles in AI that were published this month. We are a Paris-based company that does Agile data development. This month, we spotted articles about sparse transformers, data engineering at Uber and a new generation of neural networks. We advise you to have a Python environment ready if you want to follow some of the tutorials :). Let’s kick off with the comic of the month:

Who are you? How did you get in my house?

1 — Mapping for humanitarian aid and development with weakly and semi-supervised learning

Knowing how populations are distributed on earth can be useful to deliver assistance optimally after a disaster. A research group from Facebook Artificial Intelligence worked on that topic. They released a new map of population density in Africa and plan to extend their results to the whole world.

They collected satellite images as well as data from OpenStreetMap. Then, they trained their models using a data set created manually by labelers. The authors had to use weakly supervised learning as well as semi-supervised learning to tackle the different challenges of this difficult task. This work shows how recent developments in machine learning can create useful data for humanitarian aid.

Read Mapping for humanitarian aid — from Facebook Artificial Intelligence

2 — MorphNet: Towards Faster and Smaller Neural Networks

MorphNet is a framework that optimizes the architecture of your neural network. Given a network designed for a task, MorphNet generates a smaller and faster network with better performance. To do so, MorphNet computes which neurons are less useful in the network and gets rid of them during shrinking phases. It alternates these shrinking phases with expanding ones in order to reach good performance. At the end of the day, you obtain a smaller but more relevant network.
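To make the alternation of shrinking and expanding phases more concrete, here is a toy sketch in plain Python. It is not MorphNet's API and does not reproduce how MorphNet scores neurons: the importance scores, the threshold, the neuron budget and the function names `shrink` and `expand` are all assumptions made for illustration.

```python
def shrink(importances, threshold):
    """Keep, per layer, only the neurons whose importance reaches the threshold."""
    return [sum(1 for score in layer if score >= threshold) for layer in importances]


def expand(widths, budget):
    """Scale every layer width by one common ratio so the total fits the budget."""
    ratio = budget / sum(widths)
    return [max(1, int(width * ratio)) for width in widths]


# Hypothetical importance scores for a 3-layer network with 4 neurons per layer.
importances = [
    [0.9, 0.8, 0.05, 0.7],
    [0.02, 0.6, 0.01, 0.03],
    [0.5, 0.4, 0.9, 0.3],
]

shrunk = shrink(importances, threshold=0.1)  # layer widths after dropping weak neurons
expanded = expand(shrunk, budget=12)         # regrow within the original neuron budget
print(shrunk, expanded)
```

In this toy run, the second layer loses most of its neurons during shrinking, and the expansion then gives the freed capacity to the layers that kept more of theirs.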

If you want to know more, the article provides a link to an open source TensorFlow implementation and to a research paper.

Read MorphNet: Towards Faster and Smaller Neural Networks — from Google AI Blog

3 — Generative Modeling with Sparse Transformers

Transformers are a type of neural network used to predict what comes next in a sequence. They can work on images or audio recordings, for example. Transformers rely on the concept of attention, which identifies which parts of the input will be used to generate the output. Recently, researchers from OpenAI developed a new transformer architecture called Sparse Transformers. This architecture requires much less memory than its predecessors. Besides, it sets new records at predicting what comes next in a sequence.
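As a rough illustration of why restricting attention saves memory, the sketch below computes attention weights in plain Python, first with a dense causal pattern and then with a small local window. The local window is a simplified stand-in chosen for this example, not the attention patterns from the OpenAI paper, and the vectors and function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys, allowed):
    """For each position i, weight the positions in allowed(i) by query-key similarity."""
    scale = math.sqrt(len(keys[0]))
    rows = []
    for i, query in enumerate(queries):
        positions = allowed(i)
        scores = [sum(a * b for a, b in zip(query, keys[j])) / scale for j in positions]
        rows.append(dict(zip(positions, softmax(scores))))
    return rows


def dense(i):
    """Dense causal attention: position i looks at every position up to i."""
    return list(range(i + 1))


def local(i, window=2):
    """Simplified sparse pattern: position i looks only at the last `window` positions."""
    return list(range(max(0, i - window + 1), i + 1))


# Hypothetical 2-dimensional vectors for a sequence of 6 positions.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]

dense_rows = attention_weights(vectors, vectors, dense)
sparse_rows = attention_weights(vectors, vectors, local)

# Number of (query, key) pairs that had to be scored and stored.
dense_pairs = sum(len(row) for row in dense_rows)
sparse_pairs = sum(len(row) for row in sparse_rows)
print(dense_pairs, sparse_pairs)
```

The dense pattern scores a number of pairs that grows quadratically with the sequence length, while the windowed pattern grows linearly. That difference is the intuition behind the memory savings, even though the real architecture uses more elaborate patterns.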

This post provides a link to their research paper as well as some open source code. You can have a look at these two resources as well if you want to learn more about their work.

Read Generative Modeling with Sparse Transformers — from OpenAI Blog

4 — Scaling Uber’s Customer Support Ticket Assistant (COTA) System with Deep Learning

Deep learning can improve the performance of customer support service, and the Uber COTA system is a wonderful illustration of this. The COTA system uses NLP to understand the demands of users. Besides, it creates templates of answers that help Uber employees to deal with these tickets.

Uber released the second version of COTA. This post explains both the principles of COTA and what was improved in COTA v2. If you wonder how deep learning can help big companies such as Uber, this is the perfect article to read.

Read COTAv2 — from the Uber Engineering blog

5 — Pyodide: Bringing the scientific Python stack to the browser

Wouldn’t it be nice to be able to use Python and all its scientific libraries within your browser? Well, it is now possible thanks to Pyodide, a framework developed by Mozilla for this purpose. With Pyodide, you can mix the interactivity and universality of Firefox or Google Chrome with the power of all your favorite tools such as NumPy or pandas.

The article explains in detail how this tool was built using WebAssembly. It also reviews its performance and its capabilities.

Read Pyodide — from Michael Droettboom on Mozilla Hacks

6 — Mathematicians Discover the Perfect Way to Multiply

The speed of our computer algorithms is intrinsically limited by the time it takes to perform an elementary operation such as an addition or a multiplication. The race to find the fastest way to multiply two big numbers started in the sixties, when Anatoly Karatsuba found a multiplication method faster than the one we are taught at school.
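For readers who want to see the idea behind Karatsuba's method, here is a minimal Python sketch. It splits each number into a high and a low half and uses three recursive multiplications instead of the four that the school method would need. The base-10 splitting and the function name `karatsuba` are choices made for this illustration, not something taken from the article.

```python
def karatsuba(x, y):
    """Multiply two non-negative integers with three recursive products instead of four."""
    if x < 10 or y < 10:
        return x * y  # base case: at least one operand is a single digit

    # Split both numbers around the middle digit of the longer one.
    half = max(len(str(x)), len(str(y))) // 2
    base = 10 ** half
    x_high, x_low = divmod(x, base)
    y_high, y_low = divmod(y, base)

    low = karatsuba(x_low, y_low)
    high = karatsuba(x_high, y_high)
    # (x_high + x_low) * (y_high + y_low) - high - low
    # equals x_high * y_low + x_low * y_high, obtained with one product.
    cross = karatsuba(x_high + x_low, y_high + y_low) - high - low

    return high * base * base + cross * base + low


print(karatsuba(1234, 5678) == 1234 * 5678)
```

Saving one multiplication at every level of the recursion is what brings the cost below the quadratic cost of the school method.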

More recently, two researchers found for the first time how to run a multiplication in quasi-linear time. Their methods might not be implemented soon in our hardware. However, this is still an interesting breakthrough for mathematics and computer science.

This article provides a nice recap of this area of research. It will show you how a task as simple as multiplication can lead to tremendous research in mathematics.

Read The Perfect Way To Multiply — from Quanta Magazine

7 — MuseNet

MuseNet is OpenAI’s latest deep neural network for generating music. It reuses kernels from Sparse Transformer (also in this best of AI!) and was trained to predict what comes next in a sequence of music.

You can specify a given style (e.g. the name of a composer) and a specific tune, and MuseNet will generate a 4-minute sample composition mixing these two requirements. The article shows a 2D t-SNE map of the embeddings of the music styles learned by the network. You can try MuseNet online to generate music samples!

Read MuseNet — from OpenAI

8 — Robots that can sort recycling

Computer vision is a nice technology that might help the recycling industry by creating robots that can sort waste for us. As explained by the authors of this article, it is extremely difficult to detect if a cup is made of plastic or of paper only from a picture of it. As a human, to make this distinction, you often need to grab the cup and feel it.

Meet RoCycle, a robot that does exactly this job. It was created by an MIT team that designed a specific robot hand combined with sensors that can distinguish plastic from paper and from metal. While its accuracy does not seem to reach human level yet, this technology might enable fully automatic sorting factories in the future.

Read Robots that can sort recycling — from MIT News

9 — How a Google Street View image of your house predicts your risk of a car accident

A lot of information can be derived from Google Street View images, including how likely the residents are to be involved in a car accident. A team of researchers from Stanford University compared the data of a car insurance company with predictions made by their model, which is based on Google Street View images of people’s houses. Doing so, they obtained better results than state-of-the-art insurance models.

The model itself can be found in their research paper and is rather straightforward. The most interesting point of this article is that it questions how personal data should be used, and what the consequences for people’s privacy would be if insurance companies started using this kind of approach.

Read How a Google Street View image of your house predicts your risk of a car accident — from MIT Technology Review

10 — Consistent Data Partitioning through Global Indexing for Large Apache Hadoop Tables at Uber

Companies like Uber face extremely challenging data engineering problems. Have you ever wondered how they navigate through their huge amounts of data efficiently? Do they use open-source frameworks, or do they build their own?

In this article, the authors explain that while Uber relies on Apache Hadoop technology, they had to build their own component, Global Index, in order to consistently find the location of their data. This is an extremely well-written article that will give you a better grasp of how Uber manages its data.
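As a very rough mental model of what such an index does, the sketch below uses an in-memory Python dictionary that maps each record key to the file where the record lives. This is only a toy under our own assumptions: the class name `GlobalIndex`, the `route` method, the record keys and the file paths are all hypothetical, and Uber's actual component is a distributed system described in their article.

```python
class GlobalIndex:
    """Toy in-memory index mapping each record key to the file that stores it."""

    def __init__(self):
        self._location = {}

    def route(self, record_key, new_file):
        """Send known keys back to their existing file; register unknown keys as inserts."""
        if record_key in self._location:
            return "update", self._location[record_key]
        self._location[record_key] = new_file
        return "insert", new_file


index = GlobalIndex()

# The first time a key is seen, the record is inserted into the proposed file.
first = index.route("trip-42", "partition=2019-04-01/file-a")
# A later write for the same key is routed to the file that already holds it.
second = index.route("trip-42", "partition=2019-04-02/file-b")
print(first, second)
```

The point of the toy is consistency: whatever file a later write proposes, a known key always resolves to the single location where it was first stored.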

Read about Consistent Data Partitioning through Global Indexing for Large Apache Hadoop Tables at Uber — from Uber Engineering

We hope you’ve enjoyed our list of the best new articles in AI this month. Feel free to suggest additional articles or give us feedback in the comments; we’d love to hear from you! See you next month.

Read the March edition
Read the February edition
Read the January edition
Read the December edition

Read the original article on Sicara’s blog here.

By the way, we published these articles on our blog in April

How Apache Airflow Distributes Jobs on Celery workers

The life of a distributed task instance

blog.sicara.com

Basics in R Programming

You are about to begin a project on R? Before you watch any tutorial, read these basic standards.