The Best of AI: New Articles Published This Month (August 2018)

10 data articles handpicked by the Sicara team, just for you

Nicolas Jean
Sicara's blog
7 min read · Sep 12, 2018


Welcome to the August edition of our best and favorite articles in AI that were published this month. We are a Paris-based company that does Agile data development. This month, we spotted articles about AutoML, Alexa, Machine Learning and Data Engineering techniques. We advise you to have a Python environment ready if you want to follow some tutorials :). Let’s kick off with the comic of the month:

Update: It turns out the cannon has a motorized base, and can make holes just fine using the barrel itself as a battering ram. But due to design constraints it won’t work without a projectile loaded in, so we still need those drills.

1 — AutoKeras: The Killer of Google’s AutoML

Yes, Machine Learning (ML) is great. And easy to use too, now, thanks to Automated Machine Learning (AutoML). Before AutoML, you had to understand the basic ML models and invest time in tweaking them so that they performed well. Auto-Keras is a framework for Automated Machine Learning based on the popular ML framework Keras. With Auto-Keras, you can achieve state-of-the-art performance without Machine Learning knowledge. I am definitely going to try it! Speaking of Keras, we’re preparing a blog article about Convolutional Neural Networks using Keras. Don’t forget to follow us!

Read AutoKeras: The Killer of Google’s AutoML — from George Seif

2 — Getting Alexa to Respond to Sign Language Using Your Webcam and TensorFlow.js

This article explains how to build an entire system that makes Alexa respond to requests made in sign language! Which building blocks do you need to create such a system? How do you train a neural network to recognize signs? What problems can you encounter, and what tricks can you use to get around them? Which tools should you use? This very interesting article covers all these questions, and more. Plus, the end result is cool!

Read Getting Alexa to Respond to Sign Language Using Your Webcam and TensorFlow.js — from TensorFlow

3 — 9 Things You Should Know About TensorFlow

TensorFlow is a powerful framework for building machine learning applications. If you want to use TensorFlow, I recommend that you read this article. You will learn that you can use TensorFlow with Keras to build and train neural networks in just a few lines of code. You can use TensorFlow with TPUs for a massive boost in computation power. There are plenty of examples that you can run in your browser to get started. And I still haven’t spoiled the whole article!

Read 9 Things You Should Know About TensorFlow — from Cassie Kozyrkov

4 — A small team of student AI coders beats Google’s machine-learning code

The DAWNBench benchmark measures how quickly, and at what cost, a model can be trained to a target accuracy. In other words, it evaluates “how fast” a model learns from data per dollar of computing power. Google used to have the best algorithm on this benchmark until a team from Fast.ai beat it! They built real fast AI! This article explains how they did it. It shows that great feats in AI are not the privilege of the biggest companies.

Read A small team of student AI coders beats Google’s machine-learning code — from Will Knight

5 — The Beginner’s Guide to Dimensionality Reduction

Dimensionality Reduction is a useful technique. It lets you look for hidden structure in the data, and it’s always useful to understand your data better! This article is the ideal introduction to this technique. You will find all the information you need to start using Dimensionality Reduction, and no more. There is also a nice comparison of 3 popular algorithms, with the pros and cons of each one. And, the icing on the cake, it is beautifully illustrated!

Read The Beginner’s Guide to Dimensionality Reduction — from Matthew Conlen and Fred Hohman

6 — The Best Books on Computer Science for Data Scientists

If you are on the road to improving your data science skills, you may like this advice on what you should read. This list of 5 books makes a lot of sense. As a data scientist, you have to understand computing and algorithms. You have to write code that works and is easy for others to understand. And you have to communicate your findings well too. I have already read number 3 (The Pragmatic Programmer) and loved it. There is an anime about number 1; please click here if you want to lose 2 minutes of your life.

Read The best books on Computer Science for Data Scientists — from fivebooks.com and then start reading one of the books suggested :)

7 — Welcome to Great Expectations!

If you have a data pipeline in your project, you may have encountered pipeline debt already. As with other kinds of technical debt, a great antidote is testing. But how do you test your data? Great Expectations is a Python project that allows writing tests on data. You can test many things on your data: shape, type, range, mean, missing values, date format, and more. Currently, Pandas and SQLAlchemy are supported. Will our great expectations about data quality come true?

Read Welcome to Great Expectations! — from Great Expectations
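
To make the idea of testing data concrete, here is a minimal sketch in plain Python. It does not use the Great Expectations API: the helper names (`expect_no_missing`, `expect_between`, `expect_type`) and the sample rows are hypothetical, chosen only to illustrate checks on missing values, range, and type.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def expect_no_missing(rows: List[Row], column: str) -> bool:
    """Pass if every row has a non-None value for `column`."""
    return all(row.get(column) is not None for row in rows)


def expect_between(rows: List[Row], column: str, low: float, high: float) -> bool:
    """Pass if every non-missing value of `column` lies in [low, high]."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return all(low <= value <= high for value in values)


def expect_type(rows: List[Row], column: str, expected: type) -> bool:
    """Pass if every non-missing value of `column` is an instance of `expected`."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return all(isinstance(value, expected) for value in values)


# Hypothetical sample data: the second row has a missing age.
rows = [
    {"name": "Ada", "age": 36},
    {"name": "Alan", "age": None},
    {"name": "Grace", "age": 85},
]

# Run each check and collect the outcome under a readable label.
results = {
    "name has no missing values": expect_no_missing(rows, "name"),
    "age has no missing values": expect_no_missing(rows, "age"),
    "age is between 0 and 120": expect_between(rows, "age", 0, 120),
    "age is an integer": expect_type(rows, "age", int),
}

for check, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}: {check}")
```

Checks like these can run at each stage of a pipeline, so that a bad batch is flagged before it reaches downstream code.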

8 — Deploying a Machine Learning Model as a REST API

I have read many tutorials about developing Machine Learning models. I chose to share this article because it focuses on the next step: making the model ready to be integrated into an application. This is no less important than knowing how to build the model in the first place. Exposing your ML model as a REST API makes it easier to integrate into applications. The article shows you how to do it with Flask.

Read Deploying a Machine Learning Model as a REST API — from Nguyen Ngo
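
The article builds its API with Flask. As a dependency-free sketch of the same idea, the example below exposes a stand-in model through Python's standard WSGI interface (the interface Flask applications also implement). Everything specific here is an assumption made for illustration and is not taken from the article: the `predict_sentiment` function, the `/predict` route, and the JSON field names.

```python
import io
import json
from typing import Callable, Iterable, List, Tuple


def predict_sentiment(text: str) -> str:
    """Hypothetical stand-in for a trained model (a real one would be loaded from disk)."""
    positive_words = {"good", "great", "love"}
    return "positive" if positive_words & set(text.lower().split()) else "negative"


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI application: POST /predict with JSON {"text": "..."} returns a prediction."""
    if environ.get("PATH_INFO") != "/predict" or environ.get("REQUEST_METHOD") != "POST":
        status, payload = "404 Not Found", {"error": "use POST /predict"}
    else:
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            body = json.loads(environ["wsgi.input"].read(length))
            text = body["text"]
            if not isinstance(text, str):
                raise TypeError("text must be a string")
            status, payload = "200 OK", {"prediction": predict_sentiment(text)}
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, missing field, or wrong type.
            status, payload = "400 Bad Request", {"error": "expected a JSON object with a 'text' string"}
    data = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]


def call(path: str, method: str, body: bytes = b"") -> Tuple[str, dict]:
    """Invoke the app in-process, the way a WSGI server would, and decode the reply."""
    captured: List[str] = []
    environ = {
        "PATH_INFO": path,
        "REQUEST_METHOD": method,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    chunks = app(environ, lambda status, headers: captured.append(status))
    return captured[0], json.loads(b"".join(chunks))


print(call("/predict", "POST", json.dumps({"text": "I love this"}).encode("utf-8")))
```

To serve it over HTTP during development, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` should be enough; a production deployment would normally sit behind a dedicated WSGI server.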

9 — How to Run Parallel Data Analysis in Python using Dask Dataframes

If you know Pandas, maybe you have already had the following experience. You load a big dataset into a Pandas dataframe. You start doing some basic operations on it. But then it freezes for a minute or more. Argh! Yes, processing huge amounts of data can get slow on a single core. One solution is to parallelize the work across several cores using Dask Dataframes! This article describes how to use Dask Dataframes to considerably speed up the processing of large datasets. We can’t bear waiting too long!

Read How to Run Parallel Data Analysis in Python using Dask Dataframes — from Luciano Strika

10 — 15 Statistical Hypothesis Tests in Python (Cheat Sheet)

I like cheat sheets. I have one for git commands, one for vim, etc. And now I have found this one, which I will adopt and share with you. There are too many statistical tests in the world. I think this cheat sheet can be really helpful in choosing the right one. It lists the most useful tests along with their assumptions and how to interpret their results. It also shows Python code snippets for each test. You need some prior knowledge about statistical hypothesis tests to use it effectively though.

Read 15 Statistical Hypothesis Tests in Python (Cheat Sheet) — from Jason Brownlee
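
As an illustration of how a hypothesis test is run and interpreted, here is a minimal sketch of an exact two-sample permutation test that uses only the standard library. It is not taken from the cheat sheet, which may rely on other libraries and other tests; the function name `permutation_test`, the sample data, and the 0.05 threshold are assumptions for this example.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def permutation_test(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided exact permutation test on the difference of means.

    Null hypothesis (H0): both samples come from the same distribution.
    Returns the p-value: the share of all relabelings whose absolute
    difference of means is at least as large as the observed one.
    Only practical for small samples, since every split is enumerated.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    total = sum(pooled)
    extreme = 0
    count = 0
    # Enumerate every way of choosing which pooled values form the first group.
    for indices in combinations(range(len(pooled)), len(a)):
        group_sum = sum(pooled[i] for i in indices)
        group_mean = group_sum / len(a)
        rest_mean = (total - group_sum) / len(b)
        # Small tolerance guards against floating-point rounding.
        if abs(group_mean - rest_mean) >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical measurements for two groups.
control = [1, 2, 3, 4, 5]
treated = [6, 7, 8, 9, 10]

p_value = permutation_test(control, treated)
alpha = 0.05  # assumed significance level

print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The interpretation step is the same for most tests: compare the p-value to a threshold chosen in advance, and remember that failing to reject H0 is not proof that the two groups are identical.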

We hope you’ve enjoyed our list of the best new articles in AI this month. Feel free to suggest additional articles or give us feedback in the comments; we’d love to hear from you! See you next month.

Read the July edition
Read the June edition
Read the May edition
Read the April edition
Read the March edition

Read the original article on Sicara’s blog here.
