AI in 2018 for developers

Alex Honchar
Published in The Startup
10 min read · Jan 2, 2018

Hi again! In the last article I tried to show my vision of which research areas are maturing and could grow big this year. Research is cool, but there must be something from the AI world that became mature in 2017 and is ready to be used in mass applications now. That is what this article is about: I would like to tell you about technologies that are good enough to use in your current work or to build your own startup on. Important note: this is a list of AI areas, algorithms, and technologies that are ready to use right now. For example, you will see time series analysis on the list, because deep learning is rapidly replacing the previous state of the art in signal processing. But you won't see reinforcement learning here, even though it is supposed to be cooler, because in my humble opinion you can't really use it in industrial applications yet, amazing and fast-growing research field though it is.

And I just want to remind you that this is part of a series of articles in which I share my vision of what will happen in AI next year from three different points of view:

Hope you’ll enjoy reading and choose something for yourself!

P.S. I am not talking about recognizing images or simple computer vision here; y'all have been doing that for years :)

GANs and Fakes

Even though generative adversarial networks were created several years ago, I was pretty skeptical about them. Years passed, and I stayed skeptical even as I watched huge progress in generating all those 64x64 images. I became even more skeptical after reading mathematical papers arguing that GANs don't really learn the distribution. But this year something changed: first of all, new interesting architectures (like CycleGAN) and mathematical improvements (Wasserstein GAN) made me try GANs in practice, and they work more or less fine. After the following two applications, I am convinced that we can and must use them for generating things.

First of all, I really liked NVIDIA's research paper on generating full-HD images that look genuinely realistic (compared to the creepy 64x64 faces of a year ago):

But what really impressed me (as a perfect teenage-dream application) was generating fake porn videos:

I see a lot of applications in the gaming industry, like generating landscapes, heroes, and even whole worlds with GANs. And I think we must be aware of a totally new level of fakes, starting with porn featuring people you know and ending with totally fake people online (and maybe, very soon, offline?)
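If the idea still feels like magic, it helps to see how small the core adversarial loop really is. Below is a toy sketch of my own (not CycleGAN or NVIDIA's model): a two-parameter "generator" learns to mimic a 1D Gaussian by playing against a logistic "discriminator", with the gradients written out by hand in NumPy.

```python
import numpy as np

np.random.seed(0)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-np.clip(s, -30, 30)))

# Real data: 1D Gaussian with mean 4. Generator: x = a*z + b.
# Discriminator: D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0          # generator parameters
w, c = 0.0, 0.0          # discriminator parameters
lr, batch = 0.05, 64

for step in range(3000):
    real = 4.0 + np.random.randn(batch)
    z = np.random.randn(batch)
    fake = a * z + b

    # Discriminator step: minimize -log D(real) - log(1 - D(fake))
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w -= lr * (np.mean(-(1 - d_real) * real) + np.mean(d_fake * fake))
    c -= lr * (np.mean(-(1 - d_real)) + np.mean(d_fake))

    # Generator step (non-saturating): minimize -log D(fake)
    d_fake = sigmoid(w * fake + c)
    a -= lr * np.mean(-(1 - d_fake) * w * z)
    b -= lr * np.mean(-(1 - d_fake) * w)

samples = a * np.random.randn(2000) + b
print("generated mean:", float(np.mean(samples)))  # drifts toward 4
```

The real systems swap these scalars for deep networks (and use tricks like the Wasserstein loss for stability), but the alternating two-player loop is exactly this.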

Unique format for all neural nets

One of the problems of modern development (not only in the AI industry) is that we have dozens of different frameworks doing literally the same things. Today every big company that does machine learning has its own framework: Google, Facebook, Amazon, Microsoft, Intel, even Sony and Uber, plus a lot of other open-source solutions! And in a single AI application we often want to use different frameworks, for example Caffe2 for computer vision, PyTorch for NLP, and TensorFlow/Keras for a recommender system. Merging them all takes a lot of development time and distracts both data scientists and software developers from more important tasks.

The solution should be a single neural network format that can be easily obtained from any framework, deployed easily by developers, and used easily by scientists. And here we meet ONNX:

In fact it's just a simple format for acyclic computational graphs, but in practice it gives us the opportunity to deploy really complex AI solutions. What I personally find very attractive is that people can develop neural networks in frameworks like PyTorch, which don't have strong deployment tools, without depending on the TensorFlow ecosystem.
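To get a feel for why "a simple format for acyclic computational graphs" is enough, here is a toy illustration of my own (a sketch of the idea, not the actual ONNX spec): a network serialized as a plain list of named nodes, which any runtime can execute just by walking the nodes in topological order.

```python
# A hypothetical, ONNX-like description of y = relu(x*w + b):
# named nodes over an acyclic graph, listed in topological order.
GRAPH = {
    "inputs": ["x", "w", "b"],
    "nodes": [
        {"name": "xw",  "op": "mul",  "inputs": ["x", "w"]},
        {"name": "pre", "op": "add",  "inputs": ["xw", "b"]},
        {"name": "y",   "op": "relu", "inputs": ["pre"]},
    ],
    "outputs": ["y"],
}

OPS = {
    "mul":  lambda p, q: p * q,
    "add":  lambda p, q: p + q,
    "relu": lambda p: max(p, 0.0),
}

def run(graph, feeds):
    """Execute the graph: each node only depends on already-computed values."""
    values = dict(feeds)
    for node in graph["nodes"]:
        args = [values[name] for name in node["inputs"]]
        values[node["name"]] = OPS[node["op"]](*args)
    return {name: values[name] for name in graph["outputs"]}

print(run(GRAPH, {"x": 2.0, "w": 3.0, "b": -1.0}))  # {'y': 5.0}
```

The real format adds tensors, operator versioning, and metadata, but the core is this: a framework-neutral node list that exporters write and runtimes read.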

Zoos explosion

Three years ago the most exciting thing in the AI world for me was the Caffe Zoo. I was doing a lot of computer vision back then, trying all those models and checking how they worked and what they did. A bit later I used those models for transfer learning or as feature extractors. Recently I used two different open-sourced models just as parts of one big computer vision pipeline. What does this mean? It means that in practice there is no need to train your own nets for things like ImageNet object recognition or place recognition; these basic pieces can simply be downloaded and plugged into your system. Besides the Caffe Zoo there are similar zoos for other frameworks, but what amazes me most is that you can plug in models for computer vision, NLP, and even accelerometer signal processing right on your iPhone:

I think these zoos will keep growing, and, taking into account the appearance of ecosystems like ONNX, they will become more centralized (and more decentralized too, with ML blockchain apps).

AutoML replacing pipelines

Designing a neural network architecture is a painful task. Sometimes you get a decent result just by stacking convolutional layers, but most of the time you need to choose width, depth, and hyperparameters very carefully, using both intuition and hyperparameter search methods like random search or Bayesian optimization. It is especially hard when you're working not in computer vision, where you can fine-tune a DenseNet pre-trained on ImageNet, but on 3D data classification or multivariate time series applications.
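As a reminder of how little code the search side takes, here is a minimal random search sketch; the `validation_score` function is a synthetic stand-in I made up for the expensive part, actually training and evaluating a net on your data.

```python
import math
import random

random.seed(0)

def validation_score(lr, depth):
    """Hypothetical stand-in for train-then-evaluate.
    Its (made-up) peak is around lr = 10**-2.5, depth = 6."""
    return -abs(math.log10(lr) + 2.5) - 0.3 * abs(depth - 6)

best = None
for _ in range(50):
    lr = 10 ** random.uniform(-5, -1)   # sample learning rate log-uniformly
    depth = random.randint(2, 12)       # sample network depth
    score = validation_score(lr, depth)
    if best is None or score > best[0]:
        best = (score, lr, depth)

print("best score %.3f at lr=%.2e, depth=%d" % best)
```

Bayesian optimization replaces the blind sampling with a model of past trials, and AutoML systems like Google's go further and search over whole architectures, but the skeleton is this same loop.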

There have been many attempts to generate neural network architectures from scratch using another neural network, but the nicest and cleanest to me is a recent Google Research development:

They used it to generate a computer vision model that works better and faster than human-designed nets! I am sure there will be a lot of papers and open-sourced code on this topic soon. And I think that instead of "we have developed an AI that does…", we will see many more blog posts and startups saying "our AI created an AI that taught another AI that does…". At least that's what I am going to do in my projects, and I believe I am not the only one.

Formalizing intelligence stack

I've read a lot about this concept in the blog of Anatoly Levenchuk, a Russian systems analyst, coach, and AI enthusiast. On the image below you can see an example of what can be called an "AI stack":

http://www.tvmlang.org/2017/10/06/nnvm-compiler-announcement.html

It consists of more than just a machine learning algorithm and your favorite framework; it goes much deeper, and every level has its own developments and research.

I think the AI development industry is mature enough to need many more kinds of experts. Having a single data scientist on your team is simply not enough: you need different people for hardware optimization, neural network research, AI compilers, solution optimization, and production implementation. Above them there must be team leads, software architects (who have to design a stack like the one drawn above for each problem separately), and managers. I've mentioned this concept to give some vision of where technical specialists in AI can grow in the future (for those who want to become software architects or tech leads in AI: you need to know what to study).

Voice based applications

The list of problems that AI can solve with >95% accuracy is pretty short: we can recognize images across 1000 categories, we can say whether a text is positive or negative, and we can do a few more complicated things around that. I think one more area that is ready to be disrupted by thousands of applications is voice recognition and generation. It was already pretty good a year ago, after the release of DeepMind's WaveNet, but now, thanks to Baidu's Deep Voice 3 and Google's recently developed Tacotron 2, we are much further along:

Very soon this technology will be released in open source (or replicated by some smart guy), and everyone will be able to recognize and generate voice with very high accuracy. What awaits us? Better personal assistants, automatic book readers and negotiation transcribers, and, yes, voice fakes.

A bit smarter bots

There is a big problem with all the bots we see today: 99% of them are not AI-based at all, they're just hard-coded. It happened that way because we realized we can't simply train an encoder-decoder LSTM with attention on millions of dialogues and get an intelligent system. That's why most bots in Facebook Messenger or Telegram have just hard-coded commands or, in the best case, an LSTM- and word2vec-based sentence classification network. But the modern state of the art in NLP is beyond that level. Just check out what interesting research has already been done at Salesforce:

They're building NLP interfaces to databases, going beyond modern encoder-decoder autoregressive models, and training embeddings not just for words and sentences but for characters too. Moreover, there is interesting research on directly optimizing NLP metrics such as ROUGE using reinforcement learning.

I believe that with these developments we can enhance our bots with, at the very least, much more intelligent information retrieval and named entity recognition, and probably build fully deep-learning-driven bots in some closed domains.
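For a sense of what the "best case" baseline mentioned above amounts to, here is a tiny sentence-classification bot core (all intents and phrases are made up by me): bag-of-words vectors matched against per-intent centroids by cosine similarity. Everything beyond this (retrieval, entities, generation) is where the Salesforce-style research comes in.

```python
import math
from collections import Counter

# Hypothetical training phrases for two intents.
TRAIN = {
    "weather":  ["what is the weather", "will it rain today", "weather forecast please"],
    "greeting": ["hello there", "hi how are you", "good morning"],
}

def bow(text):
    """Bag-of-words vector as a word -> count mapping."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# One centroid per intent: the summed word counts of its phrases.
CENTROIDS = {}
for intent, phrases in TRAIN.items():
    c = Counter()
    for p in phrases:
        c.update(bow(p))
    CENTROIDS[intent] = c

def classify(text):
    return max(CENTROIDS, key=lambda i: cosine(bow(text), CENTROIDS[i]))

print(classify("will it rain this morning"))  # weather
```

Real bots of this kind swap the word counts for word2vec embeddings and the centroid match for an LSTM classifier, but the "pick the closest canned intent" logic is the same.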

Time series analysis state of the art

After Salesforce, the second most underrated machine learning research lab in the public eye is Uber AI Labs. Some time ago they published a blog post showing their approach to time series forecasting. To be honest, it flattered me, because I was using basically the same approach in my own applications! Have a look; this is an amazing example of bringing together statistical features and deep learning representations:

If you need a more motivating example, here is diagnosing arrhythmias with a 34-layer 1D ResNet. And the coolest part is the performance: it doesn't simply work better than some statistical models, it even outperforms professional cardiologists' diagnoses!

Lately I have been heavily engaged in time series analysis with deep learning, and I can personally confirm that neural networks work extremely well; you can easily get 5-10x better performance compared to the "gold standards". It simply works :)
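If you want to try the same recipe yourself, the statistical half is almost free. Below is my own shorthand for the idea (not Uber's code): turn each raw window into the kind of input such a hybrid model eats, the normalized window concatenated with a few classic statistics.

```python
import numpy as np

def window_features(w):
    """Classic statistical descriptors of one window."""
    t = np.arange(len(w))
    slope = np.polyfit(t, w, 1)[0]            # linear trend
    ac1 = np.corrcoef(w[:-1], w[1:])[0, 1]    # lag-1 autocorrelation
    return np.array([w.mean(), w.std(), slope, ac1])

def make_input(w):
    """Normalized raw window + hand-crafted stats, ready for a neural net."""
    normalized = (w - w.mean()) / (w.std() + 1e-8)
    return np.concatenate([normalized, window_features(w)])

# Toy series: a noisy sine wave, a hypothetical stand-in for your data.
rng = np.random.RandomState(0)
series = np.sin(np.linspace(0, 10, 200)) + 0.1 * rng.randn(200)
x = make_input(series[:50])
print(x.shape)  # (54,)
```

The network then learns its own representation from the raw part while the statistical part gives it the seasonality and trend signals for free.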

Optimization beyond built-ins

How do we train our neural networks? Let's be honest, most of us just use "Adam()" and the default learning rate. Some smart guys choose the most appropriate optimizer and tune and schedule the learning rate. We always underestimate the optimization topic because we simply press the "train" button and wait until our net converges. But in a world where we all have more or less equal opportunities in terms of computational power, memory, and open-sourced solutions, the winner is the one who can get the best performance in the shortest time from the same Amazon instance and the same TensorFlow model, and that is all about optimization.

I encourage you to have a look at Sebastian Ruder's amazing blog post above, covering some recent 2017 developments on how to fix the standard optimizers that you use out of the box, plus some other simple improvements that are extremely useful.
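To appreciate what those fixes are fixing, it helps to remember that "Adam()" is only a dozen lines. Here is the textbook update rule in plain Python, minimizing a toy quadratic; the tweaks Ruder covers mostly change a line or two of this loop.

```python
import math

def adam_minimize(grad, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Textbook Adam on a single scalar parameter."""
    x, m, v = float(x0), 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = (x - 3)^2, so grad f(x) = 2*(x - 3)
x_star = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_star)  # close to 3.0
```

Once you see the moving parts, ideas like decoupled weight decay or warm restarts stop being magic flags and become one-line surgery on this update.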

General hype slowdown

cdn.aiindex.org/2017-report.pdf

What can this picture tell us, especially after reading the previous points in this article? It is no longer easy to develop something new and valuable and get a lot of money for it, considering how many open-source tools and algorithms have been released. I think 2018 won't be the best year for startups like Prisma: there will be too many competitors and too many "smarties" who can take an open-sourced network, deploy it as a mobile app, and call it a startup.

This year we have to concentrate on fundamental things rather than quick money: even if we plan to use Google's Tacotron for voice recognition in some audiobook startup, it can't be just a simple web service; it has to be a business with partners and a business model that can attract investment :)

Conclusion

Let me keep it simple: we have several technologies ready to be used in real products, namely time series analysis, GANs, voice recognition, and some NLP advances. We shouldn't design basic architectures for classification or regression anymore, because AutoML will do it for us, and I hope that with some optimization improvements it will do it much faster than before. And with ONNX and model zoos, we will inject basic models into our apps with two lines of code. I think that making AI-based apps, at least at the current state-of-the-art level, has become very easy, which is not bad at all for the whole industry! For research areas that could take off this year, check my previous post. Stay tuned :)

P.S.
Follow me also on Facebook for AI articles that are too short for Medium, on Instagram for personal stuff, and on LinkedIn!
