Ramtin Explains

Sometimes we all wish we had an artificial intelligence expert on hand, to help us parse a difficult article or give us a little more background before we dive into a thought piece. Well, here at Leviathan.ai, our in-house expert on all things AI is Ramtin Seraj. Ramtin is passionate about the accuracy and scalability of machine learning approaches, especially in applications related to natural language.

If you have a question for Ramtin, send it in! We’re starting off with a question from yours truly, Mack Flavelle.

Mack:

So I read this article: it talks about stateless versus stateful models for neural networks in machine learning, the idea being that a stateful model allows the results of the previous tests to be a factor in determining the outcome of future tests.

Then it gives two specific examples (writing like Ernest Hemingway and making a Mario Bros level). In both cases, the first attempt at building new content based on existing data comes out useless, but by the time they’ve run thousands of iterations the results are quite impressive.

Here’s my question: if your algorithm makes a terrible attempt on the first pass, then uses that terrible attempt as an input for the next attempt, and you extrapolate this 1000x, why don’t you end up with a result that has drifted away from the intended outcome instead of being pulled toward it?

Ramtin Explains!

Thanks for your question, Mack. I agree this seems really magical, but there are a few pieces missing in this article.

Many machine learning models, like recurrent neural networks and hidden Markov models, try to explain the patterns inside a sequence (a structured prediction task) using information stored in higher-level variables. The more successful they are at explaining these patterns, the more successful they will be at predicting future observations. These models try to capture information in hidden variables (which act like a memory) by testing themselves over and over, gradually getting better and better. The “magical” part of these methods is made possible by training techniques like Gibbs sampling.
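
To make the idea of hidden variables as memory concrete, here’s a minimal sketch of a single vanilla recurrent cell in Python with NumPy. Everything below (the sizes, the random weights, the toy character ids) is made up for illustration, not taken from the article; the point is just that the hidden state h carries information from every earlier character into the next prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 5, 8

# Toy, randomly initialized weights; in a real model these are learned.
Wxh = rng.normal(scale=0.1, size=(hidden_size, vocab_size))   # input -> hidden
Whh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the "memory" path)
Why = rng.normal(scale=0.1, size=(vocab_size, hidden_size))   # hidden -> output

def step(x_onehot, h_prev):
    """One recurrent step: mix the new input with the running memory."""
    h = np.tanh(Wxh @ x_onehot + Whh @ h_prev)     # updated hidden state
    logits = Why @ h
    probs = np.exp(logits) / np.exp(logits).sum()  # next-character distribution
    return probs, h

h = np.zeros(hidden_size)
for char_id in [0, 3, 1]:            # a toy sequence of character ids
    x = np.eye(vocab_size)[char_id]  # one-hot encoding of the character
    probs, h = step(x, h)

# The final prediction is conditioned, through h, on the whole prefix.
print(probs)
```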

The first step towards understanding how this works is understanding why the Expectation Maximization (EM) algorithm works. Without going into mathematical proofs, EM iteratively learns values for hidden variables by testing its predictions many times. Many people who work in AI don’t worry too much about why EM works; instead, they concentrate on understanding its behaviour in different situations. The missing piece is that EM is only guaranteed to improve its objective locally, so a bad starting point can leave it stuck in a poor solution, and you may need many random restarts to get a good enough result. Also, the structure of the model (the number of levels and hidden variables) is itself selected by testing many different settings. Maybe there’s a life lesson here: if you restart many times and keep learning from failures, you’ll get to your goals one day!
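
As a hedged sketch of what “many random restarts” looks like in practice, here is EM fitting a two-component 1-D Gaussian mixture, a classic toy problem invented here purely for illustration. Each restart begins from randomly chosen means, and we keep whichever run ends up with the best log-likelihood.

```python
import numpy as np

def gauss(x, mu, sigma):
    """Density of a 1-D Gaussian."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def em_two_gaussians(x, n_iters=50, seed=0):
    """Fit a 2-component 1-D Gaussian mixture with EM from a random start."""
    r = np.random.default_rng(seed)
    mu = r.choice(x, size=2, replace=False)  # random restart: random initial means
    sigma = np.ones(2)
    pi = np.full(2, 0.5)
    for _ in range(n_iters):
        # E-step: soft-assign each point to a component (the hidden variable).
        dens = np.stack([pi[k] * gauss(x, mu[k], sigma[k]) for k in range(2)])
        resp = dens / dens.sum(axis=0)
        # M-step: re-estimate the parameters from those soft assignments.
        nk = resp.sum(axis=1)
        mu = (resp * x).sum(axis=1) / nk
        sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk)
        pi = nk / len(x)
    dens = np.stack([pi[k] * gauss(x, mu[k], sigma[k]) for k in range(2)])
    return mu, np.log(dens.sum(axis=0)).sum()

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

# Several random restarts; keep whichever run explains the data best.
best_mu, best_ll = max((em_two_gaussians(data, seed=s) for s in range(10)),
                       key=lambda t: t[1])
print("recovered means:", np.sort(best_mu))
```

With two well-separated clusters almost any start works; on messier data, the restarts genuinely matter.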

These examples are really interesting, but another piece to consider is generalization. These models might seem surprisingly good at predicting the upcoming label based on previous observations, but when an observation is genuinely new, performance drops sharply. These models work great when you have access to a whole lot of data, but unfortunately that’s not the case most of the time.
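
To make the generalization point concrete, here is a toy count-based bigram character model; the corpus and everything else below are invented for illustration. Patterns it has seen get confident predictions, while a pattern it has never seen gets probability exactly zero, no matter how plausible it might be.

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran"
bigrams = Counter(zip(corpus, corpus[1:]))  # counts of adjacent character pairs
unigrams = Counter(corpus[:-1])             # counts of "previous" characters

def prob_next(prev_char, next_char):
    """P(next | prev), estimated purely from counts, with no smoothing."""
    if unigrams[prev_char] == 0:
        return 0.0
    return bigrams[(prev_char, next_char)] / unigrams[prev_char]

print(prob_next("t", "h"))  # seen often in training: high probability
print(prob_next("t", "z"))  # never seen in training: probability zero
```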

Another problem with current recurrent neural networks is that they aren’t great at problems with long-range dependencies. If the current observation depends on an observation from the distant past, for instance when the word at the end of a sentence depends on the first words of that sentence, the model has difficulty capturing the pattern; that’s the main reason you can’t train your model on novels and receive another novel as output.
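
A rough sketch of why this happens, under the standard vanishing-gradient explanation: in a plain recurrent net, the learning signal linking step t back to step 1 is a product of per-step factors, so it shrinks (or explodes) exponentially with distance. The matrix below is random and deliberately small, so the signal decays; none of this comes from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size = 16

# A toy recurrent weight matrix with smallish entries.
Whh = rng.normal(scale=0.2, size=(hidden_size, hidden_size))

# The gradient flowing back through t steps involves t factors of Whh
# (times tanh derivatives, which are at most 1 and only shrink it further).
grad = np.eye(hidden_size)
for t in range(1, 51):
    grad = grad @ Whh
    if t in (1, 10, 25, 50):
        print(f"steps back = {t:2d}, gradient norm ~ {np.linalg.norm(grad):.2e}")
```

With larger weights the same product explodes instead of vanishing; either way, information from fifty steps back barely influences what the model learns.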

Hope that answers your question!

— 0 —

written by Ramtin Seraj