Brave Times at the 2nd Conversational Intelligence Summer School

Beatriz Albiero
Published in DataLab Log · Jul 22, 2019

In this post we (Beatriz & Estêvão), both Data Scientists at the DataLab Serasa Experian, will share our experience at the 2nd Conversational Intelligence Summer School (CISS), which we had the opportunity to attend in June. The Summer School aimed to bring together researchers from around the world to learn together and discuss the frontiers of knowledge and the state of the art (SOTA) in the field of conversational intelligence.

The CISS and ourselves

Conversational intelligence is concerned with building agents that can interact with humans through language. The subject is ingrained in our society's earliest depictions of artificial intelligence: that which speaks to us is intelligent. Over the years, many attempts to bring natural language to machines have failed. In the modern age of machine learning, the challenge has been understood to be much more complex than initially realised, especially because of the multifaceted uses of language. In contemporary research, the challenge is tackled with deep learning techniques, which aim to leverage huge amounts of data to solve specific problems. The field advances quickly, and every year new techniques render their predecessors obsolete.

Here at the DataLab, we both worked with conversational technologies (the ConvTech squad), and our research aimed at finding new markets and possibilities for these technologies. In our work we have found, sadly, that most deep learning techniques cannot be readily applied in industrial settings because of their unreliability: their generative nature enables them to surprise their creators. We had worked with simpler models, putting chatbots into production on Telegram and WhatsApp to solve specific interaction problems on those platforms, and we were very interested in the opportunity to learn truly SOTA techniques and assess whether some of them could be transposed to industrial settings.

The selection

Because both of us were enrolled in Master's degrees, we could apply for scholarships. If we ranked in the top 30 among the applicants, we would be exempted from the $900 registration fee and would also receive free accommodation and breakfast at UMass Lowell. The application required classifying texts, which is a relatively simple task. However, we had to code the preprocessing pipeline from scratch, as well as a deep neural network to classify the texts, complete with back-propagation and gradient descent. The task may not require hard computational skills, but each of us wrote very clean and efficient code to make sure our applications would be selected. We were indeed selected, and the DataLab then rewarded us with round-trip airplane tickets to Massachusetts.
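To give an idea of what the application involved, here is a minimal sketch of a one-hidden-layer classifier trained with hand-written back-propagation and plain gradient descent. The data, shapes and hyperparameters are placeholders, not the actual application task:

```python
import numpy as np

# Tiny one-hidden-layer text classifier trained with manual backprop.
# X stands in for bag-of-words / TF-IDF features produced by a
# from-scratch preprocessing pipeline; all shapes are placeholders.
rng = np.random.default_rng(0)
n_samples, n_features, n_hidden, n_classes = 200, 500, 32, 2
X = rng.random((n_samples, n_features))
y = rng.integers(0, n_classes, n_samples)

W1 = rng.normal(0.0, 0.01, (n_features, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.01, (n_hidden, n_classes)); b2 = np.zeros(n_classes)
lr = 0.5

for epoch in range(50):
    # Forward pass: ReLU hidden layer, softmax output.
    h = np.maximum(0.0, X @ W1 + b1)
    logits = h @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    # Cross-entropy loss and its gradient with respect to the logits.
    loss = -np.log(probs[np.arange(n_samples), y]).mean()
    dlogits = probs
    dlogits[np.arange(n_samples), y] -= 1.0
    dlogits /= n_samples

    # Back-propagate through both layers.
    dW2 = h.T @ dlogits
    db2 = dlogits.sum(axis=0)
    dh = dlogits @ W2.T
    dh[h <= 0] = 0.0  # ReLU gradient
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)

    # Plain gradient-descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```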

Content and Schedule

Schedule

The Summer School took place from June 23rd to June 29th: five days of intensive lectures and immersive coding, with the last day dedicated to the presentations and awards. Our daily routine was pretty busy. It basically consisted of attending theoretical and practical lectures during the day and working on our group project during the night. Thursday was an exception: after the first lecture in the morning, we had the opportunity to join a guided tour of Lowell's tourist spots and museums. However, we still had tutorials late in the afternoon and worked on our group project during the night. Every day, lectures started at 9 a.m. and we were expected to work on our project until at least 8:30 p.m., although in practice we stayed up coding until around 11 p.m.

Contemporary research

The lectures were divided into three parts. Every day, from 9 to 11 a.m., we had theoretical lectures on neural network models, following a storyline of improvements in the area of conversational intelligence. We started slowly with an introduction to neural networks, passed through convolutional and recurrent models and text representations such as TF-IDF and Word2Vec embeddings, moved forward to encoder-decoders and attention- and memory-based models, and finally reached the problem of dialogue diversity, tackled with variational inference.

Later, from 11:30 a.m. to 1 p.m., we had practical sessions run on Google Colab notebooks in which we filled in missing code. There were two distinct tracks for these exercises: one dedicated to PyTorch, and another dedicated to both TensorFlow and DeepPavlov, a TensorFlow-based library for conversational AI created by the school's organizers. We chose the second track, since it was a unique opportunity to work with a new library alongside its own creators.
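The notebooks revolved around DeepPavlov's declarative pipeline configs. As an illustration of the usage pattern, here is a minimal sketch assuming the library's documented `build_model` API and one of its bundled configs (the SNIPS intent classifier); the exact session exercises differed:

```python
# A minimal sketch of the DeepPavlov usage pattern from the tutorials;
# the bundled SNIPS intent-classification config is used here only as
# an example, not as the actual exercise content.
from deeppavlov import build_model, configs

# `download=True` fetches the pre-trained pipeline described by the
# config (tokenizer, embeddings, classifier), ready to use immediately.
model = build_model(configs.classifiers.intents_snips, download=True)

# Built pipelines are callable: raw texts in, predicted intents out.
print(model(["Book a table for two at an Italian restaurant"]))
```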

Jason Weston — Research Scientist at Facebook (speaker on Day 5)

The third section of lectures consisted of invited talks that aimed to expose the most challenging open problems in dialogue-intelligence research. From these we learnt that evaluating dialogue systems is not easy: current automatic metrics correlate only weakly with human judgements, which makes the task of creating intelligent conversational agents much more difficult. We were also introduced to the problem of visual captioning and learnt about the responsibility involved in controlling for biased datasets, since an algorithm can learn undesired patterns, like instantly assuming that a person with a laptop is a man. Finally, we learnt that although results in this area keep improving rapidly, even huge companies like Facebook feel overwhelmed by how far we still have to go to build truly intelligent conversational systems.

The competition

Every day we worked on a project, aiming to build an interesting bot. There were between 40 and 50 participants at the school, and we divided ourselves into about 10 groups. The projects would be presented on Saturday, and the top two would present a second time to the chair of UMass Lowell's Computer Science department and receive certificates. The selection would be based on three criteria: implementation (does the bot work?), ambition, and performance (how well does it work?). All of this would be evaluated during a 5-10 minute pitch followed by a live demo.

An Agile Project

Time was short, and we needed to make something that met the criteria: a bot that worked. For this purpose, we knew we needed to iterate on functioning software. We decided to always work with incremental working versions of conversational bots, as opposed to following a single, waterfall-like, huge project. That way we would avoid arriving on Saturday with an almost-ready, too-ambitious project. Monday was reserved for group organization and project definitions, and we teamed up with a Romanian called Manuel, whom we had met on Sunday during the trip from Boston to Lowell. Manuel had expertise in embeddings from his PhD, and we agreed to start our project by applying GloVe embeddings to a question-answering problem, using the Stanford Question Answering Dataset (SQuAD).
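For context, loading GloVe vectors is straightforward. A minimal sketch, assuming the plain-text files distributed at https://nlp.stanford.edu/projects/glove/ (the file name and dimensionality below are placeholders):

```python
import numpy as np

# A minimal GloVe loader: each line of the file is a word followed by
# its vector components, separated by spaces.
def load_glove(path="glove.6B.100d.txt"):
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            embeddings[word] = np.asarray(values, dtype=np.float32)
    return embeddings

glove = load_glove()
print(glove["question"].shape)  # -> (100,)
```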

We agreed to start small, but we didn't spend too much time planning our next steps. Honestly, because there was a lot of content to assimilate from the lectures, we were okay with the idea of working on a straightforward project that let us apply some new ideas, without too much competitiveness. We prepared our virtual environments and GitHub repository, read the SQuAD paper, and started coding.

The first results did not impress us. We had trained a simple neural network on the whole dataset, and it took ~10 minutes to converge. In the end, the improvement over chance performance was small, if any. That was fine, and we expected it: the dataset was very unbalanced and this was the zeroth release. Unfortunately, when we then weighted the classes, the improvement was minimal. Another 10 minutes and no result. It was already Tuesday. We would have to iterate faster if we wanted to make something worthwhile. We would need GPUs.
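For reference, class weighting itself is a one-liner in most frameworks. A minimal sketch, assuming a Keras-style model; the random data stands in for our actual features and labels:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from tensorflow import keras

# Placeholder data standing in for the real (unbalanced) dataset.
X_train = np.random.random((1000, 300)).astype("float32")
y_train = np.random.randint(0, 2, 1000)

# Weight classes inversely to their frequency.
weights = compute_class_weight("balanced",
                               classes=np.unique(y_train), y=y_train)
class_weight = dict(enumerate(weights))

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(300,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Keras scales each sample's loss by its class weight, so rare classes
# are penalized more heavily when misclassified.
model.fit(X_train, y_train, epochs=5, class_weight=class_weight)
```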

There were a dozen machines with GPUs ready to use at the university, but nobody could get them to work properly: users did not have enough disk space, and got kicked out every ~20 minutes. Some participants were subsampling half a percent of their datasets to fit into the narrow GPU window, and couldn't get anything promising; everyone was planning their own workaround. The time was enough to run our zeroth version, but it would fall short if we built bigger networks or state-of-the-art architectures. We decided we had to plan our next steps, with no more experimenting and exploring.

New planning

We brainstormed and discussed, and decided to avoid training altogether. It would take too long and kill our iterative progress once and for all. We had to pivot: if our goal was to answer questions based on the SQuAD dataset, then we should get a large model pre-trained on it. It was a big change of direction, and we entertained the consequences: we couldn't satisfy ourselves with performance metrics anymore, because the model was not ours, and we would have to think of some way of making it part of something bigger…
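Conveniently, DeepPavlov ships a SQuAD-trained reading-comprehension pipeline (an R-Net implementation). A minimal sketch of using such a pre-trained model, assuming the library's documented API; the context and question are placeholders:

```python
# A minimal sketch of using DeepPavlov's pre-trained SQuAD pipeline;
# `download=True` fetches the trained R-Net weights, so no training is
# needed on our side.
from deeppavlov import build_model, configs

qa = build_model(configs.squad.squad, download=True)

context = ["Friendship is a relationship of mutual affection between people."]
question = ["What is friendship?"]

# The pipeline returns the answer span, its start position, and a score.
answer, position, score = qa(context, question)
print(answer[0])
```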

We needed this change: we had overestimated our time and still had no idea how to make an interesting question-answering bot. We followed the pre-trained idea and went back to brainstorming use cases. We integrated our bot with Telegram and defined intermediate products that increased progressively in complexity.
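The Telegram side was the simplest part. A minimal sketch, using the python-telegram-bot API of that era (v12); the token and the `oracle_reply` function are placeholders for our actual bot logic:

```python
# A minimal Telegram echo-style bot; `oracle_reply` is a hypothetical
# stand-in for the QA / generation pipeline plugged in behind it.
from telegram.ext import Updater, MessageHandler, Filters

def oracle_reply(text):
    return "..."  # placeholder: the real pipeline goes here

def on_message(update, context):
    # Reply to every incoming text message with the pipeline's output.
    update.message.reply_text(oracle_reply(update.message.text))

updater = Updater("YOUR_BOT_TOKEN", use_context=True)
updater.dispatcher.add_handler(MessageHandler(Filters.text, on_message))
updater.start_polling()
updater.idle()
```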

The Epic Moment When The Oracle Bot Was Born

In the end, we wanted to combine two pre-trained models in some way. However, we first needed each of them working. Our first bot answered questions about friends: it took some paragraphs from the Wikipedia article on Friendship and used them as context for the R-Net pre-trained on SQuAD. The second bot used only a pre-trained GPT-2, a more complex model, and replied with a continuation of user-provided texts. The third bot, still GPT-2 only, used hidden seed texts to babble about the future of the user (the user's name was contained in the seeds). Finally, in the fourth and last bot, we hid the generated future from the users and only answered questions about it, using the first bot's R-Net. The Oracle Bot was born. The iterative process was over, and we felt we had a shot.
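Putting the pieces together looked roughly like this. A minimal sketch of the prophecy generator, using today's Hugging Face transformers API for GPT-2 as an illustration (in 2019 we used an earlier incarnation of the library); the seed text is a placeholder:

```python
# A sketch of the prophecy generator: a hidden, name-bearing seed text
# steers GPT-2's sampling. Seed wording and sampling parameters here
# are illustrative placeholders.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
generator = GPT2LMHeadModel.from_pretrained("gpt2")

def make_prophecy(user_name):
    seed = f"The future of {user_name} will be full of surprises. Soon,"
    input_ids = tokenizer.encode(seed, return_tensors="pt")
    output = generator.generate(input_ids, max_length=150,
                                do_sample=True, top_k=40,
                                pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# The Oracle keeps the prophecy hidden and answers questions about it
# with the SQuAD-trained QA pipeline from the earlier sketch, e.g.:
#   answer, _, _ = qa([make_prophecy("Alice")], ["What will happen soon?"])
```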

Results

The Oracle Bot Launching

It was really late when we finally got the last bot to work. And although we were having a lot of fun talking to our newborn baby Oracle, we knew we could stumble upon problems due to the delay in the prophecy-making process. Our group tested it a few times: it was taking about two uneasy minutes to generate each individual prophecy, but once that step was done, the Q&A component kicked in promptly and the Oracle was ready to enlighten our paths.

The next morning, we presented our creation to our summer school colleagues and the judges. We were really excited, and all that excitement showed in our contagious pitch. We shared the Oracle's Telegram handle with everybody and saw a few desperate hands reaching for phones, anxious for some wisdom. "Yes!! We did it!", we thought. We saw how engaging the idea was; some competitors were even cheering for us!

But then…

We realised that our pretrained-GPT-2-running-on-a-Colab-notebook bot couldn't handle that many people. Participants started asking what was going on and why it was taking so long. It was frustrating. Fortunately, we had some dialogues in our history to demonstrate how the bot worked. We surely lost some points because of this unfortunate, time-consuming demo.

Winner

It was finally time to announce the winners. The judges started by announcing the second place, and….

It was us!

2nd Place at the Competition

The first place was given to another bot, the Visual Dialog Game by Thomas Depierre, Sashank Santhanam, Mark Seliaev and Jon Ander Campos. The bot displays four pictures and memorizes one of them. After that, the user has to guess which of the pictures the bot memorized by asking questions like "is there an animal in it?". It was very cool, and it worked much faster than ours!

Nevertheless, everything was so much fun and we enjoyed the whole process so much that winning the competition would have been just the icing on the cake. But still, second place was awesome. When we started the whole thing, we didn't even think we had a chance!

If you want to know more about our bot, check this blog post.

We want to thank the DataLab for supporting us and making this experience possible. We also want to thank the CISS organizers for creating such an incredible course. And thanks to Manuel Ciosici, who joined us and made everything more fun.
