Word2Vec and Friends

Interview with Social Data Scientist Bruno Gonçalves

Bruno Gonçalves is currently a Moore-Sloan Fellow at NYU’s Center for Data Science. He has strong expertise in using large-scale datasets to analyze and model human behavior and mobility, drawing on sources that range from raw Apache web logs, Wikipedia edits and Twitter posts to epidemiological reports and Census data, and connecting macro and micro elements to solve problems. He has edited “Social Phenomena: From Data Analysis to Models” and co-authored “Twitterology: The Social Science of Twitter”, due to be released in 2018.

We’re proud to present Bruno as a speaker at the AI With The Best online developer conference on 29–30th April, and we look forward to his talk covering the implementation and mathematics behind Word2Vec and the emerging field of <anything>2vec (phrase2vec, doc2vec, dna2vec, node2vec, etc…).

In the meantime, we asked him a few questions.

Q What personally motivated you to begin your work in machine intelligence?

My career has always been devoted to understanding the everyday world and how it both impacts and is affected by our behavior. Fortunately, the same modern technologies that simplify our daily lives also provide a unique window into our view of the world and into our behavior through the data we generate. Data Science and Machine Intelligence are then the natural tools to use if we are to make progress.

Q How has the field of data science changed since you began working in it?

I would say that the biggest change has been the recognition of Data Science as an actual field worthy of pursuit. Like many in this area, my background is in the Physics of Complex Systems and, qualitatively, what I’ve been doing for more than a decade hasn’t changed much. I’m still studying human behavior with the help of large datasets and borrowing approaches and ideas from computer science and machine learning. However, the attention and emphasis that have been placed on this kind of work over the last few years have resulted in an important increase in the amount of resources invested and in the development and popularization of many new tools (sklearn, tensorflow…) and techniques (deep learning, tensor factorization…) that have made possible many things that were practically unthinkable just a few years ago.

Q Which is the most surprising application of the Word2Vec architecture: node2vec?

The surprising thing about word2vec is its overall (apparent) simplicity. Here is a simple shallow (single hidden layer) neural network that is able to capture something fundamental about the syntax of human language simply by looking at the context in which words are used in large corpora of text. The resulting vector representations of words can then be used not only to explore semantic relations between words, but also to help with translation, query expansion, document summarization, etc… Its simplicity, allied with the usefulness of the embeddings it produces, has naturally resulted in a flurry of activity and variations in very diverse fields. For me, personally, the most fascinating application was in dna2vec, where it was used to better understand relations between different parts of the genome, the language of life.
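
As a rough illustration of the kind of shallow, single-hidden-layer network described above, here is a minimal skip-gram sketch in Python with NumPy. It is not the original word2vec implementation: the toy corpus, the function names (`build_pairs`, `train_skipgram`, `cosine`), the full softmax output layer, and all hyperparameters are assumptions chosen for illustration only.

```python
import numpy as np


def build_pairs(tokens, window=2):
    """Return (center, context) index pairs and the sorted vocabulary."""
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    pairs = []
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((index[word], index[tokens[j]]))
    return np.array(pairs), vocab


def train_skipgram(pairs, vocab_size, dim=8, epochs=100, lr=0.1, seed=0):
    """Train a one-hidden-layer skip-gram model with a full softmax (toy scale)."""
    rng = np.random.default_rng(seed)
    w_in = rng.normal(scale=0.1, size=(vocab_size, dim))   # word embeddings
    w_out = rng.normal(scale=0.1, size=(dim, vocab_size))  # output weights
    losses = []
    for _ in range(epochs):
        total = 0.0
        for center, context in pairs:
            h = w_in[center]                 # hidden layer is an embedding lookup
            scores = h @ w_out
            scores -= scores.max()           # numerical stability
            probs = np.exp(scores) / np.exp(scores).sum()
            total += -np.log(probs[context])
            grad = probs.copy()
            grad[context] -= 1.0             # d(loss)/d(scores) for softmax + cross-entropy
            grad_h = w_out @ grad            # computed before w_out is updated
            w_out -= lr * np.outer(h, grad)
            w_in[center] -= lr * grad_h
        losses.append(total / len(pairs))
    return w_in, losses


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical toy corpus; real embeddings need far larger corpora.
tokens = "the king rules the land the queen rules the land".split()
pairs, vocab = build_pairs(tokens, window=2)
embeddings, losses = train_skipgram(pairs, len(vocab))
print(f"mean loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print("king vs queen:", cosine(embeddings[vocab.index("king")],
                               embeddings[vocab.index("queen")]))
```

The full softmax is used here only because the vocabulary is tiny; practical implementations typically replace it with negative sampling or a hierarchical softmax so that training scales to large vocabularies.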

Q What advice would you give to budding AI developers?

Newton famously said that (paraphrasing) he was only a boy playing on the sea-shore of the great ocean of truth. I firmly believe that when it comes to Artificial Intelligence and Machine Learning we are still at a similar stage of playing by the shore and that the ocean is still there, untouched. So my advice to AI researchers is to play and maybe try to take a dip in the ocean. Be curious, try new things, and don’t get discouraged by the apparent mathematical complexities. Fundamental ideas are simple if you manage to see through the abstruse notation and practical difficulties.

Q Are you excited about speaking at AI With The Best? What made you want to be a speaker?

Definitely. It will be a unique opportunity to reach a global audience and try to infect them with my enthusiasm for this amazing toolset.

Thank you Bruno!

You can ask Bruno your own questions, and explore more language parsing, during his talk “Word2Vec and Friends” at our upcoming AI With The Best online developer conference on 29–30th April.