By Mo Islam, Partner, Threshold Ventures
I just returned from the annual Neural Information Processing Systems conference in Vancouver. A lot has changed since the last one I attended in 2015. NeurIPS (formerly called NIPS) has grown from 3,700 to over 13,000 attendees to become the largest machine learning research conference in the world. I also noticed that the number of companies, from startups to tech giants, recruiting top-tier machine learning talent has exploded.
A couple of standout talks from the many I heard were: 1) Celeste Kidd’s presentation on human belief formation and how machine learning algorithms influence what we know, and 2) Yoshua Bengio channeling Daniel Kahneman to discuss machine consciousness, agency, and task generalization under the banner of System 2 deep learning. Most of the talks at NeurIPS are highly technical, but these two are digestible for a general audience. They give us important insight into what the next 10 years of machine learning will look like, and how it will impact our lives.
I attended NeurIPS this year to see how the state of the art in machine learning has evolved. Undoubtedly, the most exciting wave in the industry right now is the advancement of natural language processing (NLP). I have been following the industry for years, but in the last year we have seen tremendous progress in the technology, and I’m convinced we are entering what will be a golden age for NLP. Similar to the major advancements in computer vision over the last several years, NLP has hit a threshold in performance that unlocks a wave of new products and services waiting to be built.
Algorithm development, largely funded by big tech companies and distributed through open-source software, is quickly advancing NLP technology. One much-talked-about technology is BERT (Bidirectional Encoder Representations from Transformers), a state-of-the-art pre-training technique for NLP developed by Google. BERT is huge for the entire industry. NLP is a diverse field that requires many task-specific datasets, and the shortage of training data for particular tasks has been a massive cold-start challenge. BERT, pre-trained on a large text corpus including English Wikipedia, jumpstarts NLP model building for developers and data scientists, allowing them to fine-tune for specific NLP tasks with small datasets. It relies on the Transformer, a novel neural network architecture that can directly model relationships between all words in a sentence, and on bidirectionality, an older idea applied here for the first time to pre-training a deep neural network.
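To make the Transformer idea concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention, the core operation that lets every word attend directly to every other word in the sentence in one step. This is an illustrative toy, not Google's BERT implementation; all names and dimensions are made up for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token embeddings.
    Each output row is a weighted mixture of value vectors from
    *all* positions, so every word relates to every other word
    directly, with no recurrence over the sequence.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) pairwise affinities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Toy example: 4 "tokens" with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(out.shape, attn.shape)  # (4, 8) (4, 4)
```

The (4, 4) attention matrix is the key point: one row per token, one weight for every other token, computed in a single matrix multiply. BERT stacks many multi-head layers of this operation and, crucially, attends in both directions at once during pre-training.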
The competition among the tech giants for NLP preeminence, and the natural collaboration engendered by open-source software, paved the way for rapid progress over the past year. Facebook AI took BERT and created its optimized better half, RoBERTa. Baidu, also inspired by BERT, produced what is currently the best-performing model on GLUE (General Language Understanding Evaluation), a benchmark leaderboard for evaluating NLP models. Named ERNIE after Bert’s pal on Sesame Street, the model also excels at Chinese-language tasks. Although not as Sesame Street-friendly, Microsoft came out with its own BERT-inspired model, MT-DNN. OpenAI recently released GPT-2, another Transformer-based model, with 1.5B parameters trained on 8M web pages; thanks to ML engineer Adam King, you can play with it and have it complete your sentences.
The availability of state-of-the-art pre-trained models is super exciting for massive NLP adoption in new products and services. Developers building features across products in areas like enterprise productivity, customer service, and healthcare records can leverage these pre-trained models to quickly build question-and-answer systems, sentiment analytics, and clinical decision support tools. These are only a few examples. Language is ubiquitous (code is language too!) and I believe NLP has the potential to innervate almost every industry.
The technology is also moving quickly into production. Google is already using BERT to improve 10% of searches in English in the United States, with expansion into more languages and geographies over time as new models are built.
While many of these NLP technologies will appear as features in products or quietly improve services like Google Search in the background, we will also see the resurgence of conversational A.I. There was an initial wave of NLP applications created by chatbot startups in early 2016. A chatbot funding craze unfolded (still tracked here), but I strongly believed that the underlying NLP technology stack was not mature enough at the time to adequately support those applications. Many of those early-stage companies fizzled out or exited early. Now that NLP has crossed this performance threshold, I’m excited for a resurgence of sophisticated conversational A.I. systems and agent-based models over the next five years.
In addition to applications, I’m interested to see more tooling and infrastructure (built on top of open-source) that enables developers and data scientists to put their models into production. The Talk to Transformer GPT-2 web demo is possible because tooling provided an easy PyTorch implementation of that model. I think there will be more opportunities for startups to provide model management, model governance, and model CI/CD to streamline NLP adoption in production use cases. NLP is ready for prime time. I think this bodes well for bottom-up, developer-driven, high-velocity business models in new startups. I anticipate many companies will contribute to this infrastructure, and I don’t think most of it will come from Amazon or Google.
At Threshold Ventures we invest in disruptive companies that are at the threshold (get it?) of transformative growth. I believe NLP is at one of those thresholds. NeurIPS provided a great vantage point on how the industry advanced in the last year, and on how much potential remains for new products that benefit from this technology. I look forward to entering the golden age of NLP.