Recently, at LINAGORA, we combined two open source technologies: OpenPaaS, our collaborative platform, and Kubernetes, the go-to open container orchestrator. Now, a little more than one month after OpenPaaS’ launch, our distributed team of developers is still working very hard to smooth it out and to implement new features along the way. Still, to make our platform stand out in the long run, we need unique features that go beyond the fact that it is made out of love and free software.
Such a feature would be well worth April’s newsletter, don’t you think?
“Bonjour, mon nom c’est LinTO” (“Hello, my name is LinTO”)
Wouldn’t it be convenient to have your James-delivered emails summarized and then read out loud, on demand, by your smart assistant? Or to get a transcript of a Hubl.in online video conference you just missed? And all of this from within an open, autonomous and collaborative platform like OpenPaaS, whose collected data would be used to train and improve your own deep neural network?
Our research team based in Toulouse, LINAGORA Labs, has just started a three-year, publicly funded, collaborative effort to develop a potentially ground-breaking technology. Its name? LinTO. Its function? To become the sole open, autonomous and business-first smart assistant.
“Alexa, are you cheating on me? Yes, I do that for a living”
One might wonder: aren’t there any other solutions capable of doing that? Sort of, but most smart assistants have the same shortcomings:
- Proprietary. They are not open source, and when they are, they often rely on calling non-free APIs;
- English-centric. They are focused on the English language, and therefore often treat the French language as a second-class citizen;
- Individual-first. They are made to serve individuals’ needs, and often fall short of business-oriented functionalities;
- Cloud-reliant. They often rely on remote infrastructures to do the heavy lifting, and thus cannot be deployed on premises;
- Law-breaking. They are incompatible with the General Data Protection Regulation (GDPR), a new European regulation that will soon be enforceable.
LinTO’s ambition is to address all of those issues by becoming the sole French-first, open, autonomous and business-oriented smart assistant, one that will truly empower companies while seamlessly integrating with OpenPaaS.
Before we reveal LinTO’s internals, let’s cover some basics about smart assistants in general. If you are already familiar with those technologies, please feel free to skip over to the next section.
Remedial course on speech recognition
What is speech recognition?
Among other areas of interest, LINAGORA is exploring new techniques to better extract strings of words from the human voice, a problem in computational linguistics commonly referred to as speech recognition. Speech recognition, also known as “automatic speech recognition” (ASR), “computer speech recognition”, or simply “speech to text” (STT), is the technology that lets computers automatically recognize spoken language and convert it into text.
What are the main functions of a smart assistant?
Roughly speaking, a smart assistant needs to do the following four things:
- First, it needs to listen: to capture the human voice and transcribe it into strings of words, a process commonly referred to as decoding;
- Second, it needs to be able to reason about the transcribed text, to comprehend it in order to give it a context;
- Third, it needs to select the correct answer;
- Finally, the machine needs to read that answer aloud, to deliver an audible response to the user.
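To make these four steps concrete, here is a minimal sketch of how they can be chained into a single pipeline. It is not LinTO’s code: every function name (`decode`, `understand`, `select_answer`, `synthesize`) and the `Intent` type are hypothetical, and each stage is a deliberately naive stub (the "audio" is just UTF-8 text, and understanding is keyword matching) standing in for a real speech recognition, language understanding or text-to-speech engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Intent:
    """What the assistant believes the user wants (hypothetical type)."""
    name: str
    text: str


def decode(audio: bytes) -> str:
    """Step 1 (stub): a real assistant would run a speech recognition
    engine here; we simply pretend the audio is already UTF-8 text."""
    return audio.decode("utf-8").strip().lower()


def understand(text: str) -> Intent:
    """Step 2 (stub): naive keyword matching instead of a real NLU model."""
    if "mail" in text:
        return Intent("summarize_mail", text)
    if "meeting" in text:
        return Intent("transcribe_meeting", text)
    return Intent("unknown", text)


# Step 3: one hypothetical handler per recognized intent.
ANSWERS: Dict[str, Callable[[Intent], str]] = {
    "summarize_mail": lambda intent: "You have no new emails.",
    "transcribe_meeting": lambda intent: "Here is the transcript of your last meeting.",
}


def select_answer(intent: Intent) -> str:
    """Step 3: pick the handler matching the intent, with a fallback."""
    handler = ANSWERS.get(intent.name)
    return handler(intent) if handler else "Sorry, I did not understand."


def synthesize(answer: str) -> bytes:
    """Step 4 (stub): a real assistant would call a text-to-speech engine."""
    return answer.encode("utf-8")


def assistant(audio: bytes) -> bytes:
    """Chain the four steps: listen, understand, answer, speak."""
    return synthesize(select_answer(understand(decode(audio))))


if __name__ == "__main__":
    print(assistant(b"Read my email please").decode("utf-8"))
```

Keeping each step behind its own function is what would let a real engine replace any single stub without touching the other three.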
How can a smart assistant recognize the human voice?
A smart assistant relies on a model to recognize a particular sound. This model needs to be trained on a so-called corpus, a set of words paired with their associated sounds. How do you build a corpus? You need to utter the same word multiple times so the assistant can associate the sound that has been produced with its written counterpart. This tedious process must be repeated for each word and sentence until the corpus is complete. A model can then be trained on that corpus, and this very model will eventually end up in the smart assistant.
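As a loose illustration of the corpus-then-model idea, the sketch below "trains" a toy nearest-centroid classifier: each word in the corpus is recorded several times, the recordings of a word are averaged into one reference point, and a new sound is recognized as the word whose reference point is closest. This is a swapped-in textbook technique, not how Kaldi or LinSTT work (they rely on far richer statistical acoustic models), and the two-number "feature vectors" are invented purely for illustration.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple

# A corpus pairs a written word with the features of one recording of it.
# Real corpora store audio files and transcripts; these numbers are made up.
Corpus = List[Tuple[str, Sequence[float]]]
Model = Dict[str, Tuple[float, ...]]


def train(corpus: Corpus) -> Model:
    """Average every recording of each word into a single centroid."""
    grouped: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for word, features in corpus:
        grouped[word].append(features)
    model: Model = {}
    for word, samples in grouped.items():
        n = len(samples)
        # zip(*samples) walks the samples dimension by dimension.
        model[word] = tuple(sum(values) / n for values in zip(*samples))
    return model


def recognize(model: Model, features: Sequence[float]) -> str:
    """Return the word whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda word: dist(model[word], features))


# The same word is "uttered" several times, as described above.
corpus: Corpus = [
    ("bonjour", (1.0, 0.2)), ("bonjour", (1.2, 0.1)), ("bonjour", (0.9, 0.3)),
    ("merci", (0.1, 1.1)), ("merci", (0.2, 0.9)),
]

model = train(corpus)
print(recognize(model, (1.1, 0.25)))  # → bonjour
```

Even this toy shows why the process is tedious: a word that was never recorded has no centroid, so the model simply cannot recognize it.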
To speed up this whole process, it is crucial to start from an already established corpus, a database of words and their associated sounds. Unfortunately, as of today, such datasets are scarce, even for the English language. The Mozilla Foundation is trying to change that by leading Common Voice, a crowd-sourced project to create such a corpus, which for the moment only targets English.
It must be said that LinTO is at a very early stage of its development, and is mostly showcased as a proof of concept. As a result, its underlying building blocks may change in the future. That being said, LinTO currently borrows from previous projects developed internally, including Hublot, a bot created for Hubl.in that can transcribe audio in real time. Hublot has been built using the Kaldi Speech Recognition Toolkit, a project born at Johns Hopkins University and intended for use by speech recognition researchers. Hublot’s model, LinSTT, has already been trained on a French corpus. LinTO will reuse LinSTT, our own speech recognition engine, and enrich its corpus and functionalities. Eventually, this corpus will also be open sourced.
For now, LinTO is intended to be built around a Raspberry Pi. The final product may be equipped with a camera, a ReSpeaker USB 4 Mic Array and a touch screen. The Raspberry Pi’s benefits include its large ecosystem and affordable price; one of its main drawbacks is that it is not open hardware.
To develop this ambitious project, we are teaming up with partners, whom we briefly introduce here.
Zelros is a French startup that develops a natural language understanding (NLU) engine to power its own chat bot; the engine targets the French language and uses business-specific vocabularies. Zelros has open sourced its testbed for chat bots.
The RAP team from the Laboratory for Analysis and Architecture of Systems (LAAS) has been working since 2005 on human-based visual perception, a sub-field of computer vision. Their expertise in this domain will help LinTO acquire the ability to detect, track and recognize people within a conference room.
Two teams from the Computer Research Institute of Toulouse (IRIT) will participate: MELODI and SAMoVA. MELODI has previously worked on machine learning methods for speech detection and also brings its expertise in textual similarity. The SAMoVA team works on the automatic analysis of audiovisual content, in particular on highlighting the contextual elements that are characteristic of conversational interactions.
Finally, the DaSciM team, from LIX, the computer science laboratory of the École polytechnique, will bring its knowledge of information retrieval, deep learning, real-time keyword extraction and disambiguation.
The living lab concept
LinTO is by all means a collaborative research and development project. Thus, it is not only about what technology we use but also about how we collaborate towards successfully achieving this goal. Here, we intend to innovate too, by following some of the principles of the living lab approach. According to Wikipedia, “a living lab is a user-centered, open-innovation ecosystem […] integrating concurrent research and innovation processes within a public-private-people partnership”. The peculiarity of this method is that, to achieve the stated goal, both technical and functional tasks are integrated early in the process.
In the context of LinTO, the general idea is to co-design a solution with end users, so we can track their needs from the get-go and be confident we will meet those needs at the end. For example, an end user and a project owner will have to work together to precisely define the corpus of documents and commands used to train specific business models. Those models will then be used to craft a dedicated smart assistant.
“Farewell, Alexa: bonjour, LinTO!”
At LINAGORA, we are committed to helping private and public institutions take back power from monolithic, opulent and privacy-invasive companies.
OpenPaaS, our open source collaborative platform, is arguably our best effort so far to do that, to help organizations break free from proprietary solutions. And with LinTO in the making, we are confident our platform will become an even worthier actor tomorrow! Stay tuned to follow its development, and let’s meet next month to cover another topic. What will it be? It may well be about an intriguing piece of software called an Enterprise Service Bus…
Interested in joining LINAGORA? We are hiring!