How to build a tech product: technology

Oct 12, 2017 · 5 min read

by Patrick Weissert, CPO

Imagine an 11-year-old car upgraded with conversational Artificial Intelligence and Machine Learning in an advanced piece of hardware that can be installed in 3 minutes. That’s Chris: a deep tech product for low tech consumers. We are sharing what we have learned about how to build a tech product in a series of articles. Today: technology.

Finding the right hardware setup is a mixture of many years of engineering experience, which drives the original hypothesis about which components should be used, and then lots of research and prototyping to see whether those components actually do what they should. For example, we started prototyping early on with rectangular displays, but it became clear very quickly that they are simply not a good fit for the car.

Display and hand gesture recognition

One of our design principles was that drivers should never have to touch the device while driving, since that both distracts from the road (you have to look at the device you are touching) and often requires the driver to lean forward, further reducing the ability to react in a sudden dangerous situation. So we focused on voice and hand gesture recognition.

Gesture sensors differ vastly in their capabilities. Some time-of-flight sensors allow the detection of a wide set of very complex gestures, and there is going to be a lot of innovation in this space in the coming years. For Chris, however, our feeling was that we should not burden drivers with complex gestures, because complex gestures require more cognitive bandwidth. Instead we kept it to a very few simple gestures, which in turn need to be detected with very high reliability and at a reasonable distance from the device. For this we looked at radiation-based passive IR sensors, reflection-based IR sensors and electrical near-field 3D gesture controllers.
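To make the “few but reliable gestures” idea concrete, here is a minimal sketch of how a left/right swipe could be derived from two reflection-based IR proximity sensors mounted side by side: whichever sensor sees the hand first tells you the direction. The sensor interface, thresholds and timings are illustrative assumptions, not our production firmware.

```python
import time

# Illustrative thresholds -- tune per sensor and mounting position.
PRESENCE_THRESHOLD = 0.6   # normalized IR reflection level that counts as "hand present"
MAX_SWIPE_INTERVAL = 0.4   # max seconds between the two sensors triggering

def detect_swipe(read_left, read_right, timeout=2.0):
    """Return 'left', 'right' or None.

    read_left / read_right are callables returning the current normalized
    reflection level (0..1) of two IR sensors mounted side by side.
    A swipe is simply "one sensor triggers, then the other, quickly".
    """
    first, t_first = None, None
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        left_on = read_left() > PRESENCE_THRESHOLD
        right_on = read_right() > PRESENCE_THRESHOLD
        now = time.monotonic()
        if first is None:
            if left_on:
                first, t_first = "left", now
            elif right_on:
                first, t_first = "right", now
        else:
            # The opposite sensor must fire within the swipe interval.
            if first == "left" and right_on and now - t_first < MAX_SWIPE_INTERVAL:
                return "right"   # hand moved left -> right
            if first == "right" and left_on and now - t_first < MAX_SWIPE_INTERVAL:
                return "left"    # hand moved right -> left
        time.sleep(0.01)
    return None
```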

Keep it simple: Trello + GitHub + BuddyBuild + Slack

Coming out of big development teams we were used to JIRA tickets, fully automated deployment pipelines, build monitors, and all kinds of very nice but heavy infrastructure. To get going, we opted to keep it deliberately simple. We have seen so many product owners bang their heads against the wall over JIRA complexities that we said: let’s start with Trello and see how far we get. And to our surprise, we got pretty far.

Trello integrates nicely with GitHub, Slack and Google Drive. We maintained all user stories in a dedicated Trello board, broke them out from there into tickets on project boards, and from there linked them to issues in GitHub. This allowed us to trace any change and commit back to the product vision (the master Trello board) and to the ongoing sprint. Both the ticket and the build then get pushed into Slack, from where you can download and install the build and start testing. It worked pretty well, and we are still using it today.
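As an illustration of the Slack end of that chain, this is roughly what a build notification hook can look like: a short script that posts the ticket and build links to a Slack incoming webhook. The webhook URL and message fields here are placeholders, not our actual integration; in practice a CI service such as BuddyBuild can do this for you, the sketch just shows the moving parts.

```python
import json
import urllib.request

# Placeholder webhook URL -- replace with your own Slack incoming webhook.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

def notify_build(ticket_url: str, build_url: str, version: str) -> None:
    """Post a short message to Slack linking the Trello ticket and the build."""
    payload = {
        "text": f"New build {version} is ready: {build_url}\nTicket: {ticket_url}"
    }
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# Example usage:
# notify_build("https://trello.com/c/abc123", "https://example.com/builds/42", "0.3.1")
```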

Collecting NLP test & training data

One of the very first steps in creating a voice-based product is to collect test and training data. This should be utterances, phrases and sentences of varying degrees of difficulty, within the domains you are targeting (say, emails, music, navigation) and within the contexts you expect: in our case in the car and outside the car, with windows open or closed, at high and low speeds, on cobblestone roads, and so on.

This catalogue of phrases, contexts and domains creates a nice, huge matrix of things to record. A good way to do it, for us, was to have two people drive around in a male/female combination (for the different voices): the co-driver reads the script to the driver, the driver repeats back the phrase, and the whole conversation gets recorded and later cut up. Of course, some elements of the training data can be crowdsourced or created through crowdworkers.
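One simple way to keep that matrix manageable is to generate the full recording plan (phrase × context × speaker) as a script for the drive. The domains, phrases and contexts below are illustrative placeholders, not our real catalogue.

```python
import csv
import itertools

# Illustrative placeholders -- the real catalogue will be much larger.
phrases = {
    "music": ["play some jazz", "skip this song"],
    "navigation": ["take me home", "how long until we arrive"],
}
contexts = [
    "parked",
    "city driving, windows closed",
    "highway, windows open",
    "cobblestone road",
]
speakers = ["male", "female"]

# Write one row per phrase/context/speaker combination to read out and record.
with open("recording_plan.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["domain", "phrase", "context", "speaker"])
    for (domain, domain_phrases), context, speaker in itertools.product(
        phrases.items(), contexts, speakers
    ):
        for phrase in domain_phrases:
            writer.writerow([domain, phrase, context, speaker])
```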

Equipped with our test and training set we evaluated different ASR (automatic speech recognition) and NLU (natural language understanding) systems, configurations and setups to get to the magic formula.
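A common yardstick for this kind of comparison is word error rate (WER): the number of word substitutions, insertions and deletions between the reference transcript and what the recognizer produced, divided by the length of the reference. A minimal sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Standard Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Averaging WER over the whole test set, per ASR candidate and per context
# (highway, cobblestone, windows open, ...), shows which setup holds up best.
print(word_error_rate("take me home", "take me phone"))  # ~0.33
```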

One finding, for example, was that a certain kind of noise reduction can significantly deteriorate the accuracy of the language processing, so we made sure to include a range of filtering and noise cancellation solutions in the hardware design, to be able to continuously optimize the speech recognition even once devices are already in the market.
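One way to think about that is to treat the audio front end as a configurable pipeline rather than a fixed block, so individual stages can be switched or retuned after devices have shipped. A hypothetical sketch of such a remotely tunable configuration (the stage names and parameters are made up, not our actual DSP setup):

```python
from dataclasses import dataclass

@dataclass
class AudioFrontEndConfig:
    """Hypothetical audio pre-processing config that could be updated over the air."""
    high_pass_cutoff_hz: int = 100       # remove low-frequency road rumble
    enable_echo_cancellation: bool = True
    noise_suppression_db: int = 0        # 0 = off; aggressive suppression can hurt ASR

def apply_remote_update(config: AudioFrontEndConfig, update: dict) -> AudioFrontEndConfig:
    """Apply a server-side tuning update, ignoring unknown keys."""
    for key, value in update.items():
        if hasattr(config, key):
            setattr(config, key, value)
    return config

# Example: after finding that aggressive suppression hurts accuracy,
# dial it back on devices already in the field.
config = apply_remote_update(AudioFrontEndConfig(), {"noise_suppression_db": 6})
```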

“See the whole” as early as possible

To “see the whole” is one of the 7 principles of lean software development, and a critical one in our view. Our product involves software on smartphones, cloud platforms, a variety of SDKs, speech recognition, embedded device software and the communication protocols between them. And of course you have app developers, server and full-stack developers, embedded software developers, NLP/AI developers and data scientists. In such a setup it’s easy to end up with a bit of a mess once the pieces come together.

That’s why we pushed to bring all systems (app, NLP, embedded) together as early as possible, even if the resulting prototype still has very limited functionality (very, very limited!) and most of it is still stubbed and fake. Simple things like “oh, I need a debugging window” only occur to you once you bring everything together into one piece. It became one of the first features of our prototype app: a simple way to see what the ASR heard, what the NLU understood, how long it all took, and so on.

A great way for our AI team to start banging on accuracy and speed for months to come!
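To give a flavour of what that debugging view exposes, here is a minimal sketch of the kind of trace it can produce: the raw text the ASR heard, the intent the NLU mapped it to, and how long each stage took. The function and field names are illustrative assumptions, not our actual interfaces.

```python
import time
from dataclasses import dataclass

@dataclass
class PipelineTrace:
    asr_text: str    # what the ASR heard
    intent: str      # what the NLU understood
    asr_ms: float    # time spent in speech recognition
    nlu_ms: float    # time spent in language understanding

def traced_request(audio, run_asr, run_nlu) -> PipelineTrace:
    """Run one voice request through ASR and NLU, timing each stage.

    run_asr and run_nlu stand in for the real (stubbed or live)
    speech recognition and language understanding calls.
    """
    t0 = time.monotonic()
    text = run_asr(audio)
    t1 = time.monotonic()
    intent = run_nlu(text)
    t2 = time.monotonic()
    return PipelineTrace(
        asr_text=text,
        intent=intent,
        asr_ms=(t1 - t0) * 1000,
        nlu_ms=(t2 - t1) * 1000,
    )

# Each trace can be shown in the app's debug view and logged, giving a
# running record of recognition accuracy and end-to-end latency.
```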

Like how we work? Check our job opening.

Follow us on Facebook and Twitter | check our website

Patrick Weissert is founder & chief product officer at German Autolabs. He is passionate about going on weekend trips and has been waiting for years for someone to build a good voice assistant for cars. Now he is building it himself.


Chris is the world’s first digital assistant for drivers, making in-car access to apps and services safer and more convenient. www.chris.com.