A couple of years ago, we decided to start investing in Machine Learning. Around the same time, we also adopted Agile as our approach to software development across all of our business units.
The first public product that arose from our Machine Learning investments was Kontxt, which we announced a few months back. Kontxt leverages Natural Language Processing (NLP) to classify mobile text messages at scale, which enables specific policies such as spam blocking and value pricing.
Another development from our Machine Learning investment, made public a few weeks ago, is a facial and object recognition platform, RealNetworks computer vision. RealNetworks has a long history of leadership in image processing and compression. We’ve evolved our algorithmic expertise to include video computer vision. While we are still working on additional use cases, we’ve made significant strides in accuracy and performance, specifically for face recognition.
Our accuracy and performance were recently measured in two benchmarks: the Labeled Faces in the Wild (LFW) test and the National Institute of Standards and Technology (NIST) test.
First, we scored very high in the Labeled Faces in the Wild test. LFW is collected and maintained by the University of Massachusetts. The test dataset includes more than 13,000 face images, and 1,680 of the people pictured appear in two or more distinct photos. The following table compares our score with those of a selection of other companies:
The LFW score of 99.8% puts RealNetworks computer vision in the top tier worldwide, ahead of companies that have invested heavily in computer vision with record-high funding rounds, such as Baidu, Megvii Face++, and SenseTime.
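For readers unfamiliar with how a score like this is derived: LFW results are usually reported as pair-verification accuracy, meaning the fraction of image pairs for which a system correctly decides "same person" or "different person". The snippet below is a minimal sketch of that arithmetic, not our production code. The toy embeddings, the cosine-similarity comparison, the 0.8 threshold, and the function names are all assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verification_accuracy(pairs, threshold):
    """Fraction of pairs whose same/different decision matches the label.

    pairs: list of (embedding_a, embedding_b, same_person) tuples.
    threshold: hypothetical similarity cut-off for declaring a match.
    """
    correct = 0
    for emb_a, emb_b, same_person in pairs:
        predicted_same = cosine_similarity(emb_a, emb_b) >= threshold
        if predicted_same == same_person:
            correct += 1
    return correct / len(pairs)


# Toy 2-D "embeddings" (hypothetical; real face embeddings have far more dimensions).
toy_pairs = [
    ([1.0, 0.0], [0.9, 0.1], True),   # same person, very similar vectors
    ([0.0, 1.0], [0.1, 0.9], True),   # same person, very similar vectors
    ([1.0, 0.0], [0.0, 1.0], False),  # different people, orthogonal vectors
    ([1.0, 0.0], [0.6, 0.8], True),   # same person, but a hard pair that falls below the threshold
]

accuracy = verification_accuracy(toy_pairs, threshold=0.8)
print(f"accuracy = {accuracy:.2%}")
```

In this toy set, three of the four pairs are decided correctly. A score of 99.8% means only about 2 in every 1,000 pairs are decided incorrectly.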
A more recent and more reputable score, released just last week, comes from the Face Recognition Vendor Test (FRVT), a benchmark organized by the National Institute of Standards and Technology (NIST). NIST’s evaluations serve as an official guideline for U.S. government purchases, and the test results are the gold standard for the global security industry.
The FRVT test datasets consist of real-world scenarios provided by the U.S. Department of Homeland Security. For example, they contain a massive number of photographs taken by border control agencies, as well as numerous images from criminal investigations. Compared to other popular tests conducted by academia, such as LFW, FRVT is more realistic and much larger in scale. The test dataset is not disclosed. Instead, vendors are required to submit their code in the form of a C library. NIST executes the test and publishes a rich set of results.
High-level summary of the NIST results
RealNetworks computer vision ranked 6th out of 47 participants in the wild faces category. The four Russian entities and one Israeli company ahead of RealNetworks have tuned their algorithms over five submissions. NIST’s published ranking was an extremely positive result for RealNetworks because this was our first submission, and we were being compared to vendors who have had multiple submissions to refine their algorithms.
Our work on computer vision has concentrated on the hardest category, ‘wild’ face recognition, since that’s a cornerstone metric for the use cases we’re enabling. Most of the top facial recognition companies in the world are significantly behind us in wild faces, requiring more than 4X the processing time, which indicates that their solutions are not technically scalable and hence not feasible. In most cases, they’ve built their systems around the legacy of an older algorithm.
Here is a link to the NIST results
The test results for facial recognition are a great backdrop to the key lessons we learned while developing RealNetworks computer vision.
1. Leverage the right platform: The largest tech companies in the world have made their own tools available for machine learning. You are doing yourself a big disservice if you are not using those tools.
2. Invest in the team: You need a multi-faceted Agile team that knows how to apply contemporary machine learning research to your problem domain.
3. Data, data, data and real data: A few years ago we released a mass-market consumer product and service called RealTimes. In RealTimes, we applied our expertise in photo and video processing to automatically generate intelligent real-time slide shows. The algorithm we developed proficiently scanned millions of photo and video libraries for specific people, settings, and behaviors. As a consequence of our investment, we were able to construct, optimize, and refine a recognition algorithm for faces and objects. RealNetworks computer vision uses this algorithm, and hence indirectly leverages the huge volume of photos and videos coming from RealTimes. The libraries from RealTimes represent the perfect training set for face recognition. Because RealTimes is a broad consumer-oriented application, the same face can appear hundreds of times under different lighting conditions, occlusions, and angles.
4. Right training: With lots of data you need to invest in building the right tools to train the deep neural network. Adopt scrupulous data hygiene in your training. We saw an improvement in our Key Performance Indicators (KPIs) as soon as we were able to deploy new training tools that allowed the team to increase productivity and the quality of the training.
5. Cost-efficient configuration: Leverage edge computing and reduce transmission and cloud costs. This is key to creating a highly scalable solution that can rapidly grow without killing the real-world economics. A good overview is available here.
6. Measure your progress, or lack thereof: With every sprint, we internally share a report of the progress we are making on a bevy of KPIs. These mainly cover accuracy, but don’t ignore speed and economic metrics. Be honest and report results objectively. Measure yourself as your customer would. Set goals for yourself before each sprint.
7. Get scrappy: One of the challenges we encountered was getting smarter about video cameras and how they behave in terms of face detection, recognition, and so on. Distance and angle were hard problems to solve. Instead of floundering with a massive testing challenge, we procured the most common cameras, created a lab in our office, and started experimenting. We quickly became very smart about choosing the type of camera to test by use case and location layout. We wouldn’t have scaled without building an achievable test plan.
8. Find a specific vertical to start where you can differentiate your product from the competition: The giants in the industry (AWS, Microsoft, and Google) offer a very horizontal platform, at least for now. Trying to compete with them is a lost cause. Going after some specific use cases in one or more verticals where there’s a need to solve industry-specific problems is the way to go.
9. Get feedback: As soon as we had something to try out, we created a test environment and started distributing the solution, first to employees and then to a few friendly partners. We got a productive flow of thoughtful and valuable feedback. We discovered shortcomings. For example, we had some issues with recognizing women under certain lighting conditions, and we failed to recognize smiling faces in other instances. All feedback was taken seriously and carefully documented, addressed, and retested. In addition, once the first version of the platform was ready, we initiated pilots where we lived side-by-side with our customer.
10. Be lucky! As our CEO says here: if you can choose between being good and being lucky, pick lucky!
In the next few weeks, we will announce the first use case for computer vision, so watch this space. And more importantly, wish us luck!