Deep Learning Data Pipeline Management: Building a Web Client to Optimize the Data Testing Process

A project by Seunghwan, software engineering intern at BinaryVR

hyprsense
Aug 30, 2018

Have you ever wondered how deep learning/machine learning data pipelines are managed in practical applications? To learn more than the theory, Seunghwan started an internship at BinaryVR and was assigned a project on data pipeline management. As he expected, the project was full of opportunities to handle enormous amounts of data and processing cycles with his own hands. In this article, you will learn about his project and the key challenges and concepts he encountered.

What Is This Project About?

The project centers on deep learning/machine learning data pipelines. Like other DL/ML-powered technologies, BinaryVR's facial landmark tracking technology (a.k.a. BinaryFace*) is developed through the following major stages: data feeding, data training, and data testing.

* BinaryFace is BinaryVR's real-time facial landmark tracking solution, running on mobile devices with an RGB camera. To learn more: www.binaryface.com

Seunghwan’s project: building a web client and its tools to manage the data pipeline for the evaluation process in DL/ML to improve the performance of BinaryFace

If you take a closer look at the 'data testing' stage, it breaks down into three smaller steps. What we do here is sort out failed cases and compensate for them by re-training the algorithm.

What Does the Web Client Do?

The purpose of this project is to build a web client that effectively manages the 'data testing' pipeline. The tools implemented in the web client act as assistants for evaluators, helping them sort out images with inaccurate tracking results. These images can be pre-annotated images or real-time images from an evaluator's camera. The sorted data are then collected, reprocessed, and sent back to the training stage, which helps our DL/ML model re-learn and correct the algorithm by reflecting on its mistakes.

One of our tools is the 'valid landmark checker,' which evaluators use to flag images whose tracking results are inaccurate.

Unfortunately, his project goal was not only about implementing each tool in the web client. By the nature of a web client, the more important factors were optimization and structuration, to reduce frontend waiting time for evaluators. Frankly, no matter how beautifully a website is built, we all know that no one can stand excessive loading times that never seem to end.

Challenges for Building the Web Client

To build a web client for effective data pipeline management, Seunghwan's key challenges can be summed up in two words: optimization and structuration. Here are the main concepts we will be dealing with.

  1. Optimization
    - Asynchronous processing
    - P2P direct connection
  2. Structuration
    - Dependency & Infrastructure Codification

Optimization: Asynchronous Processing

As our web client has to load and save an enormous number of images while reducing frontend waiting time as much as possible, Seunghwan optimized the servers in several ways. First, we integrated asynchronous processing to efficiently split and distribute workflow processing.

The key difference between synchronous and asynchronous processing lies in when the client receives a response after sending a request to the server. If the client has to wait through the processing time and then receives the result, we call that synchronous processing. If the client immediately receives a response confirming that the request was received, that is asynchronous processing. Asynchronous processing allows the client to keep working on other tasks while the server is dealing with the previous one.

Example: Assume that we have three tasks to finish: 1. pick up a burrito, 2. grab a latte, 3. take a walk. With synchronous processing, you would have to wait at the taco shop counter until the food is ready for you to pick up. While the chef and the barista are making your burrito and latte, you are wasting time waiting. With asynchronous processing, on the other hand, the restaurant lets you visit other places while your order is being prepared and texts you when your meal is ready. As you can probably tell, you are the client and the chef and barista are the servers. The same applies to the server side: the servers can work without waiting for clients to order, since orders can be queued up beforehand.
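In web terms, the difference looks roughly like the following TypeScript sketch. This is a minimal illustration rather than our actual code; the /process and /jobs endpoints are hypothetical stand-ins for an image-processing API.

```typescript
// Synchronous style: the client blocks for the whole processing time.
async function processSync(image: Blob): Promise<Blob> {
  const res = await fetch("/process", { method: "POST", body: image });
  return res.blob(); // the response only arrives once processing is done
}

// Asynchronous style: the server acknowledges immediately with a job ID;
// the client stays free and checks back (or gets notified) later.
async function processAsync(image: Blob): Promise<Blob> {
  const submitted = await fetch("/jobs", { method: "POST", body: image });
  const { jobId } = await submitted.json(); // instant "order received"

  for (;;) {
    const job = await (await fetch(`/jobs/${jobId}`)).json();
    if (job.state === "done") {
      return (await fetch(job.resultUrl)).blob();
    }
    // The browser's event loop stays free for other work between polls.
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
}
```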

At the macro level of the entire workflow, we designed our CPU and server operations to be asynchronous so that tasks are received without wasted time. Specifically, asynchronous processing is used for transferring tasks between clients and servers and for time-consuming image processing.
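The server side of the same pattern can be sketched with an in-memory queue. This is again a hypothetical illustration, not BinaryVR's actual implementation; Express stands in for whatever framework the real server uses.

```typescript
import express from "express";

type Job = { id: string; state: "queued" | "done"; resultUrl?: string };
const jobs = new Map<string, Job>();
const queue: string[] = [];

const app = express();
app.use(express.raw({ type: "image/*", limit: "20mb" }));

// Acknowledge immediately; the heavy work happens off the request path.
app.post("/jobs", (req, res) => {
  const id = Math.random().toString(36).slice(2);
  jobs.set(id, { id, state: "queued" });
  queue.push(id);
  res.status(202).json({ jobId: id });
});

// Let clients check the status of a previously submitted job.
app.get("/jobs/:id", (req, res) => {
  const job = jobs.get(req.params.id);
  job ? res.json(job) : res.sendStatus(404);
});

// Worker loop: drains the queue independently of incoming requests.
setInterval(() => {
  const id = queue.shift();
  if (id) {
    // ...time-consuming image processing would run here...
    jobs.set(id, { id, state: "done", resultUrl: `/results/${id}` });
  }
}, 100);

app.listen(3000);
```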

Optimization: P2P Direct Connection

Another challenging task was implementing a P2P direct connection to enable a real-time BinaryFace trial in a web browser. We wanted to connect the BinaryFace server to clients who want to test our tracker. As in a live video call, the client sends its video input, and our server tracks facial landmarks and sends the reprocessed video back. This way, the client receives the BinaryFace tracking result on top of its video stream.

Usually, there is a relay server in the middle to connect the two ends; Skype's video call service is also powered by this method. The downside is that the connection takes more time, and the relay server can be overloaded because all communication has to pass through it. This leads to slow processing times and additional server management costs.

We integrated WebRTC to implement the P2P direct connection and solve this problem. A P2P direct connection allows the two ends to communicate directly once the initial negotiation is done through a signaling server.
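A stripped-down version of that negotiation, as it might look in the browser, is below. The signaling server URL and message shapes are hypothetical; only the offer/answer and ICE candidates pass through the signaling server, after which the video flows peer to peer.

```typescript
const signaling = new WebSocket("wss://signaling.example.com"); // hypothetical
const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
});

// Send the local camera stream to the other end.
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
stream.getTracks().forEach((track) => pc.addTrack(track, stream));

// Exchange connectivity candidates through the signaling server.
pc.onicecandidate = (e) => {
  if (e.candidate) signaling.send(JSON.stringify({ candidate: e.candidate }));
};

// Render the processed video coming back from the peer.
pc.ontrack = (e) => {
  (document.querySelector("video") as HTMLVideoElement).srcObject = e.streams[0];
};

// Start the negotiation with an offer; apply the peer's answer when it arrives.
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
signaling.send(JSON.stringify({ sdp: pc.localDescription }));

signaling.onmessage = async (msg) => {
  const data = JSON.parse(msg.data);
  if (data.sdp) await pc.setRemoteDescription(data.sdp);
  if (data.candidate) await pc.addIceCandidate(data.candidate);
};
```

Once the descriptions and candidates are exchanged, media travels directly between the two ends and the signaling server drops out of the path.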

Unfortunately, due to browser-dependent video codec support, it was hard for us to send and receive videos. While the server we adopted from the open-source WebRTC project supports VP8 and VP9, Safari only supports H.264. We could have added H.264 support, but it would have taken too much time. Another fundamental hurdle was the significant delay that remained even after the integration, as the BinaryFace server is comparatively far from some users.

So we decided to shift direction and implement our DL/ML model directly in the browser using WebAssembly. WebAssembly allowed us to run BinaryFace at near-native speed on the client end, resulting in support for all browsers and faster performance.
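As a rough sketch of what the in-browser path looks like, consider the following. The module name binaryface.wasm and its exports are hypothetical stand-ins, since the real build and its interface are internal.

```typescript
// Load and instantiate the compiled model once, at startup.
const { instance } = await WebAssembly.instantiateStreaming(
  fetch("/binaryface.wasm"),
  {} // import object (memory, host functions) as the module requires
);

const wasm = instance.exports as {
  memory: WebAssembly.Memory;
  track_landmarks: (ptr: number, w: number, h: number) => number;
};

// Per frame: copy camera pixels into wasm linear memory and run the model
// locally, with no round trip to a remote server. (A real module would
// export an allocator rather than writing at offset 0.)
function trackFrame(frame: ImageData): number {
  const bytes = new Uint8Array(wasm.memory.buffer, 0, frame.data.length);
  bytes.set(frame.data);
  return wasm.track_landmarks(0, frame.width, frame.height);
}
```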

Structuration: Dependency & Infrastructure Codification

Let us move on to the structuration of the web client. To build the web client as intended, you need to understand the dependencies among the necessary infrastructure components and structure them without causing any interference.

While some might think our web client tools simply mark selected images, they are not as simple as they seem. The web client is composed of a complex structure of infrastructure components, such as a repository and security certification. These components are connected to each other by certain rules and relationships, which we call dependencies. A deep understanding of AWS architecture was required, as we utilize its beautifully pre-built infrastructure in developing the tools.

Seunghwan also used Terraform to codify the infrastructure for easier maintenance afterwards. Terraform allows users to define data-center infrastructure as code, so structuring is partially automated and simplified.

Example: If getting dressed is the process, then a top, pants, socks, and sneakers are the infrastructure components. Putting on sneakers cannot precede putting on socks, and if you wear jeans, you cannot also wear shorts. Those rules and relationships are each component's dependencies. Terraform's role is like a detector (or your parents) who infers that you should put on socks before sneakers. Terraform codifies the structure of the dressing process for you, as in the sketch below.
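In actual Terraform code (HCL), the same idea might look like the following hypothetical sketch. The resource names are made up; the point is that referencing the bucket from the policy is what tells Terraform to create the socks before the sneakers, with no manual ordering.

```hcl
resource "aws_s3_bucket" "annotations" {
  bucket = "example-annotation-store" # hypothetical bucket name
}

resource "aws_s3_bucket_policy" "annotations_read" {
  bucket = aws_s3_bucket.annotations.id # this reference is the dependency
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = "*"
      Action    = "s3:GetObject"
      Resource  = "${aws_s3_bucket.annotations.arn}/*"
    }]
  })
}
```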

Overall Review

Now our web client for deep learning pipeline management is ready! We covered how we manage the 'data testing' stage of the pipeline, which tools we built, and how we optimized and structured our web client. Thanks for sharing your intern project and experience, Seunghwan!

Seunghwan’s Comment:

The project reinforced my understanding of web client structuration, optimization, and infrastructure maintenance. Interning at BinaryVR offered me the chance to learn new technologies that cannot be gained from studying alone. It feels great to know that my code is running as a direct part of the product development process. Topics mentioned above, such as WebRTC and Terraform, are hard to get to know unless you have unique opportunities. I could not thank BinaryVR more for their considerate guidance and support.

Explore open positions: https://angel.co/binaryvr/jobs
Send your resume for the internship: contact@binaryvr.com
Learn about working at BinaryVR: What Made Engineers from Tech Giants Gather at a Small AI Startup?

We are BinaryVR, aiming for seamless interaction between AI and people's daily lives in the computer vision field. We develop the world's top-quality facial motion capture solutions, HyprFace and BinaryFace, keeping constant evolution as our core value.
