Building and Improving ML Products in a Rapidly Changing Environment

Henry Zhang
Workday Technology
Oct 10, 2022

An example of how we build and scale ML products to continuously deliver customer value in a world of constant change

Introduction

Machine learning has begun to revolutionize enterprise software over the last decade. At Workday, our mission is to provide top-of-the-line machine learning applications to our customers. One such application is the Document Intelligence product, which aims to understand business documents with maximum automation and accuracy.

Business documents vary widely in type, and the product needs to work on each customer's unique data. Traditional approaches apply a probabilistic model over many handmade templates for each document type and its variations (e.g., invoices in different formats). Many techniques also require a heavy human-in-the-loop presence, which compromises customer data security.

ML products are challenging because they add model-related risks, such as accuracy and scalability. On top of that, customers are not accustomed to receiving uncertain results, and in mass-automation use cases such as reading business documents, accurate prediction is critical to the product's success.

A few years ago, at the product's inception, the Workday ML team started using deep learning approaches to solve the document intelligence problem. To limit scope for a minimum viable product (MVP), we began with a simple document type: receipts. In the receipt problem, we aim to extract critical information such as the total amount, date, and merchant name.

Receipts Automation

The receipt automation process

We solved a combination of optical character recognition and natural language processing problems by applying the deep learning techniques that were state of the art for document understanding at the time. Our general approach included a customized You Only Look Once (YOLO) model for field-of-interest extraction (document to patches of text in pixels) and a Convolutional Neural Network (CNN) based model called DenseNet for text recognition (pixels to ASCII). Finally, a combination of regex and a sequence-to-sequence model maps the text to each respective field. Over the product's adoption period, the team made numerous updates to the model architectures and techniques.
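The three-stage flow described above can be sketched roughly as follows. This is an illustrative outline only, with the detection and recognition models stubbed out; function names and the regex patterns are assumptions, not Workday's actual code.

```python
import re

def detect_fields(document):
    """Stage 1 (YOLO-style detector): locate field-of-interest patches.
    Stubbed: returns the whole document as a single candidate patch."""
    return [{"patch": document, "label": "candidate"}]

def recognize_text(patch):
    """Stage 2 (DenseNet-style OCR): pixels -> ASCII text.
    Stubbed: assumes the patch already contains its text."""
    return patch

def text_to_fields(text):
    """Stage 3: regexes (with a seq2seq fallback in the real system)
    map raw text to structured receipt fields."""
    fields = {}
    total = re.search(r"total[:\s]*\$?(\d+\.\d{2})", text, re.I)
    if total:
        fields["total_amount"] = float(total.group(1))
    date = re.search(r"(\d{4}-\d{2}-\d{2})", text)
    if date:
        fields["date"] = date.group(1)
    return fields

def extract_receipt(document):
    """End-to-end pipeline: detect patches, recognize text, map to fields."""
    patches = detect_fields(document)
    text = " ".join(recognize_text(p["patch"]) for p in patches)
    return text_to_fields(text)
```

In the real system each stage is a trained model rather than a stub, but the structure is the same: each stage's output is the next stage's input, so stages can be improved or replaced independently.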

Field-of-interest Extractions:

Text Recognition:

Text to Field:

High accuracy was achieved across all fields, but it required the sequence-to-sequence model to be trained separately on each field. Each field therefore needs sufficient data and its own training process, a severe limitation given that some entities are not limited to the scope of the customer's data. While we continued to improve, we began expanding into additional document types such as invoices.

Invoice Automation

The general logic flow for Invoice Automation:

After applying the same approach to the invoice problem, three major challenges arose:

  1. Scalability and Maintenance — Higher volume of documents and many models/regex to maintain.
  2. More diverse fields — More fields need to be extracted which requires expansion of models and the difficult-to-maintain regexes.
  3. New customer requirements — Customers require certain fields not only to be extracted, but also matched one-to-one to an entity within Workday. This is a very challenging problem, as each customer can name their entities with absolute freedom; hence it requires a sophisticated entity resolution model.

At this point, the product consisted of the following steps:

These problems could be tackled and resolved individually, but another factor was emerging: competition. As the industry advanced and adopted more machine learning, many cloud platform companies started to offer services that handle the more fundamental levels of machine learning, such as optical character recognition (OCR), and even basic information retrieval from documents. How do we remain competitive, continue to innovate, and deliver customer value?

Integration with Public Cloud

Third-party services can free the team to focus on the additional customer requirements that differentiate the product from competitors, rather than constantly maintaining models and services that do not differentiate; we call the latter undifferentiated heavy lifting. On top of that, public cloud providers often have larger and more global datasets, such as restaurant and map data, which have the potential to increase the accuracy of the end result.

Using external services does have trade-offs: potential issues with service quality, the need to communicate with external teams, less control over timelines and priorities, and security requirements. We evaluated each of these aspects for every service provider.

After a thorough review and comparison, which included accuracy testing, performance testing, and negotiations, the general architecture of the product became the following:

Third-party services can effectively handle field-of-interest recognition, text recognition, and some simple entity recognition at scale. Using such services, we partially solved the scalability problem and reduced maintenance spent on undifferentiated heavy lifting. This allowed the team to focus on new customer requirements and beyond, which greatly accelerated the product timeline and ultimately led to higher customer satisfaction and a more competitive product.

The Potential Long Term Drawback

In the longer term, there are risks as the product becomes increasingly reliant on an external service provider: pricing changes, shifts in priorities and timelines, and failure to meet expectations. Hence it is important to remember that using external providers doesn't excuse the team from maintaining the ML technical capability and aptitude for these functions. The team should also stay up to date on alternative options for the functionality in order to adapt to a changing environment.

Embedding based Entity Resolution

We resolved the entity resolution/matching problem with a more recent, high-performing deep learning embedding approach. This method uses a custom-trained BERT-based model that projects the query string and entity string into the same embedding space. Then, either a model-based matching neural network or cosine similarity can be used to find the closest match. More details will be provided in a follow-up article.
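To make the cosine-similarity variant concrete, here is a minimal sketch of matching a query against candidate entities, assuming the strings have already been projected into a shared embedding space. The toy vectors below stand in for BERT outputs; all names and values are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def resolve_entity(query_vec, entity_vecs):
    """Return the entity whose embedding is closest to the query's."""
    best_name, best_score = None, -1.0
    for name, vec in entity_vecs.items():
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

# Toy embeddings standing in for BERT-projected entity strings
entities = {
    "Acme Corp":  [0.9, 0.1, 0.0],
    "Globex Inc": [0.1, 0.9, 0.2],
}
# e.g. the embedding of the query string "ACME corporation"
query = [0.85, 0.15, 0.05]
```

Because matching happens in embedding space rather than on raw strings, customers' freely chosen entity names ("ACME corporation" vs. "Acme Corp") can still resolve to the correct Workday entity.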

Below is a diagram of how Supplier entity recognition is done.

Training and Embedding Generation

To reinforce the reliability and availability of the above in production, Airflow jobs are deployed to train and generate embeddings for each tenant. The overall production flow ended up looking like this:
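The per-tenant job logic can be sketched as below. This is a simplified illustration of what each scheduled Airflow task might do; the function names, the failure-isolation strategy, and the in-memory storage are assumptions, not Workday's actual implementation.

```python
def generate_tenant_embeddings(entity_names, embed_fn):
    """Embed every entity string for one tenant, producing the lookup
    table that the matching service loads at inference time."""
    return {name: embed_fn(name) for name in entity_names}

def run_all_tenants(tenants, embed_fn):
    """Process tenants independently so one failure does not block the
    rest, mirroring one scheduled job run per tenant."""
    results, failed = {}, []
    for tenant_id, entity_names in tenants.items():
        try:
            results[tenant_id] = generate_tenant_embeddings(entity_names, embed_fn)
        except Exception:
            failed.append(tenant_id)
    return results, failed
```

Keeping each tenant's run independent also preserves tenant isolation: one tenant's entity data never mixes into another tenant's embedding table.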

Details on the entity recognition/matching models and training process will be covered in a future blog post. For now we would like to summarize our learnings.

Learnings and Summary

  1. Being agile and swiftly adapting to the pace of technology is critical to success.
  2. External service providers are helpful for filling out the areas of a product that do not differentiate it from competitors.
  3. In a rapidly evolving area such as machine learning, more advanced approaches can significantly boost accuracy and/or reduce maintenance costs.
  4. Even though many models and approaches are now available off the shelf, having the in-house capability to adapt state-of-the-art models without relying heavily on external development is important to stay competitive. This should influence roadmaps and hiring decisions for machine learning teams that aim to continuously innovate and deliver value for customers.
