Convolutional Neural Nets for Rapid Recognition of Mortgage Docs

Jon Tang
Snapdocs Product & Engineering Blog
Sep 21, 2018

At Snapdocs, we help mortgage professionals close over 50,000 mortgages a month all across the U.S. The vast majority of these mortgage closings go smoothly, but sometimes problems come up at the signing table. From our analysis, we found that a large percentage of signing-table disputes happen because of confusion over the loan terms. For example, the cash-to-close or interest rates are not what the borrower expects. At Snapdocs, we want every closing to be perfect and error-free, so we started looking into ways to prevent this problem.

In every closing package (full set of mortgage documents), one of the most important pieces of paperwork is the Closing Disclosure, a five page form that provides the final details about the mortgage loan. A sample of the first page of a Closing Disclosure is shown below.

A sample page 1 of the 5-page Closing Disclosure document. Source: https://www.consumerfinance.gov/owning-a-home/closing-disclosure/ .

As you can see from this sample page, the Closing Disclosure holds a lot of good summary points about the mortgage. However, this form is buried somewhere within the closing package, which is often a 200+ page PDF document. We imagined how quickly identifying the Closing Disclosure page and displaying this information would resolve some of these signing-table disputes.

Challenge 1: Data Pre-processing Speed

We first attempted to build a simple text-based machine learning model to identify the first, second, and fifth pages of the Closing Disclosures, which contain the most relevant data. A text-based classifier seemed like an obvious choice because these pages of the Closing Disclosure contain very distinct keywords.

We quickly realized, however, that the required pre-processing steps for generating the data in production took too long. You first have to convert all pages of a closing document from PDF into high-resolution images. Then the images containing text need to be converted into machine-encoded text using Optical Character Recognition (OCR). The entire process could take several minutes for a 200+ page document, which is longer than we’d like to power some features that are time-sensitive. Additionally, several minutes per package wouldn’t be scalable for our current volume.

Another downside is that many of our closing documents are scanned documents, rather than electronically generated, and therefore are often marred by ink spots and visual noise. Performing OCR on these scans often results in unreliable text for accurate classification.

We stumbled upon an interesting idea after skimming through preview images of closing package pages. As you can see below, Closing Disclosure pages use distinctly formatted tables that make them visually distinguishable from non-Closing Disclosure pages. Very small, low-resolution images are sufficient for a human to categorize them quickly at a glance.

Examples of low resolution images of first, second, and fifth pages of the Closing Disclosure, and other non-Closing Disclosure pages.

Our Approach and Results
This motivated us to try using an image-based approach. One advantage is that the only required pre-processing step is the conversion of the PDF into JPEGs. Additionally, we don’t need to generate the images at the high level of resolution (300 dpi recommended) required for accurate OCR. We found that 70 dpi resolution images were good enough, reducing the total pre-processing time from 2 seconds to 20 milliseconds per page.

That’s a 100-fold drop in total pre-processing time!

A comparison of the data pre-processing steps for a text-based classifier versus an image-based classifier.
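The resolution savings compound because pixel count grows with the square of the dpi. A quick back-of-the-envelope calculation makes this concrete (the dpi figures come from the text above; the US Letter page size of 8.5 × 11 inches is our assumption for illustration):

```python
# Pixel counts for a US Letter page (8.5 x 11 inches) rendered at
# OCR-grade resolution (300 dpi) versus the 70 dpi we found sufficient.
# The page dimensions are an assumption for illustration.

def page_pixels(dpi, width_in=8.5, height_in=11.0):
    """Total pixels when rendering the page at the given dots per inch."""
    return round(width_in * dpi) * round(height_in * dpi)

px_300 = page_pixels(300)  # 2550 x 3300 = 8,415,000 pixels
px_70 = page_pixels(70)    # 595 x 770  =   458,150 pixels

print(px_300 // px_70)  # roughly 18x fewer pixels to render per page
```

On top of skipping OCR entirely, each page image carries roughly 18x fewer pixels to generate and process.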

Challenge 2: Lack of Labeled Data

When we started this project, we had identified fewer than 300 Closing Disclosures across our entire database of closing packages. We knew that creating a reliable classifier from such a small dataset would be extremely challenging.

Our Approach and Results

Rather than go through our collection of closing packages and manually label more Closing Disclosure pages, we thought this problem could benefit from a technique called data augmentation. Data augmentation takes your existing dataset and makes minor alterations to create additional labeled data. In our case, as shown below, we used each page in our original dataset as a template to create additional example images containing the slight rotations and translations we often find in scanned images of closing pages. Because pages are often scanned in upside-down, we also included examples with full 180° rotations.

We augmented our data by using each page in our original dataset (in yellow) as a template to create additional example images containing the slight rotations and translations (in blue) we often find in scanned images of closing pages.

Using this technique, we were able to create a dataset of over 250,000 images of closing package pages, more than 10,000 of which were the first, second, or last pages of Closing Disclosures. Additionally, this gave our dataset broader coverage of the types of scanned mortgage documents we see in the real world.
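A minimal sketch of this style of augmentation, using SciPy for the geometric transforms. The specific angle and shift values here are illustrative placeholders, not the parameters we used:

```python
import numpy as np
from scipy import ndimage

def augment_page(page, angle_deg=2.0, shift_px=(3, -2)):
    """Return augmented variants of a 2-D grayscale page image:
    a slight rotation, a slight translation, and a full 180-degree
    rotation (for pages scanned in upside-down)."""
    rotated = ndimage.rotate(page, angle_deg, reshape=False, mode="nearest")
    shifted = ndimage.shift(page, shift_px, mode="nearest")
    flipped = np.rot90(page, 2)  # 180-degree rotation
    return [rotated, shifted, flipped]

# Tiny synthetic "page" just to show the image shape is preserved.
page = np.zeros((110, 85), dtype=np.float32)
variants = augment_page(page)
print([v.shape for v in variants])  # [(110, 85), (110, 85), (110, 85)]
```

Each original page yields several labeled variants, which is how a few hundred hand-labeled documents can fan out into thousands of training examples.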

Machine Learning Model

We tested several different machine learning models and hyperparameter sets in TensorFlow until we settled on the following convolutional neural network architecture. In production, the model has a micro-average F1 score of 99% and takes a couple of seconds on average to identify the Closing Disclosure pages in an entire closing document.
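For reference, micro-averaged F1 pools true positives, false positives, and false negatives across all classes before computing precision and recall (for single-label multiclass predictions it reduces to accuracy). A small illustration with hypothetical page labels:

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label multiclass predictions:
    counts are pooled across classes before computing precision/recall."""
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    fp = len(y_pred) - tp  # every wrong prediction is a FP for one class...
    fn = len(y_true) - tp  # ...and a FN for the true class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: CD pages 1/2/5 vs. everything else.
y_true = ["cd1", "cd2", "cd5", "other", "other", "cd1"]
y_pred = ["cd1", "cd2", "other", "other", "other", "cd1"]
print(round(micro_f1(y_true, y_pred), 3))  # 0.833
```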

Our final convolutional neural network model architecture after testing other machine learning algorithms and hyperparameter tuning.
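We can't reproduce the architecture diagram here, but a small convolutional network over low-resolution grayscale page images could be sketched in tf.keras as follows. The layer sizes, filter counts, and four-class label scheme are illustrative assumptions, not our production architecture:

```python
import tensorflow as tf

# Assumed label scheme: CD page 1, CD page 2, CD page 5, and "other".
NUM_CLASSES = 4

# Illustrative CNN over 70 dpi grayscale Letter pages (~770 x 595 pixels).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(770, 595, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
print(model.output_shape)  # (None, 4)
```

Global average pooling keeps the parameter count modest despite the large input, which matters when the model must score every page of a 200+ page package in a couple of seconds.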

Summary

While a text-based classifier seems like the most obvious choice for a document classification problem, our unique constraints challenged us to find a more radical approach. In our case, where we required near-instantaneous document classification, we chose an image-based classifier because its data pre-processing steps were more than 100x faster in production.

Machine learning, computer vision, and automation are continuing to play a large role at Snapdocs as we strive towards our company goal of bringing greater efficiency, accuracy, and joy to mortgage closings. Aside from this Closing Disclosure classification model, we have developed other models to help support fast hybrid signings. For example, we use machine learning models to quickly and accurately identify pages from mortgage documents for e-signing, wet-signing, or previewing. We use computer vision to automate the task of identifying signature lines on pages that require digital signatures from the consumer. In the future, we also anticipate features such as creating a ‘Table of Contents’ for consumers to see each of their docs page-by-page. This will also allow lenders and settlement agents to swap out individual documents, instead of entire packages when there’s an error.

Come Join Our Team. We’re Hiring!

Find our work interesting? Want to help us tackle the massive mortgage closing space? We’re hiring. Come join our team!
