Picture by Fabian Grohs on Unsplash.com

Preview content of invoices with the Sypht SDK: a machine learning and OCR on-demand solution for documents

As we move towards a paperless world, I find an increasing number of invoices come to me through email. This works out great for me, as I can just archive them in my email or online in cloud storage. This means all my invoices are in one place, ready for tax time. I don’t have to worry about scanning a piece of paper, then archiving the piece of paper or destroying it safely.

There is however a minor drawback; one that bothers me endlessly. Why can’t I just see how much I have to pay without: downloading an attachment; leaving my email; pinch zooming a PDF; blah blah blah. There are two main problems I would like to solve.

  1. I just want to be able to see the details of the invoice and not have to leave my email.
  2. I want to be able to store these details ready for tax time, or one-click pay them somehow. In short, I want to be able to act upon these details.

This led me to investigate whether this was possible. My first stumbling point came when I had to extract content from PDF or image files.

Why is extracting content from PDF or images so hard?

In the worst-case scenario, a PDF is no better than an image. There is no text that can be parsed without passing the file through OCR. Even if you manage to do this with one of the few reliable and free APIs on the internet, there is a lot of “text-noise” in the results. How do you determine which field is the amount due, the due date, the payment details and so on?

This seemed like a perfect fit for machine learning! However, even if I use all the invoices I have amassed to date, this is not enough to train a machine learning algorithm reliably. Not to forget the not-so-minor issue of me not knowing a thing about machine learning. I could arguably use all my invoices and train a machine learning algorithm for my specific needs; however, this would break when the format of an invoice changes, and it would be a while before I amassed enough new documents to train the algorithm to be reliable again.

Looking for solutions

I initially went down the route of using Google APIs to OCR all attachments received in my email and using regular expressions to parse for the data I wanted. I very quickly realised this was impractical, because it required a custom parser for each document I wanted to parse, or at least slight tweaks to existing parsers. This meant that I had to have knowledge of the document I was parsing. When automating end-to-end, it also meant that every slight change the sender made to an invoice (like changing the attachment name, or the text position in the attachment) could break my parser and return unwanted results. Another problem was that I had no way to measure “confidence” in the result.
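To make the brittleness concrete, here is a minimal sketch of the kind of regex parser I mean. It is not the code I actually wrote; the function name, the “Amount due:” label and the sample invoice text are all made up for illustration.

```kotlin
// Hypothetical parser: pulls an "amount due" figure out of raw OCR text.
// The label and number format are assumptions about ONE sender's layout.
val amountDuePattern = Regex("""Amount due:\s*\$?([0-9][0-9,]*\.[0-9]{2})""", RegexOption.IGNORE_CASE)

// Returns the matched amount as text, or null when the layout is not recognised.
fun parseAmountDue(ocrText: String): String? =
    amountDuePattern.find(ocrText)?.groupValues?.get(1)

fun main() {
    // Made-up OCR output from the same sender, before and after a redesign.
    val original = "Invoice 1042\nAmount due: \$1,250.00\nDue date: 12 March"
    val redesigned = "Invoice 1043\nTotal payable \$1,310.00\nPay by 12 April"

    println(parseAmountDue(original))   // 1,250.00
    println(parseAmountDue(redesigned)) // null
}
```

The second invoice still contains the amount, but the label changed, so the parser silently returns nothing. There is also no score to tell a good match from a lucky one.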

Side note: I’m not going to talk about how I managed to get content from my email in this article. If this is something you’re interested in, have a look at my GitHub repo (linked below) or leave a comment asking for an in-depth solution.

I very quickly realised that this approach was making my code exponentially more complex and frankly, impractical.

Enter Sypht

A friend of mine put me on to Sypht. Sypht is a service (free for personal use, as far as I can tell) that uses machine learning to convert your unstructured documents (images or PDF) to structured data. Basically, if you upload an image, you can request Sypht to parse it as (among other things) an invoice, bill, or as a general document.

You can use Sypht through their web app; however, on the Sypht GitHub account, they also have client SDKs for most common programming languages.

Since my plan was to create an Android app that could download my emails with attachments and parse the invoices in them, I decided to use the Sypht Kotlin client. As we will see, the Sypht Kotlin client works great with Android; however, its functionality is not limited to Android and it can be used with all your Kotlin projects.

Getting started with Sypht

Using the Sypht web app

The first step is to see if Sypht works! In order to do this, I had to create a Sypht account. You can do this by heading over to sypht.com and hitting “Register” in the top right corner, or going to their “Create an account” page.

The signup process is rather straightforward. When you finish registration and log in again, you will be prompted with KYC (know your customer) type questions. There are 2 simple pages of these, which take less than 20 seconds to complete. At the end, you should be given an option to go to the developer docs or to the web app.

Select “App — for most users” to go to the Sypht console

Since I was still trying to suss out the service, I decided to go to the app and try uploading a few invoices. By default you should have an invoice there as an example.

The upload process is rather straightforward.

Click the “Upload” button, select or drag and drop your file.

Upload in progress
Upload successful

Once the file is uploaded, select the file and click the “Extract” button.

Specify what kind of document this is. In my case, it’s a “sypht.invoice”; however, I selected “sypht.generic” as well for good measure.

After a few seconds, I saw a prompt on the screen indicating that my file had been parsed successfully.

I hit refresh and blue “I” and “G” icons appeared next to the uploaded PDF, indicating that the document had been parsed. This took less than 10 seconds in my case; however, your mileage may vary depending on how many categories you choose for extraction and how big your document is.

Clicking on the file now will show you a preview of your invoice and what content has been found on it.

So, I’ve concluded that the service does what I want. I can send it my bills and invoices and it can extract the payment details.

Using Sypht SDK to extract fields from an invoice

Now, the web app does what I want. However, before I build a whole app that extracts attachments from my email and sends them to Sypht, I would like to validate that the Sypht SDK works as expected. In order to do this, I started a new Android app and used the Sypht Kotlin SDK with it to send a file to Sypht and see the response. The premise of this project is simple: send a hardcoded file to Sypht and check the response. Looking through their GitHub page, I can see that we can provide either a File or an InputStream. Knowing Android, I will probably have to test uploading a file using an InputStream.

To get the credentials the SDK needs, go to your Sypht account’s “Company” section.

Once there, select “Credentials” from the left hand side.

You will see several “Client ID” and “Client Secret” pairs on this page. The “Field set” mentioned above each ID and Secret indicates what uploaded documents will be parsed as by default. You can use any ID-secret pair you like in your app, since the client SDK lets you request more or fewer fields.

So, using these credentials, I wrote a quick test application and used its unit test framework to check the output of the SDK.
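The test class below reads the credentials from BuildConfig. One way to get them there without committing them to source control is sketched here for a module-level build.gradle.kts. This is only an assumption about the setup, not necessarily how my project does it: the local.properties keys sypht.clientId and sypht.clientSecret are names I made up for this example.

```kotlin
// build.gradle.kts (app module) - a sketch, not a complete build file.
import java.util.Properties

// Load untracked values from local.properties, if the file exists.
val localProps = Properties().apply {
    val propsFile = rootProject.file("local.properties")
    if (propsFile.exists()) propsFile.inputStream().use { load(it) }
}

android {
    // Recent Android Gradle Plugin versions need BuildConfig generation enabled explicitly.
    buildFeatures { buildConfig = true }

    defaultConfig {
        // The escaped quotes are required: the value is pasted into generated Java source.
        // "sypht.clientId" and "sypht.clientSecret" are hypothetical property names.
        buildConfigField("String", "SYPHT_CLIENT_ID", "\"${localProps.getProperty("sypht.clientId", "")}\"")
        buildConfigField("String", "SYPHT_CLIENT_SECRET", "\"${localProps.getProperty("sypht.clientSecret", "")}\"")
    }
}
```

Keep in mind that anything in BuildConfig ends up inside the APK, so this keeps the secret out of the repository but not out of the app.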

The unit tests uploaded the file as a File object as well as an InputStream. Both were successful, and a JSON object representing the data in the document was returned in the response.

What’s more, the code was straightforward. Below is the whole unit test class.

class SyphtUnitTest {
    private lateinit var client: SyphtClient
    private lateinit var file: File

    @Before
    fun setup_sypht_client() {
        // Credentials come from the "Credentials" page of the Sypht account.
        client = SyphtClient(BasicCredentialProvider(BuildConfig.SYPHT_CLIENT_ID, BuildConfig.SYPHT_CLIENT_SECRET))
        file = getTestFile()
    }

    // Loads the hardcoded sample document from the test resources folder.
    private fun getTestFile(): File {
        val classLoader = javaClass.classLoader!!
        return File(classLoader.getResource("receipt.pdf").file)
    }

    @Test
    fun upload_sypht_using_file() {
        // Both calls block the current thread until Sypht responds.
        val id = client.upload(file)
        val result = client.result(id)
        println(result)
        assertNotNull(result)
    }

    @Test
    fun upload_sypht_using_inputstream() {
        val id = client.upload(file.name, file.inputStream())
        val result = client.result(id)
        println(result)
        assertNotNull(result)
    }
}

There is a slight gotcha: the SDK calls are blocking and run on the calling thread. This means that on Android, if the upload() or result() methods are called from the main thread, we are going to have a problem. The SDK does not provide a promise, an async callback or any other way around this. However, the issue is minor, as it means I can choose how best to implement async processing in my code. Note: The Sypht SDK uses OkHttp, so I’m sure it will be entirely possible to add this support in the future.
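As a rough illustration of moving the blocking calls off the main thread, here is a sketch that uses only the JDK’s CompletableFuture and an executor. The BlockingExtractor interface is a hypothetical stand-in for the two SDK calls, so the example runs without the SDK or credentials. On Android, coroutines or any other mechanism would work just as well, and CompletableFuture itself needs API level 24 or above.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// Hypothetical stand-in for the blocking upload() and result() calls.
interface BlockingExtractor {
    fun upload(fileName: String, bytes: ByteArray): String
    fun result(fileId: String): String
}

// Runs upload + result on a background thread and hands back a future.
class AsyncExtractor(
    private val delegate: BlockingExtractor,
    private val executor: ExecutorService = Executors.newSingleThreadExecutor()
) {
    fun extract(fileName: String, bytes: ByteArray): CompletableFuture<String> =
        CompletableFuture.supplyAsync({
            val id = delegate.upload(fileName, bytes) // blocks the worker thread only
            delegate.result(id)
        }, executor)

    // Must be called when done, otherwise the worker thread keeps the JVM alive.
    fun shutdown() = executor.shutdown()
}

fun main() {
    // Fake implementation so the sketch runs without network access.
    val fake = object : BlockingExtractor {
        override fun upload(fileName: String, bytes: ByteArray) = "id-for-$fileName"
        override fun result(fileId: String) = "result-for-$fileId"
    }
    val async = AsyncExtractor(fake)
    println(async.extract("receipt.pdf", ByteArray(0)).get()) // result-for-id-for-receipt.pdf
    async.shutdown()
}
```

In a real app you would attach a callback with thenAccept() and post the result back to the main thread rather than calling get(), which blocks the caller.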

Putting an app together using Sypht

My original pain point was that I just wanted to be able to click on an email or its attachment and preview invoice details. In order to download my emails and attachments, I decided to use Microsoft’s Graph API. MS Graph provides great APIs for working with all sorts of Microsoft services including Outlook.com, Hotmail, OneDrive etc.

The code to the app can be found on my GitHub. The app uses MS Graph API to download your emails, fetch all their attachments, then uploads them to Sypht, asking Sypht to parse them as an invoice.

Below is a demo of a first run of my app. On the first run it takes a bit of time to download the attachments and then upload them to Sypht, but the results are stored in Room for faster retrieval afterwards. As emails and their attachments don’t change, this is a rather reliable way to get a performance boost.

The code for using the Sypht SDK was not much different from the Unit test example I mentioned above. However, if you want me to talk about how I built the app, leave a comment below.

A few things that I feel could be improved in the SDK:

  • The response is a String. I would rather this be an object representing the result.
  • I had to download my attachments and then upload them to Sypht. I would rather be able to give Sypht my link and let their cloud service download the files directly, as this would save me bandwidth and would presumably be a lot faster.
  • Despite what I said earlier, it would be nice for upload() and result() methods to have an async implementation.
  • There is no way to cancel file upload or fetching results short of terminating a thread or coroutine.
  • If I have already uploaded a file, I need to know its ID to retrieve the result again. It would be nice if there was a way to know whether a file has already been uploaded (by name or by its hash).
  • Ability to choose/provide my own HTTP library. The Sypht SDK uses OkHttp, but some libraries like Glide provide integrations that allow you to use OkHttp or Volley.
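Until something like the hash lookup exists in the SDK, it can be approximated on the client side. The sketch below keeps a local index from a SHA-256 content hash to the file ID returned by an upload. UploadIndex and its upload parameter are hypothetical names for this example, and the in-memory map stands in for whatever persistent store (such as Room) the app actually uses.

```kotlin
import java.security.MessageDigest

// Hex-encoded SHA-256 of the attachment bytes; identical content gives the same key.
fun sha256Hex(bytes: ByteArray): String =
    MessageDigest.getInstance("SHA-256")
        .digest(bytes)
        .joinToString("") { "%02x".format(it) }

// Hypothetical local index from content hash to the file ID an upload returned.
class UploadIndex(private val upload: (ByteArray) -> String) {
    private val idsByHash = mutableMapOf<String, String>()

    // Uploads only when this exact content has not been seen before.
    fun fileIdFor(bytes: ByteArray): String =
        idsByHash.getOrPut(sha256Hex(bytes)) { upload(bytes) }
}

fun main() {
    var uploads = 0
    // Fake upload function that just counts calls and hands out IDs.
    val index = UploadIndex { uploads++; "file-$uploads" }
    val invoice = "same invoice bytes".toByteArray()

    println(index.fileIdFor(invoice)) // file-1
    println(index.fileIdFor(invoice)) // file-1 (served from the index, no second upload)
    println(uploads)                  // 1
}
```

Hashing the content rather than the file name means a renamed attachment is still recognised, while a changed invoice with the same name is treated as new.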

Finally

In order to build great Android apps, read more of my articles.

Yay! You made it to the end! We should hang out! Feel free to follow me on Medium, LinkedIn, Google+ or Twitter.

A software engineer, an Android, and a ray of hope for your darkest code. Residing in Sydney.