This is a walkthrough of how Dutch went from an idea to the App Store. Most of it is dedicated to the execution so you can get a sense of how we approached the problem. The rest describes the submission process to help you avoid pitfalls when making a first-time submission to the App Store.
Last summer, a group of twelve of us decided to go to New Orleans for Independence Day. We had a packed itinerary and nobody wanted to squander precious time figuring out the bill at the end of each meal. We decided on a “simple” rotation where we would take turns paying for a meal and recording the assignments. This usually ended with someone taking a picture of the receipt and assuming we’d sort it out later. What resulted was an overly complicated expense-calculation sesh over Messenger, riddled with inconsistent rules and compromises.
Several in the group tried to suggest formulas for calculating each person’s true total, something on the order of SUM(items) * (1 + tax rate + tip percentage). This, of course, turned messy when appetizers and shared entrées were thrown into the mix. Others were accustomed to the practice of splitting tax and tip evenly for every meal. This still meant we had to wait for replies confirming what each person had ordered. Vacation was over and all compromises were acceptable so long as they reduced complexity. By the end of the session, receipts were being split down the middle.
It occurred to me that there should be a better solution. Existing apps were already doing a stellar job at recording expenses, but this usually meant punching in one total and selecting who in the group would share it. For finer granularity, we would have to parse the details of the receipt. My friend Raymond and I set out to build a bill splitter using optical character recognition (OCR). The idea is not novel, but we challenged ourselves to build the strongest parser paired with a modern, intuitive UI. We started with the following rules top of mind:
- The results of the model should discern between items and fees. Fees were to be split based on each diner’s percentage of the subtotal.
- A simple invite mechanism should exist to quickly get all diners into the receipt.
- There should be a persistence layer so that everyone invited to the receipt can reference the breakdown and the original image in the future.
- Functionality to edit, add, and delete items needed to be present so users could make changes in case the model overfit.
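The first rule, prorating fees by each diner’s share of the subtotal, can be sketched in a few lines. This is a minimal illustration in Python with names of our own invention (the app itself is Swift and Go), assuming shared items have already been divided among their diners:

```python
def split_receipt(items, fees):
    """Split a receipt: each diner pays for their items plus a share of
    the fees proportional to their fraction of the subtotal.

    items: list of (diner, price) pairs; shared items are pre-divided.
    fees:  combined total of tax, tip, and any other fees.
    """
    subtotal = sum(price for _, price in items)
    totals = {}
    for diner, price in items:
        totals[diner] = totals.get(diner, 0.0) + price
    # Prorate fees by each diner's percentage of the subtotal.
    return {diner: round(spent + fees * spent / subtotal, 2)
            for diner, spent in totals.items()}
```

With a $40 subtotal and $8 of fees, a diner who ordered $10 of food owes $12, not $14: the fee follows the subtotal share rather than being split evenly by headcount.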
I began mocking up a design in Sketch and wiring interactions in InVision. InVision is a freemium product design and workflow tool that allows you to build interactive prototypes. It also acts as a repository of your layout changes, allowing you to reference earlier revisions. The design changed radically once we began beta testing, and it was useful to revive older versions as user feedback poured in across builds. Designing an app from the ground up was an interesting experience. I was used to receiving redlines and assets at work laid out for me by expert designers. Still, there is an abundance of resources on the web to help provoke ideas. I found AWWWARDS and UI Movement to be particularly useful sources of design inspiration.
We decided to use this opportunity to adopt some new frameworks and learn about new technologies. I’ve been developing on iOS in Objective-C for five years now, but spent most of my recent years writing Rails APIs. This would be the perfect opportunity to dabble in Swift. Swift 3.0 had recently been released and was regarded as a major improvement for what is still a relatively young language. For the back end, we settled on bare-metal Golang. Since this was intended to be a learning experience, we wanted to choose a stack that was unfamiliar to both of us. I came from a Rails shop, and those of you who have worked with Ruby might agree it’s some of the most fun code to write. Its dynamically typed syntax is easy to pick up. Go, on the other hand, is statically typed and touts exceptional support for concurrent programming. We also read that it provided a performance boost, largely due to the static compilation. We decided to give it a go (ba dum tshhh).
There was also room to explore different hosting solutions for our back end. We deployed our previous projects on Heroku and AWS but heard that Google Cloud was gaining traction. We did a quick spike on Heroku for Go and found dependency management to be quite messy. Like other popular hosting solutions, Google Cloud features a wide range of services from App Engine (PaaS) to Cloud Storage (for our receipt images). It boasts a modern console for managing your selected services and mostly competitive pricing. Our minds were set on using Google’s Vision API to perform the OCR anyway, so we decided to take the rest of its cloud offerings for a spin.
Training The Model
What excited us about the project was the thought that there would never be a finish line. Of course the product itself could always grow, eventually supporting P2P payments, point of sale integration, etc., but the lack of a standardized receipt format meant our core feature of parsing and its decision tree would have to continue to expand as we observed more receipts. In the beginning, there was a considerable amount of overfitting — the model was too rigid, assuming receipts would always be laid out in a particular way. We trained the first iteration by looking at receipts we had lying around (mostly Chipotle). The Vision API would classify text blobs within the provided image and spit them out along with some metadata: XY coordinates, bounding rects, etc. We finagled this data into confidence intervals of where the matching price was for each itemization until the items and fees came out perfectly. Home run, right?
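Our actual heuristic built confidence intervals from the bounding-rect metadata, but the core idea reduces to matching each item blob to the price blob sitting on roughly the same line. A stripped-down sketch, assuming a simplified blob shape of our own (real Vision API responses carry full bounding polygons, not a single y value):

```python
def match_prices(item_blobs, price_blobs, tolerance=8):
    """Pair each item description with the price on (roughly) the same line.

    Each blob is a dict like {"text": ..., "y": ...}, where y is the
    vertical center of the blob's bounding rect. This keeps only the
    geometry the matching step needs.
    """
    pairs = []
    for item in item_blobs:
        # Pick the price whose line is vertically closest to the item's.
        best = min(price_blobs, key=lambda p: abs(p["y"] - item["y"]))
        if abs(best["y"] - item["y"]) <= tolerance:
            pairs.append((item["text"], best["text"]))
    return pairs
```

The fragility is visible right in the sketch: if the photo is tilted or the layout is unusual, the y coordinates stop lining up and the nearest-price guess goes wrong, which is exactly what we were about to discover.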
We blitzed through the first version of the mobile client, eager to get it into the hands of our families and friends to test. In the past, this was a complicated process where testers’ devices needed to be added to the provisioning profile of the app so that it could be distributed outside of the App Store. That all changed when Apple acquired TestFlight and added support for its beta testing capabilities right into iTunes Connect. It’s now simply a matter of typing in the emails of the users you want to send a build to, and they can play with it right away. There was no more need for remote hosting of .ipa application files and reminding your testers to re-download each time there was a new build.
As you probably guessed, the model performed horribly. My dad was one of our first users and an avid tester. I would receive push notifications informing me I had been added to a new receipt followed by a text: “Not working!”. The problem was two-fold. The Vision API would on occasion return obscure text found nowhere on the receipt, almost as if it was reading an entirely different document. Other times, the calculated bounds of the matching prices would be incorrect depending on the receipt layout, the lighting, and the degree to which the image was off axis.
This outright stumped us. Our first thought was that there was buggy logic resulting in us nondeterministically sending the wrong image to be processed. We looked through our entire cache of images but couldn’t find text matching the output on any of the receipts. This had to be a problem on Google’s end — maybe we induced collisions, resulting in fetches of images elsewhere in the cloud. It was a stretch, but we filed a support ticket anyway. After some more sleuthing, we discovered that different results could be procured for the same receipt, but the same image would always be read deterministically. This led us to believe that Google was using the images’ EXIF data in some way to determine the orientation of the documents, which led characters to be read differently. But how could this be? The UI in the app always showed the picture being taken in portrait mode. I found this terrific resource explaining EXIF orientation relative to the positioning of the camera. We had assumed images taken would always be in orientation 1 (upright) but this was not always the case. The EXIF orientation was, in fact, 6 (camera turned 90° clockwise) for all of the images we were classifying. It was a miracle it had worked at all. The answer, then, was to perform a transform of the image to match the orientation before uploading it to our servers.
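The fix boils down to a small lookup from the EXIF orientation tag to the rotation that brings the pixel data upright. In the app this transform happens in Swift before upload; the Python below is just to illustrate the mapping, and it deliberately skips the rare mirrored orientations (2, 4, 5, 7) that phone cameras don’t normally produce:

```python
# Clockwise rotation (degrees) that makes the pixel data upright for the
# common, non-mirrored EXIF orientations. Orientation 6 — what our camera
# captures were actually tagged with — needs a 90° clockwise rotation.
ROTATE_CW = {1: 0, 3: 180, 6: 90, 8: 270}

def upright_rotation(orientation):
    """Degrees to rotate clockwise before upload. Mirrored orientations
    (2, 4, 5, 7) are rare from cameras and fall through to 0 here."""
    return ROTATE_CW.get(orientation, 0)
```

Once the pixels are physically rotated (and the orientation tag reset to 1), the OCR service sees the same upright document regardless of which way the phone was held.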
Finding a solution to remedy poor image conditions would be trickier. We introduced a normalization step into the processing lifecycle to boost the number of invariants across all images scanned. For Dutch, this meant aligning the receipt within the image as well as eliminating as many artifacts as possible. Our normalization can be broken down into three parts: detection of the corners of the receipt, application of a homography transform, and image denoising. Homography between two planes describes a relationship that exists between four points on one plane and four points on the other, allowing one set to be projected onto the other. In our case, we wanted the corners of the receipts (and their bounded points) to be transformed into the system of the images’ borders. OpenCV is an open source C++ library packed with many algorithms for computer vision, image processing, and machine learning. Bindings exist for other popular languages such as Python and Java. It had everything we needed to normalize our images. We applied Harris Corner Detection to the image, giving us our four corners, which we then used to perform the transform, and voilà. To denoise, we combined image denoising with simple thresholding. Denoising samples a small neighborhood around each pixel and chooses a value for it from the resulting median and weighted average, while thresholding reduces the input to a binary image: one where pixels are classified as black or white depending on the threshold value. This gave us our final result.
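In production this pipeline leaned on OpenCV, but the last stage is simple enough to sketch directly. Here is a pure-Python illustration of the denoise-then-threshold step on a grayscale grid, a stand-in for the library calls rather than our actual implementation: a 3×3 median filter knocks out speckle, then a global threshold splits ink from paper:

```python
from statistics import median

def denoise_and_threshold(gray, thresh=128):
    """Median-filter a grayscale image (a list of rows of 0-255 ints)
    over a 3x3 neighborhood, then reduce it to a binary image:
    1 for ink (dark pixels), 0 for paper (light pixels)."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Sample the neighborhood, clamped at the image borders.
            hood = [gray[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(1 if median(hood) < thresh else 0)
        out.append(row)
    return out
```

A lone bright speck inside dark text disappears because the median of its neighborhood is still dark, which is exactly the artifact removal the OCR model needed before classification.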
Normalization proved effective but was extremely costly, increasing the latency of the upload request by an average of 43%. We still chose to apply the step, but only as a variant of the request, so we could observe the result and its downstream effects. Higher parsing accuracy did in fact lead to more invites and receipt finalizations. Where we couldn’t compromise on performance, we decided to solve with effective UI instead.
Running parallel to model performance was our investigation into an effective user experience. Something that stood out to me was how often I heard the group request that someone send out a picture of the receipt. Persistence and accessibility of information have made strides in the past decade. You can easily access your bank statements, your past Uber rides, and even your credit score. We thought that your meals out should be no different. In addition to being a splitting apparatus, Dutch doubles as a journal to help you rediscover what you ordered, whom you dined with, and when y’all had that crazy night out. None of that would happen if we couldn’t quickly get diners who didn’t have the app involved. This thought led us to choose phone number as the main attribution channel, which gave us two quick wins in the funnel. It allowed us to leverage users’ contact books as pre-existing invite lists. Once selected, diners would be attributed to the receipt via their phone numbers and, upon successful authentication, would see the receipt waiting for them on the home screen. Phone number verification as an authentication mechanism also reduces a lot of the sign-up friction that exists today where users are fumbling to verify their emails. Once contacts were on Dutch as friends, they could quickly be invited to any receipt much like they would to a Snap or a story.
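The attribution flow above can be sketched as a toy model (the names and shapes here are illustrative, not our actual API): invites are keyed by normalized phone number, so when that number later verifies via SMS, its pending receipts are already waiting on the home screen.

```python
class ReceiptDirectory:
    """Toy model of phone-number attribution. Diners are invited by
    number before they ever install the app; once that number
    authenticates, its pending receipts surface immediately."""

    def __init__(self):
        self.by_phone = {}

    @staticmethod
    def normalize(phone):
        # Keep digits only; a real service would canonicalize to E.164
        # so "(504) 555-0123" and "504-555-0123" land on the same key.
        return "".join(c for c in phone if c.isdigit())

    def invite(self, phone, receipt_id):
        self.by_phone.setdefault(self.normalize(phone), []).append(receipt_id)

    def receipts_for(self, phone):
        """Receipts visible once this number verifies."""
        return self.by_phone.get(self.normalize(phone), [])
```

Normalizing at both write and read is what makes the contact book usable as an invite list: however a friend’s number is formatted in your contacts, it resolves to the same account.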
Next, we had to decide how best to let our users make the assignments between diners and items. It seemed obvious to me in the beginning that users would work their way down the item list, making the appropriate assignments for each item to ensure that every item was accounted for. The interface would be much simpler to build and the API would only have to consume assignments one item at a time. This worked smoothly when the number of items was low but scaled poorly for larger groups. One of our earliest beta testers, Dennis, offered another perspective. Instead of working down the receipt, someone usually went around the table asking each person what they ordered. There was the looming possibility of orphaned items at the end, but it still turned out to be more effective. Each diner knew best what dishes were in front of them, and, at most, what was in front of their neighbors. It didn’t help to ask everyone, “who ordered X?”, when only a few at the table could confirm the answer. Assigning by diner made the process much more fluid and also gave us the opportunity to introduce fun UI elements to an otherwise dense table of text. We left functionality in there to assign by item, as it was still useful for quickly marking a shared appetizer as ordered by everyone.
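The one cost of the by-diner flow is those orphaned items, so a check is needed before finalizing. A minimal sketch (names are ours for illustration), where shared items may legitimately appear under several diners:

```python
def orphaned_items(items, assignments):
    """Items nobody has claimed after going around the table.

    items: list of item ids on the receipt, in receipt order.
    assignments: dict mapping diner -> set of item ids they claimed
                 (a shared appetizer may appear under several diners).
    """
    claimed = set().union(*assignments.values()) if assignments else set()
    return [item for item in items if item not in claimed]
```

Anything the function returns gets surfaced for the by-item fallback, which is exactly why we kept that mode around.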
We decided this was a good point to make a submission and start gathering feedback from new users. I had made submissions before, but most of the process had been abstracted away for me by other teams within the company dedicated to managing certificates and things of that sort. The first order of business was to create an organizational Apple developer account. We did not want the app appearing in the store under either of our names, and we needed an umbrella account for other apps we have in our pipeline. Sometime in 2012, Apple added a new requirement to the developer program enrollment process, asking for a DUNS number. DUNS numbers are issued by Dun & Bradstreet, a company that validates your team’s incorporation status. You can obtain one for free by signing up and providing some documentation. Sounds simple enough.
The process turned out to simply not be up to par with the expectations of teams around the world, resulting in frustration and lost momentum. Any quick search will show you this. After filling out pages of complicated and unnecessary data (I was often asked to select from a list of choices describing my company, none of which were applicable), I was presented with an option to either wait as long as 30 days, or pay $50 to expedite the process. I opted to wait. There was still polish I could work on, and this would surely be the last hurdle before we submitted.
To my surprise, D&B responded to me after three business days. I took my number and attempted to enroll in the developer program. The form came back with an error stating that my DUNS number could not be identified and to try again later. I called Apple and immediately ran into a wall. Their support staff was trained to deflect any and all requests until the time period allotted for changes had elapsed. They would neither confirm nor explain any of the errors that surfaced. All I got across multiple calls was, “changes with D&B take 14 business days to reflect in our system”. This was ridiculous — in an age where terabytes of data move every second, Apple could only retrieve my information from D&B twice a month? My biggest fear was that at the end of 14BDs, there would still be some sort of clerical error and that I would have to wait again.
I tried every day, not convinced that it would actually take 14BDs. It seemed silly that the syncing of my data had to conform to business hours and schedules. In fact, it would probably be more efficient if it didn’t. Three days later, the enrollment form was able to find a record for my DUNS number. Upon submission, however, it threw an error again, reporting that it could not identify my business’ entity type and rejecting my application. I quickly called D&B asking to verify my information. They responded, “LLC”, as it was present in the business name, and then reminded me that data could take 14BDs to reflect. At this point, I was hearing the same thing from both companies. Apple even scheduled a follow up with me for the day the window expired in an attempt to get me to stop calling. It was a game of hot potato and neither company wanted to service my request. Three days later, I finally found someone from D&B who was kind enough to go through my company info with me as it was in their system. It turned out there was a checkbox on their end specifying entity type that was, in fact, blank. You’d think that something in the ten-page form would have supplied the information to fill this out. Elated, I immediately called Apple back, forgetting that this scenario was undoubtedly their bread and butter. I reported that there was a clerical error that had recently been corrected and was immediately told…. I won’t repeat it again. The change went through a day later and we submitted.
The journey has been a rewarding one. We’ve received a ton of feedback so far that will go towards improving the current experience as well as building effective interfaces for other projects. So what’s next? We’re coming soon to Android. Also, be on the lookout for new interactions within the app, such as P2P payments. Beyond Dutch, our goal is to enrich your plans by taking the parts that are ordinarily chores and changing them into effortless tasks, giving you more time to enjoy what you set out to do. Stand by for more.
Questions? Never hesitate to drop us a support ticket through the app or reach out directly at email@example.com.