Why we built Magic Sudoku, the ARKit Sudoku Solver

Hint: Computer Vision + Augmented Reality = wow

Brad Dwyer
5 min read · Sep 24, 2017
The Magic Sudoku App in Action

Brad Dwyer is the founder of Hatchlings, a startup that makes games and mobile apps in Des Moines, Iowa.

Last week my company, Hatchlings, released Magic Sudoku for iOS 11. It’s an app that solves sudoku puzzles using a combination of computer vision, machine learning, and augmented reality.

We got a ton of attention, including being voted to #1 on Imgur, getting retweeted over 2,000 times, reaching the front pages of Hacker News and Product Hunt, and being featured in major publications like The Verge and The Next Web.

Many people have asked me about the app, so I thought it would be fun to share a behind-the-scenes look at how and why we built it.

This is the first post in a 3-part series. Part two is a technical post detailing how we built the app (including a look at the backend tools we built to help with the machine learning component). And part three will explore lessons learned and the business side.

Subscribe or follow me on Twitter and you’ll be the first to know when they’re out. And don’t forget to download the app and give it a try yourself!

Magic Sudoku solves sudoku puzzles using the power of computer vision, machine learning, and augmented reality

What makes Magic Sudoku different?

When Apple announced ARKit at WWDC17 I immediately knew I wanted to build something with it. I started thinking about potential app ideas.

I had a bunch of ideas but wanted to find one that fit a list of criteria. Among them: I wanted to learn Swift, use ARKit, and dip my toe into machine learning for the first time. But first and foremost, I wanted to build something that would actually be improved by augmented reality!

Pokemon Go’s AR looks cool but isn’t integral to gameplay.

Too many AR apps don’t have a compelling reason to use the technology. They add augmented reality for the “cool” factor but basically just plop a 3D model on top of a video feed of the room you’re in “just because” and call it a day.

Heck, even Pokemon Go falls into this category. You can toggle the augmented reality mode to “off” and the app works just as well.

My idea was to combine computer vision with augmented reality to create a simple, streamlined UI that wouldn’t be possible without it.

Once I decided this, I narrowed my list down to a few concepts that met all of my criteria and ultimately landed on building a crossword puzzle solver. After exploring that for a few days, I determined that it wasn’t going to be feasible with the tools I had available (Vision’s image segmentation API wasn’t up to the task) and switched to building a sudoku solver.

How adding computer vision to the equation changes things

Simpsons did it.
~ A lot of people

Most feedback has been positive. But the most common negative reaction I’ve gotten has been something along the lines of “Google Goggles has been doing this since 2011.” And yes, sudoku solvers have been available for a long time. The sudoku solver itself is not the cool part. Of the ~1 month of development time, writing the code that actually solves the puzzle took only an hour or two.
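
To give a sense of how small that piece really is, here’s a minimal backtracking solver sketch in Swift. This is for illustration only, not the app’s actual code:

```swift
// A minimal backtracking sudoku solver. `board` is 9x9, with 0 marking
// an empty cell; solve() fills the board in place and returns true if
// a solution exists.
func solve(_ board: inout [[Int]]) -> Bool {
    for row in 0..<9 {
        for col in 0..<9 where board[row][col] == 0 {
            for digit in 1...9 where isValid(board, row, col, digit) {
                board[row][col] = digit
                if solve(&board) { return true }
                board[row][col] = 0  // dead end: backtrack
            }
            return false  // no digit fits this empty cell
        }
    }
    return true  // no empty cells remain: solved
}

// The standard sudoku constraints: `digit` must not already appear in
// the cell's row, column, or 3x3 box.
func isValid(_ board: [[Int]], _ row: Int, _ col: Int, _ digit: Int) -> Bool {
    for i in 0..<9 where board[row][i] == digit || board[i][col] == digit {
        return false
    }
    let boxRow = (row / 3) * 3
    let boxCol = (col / 3) * 3
    for r in boxRow..<(boxRow + 3) {
        for c in boxCol..<(boxCol + 3) where board[r][c] == digit {
            return false
        }
    }
    return true
}
```

The hard part, and where most of the month went, is everything that happens before a function like this gets called: finding the grid in the camera feed and reading the digits reliably.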

Technical people tend to understand why the app is cool. But it boils down to this: Magic Sudoku demonstrates a new model for human-computer interaction; computer vision is the input device and augmented reality is the output device.
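
To make that pairing concrete, here’s a rough sketch in Swift of how CV-in, AR-out can fit together on Apple’s frameworks. It’s hypothetical (not Magic Sudoku’s actual pipeline), using Vision’s built-in rectangle detection to stand in for our recognition step:

```swift
import ARKit
import Vision

// Illustrative sketch: computer vision as the input device, augmented
// reality as the output device.
func scanForPuzzle(in sceneView: ARSCNView) {
    // Input: grab the latest camera frame from the AR session.
    guard let frame = sceneView.session.currentFrame else { return }

    let request = VNDetectRectanglesRequest { request, _ in
        guard let grid = request.results?.first as? VNRectangleObservation else { return }
        // Output: hit-test from the detected rectangle's center into the
        // world and drop an anchor there for the solution overlay.
        // (Coordinate-space conversion between Vision and ARKit is
        // elided here for brevity.)
        let center = CGPoint(x: grid.boundingBox.midX, y: grid.boundingBox.midY)
        if let hit = frame.hitTest(center, types: .featurePoint).first {
            sceneView.session.add(anchor: ARAnchor(transform: hit.worldTransform))
            // The scene renderer would then attach a node displaying the
            // solved digits over the puzzle.
        }
    }
    let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, options: [:])
    try? handler.perform([request])
}
```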

Sidenote: changing or adding new “input/output” pairing combinations often provides new and better ways of doing things. Examples: self-driving cars (vision/motors), IoT (sensors/APIs), Google Translate (text/text), instruments (touch/audio), Shazam (audio/text), Snapchat filters (image/image), Amazon Echo (voice/<many>). What other input/output pairings are there that haven’t yet been explored?

Several people have compared the app to “Terminator vision” and I think this is a good look at what’s possible when you combine CV+AR.

The Terminator doesn’t have to look at something, take a snapshot, feed it into a specific app to process it, and then look at the results. He simply looks at something and it transforms into a more useful state.

Luke Wroblewski describes this mode of interaction perfectly in his blog post about how augmented reality headsets should work.

Notice how transformative this is for the UI. There are no intermediate steps. There are no buttons. There are no dialogs. There are no different screens. You simply look at something and see a transformed version of the world (in our case, you look at an empty sudoku and see the solution).

Word Lens Translation for iOS

Another great example of the power of combining CV+AR is Word Lens (which was acquired by Google and is now built into Google Translate), which translates text in real time simply by being pointed at something written in another language.

By using computer vision plus augmented reality we transform the world rather than “adding” to it as so many current-generation AR apps do.

So yes, you can create a sudoku solver without augmented reality. But it gets better when you add AR. Compared to the simplest solvers, the time savings in data entry are night and day (keyboard entry vs. an immediate scan of the live video stream). And compared even to previous-generation image-scanning sudoku solvers, the flow is greatly streamlined and simplified.

We have several features in the pipeline that will make the unique advantages of AR even more evident as time goes on (but I don’t want to spill the beans on those quite yet!).

Stay tuned…

Subscribe or follow me on Twitter and you’ll be the first to know when parts two and three are out. And don’t forget to download the app and give it a try yourself!
