Offline OCR using Tesseract in Unity… Part 1

Neelarghya · Published in XRPractices · 9 min read · Dec 16, 2019

This is a guide on how to set up Tesseract for any platform using Unity, by building a project that can read and highlight text in a captured image. For the uninitiated, Tesseract is an open source optical character recognition (OCR) engine. This tutorial focuses on building an independent setup (i.e. inclusive of all dependencies and data) for platforms such as Mac, Windows and Android. It will have to be a three-parter, since it’s a lot to cover in a single article.

Who is this for..?

This is for anyone who needs a standalone OCR solution for their Unity project, independent of any APIs or network calls. I will assume you have a general grasp of Unity and Git.

Note: As of the date I’m writing this, Tesseract is open sourced under the Apache License, Version 2.0. Do read it before using Tesseract.

Enough talk, let’s get down to business!

1. Set up a basic Unity project (If you need help with it you can find it here)

You can find the entire project here

git clone git@github.com:Neelarghya/tesseract-unity.git
cd tesseract-unity
git checkout 0d8d39a

[Commit: 0d8d39a]

2. Adding Plugins

This project contains Tesseract 4.1.0 plugins, inclusive of dependencies, for Mac, Windows x64 and Android. In case you need a different version or a different platform, you can install Tesseract from a pre-built binary package, or build it from source on your platform or inside a Docker container.
For Android, I built it by installing Tesseract inside an Android Docker image. To give credit where credit is due, I took “inspirations” from https://github.com/rhardih/bad. You would need to modify the Makefile and Dockerfile if you want to change something like the version.
The DLLs have internal dependencies on each other, so changing their names is a bad idea.
[Commit: cb0c630]

3. Expose the Required APIs — Writing a Wrapper

using System;
using System.Runtime.InteropServices;

public class TesseractWrapper
{
    // Plugin file names differ per platform, so pick them at compile time.
#if UNITY_EDITOR
    private const string TesseractDllName = "tesseract";
    private const string LeptonicaDllName = "tesseract";
#elif UNITY_ANDROID
    private const string TesseractDllName = "libtesseract.so";
    private const string LeptonicaDllName = "liblept.so";
#else
    private const string TesseractDllName = "tesseract";
    private const string LeptonicaDllName = "tesseract";
#endif

    // Returns a pointer to the native version string.
    [DllImport(TesseractDllName)]
    private static extern IntPtr TessVersion();
}

So, we start by adding a class called TesseractWrapper, which will act as an API layer between the application and the Tesseract DLL(s). As you might notice above, the DLLs we got have different names on different platforms. To get around this we use compiler switches (#if/#elif) to fix the plugin file names for Tesseract and Leptonica (one of its major dependencies).
We expose native functions using the DllImport(<fileName>) attribute and the extern keyword. The function signature is something you have to look up in the documentation. Here we are exposing TessVersion(): according to the Tesseract documentation it takes no parameters and returns a pointer, which we receive as an IntPtr.
[Commit: fc7d5d6]

4. Check version

Until now the project has some plugins and code but nothing happens, let’s change that, shall we?

  • We will start by adding a layer over the exposed DLL function, then massage the data it returns (marshal the pointer to a string) to display the version of Tesseract we will be using.
public string Version()
{
    IntPtr strPtr = TessVersion();
    string tessVersion = Marshal.PtrToStringAnsi(strPtr);
    return tessVersion;
}
  • Next we need to display this as a log (so that you can test it even if you are running inside Docker via the Unity command line) as well as on-screen text (so that you can test on platforms like Android without using ADB).
  • For that we start with 2 script files: TesseractDriver (an additional layer over the wrapper that is responsible for setting up the environment for Tesseract and acts as the user’s point of entry, and also leaves the implementation independent of Unity) and TesseractDemoScript (a demo script meant to guide users on how to use the project).
using System;
using UnityEngine;

public class TesseractDriver
{
    private TesseractWrapper _tesseract;

    public string CheckTessVersion()
    {
        _tesseract = new TesseractWrapper();

        try
        {
            string version = "Tesseract version: "
                             + _tesseract.Version();
            Debug.Log(version);
            return version;
        }
        catch (Exception e)
        {
            // Typically thrown when the native plugin cannot be loaded.
            string errorMessage = e.GetType() + " - " + e.Message;
            Debug.LogError(errorMessage);
            return errorMessage;
        }
    }
}

using UnityEngine;
using UnityEngine.UI;

public class TesseractDemoScript : MonoBehaviour
{
    [SerializeField] private Text display;
    private TesseractDriver _tesseractDriver;

    private void Start()
    {
        _tesseractDriver = new TesseractDriver();
        display.text = _tesseractDriver.CheckTessVersion();
    }
}

Finally, add a UI > Text (named Display in my scene) to the scene, plus an empty GameObject with the TesseractDemoScript attached, and assign the Text to the “Display” property that TesseractDemoScript exposes in the Inspector.

At this point you know that the main Tesseract DLL is working (not sure about the dependencies… yet…), and you can test on your required platform with visual feedback from the on-screen display or the logs.
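If you want to poke at the marshaling step on its own, here is a minimal sketch that does not need the Tesseract plugin at all. It fakes a native string with Marshal.StringToHGlobalAnsi and reads it back with the same Marshal.PtrToStringAnsi call that Version() uses. The class and method names (MarshalDemo, RoundTrip) are made up for this illustration and are not part of the project.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical demo class, not part of the tesseract-unity repository.
public static class MarshalDemo
{
    // Copies a managed string into unmanaged memory as a null-terminated
    // ANSI string (similar to what a native "const char*" points at),
    // then converts that pointer back into a managed string.
    public static string RoundTrip(string text)
    {
        IntPtr nativeString = Marshal.StringToHGlobalAnsi(text);
        try
        {
            // Same conversion that Version() applies to the IntPtr
            // returned by TessVersion().
            return Marshal.PtrToStringAnsi(nativeString);
        }
        finally
        {
            // We allocated this buffer ourselves, so we have to free it.
            Marshal.FreeHGlobal(nativeString);
        }
    }
}
```

Note the difference to the wrapper: here the buffer is ours, so we free it. As far as I can tell, the string behind TessVersion() belongs to the native library, which is why Version() only reads it and never frees it.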
[Commit: 5276574]

5. Add TessData

Tesseract requires trained data files (tessdata) in order to work. You can pick and choose them here, depending on the type (fast/best/normal) and language you need. You can also pick them up from your installation of Tesseract if you have one (I did…). My repository also contains English tessdata for Tesseract 4.1.0.
Once you have it, copy the tessdata folder into your Assets/StreamingAssets/
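A missing or misplaced data file is an easy mistake to make at this step, so a quick check can save some head scratching. The sketch below assumes the usual <language>.traineddata file naming and a plain folder on disk; TessDataCheck and its methods are hypothetical names for this illustration only.

```csharp
using System.IO;

// Hypothetical helper, not part of the tesseract-unity repository.
public static class TessDataCheck
{
    // Builds the expected file path for a language,
    // e.g. <tessdataDir>/eng.traineddata for "eng".
    public static string TrainedDataPath(string tessdataDir, string language)
    {
        return Path.Combine(tessdataDir, language + ".traineddata");
    }

    // True if the trained data file for the language exists on disk.
    public static bool HasTrainedData(string tessdataDir, string language)
    {
        return File.Exists(TrainedDataPath(tessdataDir, language));
    }
}
```

In the editor or a desktop build you could call it with the tessdata folder under Application.streamingAssetsPath and "eng". On Android the streaming assets are packed inside the APK, so a plain File.Exists check is not expected to see them there.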
[Commit: 7565230]

6.1 Setup Tesseract (Finding the APIs)

Let’s build it from the top down… In TesseractDemoScript.Start() add

_tesseractDriver.Setup();

Although the code goes red, top-down coding is a good practice for understanding how little you need to code and expose when you are designing an API.
Let’s add the Setup() for TesseractDriver

public void Setup()
{
    _tesseract = new TesseractWrapper();
    _tesseract.Init();
}

So now TesseractDemoScript is happy but TesseractDriver goes red, so let’s implement Init() in TesseractWrapper.
But this is where we need to actually implement stuff… To set up an instance of Tesseract we need to call 2 functions from within the plugin: TessBaseAPI.Create() and TessBaseAPI.Init(). So let’s start by checking the docs and exposing them via our wrapper. I will save you the pain of digging through the docs and get to the point.
But here’s to the brave souls who are willing to risk their time and sanity to go through the docs and uncover long lost knowledge (skip to the 6.2 Setup Tesseract (Implementing) section if you aren’t one of them). If you visit the docs, select the version you need, then open up Namespace > tesseract

You will find a huge list of classes; look for the required class, in our case TessBaseAPI

Or just search for it… I know what you are gonna say: “There’s a search functionality provided by the site, why even go through the Namespace > Class > Function bs?”… Well, I get your point, but what you missed is that you didn’t know what to look for until I told you… And that’s exactly why this article is required. And now you know where to look if you need something I ain’t covering.
After selecting TessBaseAPI, search for Init and you will find 4 signatures for the function. When exposing a function from a DLL, function overloading goes out the window and you need to specify the index (starting from 1) of the signature you will be using. So for the 3rd implementation the function name would be TessBaseAPIInit3(): class name, followed by function name, followed by signature index (not required if there is just 1 signature). Likewise TessBaseAPI.Create() (notice the dot) can be exposed via TessBaseAPICreate(). The exported names are case-sensitive, so spell them exactly as the library does.
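To make the naming rule above concrete, here is a tiny sketch that just glues the pieces together the way the paragraph describes. It is only an illustration of the rule of thumb, with made-up names (CApiNames, EntryPoint); the names the library really exports are whatever its C API header (capi.h) declares, so treat that as the source of truth if the two ever disagree.

```csharp
// Hypothetical helper illustrating the naming rule of thumb;
// not part of the tesseract-unity repository or of Tesseract itself.
public static class CApiNames
{
    // Class name + function name + optional overload index.
    // An index of 0 means "only one signature, no suffix".
    public static string EntryPoint(string className, string functionName, int overloadIndex = 0)
    {
        string name = className + functionName;
        return overloadIndex > 0 ? name + overloadIndex : name;
    }
}
```

For example, EntryPoint("TessBaseAPI", "Init", 3) gives the TessBaseAPIInit3 name used in the next section, and EntryPoint("TessBaseAPI", "Create") gives TessBaseAPICreate.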

6.2 Setup Tesseract (Implementing)

So let’s add the required functions to TesseractWrapper and call them from Init(), along with a handle to the Tesseract instance and a constructor. Init() also returns a bool depending on whether the initialization was successful or not.

[DllImport(TesseractDllName)]
private static extern IntPtr TessBaseAPICreate();

[DllImport(TesseractDllName)]
private static extern int TessBaseAPIInit3(IntPtr handle, string dataPath, string language);

IntPtr tessHandle;

public TesseractWrapper()
{
    tessHandle = IntPtr.Zero;
}

public bool Init(string lang, string dataPath)
{
    try
    {
        tessHandle = TessBaseAPICreate();

        // 0 means success; anything else is a failure.
        int init = TessBaseAPIInit3(tessHandle, dataPath, lang);
        if (init != 0)
        {
            // Fully qualified because this file does not import UnityEngine.
            UnityEngine.Debug.LogError("Tess Init Failed");
            return false;
        }

        return true;
    }
    catch (Exception e)
    {
        UnityEngine.Debug.LogError(e);
        return false;
    }
}

You can observe that TessBaseAPIInit3() requires a data path and language which needs to be passed in as parameters so let’s change TesseractDriver accordingly.

public void Setup()
{
    _tesseract = new TesseractWrapper();
    string datapath = Application.streamingAssetsPath
                      + "/tessdata/";

    if (_tesseract.Init("eng", datapath))
    {
        Debug.Log("Init Successful");
    }
}

Note that I’m using “eng” to specify my tessdata language; you can specify yours the same way. If all goes well you will be greeted with the “Init Successful” log message.

This means your Tesseract version is set up and working with the tessdata provided.
[Commit: 359a70f]

7. Clean Ups/Optimizations (Skippable)

I will make this quick… We will make sure there is only a single instance of Tesseract in a wrapper at any point in time, and add some boring functionality, logging and a few extra APIs.
So the new TesseractWrapper is

using System;
using System.Runtime.InteropServices;

public class TesseractWrapper
{
#if UNITY_EDITOR
    private const string TesseractDllName = "tesseract";
    private const string LeptonicaDllName = "tesseract";
#elif UNITY_ANDROID
    private const string TesseractDllName = "libtesseract.so";
    private const string LeptonicaDllName = "liblept.so";
#else
    private const string TesseractDllName = "tesseract";
    private const string LeptonicaDllName = "tesseract";
#endif

    [DllImport(TesseractDllName)]
    private static extern IntPtr TessVersion();

    [DllImport(TesseractDllName)]
    private static extern IntPtr TessBaseAPICreate();

    [DllImport(TesseractDllName)]
    private static extern int TessBaseAPIInit3(IntPtr handle, string dataPath, string language);

    [DllImport(TesseractDllName)]
    private static extern void TessBaseAPIDelete(IntPtr handle);

    [DllImport(TesseractDllName)]
    private static extern void TessBaseAPIEnd(IntPtr handle);

    IntPtr _tessHandle;
    private string _errorMsg;

    public TesseractWrapper()
    {
        _tessHandle = IntPtr.Zero;
    }

    public string Version()
    {
        IntPtr strPtr = TessVersion();
        string tessVersion = Marshal.PtrToStringAnsi(strPtr);
        return tessVersion;
    }

    public string GetErrorMessage()
    {
        return _errorMsg;
    }

    public bool Init(string lang, string dataPath)
    {
        // Keep a single native instance per wrapper: release any old one first.
        if (!_tessHandle.Equals(IntPtr.Zero))
            Close();

        try
        {
            _tessHandle = TessBaseAPICreate();
            if (_tessHandle.Equals(IntPtr.Zero))
            {
                _errorMsg = "TessAPICreate failed";
                return false;
            }

            if (string.IsNullOrWhiteSpace(dataPath))
            {
                _errorMsg = "Invalid DataPath";
                return false;
            }

            int init = TessBaseAPIInit3(_tessHandle, dataPath,
                lang);
            if (init != 0)
            {
                Close();
                _errorMsg = "TessAPIInit failed. Output: " + init;
                return false;
            }
        }
        catch (Exception ex)
        {
            _errorMsg = ex + " -- " + ex.Message;
            return false;
        }

        return true;
    }

    public void Close()
    {
        if (_tessHandle.Equals(IntPtr.Zero))
            return;
        TessBaseAPIEnd(_tessHandle);
        TessBaseAPIDelete(_tessHandle);
        _tessHandle = IntPtr.Zero;
    }
}

Add the snippet below to TesseractDriver

public string GetErrorMessage()
{
    return _tesseract?.GetErrorMessage();
}

Add the error display to TesseractDemoScript

private void Start()
{
    _tesseractDriver = new TesseractDriver();
    display.text = _tesseractDriver.CheckTessVersion();
    _tesseractDriver.Setup();
    display.text += "\n" + _tesseractDriver.GetErrorMessage();
}
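One optional idea on top of Close(): if you would rather not rely on remembering to call it, the same "release once, then zero the handle" logic can sit behind IDisposable. The sketch below is not from the repository. NativeHandle is a made-up name, and the release callback stands in for the TessBaseAPIEnd() + TessBaseAPIDelete() pair so the example runs without the plugin.

```csharp
using System;

// Hypothetical helper, not part of the tesseract-unity repository.
public sealed class NativeHandle : IDisposable
{
    private IntPtr _handle;
    private readonly Action<IntPtr> _release;

    // "release" is whatever frees the native object; in the wrapper it
    // would call TessBaseAPIEnd and then TessBaseAPIDelete (assumption
    // about how you would wire it up).
    public NativeHandle(IntPtr handle, Action<IntPtr> release)
    {
        _handle = handle;
        _release = release;
    }

    public bool IsOpen => _handle != IntPtr.Zero;

    public void Dispose()
    {
        // Mirrors Close(): do nothing if the handle was already released.
        if (_handle == IntPtr.Zero)
            return;
        _release(_handle);
        _handle = IntPtr.Zero;
    }
}
```

Wrapped in a using block, Dispose() runs even if an exception is thrown in between, and calling it twice is harmless because the handle is zeroed after the first release.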

[Commit: a8ceb3e]

8. Get a texture

Just add any image that contains text to your Project > Assets > Images.
We will be using this dummy image to check whether Tesseract is able to recognize text.

And make sure you tick the Read/Write Enabled check box in its Inspector.

[Commit: adf982e]

Thank you for sticking with me for so long… And sorry, I know it’s kind of a cliffhanger, but we are done with most of the setup for OCR… The next part will cover recognizing text from an image, some Android-specific data setup, drawing boxes around recognized words and filtering words based on confidence level… exciting stuff… Hope it was helpful.
See you in Part 2.
