Offline OCR using Tesseract in Unity… Part 2

Neelarghya
Published in XRPractices · 9 min read · Dec 19, 2019

Let’s finish what we started…
In case you missed setting up the project, you can follow Part 1 of this guide, or clone the repository and check out the commit:

git clone git@github.com:Neelarghya/tesseract-unity.git
cd tesseract-unity
git checkout 199114b

Assuming we are in sync, let’s continue…

9. Recognize text from image

9.1 Setting up the APIs

Let’s start by setting up the APIs required to feed an image/texture to the Tesseract instance we created. For this you need a few more exposed functions in TesseractWrapper, namely:
- TessBaseAPISetImage & TessBaseAPISetImage2: to set the image that needs to be recognized.
- TessBaseAPIRecognize: to recognize the image that was set.
- TessBaseAPIGetUTF8Text: to get the recognized text in UTF-8 format.
- TessDeleteText: to delete the string pointer produced by the above function.
- TessBaseAPIClear: to clear the recognition results and image data held by the API.

[DllImport(TesseractDllName)]
private static extern void TessBaseAPISetImage(IntPtr handle, IntPtr imagedata,
    int width, int height, int bytes_per_pixel, int bytes_per_line);

[DllImport(TesseractDllName)]
private static extern void TessBaseAPISetImage2(IntPtr handle, IntPtr pix);

[DllImport(TesseractDllName)]
private static extern int TessBaseAPIRecognize(IntPtr handle, IntPtr monitor);

[DllImport(TesseractDllName)]
private static extern IntPtr TessBaseAPIGetUTF8Text(IntPtr handle);

[DllImport(TesseractDllName)]
private static extern void TessDeleteText(IntPtr text);

[DllImport(TesseractDllName)]
private static extern void TessBaseAPIClear(IntPtr handle);

[Commit: e7c506f]

9.2 Let’s put the Recognition in OCR!

Let’s start off by writing a function called Recognize inside TesseractWrapper that takes a Texture2D, and initialize some basic variables we will need shortly. Also, a guard clause never hurt anyone, so…

public string Recognize(Texture2D texture)
{
    if (_tessHandle.Equals(IntPtr.Zero))
        return null;

    int width = texture.width;
    int height = texture.height;
    Color32[] colors = texture.GetPixels32();
    int count = width * height;
    int bytesPerPixel = 4;
    byte[] dataBytes = new byte[count * bytesPerPixel];

(Note: The closing curly brace isn’t present because the function spills over into the next snippet(s).)
Up next we set up the image that Tesseract needs to recognize. We build a byte array from the pixel array produced from the texture, write that byte array to unmanaged memory, and pass a pointer to that memory to the TessBaseAPISetImage function we exposed earlier. Note that the outer loop runs from y = height - 1 down to 0: Unity’s GetPixels32 returns pixels row by row starting at the bottom-left corner, while Tesseract expects the first row of the buffer to be the top of the image.

    int bytePtr = 0;
    for (int y = height - 1; y >= 0; y--)
    {
        for (int x = 0; x < width; x++)
        {
            int colorIdx = y * width + x;
            dataBytes[bytePtr++] = colors[colorIdx].r;
            dataBytes[bytePtr++] = colors[colorIdx].g;
            dataBytes[bytePtr++] = colors[colorIdx].b;
            dataBytes[bytePtr++] = colors[colorIdx].a;
        }
    }

    IntPtr imagePtr = Marshal.AllocHGlobal(count * bytesPerPixel);
    Marshal.Copy(dataBytes, 0, imagePtr, count * bytesPerPixel);
    TessBaseAPISetImage(_tessHandle, imagePtr, width, height,
        bytesPerPixel, width * bytesPerPixel);
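To see the row flip in isolation, here is a minimal Unity-free sketch of the same loop. It assumes a hypothetical Rgba32 struct standing in for Unity’s Color32 (same r, g, b, a byte fields) and a hypothetical PixelRowFlip.ToTopDownRgba helper; neither is part of the repository. Given a buffer laid out bottom row first, as GetPixels32 returns it, it produces RGBA bytes with the top row first. Since the rows are packed without padding, each row is width * 4 bytes, which is the value passed as bytes_per_line.

```csharp
using System;

// Hypothetical stand-in for UnityEngine.Color32 (same field names).
public struct Rgba32
{
    public byte r, g, b, a;

    public Rgba32(byte r, byte g, byte b, byte a)
    {
        this.r = r; this.g = g; this.b = b; this.a = a;
    }
}

public static class PixelRowFlip
{
    // Input order (as GetPixels32): left to right, bottom row first.
    // Output: RGBA bytes, top row first, rows packed with no padding.
    public static byte[] ToTopDownRgba(Rgba32[] colors, int width, int height)
    {
        if (colors.Length != width * height)
            throw new ArgumentException("colors.Length must equal width * height");

        const int bytesPerPixel = 4;
        byte[] dataBytes = new byte[width * height * bytesPerPixel];
        int bytePtr = 0;

        // Walk from the last row in the buffer (the top of the image) down to row 0.
        for (int y = height - 1; y >= 0; y--)
        {
            for (int x = 0; x < width; x++)
            {
                Rgba32 c = colors[y * width + x];
                dataBytes[bytePtr++] = c.r;
                dataBytes[bytePtr++] = c.g;
                dataBytes[bytePtr++] = c.b;
                dataBytes[bytePtr++] = c.a;
            }
        }
        return dataBytes;
    }
}
```

With a 1×2 image whose bottom pixel is (1, 2, 3, 4) and top pixel is (5, 6, 7, 8), the output starts with 5, 6, 7, 8.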

Once Tesseract knows which image to recognize, we can ask it to recognize it (TessBaseAPIRecognize), and once it has recognized the text inside the image we can ask it for that text (TessBaseAPIGetUTF8Text). Tesseract doesn’t return the text itself but a pointer to the memory that contains it, so we need to marshal this string pointer into an actual string. On Windows we convert it with PtrToStringAnsi, and on Mac/Android/etc. we can just use PtrToStringAuto. We also free the image memory we allocated. Finally, don’t forget to clear Tesseract (TessBaseAPIClear) and delete the string pointer (TessDeleteText).

    if (TessBaseAPIRecognize(_tessHandle, IntPtr.Zero) != 0)
    {
        Marshal.FreeHGlobal(imagePtr);
        return null;
    }

    IntPtr str_ptr = TessBaseAPIGetUTF8Text(_tessHandle);
    Marshal.FreeHGlobal(imagePtr);
    if (str_ptr.Equals(IntPtr.Zero))
        return null;
#if UNITY_EDITOR_WIN || UNITY_STANDALONE_WIN
    string recognizedText = Marshal.PtrToStringAnsi(str_ptr);
#else
    string recognizedText = Marshal.PtrToStringAuto(str_ptr);
#endif

    TessBaseAPIClear(_tessHandle);
    TessDeleteText(str_ptr);

    return recognizedText;
}

With recognizedText in hand, we are ready to read the image we added to the project last time.
[Commit: fc7e9df]
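One caveat on the Windows branch: TessBaseAPIGetUTF8Text returns UTF-8, while Marshal.PtrToStringAnsi decodes with the system’s ANSI code page, so non-ASCII characters (accents, non-Latin scripts) can come out garbled. A platform-independent alternative is to copy the bytes up to the null terminator and decode them as UTF-8 yourself. This is a sketch, not the repository’s code; Utf8Interop.PtrToUtf8String is a hypothetical name. Newer .NET profiles also ship Marshal.PtrToStringUTF8, which may be available depending on your Unity version’s API Compatibility Level.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class Utf8Interop
{
    // Hypothetical helper: decodes a null-terminated UTF-8 string from
    // unmanaged memory, such as the pointer from TessBaseAPIGetUTF8Text.
    public static string PtrToUtf8String(IntPtr ptr)
    {
        if (ptr == IntPtr.Zero)
            return null;

        // Count bytes until the terminating zero byte.
        int length = 0;
        while (Marshal.ReadByte(ptr, length) != 0)
            length++;

        // Copy the raw bytes into managed memory and decode them as UTF-8.
        byte[] bytes = new byte[length];
        Marshal.Copy(ptr, bytes, 0, length);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

In TesseractWrapper.Recognize() it could replace both branches of the #if block (string recognizedText = Utf8Interop.PtrToUtf8String(str_ptr);), still followed by TessDeleteText(str_ptr), since the helper copies the text and does not free the native buffer.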

9.3 Display the recognized Text!

Let’s work top-down.
We start by adding a field for the image that needs to be recognized and a Recognize() method to the TesseractDemoScript class, which appends the recognized text to the display.

[SerializeField] private Texture2D imageToRecognize;

private void Recognize()
{
    display.text += "\n" +
        _tesseractDriver.Recognize(imageToRecognize);
}

But who calls this Recognize()? And TesseractDriver still needs its own Recognize function, which we haven’t implemented yet. Recognize() must only be called after setup is done, so we call it from TesseractDemoScript.Start():

private void Start()
{
    _tesseractDriver = new TesseractDriver();
    display.text = _tesseractDriver.CheckTessVersion();
    _tesseractDriver.Setup();
    Recognize();
    display.text += "\n" + _tesseractDriver.GetErrorMessage();
}

Finally, TesseractDriver.Recognize() is a simple function that delegates the call to TesseractWrapper.Recognize():

public string Recognize(Texture2D imageToRecognize)
{
    return _tesseract.Recognize(imageToRecognize);
}

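Don’t forget to assign the image to the imageToRecognize field of TesseractDemoScript in the Inspector. The texture also needs Read/Write Enabled in its import settings, since GetPixels32 only works on readable textures.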

Then just hit play and let the magic happen…

[Commit: 3cfad88]

10. Highlighting Recognized Words

To highlight the recognized words, we need to know where they are, so we ask Tesseract for the words it found (TessBaseAPIGetWords).

[DllImport(TesseractDllName)]
private static extern IntPtr TessBaseAPIGetWords(IntPtr handle, IntPtr pixa);

With the above function exposed in TesseractWrapper, we can find the bounding boxes of the recognized words.
It returns a structure from the Leptonica library called Boxa… The Leptonica docs might be helpful…

So we need to replicate these structures in our code, keeping in mind that Leptonica is written in C, with all of C’s love for pointers…

using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public struct Box
{
    public Int32 x;
    public Int32 y;
    public Int32 w;
    public Int32 h;
    public Int32 refcount;
}
--------------------------
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public struct Boxa
{
    public Int32 n;
    public Int32 nalloc;
    public Int32 refcount;
    public IntPtr box;
}

I know the code looks congested… it’s just two short structs… but I hope you noticed me setting the StructLayout to Sequential: this tells the runtime that the fields are laid out in memory in the order they are declared, which is what lets them be marshaled properly. Also notice me explicitly using 32-bit ints (Int32) to match Leptonica’s 32-bit integer fields, while the box pointer is an IntPtr whose size follows the platform… noticed..? Good…
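If you want to double-check that the managed structs line up with Leptonica’s, Marshal can report their sizes and field offsets. The sketch below repeats the two structs inside a hypothetical LayoutSketch namespace (so it compiles on its own) and adds a hypothetical LayoutReport helper. The expected values assume standard C alignment rules in your Leptonica build.

```csharp
using System;
using System.Runtime.InteropServices;

namespace LayoutSketch
{
    // Same layouts as Box and Boxa above, repeated so this sketch is self-contained.
    [StructLayout(LayoutKind.Sequential)]
    public struct Box
    {
        public Int32 x;
        public Int32 y;
        public Int32 w;
        public Int32 h;
        public Int32 refcount;
    }

    [StructLayout(LayoutKind.Sequential)]
    public struct Boxa
    {
        public Int32 n;
        public Int32 nalloc;
        public Int32 refcount;
        public IntPtr box;
    }

    public static class LayoutReport
    {
        // Unmanaged size of Box in bytes.
        public static int BoxSize => Marshal.SizeOf<Box>();

        // Unmanaged size of Boxa in bytes, including any alignment padding.
        public static int BoxaSize => Marshal.SizeOf<Boxa>();

        // Byte offset of the Box** field inside Boxa.
        public static int BoxaPointerOffset =>
            Marshal.OffsetOf<Boxa>(nameof(Boxa.box)).ToInt32();
    }
}
```

In a 64-bit process Box should be 20 bytes (five 4-byte ints) and Boxa 24 bytes, with box at offset 16: after three 4-byte ints (12 bytes) the marshaler inserts 4 bytes of padding so the pointer is 8-byte aligned, just as a C compiler would. In a 32-bit process Boxa is 16 bytes with box at offset 12.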

So in the TesseractWrapper class, just after the TessBaseAPIRecognize call and before the TessBaseAPIGetUTF8Text call, add this block of code…

int pointerSize = Marshal.SizeOf(typeof(IntPtr));
IntPtr intPtr = TessBaseAPIGetWords(_tessHandle, IntPtr.Zero);
Boxa boxa = Marshal.PtrToStructure<Boxa>(intPtr);
Box[] boxes = new Box[boxa.n];
for (int index = 0; index < boxes.Length; index++)
{
    IntPtr boxPtr = Marshal.ReadIntPtr(boxa.box, index * pointerSize);
    boxes[index] = Marshal.PtrToStructure<Box>(boxPtr);
    Box box = boxes[index];
    DrawLines(texture,
        new Rect(box.x, texture.height - box.y - box.h, box.w, box.h),
        Color.green);
}

Pointer warning: Anyone allergic to pointers may look away… :P
So we start by getting a pointer to the Boxa structure that TessBaseAPIGetWords returns. We marshal it into a Boxa variable (boxa), then create an array of boxes of size boxa.n. This boxa contains a Box** field called box, i.e. a pointer to the first element of an array of pointers to Box… You still with me… Nice… With that, all we need to do is read each pointer from that array, go to the memory location it points to, and convert it back into a Box structure… That’s what the code does…. One more detail: Tesseract measures y from the top of the image while Unity’s textures start at the bottom, hence the texture.height - box.y - box.h passed to DrawLines.
…That went better than I expected…
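To convince yourself the Box** walk works without running Tesseract, you can build a fake Boxa in unmanaged memory and read it back with the same ReadIntPtr/PtrToStructure steps. Everything here is hypothetical scaffolding: the BoxaWalkSketch namespace, the BoxaReader.ReadBoxes helper, and the FakeBoxa class standing in for Leptonica’s allocator. It uses IntPtr.Size, which equals Marshal.SizeOf(typeof(IntPtr)) from the snippet above.

```csharp
using System;
using System.Runtime.InteropServices;

namespace BoxaWalkSketch
{
    [StructLayout(LayoutKind.Sequential)]
    public struct Box { public Int32 x; public Int32 y; public Int32 w; public Int32 h; public Int32 refcount; }

    [StructLayout(LayoutKind.Sequential)]
    public struct Boxa { public Int32 n; public Int32 nalloc; public Int32 refcount; public IntPtr box; }

    public static class BoxaReader
    {
        // boxa.box is a Box**: an array of n pointers, each pointing at one Box.
        public static Box[] ReadBoxes(IntPtr boxaPtr)
        {
            Boxa boxa = Marshal.PtrToStructure<Boxa>(boxaPtr);
            Box[] boxes = new Box[boxa.n];
            for (int i = 0; i < boxa.n; i++)
            {
                // Read the i-th pointer, then the Box it points to.
                IntPtr boxPtr = Marshal.ReadIntPtr(boxa.box, i * IntPtr.Size);
                boxes[i] = Marshal.PtrToStructure<Box>(boxPtr);
            }
            return boxes;
        }
    }

    // Test-only stand-in for Leptonica: lays out a Boxa in unmanaged memory.
    public sealed class FakeBoxa : IDisposable
    {
        private readonly IntPtr[] _boxPtrs;
        private readonly IntPtr _array;
        public IntPtr Pointer { get; }

        public FakeBoxa(Box[] boxes)
        {
            _boxPtrs = new IntPtr[boxes.Length];
            _array = Marshal.AllocHGlobal(Math.Max(1, boxes.Length) * IntPtr.Size);
            for (int i = 0; i < boxes.Length; i++)
            {
                // Each Box lives in its own allocation; the array stores its address.
                _boxPtrs[i] = Marshal.AllocHGlobal(Marshal.SizeOf<Box>());
                Marshal.StructureToPtr(boxes[i], _boxPtrs[i], false);
                Marshal.WriteIntPtr(_array, i * IntPtr.Size, _boxPtrs[i]);
            }

            var header = new Boxa { n = boxes.Length, nalloc = boxes.Length, refcount = 1, box = _array };
            Pointer = Marshal.AllocHGlobal(Marshal.SizeOf<Boxa>());
            Marshal.StructureToPtr(header, Pointer, false);
        }

        public void Dispose()
        {
            foreach (IntPtr p in _boxPtrs)
                Marshal.FreeHGlobal(p);
            Marshal.FreeHGlobal(_array);
            Marshal.FreeHGlobal(Pointer);
        }
    }
}
```

FakeBoxa frees its own memory in Dispose because it allocated it. A real Boxa returned by TessBaseAPIGetWords is allocated by Leptonica, so it would have to be released through Leptonica (boxaDestroy) rather than Marshal.FreeHGlobal.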

Once we have the bounding boxes of the words, all we need to do is draw boxes around them… The DrawLines function we called in the above snippet needs to be implemented now… Pretty self-explanatory… just drawing 4 lines.

private void DrawLines(Texture2D texture, Rect boundingRect, Color color,
    int thickness = 3)
{
    int x1 = (int) boundingRect.x;
    int x2 = (int) (boundingRect.x + boundingRect.width);
    int y1 = (int) boundingRect.y;
    int y2 = (int) (boundingRect.y + boundingRect.height);

    for (int x = x1; x <= x2; x++)
    {
        for (int i = 0; i < thickness; i++)
        {
            texture.SetPixel(x, y1 + i, color);
            texture.SetPixel(x, y2 - i, color);
        }
    }

    for (int y = y1; y <= y2; y++)
    {
        for (int i = 0; i < thickness; i++)
        {
            texture.SetPixel(x1 + i, y, color);
            texture.SetPixel(x2 - i, y, color);
        }
    }

    texture.Apply();
}

[Commit: fa13e4e]
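The two pieces of geometry here, flipping Tesseract’s top-left-origin y into Unity’s bottom-left texture space and drawing a rectangle’s outline, can be checked without a texture. In this minimal sketch a bool[,] grid indexed [x, y] stands in for the Texture2D; OutlineSketch, ToBottomUpY and DrawRect are hypothetical names. Unlike SetPixel, which clamps or repeats out-of-range coordinates depending on the texture’s wrap mode, the sketch simply ignores writes outside the grid.

```csharp
using System;

public static class OutlineSketch
{
    // Tesseract boxes use a top-left origin; Unity textures use a bottom-left
    // origin, so the box's bottom edge in texture space is height - y - h.
    public static int ToBottomUpY(int y, int h, int textureHeight) => textureHeight - y - h;

    // Same edge logic as DrawLines: both x + w and y + h are included.
    public static void DrawRect(bool[,] grid, int x, int y, int w, int h, int thickness = 1)
    {
        int x1 = x, x2 = x + w, y1 = y, y2 = y + h;

        // Bottom and top edges.
        for (int px = x1; px <= x2; px++)
        {
            for (int i = 0; i < thickness; i++)
            {
                Set(grid, px, y1 + i);
                Set(grid, px, y2 - i);
            }
        }

        // Left and right edges.
        for (int py = y1; py <= y2; py++)
        {
            for (int i = 0; i < thickness; i++)
            {
                Set(grid, x1 + i, py);
                Set(grid, x2 - i, py);
            }
        }
    }

    private static void Set(bool[,] grid, int x, int y)
    {
        if (x >= 0 && x < grid.GetLength(0) && y >= 0 && y < grid.GetLength(1))
            grid[x, y] = true;
    }
}
```

For example, on a 10-pixel-tall texture a word box with y = 1 and h = 3 has its bottom edge at 10 - 1 - 3 = 6, and its outline spans rows 6 to 9.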

Finally, time to display the new texture.
First, let’s set up the scene by adding a Raw Image that we will use as the display. You can position it and organize the scene however you want… provided your Raw Image and Text display are both visible in Game mode.

We will add a GetHighlightedTexture() method to TesseractDriver, which delegates the call to TesseractWrapper.GetHighlightedTexture():

public Texture2D GetHighlightedTexture()
{
    return _tesseract.GetHighlightedTexture();
}

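Now let’s implement TesseractWrapper.GetHighlightedTexture(). We start by adding a field to store the highlighted/output texture, assign the texture parameter to it at the start of TesseractWrapper.Recognize(), and replace all other uses of texture in that function with the new field _highlightedTexture. Finally, we top it off with a getter for _highlightedTexture. Since _highlightedTexture holds a reference to the texture passed in rather than a copy, the boxes are drawn directly onto that texture, which is why the cleaned-up TesseractDemoScript below hands Recognize a copy of imageToRecognize.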

private Texture2D _highlightedTexture;

public string Recognize(Texture2D texture)
{
    if (_tessHandle.Equals(IntPtr.Zero))
        return null;

    _highlightedTexture = texture;

    int width = _highlightedTexture.width;
    int height = _highlightedTexture.height;

    ...
}

public Texture2D GetHighlightedTexture()
{
    return _highlightedTexture;
}

We will also clean up and streamline the TesseractDemoScript a bit…

using UnityEngine;
using UnityEngine.UI;

public class TesseractDemoScript : MonoBehaviour
{
    [SerializeField] private Texture2D imageToRecognize;
    [SerializeField] private Text displayText;
    [SerializeField] private RawImage outputImage;
    private TesseractDriver _tesseractDriver;
    private string _text = "";

    private void Start()
    {
        // Copy the source image so the highlight boxes are drawn on the copy,
        // not on the imported asset.
        Texture2D texture = new Texture2D(imageToRecognize.width,
            imageToRecognize.height, TextureFormat.ARGB32, false);
        texture.SetPixels32(imageToRecognize.GetPixels32());
        texture.Apply();

        _tesseractDriver = new TesseractDriver();
        Recognize(texture);
        SetImageDisplay();
    }

    private void Recognize(Texture2D outputTexture)
    {
        ClearTextDisplay();
        AddToTextDisplay(_tesseractDriver.CheckTessVersion());
        _tesseractDriver.Setup();
        AddToTextDisplay(_tesseractDriver.Recognize(outputTexture));
        AddToTextDisplay(_tesseractDriver.GetErrorMessage(), true);
    }

    private void ClearTextDisplay()
    {
        _text = "";
    }

    private void AddToTextDisplay(string text, bool isError = false)
    {
        if (string.IsNullOrWhiteSpace(text)) return;

        // Check the buffered _text (displayText.text is only refreshed in
        // LateUpdate) to decide whether a newline separator is needed.
        _text += (string.IsNullOrWhiteSpace(_text) ? "" : "\n") + text;

        if (isError)
            Debug.LogError(text);
        else
            Debug.Log(text);
    }

    private void LateUpdate()
    {
        displayText.text = _text;
    }

    private void SetImageDisplay()
    {
        Texture2D highlighted = _tesseractDriver.GetHighlightedTexture();
        RectTransform rectTransform = outputImage.GetComponent<RectTransform>();

        // Resize the RawImage vertically to keep the texture's aspect ratio.
        rectTransform.SetSizeWithCurrentAnchors(
            RectTransform.Axis.Vertical,
            rectTransform.rect.width * highlighted.height / highlighted.width);

        outputImage.texture = highlighted;
    }
}

With the correct inspector fields assigned…

…Let it Rip!

OK, maybe the green isn’t that prominent… Let’s try red next time :P
[Commit: b8eff52]

11. Filter words based on confidence

So our project works perfectly fine, but those of you who expect to read a lot of imperfect textures may want a filtering mechanism to get rid of garbage recognitions. To demonstrate this we will use a more complicated texture and a very high confidence threshold for filtering (60%). In real-life situations a lower threshold, like 20%, or a dynamic threshold should be used.

Let’s set up the required pieces inside TesseractWrapper:

private const float MinimumConfidence = 60;

[DllImport(TesseractDllName)]
private static extern IntPtr TessBaseAPIAllWordConfidences(IntPtr handle);

We need the threshold (MinimumConfidence) and the exposed function TessBaseAPIAllWordConfidences, which gives us a confidence level (from 0 to 100) for each word. To use these confidences, we call this exposed function after the TessBaseAPIRecognize call in TesseractWrapper.Recognize(). Note that it returns a pointer to the first element of an Int32 array terminated by -1, so you need to loop until you hit -1. The array is allocated by Tesseract, so it should be released once the values have been copied out; the C API provides TessDeleteIntArray for this, which you would expose with a DllImport just like TessDeleteText.

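...
// List<int> needs `using System.Collections.Generic;` at the top of TesseractWrapper.
IntPtr confidencesPointer = TessBaseAPIAllWordConfidences(_tessHandle);
int i = 0;
List<int> confidence = new List<int>();

// Read one Int32 (4 bytes) at a time until the -1 terminator.
while (true)
{
    int tempConfidence = Marshal.ReadInt32(confidencesPointer, i * 4);

    if (tempConfidence == -1) break;

    i++;
    confidence.Add(tempConfidence);
}
...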
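The -1-terminated array can be exercised without Tesseract by writing one into unmanaged memory yourself. This minimal sketch wraps the loop above in a hypothetical ConfidenceReader.ReadUntilTerminator helper and adds a null-pointer check, an extra safety measure not present in the original loop.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class ConfidenceReader
{
    // Reads Int32 values until the -1 terminator, the same convention
    // TessBaseAPIAllWordConfidences uses for its result.
    public static List<int> ReadUntilTerminator(IntPtr ptr)
    {
        var values = new List<int>();
        if (ptr == IntPtr.Zero)
            return values;

        for (int i = 0; ; i++)
        {
            int value = Marshal.ReadInt32(ptr, i * sizeof(int));
            if (value == -1)
                break;
            values.Add(value);
        }
        return values;
    }
}
```

For a native array { 87, 51, 96, -1 } it returns the three confidences 87, 51 and 96.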

Continuing with TesseractWrapper.Recognize(), we need to filter out the words whose confidence is below the threshold and display the remaining ones.

...
for (int index = 0; index < boxes.Length; index++)
{
    // Only highlight words that meet the confidence threshold.
    if (confidence[index] >= MinimumConfidence)
    {
        IntPtr boxPtr = Marshal.ReadIntPtr(boxa.box, index * pointerSize);
        boxes[index] = Marshal.PtrToStructure<Box>(boxPtr);
        Box box = boxes[index];
        DrawLines(_highlightedTexture,
            new Rect(box.x, _highlightedTexture.height - box.y - box.h,
                box.w, box.h),
            Color.green);
    }
}

IntPtr stringPtr = TessBaseAPIGetUTF8Text(_tessHandle);
Marshal.FreeHGlobal(imagePtr);
if (stringPtr.Equals(IntPtr.Zero))
    return null;

#if UNITY_EDITOR_WIN || UNITY_STANDALONE_WIN
string recognizedText = Marshal.PtrToStringAnsi(stringPtr);
#else
string recognizedText = Marshal.PtrToStringAuto(stringPtr);
#endif

TessBaseAPIClear(_tessHandle);
TessDeleteText(stringPtr);

string[] words = recognizedText.Split(new[] {' ', '\n'},
    StringSplitOptions.RemoveEmptyEntries);

// StringBuilder needs `using System.Text;` at the top of TesseractWrapper.
StringBuilder result = new StringBuilder();

// Words, boxes and confidences are paired by index; stop at the shortest
// list in case splitting the text yields a different number of words.
int wordCount = Math.Min(boxes.Length, Math.Min(words.Length, confidence.Count));
for (i = 0; i < wordCount; i++)
{
    Debug.Log(words[i] + " -> " + confidence[i]);
    if (confidence[i] >= MinimumConfidence)
    {
        result.Append(words[i]);
        result.Append(" ");
    }
}

return result.ToString();
}

If you check the corresponding logs for words and confidences, you will see it detects don’t as dow’ with a confidence of 51%, and so on… Such recognitions are filtered out by this mechanism. You can play around with the threshold value to fine-tune your solution.
[Commit: b041356]
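If you want to experiment with the threshold outside Unity, the filtering step can be isolated into a pure function. This is a sketch, not the repository’s code: ConfidenceFilter and Filter are hypothetical names. Like the bounded loop in Recognize(), it stops at the shorter of the two lists, since splitting the text on spaces and newlines is not guaranteed to yield exactly one entry per confidence value.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ConfidenceFilter
{
    // Keeps the words whose confidence meets the threshold, pairing words
    // and confidences by index and joining the survivors with single spaces.
    public static string Filter(string recognizedText, IList<int> confidences, float minimumConfidence)
    {
        string[] words = recognizedText.Split(new[] { ' ', '\n' },
            StringSplitOptions.RemoveEmptyEntries);
        int count = Math.Min(words.Length, confidences.Count);

        var result = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            if (confidences[i] >= minimumConfidence)
            {
                if (result.Length > 0)
                    result.Append(' ');
                result.Append(words[i]);
            }
        }
        return result.ToString();
    }
}
```

For example, with the text "I dow' like\nthis", confidences { 96, 51, 88, 90 } and a threshold of 60, only "dow'" is dropped and the result is "I like this".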

You’re still here? Talk about perseverance…
Although the solution is mostly complete at this point, I would like to demonstrate how to set up the tessdata for Android, where the streaming assets are packed inside the APK and therefore can’t be accessed or written to as regular files. But I can’t drag this article out any longer, so we will cover it in the final part.

< Part 1 | 2 | Part 3 >
