Offline OCR using Tesseract in Unity… Part 3

Neelarghya
Published in XRPractices
Mar 12, 2020

Apologies for the super late update… Let’s configure it for Android…
In case you missed setting up the project and getting it to work on your PC, you can follow Part 1 and Part 2 of this guide, or clone the repository and check out the commit:

git clone git@github.com:Neelarghya/tesseract-unity.git
cd tesseract-unity
git checkout b041356

Assuming we are in sync let’s continue…

12. Android-Specific Setup

12.1 Basic platform switch

Open Build Settings > Add the Main scene > Select Android > Switch Platform.
If you don’t have the Android build platform downloaded or the SDK isn’t set up, follow this first.

12.2 Setting up Player Settings

Select Player Settings. In theory you should configure a lot of things here to suit your app, but the only setting necessary to build is
Player Settings > Other Settings > Package Name (turn it to lower case and you should be fine…)

[Commit: 8eee647]

With these settings done you can Build for Android!
…Quite an achievement right? If it were that easy I wouldn’t be writing this…
If you actually tried it you would be greeted with these lovelies…

Tesseract version: 4.1.0
TessAPIInit failed. Output: -1

If you are getting this, it means the initialization of Tesseract failed. It mostly occurs due to an issue with the TessData. The fact that the Tesseract version is displayed tells us that the dll/so files were found; it’s a simple sanity check.
But if instead you are greeted with…

System.DllNotFoundException: libtesseract.so
at (wrapper managed-to-native) TesseractWrapper.TessBaseAPICreate()

It’s an issue with the dll/so files, and Unity does a poor job of telling you what the problem is. The file named in the log (libtesseract.so in our case) isn’t always the one that wasn’t found; most of the time it’s one of its dependencies. So how do you find out the dependencies of the libraries you are using..? …From the provider’s docs, or with external tools (like Dependency Walker for Windows etc). The better approach is still to install whatever you need into a Docker container of the required platform and extract the library files… (That’s how I got hold of the original dlls)

If you do face this issue (DllNotFoundException), it might be caused by a missing libc++_shared.so. For some setups it comes pre-configured with Unity’s base libraries and gets packaged during the build if any library requires C++ support; if not, you need to add it yourself. (And my deepest apologies to those who went through the previous articles and were faced with this issue… I should have written this sooner… If you still have issues do reach out…)
I’m providing the .so file anyway… [Commit: 74a73c9]

Ok, enough rabbit-holing for unfound dlls, back to the original issue of not being able to initialize Tesseract. This occurs because TessData is inside StreamingAssets, which on Android is packed into the APK and is not a directory on the file system. Tesseract needs TessData as regular, readable/writable files on disk, which isn’t the case with StreamingAssets on Android. Thus we need to move it to storage, a.k.a. Persistent Data (Application.persistentDataPath).
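To see the difference for yourself, a small throwaway script can log the paths involved. This is only a sketch: PathLogger is a hypothetical name, and it assumes the language file is tessdata/eng.traineddata under StreamingAssets.

```csharp
using System.IO;
using UnityEngine;

// Hypothetical debugging helper, not part of the repository.
public class PathLogger : MonoBehaviour
{
    private void Start()
    {
        // On an Android device this is a jar:file:// URL pointing into the APK.
        Debug.Log("streamingAssetsPath: " + Application.streamingAssetsPath);

        // This one is a regular directory on device storage.
        Debug.Log("persistentDataPath: " + Application.persistentDataPath);

        // Assumed file name; plain file APIs cannot look inside the APK.
        string trained = Application.streamingAssetsPath + "/tessdata/eng.traineddata";
        Debug.Log("File.Exists on the StreamingAssets copy: " + File.Exists(trained));
    }
}
```

Attach it to any object in the scene and compare the log in the editor with the log on the device; on Android the last line should report that the file is not visible to File.Exists.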

13. Injecting & Extracting TessData

Building the Zip-UnZip Mechanism

Now in order to move TessData to Persistent Data, we will
1. Zip the TessData folder
2. Copy it to Persistent Data path
3. UnZip it

13.1 Zip the TessData folder

Simply go to StreamingAssets in your project directory and compress the tessdata folder with your favoured compression format; I will be using tgz.
Note: the editor will still be using the one from StreamingAssets, so don’t delete it yet (yes, it causes some redundancy and inflates the APK size).

cd Assets/StreamingAssets
tar -czvf tessdata.tgz tessdata

13.2 Copy it to Persistent Data path

Copying the file at runtime is a bit trickier…
In the TesseractDriver let’s add a function (CopyAllFilesToPersistentData) to copy this newly created zip file

private async void CopyAllFilesToPersistentData(List<string> fileNames, UnityAction onSetupComplete)
{
    // On Android, StreamingAssets live inside the APK, hence the jar: URL.
    string fromPath = "jar:file://" + Application.dataPath + "!/assets/";
    string toPath = Application.persistentDataPath + "/";

    foreach (string fileName in fileNames)
    {
        if (!File.Exists(toPath + fileName))
        {
            Debug.Log("Copying from " + fromPath + fileName +
                      " to " + toPath);
            WWW www = new WWW(fromPath + fileName);

            // Poll roughly once per frame until the read finishes.
            while (!www.isDone)
            {
                await Task.Delay(
                    TimeSpan.FromSeconds(Time.deltaTime));
            }

            // Don't write an empty file if the read failed.
            if (!string.IsNullOrEmpty(www.error))
            {
                Debug.LogError("Copy failed for " + fileName + ": " + www.error);
                www.Dispose();
                continue;
            }

            File.WriteAllBytes(toPath + fileName, www.bytes);
            Debug.Log("File copy done");
            www.Dispose();
            www = null;
        }
        else
        {
            Debug.Log("File exists! " + toPath + fileName);
        }

        UnZipData(fileName);
    }

    OcrSetup(onSetupComplete);
}

It’s a simple copy script that copies all files in the fileNames list from Android’s StreamingAssets to the persistent data path, using Unity’s WWW followed by a File.WriteAllBytes.
You might have noticed it also takes an onSetupComplete parameter. It is essentially a callback invoked once all setup is complete; in our case it’s a recognize call to Tesseract.
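WWW is marked obsolete in newer Unity versions. If you would rather avoid it, the same copy can be sketched with UnityWebRequest inside a coroutine. Treat this as a rough alternative, not what the repository does: StreamingAssetCopier and CopyFile are hypothetical names, it needs some MonoBehaviour to start the coroutine, and it is meant for the Android player, where Application.streamingAssetsPath is already a jar:file:// URL.

```csharp
using System;
using System.Collections;
using System.IO;
using UnityEngine;
using UnityEngine.Networking;

// Hypothetical helper, not part of the repository.
public static class StreamingAssetCopier
{
    // Copies one file from StreamingAssets to persistentDataPath and
    // reports success or failure through onDone.
    public static IEnumerator CopyFile(string fileName, Action<bool> onDone)
    {
        string fromUrl = Application.streamingAssetsPath + "/" + fileName;
        string toPath = Path.Combine(Application.persistentDataPath, fileName);

        using (UnityWebRequest request = UnityWebRequest.Get(fromUrl))
        {
            // Resumes here once the whole file has been read.
            yield return request.SendWebRequest();

            if (!string.IsNullOrEmpty(request.error))
            {
                Debug.LogError("Copy failed for " + fileName + ": " + request.error);
                onDone?.Invoke(false);
                yield break;
            }

            File.WriteAllBytes(toPath, request.downloadHandler.data);
        }

        onDone?.Invoke(true);
    }
}
```

From a MonoBehaviour it could be started with StartCoroutine(StreamingAssetCopier.CopyFile("tessdata.tgz", ok => Debug.Log("Copied: " + ok))). The coroutine removes the manual polling loop, at the cost of needing a MonoBehaviour to host it.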

13.3 UnZip it

In order to Zip/UnZip we can use any library… You are also free to write your own; I will be using icsharpcode/SharpZipLib.
Let’s start by adding its dlls to the Plugins folder.

Next we will add a class UnZipUtil to take care of the zipping and unzipping:

using System.IO;
using ICSharpCode.SharpZipLib.GZip;
using ICSharpCode.SharpZipLib.Tar;

public class UnZipUtil
{
    public static void CreateTarGZ_FromDirectory(string tgzFilename,
        string sourceDirectory)
    {
        Stream outStream = File.Create(tgzFilename);
        Stream gzoStream = new GZipOutputStream(outStream);
        TarArchive tarArchive =
            TarArchive.CreateOutputTarArchive(gzoStream);

        // Note that the RootPath is currently case
        // sensitive and must be forward slashes e.g. "c:/temp"
        // and must not end with a slash, otherwise cuts
        // off first char of filename
        // This is scheduled for fix in next release
        tarArchive.RootPath =
            sourceDirectory.Replace('\\', '/');
        if (tarArchive.RootPath.EndsWith("/"))
            tarArchive.RootPath =
                tarArchive.RootPath
                    .Remove(tarArchive.RootPath.Length - 1);

        AddDirectoryFilesToTar(tarArchive, sourceDirectory, true);

        tarArchive.Close();
    }

    public static void AddDirectoryFilesToTar(TarArchive tarArchive,
        string sourceDirectory, bool recurse)
    {
        // Optionally, write an entry for the directory itself.
        // Specify false for recursion here
        // if we will add the directory's files individually.
        TarEntry tarEntry =
            TarEntry.CreateEntryFromFile(sourceDirectory);
        tarArchive.WriteEntry(tarEntry, false);

        // Write each file to the tar.
        string[] filenames = Directory.GetFiles(sourceDirectory);
        foreach (string filename in filenames)
        {
            tarEntry = TarEntry.CreateEntryFromFile(filename);
            tarArchive.WriteEntry(tarEntry, true);
        }

        if (recurse)
        {
            string[] directories =
                Directory.GetDirectories(sourceDirectory);
            foreach (string directory in directories)
                AddDirectoryFilesToTar(tarArchive,
                    directory, recurse);
        }
    }

    public static void ExtractTGZ(string gzArchiveName,
        string destFolder)
    {
        Stream inStream = File.OpenRead(gzArchiveName);
        Stream gzipStream = new GZipInputStream(inStream);

        TarArchive tarArchive =
            TarArchive.CreateInputTarArchive(gzipStream);
        tarArchive.ExtractContents(destFolder);
        tarArchive.Close();

        gzipStream.Close();
        inStream.Close();
    }
}

There are 3 functions, namely CreateTarGZ_FromDirectory, AddDirectoryFilesToTar and ExtractTGZ.
For our use case we will only be using ExtractTGZ, because we only need the extraction. Why keep the rest? You might feel like automating the build process by zipping in a pre-build script, thus removing the manual step of compressing the files.

14. Finally time to patch everything together

In CopyAllFilesToPersistentData you might have noticed a placeholder function, UnZipData(). It’s time to add it to the TesseractDriver:

private void UnZipData(string fileName)
{
    if (File.Exists(
        Application.persistentDataPath + "/" + fileName))
    {
        UnZipUtil.ExtractTGZ(
            Application.persistentDataPath + "/" + fileName,
            Application.persistentDataPath);
        Debug.Log("UnZipping Done");
    }
    else
    {
        Debug.LogError(fileName + " not found!");
    }
}
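As written, the archive is extracted again on every launch, even when the data is already in place. If that bothers you, a small check can skip the extraction. This is only a sketch: TessDataCheck and IsExtracted are hypothetical names, and it assumes the archive unpacks to a tessdata folder containing a language file named like eng.traineddata.

```csharp
using System.IO;

// Hypothetical helper, not part of the repository.
public static class TessDataCheck
{
    // Returns true when <root>/tessdata/<language>.traineddata already exists.
    public static bool IsExtracted(string root, string language)
    {
        string trained = Path.Combine(root, "tessdata", language + ".traineddata");
        return File.Exists(trained);
    }
}
```

UnZipData could then return early when TessDataCheck.IsExtracted(Application.persistentDataPath, "eng") is true. The trade-off is that data shipped with a later app update will not replace the old copy unless you also delete or version the extracted folder.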

Next we will need a few compiler switches and some setup code in TesseractDriver to make everything work. Note: the fileNames list must contain your zip file name (tessdata.tgz in our case).

private static readonly List<string> fileNames = new List<string> {"tessdata.tgz"};

public void Setup(UnityAction onSetupComplete)
{
#if UNITY_EDITOR
    OcrSetup(onSetupComplete);
#elif UNITY_ANDROID
    CopyAllFilesToPersistentData(fileNames, onSetupComplete);
#else
    OcrSetup(onSetupComplete);
#endif
}

public void OcrSetup(UnityAction onSetupComplete)
{
    _tesseract = new TesseractWrapper();

#if UNITY_EDITOR
    string datapath = Path.Combine(Application.streamingAssetsPath, "tessdata");
#elif UNITY_ANDROID
    string datapath = Application.persistentDataPath + "/tessdata/";
#else
    string datapath = Path.Combine(Application.streamingAssetsPath, "tessdata");
#endif

    if (_tesseract.Init("eng", datapath))
    {
        Debug.Log("Init Successful");
        onSetupComplete?.Invoke();
    }
    else
    {
        Debug.LogError(_tesseract.GetErrorMessage());
    }
}

And a bit of the same in TesseractDemoScript:

private Texture2D _texture;

private void Start()
{
    Texture2D texture = new Texture2D(imageToRecognize.width, imageToRecognize.height, TextureFormat.ARGB32, false);
    texture.SetPixels32(imageToRecognize.GetPixels32());
    texture.Apply();

    _tesseractDriver = new TesseractDriver();
    Recoginze(texture);
}

private void Recoginze(Texture2D outputTexture)
{
    _texture = outputTexture;
    ClearTextDisplay();
    AddToTextDisplay(_tesseractDriver.CheckTessVersion());
    _tesseractDriver.Setup(OnSetupCompleteRecognize);
}

private void OnSetupCompleteRecognize()
{
    AddToTextDisplay(_tesseractDriver.Recognize(_texture));
    AddToTextDisplay(_tesseractDriver.GetErrorMessage(), true);
    SetImageDisplay();
}

[Commit: bcac764]

What’s Next?

Well, deploy and it will work… theoretically speaking… :P

As for improvements… Try adding a camera feed texture to it and have fun.
If you wanna get technical though, try out the pre-build script to optimize the build process, deleting the unzipped version before the build to save on APK size.
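For the camera feed idea, a starting point could look like the sketch below. It grabs a single frame from the default camera with Unity’s WebCamTexture and turns it into a Texture2D, the same way the demo script copies imageToRecognize. CameraFrameGrabber and FrameReady are hypothetical names, and the sketch assumes the camera permission has already been granted; without it the wait loop never finishes.

```csharp
using System;
using System.Collections;
using UnityEngine;

// Hypothetical component, not part of the repository.
public class CameraFrameGrabber : MonoBehaviour
{
    // Invoked once with the captured frame.
    public Action<Texture2D> FrameReady;

    private WebCamTexture _webCam;

    private IEnumerator Start()
    {
        _webCam = new WebCamTexture();
        _webCam.Play();

        // The reported size is only meaningful once a real frame has arrived.
        while (!_webCam.didUpdateThisFrame)
        {
            yield return null;
        }

        // Copy the current camera frame into a regular texture.
        Texture2D frame = new Texture2D(_webCam.width, _webCam.height,
            TextureFormat.ARGB32, false);
        frame.SetPixels32(_webCam.GetPixels32());
        frame.Apply();

        FrameReady?.Invoke(frame);
    }

    private void OnDestroy()
    {
        if (_webCam != null)
        {
            _webCam.Stop();
        }
    }
}
```

The resulting texture can be handed to the same recognition path the demo script uses for its static image. Running OCR on every frame is likely too slow, so grabbing a frame on a button press is a reasonable first step.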

Anyways… Thanks for sticking around for so long… Until next time…
