Offline OCR using Tesseract in Unity… Part 3
Apologies for the super late update… Let’s configure it for Android…
If you missed setting up the project and getting it to work on your PC, you can follow Part 1 and Part 2 of this guide, or clone the repository and check out the commit:
git clone git@github.com:Neelarghya/tesseract-unity.git
git checkout b041356
Assuming we are in sync let’s continue…
12. Android-Specific Setup
12.1 Basic platform switch
Open Build Settings > Add the Main scene > Select Android > Switch Platform.
If you don’t have the Android build platform downloaded, or the SDK is not set up, follow this.
12.2 Setting up Player Settings
Select Player Settings. In theory you should configure a lot of things here to suit your app, but the only setting you need in order to build is
Player Settings > Other Settings > Package Name (turn it to lower case and you should be fine…)
[Commit: 8eee647]
With these settings done you can Build for Android!
…Quite an achievement right? If it were that easy I wouldn’t be writing this…
If you actually tried it you would be greeted with these lovelies…
Tesseract version: 4.1.0
TessAPIInit failed. Output: -1
If you are getting this, the initialization of Tesseract failed. That is mostly caused by an issue with the TessData. The fact that the Tesseract version is displayed tells us that the dll/so files were found; it’s a simple sanity check.
But instead if you are greeted with…
System.DllNotFoundException: libtesseract.so
at (wrapper managed-to-native) TesseractWrapper.TessBaseAPICreate()
It’s an issue with the dll/so files. And Unity does a crappy job of telling you what the problem is. The file shown in the log (libtesseract.so in our case) is not always the one that wasn’t found; most of the time it’s one of its dependencies. So how do you find out the dependencies of the DLLs you are using..? …From the provider’s docs, or with external tools (like DependencyWalker for Windows, etc.). The better approach is still to install whatever you want into a Docker container of the required platform and extract the library files… (That’s how I got hold of the original dlls)
If you do face this issue (DllNotFoundException), it might be caused by a missing libc++_shared.so. For some it’s pre-configured with Unity’s base dlls and gets packaged during the build if any library requires C++ support; if not, you need to add it yourself. (And my deepest apologies to those who went through the previous articles and were faced with this issue… I should have written this sooner… If you still have issues do reach out…)
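On Linux or macOS, one way to see what a .so depends on is to read its dynamic section. The sketch below is only an illustration: it assumes readelf from binutils is on your PATH (as far as I know the Android NDK ships an equivalent llvm-readelf), and the function names parse_needed and list_needed, as well as the path in the usage comment, are made up for this example.

```shell
# Print the shared libraries that a .so declares as direct dependencies.

# parse_needed: reads `readelf -d` output on stdin and keeps only the
# library names from the (NEEDED) entries.
parse_needed() {
  sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# list_needed: runs readelf on the file given as $1 and filters its output.
list_needed() {
  readelf -d "$1" | parse_needed
}

# Usage (the path is illustrative, adjust it to where your plugin lives):
#   list_needed Assets/Plugins/Android/libtesseract.so
```

Any name printed here that is not packaged in your build is a candidate for the real cause of the DllNotFoundException.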
I’m providing the .so file anyways… [Commit: 74a73c9]
OK, enough rabbit-holing for unfound dlls, back to the original issue of not being able to initialize Tesseract. This occurs because TessData is inside StreamingAssets, which on Android is packed into the APK and is not part of any file system. But Tesseract needs TessData as ordinary readable/writable files that it can open by path, which isn’t the case with StreamingAssets on Android. Thus we need to move it to storage, a.k.a. Persistent Data.
13. Injecting & Extracting TessData
Building the Zip-UnZip Mechanism
Now in order to move TessData to Persistent Data, we will
1. Zip the TessData folder
2. Copy it to Persistent Data path
3. UnZip it
13.1 Zip the TessData folder
Simply go to StreamingAssets in your project directory and compress the tessdata folder with your favoured compression format; I will be using tgz.
Note: the editor will still be using the one from StreamingAssets so don’t delete it yet (yes it causes some redundancy and inflates the APK size).
cd Assets/StreamingAssets
tar -czvf tessdata.tgz tessdata
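Before building, it can be worth checking that the archive has the layout the later extraction step relies on: a top-level tessdata/ folder, so that unpacking into the persistent data path produces persistentDataPath/tessdata. The sketch below uses an empty stand-in file in a temporary directory instead of the real project, and names the archive tessdata.tgz to match the file name the fileNames list expects.

```shell
# Build a throwaway archive from a stand-in tessdata folder and list it.
work=$(mktemp -d)
mkdir -p "$work/tessdata"
: > "$work/tessdata/eng.traineddata"   # empty placeholder, not real data

# Same tar invocation as above, run from the parent of tessdata.
(cd "$work" && tar -czf tessdata.tgz tessdata)

# -t lists the entries without extracting; every path should start
# with "tessdata/".
entries=$(tar -tzf "$work/tessdata.tgz")
printf '%s\n' "$entries"

rm -rf "$work"
```

Running tar -tzf tessdata.tgz inside Assets/StreamingAssets gives the same kind of listing for the real archive.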
13.2 Copy it to Persistent Data path
Copying the file at runtime is a bit trickier…
In the TesseractDriver let’s add a function (CopyAllFilesToPersistentData) to copy this newly created zip file
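// Requires: using System; using System.Collections.Generic; using System.IO;
// using System.Threading.Tasks; using UnityEngine; using UnityEngine.Events;
private async void CopyAllFilesToPersistentData(List<string> fileNames, UnityAction onSetupComplete)
{
    // On Android, StreamingAssets lives inside the APK, hence the jar: URL.
    String fromPath = "jar:file://" + Application.dataPath + "!/assets/";
    String toPath = Application.persistentDataPath + "/";

    foreach (String fileName in fileNames)
    {
        if (!File.Exists(toPath + fileName))
        {
            Debug.Log("Copying from " + fromPath + fileName +
                      " to " + toPath);
            // Note: WWW is marked obsolete in newer Unity versions
            // (UnityWebRequest replaces it).
            WWW www = new WWW(fromPath + fileName);

            // Wait roughly one frame at a time until the read finishes.
            while (!www.isDone)
            {
                await Task.Delay(
                    TimeSpan.FromSeconds(Time.deltaTime));
            }

            File.WriteAllBytes(toPath + fileName, www.bytes);
            Debug.Log("File copy done");
            www.Dispose();
            www = null;
        }
        else
        {
            Debug.Log("File exists! " + toPath + fileName);
        }

        // Runs on every launch, even if the archive was already copied.
        UnZipData(fileName);
    }

    OcrSetup(onSetupComplete);
}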
It’s a simple copy script that copies every file in the fileNames list from Android’s StreamingAssets to the persistent data path, using Unity’s WWW followed by a File.WriteAllBytes.
You might have noticed it also takes an onSetupComplete parameter. It is essentially a callback invoked once all setup is complete; in our case it’s the recognize call to Tesseract.
13.3 UnZip it
In order to Zip/UnZip we can use any library… You are also free to write your own. I will be using icsharpcode/SharpZipLib.
Let’s start by adding its dlls to the plugins.
Next we will add a class, UnZipUtil, to take care of the Zipping and UnZipping.
using System.IO;
using ICSharpCode.SharpZipLib.GZip;
using ICSharpCode.SharpZipLib.Tar;

public class UnZipUtil
{
    public static void CreateTarGZ_FromDirectory(string tgzFilename,
        string sourceDirectory)
    {
        Stream outStream = File.Create(tgzFilename);
        Stream gzoStream = new GZipOutputStream(outStream);
        TarArchive tarArchive =
            TarArchive.CreateOutputTarArchive(gzoStream);

        // Note that the RootPath is currently case
        // sensitive and must be forward slashes e.g. "c:/temp"
        // and must not end with a slash, otherwise cuts
        // off first char of filename
        // This is scheduled for fix in next release
        tarArchive.RootPath =
            sourceDirectory.Replace('\\', '/');
        if (tarArchive.RootPath.EndsWith("/"))
            tarArchive.RootPath =
                tarArchive.RootPath
                    .Remove(tarArchive.RootPath.Length - 1);

        AddDirectoryFilesToTar(tarArchive, sourceDirectory, true);
        tarArchive.Close();
    }

    public static void AddDirectoryFilesToTar(TarArchive tarArchive,
        string sourceDirectory, bool recurse)
    {
        // Optionally, write an entry for the directory itself.
        // Specify false for recursion here
        // if we will add the directory's files individually.
        TarEntry tarEntry =
            TarEntry.CreateEntryFromFile(sourceDirectory);
        tarArchive.WriteEntry(tarEntry, false);

        // Write each file to the tar.
        string[] filenames = Directory.GetFiles(sourceDirectory);
        foreach (string filename in filenames)
        {
            tarEntry = TarEntry.CreateEntryFromFile(filename);
            tarArchive.WriteEntry(tarEntry, true);
        }

        if (recurse)
        {
            string[] directories =
                Directory.GetDirectories(sourceDirectory);
            foreach (string directory in directories)
                AddDirectoryFilesToTar(tarArchive,
                    directory, recurse);
        }
    }

    public static void ExtractTGZ(string gzArchiveName,
        string destFolder)
    {
        Stream inStream = File.OpenRead(gzArchiveName);
        Stream gzipStream = new GZipInputStream(inStream);
        TarArchive tarArchive =
            TarArchive.CreateInputTarArchive(gzipStream);
        tarArchive.ExtractContents(destFolder);
        tarArchive.Close();
        gzipStream.Close();
        inStream.Close();
    }
}
There are 3 functions namely: CreateTarGZ_FromDirectory, AddDirectoryFilesToTar and ExtractTGZ.
But for our use case we will only be using ExtractTGZ, because we only need the extraction. Why keep the rest? You might feel like automating the build process by zipping in a pre-build script, which removes the manual step of compressing the files.
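If you would rather keep that automation outside Unity, the same idea can be sketched with the tar command line instead of CreateTarGZ_FromDirectory. This is only a sketch under assumptions: pack_tessdata is a made-up helper name, and it expects to be handed the directory that contains the tessdata folder (Assets/StreamingAssets in this project). It rebuilds tessdata.tgz only when something inside tessdata is newer than the existing archive.

```shell
# Rebuild tessdata.tgz next to the tessdata folder, but only when needed.
# $1: directory that contains the tessdata folder.
pack_tessdata() {
  dir=$1
  archive="$dir/tessdata.tgz"

  if [ ! -d "$dir/tessdata" ]; then
    echo "no tessdata folder in $dir" >&2
    return 1
  fi

  # Skip the work when nothing under tessdata is newer than the archive.
  if [ -f "$archive" ] &&
     [ -z "$(find "$dir/tessdata" -newer "$archive" | head -n 1)" ]; then
    echo "tessdata.tgz is up to date"
    return 0
  fi

  # -C keeps the entries relative to $dir, so they start with "tessdata/".
  tar -czf "$archive" -C "$dir" tessdata
  echo "packed $archive"
}

# Usage, from the project root before building:
#   pack_tessdata Assets/StreamingAssets
```

Calling it from whatever script kicks off your build keeps the archive in sync with the folder the editor still reads from.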
14. Finally time to patch everything together
In CopyAllFilesToPersistentData you might have noticed a placeholder function, UnZipData(). It’s time to add it to the TesseractDriver:
private void UnZipData(string fileName)
{
    if (File.Exists(
        Application.persistentDataPath + "/" + fileName))
    {
        UnZipUtil.ExtractTGZ(
            Application.persistentDataPath + "/" + fileName,
            Application.persistentDataPath);
        Debug.Log("UnZipping Done");
    }
    else
    {
        Debug.LogError(fileName + " not found!");
    }
}
Next we will need a few compiler switches and some setup code in TesseractDriver to make everything work. Note: the fileNames list must contain your zip file name (tessdata.tgz in our case).
private static readonly List<string> fileNames = new List<string> {"tessdata.tgz"};

public void Setup(UnityAction onSetupComplete)
{
#if UNITY_EDITOR
    OcrSetup(onSetupComplete);
#elif UNITY_ANDROID
    CopyAllFilesToPersistentData(fileNames, onSetupComplete);
#else
    OcrSetup(onSetupComplete);
#endif
}

public void OcrSetup(UnityAction onSetupComplete)
{
    _tesseract = new TesseractWrapper();

#if UNITY_EDITOR
    string datapath = Path.Combine(Application.streamingAssetsPath, "tessdata");
#elif UNITY_ANDROID
    string datapath = Application.persistentDataPath + "/tessdata/";
#else
    string datapath = Path.Combine(Application.streamingAssetsPath, "tessdata");
#endif

    if (_tesseract.Init("eng", datapath))
    {
        Debug.Log("Init Successful");
        onSetupComplete?.Invoke();
    }
    else
    {
        Debug.LogError(_tesseract.GetErrorMessage());
    }
}
And a bit of the same in TesseractDemoScript:
private Texture2D _texture;

private void Start()
{
    Texture2D texture = new Texture2D(imageToRecognize.width, imageToRecognize.height, TextureFormat.ARGB32, false);
    texture.SetPixels32(imageToRecognize.GetPixels32());
    texture.Apply();

    _tesseractDriver = new TesseractDriver();
    Recoginze(texture);
}

private void Recoginze(Texture2D outputTexture)
{
    _texture = outputTexture;
    ClearTextDisplay();
    AddToTextDisplay(_tesseractDriver.CheckTessVersion());
    // Recognition runs from the callback once setup has finished.
    _tesseractDriver.Setup(OnSetupCompleteRecognize);
}

private void OnSetupCompleteRecognize()
{
    AddToTextDisplay(_tesseractDriver.Recognize(_texture));
    AddToTextDisplay(_tesseractDriver.GetErrorMessage(), true);
    SetImageDisplay();
}
[Commit: bcac764]
What’s Next?
Well, deploy and it will work… theoretically speaking… :P
As for improvements… Try adding a camera feed texture to it and have fun.
If you wanna get technical though, try out the pre-build script to optimize the build process: zip tessdata automatically, and delete the unzipped version before the build to save on APK size.