Installing Tesseract 3.04+ on Ubuntu 14.04

The newest version of Tesseract that can be installed from the Ubuntu 14.04 repositories is 3.03, but some libraries require version 3.04 or later (one example I have encountered is tesserocr, a Python library). This article describes how to work around this limitation.

To work with Tesseract 3.04 you need Leptonica 1.71 or later, but the newest Leptonica available in the Ubuntu 14.04 repositories is 1.70. To get a newer version, you have to compile it from source.
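Before building anything, you may want to check which Leptonica version is currently visible on the system. The sketch below is an illustration, not part of the original procedure: it defines a small `version_ge` helper built on GNU `sort -V`, plus a hypothetical `check_leptonica` function that asks `pkg-config` for the `lept` module. That module name is an assumption (it is the `.pc` file Leptonica installs from source); if the packaged 1.70 does not ship it, the function simply reports that nothing was found.

```shell
#!/usr/bin/env bash

# version_ge A B: exit status 0 if version A >= version B.
# GNU sort's version ordering (-V) puts the smaller version first,
# so A >= B exactly when B sorts first (or the two are equal).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_leptonica [MIN]: hypothetical helper comparing the Leptonica
# version reported by pkg-config (module "lept", an assumption) with MIN,
# which defaults to 1.71.
check_leptonica() {
  local min="${1:-1.71}" found
  found="$(pkg-config --modversion lept 2>/dev/null)"
  if [ -z "$found" ]; then
    echo "Leptonica not found via pkg-config" >&2
    return 2
  fi
  if version_ge "$found" "$min"; then
    echo "Leptonica $found is new enough (>= $min)"
  else
    echo "Leptonica $found is older than $min; build it from source" >&2
    return 1
  fi
}
```

Usage: run `check_leptonica` before step 3 and again after step 4; the second run should report 1.71 or later.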

So, here is the plan:

  • Compile Leptonica 1.71+
  • Compile Tesseract 3.04+ (against the compiled Leptonica)
  • Install the desired library that works with Tesseract

1) Install the libraries required by Leptonica and Tesseract:

sudo apt-get install python-distutils-extra tesseract-ocr tesseract-ocr-eng libopencv-dev libtesseract-dev libleptonica-dev python-all-dev swig libcv-dev python-opencv python-numpy python-setuptools build-essential subversion
sudo apt-get install autoconf automake libtool
sudo apt-get install libpng12-dev libjpeg62-dev libtiff4-dev zlib1g-dev

2) Install the libraries required for Tesseract training (optional):

sudo apt-get install libicu-dev libpango1.0-dev libcairo2-dev

3) Download Leptonica (choose the preferred version and adjust the file name in the command):

wget http://www.leptonica.com/source/leptonica-1.74.1.tar.gz

4) Unpack and build the downloaded Leptonica archive:

tar xvf leptonica-1.74.1.tar.gz
cd leptonica-1.74.1
./configure
make
sudo make install
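Because both the archive name and the directory it unpacks into contain the version number, it is easy for the two to drift apart when you change versions. A minimal sketch, assuming the download URL pattern from step 3 holds for other releases and that each archive unpacks into `leptonica-<version>`, keeps the version in a single variable. The function names are hypothetical.

```shell
#!/usr/bin/env bash

# Version to build; can be overridden through the environment.
LEPT_VERSION="${LEPT_VERSION:-1.74.1}"

# Name of the release archive for a given version.
lept_tarball() { printf 'leptonica-%s.tar.gz\n' "$1"; }

# Download URL, following the pattern used in step 3
# (assumed to be the same for other versions).
lept_url() { printf 'http://www.leptonica.com/source/%s\n' "$(lept_tarball "$1")"; }

# Download, unpack, configure, build and install one version.
# The body runs in a subshell so `set -e` and `cd` do not leak into the caller.
build_leptonica() (
  set -e
  v="$1"
  wget "$(lept_url "$v")"
  tar xzf "$(lept_tarball "$v")"
  cd "leptonica-$v"
  ./configure
  make
  sudo make install
)
```

Usage: `build_leptonica "$LEPT_VERSION"`.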

5) Build and install Tesseract against the installed Leptonica:

git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
./autogen.sh
./configure --enable-debug
LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make
sudo make install
sudo ldconfig
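Step 1 also installs the packaged `libleptonica-dev` and `libtesseract-dev`, so after `sudo ldconfig` it can be worth confirming that the dynamic loader knows about the copies in /usr/local/lib. A hedged sketch: the hypothetical `has_local_lib` helper reads `ldconfig -p` output and succeeds if a library with the given name prefix is registered from /usr/local/lib. The names `liblept` and `libtesseract` are assumptions that fit the releases used here; other versions may name their files differently.

```shell
#!/usr/bin/env bash

# has_local_lib PREFIX: read `ldconfig -p` output on stdin and succeed if
# a library whose name starts with PREFIX resolves to /usr/local/lib.
# Expected line shape: "<tab>liblept.so.5 (libc6,x86-64) => /usr/local/lib/liblept.so.5"
has_local_lib() {
  grep -q "^[[:space:]]*$1[^[:space:]]* .*=> /usr/local/lib/"
}

# Report both libraries; needs ldconfig, so run it on the target system.
check_local_ocr_libs() {
  local cache lib
  cache="$(ldconfig -p)"
  for lib in liblept libtesseract; do
    if printf '%s\n' "$cache" | has_local_lib "$lib"; then
      echo "$lib: found in /usr/local/lib"
    else
      echo "$lib: NOT found in /usr/local/lib" >&2
    fi
  done
}
```

Usage: `check_local_ocr_libs`. A "NOT found" line usually means `sudo ldconfig` was skipped or /usr/local/lib is not in the loader's search path.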

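6) You can check the Tesseract version by typing tesseract -v. Because step 5 builds the current master branch, the reported version may be newer than 3.04 (at the time of writing it was a 4.00 alpha). If all steps were successful, the output should look like this: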

>tesseract -v
tesseract 4.00.00alpha-241-g6f83ba0
leptonica-1.74.1
libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8
Found AVX
Found SSE
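If you want to script this check, here is a minimal sketch; it is an illustration rather than part of the original instructions. The hypothetical helpers extract the Tesseract and Leptonica versions from text shaped like the output above (possibly indented) and compare them with the minimums this article needs, 3.04 and 1.71. It reads `tesseract -v 2>&1` because some Tesseract releases print this information to stderr.

```shell
#!/usr/bin/env bash

# version_ge A B: exit status 0 if version A >= version B (GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Numeric part of the "tesseract X" line, e.g. 4.00.00 from 4.00.00alpha-...
tesseract_version_from() {
  sed -n 's/^[[:space:]]*tesseract \([0-9.]*\).*/\1/p' | head -n 1
}

# Version from the "leptonica-X" line, e.g. 1.74.1.
leptonica_version_from() {
  sed -n 's/^[[:space:]]*leptonica-\([0-9.]*\).*/\1/p' | head -n 1
}

# check_versions: read `tesseract -v` output on stdin and succeed only if
# both versions were found and meet the minimums.
check_versions() {
  local out tv lv
  out="$(cat)"
  tv="$(printf '%s\n' "$out" | tesseract_version_from)"
  lv="$(printf '%s\n' "$out" | leptonica_version_from)"
  [ -n "$tv" ] && [ -n "$lv" ] &&
    version_ge "$tv" 3.04 && version_ge "$lv" 1.71
}
```

Usage: `tesseract -v 2>&1 | check_versions && echo "versions OK"`.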

7) Use pip (or another package manager for your programming language) to install the required library:

pip install tesserocr
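To confirm that the library picked up the newly built Tesseract, you can ask it which Tesseract it is linked against. This sketch relies on tesserocr's `tesseract_version()` function, which its documentation describes as returning the version string of the underlying Tesseract; the `check_tesserocr` wrapper and its optional interpreter argument are hypothetical conveniences.

```shell
#!/usr/bin/env bash

# check_tesserocr [PYTHON]: import tesserocr with the given interpreter
# (default: python) and print the Tesseract version it reports.
# Fails if the interpreter or the module cannot be found.
check_tesserocr() {
  local py="${1:-python}"
  "$py" -c 'import tesserocr; print(tesserocr.tesseract_version())'
}
```

Usage: `check_tesserocr` (or `check_tesserocr python3`). The printed version should be 3.04 or later; if it still shows 3.03, the module was most likely built against the packaged Tesseract.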

The following resources were used in writing this article: