Self hosted enterprise document server using Mayan EDMS 3.0 and an ODROID HC1

In a previous post I mentioned how a natural disaster led me to rethink my work tools, in particular servers. One of the things I work with the most is scanned documents. For working with scanned documents I use Mayan EDMS, a free open source document manager I started in 2011. Since then, it has taken on a life of its own with hundreds of thousands of users around the world, and it is now maintained by a group of core developers.

Hurricane Maria, category 5+.

The hardware

I currently have several solar powered systems running Mayan EDMS which host a few million documents in separate repositories. By running these systems using solar power, I can continue working and hosting my clients’ documents even when the power grid experiences serious problems.

The ODROID C2. Quad core, 64-bit ARM SBC.

For the smaller repositories my preferred single board computer is the ODROID C2. However, a few months ago a new ODROID board came out with some interesting specifications.

The new ODROID HC1. Octa-core, 32-bit ARM SBC. Has SATA, Gigabit Ethernet and a big heatsink.

The new board in question is called the HC1 (for “Home Cloud”). The most notable feature of this board is that it includes a SATA interface that is internally connected via USB 3.0. The CPU on this board uses the big.LITTLE technology, which means there are 4 Cortex-A15 cores running at 2GHz and 4 Cortex-A7 cores, for a total of 8 cores in a heterogeneous configuration. This board can rebalance running code by migrating it from the big cores to the smaller ones to save power if the task is simple. The board is also able to move complex tasks from the small cores to the big ones to improve performance. For networking, it comes with a Gigabit Ethernet port. Finally, this board uses passive cooling and its heat sink doubles as the case and hard drive mounting plate.
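Once the board is booted into Linux, the heterogeneous core count is visible to the operating system. A quick sanity check (runs on any Linux machine; the value of course varies by hardware):

```shell
# Count the CPU cores visible to the operating system.
# On the HC1 this reports 8 (4 Cortex-A15 + 4 Cortex-A7);
# on other machines the number will differ.
nproc
```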

I was very eager to see how Mayan EDMS would perform on such hardware. I got myself an HC1 and after several iterations of installations and testing I arrived at the following procedure.

Operating system

Update: In light of new information it is recommended that users remove or avoid DietPi and instead use Armbian as the operating system distribution for this project.

The first step is to choose an operating system. My favorite distributions are Armbian and DietPi. They are the best distributions for most modern single board computers. For this tutorial I’m going to use DietPi 6.9, which is based on Debian Stretch with Linux kernel 4.14.32.

Building Mayan EDMS

At the time of writing, Mayan EDMS 3.0 has not been released yet, so we are going to build the installable package from the latest stable source code branch.

Docker

We are going to use Docker to avoid having to install all the development dependencies at the system level. These dependencies are only going to be needed once. After building an installable Python package for Mayan EDMS, the Docker container will be deleted automatically.

Install Docker with the following commands:

curl -fsSL get.docker.com -o get-docker.sh
sh get-docker.sh

Obtaining the source code

Download the latest revision of the versions/next branch of the source code. This is the branch on which the final release will be based.

cd /tmp
wget https://gitlab.com/mayan-edms/mayan-edms/-/archive/versions/next/mayan-edms-versions-next.zip
unzip mayan-edms-versions-next.zip
cd mayan-edms-versions-next

Another way to obtain the source code is by cloning the repository using Git:

sudo apt-get install git-core -y
cd /tmp
git clone https://gitlab.com/mayan-edms/mayan-edms.git
cd mayan-edms
git checkout versions/next

Building the installable Python package

While still in the directory of the Mayan EDMS source code, execute the build target of the Makefile. This will launch a temporary Docker container and produce a file in the dist folder named mayan_edms-3.0rc1-py2-none-any.whl:

make build

Preparing the system

Next, we will install the operating system packages required to run Mayan EDMS:

sudo apt-get install g++ gcc ghostscript gnupg1 graphviz libjpeg-dev libmagic1 libpq-dev libpng-dev libreoffice libffi-dev libtiff-dev poppler-utils postgresql python-dev python-pip python-virtualenv redis-server sane-utils supervisor tesseract-ocr zlib1g-dev -y

Link the platform specific graphics libraries to the general libraries folder so that the Python graphics package can find them:

sudo ln -s /usr/lib/arm-linux-gnueabihf/libz.so /usr/lib/
sudo ln -s /usr/lib/arm-linux-gnueabihf/libjpeg.so /usr/lib/

Create an unprivileged user account for the installation. This is to avoid running Mayan EDMS as root:

sudo adduser mayan --disabled-password --disabled-login --no-create-home --gecos ""

Create the parent directory where the project will be installed:

sudo mkdir -p /opt

Create the Python virtual environment for the installation. This will isolate the Python requirements of Mayan EDMS from the rest of the system:

sudo python /usr/lib/python2.7/dist-packages/virtualenv.py /opt/mayan-edms

Make the mayan user the owner of the installation directory:

sudo chown mayan:mayan /opt/mayan-edms -R

Install the Mayan EDMS Python package built in the first step (adjust the path if you downloaded the zip archive instead of cloning with Git):

sudo -u mayan /opt/mayan-edms/bin/pip install --no-cache-dir /tmp/mayan-edms/dist/mayan*.whl

Install the Python client for PostgreSQL and Redis. Mayan EDMS uses PostgreSQL as its database to store document information and Redis to coordinate with its worker processes via messages:

sudo -u mayan /opt/mayan-edms/bin/pip install --no-cache-dir psycopg2==2.7.3.2 redis==2.10.6

Create the PostgreSQL user for the installation. The user will be named “mayan” for the sake of ease. Change “mayanuserpass” to anything you want but make sure you remember it as it will be needed in a future step:

sudo -u postgres psql -c "CREATE USER mayan WITH password 'mayanuserpass';"

Create the PostgreSQL database. The “mayan” user will be the owner of the new database named “mayan”:

sudo -u postgres createdb -O mayan mayan

Initialize the Mayan EDMS installation. Update the value of the variable MAYAN_DATABASE_PASSWORD to be the same as the password of the PostgreSQL user created. This step will create the necessary database tables, download the Javascript libraries, and create an initial administrator user:

sudo -u mayan MAYAN_DATABASE_ENGINE=django.db.backends.postgresql \
MAYAN_DATABASE_NAME=mayan \
MAYAN_DATABASE_PASSWORD=mayanuserpass MAYAN_DATABASE_USER=mayan \
MAYAN_DATABASE_HOST=127.0.0.1 \
MAYAN_MEDIA_ROOT=/opt/mayan-edms/media \
/opt/mayan-edms/bin/mayan-edms.py initialsetup
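As an aside, this pattern of passing configuration through environment variables prefixed on the command line is worth understanding; it can be sketched with a trivial stand-in command (the echo below is just a hypothetical placeholder for mayan-edms.py):

```shell
# Variables set before a command are placed in that command's
# environment, where the launched process can read them.
MAYAN_DATABASE_NAME=mayan MAYAN_DATABASE_HOST=127.0.0.1 \
sh -c 'echo "connecting to database $MAYAN_DATABASE_NAME on $MAYAN_DATABASE_HOST"'
```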

Collect the static files. This step takes the images, templates, CSS and Javascript files and moves them to a location where they can be served to the browser. The files are also compressed and combined:

sudo -u mayan MAYAN_MEDIA_ROOT=/opt/mayan-edms/media \
/opt/mayan-edms/bin/mayan-edms.py collectstatic --noinput

Create the supervisor file at /etc/supervisor/conf.d/mayan.conf. The Supervisor program will launch and monitor the processes required to execute Mayan EDMS. Once again replace the value of the variable MAYAN_DATABASE_PASSWORD with the password of the PostgreSQL user created before:

[supervisord]
environment=
MAYAN_ALLOWED_HOSTS="*", # Allow access to other network hosts other than localhost
MAYAN_CELERY_RESULT_BACKEND="redis://127.0.0.1:6379/0",
MAYAN_BROKER_URL="redis://127.0.0.1:6379/0",
PYTHONPATH=/opt/mayan-edms/lib/python2.7/site-packages:/opt/mayan-edms/data,
MAYAN_MEDIA_ROOT=/opt/mayan-edms/media,
MAYAN_DATABASE_ENGINE=django.db.backends.postgresql,
MAYAN_DATABASE_HOST=127.0.0.1,
MAYAN_DATABASE_NAME=mayan,
MAYAN_DATABASE_PASSWORD=mayanuserpass,
MAYAN_DATABASE_USER=mayan,
MAYAN_DATABASE_CONN_MAX_AGE=60,
DJANGO_SETTINGS_MODULE=mayan.settings.production
[program:mayan-gunicorn]
autorestart = true
autostart = true
command = /opt/mayan-edms/bin/gunicorn -w 2 mayan.wsgi --max-requests 500 --max-requests-jitter 50 --worker-class gevent --bind 0.0.0.0:8000
user = mayan
[program:mayan-worker-fast]
autorestart = true
autostart = true
command = nice -n 1 /opt/mayan-edms/bin/mayan-edms.py celery worker -Ofair -l ERROR -Q converter -n mayan-worker-fast.%%h --concurrency=1
killasgroup = true
numprocs = 1
priority = 998
startsecs = 10
stopwaitsecs = 1
user = mayan
[program:mayan-worker-medium]
autorestart = true
autostart = true
command = nice -n 18 /opt/mayan-edms/bin/mayan-edms.py celery worker -Ofair -l ERROR -Q checkouts_periodic,documents_periodic,indexing,metadata,sources,sources_periodic,uploads,documents -n mayan-worker-medium.%%h --concurrency=1
killasgroup = true
numprocs = 1
priority = 998
startsecs = 10
stopwaitsecs = 1
user = mayan
[program:mayan-worker-slow]
autorestart = true
autostart = true
command = nice -n 19 /opt/mayan-edms/bin/mayan-edms.py celery worker -Ofair -l ERROR -Q mailing,tools,statistics,parsing,ocr -n mayan-worker-slow.%%h --concurrency=1
killasgroup = true
numprocs = 1
priority = 998
startsecs = 10
stopwaitsecs = 1
user = mayan
[program:mayan-celery-beat]
autorestart = true
autostart = true
command = nice -n 1 /opt/mayan-edms/bin/mayan-edms.py celery beat --pidfile= -l ERROR
killasgroup = true
numprocs = 1
priority = 998
startsecs = 10
stopwaitsecs = 1
user = mayan
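The nice values in the worker commands above (1, 18 and 19) set the CPU scheduling priority: 19 is the lowest, so the slow OCR and parsing workers yield the processor to the interactive converter queue. A quick illustration of how niceness propagates to a launched command:

```shell
# nice -n 19 starts the child process at the lowest scheduling
# priority; `nice` with no arguments prints the current niceness.
nice -n 19 sh -c 'echo "running at niceness $(nice)"'
```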

Restart the Supervisor service and wait for about a minute or two while the frontend process spins up:

sudo systemctl restart supervisor.service

Point your web browser to port 8000 of the ODROID HC1’s IP address on your network. You should see a screen like this:

Default login screen after installation.
The “Recent Documents” view. The last 40 documents to have been accessed will be shown here.
The main document view. On the sidebar each app adds its own tab. Each tab provides additional functionality and features.
OCR is performed automatically. Each document is automatically indexed and searchable.
Document pages view.
Interactive document page view, including zoom and rotation controls.

Conclusion

Performance: When it comes to Mayan EDMS, the 8 cores of the HC1 perform a bit slower than the 4 cores of the C2. The C2 has a 64-bit CPU while the HC1 has a 32-bit CPU. The ideal workload for the HC1 is tasks composed of many small processes. If, on the contrary, your tasks are made up of big processes requiring a lot of computation, the C2 is the better choice.

Networking: Uploads and downloads benefit from the Gigabit Ethernet. Sending files to Mayan EDMS for processing was much faster than on the C2. However, sending and receiving files is not the most frequent task in Mayan EDMS. The most common task is accessing web views, and here Gigabit Ethernet didn’t have much to offer.

Storage: The HC1’s SATA interface is the fastest I’ve seen on an SBC. I have other boards with a SATA interface, like the Banana Pro, and the performance difference is several orders of magnitude. But compared to an eMMC module running on a C2, the SATA interface’s throughput falls behind, even when using an SSD. eMMC modules are expensive, though, and limited in size; the biggest eMMC module I own is a 16GB one. eMMC modules are best used for system files, not data files. So here, while slower than the C2’s eMMC, the HC1’s SATA port is the right call.

Build quality: No complaints at all. Like its predecessors, the HC1 shows the markings of a professional manufacturer using the latest SMD technologies. Rock solid stability, everything is documented, no features needing an extra license like other boards, and the schematics and bill of materials are available online. It is a pure joy to work with this board.

Overall: Mayan EDMS’s specific requirements don’t make the best use of the HC1’s capabilities. Mayan EDMS needs fast access to files, but more than that it needs fast access to its database and computation power for converting images and performing OCR. Mayan EDMS benefits the most from a strong CPU and abundant memory. The HC1 has the same amount of RAM as the C2 but a slower CPU. This is not a shortcoming of the HC1; it was made with file serving in mind, not to be an app server. The HC1 is optimized to shuffle many files between the network and the hard drive as fast as possible. The one feature that will make me keep Mayan EDMS running on the HC1 is the SATA interface. There is a big price gap between eMMC modules and SATA SSDs, and I’d rather have the ability to use larger storage than have to continue aggregating eMMC modules in a cluster configuration. Perhaps a better use for the HC1 would be to serve as a NAS for the C2 cluster running Mayan EDMS. The HC1 has only one SATA port, which makes RAID configurations almost impossible, but it is stackable. This lends itself to a natural cluster configuration of HC1s using a distributed filesystem backed by SATA drives instead of eMMC modules, and that is a very interesting prospect.

Once again Hardkernel has done a great job of producing top notch quality hardware. Hardkernel produces hardware with unique capabilities that doesn’t pretend to be the be-all and end-all of the single board market. Instead, the hardware is made to complement their existing lineup of products. And that is something no other consumer SBC manufacturer is doing.