Benchmarking avro and fastavro using pytest-benchmark, tox and matplotlib

Abrar Sheikh
4 min readMay 22, 2018


Ever wondered how you can benchmark a block of Python code to see how fast it runs, how to choose between multiple libraries that do the same thing when you're not sure which one is faster, or which Python interpreter is most performant for your application? Fantastic, you have come to the right place. In this blog, I will introduce you to a set of tools that can help you achieve exactly that.

I work on the distributed systems team at Yelp, which is responsible for building the streaming data pipeline systems that deliver real-time data to our sales, analytics, and data science teams. Any observations or claims that I make here are my own and by no means represent Yelp Engineering.

I am going to take 2 Python implementations of Apache Avro (avro and fastavro), compare their encoders and decoders against each other in terms of performance on various Python interpreters, and finally draw conclusions as to what works best.

Let's begin…

The following code uses pytest-benchmark, an extension for pytest.

  • Ensure that pypy, pypy3, py27, py35 and py36 are installed on your operating system before running tox.
  • tox is responsible for running pytest once for each Python interpreter. This is declared in the envlist variable of tox.ini.
  • Running pytest with -m "benchmark" runs only the benchmark tests.
  • Passing --benchmark-json=.benchmark-{basepython} to pytest tells pytest-benchmark to write the benchmark results to a file. Later I will use these files to compare results. For each run of tox, {basepython} evaluates to the version of the Python interpreter used in that run.
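The setup described above can be sketched as a tox configuration roughly like the following. This is a minimal sketch, not the post's actual file; the tests/ path and the dependency list are assumptions:

```ini
; tox.ini -- minimal sketch; test path and deps are assumptions
[tox]
envlist = pypy,pypy3,py27,py35,py36

[testenv]
deps =
    pytest
    pytest-benchmark
    avro
    fastavro
commands =
    pytest -m "benchmark" --benchmark-json=.benchmark-{basepython} tests/
```

Running `tox` then executes the benchmark suite once per interpreter in envlist, leaving one .benchmark-* JSON file per run.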

By default, pytest-benchmark displays the results on the console along with the rest of the pytest test suite output. It looks something like this:

[Benchmark result tables for pypy, pypy3, py27, py35 and py36]
  • There is a lot of information in those tables, but the column that truly captures the pulse of performance is OPS (Kops/s), which indicates the number of operations (in thousands) performed each second.
  • benchmark.pedantic gives control over rounds, iterations and warmup_rounds.
  • I have categorized my results into 2 logical groups for ease of interpretation and comparison. This is achieved by the use of @pytest.mark.benchmark(group='...').
  • The encoders group lists the benchmark results for the fastavro schemaless_writer and the avro writer.
  • The decoders group lists the benchmark results for the fastavro schemaless_reader and the avro reader.

This is fine for starters, but it gets tedious when looking at 5 such sets of groups, one for each Python interpreter, and drawing conclusions about which library performs better across Python versions is not obvious from them. So I wrote a simple Python script that plots the results from the .benchmark-{basepython}* files to do this job.

Plotting results from pytest-benchmark. Code inspired from
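A plotting script along these lines does the job. This is my own sketch rather than the original script; the file-name glob, labels and output file are assumptions, while the JSON fields follow pytest-benchmark's output format:

```python
# plot_benchmarks.py -- sketch of a results plotter; file names, labels
# and figure layout are assumptions, not the author's original script.
import glob
import json
from collections import defaultdict


def load_ops(path):
    """Extract {(group, test_name): ops} from one pytest-benchmark JSON file."""
    with open(path) as f:
        data = json.load(f)
    return {
        (b["group"], b["name"]): b["stats"]["ops"]
        for b in data["benchmarks"]
    }


def plot_all(pattern=".benchmark-*"):
    # imported lazily so load_ops stays usable without matplotlib installed
    import matplotlib.pyplot as plt

    # results[group][test_name] -> list of (interpreter, ops) points
    results = defaultdict(lambda: defaultdict(list))
    for path in sorted(glob.glob(pattern)):
        interpreter = path.split(".benchmark-")[-1]
        for (group, name), ops in load_ops(path).items():
            results[group][name].append((interpreter, ops))

    fig, axes = plt.subplots(1, len(results),
                             figsize=(6 * len(results), 4), squeeze=False)
    for ax, (group, tests) in zip(axes[0], sorted(results.items())):
        for name, points in sorted(tests.items()):
            labels, ops = zip(*points)
            ax.plot(labels, ops, marker="o", label=name)
        ax.set_title(group)
        ax.set_ylabel("operations / second")
        ax.legend()
    fig.savefig("benchmarks.png")


if __name__ == "__main__":
    plot_all()
```

With the five JSON files in place, this produces one panel per group, with one line per library plotted across the interpreters.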


  • fastavro is many times faster than avro, at least on all the CPython interpreters.
  • The general performance of pypy3 is much better than that of all the other Python interpreters.
  • avro is much faster than fastavro on pypy, which is really surprising.

If you are building a Python application at scale that serializes and deserializes Avro, then fastavro running on pypy3 is the most performant combination.


I want to give a shout-out to Scott Belden and Ryan for their insights, which resulted in me writing this blog.