Optimize Python with Cython

Eva(Tzuni) Hsieh
5 min read · Jan 16, 2019

There are many ways to make your Python code faster, and using Cython is one of them. If you need to optimize your Python code, especially for handling big data, it might be a good solution for you.

Before we start, please install Cython with pip:

pip3.6 install Cython
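
To confirm the install worked for the interpreter you plan to use, a quick check like the one below can help. This is only a minimal sketch: the helper name `cython_available` is made up for illustration, and it relies on nothing beyond the standard-library `importlib`.

```python
import importlib.util


def cython_available():
    """Return True if the Cython package can be imported by this interpreter."""
    return importlib.util.find_spec("Cython") is not None


if cython_available():
    import Cython
    print("Cython version:", Cython.__version__)
else:
    print("Cython is not installed for this interpreter")
```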

What Is Cython?

Cython is an optimizing static compiler that makes writing C extensions for Python as easy as writing Python itself. If you just want to optimize certain functions in your package, you can rewrite those functions in Cython and then import them just like normal Python functions.

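Cython code compilation happens in two stages: first, Cython translates a .pyx file into a C/C++ source file; then a C/C++ compiler builds that file into a shared library (.so, or .pyd on Windows) that Python can import directly.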
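
One way to picture the two stages is by the files each one leaves behind. The sketch below is an illustration only: `expected_artifacts` is a hypothetical helper, and it assumes the extension file is named with the interpreter's `EXT_SUFFIX`, which is what a standard build normally does.

```python
import sysconfig
from pathlib import Path


def expected_artifacts(source, cplus=False):
    """Return (generated C/C++ file, importable extension file) for a source file."""
    src = Path(source)
    # Stage 1: Cython translates the source into C (or C++ when requested).
    generated = src.with_suffix(".cpp" if cplus else ".c")
    # Stage 2: a C/C++ compiler turns that file into an extension module.
    # EXT_SUFFIX is platform specific, e.g. ".cpython-36m-x86_64-linux-gnu.so".
    ext_suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"
    extension = src.with_name(src.stem + ext_suffix)
    return generated, extension


c_file, ext_file = expected_artifacts("test.pyx")
print(c_file)    # test.c
print(ext_file)  # name depends on platform and Python version
```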

There are two command-line tools for compiling: cython and cythonize.

cython, compiles to C/C++ files

Takes a .py or .pyx file and compiles it into a C/C++ file.

cython test.pyx

cythonize, compiles to C/C++ files and creates Python importable modules

Takes a .py or .pyx file, compiles it into a C/C++ file, and then builds that file into an extension module. With the in-place option, the resulting .so file is placed next to the source file so it can be imported directly.

cythonize -a -i test.pyx # `-a` - annotate; `-i` - in place

Create The First Cython Code

For this example, I will put all .pyx code under the demopkg/cython folder like this:

[demo]
|
|-- /demopkg
| |
| |-- /cython
| | |
| | |-- utils.pyx # my first cython code
| | |-- __init__.py
| |
| |-- utils.py
| |-- __init__.py
|
|-- setup.py

Let’s put some Cython code in demopkg/cython/utils.pyx

# distutils: language=c++
# cython: language_level=3
def chello(name):
    return 'Hello, %s. :P' % name

Don’t forget to specify the language_level for Cython compilation. If you don’t specify it, Cython releases before 3.0 fall back to Python 2 semantics by default.
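
One place where the language level shows up is integer division. The snippet below runs under plain Python 3 and only mimics the Python 2 behavior with `//`. It illustrates the semantic difference and is not actual Cython output.

```python
# Python 3 semantics (language_level=3): "/" is true division.
py3_result = 1 / 2

# Python 2 semantics (language_level=2): "/" on two ints floors the result,
# which Python 3 spells as "//".
py2_style_result = 1 // 2

print(py3_result)        # 0.5
print(py2_style_result)  # 0
```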

Compile Cython Code

There are 3 ways to compile your Cython code and make it importable just like a normal Python library.

1. setuptools and pip

import setuptools
from Cython.Build import cythonize

if __name__ == '__main__':
    setuptools.setup(
        ext_modules=cythonize('demopkg/cython/*.pyx', annotate=True),
        # package_data patterns are relative to the package directory
        package_data={
            'demopkg': [
                'cython/*'
            ]
        }
    )

Don’t forget to add package_data so that all *.pyx files are included in your package.

Install your package with the -e (editable) flag for local development:

cd demo/
pip3.6 install -e .

Test it in an interactive Python console, and you should see something like this:

>>> from demopkg.cython.utils import chello
>>> print(chello('Castiel'))
Hello, Castiel. :P

2. build_ext manually

Once you have compiled your Cython code via pip install, you need to re-compile it after every single change. Instead of running pip install over and over again, you can use this command to rebuild just the extension modules in place.

This is mostly for debugging and experimentation.

python3.6 setup.py build_ext --inplace
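
After a rebuild it can be handy to confirm that Python is really importing the compiled extension and not a leftover .py file. The helper below is a sketch under that assumption: `is_compiled_extension` is a hypothetical name, and the check simply compares the module's `__file__` against the interpreter's known extension suffixes.

```python
import importlib.machinery
import json
import types


def is_compiled_extension(module):
    """Return True if the module was loaded from a compiled extension file."""
    path = getattr(module, "__file__", None) or ""
    return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


# A pure-Python standard-library module is not a compiled extension.
print(is_compiled_extension(json))  # False

# Stand-in object imitating a module built by build_ext (hypothetical path).
fake = types.SimpleNamespace(
    __file__="demopkg/cython/utils" + importlib.machinery.EXTENSION_SUFFIXES[0]
)
print(is_compiled_extension(fake))  # True
```

In a real session you would pass the imported module itself, for example `import demopkg.cython.utils as m; is_compiled_extension(m)`.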

3. Use pyximport for Development

Even with build_ext, it is still pretty annoying to compile the code every time you make a change during development.

If you have already run pip3.6 install -e ., make sure you delete all the generated .so and .cpp files first, then use pyximport to import your Cython modules like this:

>>> import pyximport; pyximport.install()
>>> from demopkg.cython.utils import chello
>>> print(chello('Castiel'))
Hello, Castiel. :P
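
Deleting the generated files by hand gets tedious, so the cleanup step mentioned above could be scripted roughly as follows. This is a sketch, not part of Cython: `remove_generated` and `GENERATED_SUFFIXES` are names invented here, and the function only touches files that share a base name with a .pyx file in the same folder. It defaults to a dry run so you can inspect the list first.

```python
from pathlib import Path

# Suffixes of files the build steps leave behind; adjust as needed.
GENERATED_SUFFIXES = {".so", ".pyd", ".c", ".cpp", ".html"}


def remove_generated(directory, dry_run=True):
    """Delete build artifacts that sit next to a .pyx file of the same name."""
    directory = Path(directory)
    stems = {p.stem for p in directory.glob("*.pyx")}
    removed = []
    for path in sorted(directory.iterdir()):
        # "utils.cpython-36m-x86_64-linux-gnu.so" -> base name "utils"
        base = path.name.split(".", 1)[0]
        if path.is_file() and base in stems and path.suffix in GENERATED_SUFFIXES:
            removed.append(path.name)
            if not dry_run:
                path.unlink()
    return removed
```

For the layout used in this article, `remove_generated("demopkg/cython")` would list the candidates, and `remove_generated("demopkg/cython", dry_run=False)` would delete them.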

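You can now change the chello function however you want. pyximport recompiles the module automatically the next time it is imported (for example, in a fresh interpreter session), so you see the change without running a build step yourself.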

The pyximport module also has experimental compilation support for normal Python modules. This allows you to automatically run Cython on every .pyx and .py module that Python imports, including the standard library and installed packages. Cython will still fail to compile a lot of Python modules, in which case the import mechanism will fall back to loading the Python source modules instead.

import pyximport; pyximport.install(pyimport=True)
from demopkg.cython.utils import chello

Performance

How fast can it be?

Well, it depends, but it does make a significant difference in some cases. Here is my testing example:

Cython code: demopkg/cython/demo.pyx

# distutils: language=c++
# cython: language_level=3
def cloop(count=50):
    res = []
    for n in range(0, count):
        res.append('Loop %d' % n)
    return '\n'.join(res)

def ccount(limit):
    result = 0
    for a in range(1, limit + 1):
        for b in range(a + 1, limit + 1):
            for c in range(b + 1, limit + 1):
                if c * c > a * a + b * b:
                    break
                if c * c == (a * a + b * b):
                    result += 1
    return result

Python code: demopkg/demo.py

def pyloop(count=50):
    res = []
    for n in range(0, count):
        res.append('Loop %d' % n)
    return '\n'.join(res)

def pycount(limit):
    result = 0
    for a in range(1, limit + 1):
        for b in range(a + 1, limit + 1):
            for c in range(b + 1, limit + 1):
                if c * c > a * a + b * b:
                    break
                if c * c == (a * a + b * b):
                    result += 1
    return result

Notice that I used exactly the same Python syntax for both Cython and Python examples, and here is the result I got after running the time measurement:

import timeit

print(timeit.timeit("cloop(50)", setup="from demopkg.cython.demo import cloop"))
# 9.98754950700095
print(timeit.timeit("pyloop(50)", setup="from demopkg.demo import pyloop"))
# 15.239155852003023
print(timeit.timeit("ccount(10)", setup="from demopkg.cython.demo import ccount"))
# 20.738424556999234
print(timeit.timeit("pycount(10)", setup="from demopkg.demo import pycount"))
# 34.80161868201685

Alright, the Cython version is clearly faster, but not that much faster.
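
Note that timeit.timeit() runs the statement 1,000,000 times by default, so each figure above is the total for a million calls. For quicker experiments, a harness along these lines may be more convenient. It is a sketch only: `best_time` is a hypothetical helper, and `pycount` is repeated here just to keep the snippet self-contained.

```python
import timeit


def pycount(limit):
    """Count Pythagorean triples a < b < c <= limit (same logic as pycount above)."""
    result = 0
    for a in range(1, limit + 1):
        for b in range(a + 1, limit + 1):
            for c in range(b + 1, limit + 1):
                if c * c > a * a + b * b:
                    break
                if c * c == (a * a + b * b):
                    result += 1
    return result


def best_time(stmt, namespace, number=1000, repeat=3):
    """Return the fastest total time, in seconds, for `number` runs of stmt."""
    return min(timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat))


elapsed = best_time("pycount(10)", {"pycount": pycount})
print("pycount(10): %.4f s for 1000 calls" % elapsed)
```

Taking the minimum over a few repeats reduces noise from other processes on the machine.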

Let’s make a small change to improve it. I just use cdef to declare the variables in the ccount() function.

def ccount(limit):
    cdef int result = 0
    cdef int a = 0
    cdef int b = 0
    cdef int c = 0
    for a in range(1, limit + 1):
        for b in range(a + 1, limit + 1):
            for c in range(b + 1, limit + 1):
                if c * c > a * a + b * b:
                    break
                if c * c == (a * a + b * b):
                    result += 1
    return result

This time, the performance result looks REALLY different.

import timeit

print(timeit.timeit("cloop(50)", setup="from demopkg.cython.demo import cloop"))
# 9.938087677990552
print(timeit.timeit("pyloop(50)", setup="from demopkg.demo import pyloop"))
# 15.104133175016614
print(timeit.timeit("ccount(10)", setup="from demopkg.cython.demo import ccount"))
# 0.4274253460171167
print(timeit.timeit("pycount(10)", setup="from demopkg.demo import pycount"))
# 35.2860820889764

With pure Python syntax, the Cython version was only about 1.68x faster than the Python code; with cdef static types, it was about 82.55x faster.
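
The speedup factors follow directly from the ccount/pycount timings printed above. A small sketch of the arithmetic, with the numbers copied from those runs (`speedup` is just an illustrative helper):

```python
# Timings (seconds) copied from the measurements reported above.
untyped = {"cython": 20.738424556999234, "python": 34.80161868201685}
typed = {"cython": 0.4274253460171167, "python": 35.2860820889764}


def speedup(timings):
    """How many times faster the Cython run was than the Python run."""
    return timings["python"] / timings["cython"]


print("untyped: %.2fx" % speedup(untyped))
print("typed:   %.2fx" % speedup(typed))
```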

Why?

In ccount(), the variables a, b, c, and result are all used in arithmetic inside the nested for-loops, so typing them makes a significant difference: with C int types, Cython can run the loops and the arithmetic as plain C instead of creating and operating on Python integer objects on every iteration. Variables that are not used inside hot loops usually make less of a difference.
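
To get a feel for the gap between a Python integer object and a C int, and for the price of the fixed width, here is a rough illustration using only the standard library. It is not Cython code: ctypes stands in for C storage, and the exact sizes depend on your platform and Python build.

```python
import ctypes
import sys

# A Python int is a full heap object; a C int is a few raw bytes.
print("Python int object:", sys.getsizeof(1), "bytes")
print("C int:", ctypes.sizeof(ctypes.c_int), "bytes")

# The trade-off: a C int has a fixed range. ctypes does no overflow
# checking, so it shows the wrap-around that fixed-width storage implies.
int_bits = 8 * ctypes.sizeof(ctypes.c_int)
wrapped = ctypes.c_int(2 ** (int_bits - 1)).value
print(wrapped)  # negative: the value no longer fits in a signed C int
```

So the cdef int version of ccount() is fast partly because it gives up arbitrary-precision integers. For a limit large enough that c * c no longer fits in a C int, the typed version would likely give wrong results, and a wider type such as long long may be a safer choice.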

For benchmark results comparing def, cdef, and cpdef, see "Cython def, cdef and cpdef functions" in the Musings on Cython documentation, and "Faster code via static typing" in the Cython documentation.
