Optimize Python with Cython
There are many ways to make Python code faster, and Cython is one of them. If you need to optimize Python code, especially code that handles large amounts of data, it may be a good solution for you.
Before we start, install Cython with pip:
pip3.6 install Cython
What Is Cython?
Cython is an optimizing static compiler that makes writing C extensions for Python as easy as writing Python itself. If you only want to optimize certain functions in your package, you can rewrite those functions in Cython and then import them just like normal Python functions.
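Cython compilation happens in two stages: Cython first translates the .py or .pyx source into a C/C++ file, and a C/C++ compiler then builds that file into an extension module (.so on Linux and macOS) that Python can import.
There are two command-line tools for this: cython and cythonize.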
cython: compiles to C/C++ files
Takes a .py or .pyx file and compiles it into a C/C++ file.
cython test.pyx
cythonize: compiles to C/C++ files and creates importable Python modules
Takes a .py or .pyx file, compiles it into a C/C++ file, then builds that file into an extension module (.so). With -i, the .so is placed next to the source file so that it can be imported directly.
cythonize -a -i test.pyx # `-a` - annotate; `-i` - in place
Create The First Cython Code
For this example, I will put all .pyx code under the /demopkg/cython folder, like this:
[demo]
|
|-- /demopkg
| |
| |-- /cython
| | |
| | |-- utils.pyx # my first cython code
| | |-- __init__.py
| |
| |-- utils.py
| |-- __init__.py
|
|-- setup.py
Let’s put some Cython code in demopkg/cython/utils.pyx:
# distutils: language=c++
# cython: language_level=3

def chello(name):
    return 'Hello, %s. :P' % name
Don’t forget to specify the language_level for Cython compilation. If you don’t, Cython releases before 3.0 default to Python 2 semantics.
Compile Cython Code
There are three ways to compile your Cython code and make it importable just like a normal Python library.
1. setuptools and pip
import setuptools
from Cython.Build import cythonize

if __name__ == '__main__':
    setuptools.setup(
        ext_modules=cythonize('demopkg/cython/*.pyx', annotate=True),
        package_data={
            # Patterns are relative to the package directory (demopkg/).
            'demopkg': [
                'cython/*.pyx'
            ]
        }
    )
Don’t forget to add package_data so that all *.pyx files are included in your package.
Install your package with the -e (editable) argument for local development:
cd demo/
pip3.6 install -e .
Test it in an interactive Python console; you should see something like this:
>>> from demopkg.cython.utils import chello
>>> print(chello('Castiel'))
Hello, Castiel. :P
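If the extension might not be built everywhere the package is used, one option is an import fallback. The sketch below is an assumption layered on top of this example, not something the setup above requires: it tries the compiled chello first and otherwise defines a hypothetical pure-Python stand-in with the same behavior. USING_CYTHON is an illustrative name.

```python
# Try the compiled Cython function first; fall back to pure Python if the
# extension module has not been built (or demopkg is not installed at all).
try:
    from demopkg.cython.utils import chello
    USING_CYTHON = True
except ImportError:
    USING_CYTHON = False

    def chello(name):
        """Pure-Python stand-in mirroring utils.pyx (hypothetical fallback)."""
        return 'Hello, %s. :P' % name


print(chello('Castiel'))
```

Callers then import chello from one place and get the fast version whenever it is available.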
2. build_ext manually
After you have compiled your Cython code with pip install, you need to recompile it after every change. Instead of running pip install over and over, you can rebuild just the extension modules in place with the command below.
This is mostly for debugging and experimentation.
python3.6 setup.py build_ext --inplace
3. Use pyximport for development
Even with build_ext, it is still annoying to recompile every time you change the code during development.
If you have already run pip3.6 install -e ., delete all the generated .so and .cpp files first, then use pyximport to import your Cython modules like this:
>>> import pyximport; pyximport.install()
>>> from demopkg.cython.utils import chello
>>> print(chello('Castiel'))
Hello, Castiel. :P
You can now change the chello function however you like and see the effect without running a compile command yourself: pyximport rebuilds the module on import when the .pyx source has changed.
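The cleanup step mentioned above (deleting stale .so and .cpp files before switching to pyximport) can be scripted. This is a minimal sketch, assuming the generated files sit next to a .pyx file and share its stem; remove_generated_files is a hypothetical helper, and the .so file name in the demonstration is only illustrative.

```python
import tempfile
from pathlib import Path


def remove_generated_files(root, suffixes=('.so', '.cpp')):
    """Delete build outputs that share a stem with a .pyx file beside them.

    Files without a matching .pyx (for example hand-written .cpp sources)
    are left alone. Returns the list of removed paths.
    """
    removed = []
    for pyx in list(Path(root).rglob('*.pyx')):
        prefix = pyx.stem + '.'
        for candidate in list(pyx.parent.iterdir()):
            if candidate == pyx or not candidate.is_file():
                continue
            if (candidate.name.startswith(prefix)
                    and candidate.name.endswith(tuple(suffixes))):
                candidate.unlink()
                removed.append(candidate)
    return removed


# Demonstration in a throwaway directory, so nothing real is deleted.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / 'cython'
    pkg.mkdir()
    for name in ('utils.pyx', 'utils.cpp',
                 'utils.cpython-36m-x86_64-linux-gnu.so', 'handwritten.cpp'):
        (pkg / name).touch()
    removed_names = sorted(p.name for p in remove_generated_files(tmp))
    remaining_names = sorted(p.name for p in pkg.iterdir())

print(removed_names)
print(remaining_names)
```

In the demo layout, the call would be remove_generated_files('demopkg'), run from the demo folder.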
The pyximport module also has experimental compilation support for normal Python modules. This lets you automatically run Cython on every .pyx and .py module that Python imports, including the standard library and installed packages. Cython will still fail to compile many Python modules, in which case the import mechanism falls back to loading the Python source modules instead.
import pyximport; pyximport.install(pyimport=True)
from demopkg.cython.utils import chello
Performance
How fast can it be?
Well, it depends, but the difference is significant in some cases. Here is my test example:
Cython code: demopkg/cython/demo.pyx
# distutils: language=c++
# cython: language_level=3

def cloop(count=50):
    res = []
    for n in range(0, count):
        res.append('Loop %d' % n)
    return '\n'.join(res)

def ccount(limit):
    result = 0
    for a in range(1, limit + 1):
        for b in range(a + 1, limit + 1):
            for c in range(b + 1, limit + 1):
                if c * c > a * a + b * b:
                    break
                if c * c == (a * a + b * b):
                    result += 1
    return result
Python code: demopkg/demo.py
def pyloop(count=50):
    res = []
    for n in range(0, count):
        res.append('Loop %d' % n)
    return '\n'.join(res)

def pycount(limit):
    result = 0
    for a in range(1, limit + 1):
        for b in range(a + 1, limit + 1):
            for c in range(b + 1, limit + 1):
                if c * c > a * a + b * b:
                    break
                if c * c == (a * a + b * b):
                    result += 1
    return result
Notice that I used exactly the same Python syntax for both the Cython and Python examples. Here are the results of the time measurement (timeit runs each statement 1,000,000 times by default, so each figure is the total in seconds):
import timeit

print(timeit.timeit("cloop(50)", setup="from demopkg.cython.demo import cloop"))
# 9.98754950700095
print(timeit.timeit("pyloop(50)", setup="from demopkg.demo import pyloop"))
# 15.239155852003023

print(timeit.timeit("ccount(10)", setup="from demopkg.cython.demo import ccount"))
# 20.738424556999234
print(timeit.timeit("pycount(10)", setup="from demopkg.demo import pycount"))
# 34.80161868201685
Alright, the Cython version is clearly faster, but not dramatically so.
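A full timeit.timeit call with the default one million iterations takes a while per measurement. For quicker experiments, a smaller helper can be handy. The sketch below is one possible approach rather than the setup used for the numbers above: best_time is a hypothetical helper that runs a function a fixed number of times, repeats the measurement, and keeps the fastest run. It is shown with the pure-Python pycount so that it runs without a compiled module.

```python
import timeit


def pycount(limit):
    """Count integer triples a < b < c <= limit with a*a + b*b == c*c."""
    result = 0
    for a in range(1, limit + 1):
        for b in range(a + 1, limit + 1):
            for c in range(b + 1, limit + 1):
                if c * c > a * a + b * b:
                    break
                if c * c == a * a + b * b:
                    result += 1
    return result


def best_time(func, *args, number=1000, repeat=5):
    """Return the fastest total time in seconds for `number` calls of func."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number))


print(pycount(10))             # 2: the triples (3, 4, 5) and (6, 8, 10)
print(best_time(pycount, 10))  # the timing depends on the machine
```

Taking the minimum of several repeats reduces noise from other processes. The same helper can time ccount once the extension is built.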
Let’s make a change to improve it. I just use cdef to declare the variables in the ccount() function.
def ccount(limit):
    cdef int result = 0
    cdef int a = 0
    cdef int b = 0
    cdef int c = 0

    for a in range(1, limit + 1):
        for b in range(a + 1, limit + 1):
            for c in range(b + 1, limit + 1):
                if c * c > a * a + b * b:
                    break
                if c * c == (a * a + b * b):
                    result += 1
    return result
This time, the performance result looks REALLY different.
import timeit

print(timeit.timeit("cloop(50)", setup="from demopkg.cython.demo import cloop"))
# 9.938087677990552
print(timeit.timeit("pyloop(50)", setup="from demopkg.demo import pyloop"))
# 15.104133175016614

print(timeit.timeit("ccount(10)", setup="from demopkg.cython.demo import ccount"))
# 0.4274253460171167
print(timeit.timeit("pycount(10)", setup="from demopkg.demo import pycount"))
# 35.2860820889764
With pure Python syntax, the Cython version was only about 1.68 times faster than the Python code; with cdef static types, it was about 82.55 times faster.
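The speedup figures follow directly from the reported timings. As a quick check, this sketch divides the pycount time by the ccount time for each run; the variable names are illustrative, and the numbers are copied from the two benchmark listings.

```python
# Timings in seconds, copied from the two benchmark runs above.
py_first_run = 34.80161868201685    # pycount(10), first run
cy_untyped = 20.738424556999234     # ccount(10) without cdef
py_second_run = 35.2860820889764    # pycount(10), second run
cy_typed = 0.4274253460171167       # ccount(10) with cdef int

# Speedup = time of the Python version / time of the Cython version.
untyped_speedup = py_first_run / cy_untyped
typed_speedup = py_second_run / cy_typed

print(f'untyped Cython: {untyped_speedup:.1f}x faster')
print(f'typed Cython:   {typed_speedup:.1f}x faster')
```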
Why?
In ccount(), the variables a, b, c, and result are all involved in arithmetic inside the for-loops, so typing them can make a significant difference in performance. Variables that are not used inside loops are likely to make less of a difference.
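One way to see why typing matters, sketched here with the standard library only: a Python int is a full heap object, while a C int is a fixed-width machine value. Here ctypes merely stands in for a C int; this is an illustration, not how Cython works internally. The same fixed width also means a cdef int has a limited range, unlike a Python int.

```python
import ctypes
import sys

# A Python int is a full object (reference count, type pointer, digits).
python_int_bytes = sys.getsizeof(12345)

# A C int is a plain machine value, typically 4 bytes.
c_int_bytes = ctypes.sizeof(ctypes.c_int)

print(python_int_bytes, c_int_bytes)

# Fixed width has a cost: a 32-bit integer cannot hold 2**31. ctypes does no
# overflow checking, so the stored value wraps around to the negative range.
wrapped = ctypes.c_int32(2**31).value
print(wrapped)  # -2147483648
```

For ccount() this suggests that very large limits would need a wider type than int for the c * c products.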
For benchmark results comparing def, cdef, and cpdef, see "Cython def, cdef and cpdef functions" in the Musings on Cython 0.1.0 documentation, and "Faster code via static typing" in the Cython documentation.