Speed up your existing Python project with Cython (+30x)

Isaac Yimgaing
Published in Analytics Vidhya
7 min read · Oct 20, 2021

Do you love Python? Me too. But let’s be honest: although Python has been ranked the best programming language for the past two years, its execution speed is just ridiculous.

I can hear you telling me: “We have many tools to speed up our code: dask, swifter, pyparallelize (the one I created; more here).” Yes, you are right. But even if we use all those tools, the performance of our code will still be poor in comparison to other languages.

Program speed comparison (source: https://www.researchgate.net/)

Why? Python is an interpreted language, not a compiled one. I won’t go into the details of that difference (you can learn more about it here).

But what if I told you that, with Python, we can get close to the speed of the fastest languages, like C? Are you excited? Then let’s jump in and meet Cython, the speed-up wrapper for Python.

What is Cython?

As we saw before, Python performs poorly in terms of speed, and the problem gets worse as the size of the data to process grows. This is where Cython comes in.

Cython is a superset of Python that adds C/C++ data types. It extends Python’s capabilities with features such as faster execution, direct memory access, and pointers. To achieve that, Cython lets us declare the types of our variables; it then translates the code into C/C++ source, which is compiled into a native extension module that Python imports like any other module.

Why do we need Cython?

Cython is usually used when we want to speed up our code. But it can do much more than that:

  • It can be used to manage data in RAM directly, for example in order to free space. This is very helpful, and plain Python doesn’t allow it.
  • It can be used to make C/C++ code callable from Python, because it’s a bidirectional wrapper: C/C++ to Python and Python to C/C++.

Are you convinced yet? Let us jump to the next part and see how it works.

How it works

In order to achieve all the tasks listed above, Cython needs to convert our Python code to C/C++ source, which is then compiled. There are 5 steps:

  • Step 1: Write our Python function in a file with the .py extension
  • Step 2: Write a Cython version of this function in a file with the .pyx extension
  • Step 3: Create the setup .py file and point it to the .pyx file
  • Step 4: Go to the setup.py directory and run the setup. This will generate the C/C++ source and other files

> python setup.py build_ext --inplace

  • Step 5: Import the new version of our function and use it in our project

And that’s it!

Note that Cython offers different types of functions (cdef, cpdef); we will see how to use them in a later section.

Simple example with Cython

Assume that your job as a data scientist is to process 100+ GB of text data, and that the process takes several days to finish. The project is organized as follows:

1. Install packages

We first need to install Cython from here. You will also need to install setuptools if it is not installed yet.

> pip install Cython==0.29.23
> pip install setuptools

Note that Cython has a new release in beta, which you can try if you want.

2. Create Cython version of the code

We will use sample data from this Kaggle dataset. As a baseline, we built the processing function and measured its running time.
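As a rough idea of what such a baseline could look like, here is a minimal sketch. The body of preprocessing, the regular expression and the sample sentences are all made up for illustration; they are not the code or the data of the original project.

```python
import re
import time

# Hypothetical tokenizer: keep lowercase letters, digits and apostrophes.
_TOKEN = re.compile(r"[a-z0-9']+")


def preprocessing(texts):
    """Lowercase each document and keep only the alphanumeric tokens."""
    cleaned = []
    for text in texts:
        tokens = _TOKEN.findall(text.lower())
        cleaned.append(" ".join(tokens))
    return cleaned


# Made-up sample data, repeated to get a measurable running time.
sample = ["Hello, World!  This is a TEST.", "Cython & Python: 2 languages"] * 10_000

start = time.perf_counter()
result = preprocessing(sample)
elapsed = time.perf_counter() - start
print(f"processed {len(result)} documents in {elapsed:.3f} s")
```

Measuring with time.perf_counter() around the call gives a number that we can compare with the Cython build later.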

Now assume that we just want to Cythonize one function (preprocessing). For that, we will create a new folder (optional), “/optim”, and store all our Cython files in it.

We will copy the .py file containing the function into /optim and change its extension to .pyx. Once that is done, we will create the setup.py file.
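A minimal setup.py could look like the sketch below. It assumes that setup.py sits next to preprocessor.pyx inside /optim; the package name and the options are illustrative choices, not necessarily the ones used in the original project.

```python
# setup.py (assumed to sit next to preprocessor.pyx inside /optim)
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="optim",  # illustrative name
    ext_modules=cythonize(
        "preprocessor.pyx",
        annotate=True,  # also writes the preprocessor.html annotation report
        compiler_directives={"language_level": "3"},  # treat the source as Python 3
    ),
)
```

With annotate=True, Cython writes an HTML report next to the generated C file, which is handy when we want to see which lines still rely on Python.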

After that, we can run setup.py and use the preprocessing function from “/optim” in our code. Running it will generate new files.

To generate the files, use this command: python setup.py build_ext --inplace

Note: setup.py may fail with an error if the __init__.py file already exists in /optim. So it is recommended to run setup.py first and create the __init__.py file afterwards.

Finally, we can run main.py using the new Cythonized preprocessing function.

Oh wait! Just a minute. Don’t you have to customize the .pyx with variable types first?

Good question. You don’t necessarily have to customize the function to gain performance. Just compiling an existing Python function with Cython, without any change, already makes a difference, as you can see in the results below.

Results using Python
Results using Cython

As you can see, we gained 10 seconds just by using Cython, without any change. Sounds great, right?

Let us now improve on this by optimizing the .pyx function.

Cython optimisation

Before diving into the actual Cython optimization steps, take a look at the preprocessor.html file generated when running the setup.

This file helps us visualize how much of our code is compiled to C. The more yellow a line is, the more it relies on Python, and so the slower it runs. Our main goal is to optimize the code so that more of it is compiled to C.

To optimise a Cython function, we can work at 2 levels:

1. Level 1: Function

The basic Python function keyword is “def”. Cython adds 2 more function keywords:

a. cdef: this function is a C/C++ version. So:

  • The function’s variables and parameters are expected to be declared with C/C++ types.
  • The function cannot be called from a native Python file .py (for example through from… import…).
  • It is mostly used as a sub-function, i.e. to optimize a part of a function.

b. cpdef: this function is a hybrid of C/C++ and Python.

  • Parameter types have to be understood by both C/C++ and Python. But it is also possible to use non-C/C++ variables.
  • The function can be called from any Python file .py.

2. Level 2: Variables

Cython allows us to use all Python types (depending on the kind of function). But we need to know the C/C++ and Python type equivalences below.

more here

Now that we have a good understanding of Cython, let us implement the final solution and see the result we obtain.

Final integration of Cython into an existing project

For this last optimization, we will modify the preprocessor.pyx file by setting variable types and converting native Python function types to C types.

Here is the result:

I changed the text editor to make the modifications more visible.

We then need to rebuild everything with the setup build command (in case of error, first delete the __init__ file in optim and create it again once the build ends), and we can quickly check the result by opening the preprocessor.html file.

As you can see, we have fewer yellow lines than before. This confirms that the code is now more C than native Python. We can also notice that some yellow lines remain, so there is still pure Python code that we could continue to transform. Let’s now measure the time taken by the new version.

As we can see, the speed of the process clearly increases, which means that the new Cython version is much better.

We can keep improving the code in order to gain more speed.

CONCLUSION

Our job was to show how we can transform a native Python script to gain much more speed. Cython is one way to achieve this. Other, similar tools exist. It is also possible to write the code directly in native C and import it as a Python module.

PS: you can find the project here!

STOP DOING DATA WAITING, START CODING!

Isaac Yimgaing
Analytics Vidhya

Passionate about data and its applications, I support business teams in building intelligent use cases. Let’s connect on /in/isaac-yimgaing