Execute Python Code at the Speed of C: Extending Python

Published in Practo Engineering, Jul 18, 2017

While Python excels as a stand-alone language, it also shines as a glue language, a language that combines or ties together “chunks” of functionality from other languages or third-party libraries.

Code written in a compiled language such as C, C++, or Java can be integrated or imported into Python. Such code is called an “extension.” A Python extension module is nothing more than a normal C library. On Unix machines, these libraries usually end in .so (for shared object). On Windows machines, they are DLLs (dynamically linked libraries), and Python looks for extension modules under the .pyd suffix.

But why would you want to write an extension module?

The most common reason is to make available to Python programs a third-party library written in some other language. It’s these wrapper modules that enable Python programs to use OpenGL, GUI toolkits such as wxWindows and Qt and compression libraries such as zlib.

Another major benefit of extension modules is that they run at the speed of compiled code, rather than the slower speed of interpreted Python code.

Python's features enable fast development, but Python code generally takes much longer to execute than equivalent code in a compiled language. So if you have special performance requirements, you can move CPU-intensive operations into an extension module. The approach I take is to first build my entire application in Python, profile it, and then move performance bottlenecks into C as needed.
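The profile-first workflow can be sketched with the standard library's cProfile module. This is a minimal illustration that assumes Python 3; the functions app, cheap_setup and hot_loop are hypothetical stand-ins for a real application and are not taken from any actual project.

```python
import cProfile
import io
import pstats


def hot_loop(n):
    """CPU-bound stand-in for a bottleneck (hypothetical example)."""
    a, b, c = 100234234, 22342342, 341342
    for _ in range(n):
        a = (a * a * b) % c
        b = (a * b * b) % c
    return a + b


def cheap_setup():
    """Cheap work that should barely register in the profile."""
    return sum(range(100))


def app(n=20000):
    """Hypothetical application entry point."""
    cheap_setup()
    return hot_loop(n)


def profile_report(func, *args):
    """Run func(*args) under cProfile and return the stats as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(app))
```

In the report, the function with the largest share of cumulative time is the natural candidate to move into C.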

This gives us the best of both: faster development and excellent performance. I will now discuss how you can write a Python extension module in C, and use an example to show the benefits.

Before I dive into the details of writing an extension module, let's look at the performance benefit an extension can offer.

Let's say we want to perform the compute-intensive task below in Python.

a = 100234234
b = 22342342
c = 341342
for i in range(1, 10000001):
    a = ((a * a * b) % c)
    b = ((a * b * b) % c)
print(a + b)

If you profile this, the code takes 5.558 seconds to execute and consumes around 300 MB of memory.

However, doing the same work in an extension module written in C takes 0.092 seconds and consumes around 10 MB of memory.
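The article does not say how these figures were measured. One way to take comparable measurements, assuming Python 3, is sketched below with time.perf_counter and tracemalloc; the helper names compute and measure are my own. Expect different numbers: tracemalloc slows the code it traces and counts only Python-level allocations. Note also that in Python 2, range() builds the full list of ten million integers, which probably accounts for much of the reported memory; xrange() would avoid that.

```python
import time
import tracemalloc


def compute(iterations=10000000):
    """The article's loop, with the iteration count as a parameter."""
    a, b, c = 100234234, 22342342, 341342
    for _ in range(iterations):
        a = (a * a * b) % c
        b = (a * b * b) % c
    return a + b


def measure(func, *args):
    """Return (result, seconds, peak_bytes) for one call of func(*args)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    seconds = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, seconds, peak_bytes


if __name__ == "__main__":
    # A reduced count keeps the demo quick; pass 10000000 to match the article.
    result, seconds, peak_bytes = measure(compute, 100000)
    print("result=%d time=%.3fs peak=%.1f KiB"
          % (result, seconds, peak_bytes / 1024.0))
```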

Writing the extension module in C:

benchmark.c

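#include <Python.h>
#include <stdio.h>

static PyObject *foo_bar(PyObject *self, PyObject *args);

/* Method table: maps the Python name "calc" to the C function foo_bar. */
static PyMethodDef FooMethods[] = {
    {"calc", foo_bar, METH_VARARGS, "Run the benchmark loop."},
    {NULL, NULL, 0, NULL}  /* Sentinel */
};

/* Called by the interpreter on "import benchmark" (Python 2 API). */
PyMODINIT_FUNC initbenchmark(void)
{
    (void) Py_InitModule("benchmark", FooMethods);
}

static PyObject *foo_bar(PyObject *self, PyObject *args)
{
    long long a = 100234234;
    long long b = 22342342;
    long long c = 341342;
    int i;

    /* calc() takes no arguments. */
    if (!PyArg_ParseTuple(args, ""))
        return NULL;

    /* Reduce modulo c first so that a * a * b cannot overflow a 64-bit
       long long; this leaves the result unchanged. */
    a %= c;
    b %= c;

    for (i = 1; i <= 10000000; i++) {
        a = (a * a * b) % c;
        b = (a * b * b) % c;
    }
    return Py_BuildValue("L", a + b);
}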
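One detail to watch when moving this loop from Python to C: Python integers grow without bound, while a C long long is typically 64 bits. The quick check below is a sketch in Python (the helper name step is my own). It suggests that the first multiplication, done on the raw starting values, exceeds the 64-bit range, and that reducing a and b modulo c before the loop keeps every product in range without changing the result.

```python
LLONG_MAX = 2**63 - 1  # largest value of a signed 64-bit long long

A0, B0, C = 100234234, 22342342, 341342  # starting values from the article


def step(a, b, c):
    """One loop iteration, computed with Python's unbounded integers."""
    a = (a * a * b) % c
    b = (a * b * b) % c
    return a, b


# The very first product is far too large for a 64-bit signed integer.
first_product_overflows = A0 * A0 * B0 > LLONG_MAX

# Reducing the inputs modulo C first does not change the result of an iteration...
same_result = step(A0 % C, B0 % C, C) == step(A0, B0, C)

# ...and once every operand is below C, no product can exceed (C - 1) ** 3.
reduced_fits = (C - 1) ** 3 <= LLONG_MAX

if __name__ == "__main__":
    print(first_product_overflows, same_result, reduced_fits)  # True True True
```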

setup.py

from distutils.core import setup, Extension

MOD = "benchmark"

setup(
    name=MOD,
    ext_modules=[Extension(MOD, sources=['benchmark.c'])],
    description="My C Extension Module",
)

This now executes at the speed of compiled code, and we gain in both execution time and memory consumption.

Let's now walk through the extension code.

First we include the required header files. To write an extension we must include <Python.h>, which pulls in the Python API that the extension uses.

The heart of writing an extension module is shown in the diagram below:

There is a straightforward translation from the argument list in Python to the arguments passed to the C function. The C function always has two arguments, conventionally named self and args.

For module functions, the self argument is NULL or a pointer selected while initializing the module (see Py_InitModule()). For a method, it would point to the object instance.

The args argument will be a pointer to a Python tuple object containing the arguments. Each item of the tuple corresponds to an argument in the call’s argument list. The arguments are Python objects — in order to do anything with them in our C function we have to convert them to C values.

The function PyArg_ParseTuple() in the Python API checks the argument types and converts them to C values. It uses a template string to determine the required types of the arguments as well as the types of the C variables into which to store the converted values. More about this will follow later.

PyArg_ParseTuple() returns true (nonzero) if all arguments have the right type and its components have been stored in the variables whose addresses are passed. It returns false (zero) if an invalid argument list was passed. In the latter case it also raises an appropriate exception so the calling function can return NULL immediately.
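From the Python side, a failed argument parse surfaces as an ordinary exception, usually a TypeError. The sketch below shows this with math.sqrt, which CPython implements in C. I am assuming its argument handling behaves like the PyArg_ParseTuple() path described here, even if the parsing code inside the interpreter is generated differently. The helper call_and_report is hypothetical.

```python
import math


def call_and_report(func, *args):
    """Call func(*args); return the result, or the exception name on TypeError."""
    try:
        return func(*args)
    except TypeError as exc:
        return type(exc).__name__


if __name__ == "__main__":
    print(call_and_report(math.sqrt, 16.0))  # valid argument
    print(call_and_report(math.sqrt, "16"))  # wrong argument type
    print(call_and_report(math.sqrt))        # wrong argument count
```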

Our function must return the value as a Python object. This is done using the function Py_BuildValue(), which is something like the inverse of PyArg_ParseTuple(): it takes a format string and an arbitrary number of C values, and returns a new Python object.

The Module’s Method Table and Initialization Function

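The method table must be passed to the interpreter in the module’s initialization function. The initialization function must be named initname(), where name is the name of the module, and should be the only non-static item defined in the module file. This is the Python 2 convention; Python 3 modules instead define PyInit_name() and create the module with PyModule_Create().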

When the Python program imports module benchmark for the first time in our sample code, initbenchmark() is called. It calls Py_InitModule(), which creates a “module object” and inserts built-in function objects into the newly created module based upon the table (an array of PyMethodDef structures) that was passed as its second argument. Py_InitModule() returns a pointer to the module object that it creates (which is unused here). It may abort with a fatal error for certain errors, or return NULL if the module could not be initialized satisfactorily.
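The “built-in function objects” mentioned here are visible from Python. As a small sketch, using the standard library's math module as a stand-in for a compiled extension (an assumption for illustration, since benchmark may not be built on your machine), a function that comes from C reports a different type than one defined in Python:

```python
import math
import types


def python_level_function():
    """An ordinary function defined in Python, for comparison."""
    return None


if __name__ == "__main__":
    print(type(math.sqrt).__name__)
    print(isinstance(math.sqrt, types.BuiltinFunctionType))
    print(isinstance(python_level_function, types.BuiltinFunctionType))
```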

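Each entry in the method table is a struct of type PyMethodDef and has four fields, in order: the method name as seen from Python (ml_name), a pointer to the C function that implements it (ml_meth), flags describing how the function should be called, such as METH_VARARGS (ml_flags), and an optional docstring (ml_doc).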

The last entry in the array is indicated by a sentinel entry filled with NULLs.

Extracting Parameters in Extension Functions

The PyArg_ParseTuple() function is declared as follows:

int PyArg_ParseTuple(PyObject *arg, char *format, ...);

The arg argument must be a tuple object containing the argument list passed from Python to the C function. The format argument must be a format string, and the remaining arguments must be the addresses of the C variables that receive the converted values, one per format unit.

Some examples follow. For details on the various parameter types, refer to:

https://docs.python.org/2/c-api/arg.html#arg-parsing

int ok;
const char *s;
long k, l;

ok = PyArg_ParseTuple(args, ""); /* No arguments */
    /* Python call: f() */

ok = PyArg_ParseTuple(args, "s", &s); /* A string */
    /* Possible Python call: f('test!') */

ok = PyArg_ParseTuple(args, "lls", &k, &l, &s); /* Two longs and a string */
    /* Possible Python call: f(1, 2, 'three') */

Similarly, we can use I for unsigned int, i for int, k for unsigned long, f for float, d for double, and so on.
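To make the correspondence between format units and C variables concrete, here is a small lookup sketch in Python. The table covers only the single-character units mentioned in this article, and the names FORMAT_UNITS and c_types_for are hypothetical helpers, not part of the Python API.

```python
# Format units mentioned in this article and the C type each one fills.
FORMAT_UNITS = {
    "s": "const char *",
    "i": "int",
    "I": "unsigned int",
    "l": "long",
    "k": "unsigned long",
    "L": "long long",
    "f": "float",
    "d": "double",
}


def c_types_for(format_string):
    """Return the C types a simple format string expects, in order.

    Raises KeyError for any unit that is not in the table above.
    """
    return [FORMAT_UNITS[unit] for unit in format_string]


if __name__ == "__main__":
    print(c_types_for("lls"))  # ['long', 'long', 'const char *']
```

Each entry in the returned list corresponds to one variable whose address you pass after the format string.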

Returning values to Python

PyObject *Py_BuildValue(const char *format, ...);

Creates a new value based on a format string similar to those accepted by PyArg_ParseTuple(), and a sequence of values.

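Examples (to the left the call, to the right the resulting Python value):

    Py_BuildValue("")                       None
    Py_BuildValue("i", 123)                 123
    Py_BuildValue("iii", 123, 456, 789)     (123, 456, 789)
    Py_BuildValue("s", "hello")             'hello'
    Py_BuildValue("()")                     ()
    Py_BuildValue("(i)", 123)               (123,)
    Py_BuildValue("(ii)", 123, 456)         (123, 456)
    Py_BuildValue("[i,i]", 123, 456)        [123, 456]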

Compilation and Linkage

There are two more things to do before you can use your new extension: compiling and linking it with the Python system. For this we just need the setup.py script, which names the Python module and its C source file.

How to build the extension:

python setup.py build

This generates a benchmark.so file on Unix or a benchmark.pyd file (a DLL with a Python-specific suffix) on Windows, under the build directory.

Copy the generated file to a directory on your Python path, and you will be able to import benchmark as a Python module anywhere in your code.
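As an alternative to copying the file by hand, the built library can be located and its directory added to sys.path. This is only a sketch: it assumes the default distutils layout (build/lib.<platform>-<version>/), and the helper names find_built_extensions and make_importable are my own.

```python
import glob
import os
import sys


def find_built_extensions(module_name, build_dir="build"):
    """Return compiled files for module_name found under build_dir/lib*/."""
    pattern = os.path.join(build_dir, "lib*", module_name + "*")
    suffixes = (".so", ".pyd")
    return sorted(p for p in glob.glob(pattern) if p.endswith(suffixes))


def make_importable(extension_path):
    """Put the directory holding extension_path at the front of sys.path."""
    directory = os.path.dirname(os.path.abspath(extension_path))
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


if __name__ == "__main__":
    found = find_built_extensions("benchmark")
    if found:
        print("importable from", make_importable(found[0]))
    else:
        print("no built extension found; run 'python setup.py build' first")
```

Running python setup.py build_ext --inplace is another option: it places the compiled file next to the sources, so no path changes are needed when you work from that directory.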

How can we use this:

import benchmark
c = benchmark.calc()
print(c)
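A common pattern, sketched here as a suggestion and not as part of the original example, is to fall back to a pure-Python implementation when the compiled module is not available. That way the same calling code runs everywhere and simply gets faster where the extension has been built. The name calc_pure is hypothetical.

```python
def calc_pure(iterations=10000000):
    """Pure-Python version of the loop implemented in benchmark.c."""
    a, b, c = 100234234, 22342342, 341342
    for _ in range(iterations):
        a = (a * a * b) % c
        b = (a * b * b) % c
    return a + b


try:
    # Use the compiled version when the extension has been built and installed.
    from benchmark import calc
except ImportError:
    # Otherwise fall back to the slower pure-Python implementation.
    calc = calc_pure


if __name__ == "__main__":
    print(calc())
```

Calling calc() with no arguments works with either implementation, since the compiled function takes none and the fallback defaults to the same iteration count.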

This gives the promised benefits: without the extension, the code takes 5.558 seconds to execute and consumes around 300 MB of memory, whereas the C extension module takes 0.092 seconds and consumes around 10 MB.

So now you have an answer to slow execution in Python when you hit performance bottlenecks. Now code fast and execute even faster.


Written by PAWAN PUNDIR