Performance gain by writing a C extension for Python

Interpreted languages will never match the performance of compiled languages. Ever since I moved from C/C++ to Python, I have wanted to combine the best of both worlds by extending Python in C.

To gauge the performance benefits, I coded the same algorithm (a trivial sort) in Python, C, and Cython, then ran each version with the same input sizes.

Let's start by writing some simple C code and saving it as sort.c

void swap(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

void sort(int *array, int len) {
    for (int i = 0; i < len; i++) {
        for (int j = i; j < len; j++) {
            if (array[i] > array[j]) {
                swap(&array[i], &array[j]);
            }
        }
    }
}

We can build this into a shared library (.so) with

gcc -fPIC -c sort.c 
gcc -shared -o libsort.so sort.o

This should create libsort.so in your PWD

Now let's write the same algorithm in Cython and save the file as cysort.pyx

def cy_sort(iarray, length):
    for i in range(length):
        for j in range(i, length):
            if iarray[i] > iarray[j]:
                tmp = iarray[i]
                iarray[i] = iarray[j]
                iarray[j] = tmp

Write setup.py as

# setuptools replaces distutils, which was removed in Python 3.12
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize("cysort.pyx"),
)

Build the Cython extension

python setup.py build_ext --inplace

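This should create cysort.c and a cysort shared object (cysort.so, possibly with a platform-specific suffix in the file name) in your PWD

Now that the C and Cython parts are done, let's build the Python part and save it as pysort.py.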

from ctypes import cdll, c_int, c_void_p
import random
import sys

from cysort import cy_sort


def convert_list_to_array(lll):
    # Copy a Python list into a ctypes array of C ints
    intarraytype = c_int * len(lll)
    intarray = intarraytype()
    for index, l in enumerate(lll):
        intarray[index] = int(l)
    return intarray


def print_c_array(iarray, length):
    # Debug helper: print every element of the array
    for i in range(length):
        print(iarray[i])


def py_sort(iarray, length):
    # Same algorithm as sort.c and cysort.pyx, in pure Python
    for i in range(length):
        for j in range(i, length):
            if iarray[i] > iarray[j]:
                tmp = iarray[i]
                iarray[i] = iarray[j]
                iarray[j] = tmp


if len(sys.argv) != 3:
    print("Incorrect number of arguments. Arg1=sample size (< 10000000), Arg2=C/Python/Cython")
    sys.exit(1)  # stop here instead of failing on sys.argv[1] below

sample_size = int(sys.argv[1])
func_call = str(sys.argv[2])
lr = random.sample(range(10000000), sample_size)
iarray = convert_list_to_array(lr)

if func_call.upper() == "C":
    sort_lib = cdll.LoadLibrary("./libsort.so")
    csort = sort_lib.sort
    csort.argtypes = [c_void_p, c_int]
    csort.restype = None  # sort() returns void
    csort(iarray, len(lr))
elif func_call.upper() == "CYTHON":
    cy_sort(iarray, len(lr))
else:
    py_sort(iarray, len(lr))
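
As a side note, the element-by-element copy in convert_list_to_array can probably be shortened. The snippet below is only a sketch of one alternative: the helper name list_to_c_int_array is my own and is not part of the script above. It relies on the fact that a ctypes array type can be called with the initial values as arguments.

```python
from ctypes import c_int


def list_to_c_int_array(values):
    """Build a ctypes int array from a Python list (hypothetical helper)."""
    # (c_int * n) creates an array type of length n; calling it with the
    # unpacked values fills the array without an explicit index loop.
    return (c_int * len(values))(*values)


arr = list_to_c_int_array([3, 1, 2])
print(list(arr))  # [3, 1, 2]
```

For very large samples the unpacking builds a long argument list, so the explicit loop may still be the safer choice there.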

Now it's time to test the code and profile the same algorithm in Python, C, and Cython

python -m profile pysort.py 1000 python
python -m profile pysort.py 1000 c
python -m profile pysort.py 1000 cython
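
If you only want a response time per input size (the kind of number you would plot), a small timing loop may be enough. This is a rough sketch for Python 3, not the harness I used: time_sort and its seed parameter are names I made up for illustration, and only the pure Python sort is timed so that the snippet runs without libsort.so or the Cython module.

```python
import random
import time


def py_sort(iarray, length):
    """Same pure Python sort as in pysort.py."""
    for i in range(length):
        for j in range(i, length):
            if iarray[i] > iarray[j]:
                iarray[i], iarray[j] = iarray[j], iarray[i]


def time_sort(sort_func, sample_size, seed=0):
    """Return wall-clock seconds for one run of sort_func (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so every variant sees the same input
    data = rng.sample(range(10000000), sample_size)
    start = time.perf_counter()
    sort_func(data, len(data))
    return time.perf_counter() - start


# One (input size, seconds) pair per line
for size in (100, 200, 400):
    print(size, time_sort(py_sort, size))
```

Passing cy_sort or a ctypes wrapper as sort_func should give comparable numbers for the other two variants, assuming they accept the same arguments.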

[Chart: response time in seconds (Y axis) versus sample input size (X axis)]

As we can clearly see, C called through ctypes outperforms both pure Python and Cython. Running the code with Cython improves performance by 35–40%, but C via ctypes is 33 times faster than pure Python!

So the choice is simple: if performance is all you have in mind, write the code in C and hook it into your Python code using ctypes. This, however, comes with a couple of caveats:
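
A percentage and a "times faster" figure are not directly comparable, so here is a small conversion sketch. It assumes the Cython percentage means a reduction in run time, which is one possible reading; both function names are mine.

```python
def speedup_from_time_reduction(reduction_percent):
    """Speedup factor implied by cutting run time by reduction_percent."""
    return 1.0 / (1.0 - reduction_percent / 100.0)


def time_reduction_from_speedup(speedup):
    """Percentage of run time saved by a given speedup factor."""
    return (1.0 - 1.0 / speedup) * 100.0


print(round(speedup_from_time_reduction(35), 2))  # 1.54
print(round(speedup_from_time_reduction(40), 2))  # 1.67
print(round(time_reduction_from_speedup(33), 1))  # 97.0
```

Under that assumption, Cython comes out at roughly 1.5x to 1.7x, while a 33x speedup corresponds to saving about 97% of the run time.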

  1. You should be proficient in C/C++.
  2. Being C, it is inherently not portable. You need to write/build as many versions of the C code as the number of platforms you wish to support.
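
To make the second caveat concrete: even the file name of the library differs per platform. The sketch below picks a name based on sys.platform. Only libsort.so comes from this post; the .dylib and .dll names and the helper shared_library_name are assumptions for illustration, and the C code still has to be compiled separately on each platform.

```python
import sys


def shared_library_name(base="sort"):
    """Guess the shared library file name for this platform (hypothetical helper)."""
    if sys.platform.startswith("win"):
        return base + ".dll"            # assumed Windows build output
    if sys.platform == "darwin":
        return "lib" + base + ".dylib"  # assumed macOS build output
    return "lib" + base + ".so"         # Linux, as built earlier in this post


print(shared_library_name())
```

The result could then be passed to cdll.LoadLibrary in place of the hard-coded "./libsort.so".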

We should probably reach for a ctypes extension only when a small piece of CPU-bound code takes more than half of the total processing time and we have run out of other optimizations. As a last resort, this small piece of code can be moved to C.
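
Why "more than half"? A quick back-of-the-envelope calculation using Amdahl's law shows how the overall gain depends on the share of time spent in the hot spot. This is a sketch: overall_speedup is my own helper, and the 33x figure is simply reused from the measurement above.

```python
def overall_speedup(hot_fraction, hot_speedup):
    """Amdahl's law: whole-program speedup when only the hot part gets faster."""
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)


# Hot spot is 50% of the run time and becomes 33x faster
print(round(overall_speedup(0.5, 33), 2))  # 1.94
# Hot spot is 90% of the run time and becomes 33x faster
print(round(overall_speedup(0.9, 33), 2))  # 7.86
```

So even a 33x faster routine less than doubles the speed of the whole program when it accounts for only half of the run time.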

If you are not familiar with C or don't want a C dependency in your ecosystem, Cython is more suitable. Most of your existing Python code can be moved to Cython by adding a simple build step. Again, I would recommend profiling your code and moving only the small CPU-bound pieces to Cython.
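
Here is one way to find those CPU-bound pieces from inside Python, using the standard cProfile and pstats modules. Treat it as a sketch for Python 3: workload and make_input are made-up stand-ins for your own program, and py_sort is the same sort as above.

```python
import cProfile
import io
import pstats
import random


def py_sort(iarray, length):
    """The CPU-bound part we expect to dominate."""
    for i in range(length):
        for j in range(i, length):
            if iarray[i] > iarray[j]:
                iarray[i], iarray[j] = iarray[j], iarray[i]


def make_input(n):
    """Cheap setup step (hypothetical)."""
    return random.sample(range(10000000), n)


def workload():
    """Stand-in for a whole program (hypothetical)."""
    data = make_input(300)
    py_sort(data, len(data))
    return data


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Collect the five most expensive entries by cumulative time
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

Functions near the top of the report with a large share of the total time are the candidates worth moving to Cython or C.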

Like what you read? Give abhijeet gorhe a round of applause.
