My God, It’s Full of Stars (2/7) — Accessing C Structures in Python Using Ctypes
The first task in my project to build a graphical user interface in Python for some legacy C code was to get the two codes talking to one another. The C code was written by astronomers (mostly myself) and was built primarily for speed. It makes extensive use of structures and pointers to make accessing memory more efficient than the equivalent object-oriented version, and way more efficient than the Python equivalent would be. I wanted to take advantage of the speed of C while still using mostly Python syntax. Enter the “ctypes” library.
First let’s tackle the C code:
1) Create a structure. Below is a much simplified version of my cluster structure for illustration purposes:
struct cluster {
int nStars;
double M_wd_up;
double parameter[5];
double mean[5];
};
2) Write a function that allocates memory for an instance of the structure and returns a pointer to that instance’s memory address:
struct cluster * allocCluster(int nStars){
struct cluster *pCluster = (struct cluster *) calloc(sizeof(struct cluster));
pCluster->nStars = nStars;
return pCluster;
}
3) Compile the C code into a shared library (“.so”). The exact commands depend on the system and compiler. Generally speaking, it will look something like:
gcc -c -Wall -Werror -fpic simCluster.c
gcc -shared -o simCluster.so simCluster.o
Now we switch over to Python.
4) On the command line, install ctypes:
pip install ctypes
5) In Python, import ctypes:
import ctypes
6) Tell the program where to find the shared C library that we created in step 3.
c = ctypes.CDLL("../c/simCluster.so")
7) Declare a class based on “ctypes.Structure” and set the “_fields_” attribute to a list that matches the structure members as declared in C:
class cluster(ctypes.Structure):
_fields_ = [('nStars',ctypes.c_int),
('M_wd_up',ctypes.c_double),
('parameter',ctypes.c_double * 5),
('mean',ctypes.c_double * 5)]
The order and the declared types here are super important. One of the reasons for the speed of C code is the efficient use of memory. When you declare an instance of a structure, it allocates a single block of memory for that structure based on the types and number of variables declared in the structure. When you then need to set or recall some particular member of that structure, C can go to the memory address of the start of that structure, count how many memory addresses forward it needs to go, and then grab whatever is in memory there. If the declarations in C and Python do not match, the code will grab (or overwrite!) whatever happens to be at that address without checking to see if that’s indeed what you are looking for.
8) Declare a new type for a pointer to a cluster:
clusterPtr = ctypes.POINTER(cluster)
9) Declare the argument and result types of the C function we wrote in step 2:
c.allocCluster.argtypes = [ctypes.c_int]
c.allocCluster.restype = clusterPtr
10) Call the C function from step 4 to create a pointer to the structure. In this case, we are initializing our cluster to have 100 stars in it:
pCluster = c.allocCluster(100)
You can now access the attributes of that structure the same way you would any other Python class. For instance, to get the M_wd_up:
x = pCluster.contents.M_wd_up
To set the third item in the “parameter” array:
pCluster.contents.parameter[2] = 10.0
We need “.contents” because “pCluster” is a pointer to a cluster structure, not the structure itself. This pointer can now be treated in our Python code just like any other Python object. Meanwhile, it exists in memory as a C structure, meaning that we can also access it directly from our C code. For instance, the workhorse function of my legacy C code is:
void evolve(struct cluster *pCluster) {
//Evolve the stars in the cluster
}
In my actual C code, the cluster structure contains many more attributes than those listed in the simplified example above, including things like which models to use as well as an array of “star” structures that in turn contain info about each individual star. The evolve method takes a pointer to a cluster and, entirely in C, uses all of that information to determine the color and brightness of each individual star, then store those values in that star’s structure. Since all of this is done entirely in C, it takes full advantage of C’s efficient memory usage and speed.
And yet, because the function is compiled as part of the shared library object that I created in step 3 and imported into Python in step 6, I can call this function directly in Python if I want:
c.evolve(pCluster)
I can now access any of the attributes in the cluster structure in Python as well. In fact, with the help of an addition Python function I wrote called “clusterToDataFrame”, I can grab all of the relevant data from the cluster structure (and its array of star structures) and put them into a DataFrame, which I can then use in Python:
st.c.evolve(pCluster, -1)
result = st.clusterToDataFrame(pCluster)
The speed and efficiency of C. The convenience of Python. It truly is the best of both worlds.
P.S. The actual structures for clusters, stars, and models look like this:
struct cluster {
int nStars;
double M_wd_up;
double parameter[NPARAMS];
double stepSize[NPARAMS];
double mean[NPARAMS];
double priorVar[NPARAMS];
double priorMean[NPARAMS];
double betamabs;
double betaFabs;
double betaFY;
double betaAgeMod[3];
double AGBt_zmass;
double varScale;
struct model evoModels;
struct star *stars;
};
struct star{
int id;
double obsPhot[FILTS];
double photometry[FILTS];
double variance[FILTS];
double useFilt[FILTS];
double U;
double massRatio;
int status[2];
int wdType[2];
int isFieldStar;
int useDuringBurnIn;
double clustStarPriorDens;
double clustStarProposalDens;
double beta[NPARAMS][2];
double betaMassRatio[2];
double meanU;
double varU;
double meanMassRatio;
double varMassRatio;
double UStepSize;
double massRatioStepSize;
int boundsFlag;
double wdLogTeff[2];
double massNow[2];
};
struct model{
int evoModel;
int brownDwarfEvol;
int mainSequenceEvol;
int IFMR;
int WDcooling;
int WDatm;
int filterSet;
int numFilts;
int needFS;
double minMass;
};
The corresponding Python classes are:
#### Model structure ####
class model(ctypes.Structure):
_fields_ = [('evoModel',ctypes.c_int),
('brownDwarfEvol',ctypes.c_int),
('mainSequenceEvol',ctypes.c_int),
('IFMR',ctypes.c_int),
('WDcooling',ctypes.c_int),
('WDatm',ctypes.c_int),
('filterSet',ctypes.c_int),
('numFilts',ctypes.c_int),
('needFS',ctypes.c_int),
('minMass',ctypes.c_double)]
#### Model pointer ####
modelPtr = ctypes.POINTER(model)
#### Star Structure ####
class star(ctypes.Structure):
_fields_ = [('id',ctypes.c_int),
('obsPhot',ctypes.c_double * FILTS),
('photometry',ctypes.c_double * FILTS),
('variance',ctypes.c_double * FILTS),
('useFilt',ctypes.c_double * FILTS),
('U',ctypes.c_double),
('massRatio',ctypes.c_double),
('status',ctypes.c_int * 2),
('wdType',ctypes.c_int * 2),
('isFieldStar',ctypes.c_int),
('useDuringBurnIn',ctypes.c_int),
('clustStarPriorDens',ctypes.c_double),
('clustStarProposalDens',ctypes.c_double),
('beta',(ctypes.c_double * 2) * NPARAMS),
('betaMassRatio',ctypes.c_double * 2),
('meanU',ctypes.c_double),
('varU',ctypes.c_double),
('meanMassRatio',ctypes.c_double),
('varMassRatio',ctypes.c_double),
('UStepSize',ctypes.c_double),
('massRatioStepSize',ctypes.c_double),
('boundsFlag',ctypes.c_int),
('wdLogTeff',ctypes.c_double * 2),
('massNow',ctypes.c_double * 2)]
#### Star pointer ####
starPtr = ctypes.POINTER(star)
#### Cluster Structure ####
class cluster(ctypes.Structure):
_fields_ = [('nStars',ctypes.c_int),
('M_wd_up',ctypes.c_double),
('parameter',ctypes.c_double * NPARAMS),
('stepSize',ctypes.c_double * NPARAMS),
('mean',ctypes.c_double * NPARAMS),
('priorVar',ctypes.c_double * NPARAMS),
('priorMean',ctypes.c_double * NPARAMS),
('betamabs',ctypes.c_double),
('betaFabs',ctypes.c_double),
('betaFY',ctypes.c_double),
('betaAgeMod',ctypes.c_double * 3),
('AGBt_zmass',ctypes.c_double),
('varScale',ctypes.c_double),
('evoModels',model),
('stars',starPtr)]
#### Cluster pointer ####
clusterPtr = ctypes.POINTER(cluster)
Note that C doesn’t care what order I declare these in, whereas in Python, which executes sequentially, I need to create the model and star structures (and their pointer types) before I can then use those structures as part of my cluster structure.
NPARAMS and FILTS are macros that are defined in a header file in C. Getting their values to use in Python is a little tricky, and will be the subject of my next post in this series.
Full code available at: https://github.com/stevendegennaro/mcmc