My God, It’s Full of Stars (3/7) — Reading Macros in C Header Files and Creating Python Variables from Them

Data Science Filmmaker
4 min read, Dec 5, 2023


I am working on a project to create a graphical user interface in Python for C code that I wrote many years ago for my doctoral dissertation in astronomy. My goal is to change the C code as little as possible. As a result, one of the unusual things I found myself needing to do was to pull in macros that were defined in the headers of various C files so that I could access their values in Python.

A macro in C is a preprocessor command that looks like:

#define FILTS              14

When the C preprocessor sees this, it replaces every later occurrence of the token “FILTS” in the source code with “14”. This substitution happens before the code is compiled, so, unlike the name of a global variable or function, “FILTS” never appears in the shared object (.so) file that the compiler produces. Since the .so file is what Python loads at run time, Python has no way to look up the value of FILTS.

Image source: https://www.geeksforgeeks.org/cc-preprocessors/
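To see this from the Python side, the sketch below asks `ctypes` for two names from the C standard library: `printf`, which is a function, and `EOF`, which `<stdio.h>` defines as a macro. It assumes Linux or macOS, where `ctypes.CDLL(None)` gives access to the symbols already loaded into the Python process; it will not work on Windows.

```python
import ctypes

# Assumption: Linux or macOS, where CDLL(None) exposes the symbols already
# loaded into the running Python process, including the C standard library.
libc = ctypes.CDLL(None)

# printf is a compiled function, so the dynamic loader can find its symbol.
has_printf = hasattr(libc, "printf")

# EOF is a #define in <stdio.h>; the preprocessor replaced it with its value
# before compilation, so no symbol by that name exists in the library.
has_eof = hasattr(libc, "EOF")

print(has_printf, has_eof)
```

Any macro in your own headers behaves the same way as `EOF` here: the value was baked into the compiled code, but the name is gone.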

To let Python know the value of FILTS, you could, in theory, just redefine it as a variable in Python:

FILTS = 14

This has two disadvantages. First, if this value is ever changed in future versions of the C code, you would have to remember to also change it in the Python code. Second, there are hundreds of these macros defined in my C code, which means hundreds of variables I would need to create in Python and then manually keep track of to make sure they are always in sync with any changes in the C code.

So instead I wrote a little script to parse the C header files and get the value of every macro that is defined therein:

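from typing import Dict
import re

def getCMacros(fileName: str) -> Dict[str, str]:
    """Return {macro name: value string} for every #define that has a value."""
    with open(fileName, "r") as f:
        lines = f.readlines()

    # A token is either a parenthesized expression, e.g. (A + B),
    # or any run of non-whitespace characters
    regex = r'\(.+?\)|\S+'

    macros = {}
    for line in lines:
        if line.startswith("#define"):
            # Drop // and single-line /* ... */ comments so a comment is never read as a value
            line = re.sub(r'//.*|/\*.*?\*/', '', line)
            result = re.findall(regex, line)
            # Skip valueless macros such as include guards (directive + name only)
            if len(result) > 2:
                macros[result[1]] = result[2]
    return macros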

The function takes a filename as a string, opens the file, and reads all of its lines into a list. It then checks each line to see whether it defines a macro. If it does, we split the line into tokens: the regular expression keeps a parenthesized expression such as (A + B) together as a single token and otherwise splits on whitespace. Most of the time, the line will contain three tokens: (1) the preprocessor directive (“#define”), (2) the name of the macro (“FILTS”), and (3) the value of the macro (“14”). Occasionally, a comment follows the definition; since only the third token is used as the value, anything after it is ignored.
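To see what the regular expression in getCMacros produces, here it is applied to a few sample lines. `FILTS` and `YYY` come from the header excerpts in this post; `HALF_FILTS` is a hypothetical macro added only to show how a parenthesized value is kept together.

```python
import re

# Same pattern as in getCMacros: a parenthesized expression such as (A + B)
# is one token; otherwise any run of non-whitespace characters is a token.
TOKEN = r'\(.+?\)|\S+'

samples = [
    "#define FILTS              14\n",
    "#define YYY 1 // helium sampling\n",
    "#define HALF_FILTS (FILTS / 2)\n",  # hypothetical macro, for illustration
]

tokens = [re.findall(TOKEN, line) for line in samples]
for t in tokens:
    print(t)
```

Because the match inside the parentheses is lazy, it stops at the first closing parenthesis, so a nested value such as `((A) + B)` would be split into several tokens rather than kept whole.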

We check to make sure that there are at least three tokens because C also allows a macro to be defined without a value. This is most commonly done as an include guard, which tells the preprocessor that a particular header file has already been included:

#ifndef _FILENAME //If the header is not already defined
#define _FILENAME //Define it

//Do other stuff

#endif
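A trailing comment can also defeat the token-count check: split on whitespace alone, `#define _FILENAME //Define it` produces four tokens, so `//Define` would be mistaken for a value. The sketch below is a hypothetical string-based variant, `parse_macros_from_text`, that removes comments (including `/* ... */` blocks spanning several lines) before tokenizing. Working on a string also makes it easy to test without a header file on disk.

```python
import re
from typing import Dict

TOKEN = r'\(.+?\)|\S+'

def parse_macros_from_text(text: str) -> Dict[str, str]:
    """Hypothetical variant of getCMacros that parses header text directly."""
    # Remove /* ... */ comments first, even when they span several lines.
    text = re.sub(r'/\*.*?\*/', '', text, flags=re.DOTALL)
    macros = {}
    for line in text.splitlines():
        # Then drop any // comment and surrounding whitespace.
        line = line.split('//', 1)[0].strip()
        if not line.startswith('#define'):
            continue
        tokens = re.findall(TOKEN, line)
        # Only two tokens means a name with no value (e.g. an include guard).
        if len(tokens) > 2:
            macros[tokens[1]] = tokens[2]
    return macros

header = """
#ifndef _FILENAME //If the header is not already defined
#define _FILENAME //Define it

#define FILTS              14
#define YYY 1 // helium sampling

#endif
"""

print(parse_macros_from_text(header))  # {'FILTS': '14', 'YYY': '1'}
```

Splitting on `//` this naively would also cut a string-valued macro such as a URL; for headers like the ones shown here, whose macros are numeric, that trade-off is probably acceptable.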

For the purposes of our macro-parsing function, we don’t care about this type of macro, so we skip it. We then take the second token (the name) and use it as a key for a dictionary, with the third token as the value. In our case, this would create the dictionary:

In [4]: macros
Out[4]: {'FILTS': '14'}

Simple. If I wanted to, I could use this dictionary directly whenever I needed the value of the macro.

for i in range(int(macros['FILTS'])):  # values are stored as strings
    print(pCluster.contents.photometry[i])
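Because getCMacros stores every value as a string, numeric values need converting before they can be used as numbers, for example as the argument to `range()`. One possible approach, sketched below under the assumption that C integer and floating-point suffixes such as `UL` and `f` are the only C-specific syntax to remove, is to strip those suffixes and hand the rest to `ast.literal_eval`, which accepts only Python literals and never runs arbitrary code. Anything it cannot parse, such as an expression referring to another macro, is returned unchanged. `to_python_value`, `HALF_FILTS`, and `EPS` are hypothetical names for illustration.

```python
import ast
import re
from typing import Union

# Assumption: a numeric C literal is an optional sign, a hex or decimal number
# (with optional exponent), then optional U/L/F suffixes that Python rejects.
C_NUMBER = re.compile(
    r'^([-+]?(?:0[xX][0-9a-fA-F]+|[0-9.]+(?:[eE][-+]?[0-9]+)?))[uUlLfF]*$'
)

def to_python_value(text: str) -> Union[int, float, str]:
    """Return an int or float for a plain numeric macro value, else the text."""
    stripped = text.strip()
    match = C_NUMBER.match(stripped)
    if match:
        stripped = match.group(1)  # drop the C suffix, keep the number
    try:
        value = ast.literal_eval(stripped)
    except (ValueError, SyntaxError):
        return text  # not a Python literal, e.g. an expression
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value
    return text

macros = {'FILTS': '14', 'FEH': '2', 'HALF_FILTS': '(FILTS / 2)', 'EPS': '1.0e-5f'}
converted = {name: to_python_value(value) for name, value in macros.items()}
print(converted)
# {'FILTS': 14, 'FEH': 2, 'HALF_FILTS': '(FILTS / 2)', 'EPS': 1e-05}
```

C octal literals such as `0777` are not valid in Python 3, so they would come back unchanged as strings.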

In practice, I found this to be cumbersome. For instance, to access the individual physical parameters of the cluster, the C code defines a bunch of macros:

#define AGE    0    // age sampling
#define YYY    1    // helium sampling
#define FEH    2    // metallicity sampling
#define MOD    3    // modulus sampling
#define ABS    4    // absorption sampling

To access these parameters from within the cluster structure (created in my last post), the syntax would be:

pCluster.contents.parameter[macros['AGE']] = 9.0
pCluster.contents.parameter[macros['YYY']] = 0.1
pCluster.contents.parameter[macros['FEH']] = -0.1
pCluster.contents.parameter[macros['MOD']] = 10.0
pCluster.contents.parameter[macros['ABS']] = 0.3

I would prefer to just use each macro as a variable directly, rather than having to find its value in a dictionary. So after finding all of the macros, I ran a second very short script that turned each key into a variable name and each value into the value of that variable:

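## Pull in and parse the #define macros in evolve.h and structures.h
macros = getCMacros("../c/structures.h")
macros.update(getCMacros("../c/evolve.h"))
# At module level, exec binds each name as a global variable,
# e.g. the string "FILTS = 14" creates FILTS with the value 14
for key, value in macros.items():
    exec(f"{key} = {value}")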

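The exec() function takes a string and executes it as Python code. When it runs at the top level of a module, as here, each assignment creates a global variable, so names such as FILTS and AGE can then be used directly.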
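`exec` runs whatever text it is given, so a header value that happens to be valid Python, such as `(FILTS / 2)`, gets evaluated, while a C literal like `1.0e-5f` raises a `SyntaxError`. If you would rather not execute text from a header at all, one alternative is to assign already-converted values directly. The sketch below uses hard-coded, hypothetical values and shows two options: updating the module's `globals()` dictionary, which has the same effect as top-level assignments, and collecting the macros in a `types.SimpleNamespace` so they are accessed as attributes.

```python
from types import SimpleNamespace

# Hypothetical, already-numeric values; in practice these would come from
# getCMacros followed by a string-to-number conversion step.
macros = {'AGE': 0, 'YYY': 1, 'FEH': 2, 'MOD': 3, 'ABS': 4, 'FILTS': 14}

# Option 1: bind each macro as a module-level variable without exec.
# globals() is the current module's namespace, so this is equivalent to
# writing "AGE = 0", "YYY = 1", ... at the top level of the module.
globals().update(macros)

# Option 2: keep the macros grouped under a single name.
C = SimpleNamespace(**macros)

print(AGE, FILTS)    # 0 14
print(C.FEH, C.ABS)  # 2 4
```

Unlike exec, neither option evaluates anything: values are stored exactly as given. The namespace version also keeps the macros from colliding with existing Python names, at the cost of a `C.` prefix.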

In the end, my code is functionally equivalent to manually rewriting each macro as a line of Python code, but it stays in sync with the C code automatically, with no manual bookkeeping.

I hope this little bit of code can help someone with the same problem in the future. If so, please let me know in the comments!

Complete code can be found at https://github.com/stevendegennaro/mcmc
