Python, C and Symbols

Shady Atef
Just me, me, programming & life
5 min read, Apr 2, 2018

I was recently assigned a task: embed a Python interpreter in our system, which is written mostly in C++.

That seemed an easy task: just capture the input from the UI and feed it to the PyRun_* functions in the Python C-API, especially since the API is well documented.

So let's kick off with a little code:

PyRun_SimpleString(/* Python code*/); // pass python code as c-string

That was cool, but I want to run Python files. Lucky me, there is PyRun_SimpleFile: just give it a FILE* as the first parameter and the file name as the second.

FILE* to_run_script = fopen("script.py", "r");
if (to_run_script != NULL)
    PyRun_SimpleFile(to_run_script, "script.py");

Easy, right? No, it doesn't work: just a blank black console, nothing more.
But why? The Python API documentation states:

Note also that several of these functions take FILE* parameters. One particular issue which needs to be handled carefully is that the FILE structure for different C libraries can be different and incompatible.

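Put simply, it may not work on some operating systems when the application and the Python library use different C runtimes (I hit this on Windows).

So to run a file, you have to open it as a Python file object (this is the Python 2 C-API), as follows: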

PyObject* python_file_obj = PyFile_FromString("script.py", "r");
FILE* to_run_script = PyFile_AsFile(python_file_obj);
if (to_run_script != NULL)
    PyRun_SimpleFile(to_run_script, "script.py");

That worked, but human nature drives us to want more control.

Let's try PyRun_File, which takes two Python dictionaries to use as symbol tables.

A symbol table is just a dictionary whose keys are the names of variables, functions, classes or imported modules, and whose values are the objects those names refer to.

In Python, at any point in the code, there are always two tables:

  • A global one, which can be accessed via globals().
  • A local one, which can be accessed via locals().
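
To make the two tables concrete, here is a minimal sketch. It is written in Python 3 (the rest of this post uses Python 2), and the names script, namespace, answer and show are made up for the illustration.

```python
# Python 3 sketch; `script`, `namespace`, `answer` and `show` are hypothetical names.
script = """
answer = 42

def show(x):
    y = x + 1
    return sorted(locals())   # names in the function's local table
"""

namespace = {}            # plays the role of the global symbol table
exec(script, namespace)   # one dict only, so script-level names land in it

# Ignore the entries Python adds by itself, such as __builtins__.
user_names = sorted(k for k in namespace if not k.startswith("__"))
print(user_names)              # script-level names: ['answer', 'show']
print(namespace["show"](1))    # names local to one call: ['x', 'y']
```

The point is only that both tables are ordinary dictionaries that you can inspect like any other.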

Let’s try it

// C++ code (to_run_script is opened as in the previous snippet)
PyObject* global_symbol_table = PyDict_New();
PyObject* local_symbol_table = PyDict_New();
if (to_run_script != NULL)
    PyRun_File(to_run_script, "script.py", Py_file_input, global_symbol_table, local_symbol_table);
std::cout << getInterpreterError(); // A custom function that returns any error raised during `PyRun_File`

Everything was up and running, until I decided to import the sys module.

I got the following error:

ImportError was raised with error message : __import__ not found

What?? According to the Python documentation, the import statement calls a built-in function called __import__.

Python searches for names in the following order:

  • The local symbol table.
  • The global symbol table.
  • The built-in symbol table, which is reached through the __builtins__ entry of the global symbol table; in Python 2 it refers to the __builtin__ module. (Launch the Python interactive terminal and try vars(__builtins__); you will find __import__ and the other built-in functions.)

But what if the global symbol table has no __builtins__ entry, as with the fresh dict created above? Then Python has nowhere to find __import__, hence the "not found" error.
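
The same failure can be reproduced from pure Python. This is only a sketch in Python 3, and using exec() as a stand-in for PyRun_File is my assumption. Note that exec() fills in __builtins__ by itself when the key is missing, so the sketch has to empty it explicitly.

```python
# Python 3 sketch; exec() stands in for PyRun_File (an assumption).
import builtins

source = "import sys\nbyteorder = sys.byteorder\n"

# A global table whose built-ins are empty: the import statement
# cannot find __import__ and fails.
bare_globals = {"__builtins__": {}}
try:
    exec(source, bare_globals)
    error = None
except ImportError as exc:
    error = str(exc)
print(error)

# The same code with the real built-ins available runs fine.
full_globals = {"__builtins__": builtins}
exec(source, full_globals)
print(full_globals["byteorder"])
```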

Instead of a new dict as the global symbol table, let's use the dict of the __main__ module, which already has a __builtins__ entry.

// C++ code
PyObject* main_module = PyImport_ImportModule("__main__");
PyObject* global_symbol_table = PyModule_GetDict(main_module);
PyObject* local_symbol_table = PyDict_New();
PyObject* python_file_obj = PyFile_FromString("script.py", "r");
FILE* to_run_script = PyFile_AsFile(python_file_obj);
if (to_run_script != NULL)
    PyRun_File(to_run_script, "script.py", Py_file_input, global_symbol_table, local_symbol_table);

It works and prints "Endianness of this system is little":

#script.py
import sys

print "Endianness of this system is", sys.byteorder

That seemed perfect, until a friend decided to use a recursive function.

To simplify things, I will demonstrate using a factorial function:

def fact(x):
    if x <= 1:
        return 1
    return x * fact(x - 1)

print fact(4)

The following error message hit us:

NameError was raised with error message : global name 'fact' is not defined

Double-checking the Python code: it's fine, and it works if I run the file directly with python script.py.

So I decided to print the global and local symbol tables, both inside the function and at the script level.

#script.py after modification
def fact(x):
    print "Globals inside function : id %s and %s " % (id(globals()), globals())
    print "Locals inside function : id %s and %s " % (id(locals()), locals())
    if x <= 1:
        return 1
    return x * fact(x - 1)


print "Globals id %s and %s " % (id(globals()), globals())
print "Locals id %s and %s " % (id(locals()), locals())

print fact(4)

The output was as follows:

Globals id 140521406824808 and {'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__doc__': None, '__package__': None}
Locals id 140521405908424 and {'fact': <function fact at 0x7fcdb07b9b90>}
Globals inside function : id 140521406824808 and {'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__doc__': None, '__package__': None}
Locals inside function : id 140521405940072 and {'x': 4}

Globals and locals inside the function are as expected.

But to my surprise, fact was added to the locals of the script, i.e. the dictionary we passed from C++ as the local symbol table. That's why Python can't find it in the recursive call: inside the function it searches the function's own locals, then the globals, then the built-ins, and never the script-level locals.

At the script level, every name that gets defined is added to the local symbol table, except those declared global.
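
The behaviour is easy to reproduce without any C++. Below is a minimal Python 3 sketch in which exec() with two separate dictionaries plays the role of PyRun_File; that substitution is my assumption, and global_table and local_table are illustrative names.

```python
# Python 3 sketch; exec() with two dicts stands in for PyRun_File (an assumption).
source = """
def fact(x):
    if x <= 1:
        return 1
    return x * fact(x - 1)

result = fact(4)
"""

global_table = {}
local_table = {}
try:
    exec(source, global_table, local_table)
    error = None
except NameError as exc:
    error = str(exc)

print(error)                    # the recursive lookup of fact fails
print("fact" in local_table)    # True: the def was stored in the locals dict
print("fact" in global_table)   # False: so the function body cannot see it
```

The first call to fact succeeds because the script level does look in its own locals; only the lookup made from inside the function body fails.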

Let's add more examples:

class Dummy:
    pass

global fact2
def fact2(x):
    print "Globals inside function : id %s and %s " % (id(globals()), globals())
    print "Locals inside function : id %s and %s " % (id(locals()), locals())
    if x <= 1:
        return 1
    return x * fact2(x - 1)

And the values of the symbol tables at the script level:

Globals id 140484876980584 and {'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__doc__': None, 'fact2': <function fact2 at 0x7fc52f226b90>, '__package__': None}
Locals id 140484876064200 and {'Dummy': <class __main__.Dummy at 0x7fc52f217738>}

Note that class Dummy is stored in the local symbol table, while function fact2 is stored in the global symbol table.
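
The same split can be sketched in Python 3, again with exec() and two dictionaries standing in for PyRun_File (my assumption, not the C++ code itself); the print statements from the script are left out to keep it short.

```python
# Python 3 sketch; exec() with two dicts stands in for PyRun_File (an assumption).
source = """
class Dummy:
    pass

global fact2
def fact2(x):
    if x <= 1:
        return 1
    return x * fact2(x - 1)

result = fact2(4)
"""

global_table = {}
local_table = {}
exec(source, global_table, local_table)

print("fact2" in global_table)   # True: declared global, so stored in globals
print("Dummy" in local_table)    # True: an ordinary definition goes to locals
print(local_table["result"])     # 24: the recursion now finds fact2
```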

Now calling fact2(4) yields the correct number, 24. But why did the original factorial function work when I ran the script with python script.py?

Let's look at the globals and locals when launching Python directly:

Globals id 140403094360424 and {'Dummy': <class __main__.Dummy at 0x7fb22481c738>, '__builtins__': <module '__builtin__' (built-in)>, '__file__': '/home/jerry/ClionProjects/PythonAPI/script.py', '__package__': None, 'fact2': <function fact2 at 0x7fb22482ab90>, '__name__': '__main__', '__doc__': None}
Locals id 140403094360424 and {'Dummy': <class __main__.Dummy at 0x7fb22481c738>, '__builtins__': <module '__builtin__' (built-in)>, '__file__': '/home/jerry/ClionProjects/PythonAPI/script.py', '__package__': None, 'fact2': <function fact2 at 0x7fb22482ab90>, '__name__': '__main__', '__doc__': None}

Note that locals and globals have the same id: they are the same object, so the distinction does not matter here.

Bottom line: how did I solve my problem? I did what the Python interpreter does, and used the same dictionary for both symbol tables when invoking PyRun_File from C++:

PyRun_File(to_run_script, "script.py", Py_file_input, global_symbol_table, global_symbol_table);
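
For comparison, here is the fix in the same Python 3 sketch style, with exec() standing in for PyRun_File (an assumption) and symbol_table as an illustrative name: passing one dictionary for both tables makes the recursive function work.

```python
# Python 3 sketch; exec() stands in for PyRun_File (an assumption).
source = """
def fact(x):
    if x <= 1:
        return 1
    return x * fact(x - 1)

result = fact(4)
"""

symbol_table = {"__name__": "__main__"}
# The same dict is used as both the global and the local symbol table.
exec(source, symbol_table, symbol_table)

print(symbol_table["result"])   # 24
```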
