Intro to Python ast Module

Shanshan
5 min read · Jan 24, 2022

ast is a module in the Python standard library. Python source code is converted to an Abstract Syntax Tree (AST) before it is compiled to bytecode (the content cached in .pyc files). Generating an AST is the most important function of ast, but there are more ways to use the module.

This post summarizes my learnings after studying 10+ resources and contributing to one active open-source project that uses ast.

First, what is an Abstract Syntax Tree?

Let’s look at an example.

Suppose your Python code is just one line: print('hello world'). You can convert it to an AST object with:

import ast
tree = ast.parse("print('hello world')")

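To print out the tree's data, use the ast.walk function to visit every node. The documentation leaves the visiting order unspecified; in CPython the traversal is breadth-first (each level of the tree before the next), which matches the output further down: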

for node in ast.walk(tree):
    print(node)
    print(node.__dict__)
    print("children: " + str([x for x in ast.iter_child_nodes(node)]) + "\n")

You will get results:

<ast.Module object at 0x105697640>
{'body': [<ast.Expr object at 0x105695690>],
'type_ignores': []}
children: [<ast.Expr object at 0x105695690>]
<ast.Expr object at 0x105695690>
{'value': <ast.Call object at 0x1056959f0>,
'lineno': 1, 'col_offset': 0, 'end_lineno': 1, 'end_col_offset': 20}
children: [<ast.Call object at 0x1056959f0>]
<ast.Call object at 0x1056959f0>
{'func': <ast.Name object at 0x105696140>,
'args': [<ast.Constant object at 0x106280370>],
'keywords': [], 'lineno': 1, 'col_offset': 0,
'end_lineno': 1, 'end_col_offset': 20}
children: [<ast.Name object at 0x105696140>,
<ast.Constant object at 0x106280370>]
<ast.Name object at 0x105696140>
{'id': 'print',
'ctx': <ast.Load object at 0x103075f90>,
'lineno': 1, 'col_offset': 0, 'end_lineno': 1,
'end_col_offset': 5}
children: [<ast.Load object at 0x103075f90>]
<ast.Constant object at 0x106280370>
{'value': 'hello world', 'kind': None, 'lineno': 1, 'col_offset': 6, 'end_lineno': 1, 'end_col_offset': 19}
children: []
<ast.Load object at 0x103075f90>
{}
children: []
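
For a more compact view of the same tree, the standard library also offers ast.dump. The sketch below assumes Python 3.9 or later, where the indent parameter is available; the exact set of fields printed varies slightly between Python versions.

```python
import ast

tree = ast.parse("print('hello world')")

# ast.dump renders the whole tree as one nested string.
# indent (Python 3.9+) pretty-prints nested nodes on separate lines.
dumped = ast.dump(tree, indent=2)
print(dumped)
```

The dump shows the same nesting as the loop above (Module, Expr, Call, then Name and Constant) without the memory addresses.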

In an AST, each node is an instance of a class defined by Python's abstract grammar (https://docs.python.org/3/library/ast.html). In our sample, the output above lists 6 nodes in the tree: Module, Expr, Call, Name, Constant, and Load.

It’s not surprising to see our root element (the first element printed out) is an ast.Module object, because each .py file is a module.

The root node's 'body' attribute is a list with only one element because our code has only one statement. In a more complicated example, you would see more elements in the list. That element is an ast.Expr object whose value is an ast.Call that invokes the print function with the argument 'hello world'. It all makes sense.
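
The same relationships can be checked by following the attributes by hand. This is a small sketch that uses only the attribute names visible in the printed output above (body, value, func, id, args):

```python
import ast

tree = ast.parse("print('hello world')")

statement = tree.body[0]        # the only statement in the module
call = statement.value          # the ast.Call wrapped by the ast.Expr
function_name = call.func.id    # the name being called
argument = call.args[0].value   # the string passed to it

print(type(statement).__name__, type(call).__name__, function_name, argument)
```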

The visual presentation of the tree object for our sample is:

Figure 1. AST of a Python file containing the code print('hello world')

It’s interesting to note that some node classes include position info. For instance, from the lineno, col_offset, end_lineno, and end_col_offset of the ast.Constant node, we can tell that the string literal we want to print ('hello world', including its quotes) spans offsets 6 to 19 on the 1st line of the code, where offsets are 0-based and the end is exclusive. It turns out this position data is what enables the compiler to print useful error messages! It is also useful for manipulating code, which we will see in the example later.
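
To see the position data in action, here is a short sketch that recovers the source text of the Constant node. It assumes ASCII-only source, because col_offset is measured in UTF-8 bytes rather than characters; ast.get_source_segment (Python 3.8+) handles the general case.

```python
import ast

source = "print('hello world')"
tree = ast.parse(source)

constant = tree.body[0].value.args[0]

# col_offset and end_col_offset are 0-based and the end is exclusive,
# so on a single ASCII line they work directly as a slice.
sliced = source[constant.col_offset:constant.end_col_offset]

# The standard library helper does the same lookup and also copes
# with nodes that span several lines.
segment = ast.get_source_segment(source, constant)

print(sliced)
print(segment)
```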

Second, how is the ast module used?

Building the Abstract Syntax Tree, a phase known as syntax analysis, is one step in compiling Python code (Figure 2). When you run a Python file (.py), the code is first parsed into an AST and then compiled to bytecode (which Python caches in .pyc files for imported modules). During execution, the bytecode is passed to the Python Virtual Machine to be interpreted. Therefore, even though you may not have heard of the ast module, you rely on the step it exposes all the time.
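
The pipeline can be reproduced by hand with built-in functions. The sketch below is only an illustration of the stages, not a description of CPython internals; the one-line program and the "<example>" filename are made up for the demonstration.

```python
import ast

source = "result = 6 * 7"

tree = ast.parse(source)                           # source text -> AST
code_object = compile(tree, "<example>", "exec")   # AST -> bytecode (a code object)

namespace = {}
exec(code_object, namespace)                       # bytecode executed by the interpreter

print(namespace["result"])
```

The built-in compile accepts either source text or an AST, which is why a tree produced by ast.parse can be handed to it directly.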

Figure 2. Phases of a Compiler. Source: https://www.geeksforgeeks.org/phases-of-a-compiler/

In addition, the AST is a powerful structure used by a variety of applications.

If you are really ambitious, you can even generate new Python code by composing an AST and then converting it back to source with the ast.unparse function, which is available since Python 3.9.
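
As a minimal sketch of that idea, the tree for our earlier one-line example can be composed node by node and turned back into source. It requires Python 3.9+ for ast.unparse.

```python
import ast

# Compose the tree for print('hello world') by hand.
call = ast.Call(
    func=ast.Name(id="print", ctx=ast.Load()),
    args=[ast.Constant(value="hello world")],
    keywords=[],
)
module = ast.Module(body=[ast.Expr(value=call)], type_ignores=[])

# Hand-built nodes have no lineno/col_offset; fill them in so the
# tree can also be passed to compile().
ast.fix_missing_locations(module)

generated = ast.unparse(module)
print(generated)
```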

Third, how to use ast for code generation

Since the ast module provides the position of code elements, one can use it to manipulate source text and produce new code. One example is the open-source project https://github.com/hhursev/recipe-scrapers, which uses ast to generate boilerplate code and promote standardized practice among contributors.

A bit of background: the goal of the project is to build a library that scrapes recipe data from a variety of sites. To add a new site, a developer needs to create a scraper class and test cases that fit the framework. Adam Keys created a code generator that automates the steps for adding a new site. Once a developer calls the generator with a ClassName and the site URL, new class files are generated and inserted into the correct folders. Casual contributors only need to customize any implementation specific to the site and provide expected results for the unit test.

Let's dive into how this generator works. Here is a code snippet from https://github.com/hhursev/recipe-scrapers/blob/master/generate.py:

def generate_scraper_test(class_name, host_name):
    with open("templates/test_scraper.py") as source:
        code = source.read()
        program = ast.parse(code)
        state = GenerateTestScraperState(class_name, host_name, code)
        for node in ast.walk(program):
            if not state.step(node):
                break
...
class GenerateScraperState(ScraperState):
    def __init__(self, class_name, host_name, code):
        super().__init__(code)
        self.class_name = class_name
        self.host_name = host_name

    def step(self, node):
        # Replace the placeholder class name found in the template.
        if isinstance(node, ast.ClassDef) and node.name == template_class_name:
            offset = self._offset(node)
            segment_end = self.code.index(template_class_name, offset)
            self._replace(self.class_name, segment_end, len(template_class_name))
        # Replace the placeholder host name string found in the template.
        if isinstance(node, ast.Constant) and node.value == template_host_name:
            offset = self._offset(node)
            segment_end = self.code.index(template_host_name, offset)
            self._replace(self.host_name, segment_end, len(template_host_name))
        return True

To create a new file, the code reads a template (test_scraper.py in the snippet above) that holds the boilerplate code and default method implementations. It uses the ast module to parse the template and find the offsets of the elements to modify, identified by template_class_name and template_host_name. It then uses string operations to replace the template names with the actual class name and host name. Lastly, it writes the modified code to the desired folder.
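
The parse, locate, and replace steps can be sketched in a self-contained form. This is not the project's implementation: the template text, the placeholder names, and the helpers line_start_offsets and generate are all hypothetical, and the sketch assumes ASCII-only templates so that col_offset can be treated as a character offset.

```python
import ast

# Hypothetical template and placeholder names, invented for this sketch.
TEMPLATE = '''class TemplateScraper:
    @classmethod
    def host(cls):
        return "example.com"
'''
TEMPLATE_CLASS_NAME = "TemplateScraper"
TEMPLATE_HOST_NAME = "example.com"


def line_start_offsets(code):
    """Return the character offset at which each line of code begins."""
    offsets, total = [], 0
    for line in code.splitlines(keepends=True):
        offsets.append(total)
        total += len(line)
    return offsets


def generate(code, class_name, host_name):
    """Return code with the placeholder class and host names replaced."""
    starts = line_start_offsets(code)
    replacements = []  # (start, length, new_text)
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.ClassDef) and node.name == TEMPLATE_CLASS_NAME:
            old, new = TEMPLATE_CLASS_NAME, class_name
        elif isinstance(node, ast.Constant) and node.value == TEMPLATE_HOST_NAME:
            old, new = TEMPLATE_HOST_NAME, host_name
        else:
            continue
        # lineno is 1-based; col_offset is 0-based within that line.
        node_start = starts[node.lineno - 1] + node.col_offset
        # The node starts at "class" or at the opening quote, so search
        # forward from there for the placeholder text itself.
        start = code.index(old, node_start)
        replacements.append((start, len(old), new))
    # Apply from the end of the file backwards so that earlier offsets
    # stay valid after each replacement.
    for start, length, new_text in sorted(replacements, reverse=True):
        code = code[:start] + new_text + code[start + length:]
    return code


result = generate(TEMPLATE, "AllRecipes", "allrecipes.com")
print(result)
```

Searching from the node's offset, rather than replacing every occurrence in the file, is what keeps the edit limited to the class definition and the host string that the AST identified.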

Using ast to generate code in a collaborative project like this has a few benefits:

  • It saves time and minimizes the boring work of creating new classes
  • It makes following the framework very easy, and makes it easy to onboard new developers
  • It encourages everyone to follow the same framework without excessive communication

The Wrap

ast is a powerful Python module that builds Abstract Syntax Trees. ASTs are the hidden force behind many popular tools, such as IDEs, that developers cannot live without.

It can also be used in code generation, which simplifies onboarding new people for collaborative projects. Code generation helps standardize code structure and promote shared practices without impeding development.

Hope you find the post interesting or useful. Thanks for reading.
