Let’s get a little “meta” about programming.
How does the Python program (better know as the interpreter) “know” how to run your code? If you’re new to programming, it may seem like magic. In fact, it still seems like magic to me after being a professional for more than a decade.
The Python interpreter is not magic (sorry to disappoint you). It follows a predictable set of steps to translate your code into instructions that a machine can run.
At a fairly high level, here’s what happens to your code:
if
is a different token than a numeric value like 42
.BINARY_ADD
and are meant to be very generic so that a computer can run them.Many more details could fit into that description, but that’s the rough sketch of how typed characters are executed by computer CPUs.
By the time your source code is turned into bytecode, it’s too late to gain much understanding about what you wrote. Bytecode is very primitive and very tuned to making the interpreter fast. In other words, bytecode is designed for computers over people.
On the other hand, abstract syntax trees have enough structured information within them to make them useful for learning about your code. ASTs still aren’t very people friendly, but they are more sensible than the bytecode representation.
Because Python is a “batteries included” language, the tools you need to use ASTs are built into the standard library.
The primary tool to work with ASTs is the ast
module. Let’s look at an example to see how this works.
ast
by example
Below is the example Python script that we’ll use. This script answers the question of “what modules were imported?”
import ast
from pprint import pprint
def main():
with open("ast_example.py", "r") as source:
tree = ast.parse(source.read())
analyzer = Analyzer()
analyzer.visit(tree)
analyzer.report()
class Analyzer(ast.NodeVisitor):
def __init__(self):
self.stats = {"import": [], "from": []}
def visit_Import(self, node):
for alias in node.names:
self.stats["import"].append(alias.name)
self.generic_visit(node)
def visit_ImportFrom(self, node):
for alias in node.names:
self.stats["from"].append(alias.name)
self.generic_visit(node)
def report(self):
pprint(self.stats)
if __name__ == "__main__":
main()
This code does a couple of major things:
You can run this code as:
$ python3 ast_example.py
{'from': ['pprint'], 'import': ['ast']}
Transform to AST
with open("ast_example.py", "r") as source:
tree = ast.parse(source.read())
In two lines of code, we read a file and create an AST named tree
. The ast.parse
function makes this a snap! There is a ton happening under the hood of that function that we can blissfully ignore.
With one function call, Python processed all the tokens, followed all the rules of the language, and built a data structure (i.e., a tree) containing all the relevant information to run the code.
Before moving on, let’s take a moment to consider what a tree is. Trees are a very deep topic in software development so consider this a primer rather than an exhaustive explanation.
A tree is a way to hold data as a set of “nodes” connected by “edges.”
+-----+
| A |
+-----+
/ \
/ \
+-----+ +-----+
| B | | C |
+-----+ +-----+
In this diagram, A, B, and C are all nodes and there are edges connecting A to B and A to C.
One way to represent this tree in code could be:
class Node:
def __init__(self, value):
self.value = value
self.children = []
tree = Node('A')
tree.children.append(Node('B'))
tree.children.append(Node('C'))
Notice that the tree
is actually a node! When we work with a tree, we’re really dealing with a collection of nodes, and the tree variable is a reference to the “root” node (e.g., node A). By having this kind of structure, we can check each node in the tree and take action. We do that by visiting each node in the tree and processing its data.
def print_node_value(value):
print(value)
def visit(node, handle_node):
handle_node(node.value)
for child in node.children:
visit(child, handle_node)
# tree is from the previous example.
visit(tree, print_node_value)
# This should print:
# A
# B
# C
Now that we have an idea of what a tree is, we can consider what the next section of the example script does. The tree structure of the Python abstract syntax tree is more involved because of the count of its nodes and the type of data stored, yet the core idea of nodes and edges is the same.
Analyze the ASTOnce we have the tree, the Analyzer
follows the visitor pattern that I showed above to extract information out of the tree.
I noted that a Python AST is more complex than my basic Node
design. One difference is that it tracks various types of nodes. This is where ast.NodeVisitor
is useful.
A NodeVisitor
can respond to any type of node in the Python AST. To visit a particular type of node, we must implement a method that looks like visit_<node type>
.
My example code is trying to find out about imports. To learn about imports, the code pulls from the Import
and ImportFrom
node types.
def visit_Import(self, node):
for alias in node.names:
self.stats["import"].append(alias.name)
self.generic_visit(node)
def visit_ImportFrom(self, node):
for alias in node.names:
self.stats["from"].append(alias.name)
self.generic_visit(node)
This code takes the name of the module and stores it in a list of statistics. While the code is not fancy, it shows you how to interact with AST nodes.
With the NodeVisitor
class defined, we can use it to analyze the tree.
analyzer = Analyzer()
analyzer.visit(tree)
The visit
method will delegate to your visit_<node type>
method whenever that type of node is encountered while traversing through the tree structure.
So, what kinds of node types are there? You can find the full list in the Abstract Grammar section of the ast
module documentation. Truthfully, I find that documentation a little hard to absorb. You may have more success by referring to a more exhaustive guide like the Green Tree Snakes Nodes guide.
By now, you hopefully understand how to:
NodeVisitor
.I think you can answer many interesting questions about your code by using abstract syntax trees. Questions like:
The ast
module is probably not a tool that you will reach for very often. In those times that you do need ast
, its minimal API is quite memorable and you can analyze code quickly.
If you found this useful, would you mind sharing this on X or your favorite social media site? I like chatting with people about these kinds of topics so feel free to tweet me at @mblayman.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4