Demystifying Python Bytecode: A Guide to Understanding and Analyzing Code Execution

Noran Saber Abdelfattah
11 min read · Jun 8, 2023

Introduction

CPython does not run your source code directly: it first compiles it to bytecode, which acts as a bridge between the human-readable source and the machine that executes it. Understanding bytecode can help with performance analysis, troubleshooting, and seeing how code really behaves. Python’s dis module disassembles bytecode into individual instructions, which lets programmers examine code behavior, spot performance bottlenecks, and track down difficult problems.

Bytecode

  • To understand what bytecode is, let’s first look at how Python gets from source code to a running program.
  • Python source code: This is where you begin. It is the human-readable text you write to express the logic and functionality of your program. (Your code)
  • Compilation: When you run your program, the source code first goes through a compilation step. The Python compiler, which is a component of the CPython interpreter, translates your source code into CPython bytecode: a set of lower-level, platform-independent instructions. (Bytecode is an implementation detail of CPython, so it can change between Python versions.)
  • CPython bytecode: The result of the compilation step is a series of instructions that the CPython interpreter can understand and carry out. You can think of it as a guidebook: just as a guidebook gives step-by-step instructions for a task, CPython bytecode gives the interpreter step-by-step instructions for executing your code. Each bytecode instruction represents one specific operation that the interpreter needs to perform.
  • Execution by the interpreter: The CPython interpreter then reads and runs the bytecode instruction by instruction. By following these instructions, it carries out the operations and produces the output or behavior described by your source code.
  • Note: The Python compiler is one part of the CPython interpreter, not a separate program. “Compiler” names only the stage that turns source code into bytecode, so the two terms are not interchangeable.
  • So, the flow is: Source code -> Compilation -> CPython bytecode -> Interpreter execution

Now let’s take a deeper look at bytecode

  • CPython bytecode: Bytecode is a low-level representation of the instructions in a program.
  • In the case of Python, the CPython interpreter uses its own kind of bytecode, known as CPython bytecode. It is the set of instructions that tells the interpreter what to do.
  • But raw bytecode is just a sequence of bytes, so if we try to read it directly, I guess our reaction will be confusion 😂😂
  • So we need help from something that understands bytecode and can analyze and explain it for us. That’s the role of the dis module.
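
To see why raw bytecode is hard to read, here is a minimal sketch that prints the bytes stored on a function’s code object. The function `add` is only an illustration (it is not from the original examples), and the exact bytes you get depend on your Python version.

```python
# Hypothetical example function; any function would do.
def add(a, b):
    return a + b

code = add.__code__   # the compiled code object behind the function
raw = code.co_code    # the raw bytecode, stored as a bytes object

print(type(raw))      # <class 'bytes'>
print(raw)            # an unreadable run of bytes
print(list(raw))      # the same bytes shown as plain integers
```

Nothing in this output tells you which operation each byte stands for, which is the gap the dis module fills.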

dis module

  • The dis module, part of the Python standard library, helps you analyze CPython bytecode. It disassembles, or breaks down, the bytecode into individual instructions so that you can see what each one does.
  • Disassembling bytecode is done in order to gain a better understanding of how the interpreter runs Python code. It can be helpful for code analysis and debugging, figuring out performance snags, or discovering how specific constructs are converted into bytecode.

How can the dis module help us?

  • Understanding code behavior: Developers may learn how their code is executed and how certain constructions are converted into bytecode instructions by looking at the disassembled bytecode.
  • Performance analysis: The dis module does not measure time itself, but looking at the bytecode instructions and the order in which they run can help explain where a piece of code does extra work. That insight can guide you when optimizing the code.
  • Debugging: Disassembled bytecode can be useful in diagnosing and debugging difficult problems. It enables programmers to examine the precise instructions being carried out and find any issues.

You can think of it as a teacher: it takes difficult concepts and explains them in a simple way, and when something goes wrong, it helps you find where the issue is.

Let’s take an example

  • Example function: Let’s take a simple function called myfunc(). It takes a list (alist) as input and returns the length of that list.
  • Disassembling myfunc(): To disassemble the bytecode of myfunc(), use the dis.dis() function provided by the dis module. It prints the individual bytecode instructions that make up the function.
  • Understanding the disassembled code: Each row of the output is one bytecode instruction that tells the interpreter what to do. The first column, when present, is the source line number, and the number in front of the instruction name is its byte offset within the bytecode. The exact offsets and instruction names depend on the Python version. For myfunc(), the output from an older Python version reads:
  • Source line 2, offset 0: The bytecode instruction LOAD_GLOBAL loads the built-in function len.
  • Offset 3: The bytecode instruction LOAD_FAST loads the value of the local variable alist.
  • Offset 6: The bytecode instruction CALL_FUNCTION calls the previously loaded function with 1 argument. (Python 3.11 and later use CALL instead.)
  • Offset 9: The bytecode instruction RETURN_VALUE returns the result of the function call.
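
The example described above can be sketched as follows. The function body is reconstructed from the description (it returns the length of alist), and the output is not reproduced here because offsets and some instruction names differ between Python versions.

```python
import dis

def myfunc(alist):
    # Return the number of items in the list.
    return len(alist)

# Print one row per bytecode instruction: source line number,
# byte offset, instruction name, and its argument (if any).
dis.dis(myfunc)
```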

Question 🤔. If I want to access the details of my bytecode from code, can I do that, and if so, how?

  • Yes, you can, by using the Bytecode object. Think of it as a container that holds all the information about the bytecode so that you can access it.

Bytecode object

  • A Bytecode object wraps a piece of compiled Python code and gives you easy access to its details.
  • dis.Bytecode(): This class analyzes the bytecode corresponding to a function, method, string of source code, or code object. You pass it the code you want to analyze.
  • Looping over a Bytecode instance: If you have a Bytecode object, you can access each individual bytecode operation by iterating over it. Each operation corresponds to one instruction in the code.
  • Instruction instance: Each item the loop produces is an Instruction instance, which holds the details of a single bytecode operation.
  • These instances include details about the operation, such as its name (opname) and its offset within the bytecode (offset).
  • To make it clear: the Instruction instances are what you get back when you iterate over dis.Bytecode(), and their names are the ones we saw earlier (LOAD_GLOBAL, LOAD_FAST, CALL_FUNCTION, RETURN_VALUE).
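
As a minimal sketch of the loop described above, the following wraps myfunc() (reconstructed from the earlier description) in a Bytecode object and prints the name of each instruction. The names printed depend on your Python version.

```python
import dis

def myfunc(alist):
    return len(alist)

bytecode = dis.Bytecode(myfunc)

# Iterating over the Bytecode object yields Instruction instances.
for instr in bytecode:
    print(instr.opname)

# The same information collected into a list for later use.
opnames = [instr.opname for instr in bytecode]
```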

Some Methods

1. from_traceback() method

  • When your code hits an error while running, Python generates information called a traceback. The traceback tells you which parts of your code were being executed when the error occurred.
  • The from_traceback() class method, called as dis.Bytecode.from_traceback(tb), takes this traceback and builds a Bytecode instance from it.
  • The Bytecode instance helps you understand where the error happened by identifying the specific instruction (a step in the code) that raised the exception.
  • from_traceback() also sets the instance’s current_offset to that instruction, so you know exactly which part of the code to focus on when debugging.
  • To put it simply, from_traceback() takes the error information, determines which instruction in your code raised the exception, and points you to where to look to fix it.
  • In short, it is a debugging aid.
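
Here is a small sketch of from_traceback() in use. The divide() function and the deliberate division by zero are assumptions made for illustration; the offset printed depends on your Python version.

```python
import dis

# Hypothetical function that fails when b is 0.
def divide(a, b):
    return a / b

try:
    divide(1, 0)
except ZeroDivisionError as exc:
    # Build a Bytecode object for the code that raised the exception.
    bytecode = dis.Bytecode.from_traceback(exc.__traceback__)

print(bytecode.codeobj.co_name)   # divide
print(bytecode.current_offset)    # offset of the failing instruction
print(bytecode.dis())             # disassembly of the failing function
```

In the disassembly, the instruction at current_offset is the one to look at first when debugging.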

2. codeobj

  • When you write code in Python, the interpreter cannot run the text directly. The code must first be compiled, and the result of compilation is a “compiled code object”. A Bytecode object keeps it in its codeobj attribute.
  • compile() function: One way to create a code object yourself is the built-in compile() function. It takes your Python source code as input and returns a code object. Functions you define get a code object automatically, available as their __code__ attribute.
  • Purpose of the codeobj: The code object is what the Python interpreter actually runs. It contains the bytecode together with the information the interpreter needs to execute it, such as constants and variable names.
  • To put it simply, codeobj is the compiled form of your Python code, and it is this compiled form, not the source text, that the interpreter executes.
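
The relationship between compile(), the code object, and codeobj can be sketched like this. The source string and the "<example>" filename are placeholders chosen for illustration.

```python
import dis

source = "total = 1 + 2"

# Compile the source text into a code object.
code = compile(source, "<example>", "exec")

# The interpreter runs the code object, not the source text.
namespace = {}
exec(code, namespace)
print(namespace["total"])   # 3

# A Bytecode object built from a code object exposes it as codeobj.
bytecode = dis.Bytecode(code)
print(bytecode.codeobj is code)   # True
```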

3. first_line

  • first_line is the source line number reported for the first line of the disassembled code. When code is compiled into bytecode (a lower-level representation of the code), each instruction is associated with the line of source code it came from.
  • In a Bytecode object, first_line sets the line number at which the disassembly starts counting. If you don’t pass it, the line number stored in the code object is used.
  • For example, if a Python function is defined starting at line 10 of your source file and you disassemble it using the Bytecode object, setting first_line to 10 ensures that the line numbers in the disassembled code correspond to the original source file.
  • Simply put, first_line keeps the line numbers in the disassembled bytecode in step with the original source code, so you can match specific instructions with the lines they came from.
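
A minimal sketch of first_line follows. The add() function and the value 10 are assumptions for illustration only.

```python
import dis

# Hypothetical function used only to show the effect of first_line.
def add(a, b):
    return a + b

# Without first_line, the line number stored in the code object is used.
default = dis.Bytecode(add)
print(default.first_line == add.__code__.co_firstlineno)   # True

# With first_line=10, line numbers in the disassembly start from 10.
shifted = dis.Bytecode(add, first_line=10)
print(shifted.first_line)   # 10
print(shifted.dis())
```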

4. info() method

  • info() is a method of the Bytecode object that gives detailed information about the underlying code object, that is, Python code that has been compiled with the compile() function or in some other way, such as defining a function.
  • When you call it, info() returns a formatted multi-line string with different pieces of information about the code, laid out so that it is easy to read.

The info() output includes, among other things:

  • The name of the code object (e.g., the function or module name)
  • The filename or source location where the code object originated
  • The argument counts (total, positional-only, and keyword-only)
  • The number of local variables and the required stack size
  • Any flags or special properties associated with the code object
  • The constants and variable names the code uses
  • Simply put, info() summarizes the compiled code object: its name, where it came from, and its other attributes. This information is useful when you are working with or studying the code.

Let’s take a few examples
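
The first example can be reproduced with a sketch like the one below. The body of calculate() is reconstructed from the output explained next (two arguments a and b, and a third local variable result), so treat it as an assumption. The filename line will show your own file rather than a string source.

```python
import dis

# Reconstructed example function: two arguments and one extra local.
def calculate(a, b):
    result = a + b
    return result

# info() returns a multi-line summary of the code object.
info = dis.Bytecode(calculate).info()
print(info)
```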

  • Let’s explain the output
  • Name: The name of the function is calculate.
  • Filename: The filename is <string>. This indicates that the function was compiled from a string of source code rather than loaded from a file.
  • Argument count: The function takes two arguments (a, b).
  • Positional-only arguments: There are no positional-only arguments, meaning none of the parameters is restricted to being passed by position.
  • Keyword-only arguments: There are no keyword-only arguments for this function.
  • Number of local variables: The function has three local variables.
  • Stack size: The maximum stack size needed for the bytecode operations is 2. This is the largest number of items the function needs on the stack at any one time during execution.
  • Flags: The function has the flags OPTIMIZED and NEWLOCALS. These flags describe properties of the code object. OPTIMIZED indicates that the code uses fast access to local variables (it does not mean the code went through an optimizer), and NEWLOCALS indicates that a new local namespace is created each time the function runs.
  • Constants: The function uses one constant, which is None. Constants are values that are referenced by the bytecode instructions.
  • Variable names: The function has three variable names: a, b, and result. These are the names of the local variables used within the function’s scope.

Another one
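
A listing like the one broken down below can be produced with the dis() method of the Bytecode object. The calculate() body is the same reconstruction assumed for the previous example. The breakdown matches the output of one particular Python version; other versions may show different offsets or combine some instructions.

```python
import dis

def calculate(a, b):
    result = a + b
    return result

# dis() returns the formatted disassembly as a string.
listing = dis.Bytecode(calculate).dis()
print(listing)
```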

  • Each row represents one bytecode instruction. When a row starts a new source line, the first number is that source line number; the number in front of the instruction name is the instruction’s byte offset. Let’s break down each row:
  • 4 0 RESUME 0: Source line 4, offset 0, operation RESUME. It appears at the beginning of the bytecode and is related to internal interpreter mechanics.
  • 5 2 LOAD_FAST 0 (a): Source line 5, offset 2. This loads the value of the local variable a onto the stack. The LOAD_FAST instruction is used to access local variables quickly.
  • 4 LOAD_FAST 1 (b): Offset 4. This loads the value of the local variable b onto the stack.
  • 6 BINARY_OP 0 (+): Offset 6. This performs the binary operation + on the top two items on the stack (the values of a and b). The BINARY_OP instruction is used for binary operations like addition, subtraction, multiplication, etc.
  • 10 STORE_FAST 2 (result): Offset 10. This stores the result of the addition in the local variable result. The STORE_FAST instruction is used to assign a value to a local variable.
  • 6 12 LOAD_FAST 2 (result): Source line 6, offset 12. This loads the value of the local variable result onto the stack.
  • 14 RETURN_VALUE: Offset 14. This returns the value on the top of the stack (the value of result in this case) as the output of the function.

Example 3
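
For this example, a sketch along the following lines iterates over the Bytecode object and prints a few fields of each Instruction. The calculate() body is again a reconstruction, and the exact instructions listed depend on your Python version.

```python
import dis

def calculate(a, b):
    result = a + b
    return result

# Print the offset, name, and readable argument of each instruction.
for instr in dis.Bytecode(calculate):
    print(f"{instr.offset:>3} {instr.opname:<12} {instr.argrepr}")

# Just the instruction names, in execution order.
ops = [instr.opname for instr in dis.Bytecode(calculate)]
```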

  • The output lists the instructions of a Python function. Each instruction is one step the function takes to do its job.
  • RESUME: This instruction does no work on behalf of your code. It is a marker used internally by the interpreter.
  • LOAD_FAST: This instruction loads the value of the variable named ‘a’.
  • LOAD_FAST: This instruction loads the value of the variable named ‘b’.
  • BINARY_OP: This instruction performs a binary operation (addition, in this case) on the values loaded by the previous two instructions.
  • STORE_FAST: This instruction stores the result of the operation in the variable named ‘result’.
  • LOAD_FAST: This instruction loads the value of the ‘result’ variable.
  • RETURN_VALUE: This instruction ends the function and returns the value on top of the stack.

Conclusion

In conclusion, bytecode is essential to the execution of Python code, and the dis module and the Bytecode object offer helpful resources for decoding and comprehending bytecode. Developers can better understand code behavior, improve efficiency, and solve challenging issues by digging into the bytecode specifics. Exploring bytecode helps developers design more effective and reliable code and improves their understanding of how Python is executed.

Sincerely, Noran❤️
