
Understand Your Code Better With Bytecode Inspection

7/2021

Ever wondered why some Python constructs are faster than others? Do you know why {} is preferred over dict()? These and other "secrets" are revealed by the bytecode representation of your source code.

Table of contents

  1. Bytecode
  2. Under the hood of the Python Virtual Machine
  3. Python Bytecode Inspection
  4. What now?

Bytecode

Let's start with the term "bytecode", sometimes also called "portable code". Bytecode is a compact, platform-independent set of instructions into which our human-readable source code is compiled before a virtual machine executes it. This is advantageous for a multi-platform approach, as your code can run anywhere the corresponding runtime is installed (the JVM for Java, the interpreter for Python, ...). This holds for so-called interpreted languages like Python, Java, JavaScript, or Perl. Languages compiled directly to machine code work a bit differently, but let's keep that for another time.

This set could be composed of instructions like:

  • LOAD_NAME, LOAD_CONST, STORE_NAME, DELETE_NAME - variables and constants operations

  • BINARY_ADD, BINARY_SUBTRACT, BINARY_AND - binary operations

  • UNARY_POSITIVE, UNARY_NEGATIVE, UNARY_NOT - unary operations

  • JUMP_FORWARD, JUMP_ABSOLUTE - jump operations

Note: The Python docs for the dis module contain a full list of instructions with explanations. The exact instruction set changes between CPython versions.
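
To see which instructions your own interpreter knows, you can look into dis.opmap, a dictionary that maps instruction names to their numeric opcodes. The snippet below is only a small sketch; the helper name opcodes_with_prefix is made up for this example, and the list it prints will differ between CPython versions.

```python
import dis


def opcodes_with_prefix(prefix):
    """Return the sorted names of all opcodes in this interpreter starting with prefix."""
    # opcodes_with_prefix is a hypothetical helper, not part of the dis module.
    return sorted(name for name in dis.opmap if name.startswith(prefix))


load_ops = opcodes_with_prefix("LOAD_")
print(load_ops)

# Every name maps to an integer opcode, and dis.opname maps it back.
load_name_opcode = dis.opmap["LOAD_NAME"]
print(load_name_opcode, dis.opname[load_name_opcode])
```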

Under the hood of the Python Virtual Machine

Figure: Stack - abstract data type (Wikipedia)

CPython uses a stack-based (LIFO) virtual machine. This means that instructions do not work with named registers: they pop their operands from a stack and push their results back onto it. CPython works with three kinds of stacks:

1. The Call Stack

  • Imagine the call stack as the main spine of a Python program, composed of so-called "frames". The bottom of the stack is the entry point, and each function call pushes a corresponding frame onto the top of the stack. When the function returns, its frame is popped off the stack.

2. The Evaluation/Data Stack

  • This stack is where, unsurprisingly, the evaluation happens: it holds the operands and intermediate results of the instructions running in a frame. For example, LOAD_FAST pushes a local variable onto it, while BINARY_MULTIPLY and INPLACE_ADD pop two values from it and push the result.

3. The Block Stack

  • The block stack is also part of a frame and keeps track of the blocks that are currently active in the Python code. Entering a loop or a try/except block pushes an entry onto this stack. Because of this behaviour, Python knows exactly what to do when it reaches a keyword like break or continue.
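
The way instructions use the evaluation stack can be sketched with a toy interpreter. This is an illustration only, not how CPython is implemented: the run function and the (name, argument) tuple format are invented for this example, and only the instruction names are borrowed from the lists above.

```python
def run(program):
    """Execute a list of (opname, arg) pairs on a toy evaluation stack."""
    stack = []
    for opname, arg in program:
        if opname == "LOAD_CONST":
            # Push the constant onto the top of the stack.
            stack.append(arg)
        elif opname == "BINARY_ADD":
            # Pop two operands (right one first), push their sum.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "BINARY_MULTIPLY":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif opname == "RETURN_VALUE":
            # The result is whatever is on top of the stack.
            return stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {opname}")
    raise ValueError("program ended without RETURN_VALUE")


# Toy program for the expression (2 + 3) * 4
program = [
    ("LOAD_CONST", 2),
    ("LOAD_CONST", 3),
    ("BINARY_ADD", None),
    ("LOAD_CONST", 4),
    ("BINARY_MULTIPLY", None),
    ("RETURN_VALUE", None),
]
result = run(program)
print(result)  # 20
```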

Python Bytecode Inspection

In order to reveal the bytecode which will be generated from our Python code, we can use the dis module - Disassembler for Python bytecode:

>>> import dis
>>> dis.dis("print('Hello, World!')")

Then this Python code:

print("Hello, World!")

will be translated into this bytecode (the exact output depends on the CPython version):

1          0 LOAD_NAME                0 (print)
           2 LOAD_CONST               0 ('Hello, World!')
           4 CALL_FUNCTION            1
           6 RETURN_VALUE

Where:

[1]|[2]|[3]|[4]|          [5]         |[6]|  [7]
---|---|---|---|----------------------|---|-------
  1|   |   |  0|LOAD_NAME             |  0|(print)
   |   |   |  2|LOAD_CONST            |  0|('Hello, World!')
   |   |   |  4|CALL_FUNCTION         |  1|
   |   |   |  6|RETURN_VALUE          |   |
  1. Line number in our source code
  2. Current instruction executed (labelled with -->)
    • This marker shows up when you disassemble code in which an error occurred, for example with dis.distb() after trying to load an undefined variable. The current instruction is the one where the error occurred: the value could not be loaded onto the stack, so the next operation could not be executed.
  3. Jump target: another instruction may jump to this one (labelled with >>)
    • These occur in if-else blocks and loops, for example.
    • Tip: dis.findlabels() is good for "Detecting all offsets in the code object code which are jump targets, and return a list of these offsets."
  4. The address (offset) in the bytecode, a.k.a. "On which position can I find this instruction?"
    • Note that these are always multiples of 2 for CPython 3.6+, as every instruction takes 2 bytes: a 1-byte opcode followed by a 1-byte argument. Earlier versions of CPython had variable instruction length: an instruction without an argument took 1 byte, and an instruction with an argument took 3 bytes (a 1-byte opcode plus a 2-byte argument).
  5. The instruction name
  6. The argument
  7. The human-readable form of the argument
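
If you would rather work with these columns in code than read them from the printed listing, dis.get_instructions() yields one Instruction object per line. The sketch below picks the attributes that correspond to columns 4 to 7; instruction_rows is a hypothetical helper name, and the instructions you get depend on your CPython version.

```python
import dis


def instruction_rows(source):
    """Return (offset, opname, arg, argrepr) for every instruction in source."""
    # instruction_rows is a made-up helper for this example.
    return [
        (ins.offset, ins.opname, ins.arg, ins.argrepr)
        for ins in dis.get_instructions(source)
    ]


rows = instruction_rows("print('Hello, World!')")
for offset, opname, arg, argrepr in rows:
    # offset -> column 4, opname -> column 5, arg -> column 6, argrepr -> column 7
    print(offset, opname, arg, argrepr)
```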

What now?

You can flex with all the stuff you just learned, yay! Now seriously... This is an awesome way to understand your code on a deeper level. And not just Python code: a lot of languages use bytecode. When you are uncertain about which construct to use, inspect the instructions each one generates and compare. Now you can try it for yourself with the infamous duo {} and dict().
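
As a starting point, here is one possible way to compare the two. It is a sketch, not a benchmark: the opnames helper is invented for this example, and the exact instruction names vary between CPython versions. What you should see is that {} compiles to a single BUILD_MAP instruction, while dict() has to look up the name dict and then call it.

```python
import dis


def opnames(source):
    """Return the instruction names generated for a piece of source code."""
    # opnames is a hypothetical helper, not part of the dis module.
    return [ins.opname for ins in dis.get_instructions(source)]


literal_ops = opnames("{}")
call_ops = opnames("dict()")

print("{}    :", literal_ops)
print("dict():", call_ops)
```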

 

Thanks for learning new stuff with me, I really appreciate it! I am also open to contributions and improvements. Now go and write some effective code!



