Internal working of Python

KAUSHIK K 1941116
6 min read · Aug 21, 2021


Introduction

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Python programs are also platform independent: once written, a program can run on any platform that has a Python interpreter, without being rewritten. Python compiles source code to byte code, which the Python Virtual Machine (PVM) then executes. Before going in depth into the internal workings of Python, let us go over a few technical terms.

Compiler VS Interpreter

A compiler and an interpreter do the same basic job: they get a program written in a high-level language executed by the machine. They differ in when the translation happens. A compiler converts the whole program into machine code before the program runs. An interpreter is a program that processes the code (source code, pre-compiled code, or scripts) statement by statement while the program is running.

Compiler:

Interpreter:

What is PVM?

Computers understand only machine code, which consists of 1s and 0s. Any program must therefore be translated into machine code before the processor can execute it. This is normally the job of a compiler, which converts the program source code into machine code.

A Python compiler does the same task but in a slightly different manner. It converts the program source code into another code, called byte code. Each Python program statement is converted into a group of byte code instructions.
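As a rough illustration of this step, the built-in compile() function turns source text into a code object that holds the byte code. The source string and the "<example>" file name below are made up for the demonstration.

```python
# A small program held as a string (hypothetical example source).
source = "x = 1 + 2\nprint(x)"

# compile() runs Python's compiler and returns a code object.
code_obj = compile(source, "<example>", "exec")

# The raw byte code is stored as a bytes object on the code object.
raw_bytecode = code_obj.co_code
print(type(code_obj).__name__, len(raw_bytecode) > 0)

# exec() hands the code object to the interpreter for execution.
namespace = {}
exec(code_obj, namespace)
```

The exact bytes in co_code differ between Python versions, so the sketch only checks that some byte code was produced.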

The Python Virtual Machine (PVM) takes those byte code instructions and executes them so that the computer can produce the final output. To do this, the PVM is equipped with an interpreter: a loop that reads each byte code instruction and runs the corresponding machine code routine on the processor. In the standard CPython implementation, byte code is by default interpreted this way rather than compiled into a standalone machine code program. Since the interpreter plays the main role, the Python Virtual Machine is often simply called the interpreter.

Machine code

Machine code is the low-level binary code (1s and 0s) that makes up the instructions for the processor. It is executed directly by the CPU and is the final output of a compiler for a given CPU and operating system combination. Machine code built for one CPU and OS will not run on an incompatible CPU or OS (e.g. machine code for 64-bit x64 Windows will not run on 32-bit x86 Windows).

Byte code

Byte code is a virtualized machine code. Unlike machine code for a real processor, byte code targets an idealized or virtual processor that does not physically exist. It is modeled on a CPU architecture such as a register machine or a stack machine (CPython's byte code is stack-based), but it can include instructions and concepts that do not exist on any real CPU.
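To make the idea of a virtual stack machine concrete, here is a toy interpreter. The instruction names PUSH, ADD and MUL and the run() function are invented for this sketch; they are not CPython's real instruction set, only an illustration of the technique.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a simple stack machine."""
    stack = []
    for opcode, argument in program:
        if opcode == "PUSH":
            # Place a constant on top of the stack.
            stack.append(argument)
        elif opcode == "ADD":
            # Pop two operands, push their sum.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            # Pop two operands, push their product.
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    # The result is whatever is left on top of the stack.
    return stack.pop()


# Hypothetical "byte code" for the expression (2 + 3) * 4.
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
result = run(program)
print(result)  # 20
```

The same program list would run unchanged wherever run() is available, which is the sense in which byte code is independent of the real CPU.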

Python Code:

Byte code:

After the high-level source code is converted to byte code, the byte code is sent to the Python Virtual Machine to be executed.
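One way to look at the byte code for a piece of Python code is the standard dis module. The add() function below is a made-up example, and the instruction names printed vary between Python versions, so no exact output is shown.

```python
import dis


def add(a, b):
    # A tiny function whose byte code is easy to read.
    return a + b


# Print a human-readable listing of the function's byte code.
dis.dis(add)

# The same instructions are also available as objects.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```

Each line of the listing is one instruction for the virtual machine, typically loading the two arguments, applying the addition, and returning the result.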

Why Interpreted?

One popular advantage of interpreted languages is that they are platform independent. As long as the Python byte code and the virtual machine are the same version, the byte code can be executed on any platform (Windows, macOS, etc.).
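The version requirement shows up in how CPython caches byte code on disk. The sketch below only inspects names and paths and writes nothing; 'add.py' is a placeholder file name, and the printed values depend on the interpreter in use.

```python
import importlib.util
import sys

# Tag identifying the interpreter implementation and version, e.g. "cpython-312".
tag = sys.implementation.cache_tag

# Marker bytes written at the start of every .pyc file; they change
# whenever the byte code format changes between Python versions.
magic = importlib.util.MAGIC_NUMBER

# Path where cached byte code for add.py would be stored (path only).
pyc_path = importlib.util.cache_from_source("add.py")

print(tag, magic, pyc_path)
```

A cached file whose marker bytes do not match the running interpreter is ignored and the source is compiled again.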

Memory Management

Where is a program stored and executed in computers?

Whenever you save your program to a file (say ‘add.py’), it is stored in secondary storage, i.e. the hard disk. The operating system’s kernel handles this through its file management system.

When we run a program, it is loaded from disk into the computer’s main (primary) memory, called RAM, and executed from there.

Stack

A stack is a special area of the computer’s memory that stores the temporary variables created by a function. On the stack, variables are declared, stored, and initialized at runtime.

It is temporary storage: when a function call finishes, the memory used by its variables is released automatically. The stack mostly holds function (method) calls, local variables, and reference variables.
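The call stack can be observed from inside Python with the standard inspect module. The function names inner() and outer() are invented for this sketch.

```python
import inspect


def inner():
    local_value = "only visible inside inner"  # a local variable of this call
    # Names of the functions currently on the call stack, innermost first.
    return [frame_info.function for frame_info in inspect.stack(context=0)]


def outer():
    # Calling inner() pushes a new entry on top of outer()'s entry.
    return inner()


names = outer()
print(names[:2])  # ['inner', 'outer']
```

Once inner() returns, its entry and the name local_value are gone, which is the automatic cleanup described above.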

Heap

The heap is the region of memory used for dynamic memory allocation, where data can outlive the function that created it. In Python, all objects, including those bound to global variables, are stored in a private heap. Unlike the stack, heap memory is not released automatically when a function returns; it is more like a free-floating region of memory, and in Python the interpreter's memory manager takes care of it for you.
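A small Python-level illustration of heap storage: an object created inside a function is still usable after that function has returned. The make_list() function is a made-up example.

```python
def make_list():
    # The list object is allocated dynamically; `items` is only a local name for it.
    items = [1, 2, 3]
    return items


# The local name `items` no longer exists here, but the object does,
# because the name `kept` still refers to it.
kept = make_list()
kept.append(4)
print(kept)  # [1, 2, 3, 4]
```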

Is everything in Python an object?

Python is an object-oriented programming language. Everything in Python is treated as an object, including variables, functions, lists, tuples, dictionaries, sets, etc. Every object belongs to a class. An object is a collection of data together with the functions that operate on that data; in object-oriented design it often models a real-life entity.
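This can be checked directly: every value, including a function and a class, is an instance of object and has a class of its own. The greet() function is a made-up example.

```python
def greet():
    return "hi"


values = [42, "text", [1, 2], (1, 2), {"key": 1}, {1, 2}, greet, int, None]

# Every value is an instance of the base class `object`.
all_objects = all(isinstance(value, object) for value in values)

# Every value also reports the class it belongs to.
class_names = [type(value).__name__ for value in values]

print(all_objects)
print(class_names)
```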

To follow the next concepts, let us first see what a reference is:

“A link to an object”. A reference is an address that indicates where an object’s data and methods are stored.
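A short sketch of references in practice, using made-up variable names: assignment copies the reference, not the object, so two names can point to one object.

```python
first = [1, 2, 3]
second = first        # copies the reference: both names point to one list
copy = list(first)    # builds a new list object with equal contents

same_object = first is second        # True: identical object
equal_but_distinct = (first == copy) and (first is not copy)  # True

# A change made through one reference is visible through the other.
second.append(4)
print(first)  # [1, 2, 3, 4]
print(copy)   # [1, 2, 3]
```

The `is` operator compares references (object identity), while `==` compares values.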

Garbage Collection

Python deletes unwanted objects (built-in types or class instances) automatically to free memory. The process by which Python periodically reclaims blocks of memory that are no longer in use is called garbage collection.
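The periodic collector can also be triggered by hand through the standard gc module. The sketch below assumes CPython; the Node class is hypothetical, and it builds two objects that refer to each other so that only the collector can reclaim them.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only to build a reference cycle."""

    def __init__(self):
        self.partner = None


a = Node()
b = Node()
a.partner = b  # a refers to b
b.partner = a  # b refers to a: the two objects now form a cycle

# A weak reference lets us observe the object without keeping it alive.
probe = weakref.ref(a)

del a, b                            # no names refer to the cycle any more
alive_before = probe() is not None  # the cycle still keeps both objects alive
gc.collect()                        # run the garbage collector now
collected = probe() is None

print(alive_before, collected)
```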

Code:

Let us see what happens in the Stack and Heap:

The string “Hello” is stored in the heap, and a reference to that object is created in the stack. Both variables s and ss hold the same memory address, because both refer to the same object. The same applies to numbers.

So when s = “good bye” is executed, the value in the heap is not replaced, and ss, which refers to it, remains unchanged. Instead, a new value is created in the heap, and s now refers to the address of that new value. What happens when ss also changes its reference? Each object carries a reference count, which is the number of references pointing to that object. When the reference count drops to zero, the object is no longer reachable, and the garbage collector deallocates it from the heap.
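The scenario described above can be sketched as follows. This is a reconstruction that takes the names s and ss from the text, and it assumes CPython. The reference count part uses a list named data (an invented name) because the interpreter may share string literals internally, which makes their counts hard to read.

```python
import sys

s = "Hello"
ss = s                        # ss refers to the same string object as s
same_object = s is ss         # one object, two references

s = "good bye"                # s is rebound to a new object
ss_unchanged = ss == "Hello"  # the original object is untouched

# Reference counting, shown on a list object.
data = []
base = sys.getrefcount(data)        # includes a temporary reference from the call
alias = data                        # one more reference to the same list
with_alias = sys.getrefcount(data)
del alias                           # that reference is removed again
after_del = sys.getrefcount(data)

print(same_object, ss_unchanged)
print(with_alias - base, after_del - base)
```

The absolute numbers returned by sys.getrefcount() depend on the interpreter, so the sketch only compares them with each other.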

For a better understanding, use memory dump tools to extract and analyse data from RAM.

Thank you
