3 min readMay 20, 2024

In the context of processing trillions of financial transactions or any other large dataset, efficiency and performance are critical. In cases where data processing and computational capaticty are critical, libraries such as NumPy stand out with their ability to perform fast operations directly on data arrays. This documentation briefly explains why numerical programming libraries (NumPy etc.) are advantageous than object-oriented programming languages like Python for such tasks:

Introduction

Imagine you have a huge box of colored beads that you need to sort by color as quickly as possible. You have two options as described below:

Option 1 (Using Python (OOP)): Each bead is inside a small clear box. To know the color, you need to open each box, check the color, and then place the bead in the right pile.
Option 2: (Using NumPy): All beads are loose, and you can see their colors directly. You can quickly move each bead to its respective pile without having to open any boxes.

From a binary machine perspective, the additional steps required to “open the box” in OOP languages introduce unnecessary complexity and slow down the processing of large data sets. On the other hand, similar to option 2; NumPy uses homogeneous data types for its arrays, which allows for efficient, contiguous memory allocation. This structure reduces the overhead seen in object-oriented programming, where each data element is an object with additional attributes like type and reference count:

Step-by-Step Example of Data Storage in Binary System: OOP vs. NumPy

Understanding Data Storage

Python (OOP)

Here’s a analysis of how Python stores the integer x = 5 in memory:

Step 1: Memory Allocation

Object Creation: When you define x = 5, Python creates an int object.
Memory Overhead: This object is not just a plain integer. It includes several extra informations:
Type: Points to the type of the object, in this case, int.
Reference Count: Helps Python to know how many references points to the object.
Object Value: The actual integer value, which is 5.

Step 2: Binary Representation

Storage: The value 5 is stored in binary form inside the object. In a 64-bit system, integers are typically stored using 64 bits (or 8 bytes), but the entire object takes up much more space due to the additional overhead.

Understanding Data Storage in NumPy

NumPy, on the other hand, is designed to handle large arrays of data more efficiently. Here’s how NumPy stores an array with a single integer 5:

Step 1: Array Creation

Direct Storage: When you create an array, e.g., import numpy as np and x_np = np.array([5]), NumPy allocates space for the array in a continuous block of memory.
Data Type Specification: NumPy stores elements in the array as int32 by default on many systems, which uses 4 bytes per integer.

Step 2: Binary Representation

Efficient Memory Use: Unlike Python objects, NumPy does not store type information or reference counts for every element. It uses a single type and shape descriptor for the entire array.
Direct Access: Each element in the array, such as 5, is directly converted into a binary format (e.g., 00000101 in 8-bit binary) and stored consecutively in memory.

Visual Comparison

To visualize the difference, consider the memory layout:

Python OOP (integer):

Memory used: 20 bytes

Layout: | Type Info (8 bytes) | Reference Count (8 bytes) | Value (4 bytes) |

NumPy Array (single integer):

Memory used: 4 bytes (if using int32)

Layout: | 00000101 | (directly in binary form)

Introduction

Understanding Data Storage

Python (OOP)

Step 1: Memory Allocation

Step 2: Binary Representation

Understanding Data Storage in NumPy

Step 1: Array Creation

Step 2: Binary Representation

Visual Comparison

Written by Sevgi Demirel