Lazy loading with iterators in PHP: saving 90% of memory

Alin Pintilie
4 min read · Feb 6, 2023


Of course, eager loading (loading all the data at once) has its advantages, such as processing speed and lower code complexity, but there are cases when memory matters more than speed.

Below are two common examples of eager loading:

public function getAllInvoices()
{
    $statement = Db::getInstance()->prepare("select * from Invoices");
    $statement->execute();

    return $statement->fetchAll(PDO::FETCH_OBJ);
}

public function importFromFile($filePath)
{
    $fileContent = file_get_contents($filePath);
    $explodedContent = explode("\n", $fileContent);
    $arrayOfElements = array_map('json_decode', $explodedContent);

    return $arrayOfElements;
}

$documentStoringService = new DocumentStoringService();

// case 1
foreach (getAllInvoices() as $invoice) {
    $documentStoringService->store($invoice);
}

// case 2
foreach (importFromFile($filePath) as $invoice) {
    $documentStoringService->store($invoice);
}

These are two common cases where we fetch some data and then process it item by item. There is nothing special about this code, except that both functions could return a huge number of elements, which could require an enormous amount of memory.

The question is: why load all the data into memory when we could load one item at a time? Fortunately, there is a mechanism for exactly that, and as you have already guessed, it is the iterator.

Retrieving items from the database

There are many ways of getting items from a database, depending on the data access layer used. In our example, we are using the PDO (PHP Data Objects) extension. Of course, PDO already has a ready-to-use lazy mechanism (fetching rows one by one with fetch()) that is worth taking into consideration when you want to deal with lazy loading. However, we are building our solution on top of the Iterator interface because of the other advantages it may bring.
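As a minimal sketch of that built-in mechanism, each fetch() call pulls a single row through PDO's internal cursor instead of materializing the whole result set. The in-memory SQLite database below is an assumption for demonstration purposes; the article's real connection comes from Db::getInstance().

```php
<?php
// Sketch: PDO's own row-by-row fetching, no custom iterator needed.
// The in-memory SQLite setup is illustrative only.
$pdo = new PDO('sqlite::memory:');
$pdo->exec("create table Invoices (id integer primary key, total real)");
$pdo->exec("insert into Invoices (total) values (10.5), (20.0)");

$statement = $pdo->prepare("select * from Invoices");
$statement->execute();

// Each fetch() advances the cursor by one row, so only the
// current row is held in PHP memory at any moment.
while ($row = $statement->fetch(PDO::FETCH_OBJ)) {
    echo $row->id, ": ", $row->total, "\n";
}
```

This is the same primitive our iterator will wrap; the iterator simply gives it a reusable, foreach-friendly shape.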

Below is the implementation of the Iterator interface:

class LazyDatabaseInvoiceIterator implements Iterator
{
    private mixed $row;
    private int $index = 0;
    private PDOStatement $statement;

    public function current(): mixed
    {
        return $this->row;
    }

    public function next(): void
    {
        $this->row = $this->statement->fetch(PDO::FETCH_OBJ);
        $this->index++;
    }

    public function key(): int
    {
        return $this->index;
    }

    public function valid(): bool
    {
        return !empty($this->row);
    }

    public function rewind(): void
    {
        $this->statement = Db::getInstance()->prepare("select * from Invoices");
        $this->statement->execute();
        $this->row = $this->statement->fetch(PDO::FETCH_OBJ);
    }
}

For our purpose, we need to fetch all invoices from the database, as you can see in the rewind() method. By using PDO's fetch() (instead of fetchAll(), as in getAllInvoices()), we get only one row at a time, which is assigned to the $this->row property. After valid() confirms the row exists, current() returns the fetched object. Inside next(), we call $this->statement->fetch(PDO::FETCH_OBJ) again, and thanks to PDO's internal cursor, it returns the next row from the result set. In this way, our iterator can hand out fetched elements one after another without keeping all of them in memory.

The above implementation could be simplified by implementing IteratorAggregate instead.
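Here is a hedged sketch of that simplification (the class name LazyInvoices and the injected PDO connection are illustrative assumptions, not from the article): a generator inside getIterator() replaces all the current()/next()/key()/valid()/rewind() boilerplate.

```php
<?php
// Sketch: the same lazy behavior via IteratorAggregate + a generator.
// LazyInvoices and the injected PDO connection are illustrative names.
class LazyInvoices implements IteratorAggregate
{
    public function __construct(private PDO $pdo)
    {
    }

    public function getIterator(): Generator
    {
        $statement = $this->pdo->prepare("select * from Invoices");
        $statement->execute();

        // yield hands out one row per iteration; nothing is buffered
        while ($row = $statement->fetch(PDO::FETCH_OBJ)) {
            yield $row;
        }
    }
}

// Usage is identical to the hand-written iterator:
// foreach (new LazyInvoices($pdo) as $invoice) { ... }
```

The foreach loop calls getIterator() for us, so the laziness is preserved while the class shrinks to a single method.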

Using LazyDatabaseInvoiceIterator in foreach:

$invoices = new LazyDatabaseInvoiceIterator();
foreach ($invoices as $invoice) {
    $documentStoringService->store($invoice);
}

How much memory do we save?

I ran the script in PHP 8 with 20,908 elements (Invoice objects in the database). The memory usage with eager loading was 10.57 MB. After implementing LazyDatabaseInvoiceIterator, the usage was 0.82 MB. We managed to save 9.75 MB (92.24%).
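The figures above are from the author's run; as a sketch of how such a comparison can be measured, the snippet below builds the same synthetic data set eagerly (an array) and lazily (a generator) and reads memory_get_peak_usage(). The plain objects standing in for database rows are an assumption made so the example runs standalone.

```php
<?php
// Sketch of measuring eager vs lazy memory usage with synthetic data.
function lazyRows(int $n): Generator
{
    for ($i = 0; $i < $n; $i++) {
        yield (object) ['id' => $i, 'total' => $i * 1.5]; // one at a time
    }
}

function eagerRows(int $n): array
{
    $rows = [];
    for ($i = 0; $i < $n; $i++) {
        $rows[] = (object) ['id' => $i, 'total' => $i * 1.5];
    }
    return $rows; // the whole array lives in memory at once
}

foreach (lazyRows(20000) as $row) {
    // process $row
}
$lazyPeak = memory_get_peak_usage();

foreach (eagerRows(20000) as $row) {
    // process $row
}
$eagerPeak = memory_get_peak_usage();

printf("lazy peak: %.2f MB, eager peak: %.2f MB\n",
    $lazyPeak / 1048576, $eagerPeak / 1048576);
```

In a real comparison each variant should run in its own process, since the peak counter never resets within one script; they are shown together here only for brevity.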

Reading items from a file

When it comes to reading items from a file, there are of course many ways. As the eager counterexample, I used file_get_contents(). It is not a bad function in itself, but there are better options when the file content is very large (and since file_get_contents() can use memory mapping internally, it could even be used inside an iterator). It all depends on your requirements; in our case, we will use an iterator.

class LazyFileInvoiceIterator implements Iterator
{
    private int $pointer = 0;
    private string $line;
    private SplFileObject $file;

    public function __construct(string $filePath)
    {
        $this->file = new SplFileObject($filePath);
    }

    public function current(): mixed
    {
        return json_decode($this->line);
    }

    public function next(): void
    {
        $this->line = $this->file->fgets();
        $this->pointer++;
    }

    public function key(): int
    {
        return $this->pointer;
    }

    public function valid(): bool
    {
        return !empty($this->line) && $this->file->valid();
    }

    public function rewind(): void
    {
        $this->pointer = 0;
        $this->file->seek(0);
        $this->line = $this->file->fgets();
    }
}

I used SplFileObject for a more elegant approach. In rewind(), I reset the pointer to 0, seek back to the first line, and read its content. You may be wondering why I call json_decode() in current(): it is because I read from an .ndjson file (newline-delimited JSON), which is very convenient for storing one object per line and makes decoding easy, as you can see.
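To make that concrete, here is what a single line of such a file might look like and how json_decode() handles it. The invoice fields below are invented for illustration, not taken from the article.

```php
<?php
// One .ndjson line holds one complete JSON object; json_decode()
// turns it into a stdClass instance. Fields are illustrative only.
$line = '{"id":1,"customer":"ACME","total":99.9}';

$invoice = json_decode($line);

echo $invoice->customer; // prints "ACME"
```

Because each object lives on its own line, the iterator never has to parse the whole file: one fgets() plus one json_decode() yields one invoice.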

Using LazyFileInvoiceIterator in foreach:

$fileIterator = new LazyFileInvoiceIterator($filePath);
foreach ($fileIterator as $invoice) {
    $documentStoringService->store($invoice);
}

How much memory do we save?

I ran the script in PHP 8 with 100,000 elements (Invoice objects in the file). The memory usage with eager loading was 67.17 MB. After implementing LazyFileInvoiceIterator, the usage was 0.02 MB. In this case, the difference is huge.

There can be downsides, such as slower or more intensive processing, but in some cases saving memory is more important. Feel free to adapt this approach to your own use cases.

Thank you
