Symfony and iRODS Optimization: Python, Asynchronous Handling, and Redis Integration

Mathia Pagani
Dec 4, 2023

Navigating iRODS Collections in Symfony

Recently, I had to find a way for a Symfony data access platform to give users easy access to their iRODS collections.

For those unfamiliar, iRODS provides a virtualized layer over various storage systems, enabling users to access and manage distributed data resources as if they were part of a single system.

It is extensively utilized in scientific and research fields for efficient management and collaboration on large-scale, distributed datasets.

Navigating iRODS Challenges

Now, this tool, brilliant as it is, has a unique trait: it’s not your everyday easygoing companion, especially when you need to integrate it with a web app and you have a penchant for optimization.

The challenge lay in a kaleidoscope of elements that had to fit together. It was not made any easier by the fact that iRODS discontinued the PHP integration API (irods-php) in 2018, while the platform had to be implemented in PHP and Symfony to remain maintainable in an environment composed primarily of Symfony developers.

Additionally, upgrading the data center’s iRODS installation to a version supporting the latest C++ HTTP API, which allows OpenID token exchange, couldn’t be done immediately. A temporary solution had to be devised.

While I’m not certain I found the optimal solution, delving into the problem yielded insights worth sharing. I believe my experience could offer some tips for those facing a similar scenario and seeking short-to-medium-term solutions. A version 2 will come.

Python-Powered Wisdom

My initial decision proved wise: Python.

The Symfony Process component, which executes a command in a sub-process, made it easy to run Python scripts, and those scripts interacted easily with the iRODS server through the python-irodsclient API.

The back-and-forth between the Symfony controller and the Python script was easy to manage: the script returned its output to the controller through the process, and the controller passed it to the Twig template for display.
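To give an idea of the Python side, here is a minimal sketch of such a script, not the exact one used on the platform. It assumes python-irodsclient is installed, that the collection path arrives as the first command-line argument, and that the connection settings live in environment variables whose names (IRODS_HOST, IRODS_USER and so on) are invented for this example. The helper collection_to_dict is hypothetical as well.

```python
#!/usr/bin/env python3
"""List one iRODS collection as JSON on stdout (illustrative sketch)."""
import json
import os
import sys


def collection_to_dict(coll):
    """Reduce a collection to plain types that json.dumps can serialize."""
    return {
        "path": coll.path,
        "subcollections": [
            {"name": sub.name, "path": sub.path} for sub in coll.subcollections
        ],
        "files": [
            {"name": obj.name, "size": obj.size} for obj in coll.data_objects
        ],
    }


def main(argv):
    # Imported here so the helper above stays usable without the client installed.
    from irods.session import iRODSSession

    path = argv[1]
    # Environment variable names are assumptions made for this sketch.
    with iRODSSession(
        host=os.environ["IRODS_HOST"],
        port=int(os.environ.get("IRODS_PORT", "1247")),
        user=os.environ["IRODS_USER"],
        password=os.environ["IRODS_PASSWORD"],
        zone=os.environ["IRODS_ZONE"],
    ) as session:
        coll = session.collections.get(path)
        # A single JSON document on stdout is what the calling process reads.
        print(json.dumps(collection_to_dict(coll)))
    return 0


# Only run when exactly one collection path is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

The script prints one JSON document to stdout, which is the string the Process component hands back to the controller for decoding. Reading credentials from the environment rather than from arguments keeps the password out of the process list, but that is a design choice of this sketch.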

However, three interconnected issues surfaced. For heavyweight data collections, the fetching time exceeded the timeout set in the Nginx configuration, resulting in an occasional 502 error. I first increased the proxy_read_timeout, providing more time to address the other two problems. For scaling, it became imperative to handle the fetching process asynchronously, and for a better user experience, a caching solution was mandatory.

Let’s go asynchronous.

The second crucial player in this development scenario is Messenger, a Symfony component providing a message bus for sending and handling messages asynchronously. It is based on a system of envelopes that wrap message instances, which are dispatched from sender to receiver through transports (e.g., queues).

Symfony’s documentation on Messenger is clear. You create a Message class and type-hint it as the argument of the __invoke method in a brand-new `MessageHandler` class (the handler is invoked each time a message of that class is dispatched). You then instantiate the message in your controller like this:

$irodsConnectionMessage = new IrodsConnectionMessage($userName, $userPassword);

then dispatch it:

$messageBus->dispatch($irodsConnectionMessage);

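After that, we just need to retrieve the ball. Note that reading the output right after dispatch() only works when the handler runs synchronously in the same process and stores its result on the message object; with a queued transport the result would not be available yet: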

$rootContent = $irodsConnectionMessage->getOutput();
$rootArray = json_decode($rootContent, true);

Ensure proper configuration of the Message and MessageHandler classes in config/services.yaml like this (autowire set to false to avoid attempting instantiation of the class when parameters have not been defined yet):

App\Message\IrodsConnectionMessage:
    autowire: false

App\MessageHandler\IrodsConnectionHandler:
    autowire: false
    tags: [messenger.message_handler]

Caching for Speed: Redis to the Rescue

Yet, a third problem loomed. Navigating between collections and displaying their content, composed of files and subcollections, meant waiting one or even two seconds at each click. Implementing caching was imperative, and Redis came to the rescue.

For scaling reasons, I created a separate Redis container in my Docker Compose (and added the Redis service in the GitLab CI setup) running on the standard port 6379. I then added this in Symfony config/packages/framework.yaml, to use Redis as the cache provider:

framework:
    cache:
        app: cache.adapter.redis
        default_redis_provider: "redis://redis:6379"

After installing the Redis bundle in the Symfony container, I injected the CacheItemPoolInterface into the constructor of the controller that dispatches the Message instance and stored it as $cacheItemPool. Each collection path gets its own cache key, so the data associated with any path can be stored in and retrieved from the cache, like this:

$newPath = $request->get('new_path');
$cacheKey = 'irods_connection_' . $request->getClientIp() . '_collection_content_' . md5($newPath);
$cachedItem = $this->cacheItemPool->getItem($cacheKey);

// Check if the data is already in cache
if ($cachedItem->isHit()) {
    // If yes, retrieve and render the cached data
    return $this->render('storage/dashboard.html.twig', [
        'collection_content' => json_decode($cachedItem->get(), true),
    ]);
}
// If not in cache, dispatch a message to retrieve the data as seen above
// Retrieve the data from the dispatched message

// Cache the data for future use
$cachedItem->set(json_encode($collectionContentArray));
$cachedItem->expiresAfter(3600);
$this->cacheItemPool->save($cachedItem);

This approach reduced the waiting time for displaying a collection already in the cache to nearly zero.

Closing the Curtain on v1: A First Step in the Journey

In conclusion, working through the complexities of Symfony and iRODS integration while improving the user experience brought me valuable insights and efficient solutions. While Version 1 is now behind us, the journey continues as we work on Version 2.

Mathia Pagani

Software Developer. #CyberSecurity #AI #BigData #WebDev