Symfony and iRODS Optimization: Python, Asynchronous Handling, and Redis Integration
Navigating iRODS Collections in Symfony
Recently, I needed a Symfony data access platform to give users easy access to their iRODS collections.
For those unfamiliar, iRODS provides a virtualized layer over various storage systems, enabling users to access and manage distributed data resources as if they were part of a single system.
It is widely used in scientific and research fields for managing and collaborating on large-scale, distributed datasets.
Navigating iRODS Challenges
Now, this tool, brilliant as it is, is not your everyday easygoing companion, especially when you want to integrate it seamlessly with a web app and have a penchant for optimization.
The challenge involved several moving parts that all had to fit together. It was not made any easier by the fact that iRODS discontinued its PHP client API (irods-php) in 2018, while the platform had to be implemented in PHP and Symfony to remain maintainable in an environment primarily composed of Symfony developers.
Additionally, upgrading the data center's iRODS installation to the latest version, whose C++ HTTP API allows OpenID token exchange, couldn't be done immediately, so a temporary solution had to be devised.
While I'm not certain I found the optimal solution, digging into the problem yielded insights worth sharing. My experience may offer some tips for those facing a similar scenario and seeking a short-to-medium-term solution; a version 2 will come.
Python-Powered Wisdom
My initial decision proved wise: Python.
The Symfony `Process` component, which executes a command in a sub-process, made it straightforward to run Python scripts, and those scripts interacted with the iRODS server through the python-irodsclient API.
The back-and-forth between the Symfony controller and the Python script was easy to manage: the script returned its output to the controller through the process, and the controller happily passed it on to the Twig template for display.
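The Python side can be sketched roughly like this: a minimal script that lists a collection and prints JSON for the Process output. The host and zone names, and the argument order, are illustrative assumptions, not the actual deployment values:

```python
# Sketch of the script that Symfony's Process component could invoke.
# Assumed placeholders: host "irods.example.org", zone "exampleZone",
# and the CLI argument order user / password / collection path.
import json
import sys


def serialize_collection(subcollection_names, data_object_names):
    """Shape the listing as the JSON structure the controller json_decodes."""
    return {
        "collections": sorted(subcollection_names),
        "data_objects": sorted(data_object_names),
    }


def main(user, password, path):
    # Requires `pip install python-irodsclient`.
    from irods.session import iRODSSession

    with iRODSSession(host="irods.example.org", port=1247,
                      user=user, password=password,
                      zone="exampleZone") as session:
        coll = session.collections.get(path)
        listing = serialize_collection(
            [c.name for c in coll.subcollections],
            [d.name for d in coll.data_objects],
        )
    # Print to stdout so Process::getOutput() can capture it on the PHP side.
    print(json.dumps(listing))


if __name__ == "__main__" and len(sys.argv) == 4:
    main(*sys.argv[1:4])
```

Keeping the output as a single JSON document on stdout makes the PHP side trivial: whatever the process prints is what the controller decodes.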
However, three interconnected issues surfaced. For heavyweight data collections, the fetch time exceeded the timeout set in the Nginx configuration, resulting in occasional 502 errors. I first increased `proxy_read_timeout`, buying time to address the other two problems: for scaling, the fetching had to be handled asynchronously, and for a better user experience, a caching solution was mandatory.
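For reference, the directive in question lives in the server's proxy configuration; the timeout value and upstream name below are illustrative, not the actual deployment settings:

```nginx
location / {
    # Default is 60s; raise it so long-running collection fetches
    # don't trip a 502 while the async rework lands.
    proxy_read_timeout 300s;
    proxy_pass http://symfony_upstream;
}
```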
Let’s go asynchronous.
The second crucial player in this development scenario is Messenger, a Symfony component providing a message bus for sending and handling messages asynchronously. It is based on a system of envelopes that encapsulate message instances, which are dispatched from sender to receiver through transports (e.g., queues).
Symfony's documentation is always clear (here). You create a `Message` class, accept an instance of it as the argument of the `__invoke()` method in a brand-new `MessageHandler` class (which is called each time such a message is dispatched), and instantiate the message in your controller like this:
$irodsConnectionMessage = new IrodsConnectionMessage($userName, $userPassword);
then dispatch it:
$messageBus->dispatch($irodsConnectionMessage);
After that, we just need to retrieve the result:
$rootContent = $irodsConnectionMessage->getOutput();
$rootArray = json_decode($rootContent, true);
Ensure the `Message` and `MessageHandler` classes are properly configured in `config/services.yaml` like this (`autowire: false` prevents the container from trying to instantiate the classes before their parameters are defined):
services:
    App\Message\IrodsConnectionMessage:
        autowire: false
    App\MessageHandler\IrodsConnectionHandler:
        autowire: false
        tags: [messenger.message_handler]
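One more detail worth noting: Messenger handles a message synchronously unless it is routed to an async transport. A minimal `config/packages/messenger.yaml` sketch, assuming the transport DSN is supplied via an environment variable (e.g., a Doctrine or Redis queue):

```yaml
framework:
    messenger:
        transports:
            async: '%env(MESSENGER_TRANSPORT_DSN)%'
        routing:
            # Route the iRODS message through the async transport.
            App\Message\IrodsConnectionMessage: async
```

A worker process then consumes the queue with `php bin/console messenger:consume async`.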
Caching for Speed: Redis to the Rescue
Yet a third problem loomed. Navigating between collections and displaying their contents, composed of files and subcollections, meant waiting one or even two seconds at each click. Caching was imperative, and Redis came to the rescue.
For scaling reasons, I created a separate Redis container in my Docker Compose setup (and added the Redis service to the GitLab CI configuration), running on the standard port 6379. I then added this to Symfony's `config/packages/framework.yaml` to store sessions in Redis:
framework:
    session:
        save_path: "redis://redis:6379"
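One nuance: `save_path` under `session` covers PHP sessions only; the `CacheItemPoolInterface` used below is backed by Symfony's app cache, which needs its own pointer at Redis. A minimal sketch, assuming the same `redis` container hostname:

```yaml
framework:
    cache:
        app: cache.adapter.redis
        default_redis_provider: redis://redis:6379
```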
After installing the Redis bundle in the Symfony container, I injected the `CacheItemPoolInterface` into the constructor of the controller that dispatches the message instance and used it as `$cacheItemPool`, with a distinct cache key for each collection path, to store and retrieve the data associated with any path, like this:
$newPath = $request->get('new_path');
$cacheKey = 'irods_connection_' . $request->getClientIp() . '_collection_content_' . md5($newPath);
$cachedItem = $this->cacheItemPool->getItem($cacheKey);

// Check if the data is already in cache
if ($cachedItem->isHit()) {
    // If yes, retrieve and render the cached data
    return $this->render('storage/dashboard.html.twig', [
        'collection_content' => json_decode($cachedItem->get(), true),
    ]);
}

// If not in cache, dispatch a message to retrieve the data as seen above
// Retrieve the data from the dispatched message

// Cache the data for future use
$cachedItem->set(json_encode($collectionContentArray));
$cachedItem->expiresAfter(3600);
$this->cacheItemPool->save($cachedItem);
This approach reduced the waiting time for displaying collection data to close to zero.
Closing the Curtain on v1: A First Step in the Journey
In conclusion, working through the complexities of optimizing Symfony, integrating iRODS, and improving the user experience yielded valuable insights and workable solutions. Version 1 is now behind us, and the journey continues as we work on Version 2.