Element AI uses Kedro to apply research and develop enterprise AI models
Our conversation with Element AI is part of a global series exploring how Kedro is perceived around the world. For more on how Element AI uses AI to change enterprise operations, check out their website. And to find out how the company uses AI for innovation and collaboration, check out this blog post.
Element AI is an artificial intelligence solutions provider that uses cutting-edge research to help businesses produce accessible and operational solutions. It does so by deploying scalable, responsible, human-centred strategies, through services and software platforms.
The organisation also holds itself to high standards of ethical AI, aiming to maximise social returns through initiatives such as its partnership with the Mozilla Foundation to explore data governance. The goal is to ensure that “AI products and solutions are deployed responsibly, with the appropriate tools and guidance necessary to support companies, governments and citizens”, says Marc-Etienne Ouimette, Head of Policy and Government Relations at the company.
The company delivers solutions through products, services and self-developed platforms. One of these platforms is Orkestrator, part of the Element AI Toolkit: a workload-scheduling tool that manages GPU clusters and storage resources and rebalances tasks dynamically.
Their need for a scalable, trustworthy framework that facilitates work with real-world data is evident. We had a chat with Benjamin Potter, Eric Prescot-Gagnon and Philippe Grangier, all research scientists in optimisation and data science at Element AI’s headquarters in Montréal, Quebec, Canada. We spoke about how they used Kedro on a challenging project to streamline their data engineering processes and optimise collaboration across teams. They told us how they have implemented Kedro in their workflow alongside Element AI Toolkit’s Orkestrator, and how adopting an “opinionated tool” such as Kedro has improved their efficiency.
When the team first heard about Kedro, they were drawn to features such as its modularity and its support for teamwork. Their project collected data from several dynamic sources and needed a system that could handle those inputs and recurrently apply transformations across data pipelines. Since adopting it, Kedro has been revolutionising the way they work, as they explain in the interview below.
Kedro is an open source Python framework that helps data scientists create reproducible, maintainable and modular data science code. Kedro is built on the collective knowledge of QuantumBlack, whose teams routinely deliver real-world machine learning applications as part of McKinsey. We won’t explain further what Kedro is in this article, but you can read our introductory article to find out more, and find links to other articles and podcasts in our documentation.
The conversation has been edited for length and clarity.
Why did you decide to use Kedro?
We needed a framework that would help streamline dynamic data inputs from several sources and precisely adjust the time-handling of that data, using clock synchronisation. We would then apply data engineering transformations to those data pipelines, in order to optimise their outputs. Due to the project’s size, we realised that collaboration would be key for success. That was when we decided to use Kedro.
How did you implement Kedro in your workflow?
“At Element AI we use a number of internally built tools that facilitate running jobs on our own cluster. Most prominently, we used a tool called Orkestrator, part of Element AI’s Toolkit solution, to manage the compute-intensive jobs running in Element AI’s data centre, including the Kedro pipeline.
We worked on a simulation project to study and improve the Orkestrator’s job-scheduling algorithm. Specifically, we used Kedro to process data coming from our scheduling software, to measure historical performance and create replay scenarios. The pipelines would run inside a Docker container, and their outputs were used as inputs to our simulation environment in an independent workflow.”
We mostly used Kedro’s core features, the Pipeline and Node abstractions along with the Data Catalog, to organise our workflow.
“Having an organised data pipeline was key to allowing the team to collaborate on the data processing component. Personally, I also found Kedro-Viz, the pipeline visualisation tool, quite helpful. It allowed us to see how the individual nodes come together to compose the pipeline, an extremely useful feature when explaining to others the work that had been done.”
What were your initial challenges with Kedro?
“Kedro is quite opinionated”, says Philippe. “The initial adoption of the tool was challenging due to the nature of our data inputs. However, we found that we could extend Kedro because of its modularity. Being open source, it allowed us to look under the hood, which helped us reconfigure the library to our specific needs.”
For Eric, “being open source made it easy for us to try it out and, although I did not look into adapting the framework, it was essential for our workflow. I really like it.”
“At Element AI we try to use as much open source software as possible, because it means minimal ‘hassle’ for support and it also allows us to contribute to projects.”
The team found some issues along the way and reported them on GitHub. They also told us that “because the [Kedro] team was so helpful and there was an open conversation, it was very encouraging for us to continue using Kedro.”
What is the advantage of adopting Kedro?
“Once we fully adopted Kedro, we realised that collaboration was actually the key feature we had needed all along!”
Benjamin told us that using Kedro made life “better than before, when we were using notebooks for everything. The library made it possible for three people to work on the data engineering pipelines. In my past experience, only one person would end up being responsible for the entire project, since it is hard to collaborate using notebooks. For us, extending the team’s ability to work effectively is the real game-changer of using Kedro.”
Philippe agrees: “by forcing us to put every data engineering and data processing step into code and then push it to git, Kedro allowed us to collaborate much more easily than the norm of sharing a Jupyter notebook. Creating data pipelines became all about having pure, modular Python node-functions that we could all collaborate on.” What a joy!
For Eric, Kedro-Viz was “one of the great things about Kedro. It was really easy to see what was going on and how it all works. It allowed me to turn certain parameters on and off on specific nodes and get a cleaner view of the project. That was extremely helpful.”
“Extending the ability to collaborate is really the game-changer of using Kedro.”
Eric told us that he and Benjamin are starting another project using Kedro. “We chose the framework because the data ingestion on our end was quite inconsistent, which made it hard to follow what was happening in the system. We are now reimplementing the entire system and we think Kedro is a good tool for that.” Benjamin adds that this new project involves integrating Kedro into another data processing task that will run in a cloud environment; Eric “was happily surprised to see that Kedro has built-in support for multiple cloud environments. We didn’t have to do anything special to implement that, it was already there.”
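The built-in cloud support Eric describes comes from the Data Catalog, which accepts cloud storage URIs (such as `s3://`, `gcs://` or `abfs://`) directly in dataset entries. A hypothetical catalog entry might look like the following; the dataset name, bucket and path are invented, and the `pandas.CSVDataSet` type name assumes a pre-0.19 Kedro release:

```yaml
# conf/base/catalog.yml -- hypothetical entry, not from Element AI's project
scheduler_logs:
  type: pandas.CSVDataSet
  filepath: s3://example-bucket/scheduler/logs.csv
```

Swapping a local filepath for a cloud URI is all that is needed: Kedro resolves the storage protocol for you, so the pipeline code itself does not change.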
Share your use-case
Now it’s your turn! Have you used Kedro on an exciting project? Do you have a use case you would like to share? Or do you have feedback on Kedro? Let us know, and we will contact you for more details.
Want to contribute to Kedro? Check out our Community Contribution Guide and make your Pull Request.
McKinsey and Element AI have entered into a strategic alliance to help clients solve complex AI challenges. As a part of this alliance, McKinsey is a minority investor in Element AI.