AIX360 meets CodeFlare: How to scale explainability using CodeFlare pipelines

Raghu K Ganti
Published in CodeFlare
2 min read · Sep 29, 2021

Authors: Carlos Costa, Amit Dhurandhar, Raghu Ganti, Mudhakar Srivatsa, Kush R. Varshney

AIX360 meets CodeFlare (credit: @bfigas Unsplash, Jun 2019)

Explainability of AI models helps users gain insight into a machine learning model’s decision-making process. Such understanding provides key business insights and is essential to fostering trust and confidence in AI systems. AI Explainability 360 (AIX360) is a comprehensive open-source toolkit of state-of-the-art algorithms that support interpretability and explainability of machine learning models. IBM donated the toolkit to the Linux Foundation AI & Data. As adoption of explainability grows, and with it the amount of data for which explanations must be provided, the explanation algorithms face scaling challenges.

CodeFlare is IBM’s open-source toolkit for reducing the time it takes to set up, run, and scale machine learning tasks. It is built on top of Ray, an emerging open-source distributed computing framework for machine learning applications. CodeFlare extends the capabilities of Ray by adding specific elements that make scaling workflows easier.

With explainability, the bottleneck is typically producing explanations for many instances. Each instance is explained independently of the others, so this is an embarrassingly parallel workload and the lowest-hanging fruit for us to target.

Low hanging fruit (credit: @yirage Unsplash, July 6, 2018)

CodeFlare pipelines v0.1.2 introduced a feature for “splitting” a numpy array or pandas dataframe into multiple objects (very similar to what Dask on Ray provides for partitioning). This allows compute-intensive workloads to be spread across multiple processes. With this feature and the ability to wrap the local explanation into a simple “Estimator”, or OR node, of CodeFlare pipelines, we can achieve this scaling.

Below, we demonstrate the code snippet that wraps the LocalBBExplainer (from AIX360) into an Estimator.
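As a rough illustration of the wrapping idea (a sketch, not the notebook's code), the example below assumes that an Estimator node accepts an object with scikit-learn-style `fit` and `predict` methods. `ToyLocalExplainer` and `ExplainerEstimator` are hypothetical names; the toy explainer stands in for an AIX360 explainer and only borrows the `explain_instance` method name from the `LocalBBExplainer` base class.

```python
class ToyLocalExplainer:
    """Hypothetical stand-in for an AIX360 local explainer (not the real class)."""

    def __init__(self, weights):
        self.weights = list(weights)

    def explain_instance(self, row):
        # Toy "explanation": per-feature contribution = weight * feature value.
        return [w * x for w, x in zip(self.weights, row)]


class ExplainerEstimator:
    """Scikit-learn-style wrapper: fit is a no-op, predict returns explanations."""

    def __init__(self, explainer):
        self.explainer = explainer

    def fit(self, X, y=None):
        # Nothing to train here; the explainer is already configured.
        return self

    def predict(self, X):
        # One explanation per input row, in input order.
        return [self.explainer.explain_instance(row) for row in X]


estimator = ExplainerEstimator(ToyLocalExplainer(weights=[0.5, -1.0, 2.0]))
explanations = estimator.fit([]).predict([[1.0, 2.0, 3.0], [4.0, 1.0, 1.0]])
print(explanations)
```

Because `predict` treats every row independently, the same wrapper can be applied to any subset of the rows, which is what makes the later split safe.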

The data is split 4-way as shown below.
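To make the 4-way split concrete, here is a minimal sketch in plain Python. It is not CodeFlare's splitter; `split_rows` is a hypothetical helper that cuts a list of rows into contiguous parts whose sizes differ by at most one, the same sizing rule `numpy.array_split` uses.

```python
def split_rows(rows, k):
    """Split rows into k contiguous parts whose sizes differ by at most one."""
    if k <= 0:
        raise ValueError("k must be positive")
    base, extra = divmod(len(rows), k)
    parts, start = [], 0
    for i in range(k):
        # The first `extra` parts each take one additional row.
        size = base + (1 if i < extra else 0)
        parts.append(rows[start:start + size])
        start += size
    return parts


rows = [[i, i * 2] for i in range(10)]
parts = split_rows(rows, 4)
print([len(p) for p in parts])  # [3, 3, 2, 2]
```

Keeping the parts contiguous means that concatenating the per-part results restores the original row order.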

With four objects as input to a simple one-node pipeline, CodeFlare pipelines run the explanations four ways in parallel. We observe a linear 4x speedup when using this approach on the standard example from AIX360. Scaling AIX360 explanations with a simple wrapper and the splitter makes it easy for data scientists to provide explanations for large tabular data and to scale to clusters with simple code changes.
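The pattern of explaining each part independently and merging the results can be sketched with the standard library alone. This is an assumption-laden stand-in, not the CodeFlare runtime: `explain_part` is a hypothetical per-part explainer, and threads take the place of Ray workers. For CPU-bound explanations in Python, a real speedup needs separate processes, which is what Ray provides.

```python
from concurrent.futures import ThreadPoolExecutor


def explain_part(part):
    # Hypothetical per-row "explanation": here simply the sum of the features.
    return [sum(row) for row in part]


def explain_in_parallel(parts, max_workers=4):
    # executor.map yields results in the order of `parts`,
    # so flattening them restores the original row order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(explain_part, parts))
    return [explanation for part in results for explanation in part]


rows = [[i, i + 1] for i in range(8)]
parts = [rows[i:i + 2] for i in range(0, len(rows), 2)]  # 4 parts of 2 rows each
parallel = explain_in_parallel(parts)
serial = explain_part(rows)
print(parallel == serial)
```

The check at the end confirms that the merged parallel output matches a single serial pass over all rows.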

The full notebook is available here. The reader is invited to try out CodeFlare and AIX360. Happy coding and scaling!

Raghu K Ganti is a researcher at IBM’s T. J. Watson Research Center, a geospatial enthusiast, and a machine learning practitioner.