IBM advancing Data and AI at Open Source Summit North America

Ted Chang
IBM Data Science in Practice
6 min read · Sep 22, 2021

The Open Source Summit will be held September 27–30, 2021 in Seattle, Washington. It is the largest Linux Foundation event in North America, where open source developers gather to present their ideas. We are very pleased to announce that several members of the IBM CODAIT team will be presenting at this event. CODAIT, the Center for Open-Source Data and AI Technologies, is a group of data scientists and open source developers within IBM whose mission is to make open source AI models dramatically easier to create, deploy, and manage in the enterprise.

Todd Moore, IBM Vice President of Open Technology and CODAIT, will deliver a keynote on September 28th titled “Advance Data, AI and Quantum Using Open Source,” discussing how we are working with LF AI & Data and leveraging open source to host, manage, share, and reuse data and AI artifacts across organizational boundaries. Animesh Singh, Distinguished Engineer and CTO of Watson Open Technology, will present on several topics, including AI pipeline workflows, defending against adversarial model attacks, and an AI marketplace with an execution engine.

Below is a list of sessions, presenters, and abstracts from IBM CODAIT.

  • Keynote: Advance Data, AI and Quantum Using Open Source — Todd Moore, Vice President of Open Technology and Developer Advocacy, CTO DEG, IBM. Datasets, models, and pipelines have become the three most critical pillars of the AI lifecycle. In this session, we discuss how we are working with LF AI & Data and leveraging open source to host, manage, share, and reuse data and AI artifacts across organizational boundaries. We discuss how we are giving developers access to advanced code and capabilities to streamline the development process, using Project CodeNet, which teaches AI to code, reducing development and production cycles. Additionally, to address complex computational queries, we detail how Qiskit gives developers access to quantum computational resources, running complex analytical models that answer problems not even supercomputers could resolve.
  • AI Pipelines Workflows and ML Lineage Using Tekton Pipelines — Tommy Li & Animesh Singh, IBM. The Tekton Pipelines project provides Kubernetes-style resources for declaring CI/CD-style pipelines. However, it is not very user-friendly for data scientists, since pipelines are defined purely with Kubernetes custom resources and lack data-driven features for running AI workflows. Furthermore, Tekton is missing many pipeline features that are useful in ML use cases, such as looping or recursing over a subset of the pipeline. The Kubeflow Pipelines with Tekton (KFP-Tekton) project therefore extends Tekton's capabilities by building a custom task controller for Tekton; Tekton custom tasks allow any project to develop new and specialized pipeline features. KFP-Tekton introduces new concepts such as the any sequencer, pipeline loops, and recursion for data scientists who need to run ML workflows on top of a managed Tekton service. In addition, KFP-Tekton brings all the KFP features, such as lineage tracking, to Tekton, providing a much better user experience.
  • Integrating Feast Online Feature Store with KFServing — Ted Chang & Chin Huang, IBM. Access to a consistent set of dataset features across the phases of the ML lifecycle is becoming critical. Companies that build and deploy machine learning models may need to manage hundreds of features, and they may require the latest feature values for real-time prediction. Feast (Feature Store) tackles these problems by providing a standard interface and store for retrieving the features needed for distributed model training and serving. In this talk, attendees will learn how to build a production-ready feature store on Kubernetes using Feast and use it to serve data to a model. Attendees will also see how Feast can be used with KFServing, a serverless model inferencing engine, to retrieve stored features in real time. We set up an end-to-end demo using Feast and a KFServing transformer on Kubernetes to demonstrate how online features can be served to KFServing for real-time inferencing.
  • Defending Against Adversarial Model Attacks Using Kubeflow — Animesh Singh & Andrew Butler, IBM. The application of AI algorithms in domains such as self-driving cars, facial recognition, and hiring holds great promise. At the same time, it raises legitimate concerns about the robustness of AI algorithms against adversarial attacks. With widespread adoption of AI algorithms whose predictions are hidden or obscured from the trained eye of the subject expert, opportunities for a malicious actor to exploit those algorithms grow considerably, necessitating adversarial robustness training and checking. To protect against and mitigate the damage caused by malicious actors, this talk examines how to build a pipeline that is robust against adversarial attacks by leveraging Kubeflow Pipelines and integration with the LF AI Adversarial Robustness Toolbox (ART). Additionally, we will show how to test a machine learning model's adversarial robustness in production on KFServing, by means of payload logging (Knative Eventing) and ART. This presentation focuses on adversarial robustness rather than fairness and bias.
  • End-to-end AI Marketplace with Execution Engine on Kubernetes — Christian Kadner & Animesh Singh, IBM. The machine learning (ML) lifecycle consumes and produces many artifacts, such as datasets, models, and notebooks. Kubeflow Pipelines is becoming the de facto toolkit for orchestrating ML workflows on Kubernetes, using the construct of “components” to perform tasks like data preprocessing, data transformation, model training, and model serving. Additionally, data scientists working on Kubernetes need specialized pipeline components for creating secrets, persistent volume claims, config maps, etc. These kinds of components are frequently re-implemented.
  • Infusing Trusted AI Using Machine Learning Payload Logging on Kubernetes — Tommy Li & Andrew Butler, IBM. As more machine learning models are developed and served on Kubernetes, it is becoming harder to track incoming data and payloads just by reading the logs. To trust model predictions, drift, anomaly, adversarial, and bias detection need to be built into the platform. Data scientists have difficulty understanding model behavior on Kubernetes because model payloads are hard to access and process there. It is therefore important to record and persist model input and output payloads with the proper schema. These payloads can be fed to other tools to explain, analyze, and generate machine learning metrics, such as fairness and drift detection, for models running in production, helping AI operators and data scientists visualize and find potential issues with a model. This talk covers how to use KFServing, Kafka Connect, and AIF360 to serve ML models, persist payloads, and measure model fairness in a production environment.
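To give a flavor of one KFP-Tekton concept mentioned in the Tekton talk: the any sequencer lets a downstream step fire as soon as any one of its upstream tasks finishes, rather than waiting for all of them. A rough pure-Python sketch of that triggering behavior (the task names and timings are invented; the real feature is a Tekton custom task, not Python threads):

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

# Sketch of the "any sequencer" idea: the downstream step is released as
# soon as ANY upstream task completes, instead of waiting for all of them.
# Task bodies and durations are stand-ins for real pipeline steps.

def task(name, seconds):
    time.sleep(seconds)
    return name

with ThreadPoolExecutor() as pool:
    upstream = [pool.submit(task, "fast-task", 0.1),
                pool.submit(task, "slow-task", 1.0)]
    # Release the downstream step on the first completion.
    done, _ = wait(upstream, return_when=FIRST_COMPLETED)
    first = next(iter(done)).result()
    print(f"any sequencer triggered by: {first}")
```

In KFP-Tekton itself this is expressed declaratively in the pipeline spec; the sketch only illustrates the scheduling semantics.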
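The Feast talk centers on serving precomputed features to a model at request time. As a minimal illustration of what an online feature store does (the class and method names here are illustrative, not Feast's actual API), a keyed in-memory lookup might look like:

```python
# Minimal sketch of an online feature store, illustrating the serving
# path behind Feast. Names and data are invented for this sketch.

class OnlineFeatureStore:
    """Keyed, low-latency lookup of precomputed feature values."""

    def __init__(self):
        self._store = {}  # (feature_view, entity_key) -> feature dict

    def ingest(self, feature_view, entity_key, features):
        # In a real system this is fed by a materialization job that
        # copies the latest values out of the offline store.
        self._store[(feature_view, entity_key)] = features

    def get_online_features(self, feature_view, entity_key):
        # Called at request time, e.g. by a KFServing transformer that
        # enriches the prediction payload before it reaches the model.
        return self._store.get((feature_view, entity_key), {})


store = OnlineFeatureStore()
store.ingest("driver_stats", entity_key=1001,
             features={"trips_today": 8, "avg_rating": 4.7})

features = store.get_online_features("driver_stats", entity_key=1001)
print(features)  # {'trips_today': 8, 'avg_rating': 4.7}
```

In the demo described above, the KFServing transformer plays the role of the caller here, fetching features from Feast and attaching them to the inference request.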
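For the adversarial robustness talk, the canonical example of the threat is a fast-gradient-sign (FGSM-style) evasion attack: nudge each input feature in the direction that most increases the model's loss. A toy sketch against a linear logistic classifier (the weights and inputs are made up; ART implements this attack, among many others, for real models):

```python
import math

# Toy FGSM-style perturbation against a linear logistic classifier,
# illustrating the kind of evasion attack ART helps defend against.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    # For logistic loss, the input gradient is (p - y) * w; stepping in
    # the sign of that gradient maximally increases the loss per unit
    # of L-infinity perturbation budget eps.
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * (1 if g > 0 else -1 if g < 0 else 0)
            for xi, g in zip(x, grad)]

w, b = [2.0, -1.5], 0.1
x, y = [0.8, 0.3], 1          # correctly classified positive example
x_adv = fgsm(w, b, x, y, eps=0.5)

print(predict(w, b, x))       # confident on the clean input
print(predict(w, b, x_adv))   # confidence collapses after perturbation
```

A robustness check of the kind the pipeline automates is simply running such attacks against a candidate model and measuring how far its accuracy drops.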
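The payload-logging talk closes the loop by computing fairness metrics over the recorded payloads. One of the simplest such metrics is statistical parity difference: the gap in favorable-outcome rates between an unprivileged and a privileged group. A hand-rolled sketch of that computation (the group labels and predictions are invented; AIF360's own API works on structured datasets rather than bare lists):

```python
# Hand-rolled statistical parity check of the kind AIF360 computes over
# logged model payloads. Predictions and groups are invented examples.

def selection_rate(preds):
    # Fraction of predictions that are the favorable outcome (1).
    return sum(preds) / len(preds)

def statistical_parity_difference(privileged_preds, unprivileged_preds):
    # Unprivileged rate minus privileged rate: 0 is ideal; values far
    # from 0 flag a disparity worth investigating.
    return selection_rate(unprivileged_preds) - selection_rate(privileged_preds)

# Binary predictions (1 = favorable) reconstructed from payload logs.
privileged = [1, 1, 0, 1, 1, 0, 1, 1]      # 75% favorable
unprivileged = [1, 0, 0, 1, 0, 0, 1, 0]    # 37.5% favorable

print(statistical_parity_difference(privileged, unprivileged))  # -0.375
```

In the production setup described above, the payloads flow from KFServing through Kafka Connect into storage, and a metric job like this runs over them periodically.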

To learn how IBM CODAIT is advancing data and AI in open source, please sign up for the event here. Click the hyperlinks above for the details of each talk. Hoping to see you, onsite or virtually, during the event!
