Integration of SAP FedML Library with IBM watsonx

Natively ingest data from SAP Datasphere using IBM watsonx.ai and directly deploy models to SAP AI Core

Kevin Huang
Towards Generative AI
6 min read · Apr 22, 2024

FedML abstracts data connection and load, model training and deployment, and online inferencing for machine learning workflows, providing platform-agnostic end-to-end integration support with just a few lines of code.

Architecture Diagram: SAP BTP and IBM watsonx integration with FedML

What is SAP FedML?

The SAP Federated ML Python library, also known as FedML, is designed to allow businesses and data scientists to create, train, and deploy machine learning models directly on hyperscalers. It eliminates the need to duplicate or transfer data from its original source.

This blog post outlines how to access data from SAP Datasphere, prepare the data, train a model, and deploy it to SAP AI Core, by leveraging the FedML Python library in a notebook running on IBM® watsonx.ai™. watsonx.ai provides fully integrated, industry-standard tools such as RStudio and Python notebooks for data scientists to work in a collaborative environment.

Prerequisites

  • A Jupyter notebook environment, such as IBM watsonx.ai
  • Access to an SAP Datasphere instance
  • Access to an SAP AI Core instance
  • A Git repository
  • A container registry

Example use case

The following are the key steps for the end-to-end flow of a predictive analytics use case, in which a Random Forest Classification model is implemented for predicting employee promotions:

  • Load and prepare data stored in SAP Datasphere
  • Train the model in watsonx.ai
  • Serve the model in SAP AI Core
  • Run inference on the deployed model

Load data from SAP Datasphere

SAP Datasphere provides a business data fabric infrastructure that delivers a semantically rich data layer, ensuring data is meaningful and accessible to all users within an organization while keeping business context and logic intact across the entire data spectrum. It offers a unified experience for data integration, data cataloging, semantic modeling, and data warehousing. The key features of SAP Datasphere include data federation and virtualization, allowing for seamless access to data across various environments without duplication.

Install the FedML library

The FedML DSP library supports the deployment of machine learning models to SAP AI Core. It can be installed using pip as follows:

try:
    import fedml_dsp
    from fedml_dsp import DbConnection
except ImportError:
    %pip install -U fedml-dsp
    import fedml_dsp
    from fedml_dsp import DbConnection

Connect to SAP Datasphere

To create a connection to SAP Datasphere, you need to provide the database user credentials and set some connection parameters.

fedml_config = {
    "address": <The IP address or host name of the database instance. Required. String>,
    "port": <The port number of the database instance. Required>,
    "user": <The database user. Required>,
    "password": <The database user's password. Required>,
    "schema": <The SAP Datasphere cloud Space Schema. Optional>,
    "encrypt": <"true". Denotes an encrypted connection>,
    "sslValidateCertificate": <"false". Specifies whether to validate the server's certificate>,
    "disableCloudRedirect": <"true". Specifies if there should be a tenant redirection for a cloud instance>,
    "communicationTimeout": <"0". A value of 0 disables any communication timeouts>,
    "autocommit": <"true". Sets auto commit to true for the database connection>,
    "sslUseDefaultTrustStore": <"true". Denotes the use of the client's default trust store>
}

Instantiate the DbConnection class for your connection to SAP Datasphere:

db = DbConnection(dict_obj=fedml_config)

To verify that the connection has been successfully established, you can try to get a view by its name:

db.get_view_by_name("epp_train_view")

With just a couple of lines of code and minimal effort, you can now retrieve data from the designated view, import it into a pandas DataFrame, and seamlessly integrate data from various SAP and non-SAP data sources.

import pandas as pd

data = db.get_data_with_headers("epp_train_view")
employee_data = pd.DataFrame(data[0], columns=data[1])

Train the model in watsonx.ai

Once the data is loaded into the DataFrame, you can begin to process the data, perform feature engineering, and conduct model training as you’d normally do.
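For illustration, a typical preparation step might look like the following sketch. The column names (with `is_promoted` as the label) are hypothetical stand-ins for the actual schema of your Datasphere view, and one-hot encoding is just one of several viable encoding strategies:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical employee dataset; real column names depend on your Datasphere view
employee_data = pd.DataFrame({
    "department": ["Sales", "HR", "Sales", "IT", "HR", "IT"],
    "avg_training_score": [78, 65, 82, 90, 70, 88],
    "previous_year_rating": [4, 3, 5, 5, 2, 4],
    "is_promoted": [1, 0, 1, 1, 0, 1],
})

# One-hot encode the categorical feature(s); numeric columns pass through
features = pd.get_dummies(employee_data.drop(columns=["is_promoted"]))
labels = employee_data["is_promoted"]

# Hold out a test set for evaluating the trained model
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.33, random_state=100
)
```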

IBM watsonx.ai supports popular tools like PyTorch, TensorFlow, and scikit-learn, as well as IBM’s tools. The following code snippet shows the initialization, training, and testing of a Random Forest Classifier model with scikit-learn.

from sklearn.ensemble import RandomForestClassifier

# Initialize a Random Forest Classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=100)

# Train the model
rf_classifier.fit(X_train, y_train)

# Predict on the test data
y_pred = rf_classifier.predict(X_test)

# Predict on the training data
yt_pred = rf_classifier.predict(X_train)

A Data Scientist can now evaluate the trained model on the test data set, scoring its accuracy as well as precision, recall, F1, and other metrics. If the evaluation results are good, the model is saved to a pickle file and uploaded to an IBM Cloud Object Storage (COS) bucket, or other suitable storage, for the next step, in which you build a runtime container image for serving the model on Docker, SAP AI Core, or other cloud container environments.
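A self-contained sketch of this evaluate-and-serialize step is shown below. Synthetic data stands in for the employee DataFrame so the snippet runs on its own, and the file name `rf_model.pkl` is illustrative:

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Stand-in data so the sketch is self-contained; use your employee features instead
X, y = make_classification(n_samples=200, n_features=8, random_state=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=100)

rf_classifier = RandomForestClassifier(n_estimators=100, random_state=100)
rf_classifier.fit(X_train, y_train)
y_pred = rf_classifier.predict(X_test)

# Accuracy plus per-class precision, recall, and F1
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Serialize the trained model for packaging into the serving container image
with open("rf_model.pkl", "wb") as f:
    pickle.dump(rf_classifier, f)

# Sanity check: the reloaded model reproduces the same predictions
with open("rf_model.pkl", "rb") as f:
    restored = pickle.load(f)
```

From here, the pickle file would be uploaded to COS (or committed into the image build context) so the serving container can load it at startup.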

Deploy the model to SAP AI Core

Prior to deploying a machine learning model to SAP AI Core, it's essential to establish the connection to the AI Core instance: you need to create a Service Key and save it as a JSON file, following the steps described in the SAP AI Core documentation.

from fedml_dsp import Fedml

try:
    fedml = Fedml(aic_service_key='aicore_service_key.json')
except Exception:
    raise Exception("FedML constructor failed.")

Then, you would onboard any AI Core resources required, such as the Git repository, the AI Core resource group, and a Docker registry, after providing the credentials needed to access them. Please check the documentation for more details.

try:
    fedml.onboard_ai_core(
        create_resource_group=False,
        resource_group='default',
        onboard_new_repo=True,
        github_info_path="github_info.json",
        secret_path="docker_registry_secret.json"
    )
except Exception:
    raise Exception("AI Core resource onboarding failed.")

Next, you can register a new application in AI Core. You can skip this step if you want to use an existing application for your deployment.

APPLICATION_NAME = "fedml-demo-app"

try:
    application_details = {
        "application_name": APPLICATION_NAME,
        "revision": "HEAD",
        "repository_url": "<url>",
        "path": "<path/to/serving_executable.yaml>"
    }
    fedml.register_application(application_details=application_details)
except Exception:
    raise Exception("Application registration failed.")

Now, you need to create a YAML file serving_executable.yaml in your Git repository at the URL and path specified above. For example:

apiVersion: ai.sap.com/v1alpha1
kind: ServingTemplate
metadata:
  name: fedml-demo-serving-exectuable
  annotations:
    scenarios.ai.sap.com/description: "fedml demo"
    scenarios.ai.sap.com/name: "fedml-demo"
    executables.ai.sap.com/description: "fedml demo serving executable"
    executables.ai.sap.com/name: "fedml-demo-serving-exectuable"
  labels:
    scenarios.ai.sap.com/id: "fedml-demo"
    ai.sap.com/version: "1.0.0"
spec:
  inputs:
    parameters:
      - name: greetmessage # placeholder name
        type: string
  template:
    apiVersion: "serving.kserve.io/v1beta1"
    metadata:
      labels: |
        ai.sap.com/resourcePlan: starter
    spec: |
      predictor:
        imagePullSecrets:
          - name: dockerhub-kevinxhuang-secret
        containers:
          - name: kserve-container
            image: "kevinxhuang/demo:fedml"
            ports:
              - containerPort: 7000 # customizable port
                protocol: TCP
            command: ["/bin/sh", "-c"]
            args:
              - >
                set -e && echo "Starting" && gunicorn --chdir /app/src auto:app -b 0.0.0.0:7000
            env:
              - name: greetingmessage # different name to avoid confusion
                value: "{{inputs.parameters.greetmessage}}"
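
The gunicorn command in this template expects a WSGI application named `app` in a module `auto` under `/app/src`. A minimal sketch of such a serving module is shown below; the `/v2/predict` route matches the inference call made later in this post, but the request and response shapes (`instances` in, `predictions` out) are assumptions you would adapt to your own runtime:

```python
import json

# In a real image you would load the trained model at startup, e.g.:
#   import pickle
#   with open("/app/src/rf_model.pkl", "rb") as f:
#       model = pickle.load(f)

def app(environ, start_response):
    """Plain WSGI app; gunicorn serves it as `auto:app`."""
    if environ["PATH_INFO"] == "/v2/predict" and environ["REQUEST_METHOD"] == "POST":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(size) or b"{}")
        # With a loaded model this would be:
        #   predictions = model.predict(payload["instances"]).tolist()
        predictions = payload.get("instances", [])  # echo placeholder for the sketch
        body = json.dumps({"predictions": predictions}).encode()
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]
    start_response("404 Not Found", [("Content-Type", "application/json")])
    return [b'{"error": "not found"}']
```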

Assuming you’ve built a model serving runtime container image and pushed it to the Docker registry, you’re now ready to start the deployment.

import time

try:
    deployment_config = {
        "name": APPLICATION_NAME,
        "resource_group": "default",
        "scenario_id": "fedml-demo",
        "executable_id": "fedml-demo-serving-exectuable"
    }
    time.sleep(60)  # give the registered application time to sync before deploying
    deployment_url = fedml.ai_core_deploy(deployment_config=deployment_config)
except Exception:
    raise Exception("Deployment to AI Core failed.")

Run inference on the deployed model

Finally, after the deployment is successfully completed, you can send secured API requests to the exposed endpoint(s) and make inference calls.

import json

with open('aicore_service_key.json') as f:
    aicore_service_key = json.load(f)

base_url = aicore_service_key["serviceurls"]["AI_API_URL"] + "/v2/inference/deployments/" + deployment_url[-31:-15]
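
The slice `deployment_url[-31:-15]` extracts the deployment ID by fixed character offsets, which is brittle if the URL format ever changes. Assuming the returned URL contains a `/deployments/<id>` segment, a more defensive extraction might look like this sketch (the example URL is hypothetical):

```python
def extract_deployment_id(deployment_url: str) -> str:
    """Pull the deployment ID out of a URL containing '/deployments/<id>'.

    Assumes the path segment after 'deployments' is the ID; adjust if
    your AI Core URLs differ.
    """
    parts = deployment_url.rstrip("/").split("/")
    return parts[parts.index("deployments") + 1]

# Hypothetical example URL
print(extract_deployment_id("https://api.example.com/v2/lm/deployments/d123abc/status"))
# prints "d123abc"
```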

try:
    endpoint = f"{base_url}/v2/predict"  # endpoint implemented in the serving runtime
    headers = {
        "Authorization": fedml.get_ai_core_token(),
        "AI-Resource-Group": "default",
        "Content-Type": "application/json"
    }
    response = fedml.ai_core_inference(
        endpoint=endpoint,
        headers=headers,
        body=json.dumps(sample_input)
    )
    print(response)
    print(response.json())
except Exception:
    raise Exception("Model inferencing failed.")

Summary

In this blog, you've learned how to use SAP FedML to access data on SAP Datasphere, serve a machine learning model on SAP AI Core, and run inference on the deployed model, all with just a few lines of code. FedML is a valuable tool for Data Scientists, Data Engineers, and AI Developers alike, simplifying and accelerating the process of building machine learning pipelines. If you'd like to experience the end-to-end flow in your own environment, the complete notebook and associated artifacts can be downloaded from the GitHub repository.
