Welcome Apache Liminal (Incubating): Getting ML apps to production made easy

Aviem Zur
Published in NI Tech Blog · Apr 8, 2021

The challenges involved in operationalizing machine learning models are one of the main reasons why many machine learning projects never make it to production. The process involves automating and orchestrating multiple steps which run on heterogeneous infrastructure — different compute environments, data processing platforms, ML frameworks, notebooks, containers and monitoring tools.

We’ve recently announced Apache Liminal, a new project incubating under the Apache Incubator. Apache Liminal aims to solve the problem of orchestrating and monitoring data and ML systems.

In this post we’re going to introduce the basics of Apache Liminal.

Liminal Basics

Liminal abstracts away from the user various layers of infrastructure such as containers (e.g. Docker), runtime (e.g. Kubernetes), deployment, monitoring and so forth.

From the user’s code, Liminal creates Docker images, as well as scheduled or constantly running apps which run these images.

Liminal can be easily integrated with an organization’s CI pipelines to allow users to deploy their applications with ease.

One of the guiding principles in Liminal’s design is to intrude as little as possible on data engineers’ and scientists’ codebases. In accordance with this principle, all that is asked of a Liminal user is to add a liminal.yml file to their code repository.

This reduces the number of different technologies and languages users need to familiarize themselves with in order to orchestrate an end-to-end data system.

Defining Liminal Systems

A liminal.yml file is composed of a few main sections which help you define your Liminal system:

---
name: MyDataScienceApp
owner: Bosco Albert Baracus

First we simply specify the name and owner of the liminal system we are creating.

Services

services:
  - service:
      name: my_datascience_server
      type: python_server
      description: my ds server
      image: myorg/mydatascienceapp
      source: .
      endpoints:
        - endpoint: /predict
          module: serving
          function: predict

In the services section we define the applications of our liminal system that are constantly running. For example, a service of type python_server allows users to serve functions from their code as a constantly running HTTP server responding to incoming requests.

A Liminal user need not worry about developing an HTTP-based service, nor about how to build and deploy it; Liminal’s python_server service type handles that for them.

When the user supplies an image name as well as a source location (a path relative to the liminal.yml location in their project), Liminal builds their source code into a Docker image with a functioning server which serves the user’s code. All the user has to do is define the endpoints they want to expose and which functions in their code should be invoked.
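To make this concrete, here is a hypothetical sketch of the serving module referenced by the module: serving / function: predict entries above. The exact signature Liminal’s python_server expects may differ; this sketch assumes the endpoint function receives the request payload as a JSON string and returns a JSON string, and the weighted-sum “model” is a toy stand-in:

```python
# serving.py -- hypothetical sketch of the user module wired to /predict.
# Assumption: the endpoint function takes the request body as a JSON string
# and returns a JSON string (Liminal's actual contract may differ).
import json

def predict(input_json):
    """Toy model: score each feature vector with a fixed weighted sum."""
    payload = json.loads(input_json)
    weights = [0.4, 0.6]  # stand-in for a trained model's parameters
    scores = [
        sum(w * x for w, x in zip(weights, row))
        for row in payload["instances"]
    ]
    return json.dumps({"predictions": scores})
```

With this module in the project source, the liminal.yml above would route HTTP requests to /predict into the predict function.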

Pipelines

pipelines:
  - pipeline: my_datascience_pipeline
    start_date: 1970-01-01
    timeout_minutes: 45
    schedule: 0 * 1 * *
    metrics:
      namespace: DataScience
    tasks:
      - task: train
        type: python
        description: train model
        image: myorg/mydatascienceapp
        cmd: python -u training.py train
      - task: validate
        type: python
        description: validate model and deploy
        image: myorg/mydatascienceapp
        cmd: python -u training.py validate

In the pipelines section we define pipelines, each consisting of a set of tasks that run sequentially and are triggered on a schedule. The schedule uses standard cron syntax; the expression 0 * 1 * * in the example fires at minute 0 of every hour on the first day of each month.

Pipelines are useful for purposes such as data ETLs, data science model training and many others.

The list of tasks which comprise a pipeline enables users to run different types of tasks. For example, using Liminal’s python task type, a user can easily run parts of their code as a task within a scheduled pipeline of tasks.

A Liminal user does not need to know how to develop pipelines; they receive this functionality from Liminal and only need to define the tasks they wish to run. Similarly to services, users do not need to create images themselves; Liminal creates them. In the example file above we reuse the image created for the service, as both use the same Python codebase that belongs to the user.
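For illustration, the script invoked by the cmd lines above (python -u training.py train / validate) might look like the following sketch. The subcommand names come from the liminal.yml example; the training logic itself is a hypothetical stand-in:

```python
# training.py -- hypothetical sketch of the script run by the pipeline tasks.
# The train/validate subcommands match the cmd entries in liminal.yml;
# the bodies are toy placeholders for real model code.
import sys

def train():
    # In a real pipeline this would fit a model and persist its artifacts.
    print("training model...")
    return "model-v1"

def validate():
    # In a real pipeline this would evaluate the model and gate deployment.
    print("validating model...")
    return True

if __name__ == "__main__":
    # Dispatch on the subcommand passed by the Liminal task's cmd line.
    if len(sys.argv) > 1:
        {"train": train, "validate": validate}[sys.argv[1]]()
```

Each task runs this script inside the Docker image Liminal built from the user’s source, so the two tasks share one codebase and one image.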

Running your Liminal system

In order to run a Liminal system locally we can use the Liminal CLI commands:

liminal build: builds images from user code

liminal deploy: deploys liminal.yml files to the Liminal server

liminal start: starts the Liminal server; the currently available implementation uses Apache Airflow as a server to run pipelines (as Airflow DAGs)

Below is what the liminal system we create in the above liminal.yml looks like when running:

Running DAG corresponding to user’s liminal.yml on Apache Airflow:

Request to user’s python service:

Python service log:

Try it out yourself

Try it out yourself with the Liminal getting started example:
https://github.com/apache/incubator-liminal/blob/master/docs/getting_started.md

What’s next for Liminal

CI Integrations: simple integrations of Liminal with organizations’ existing stacks

User Interface: a user interface to define Liminal systems via wizards instead of yml files

ML Integrations: Kubeflow, MLflow, feature stores, experiment tracking, model stores and more

Join the effort

Liminal is being developed under the Apache Incubator and according to the Apache Way. As such, we are very open to contributions and want to grow a robust community to achieve greatness together.

Please join us on our mailing lists at http://liminal.apache.org/

Contribute code and documentation on GitHub at https://github.com/apache/incubator-liminal

Report/Implement issues and features on Apache JIRA
