The Essentials of Custom Coding in Stream Flows

Boris Melamed
Dec 20, 2018 · 4 min read

Introduction

Data professionals can use stream flows to quickly assemble real-time streaming applications without writing flow topology code. Users design flows in a web-based drag-and-drop UI and specify parameter values. Some of the business logic for data processing can also be implemented with specialized operators, with no coding. We are continuously adding more out-of-the-box coverage, but there will likely always be cases where custom coding is required.

This article clarifies when to use which type of custom code operator, and recommends best practices. For more details about stream flows, see the stream flows documentation.

(Note: Engineers who want to code streams on a more direct level can look at Streams Studio, Lightweight Streams IDEs, and Topology SDKs in Java or Python.)

So when can you rely on specialized operators like Filter, and when is it better to write your own code? And if you do write code, how should you go about it?

Specialized operators or custom code?

When constructing stream flows, you can drag and drop various operators from the palette onto the canvas.

As a rule: If there is a specialized operator that fits the bill, use it!

Some specialized operators

Simple flow example that filters data using the specialized Filter operator

But what if there is no specialized operator for what you are building?

Cases for custom code

  • Connect to sources and targets for which there is no specialized operator in the palette.
    Examples: MongoDB, Cassandra.
  • Implement business-specific data transformations and calculations beyond what Filter and the available Aggregation functions cover.
  • Parse, convert, or format data according to a format that is not supported in specialized operators.
    Examples:
    — Parse dates and times that arrive as strings in non-ISO-8601 format.
    — Parse data that’s arriving in Avro format.
  • Pre- and post-process data for model scoring.
  • Extract data using regular expressions and other types of custom parsing.
  • Generate sample data, such as for incremental development or demos.
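
To make two of these cases concrete, here is a minimal sketch of date parsing and regular-expression extraction in plain Python. The date format, the log line layout, and the function names `to_iso8601` and `extract_fields` are hypothetical and chosen only for illustration. The sketch uses the standard library only and avoids syntax newer than Python 3.5.

```python
import re
from datetime import datetime

# Hypothetical input format, for example "20/12/2018 14:05:33" (day/month/year).
DATE_FORMAT = "%d/%m/%Y %H:%M:%S"

# Hypothetical log line layout: "<LEVEL> user=<name> took <n>ms".
LOG_PATTERN = re.compile(
    r"^(?P<level>[A-Z]+) user=(?P<user>\w+) took (?P<millis>\d+)ms$"
)


def to_iso8601(text):
    """Convert a non-ISO date string to ISO 8601, or return None if it does not parse."""
    try:
        return datetime.strptime(text, DATE_FORMAT).isoformat()
    except ValueError:
        return None


def extract_fields(line):
    """Pull named fields out of a log line, or return None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["millis"] = int(fields["millis"])  # convert the captured digits to a number
    return fields
```

Returning `None` for input that does not parse keeps the decision about bad records (drop, log, or raise) with the calling code.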

Cloud functions or (Python) code operators?

These are the operators for inserting custom code:

So which one is best to use? It depends. This table points out the differences to help you pick the right approach for each use case.

Comparison table

Develop Code operators effectively

For enhanced productivity, it can help to combine built-in coding support with external tools.

Built-in support

The built-in coding support includes:

  • Python syntax highlighting and validation, as you type.
  • Logging of user messages and raising of user errors. At runtime, these appear as notifications on the Metrics page of the stream flows UI. You can also download the user log, which contains these messages and errors.
  • Runtime monitoring. As with all operators, you can view sampled data and the throughput rate at the inputs and outputs of code operators.
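
As an illustration of user messages and user errors, the following sketch uses Python's standard `logging` module and an ordinary exception. It is an assumption that the code operator surfaces messages and errors produced this way; the logger name, the threshold, and the function `check_reading` are hypothetical. Check the stream flows documentation for the exact mechanism.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("temperature_check")

# Hypothetical plausibility threshold.
MAX_PLAUSIBLE_TEMPERATURE = 150


def check_reading(event):
    """Return the event if it looks valid, None to drop it, or raise if it is malformed."""
    if "temperature" not in event:
        # A user error: the event does not have the expected shape.
        raise ValueError("event has no 'temperature' attribute: {!r}".format(event))
    if event["temperature"] > MAX_PLAUSIBLE_TEMPERATURE:
        # A user message: record why the event is dropped.
        logger.warning("dropping implausible reading: %s", event["temperature"])
        return None
    return event
```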

Utilizing external tools

For support that complements built-in coding assistance:

  1. Use an external tool for authoring your code, such as:
    a. your favorite editor or IDE, with support for:
    — auto-complete, refactor, auto-layout, …
    — test/debug your business logic
    — version control integration
    b. or a notebook, where you can quickly run and test your code.

  2. Use the developed code in one of two ways:
    a. copy it into the code operator editor, arranging it appropriately inside the applicable callback functions (`init`, `process`, `produce`),
    b. or turn it into an external code package to use from within your code operator.
       Here is how to install Python packages and use them in Streams Flows code operators.
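
The workflow above can be sketched as follows. Only the callback names `init`, `process`, and `produce` come from this article; their signatures here (a `state` dictionary, an `event` dictionary, and a `submit` function) are assumptions, so check them against the template in the code operator editor. The helpers `fahrenheit_to_celsius` and `run_locally` are hypothetical.

```python
import random


# Business logic, written and tested in an external editor or notebook.
def fahrenheit_to_celsius(value):
    return round((value - 32) * 5 / 9, 1)


# Assumed signature: called once when the flow starts.
def init(state):
    state["count"] = 0


# Assumed signature: called for each incoming event; returns the output event.
def process(event, state):
    state["count"] += 1
    event["celsius"] = fahrenheit_to_celsius(event["fahrenheit"])
    event["seq"] = state["count"]
    return event


# Assumed signature for a source operator: emits events by calling submit().
# Here it generates a handful of sample events, such as for a demo.
def produce(submit, state):
    rng = random.Random(42)  # fixed seed so repeated runs give the same data
    for _ in range(5):
        submit({"fahrenheit": rng.randint(0, 100)})


# Hypothetical local harness: drives the callbacks without a running flow.
def run_locally():
    state = {}
    init(state)
    generated = []
    produce(generated.append, state)
    return [process(event, state) for event in generated]
```

Keeping the business logic in plain functions such as `fahrenheit_to_celsius` means it can be tested outside the flow, and the callbacks stay thin enough to paste into the editor with little rearranging.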

Important: In all these cases, keep in mind that stream flows run your code in a Python 3.5 interpreter.
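
One practical consequence, shown as a small sketch: f-strings were introduced in Python 3.6, so code developed on a newer interpreter should fall back to `str.format()`. The function names `describe` and `check_interpreter` are hypothetical. The version check is a convenience for local testing, not part of stream flows.

```python
import sys


def describe(event):
    # str.format() works on Python 3.5; the equivalent f-string needs 3.6 or later.
    return "sensor {id} reported {value:.1f}".format(
        id=event["id"], value=event["value"]
    )


def check_interpreter(required=(3, 5)):
    """Return True if the running interpreter matches the given (major, minor) version."""
    return tuple(sys.version_info[:2]) == tuple(required)
```

Running your tests locally under a Python 3.5 interpreter, where one is available, catches such incompatibilities before the code is pasted into an operator.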

Known issues

When writing custom code, take into account these limitations of stream flows development:

  • The debug facilities and runtime customization are limited.
  • There is no built-in version control.

For coding streams on a more direct level with full development support, have a look at Streams Studio, Lightweight Streams IDEs, and Topology SDKs in Java or Python.

Conclusion

This article can help data professionals quickly build real-time streaming applications by designing stream flows, with little or no custom coding.

We recommend the following regarding custom code:

  • Where available, use specialized operators rather than writing custom code.
  • Write custom code for the typical use cases that no specialized operator covers.
  • If you need to write custom code, use the comparison table to choose between a Code operator and a Cloud Function operator.
  • Consider productivity-enhancing practices when authoring code in Code operators.

Let us know about functional gaps that you would like to be covered by specialized operators.

IBM Watson

AI Platform for the Enterprise

Written by Boris Melamed

Software Engineer on the IBM Watson Studio Stream Flows team. Hobbies include insightful standup comedy (pictured).
