DataOps: Building streaming pipelines for production
This is the second blog in a series discussing the challenges many organizations face in the quest to become data-driven. The first addressed the “Dev” part of DevOps with Lenses SQL.
A recent study commissioned by Stripe showed:
68 % of organisations believe access to developer talent is a threat to the success of their business
This means that as you transition to a data-driven organization, you run a high risk of failure simply by not being able to hire and retain developers. Once you do find them, they also need to get up to speed with the data and your business domain. With Lenses you don’t need an army of rockstar developers to build and visualize data in streaming platforms. DataOps enables the people who know your data (business users, data engineers, and data stewards) to collaborate.
Lenses promotes the “data” in DataOps, enabling everyone in the organization to participate in data projects and be successful. Let’s now look at how you can actually get your data pipelines reliably deployed and running in a complex distributed infrastructure landscape.
It’s all about Production Pipelines
If you’re not in production it doesn’t count.
That’s a bit of a tongue-in-cheek statement, but everything needs to be production ready to bring repeatable value. I’m always impressed if data pipelines make it to the squeaky bum time of production. I’m even more impressed if you do this in an automated and repeatable manner.
Making data automated and repeatable
Lenses also addresses the Ops in DataOps. With Lenses you have enterprise-ready features so you can monitor and manage your streaming platform:
- LDAP and Kerberos integration
- Role-based security policies allowing multi-tenancy
- Topic whitelisting and blacklisting
- Alerts with optional Prometheus Alertmanager integration
- Helm charts for Kubernetes, including our SQL-based Connectors
- Cluster monitoring and infrastructure insights
- Data policies to help with GDPR and PII data
Many companies are focused on the infrastructure components, deploying Apache Kafka® or monitoring the infrastructure and so on. While vital, deploying a cluster, whether it’s Kubernetes or Apache Kafka, is just a set of services burning dollars until you run your applications on top and bring value to your business. Does the head of Market Risk at an investment bank care if you can easily add a new server? To some level, yes, but the value addition is the applications you build to generate business insights.
How does Lenses help you build data pipelines?
Lenses supports REST and WebSocket endpoints. These endpoints support the management of all aspects of data logistics. Lenses lets data-savvy business users construct repeatable data flows, driven from config.
Lenses has a command line interface. You can ask the CLI to export resources, for example a topic or an alert, and Lenses will then export all the necessary configuration. Let’s imagine you are a data scientist and you:
- Create a source connector to stream in Bloomberg data
- Inspect the data with SQL
- Deploy a SQL processor to join and aggregate streams of data
- Deploy a connector that uses SQL to write the results to Cassandra for future analysis
What you have created is a DataOps pipeline, with no code, only configuration.
The next step is promotion to production. Remember, if it's not in production it doesn’t count! A naive approach would be to simply use the UI to recreate the processors and connectors in production, but you could miss topics or topic-level configs. You also may not have access due to Lenses governance features, and it's certainly not automated.
Lenses can do better. Each resource, such as a topic, processor or connector, is declarative configuration. We can export the configuration for the whole topology as files and version control them. Next, we can use the Lenses CLI in CI/CD pipelines to ask another Lenses instance, for example in production, to apply our desired state.
By using the CLI export and import commands we can promote the application landscape through environments.
lenses-cli export acls --dir my-dir
lenses-cli export alert-settings --dir my-dir
lenses-cli export connectors --dir my-dir
lenses-cli export processors --dir my-dir
lenses-cli export quota --dir my-dir
lenses-cli export schemas --dir my-dir
lenses-cli export topics --dir my-dir
lenses-cli export policies --dir my-dir
<directory from flag>
│   └── alert-setting.yaml
│   ├── connectors
│   │   ├── connector-1.yaml
│   │   └── connector-2.yaml
│   └── sql
│   ├── quotas
│   │   └── quotas.yaml
│   └── topics
│       ├── topic-1.yaml
│       └── topic-2.yaml
│   └── policies-city.yaml
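The export commands above lend themselves to a small wrapper that a CI/CD job could call. What follows is a minimal sketch, not an official Lenses workflow. The names `LENSES_CLI`, `RESOURCES`, `run_for_all`, `export_all` and `import_all` are hypothetical and introduced only for illustration. It also assumes that `lenses-cli import` accepts the same resource names and `--dir` flag as `export`, and that each run is already pointed at the intended Lenses instance through your CLI connection settings. Check both against the CLI documentation for your version.

```shell
#!/bin/sh
# Sketch only. LENSES_CLI, RESOURCES, run_for_all, export_all and import_all
# are hypothetical names introduced for this example.
LENSES_CLI="${LENSES_CLI:-lenses-cli}"

# Resource types taken from the export commands listed above.
RESOURCES="acls alert-settings connectors processors quota schemas topics policies"

# Run one CLI action ("export" or "import") for every resource type,
# stopping at the first failure.
run_for_all() {
  action="$1"
  dir="$2"
  for resource in $RESOURCES; do
    "$LENSES_CLI" "$action" "$resource" --dir "$dir" || return 1
  done
}

# Export the whole landscape from the source environment into a directory.
export_all() {
  mkdir -p "$1" || return 1
  run_for_all export "$1"
}

# Apply a previously exported directory to the target environment.
# ASSUMPTION: "import" mirrors the resource names and --dir flag of "export".
import_all() {
  [ -d "$1" ] || { echo "no such directory: $1" >&2; return 1; }
  run_for_all import "$1"
}

# Possible use in a pipeline:
#   export_all my-dir && git add my-dir && git commit -m "Export Lenses landscape"
#   import_all my-dir   # run by the production job after review
```

Committing the exported directory gives you a reviewable history of the landscape before anything reaches production. The sketch imports resources in the same order it exports them. In practice the order may matter (for example, topics before the processors that read them), so adjust `RESOURCES` for the import side if needed.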
Getting into Production Faster
DataOps strives to reduce the dependence on developers and enable data experts to build production-ready data pipelines.
Any flow, any data, any stream, in production. One Lens.