How to use Apache Pulsar Manager with HerdDB

Enrico Olivelli
StreamNative
Oct 22, 2019

Background

Apache Pulsar Manager is a brand new management and monitoring tool for Apache Pulsar that lets you handle Pulsar clusters through a simple web interface in a few clicks. In September 2019, StreamNative open sourced it and contributed it to the Apache Software Foundation.

Pulsar Manager is a lightweight two-tier application: the backend is written in Java and needs an SQL database to store configuration data and metrics.

Pulsar Manager uses the standard JDBC interface to connect to an SQL database. Currently, two SQL databases are supported out of the box: PostgreSQL and HerdDB.

HerdDB is an open-source distributed DBMS written in Java and built with components from the same ecosystem as Apache Pulsar: Apache BookKeeper and Apache ZooKeeper.

HerdDB bundles a native JDBC driver and implements the SQL language by leveraging Apache Calcite.

It is easy to set up a HerdDB cluster that shares the same ZooKeeper and BookKeeper clusters with Apache Pulsar.

HerdDB was also designed from the beginning to be embedded into the client application, as you can do with SQLite. This way you do not have to manage a separate RDBMS, but you can still leverage the benefits of a distributed database, such as high availability, without any shared disk or SAN.

Start Pulsar Manager with HerdDB

Let’s look at the basic deployment of Pulsar Manager with HerdDB, which is the default out-of-the-box configuration when you get Pulsar Manager.

Prerequisite: ensure you have JDK 8 or a higher version on the PATH to build the Pulsar Manager source.
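The prerequisite can be checked before starting the build. The sketch below is one possible way to do it; the function names are made up for this example, and it assumes the `java` launcher prints its version in the usual `version "..."` form.

```shell
# Hypothetical helper: turn a Java version string into its major version.
# Legacy scheme "1.8.0_222" maps to 8, modern scheme "11.0.4" maps to 11.
java_major_version() {
    case "$1" in
        1.*) printf '%s\n' "$1" | cut -d. -f2 ;;
        *)   printf '%s\n' "$1" | cut -d. -f1 | cut -d- -f1 ;;  # also handles "17-ea"
    esac
}

# Read the version string reported by the "java" found on the PATH.
# "java -version" writes to stderr, hence the redirection.
detect_java_version() {
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Return 0 when a JDK 8 or higher is available, 1 otherwise.
check_jdk() {
    if ! command -v java >/dev/null 2>&1; then
        echo "No java executable found on the PATH" >&2
        return 1
    fi
    jdk_major=$(java_major_version "$(detect_java_version)")
    if [ "${jdk_major:-0}" -lt 8 ]; then
        echo "JDK 8 or higher is required, found: ${jdk_major:-unknown}" >&2
        return 1
    fi
}
```

Calling `check_jdk` right before `./gradlew build -x test` stops early with a readable message instead of a build failure.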

To build Pulsar Manager locally, you can download the source from the official GitHub repo directly.

git clone https://github.com/apache/pulsar-manager
cd pulsar-manager
./gradlew build -x test
java -jar build/libs/pulsar-manager.jar

With this default setup, the HerdDB database runs inside the same JVM process as Pulsar Manager and stores data on the local disk.

Running HerdDB in this way (embedded mode) is very common in production. HerdDB started this way for the standalone version of EmailSuccess, which is the original project from which HerdDB was spun off as an independent open-source project.

With the default options, HerdDB can store large databases without any particular tuning of heap memory usage: it tries to reserve at most one-third of the memory configured for the process.
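To get a feeling for the one-third rule, here is a purely illustrative calculation. The function name is hypothetical, and the real budget is computed inside HerdDB, so treat the result as an order of magnitude rather than an exact figure.

```shell
# Illustrative arithmetic only: one-third of a heap size expressed in MB.
# $1 = heap configured for the JVM process, in MB (for example from -Xmx).
herddb_memory_budget_mb() {
    echo $(( $1 / 3 ))   # integer division, the remainder is dropped
}

herddb_memory_budget_mb 4096   # prints 1365
```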

Overview of HerdDB cluster

In HerdDB, you have tablespaces. A tablespace is a set of tables. The main idea is that transactions and multi-table queries may span only tables from the same tablespace.

For each tablespace, you configure a set of replica nodes and select a leader node (the system can automatically promote a replica to the leader role in case of failover).

A HerdDB cluster persists three categories of data:

  • Metadata: service discovery and tablespace management, using Apache ZooKeeper.
  • Journal: a replicated write-ahead log with support for ‘fencing’. This is where Apache BookKeeper comes into play.
  • Data: the actual data stored in tables. This data is local to each replica.

Apache ZooKeeper stores metadata: management information about tablespaces and support for service discovery.

A client only needs a ZooKeeper connection string to connect to the cluster, as network locations and supported protocols (plain/TLS) are written to discovery metadata.
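As a small sketch of what this means in practice, the JDBC URL of a cluster can be assembled from just the ZooKeeper connection string and the ZooKeeper path used by the cluster. The helper name is hypothetical; the URL layout follows the one printed in the server log later in this article.

```shell
# Hypothetical helper: build a HerdDB JDBC URL for cluster mode.
# $1 = ZooKeeper connection string (host:port)
# $2 = ZooKeeper path used by the HerdDB cluster (starts with "/")
herddb_jdbc_url() {
    printf 'jdbc:herddb:zookeeper:%s%s\n' "$1" "$2"
}

herddb_jdbc_url localhost:2181 /herd
```

No server host or port appears in the URL, because clients look them up in the discovery metadata stored in ZooKeeper.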

Apache BookKeeper provides ultra-fast writes and replication. The leader node of each tablespace writes to the journal, replicating each entry according to the configured replication factor. Replicas tail this commit log and keep themselves in sync asynchronously.

The BookKeeper fencing feature guarantees that only one node in the cluster can make progress and shields the system from split-brain scenarios.

Each node of the cluster assigned to a tablespace stores a complete copy of that tablespace, because SQL queries and transactions may span multiple tables and potentially every record of every table.

Connect Pulsar Manager to HerdDB

You can start your HerdDB cluster and share the bookie of your Pulsar instance.

As the BookKeeper client uses ZooKeeper for service discovery and metadata management, the only configuration needed is the ZooKeeper connection string.

This example sets up a simple one-machine cluster.

1. Download Pulsar from the official repository and unpack the binary distribution.

tar zxvf apache-pulsar-2.4.1-bin.tar.gz
cd apache-pulsar-2.4.1

2. Start a standalone Pulsar. This starts a Pulsar broker together with a single bookie and a single ZooKeeper node.

bin/pulsar standalone

3. Download HerdDB from GitHub and unpack the zip file. For more information, see HerdDB Wiki.

wget https://github.com/diennea/herddb/releases/download/v0.12.2/herddb-services-0.12.2.zip
unzip herddb-services-0.12.2.zip
cd herddb-services-0.12.2
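If you script this step, the version can be kept in a single variable. The sketch below assumes that other releases follow the same URL and file naming pattern as v0.12.2, which you should verify on the releases page; the helper name is made up.

```shell
# Hypothetical helper: build the download URL for a HerdDB services zip.
# Assumes the naming pattern of the v0.12.2 release applies to the version given.
herddb_release_url() {
    printf 'https://github.com/diennea/herddb/releases/download/v%s/herddb-services-%s.zip\n' "$1" "$1"
}

HERDDB_VERSION=0.12.2
url=$(herddb_release_url "$HERDDB_VERSION")
echo "$url"

# Left commented out so the sketch has no side effects:
# wget "$url" && unzip "herddb-services-$HERDDB_VERSION.zip" && cd "herddb-services-$HERDDB_VERSION"
```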

4. Edit the conf/server.properties file to:

  • switch to cluster mode
  • set the ZooKeeper connection string
  • disable embedded Bookie (we are using the Pulsar one, not the HerdDB one)

Set the following properties, and point the ZooKeeper address at the ZooKeeper started by Pulsar (localhost:2181 in this example):

  • set server.mode=cluster
  • set server.bookkeeper.start=false
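The edits can also be scripted. The following is a minimal sketch that works on a temporary sample file rather than on the real conf/server.properties. The `set_prop` helper and the sample content are made up, and the key name `server.zookeeper.address` is an assumption to verify against the file shipped with your HerdDB version.

```shell
# Hypothetical helper: set key=value in a properties file.
# Replaces an existing uncommented line for the key, or appends a new one.
# Sketch only: dots in the key act as regex wildcards, and values
# containing "|" or "&" are not supported.
set_prop() {
    if grep -q "^$2=" "$1"; then
        sed "s|^$2=.*|$2=$3|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    else
        printf '%s=%s\n' "$2" "$3" >> "$1"
    fi
}

# Made-up sample content, NOT the real default file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
server.mode=standalone
server.bookkeeper.start=true
EOF

set_prop "$conf" server.mode cluster
set_prop "$conf" server.zookeeper.address localhost:2181   # assumed key name
set_prop "$conf" server.bookkeeper.start false
cat "$conf"
```

To apply it for real, pass conf/server.properties as the first argument after making a backup copy.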

5. Start the server.

bin/service server start

6. Check the logs.

less service.server.log

You can find lines like:

HerdDB server starter. Node id xxxxx
JDBC URL: jdbc:herddb:zookeeper:localhost:2181/herd

Tip: the JDBC URL is the JDBC connection string used in Pulsar Manager.
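Instead of copying the URL by hand, it can be pulled out of the log. This sketch assumes the log contains a `JDBC URL: ...` line like the one shown above; the function name is hypothetical and the demo runs on a temporary sample log.

```shell
# Hypothetical helper: print the last JDBC URL found in a HerdDB server log.
# $1 = path to the log file
extract_jdbc_url() {
    sed -n 's/.*JDBC URL: *\(jdbc:herddb:[^ ]*\).*/\1/p' "$1" | tail -n 1
}

# Sample log built from the lines shown in this article.
log=$(mktemp)
cat > "$log" <<'EOF'
HerdDB server starter. Node id xxxxx
JDBC URL: jdbc:herddb:zookeeper:localhost:2181/herd
EOF

extract_jdbc_url "$log"
```

Against a real installation you would call `extract_jdbc_url service.server.log`.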

Now let’s connect Pulsar Manager to your new single-machine HerdDB cluster.

8. Clone Pulsar Manager from GitHub.

git clone https://github.com/apache/pulsar-manager

9. Edit the src/main/resources/application.properties file and use the JDBC URL printed in the HerdDB server log (jdbc:herddb:zookeeper:localhost:2181/herd) instead of the default one.
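A possible way to script this edit is sketched below. It assumes the datasource URL is stored under the standard Spring Boot key `spring.datasource.url`, so check the actual key in your application.properties first. The function name and the sample file content are made up.

```shell
# Hypothetical helper: replace the datasource URL in a properties file.
# $1 = properties file, $2 = new JDBC URL
# "#" is used as the sed delimiter because JDBC URLs contain "/" and ":".
use_herddb_cluster_url() {
    sed "s#^spring\.datasource\.url=.*#spring.datasource.url=$2#" "$1" > "$1.new" \
        && mv "$1.new" "$1"
}

# Made-up sample, NOT the real file shipped with Pulsar Manager.
props=$(mktemp)
cat > "$props" <<'EOF'
spring.datasource.url=jdbc:herddb:placeholder
EOF

use_herddb_cluster_url "$props" "jdbc:herddb:zookeeper:localhost:2181/herd"
grep '^spring.datasource.url=' "$props"
```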

10. Start Pulsar Manager and connect it to the Pulsar instance.

./gradlew build -x test
java -jar build/libs/pulsar-manager.jar

Now you have successfully connected Pulsar Manager to your HerdDB cluster. To verify it, log in to the Pulsar Manager dashboard and check that it works.

Enrico Olivelli
StreamNative

Apache BookKeeper PMC member, Apache ZooKeeper and Maven committer. EmailSuccess.com and MagNews.com development manager.