Leveraging Pulsar’s Next-Gen Streaming Capabilities from a JakartaEE Application
Author: Enrico Olivelli
Jakarta Messaging (previously known as Java Message Service) is no longer the only API for working with messaging systems. More and more developers and enterprises are shifting towards new-generation streaming platforms like Apache Pulsar™. In this post, we show you how to use Pulsar in a JakartaEE Web Application deployed on Apache TomEE® via the JMS/EJB API.
For the longest time, Java Message Service (JMS), recently renamed Jakarta Messaging, has been the API for working with messaging systems in the Java world.
However, the messaging ecosystem is moving towards the next generation of streaming services like Apache Pulsar. This is largely because Pulsar is free, open source, cloud native, and comes with special features that traditional JMS vendors don’t support well.
As Pulsar becomes the distributed messaging and streaming platform of choice for enterprises and developers, it’s crucial to learn how to leverage it from JakartaEE web applications.
In this post, we show you how to use Pulsar in a JakartaEE Web Application deployed on Apache TomEE via the JMS/EJB API, without installing any additional components on your cluster. If you prefer videos, you can watch our YouTube tutorial for a more in-depth explanation.
Introduction to the Java Message Service (JMS) API
In a Jakarta Enterprise Edition (JakartaEE) application, you use Java Database Connectivity (JDBC), a vendor-neutral abstract API, to connect to SQL databases. JMS plays the same role for messaging systems: it gives you one API to connect to and integrate with platforms like Pulsar.
JMS is a set of simple APIs for interacting with messaging systems, so you can produce and consume messages, manage subscriptions, and handle transactions. There’s also a simple message selector expression language for filtering messages while reading from a destination.
In JakartaEE applications, you can use Enterprise JavaBeans (EJB) components. There are a few kinds of Enterprise JavaBeans, including:
- Stateful and Stateless EJBs: These are used to represent application logic, and can be used in WebServlets, WebServices (JAX-RS/JAX-WS endpoints), or for implementing background tasks.
- MessageDriven EJBs: These special beans are activated by the JakartaEE container when receiving a message from a connector, usually a JMS destination. But the framework is also open to other external systems.
The JakartaEE container provides support for lifecycle management and pooling, provides Contexts and Dependency Injection (CDI), and manages transactions. In general, you’ll have a standard set of APIs to interact with the rest of the system.
Core Concepts of the JMS API
At the core of JMS are messages, producers that write messages to a destination (a queue or a topic), and consumers that read from the destination and process the messages. Let’s look at some of the main concepts.
- API: JMS offers two API styles. The classic API, from JMS 1.x, uses connections and sessions to talk to the JMS broker. JMS 2.0 added a simplified API built around “JMSContext”, which combines a connection and a session in a single object; the classic API remains available.
- Destinations: JMS offers two destinations — queues and topics. In queues, each message you receive is processed by one consumer, and you can browse the queue contents to see which messages haven’t been consumed yet. A topic is very different from a queue. In a topic, you can create multiple subscriptions, with each subscription having a cursor in the topic. The system then dispatches the messages according to the subscription type.
- Consumer styles: You can use the application-driven “receive” API, or register a MessageListener that the JMS driver calls when a message arrives.
- Producer styles: You can use the “blocking” send, or send a message and pass a callback to be executed when the message is sent or the send fails.
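To make the difference between the two destination types concrete, here is a minimal in-memory sketch. It is a toy model written for this post, not the JMS API: the class names ToyQueue and ToyTopic and all of their methods are hypothetical, and the model ignores acknowledgements, transactions, and persistence.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy queue (hypothetical): each message is handed to exactly one consumer. */
class ToyQueue {
    private final Deque<String> pending = new ArrayDeque<>();

    void send(String body) {
        pending.addLast(body);
    }

    /** Removes and returns the oldest message, or null if the queue is empty. */
    String receive() {
        return pending.pollFirst();
    }

    /** Returns a snapshot of the unconsumed messages without removing them. */
    List<String> browse() {
        return new ArrayList<>(pending);
    }
}

/** Toy topic (hypothetical): every subscription keeps its own cursor into a shared log. */
class ToyTopic {
    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> cursors = new HashMap<>();

    /** Creates a subscription that starts at the current end of the log. */
    void subscribe(String subscription) {
        cursors.putIfAbsent(subscription, log.size());
    }

    void publish(String body) {
        log.add(body);
    }

    /** Returns the next message for this subscription, or null if it has caught up. */
    String receive(String subscription) {
        Integer position = cursors.get(subscription);
        if (position == null) {
            throw new IllegalArgumentException("unknown subscription: " + subscription);
        }
        if (position >= log.size()) {
            return null;
        }
        cursors.put(subscription, position + 1);
        return log.get(position);
    }
}
```

In this model, two consumers calling receive() on the same ToyQueue split the messages between them, while two subscriptions on the same ToyTopic each see every message published after they subscribed.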
However, JMS doesn’t cover administrative operations, such as:
- Managing destinations
- Managing connection properties (for example, connecting to a broker)
- Defining security models
- Defining resource limits
- Configuring quality of service
This is because every vendor is different, and each one provides its own management tools for its specific system. JMS APIs only work with administered objects that are provided, created, and managed by the system administrator, such as destinations (queues and topics) and the ConnectionFactory.
In a JakartaEE application, the container manages everything, but it needs some code to implement the connection to the broker. You provide this code in a “ResourceAdapter (RA)” archive: deploy a .rar file that contains the code, then configure the RA. In the demo later in this post, you’ll see how to do this with Apache TomEE.
Why Apache Pulsar?
Apache Pulsar is a distributed pub-sub messaging and streaming platform for real-time workloads, and can be used to manage hundreds of billions of events per day. It was originally developed at Yahoo!, open-sourced in 2016 as a cloud-native messaging and streaming platform, and later donated to the Apache Software Foundation.
Some key features of Pulsar are:
- Scalable storage (Apache BookKeeper)
- Built-in geo-replication
- Offloading old data to object storage
- Single message acknowledgement
- Native Kubernetes (K8s) support
- Native Schema Registry support (Apache Avro, JSON, and Google Protobuf)
- Connectors/Integrations (Pulsar IO framework)
- Message processing (Pulsar Functions framework)
- Multiple client bindings: Java, C++, Python, Go, C#
When running a Java enterprise application with a standard JMS system, you’ll find Pulsar’s benefits very tempting. Here are some of them:
- Blazing performance: Millions of JMS messages per second with 99.9th-percentile publish-to-acknowledge latency of less than 10 ms.
- Horizontal scalability and object storage offloading: You can scale compute and storage up or down independently. Pulsar also supports offloading old messages to object storage for practically unlimited storage capacity.
- Consolidation: Because Pulsar is natively multi-tenant and high performance, you can consolidate JMS applications across multiple JMS brokers onto a single Pulsar installation.
- Message replay: Pulsar natively supports message replay and retention. This enables applications to travel back and replay previously consumed messages to recover from misconfiguration issues or production bugs in application code, and test new applications against real data.
- Geo-replication: This is a first-class feature in Pulsar. You can easily replicate your messages to other locations for disaster recovery or global distribution.
- Future readiness: Pulsar supports traditional messaging workloads, but also log collection, microservices integration, event streaming, and event sourcing. You can run these new workloads alongside legacy JMS applications with a single operational model.
Apache Pulsar architecture
As illustrated in Figure 2, Pulsar’s architecture includes several components: stateless brokers, a storage layer implemented with Apache BookKeeper, Apache ZooKeeper™ for metadata and coordination, enterprise services, and object storage for your data. There are also producer and consumer applications, so you can connect Java apps to Go apps and more. If you’re running on Kubernetes, you’ll have a proxy component that exposes the brokers to the clients.
Pulsar also lets you run Pulsar Functions Workers to process data inside the Pulsar cluster. You can also configure and run Pulsar IO connectors to connect to other enterprise services, for example to write data to Apache Cassandra® or Elasticsearch, or to implement change data capture.
On the storage side, you can connect object storage such as Amazon S3 or Google Cloud Storage, and consumers read offloaded data the same way they read data that is still in BookKeeper. No configuration change is needed on the consumer side.
Pulsar Topics and Subscriptions
In Pulsar’s unified model for messaging, you only have topics, which are persistent or non-persistent (held only in memory), and partitioned or non-partitioned. Unlike other systems, where several partitions are needed to scale out consumers, Pulsar offers several subscription modes, so you don’t need partitioned topics just to add consumers.
Tenants and namespaces provide logical isolation of resources. You can also assign machines or brokers to a set of tenants to physically isolate their data.
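Tenants and namespaces show up directly in topic names. A fully qualified Pulsar topic name has the form persistent://tenant/namespace/topic (or non-persistent://...), and a short name such as "orders" resolves to the "public" tenant and "default" namespace. The sketch below only builds those strings; the TopicNames helper and the example tenant and namespace names are hypothetical and are not part of any Pulsar client library.

```java
/** Hypothetical helper that expands Pulsar topic names into their fully qualified form. */
class TopicNames {
    static final String DEFAULT_TENANT = "public";
    static final String DEFAULT_NAMESPACE = "default";

    /** Expands a short name like "orders"; names that already carry a domain are returned as-is. */
    static String qualify(String topic) {
        if (topic.contains("://")) {
            return topic;
        }
        return qualify(true, DEFAULT_TENANT, DEFAULT_NAMESPACE, topic);
    }

    /** Builds a fully qualified name from its four parts. */
    static String qualify(boolean persistent, String tenant, String namespace, String topic) {
        String domain = persistent ? "persistent" : "non-persistent";
        return domain + "://" + tenant + "/" + namespace + "/" + topic;
    }
}
```

For example, TopicNames.qualify("orders") yields persistent://public/default/orders, and TopicNames.qualify(false, "acme", "billing", "invoices") yields non-persistent://acme/billing/invoices.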
Pulsar’s multiple subscription modes (exclusive, failover, shared, and key-shared) make it a powerful tool for scaling out the number of consumers while keeping the ordering guarantees. It also offers two subscription types: durable and non-durable. A durable subscription stores its cursor on BookKeeper, while a non-durable subscription does not persist its cursor.
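To see how a key-shared subscription can add consumers without losing per-key ordering, consider the toy router below. It is only an illustration of the idea: the KeySharedRouter class is hypothetical, and it picks a consumer with a simple hash modulo, whereas a real Pulsar broker uses a more elaborate assignment of key hashes to consumers.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy router (hypothetical): messages with the same key always reach the same consumer. */
class KeySharedRouter {
    private final int consumerCount;

    KeySharedRouter(int consumerCount) {
        if (consumerCount <= 0) {
            throw new IllegalArgumentException("need at least one consumer");
        }
        this.consumerCount = consumerCount;
    }

    /** Picks a consumer index from the key; floorMod keeps the result non-negative. */
    int consumerFor(String key) {
        return Math.floorMod(key.hashCode(), consumerCount);
    }

    /** Splits {key, payload} pairs into one list per consumer, preserving arrival order. */
    List<List<String>> dispatch(List<String[]> messages) {
        List<List<String>> perConsumer = new ArrayList<>();
        for (int i = 0; i < consumerCount; i++) {
            perConsumer.add(new ArrayList<>());
        }
        for (String[] message : messages) {
            perConsumer.get(consumerFor(message[0])).add(message[1]);
        }
        return perConsumer;
    }
}
```

Because every message for a given key lands in the same list, each consumer sees that key's messages in the order they were published, even though different keys are processed in parallel.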
On the producer side, Pulsar offers two access modes: normal (shared) and exclusive. The normal mode allows multiple producers to write to the same topic, while the exclusive mode allows only one producer to write to a topic at a given time.
Mapping Pulsar to the JMS Model
Now that you’re up to speed on JMS and Pulsar, let’s look at how we can map these two technologies together. It’s pretty straightforward because Pulsar is more elastic and powerful than JMS.
A JMS topic would translate to a simple Pulsar topic, and a JMS queue can be implemented as a simple Pulsar topic with one “shared” durable subscription.
It’s important to note that no additional proxies or Pulsar protocol handlers are needed, as the mapping is performed automatically in the JMS client library. This means you can connect to any existing Pulsar cluster very easily.
When you connect systems with JMS, you first need to obtain access to the ConnectionFactory. Pulsar has its own PulsarConnectionFactory, which you configure by setting the information needed to connect to the broker. For a step-by-step guide on how to connect to Pulsar using the JMS API from JavaSE, check out our YouTube tutorial.
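As a rough idea of what "the information to connect to the broker" looks like, the sketch below collects the two endpoints of a Pulsar cluster in a plain map. Treat it as an assumption-laden illustration: the PulsarJmsConfig class is hypothetical, and the key names brokerServiceUrl and webServiceUrl are our assumption about what the JMS client expects, so check them against the client's documentation. The URLs use the default ports of a local Pulsar installation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that gathers connection settings for a Pulsar-backed ConnectionFactory. */
class PulsarJmsConfig {
    /** Builds the settings map; the key names are assumptions, not confirmed client API. */
    static Map<String, Object> forBroker(String brokerServiceUrl, String webServiceUrl) {
        Map<String, Object> configuration = new HashMap<>();
        configuration.put("brokerServiceUrl", brokerServiceUrl); // binary protocol endpoint
        configuration.put("webServiceUrl", webServiceUrl);       // HTTP admin endpoint
        return configuration;
    }

    /** Settings for a local Pulsar instance listening on its default ports. */
    static Map<String, Object> localDefaults() {
        return forBroker("pulsar://localhost:6650", "http://localhost:8080");
    }
}
```

In a JakartaEE deployment you would normally put the same values in the ResourceAdapter configuration rather than in application code, so the application only ever sees the container-managed ConnectionFactory.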
Demo: Connecting JMS with Pulsar using Apache TomEE
In this GitHub repository, we show you how to create simple applications that use JMS to connect to Pulsar using Apache TomEE. The repo includes two directories: one for the producer, which sends messages, and another for the consumer, which consumes the same messages.
Figure 4 illustrates the code for the producer, which is a Singleton EJB that sends a JMS TextMessage using the JMSContext API. In JakartaEE applications, you only need to refer to the queue; the container will bind those references to the actual ConnectionFactory or the queue configured in the system.
Figure 5 illustrates the code for the consumer. The consumer application uses a Message-Driven bean, which the system automatically binds to an activation point for the ResourceAdapter. You’ll need to provide the reference to the resource; this reference uses OpenEJB and is specific to Apache TomEE. Once you provide the callback, TomEE calls it every time it receives a JMS message and acknowledges the message upon successful execution.
To configure TomEE, we simply need to deploy the ResourceAdapter (.rar) file, configure it with the information needed to connect to the Pulsar broker, and create the queue reference.
Out of the box, Pulsar automatically creates the topic behind a destination the first time you reference it. If you run the demo, you won’t need to access Pulsar and create the queue yourself.
In this post, we covered how to leverage the cloud-native, open-source Pulsar for next-generation streaming capabilities from a JakartaEE application, using a JakartaEE web application deployed on the Apache TomEE server.
Migrating your JMS application to Pulsar is a straightforward process: you can simply deploy the “Fast JMS for Apache Pulsar” ResourceAdapter to integrate with your JavaEE or JakartaEE server. You can also use JMS in Java components using the JAR file, if you prefer.