Building the Open Source Hawkular APM at Red Hat

An interview with Gary Brown

2016 has become the year distributed tracing came to the forefront. Around since the 1970s, tracing moved beyond the academic world with the proliferation of microservices, starting with Dapper at Google. Thursday’s announcement of X-Ray at AWS re:Invent has put it front and center. While there’s a lot of excitement around this new offering, some of the top comments on the Hacker News article announcing the product are about fears of vendor lock-in. This is where OpenTracing, an open standard for vendor-neutral instrumentation, comes in. Red Hat recognizes this and has built Hawkular APM, part of the Hawkular family of open source monitoring components, which provides distributed tracing with OpenTracing support.

Here, I chat with Gary Brown, the lead engineer on the Hawkular APM project.

OT: What is Hawkular APM and how does it fit into Hawkular?

GB: Hawkular is a family of monitoring components that includes Metrics, Alerting, Inventory and APM. This component-based approach allows projects and products to select just the functionality they require, so the components can be used independently or in various combinations.

For example, Red Hat OpenShift currently needs only the metrics and alerting capabilities, while Red Hat CloudForms uses Metrics, Alerting and Inventory (bundled collectively as “Hawkular Services”) to provide its middleware management capabilities.

Hawkular APM is concerned with application-level monitoring: capturing and analysing business (user-defined) transactions along with their associated business metrics. It can be integrated with Hawkular Alerts, and we provide support for deploying it within Red Hat OpenShift.

OT: Why did you build it? Why are New Relic, AppDynamics, Zipkin, or LightStep not enough?

GB: The project was originally called Hawkular BTM because of its focus on capturing business transaction information, meaning the end-to-end ‘trace’ including business metrics extracted from the exchanged messages and application data. It achieved this using a Java agent based on the Byteman project. So initially distributed tracing was not a main focus, but something required to achieve the goal of capturing business transaction information.

As the project evolved, it became clear that the information we were collecting was of wider use, so we refocused the project on the wider scope of Application Performance Management. The project now provides visibility into three types of information: individual components, distributed tracing (aggregated and instance views), and business (or user-defined) transactions.

As Red Hat is 100% open source, we needed an open source APM solution. The only option might have been Zipkin; however, that project was not open sourced until six months after Hawkular APM (or BTM, as it was at the time) had started. And, as mentioned before, APM is just one component in a broader monitoring platform.

OT: What kind of companies find Hawkular most useful?

GB: It is still early days for Hawkular APM, as it is not currently distributed in a supported product. However, as would probably be expected, the main source of interest has been customers dealing with highly distributed applications.

OT: Do you see AWS’s X-Ray as competitive to Hawkular? What is your opinion of their approach?

GB: In terms of functionality, AWS’s X-Ray product is competitive with all available distributed tracing solutions. However, it is less about the capabilities of the tracing solution than about its place as an integral part of a larger platform. If a company is using AWS, then X-Ray would be a natural choice. If it is using a hybrid or on-premises environment, then tools like Hawkular APM are likely to be more relevant.

OT: Why did you decide to make Hawkular OpenTracing compatible?

GB: I think the two most important guiding principles for any Red Hat project are that everything we do is open source and that we support standards to avoid vendor lock-in. Although the initial focus for the project was monitoring JVM-based applications using our Java agent, we were also keen to support monitoring polyglot microservices. Obviously this was not an option with our Java agent approach, so we provided support for Zipkin clients to report their data to Hawkular APM. Zipkin, being a de facto standard, was the next best solution given that no standard existed at the time.

While working with the Zipkin project, we learnt of the efforts among a number of projects and vendors to define a standard. What has been so amazing is how quickly the standard has come together and how many projects have supported it. It is a great testament to the people who have worked so hard on the standard and on promoting it.

OT: Do you plan to expand the OpenTracing support as more Zipkin libraries become OT compatible?

GB: We plan to support our own OpenTracing providers in a range of languages. Currently we have support for Java and JavaScript/Node.js, but others will likely follow in the near future.

However, we also support the Zipkin data format, so if applications wish to use the Zipkin client libraries (native or OpenTracing-based), we can accommodate that too.

OT: What are your thoughts on explicit instrumentation (like OpenTracing and Zipkin libraries) vs JVM based?

GB: We think that both approaches have their merits, and we should offer users the flexibility to choose their preferred approach.

Even with the explicit instrumentation approach we may want to offer ways to minimise the amount of additional code an application developer needs to write, for example, by configuring interceptors etc.

Ideally we should provide a hybrid approach where some information is captured via integration with common frameworks, but the application is able to add further information (or internal component spans) through explicit use of the API. This is not necessarily easy to achieve with the Java API at present, as the implementation does not support implicit passing of the span context across framework boundaries, but hopefully a solution will be found soon.

OT: What are some gotchas you wish you knew before you started with OpenTracing support?

GB: Instrumenting an application written in an asynchronous programming style can be easier when the span context is passed explicitly between associated handlers. This was illustrated in our first blog post demonstrating the use of OpenTracing with an application written using Vert.x.

The OpenTracing standard is still in its early days. Although it has come a long way in a short time, application developers should be careful how they use it if they wish to support multiple backend tracing solutions. One area to be aware of is whether certain tags are more relevant in one solution than another. They should also beware of using multiple references on a span, as these may not be handled consistently (if at all) across solutions.
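The explicit style can be sketched as follows. This is an illustrative Python example, not the Vert.x code from the blog post and not the OpenTracing API: `SpanContext`, `Span`, `start_span`, and the `FINISHED` list are hypothetical names made up for the sketch. The point it shows is that each asynchronous handler receives its parent's context as an ordinary argument, so the parent/child link does not depend on which thread or event-loop task runs the handler.

```python
import asyncio
import itertools
from dataclasses import dataclass
from typing import List, Optional

_ids = itertools.count(1)


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical context: the identifiers needed to link spans."""
    trace_id: int
    span_id: int


@dataclass
class Span:
    operation: str
    context: SpanContext
    parent_id: Optional[int] = None


FINISHED: List[Span] = []  # spans are appended here when they finish


def start_span(operation: str, parent: Optional[SpanContext] = None) -> Span:
    """Start a root span, or a child span if a parent context is given."""
    span_id = next(_ids)
    trace_id = parent.trace_id if parent else span_id
    parent_id = parent.span_id if parent else None
    return Span(operation, SpanContext(trace_id, span_id), parent_id)


async def load_order(order_id: int, parent: SpanContext) -> dict:
    # The parent context arrives as a plain argument.
    span = start_span("load_order", parent)
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    FINISHED.append(span)
    return {"id": order_id}


async def handle_request(order_id: int) -> dict:
    span = start_span("handle_request")
    # Hand the context to the next handler explicitly.
    order = await load_order(order_id, span.context)
    FINISHED.append(span)
    return order


asyncio.run(handle_request(42))
```

The cost of this style is an extra parameter on every handler; the benefit is that the causal chain stays visible in the code.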

OT: How do you think tracing relates to APM and monitoring in general?

GB: As mentioned previously, our project started with a focus on Business (or User Defined) Transactions. However tracing the transaction across distributed services was a key requirement to achieve our goals.

APM is not dependent upon tracing, as information can be collected from the individual services and analysed in isolation. However, when isolating problem areas in your application, it can be useful to focus on the end-to-end business transactions, which may provide greater insight into the problem.

OT: How do you see the tracing landscape evolving in the next few years?

GB: With the trend towards ever more highly distributed, simpler services in elastic cloud-based environments, tracing will become critical. As such, it should become a key part of the cloud infrastructure supporting those services.

OpenTracing provides a valuable and important step in standardising the area.
