Distributed tracing for microservices on Oracle Cloud with Spring Cloud Sleuth and Zipkin

For a master table-of-contents for blog posts on microservice topics, please refer — https://medium.com/oracledevs/bunch-of-microservices-related-blogs-57b5f1f062e5

Ok, so you have a cloud native/microservices-style architecture wherein you have multiple services which collaborate with each other to achieve something… great!

Debugging and troubleshooting can be tough

  • multiple (micro) services — each doing their own thing
  • multiple instances per service — after all, our services are stateless and horizontally scalable !
  • sometimes you might not even have access to the underlying machine/VM/node — just a vendor/product specific way to get access to the application logs
  • etc….

There is nothing wrong with the above constraints — in fact, they are inevitable with distributed apps in general (microservices or not), especially when they are running in managed PaaS (platform-as-a-service) environment

So what can we do to make things easier and manageable when its comes to in-depth app level visibility? There is no silver bullet as such, but Distributed Tracing is a tool which when used properly can help us


This blog demonstrates Spring Boot applications leveraging Spring Cloud Sleuth to keep track of app level transactions and transport the trace information to a remote Zipkin server

Although the focus is on Java based apps, the concept applies to any system/service which can produce tracing data in OpenZipkin format

Oracle Application Container Cloud serves as the runtime for

  • the Zipkin server and…
  • … and the Spring Boot apps — inventory and product (we will continue to use the same set of apps as we did in one of the previous blogs with minor modifications to demonstrate the concept)

Architecture

the sample app is available on Github

The below diagram presents a high level overview

High level architecture

Dead simple — thanks to Spring Cloud Sleuth (Zipkin module), the individual Spring Boot apps send the transaction data (traces) to Zipkin which can then be viewed using a dashboard provide by Zipkin itself

Here is a summary of the components/services

Zipkin

Zipkin server is a yet another Spring Boot app and it runs in an embedded Tomcat container (in this case). There is hardly anything required here except

  • using the zipkin-server and zipkin-autoconfigure-ui (for the visualization dashboard) dependencies and then …
  • … using @EnableZipkinServer on the Spring Boot bootstrap class does the trick!

Inventory & Product services

For details of the inventory and product services, please refer to the Microservices service discovery on Oracle Cloud with Spring Cloud and Zookeeper blog post

The important things to know are

  • the apps use spring-cloud-starter-sleuth and spring-cloud-sleuth-zipkin modules (in pom.xml)
  • the application.properties point to the Zipkin server using spring.zipkin.baseUrl attribute
  • Zookeeper based service discovery has been excluded for the sake of simplicity and to focus on a single topic i.e. distributed tracing
  • and it used RestTemplate instead of the FeignClient

Build & deployment

Start by fetching the project from Github — git clone https://github.com/abhirockzz/accs-spring-boot-zipkin-distributed-tracing.git

Build

Zipkin server

  • cd zipkinserver
  • mvn clean install

The build process will create zipkin-dist.zip in the target directory

Inventory service

  • cd inventory
  • mvn clean install

The build process will create inventory-dist.zip in the target directory

Product service

  • cd product
  • mvn clean install

The build process will create product-dist.zip in the target directory

Deployment a.k.a push to cloud

With Oracle Application Container Cloud, you have multiple options in terms of deploying your applications. This blog will leverage PSM CLI which is a powerful command line interface for managing Oracle Cloud services

other deployment options include REST API, Oracle Developer Cloud and of course the console/UI

You can download and setup PSM CLI on your machine (using psm setup) — details here

Start by deploying Zipkin server application first since both our microservices will depend on it

  • Zipkin — psm accs push -n zipkin -r java -s hourly -m manifest.json -d deployment.json -p target/zipkin-dist.zip

Once you have Zipkin up and running, note down it’s URL (highlighted below) from the Applications page in Oracle Application Container Cloud

Zipkin server deployed

Now, update the deployment.json for the inventory app to enter the Zipkin server info

{
"memory": "2G",
"instances": 1,
"environment":{
"ZIPKIN":"<ZIPKIN_URL>"
}
}
  • Inventory service — psm accs push -n inventory -r java -s hourly -m manifest.json -d deployment.json -p target/inventory-dist.zip

Note down the URL — since it’ll be used in the product service

inventory service deployed

Update the deployment.json for product app to include inventory and zipkin co-ordinates

{
"memory": "2G",
"instances": 1,
environment":{
"INVENTORY_SERVICE":"<INVENTORY_APP_URL>",
"ZIPKIN":"<ZIPKIN_URL>"

}
}
  • Product service — psm accs push -n product -r java -s hourly -m manifest.json -d deployment.json -p target/product-dist.zip
Spring Boot (product) service deployed

Everything is ready for us to see things in action…


Test drive

Access the Zipkin server — note the app URL e.g. https://zipkin-<mydomain>.apaas.us2.oraclecloud.com

Zipkin on Oracle Application Container Cloud

Happy path

Start off by invoking the Product service endpoint a couple of times

e.g. curl -X https://product-ocloud100.apaas.us2.oraclecloud.com/product/iPhoneX and curl -X https://product-ocloud100.apaas.us2.oraclecloud.com/product/AppleWatch

the product service internally invokes the inventory service to return a JSON response

{"name":"iPhoneX","description":"Description for iPhoneX","stock":{"inventory":8,"node":"7e8127f0-c1a6-41db-b893-b786b773590b_67000ba4-ee1f-405c-9249-37ecd56b705d"}}

Let’s hop over to the Zipkin dashboard and query for latest traces (by clicking on Find Traces)

Query transactions in Zipkin

Two separate transactions (highlighted) were generated corresponding to our invocations

Noteworthy points

  • each transaction is broken into 2 spans
  • each span is produced by the service hop i.e. product service calling inventory service
  • you can also see exactly how much time did the inventory service contribute in terms of the total time taken i.e. in the first transaction inventory service took 7137 ms (7.137 secs) out of the 8.06 secs spent on the invocation
  • filtering by the service will give you the %age time spent

Filter just by the inventory service and then query Zipkin, this is what you’ll see — about 88% of the time was spent in the inventory service alone (in the first transaction)

Filter by application in Zipkin

Let’s look deeper into a specific transaction by clicking on the first one — this will now give you a detailed split up and the sequence of invocation is obvious

Spawned spans

Clicking on the product span will give more details like invocation timelines and HTTP request information

Parent transaction (product service)

Notice the parent transaction ID in the below screenshot

child transaction (inventory service)

So far so good — let’s stop the inventory service, see what happens and turn to Zipkin for help!

Failure case

To stop (using the CLI) — psm accs stop -n inventory

Invoke the product service again (couple of times) — curl -X https://product-ocloud100.apaas.us2.oraclecloud.com/product/iPhoneX

You should see a HTTP 500 response

{
“timestamp”: 1517114482192,
“status”: 500,
“error”: “Internal Server Error”,
“exception”: “org.springframework.web.client.HttpServerErrorException”,
“message”: “504 Gateway Time-out”,
“path”: “/product/MotoZ”
}
Failed transactions

I like red, but not in this case since it denotes danger — looking further into the a specific transaction will reveal more

explicit error message

Additional considerations

these are items which haven’t been covered in this post but do deserved to be mentioned

  • Writing custom spans
  • tracing other systems (e.g. DB)

Alright, that’s all for this blog post !

Don’t forget to…

  • check out the tutorials for Oracle Application Container Cloud — there is something for every runtime!
  • other blogs on Application Container Cloud

Cheers!

The views expressed in this post are my own and do not necessarily reflect the views of Oracle.
Like what you read? Give Abhishek Gupta a round of applause.

From a quick cheer to a standing ovation, clap to show how much you enjoyed this story.