Distributed tracing for microservices on Oracle Cloud with Spring Cloud Sleuth and Zipkin

Published in Oracle Developers · 7 min read · Feb 6, 2018

For a master table of contents for blog posts on microservice topics, please refer to https://medium.com/oracledevs/bunch-of-microservices-related-blogs-57b5f1f062e5

OK, so you have a cloud-native, microservices-style architecture in which multiple services collaborate with each other to achieve something… great!

Debugging and troubleshooting can be tough, because there are

multiple (micro) services, each doing their own thing
multiple instances per service (after all, our services are stateless and horizontally scalable!)
sometimes no access to the underlying machine/VM/node, just a vendor/product-specific way to get at the application logs
and so on

There is nothing wrong with the above constraints. In fact, they are inevitable with distributed apps in general (microservices or not), especially when they run in a managed PaaS (platform-as-a-service) environment

So what can we do to make things easier and more manageable when it comes to in-depth, app-level visibility? There is no silver bullet as such, but distributed tracing is a tool which, when used properly, can help us

This blog demonstrates Spring Boot applications leveraging Spring Cloud Sleuth to keep track of app level transactions and transport the trace information to a remote Zipkin server

Although the focus is on Java based apps, the concept applies to any system/service which can produce tracing data in OpenZipkin format
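To give a feel for what "tracing data in OpenZipkin format" looks like, here is a minimal, dependency-free sketch that renders spans in the shape of Zipkin's v2 JSON. The class name, the IDs and the span names are made up for illustration, and the sketch does no JSON escaping. In the sample apps Sleuth produces and reports this data for you, so treat this purely as an illustration of the format, not as what the apps actually run.

```java
// Hypothetical helper (not part of the sample repo): renders one span
// in the shape of Zipkin's v2 JSON format.
final class ZipkinSpanJson {

    // Zipkin v2 expresses timestamp and duration in microseconds.
    // Values are assumed to contain no characters that need JSON escaping.
    static String toJson(String traceId, String spanId, String parentId,
                         String name, String kind, String serviceName,
                         long timestampMicros, long durationMicros) {
        StringBuilder sb = new StringBuilder("{");
        sb.append("\"traceId\":\"").append(traceId).append("\",");
        sb.append("\"id\":\"").append(spanId).append("\",");
        if (parentId != null) {
            // Only child spans carry a parentId; the root span omits it.
            sb.append("\"parentId\":\"").append(parentId).append("\",");
        }
        sb.append("\"name\":\"").append(name).append("\",");
        sb.append("\"kind\":\"").append(kind).append("\",");
        sb.append("\"timestamp\":").append(timestampMicros).append(",");
        sb.append("\"duration\":").append(durationMicros).append(",");
        sb.append("\"localEndpoint\":{\"serviceName\":\"")
          .append(serviceName).append("\"}");
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        // Illustrative IDs (16 lower-hex characters each).
        String traceId = "463ac35c9f6413ad";
        String productSpan = toJson(traceId, "a2fb4a1d1a96d312", null,
                "get /product", "SERVER", "product",
                1517114482192000L, 8060000L);
        String inventorySpan = toJson(traceId, "5b4185666d50f68b", "a2fb4a1d1a96d312",
                "get /inventory", "SERVER", "inventory",
                1517114482900000L, 7137000L);
        // A Zipkin v2 collector accepts a JSON array of spans.
        System.out.println("[" + productSpan + "," + inventorySpan + "]");
    }
}
```

Any service, in any language, that can emit a list of such spans over HTTP can take part in the same trace.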

Oracle Application Container Cloud serves as the runtime for

the Zipkin server
the Spring Boot apps, inventory and product (we will continue to use the same set of apps as we did in one of the previous blogs, with minor modifications to demonstrate the concept)

Architecture

The sample app is available on GitHub: https://github.com/abhirockzz/accs-spring-boot-zipkin-distributed-tracing

The below diagram presents a high level overview

Dead simple: thanks to Spring Cloud Sleuth (Zipkin module), the individual Spring Boot apps send the transaction data (traces) to Zipkin, where it can be viewed using the dashboard provided by Zipkin itself

Here is a summary of the components/services

Zipkin

The Zipkin server is yet another Spring Boot app, and it runs in an embedded Tomcat container (in this case). Hardly anything is required here except

adding the zipkin-server and zipkin-autoconfigure-ui (for the visualization dashboard) dependencies
adding @EnableZipkinServer to the Spring Boot bootstrap class, which does the trick!

Inventory & Product services

For details of the inventory and product services, please refer to the Microservices service discovery on Oracle Cloud with Spring Cloud and Zookeeper blog post

The important things to know are

the apps use the spring-cloud-starter-sleuth and spring-cloud-sleuth-zipkin modules (in pom.xml)
application.properties points to the Zipkin server using the spring.zipkin.baseUrl property
Zookeeper-based service discovery has been excluded for the sake of simplicity and to focus on a single topic, i.e. distributed tracing
the apps use RestTemplate instead of FeignClient
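As a rough sketch, the relevant part of application.properties could look something like the fragment below. Only spring.zipkin.baseUrl comes from this post. The ${ZIPKIN} placeholder assumes the value is taken from the ZIPKIN environment variable set in deployment.json (see the deployment section), and the other two lines are assumptions added for illustration, so check the repo for the actual file.

```properties
# Service name under which spans show up in Zipkin (value assumed)
spring.application.name=product

# Location of the Zipkin server; ${ZIPKIN} assumes the environment
# variable of the same name is set at deployment time
spring.zipkin.baseUrl=${ZIPKIN}

# Assumption: Sleuth 1.x samples only a fraction of requests by default,
# so a demo may want to trace everything (newer Sleuth releases renamed
# this property to spring.sleuth.sampler.probability)
spring.sleuth.sampler.percentage=1.0
```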

Build & deployment

Start by fetching the project from GitHub: git clone https://github.com/abhirockzz/accs-spring-boot-zipkin-distributed-tracing.git

Build

Zipkin server

cd zipkinserver
mvn clean install

The build process will create zipkin-dist.zip in the target directory

Inventory service

cd inventory
mvn clean install

The build process will create inventory-dist.zip in the target directory

Product service

cd product
mvn clean install

The build process will create product-dist.zip in the target directory

Deployment a.k.a push to cloud

With Oracle Application Container Cloud, you have multiple options for deploying your applications. This blog uses the PSM CLI, a powerful command line interface for managing Oracle Cloud services

Other deployment options include the REST API, Oracle Developer Cloud and, of course, the console/UI

You can download and set up the PSM CLI on your machine (using psm setup); details here

Deploy the Zipkin server application first, since both our microservices depend on it

Zipkin — psm accs push -n zipkin -r java -s hourly -m manifest.json -d deployment.json -p target/zipkin-dist.zip

Once you have Zipkin up and running, note down its URL (highlighted below) from the Applications page in Oracle Application Container Cloud

Now, update the deployment.json for the inventory app to enter the Zipkin server info

{
 "memory": "2G",
 "instances": 1,
 "environment":{
 "ZIPKIN":"<ZIPKIN_URL>"
 }
}

Inventory service — psm accs push -n inventory -r java -s hourly -m manifest.json -d deployment.json -p target/inventory-dist.zip

Note down the URL — since it’ll be used in the product service

Update the deployment.json for the product app to include the inventory and Zipkin coordinates

{
 "memory": "2G",
 "instances": 1,
 "environment":{
 "INVENTORY_SERVICE":"<INVENTORY_APP_URL>",
 "ZIPKIN":"<ZIPKIN_URL>"
 }
}

Product service — psm accs push -n product -r java -s hourly -m manifest.json -d deployment.json -p target/product-dist.zip

Spring Boot (**product**) service deployed

Everything is ready for us to see things in action…

Test drive

Access the Zipkin server — note the app URL e.g. https://zipkin-<mydomain>.apaas.us2.oraclecloud.com

Zipkin on Oracle Application Container Cloud

Happy path

Start off by invoking the Product service endpoint a couple of times

e.g. curl -X GET https://product-ocloud100.apaas.us2.oraclecloud.com/product/iPhoneX and curl -X GET https://product-ocloud100.apaas.us2.oraclecloud.com/product/AppleWatch

the product service internally invokes the inventory service to return a JSON response

{"name":"iPhoneX","description":"Description for iPhoneX","stock":{"inventory":8,"node":"7e8127f0-c1a6-41db-b893-b786b773590b_67000ba4-ee1f-405c-9249-37ecd56b705d"}}

Let’s hop over to the Zipkin dashboard and query for latest traces (by clicking on Find Traces)

Two separate transactions (highlighted) were generated corresponding to our invocations

Noteworthy points

each transaction is broken into 2 spans
the spans correspond to the service hop, i.e. the product service calling the inventory service
you can also see exactly how much the inventory service contributed to the total time taken, e.g. in the first transaction the inventory service took 7137 ms (7.137 secs) out of the 8.06 secs spent on the invocation
filtering by service gives you the percentage of time spent

Filter by just the inventory service and then query Zipkin, and this is what you’ll see: about 88% of the time (7.137 / 8.06 ≈ 0.885) was spent in the inventory service alone (in the first transaction)

Let’s look deeper into a specific transaction by clicking on the first one. This gives you a detailed breakdown, and the sequence of invocations becomes obvious

Clicking on the product span will give more details like invocation timelines and HTTP request information

Parent transaction (**product** service)

Notice the parent transaction ID in the screenshot below

child transaction (**inventory** service)
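How does the inventory span know who its parent is? Sleuth passes the trace context from one service to the next as B3 HTTP headers on the outgoing call. The sketch below mimics that idea in plain Java: the class and method names are hypothetical, and it only models the three ID headers (real propagation also carries a sampling decision, among other things).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration of B3 header propagation between two services.
final class B3Propagation {

    static final String TRACE_ID = "X-B3-TraceId";
    static final String SPAN_ID = "X-B3-SpanId";
    static final String PARENT_SPAN_ID = "X-B3-ParentSpanId";

    // A random 64-bit ID rendered as 16 lower-hex characters.
    static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    // Context for the first span of a new trace (no parent yet).
    static Map<String, String> rootSpan() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put(TRACE_ID, newId());
        headers.put(SPAN_ID, newId());
        return headers;
    }

    // Context for the next hop: same trace ID, fresh span ID,
    // and the caller's span ID recorded as the parent.
    static Map<String, String> childOf(Map<String, String> current) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put(TRACE_ID, current.get(TRACE_ID));
        headers.put(SPAN_ID, newId());
        headers.put(PARENT_SPAN_ID, current.get(SPAN_ID));
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> product = rootSpan();          // request hits the product service
        Map<String, String> inventory = childOf(product);  // product calls the inventory service
        System.out.println("product   -> " + product);
        System.out.println("inventory -> " + inventory);
    }
}
```

Both spans share one trace ID, which is why Zipkin can stitch them into a single transaction, and the parent span ID is what gives the detailed view its parent/child ordering.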

So far so good — let’s stop the inventory service, see what happens and turn to Zipkin for help!

Failure case

To stop (using the CLI) — psm accs stop -n inventory

Invoke the product service again (a couple of times): curl -X GET https://product-ocloud100.apaas.us2.oraclecloud.com/product/iPhoneX

You should see an HTTP 500 response

{
 "timestamp": 1517114482192,
 "status": 500,
 "error": "Internal Server Error",
 "exception": "org.springframework.web.client.HttpServerErrorException",
 "message": "504 Gateway Time-out",
 "path": "/product/MotoZ"
}

Back in the Zipkin dashboard, the failed traces show up in red. I like red, but not in this case, since it denotes danger. Looking further into a specific transaction will reveal more
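How does a span end up marked as failed? By Zipkin convention, a span that carries an error tag is treated as a failure. Sleuth takes care of this for you, but the idea can be sketched in plain Java as below. SpanRecord and traced are hypothetical names used only for this illustration, not Sleuth APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical sketch: time a unit of work and tag the span on failure.
final class ErrorTaggingSketch {

    // A bare-bones stand-in for a span: a name, some tags and a duration.
    static final class SpanRecord {
        final String name;
        final Map<String, String> tags = new LinkedHashMap<>();
        long durationMicros;

        SpanRecord(String name) {
            this.name = name;
        }
    }

    // Runs the work, always records the duration, and adds an "error" tag
    // (the Zipkin convention for a failed span) before rethrowing.
    static <T> T traced(SpanRecord span, Callable<T> work) throws Exception {
        long start = System.nanoTime();
        try {
            return work.call();
        } catch (Exception e) {
            span.tags.put("error", String.valueOf(e.getMessage()));
            throw e;
        } finally {
            span.durationMicros = (System.nanoTime() - start) / 1_000;
        }
    }

    public static void main(String[] args) {
        SpanRecord span = new SpanRecord("get /product");
        try {
            // Simulates the downstream call failing while inventory is stopped.
            traced(span, () -> {
                throw new IllegalStateException("504 Gateway Time-out");
            });
        } catch (Exception e) {
            System.out.println("span '" + span.name + "' failed, tags = " + span.tags);
        }
    }
}
```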

Additional considerations

These are items which haven’t been covered in this post but do deserve a mention

writing custom spans
tracing other systems (e.g. a DB)

Alright, that’s all for this blog post!

Don’t forget to…

check out the tutorials for Oracle Application Container Cloud — there is something for every runtime!

Oracle Application Container Cloud Service — Create Your First Applications

other blogs on Application Container Cloud

Latest stories and news about App Container Cloud — Medium
