Distributed tracing for microservices on Oracle Cloud with Spring Cloud Sleuth and Zipkin
For a master table of contents of blog posts on microservice topics, please refer to https://medium.com/oracledevs/bunch-of-microservices-related-blogs-57b5f1f062e5
Ok, so you have a cloud native/microservices-style architecture wherein you have multiple services which collaborate with each other to achieve something… great!
Debugging and troubleshooting can be tough
- multiple (micro) services — each doing their own thing
- multiple instances per service: after all, our services are stateless and horizontally scalable!
- sometimes you might not even have access to the underlying machine/VM/node — just a vendor/product specific way to get access to the application logs
- etc….
There is nothing wrong with the above constraints. In fact, they are inevitable with distributed apps in general (microservices or not), especially when they are running in a managed PaaS (platform-as-a-service) environment
So what can we do to make things easier and more manageable when it comes to in-depth app level visibility? There is no silver bullet as such, but Distributed Tracing is a tool which, when used properly, can help us
This blog demonstrates Spring Boot applications leveraging Spring Cloud Sleuth to keep track of app level transactions and transport the trace information to a remote Zipkin server
Although the focus is on Java based apps, the concept applies to any system/service which can produce tracing data in OpenZipkin format
Oracle Application Container Cloud serves as the runtime for
- the Zipkin server and
- the Spring Boot apps, inventory and product (we will continue to use the same set of apps as we did in one of the previous blogs with minor modifications to demonstrate the concept)
Architecture
The sample app is available on GitHub
The below diagram presents a high level overview
Dead simple: thanks to Spring Cloud Sleuth (Zipkin module), the individual Spring Boot apps send the transaction data (traces) to Zipkin, which can then be viewed using a dashboard provided by Zipkin itself
Here is a summary of the components/services
Zipkin
Zipkin server is yet another Spring Boot app and it runs in an embedded Tomcat container (in this case). There is hardly anything required here except
- using the zipkin-server and zipkin-autoconfigure-ui (for the visualization dashboard) dependencies and then
- using @EnableZipkinServer on the Spring Boot bootstrap class, which does the trick!
Inventory & Product services
For details of the inventory and product services, please refer to the Microservices service discovery on Oracle Cloud with Spring Cloud and Zookeeper blog post
The important things to know are
- the apps use the spring-cloud-starter-sleuth and spring-cloud-sleuth-zipkin modules (in pom.xml)
- application.properties points to the Zipkin server using the spring.zipkin.baseUrl attribute
- Zookeeper based service discovery has been excluded for the sake of simplicity and to focus on a single topic i.e. distributed tracing
- the apps use RestTemplate instead of FeignClient
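To make the role of the tracing modules a little more concrete, here is a minimal sketch of how trace context can travel from the product service to the inventory service in B3 headers, the header convention used by Zipkin-compatible tracers. It uses only the JDK and is not Spring Cloud Sleuth's actual implementation; the class and method names (B3PropagationSketch, rootSpan, childSpan) are hypothetical and exist only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Illustrative sketch only: models how a trace context can travel between
 * services in B3 headers. This is not Spring Cloud Sleuth's implementation.
 */
final class B3PropagationSketch {

    // B3 header names understood by Zipkin-compatible tracers.
    static final String TRACE_ID = "X-B3-TraceId";
    static final String SPAN_ID = "X-B3-SpanId";
    static final String PARENT_SPAN_ID = "X-B3-ParentSpanId";

    private B3PropagationSketch() { }

    /** Returns a random 64-bit identifier as 16 lower-case hex characters. */
    static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    /** Headers for the first span of a new trace (no parent yet). */
    static Map<String, String> rootSpan() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put(TRACE_ID, newId());
        headers.put(SPAN_ID, newId());
        return headers;
    }

    /**
     * Headers for an outgoing call made while handling {@code incoming}:
     * the trace ID is kept, a fresh span ID is issued, and the caller's
     * span ID becomes the parent.
     */
    static Map<String, String> childSpan(Map<String, String> incoming) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put(TRACE_ID, incoming.get(TRACE_ID));
        headers.put(SPAN_ID, newId());
        headers.put(PARENT_SPAN_ID, incoming.get(SPAN_ID));
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> product = rootSpan();            // product handles the request
        Map<String, String> inventory = childSpan(product);  // product calls inventory
        System.out.println("product   -> " + product);
        System.out.println("inventory -> " + inventory);
    }
}
```

Because both hops share one trace ID and the second hop records the first as its parent, a server such as Zipkin can stitch the spans back into a single transaction. In the real apps this bookkeeping is handled by the tracing modules rather than by hand-written code.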
Build & deployment
Start by fetching the project from Github — git clone https://github.com/abhirockzz/accs-spring-boot-zipkin-distributed-tracing.git
Build
Zipkin server
cd zipkinserver
mvn clean install
The build process will create zipkin-dist.zip in the target directory
Inventory service
cd inventory
mvn clean install
The build process will create inventory-dist.zip in the target directory
Product service
cd product
mvn clean install
The build process will create product-dist.zip in the target directory
Deployment a.k.a push to cloud
With Oracle Application Container Cloud, you have multiple options in terms of deploying your applications. This blog will leverage PSM CLI which is a powerful command line interface for managing Oracle Cloud services
Other deployment options include REST API, Oracle Developer Cloud and of course the console/UI
You can download and set up PSM CLI on your machine (using psm setup); details here
Start by deploying Zipkin server application first since both our microservices will depend on it
- Zipkin —
psm accs push -n zipkin -r java -s hourly -m manifest.json -d deployment.json -p target/zipkin-dist.zip
Once you have Zipkin up and running, note down its URL (highlighted below) from the Applications page in Oracle Application Container Cloud
Now, update the deployment.json for the inventory app to enter the Zipkin server info
{
"memory": "2G",
"instances": 1,
"environment":{
"ZIPKIN":"<ZIPKIN_URL>"
}
}
- Inventory service —
psm accs push -n inventory -r java -s hourly -m manifest.json -d deployment.json -p target/inventory-dist.zip
Note down the URL, since it’ll be used in the product service
Update the deployment.json for the product app to include the inventory and zipkin co-ordinates
{
"memory": "2G",
"instances": 1,
"environment":{
"INVENTORY_SERVICE":"<INVENTORY_APP_URL>",
"ZIPKIN":"<ZIPKIN_URL>"
}
}
- Product service —
psm accs push -n product -r java -s hourly -m manifest.json -d deployment.json -p target/product-dist.zip
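The post does not show how the apps consume the ZIPKIN and INVENTORY_SERVICE values set in deployment.json, so the following is only a sketch of one way a Java process could resolve them from its environment with local fallbacks. The class name EndpointConfigSketch, the resolve helper and the inventory fallback URL are assumptions made for illustration; 9411 is Zipkin's default port.

```java
import java.util.Map;

/**
 * Illustrative sketch only: resolves service endpoints from environment
 * variables such as those set in deployment.json. How the sample apps
 * really read these values is not shown in the post.
 */
final class EndpointConfigSketch {

    // Zipkin listens on port 9411 by default.
    static final String DEFAULT_ZIPKIN_URL = "http://localhost:9411";
    // Hypothetical local fallback for the inventory service.
    static final String DEFAULT_INVENTORY_URL = "http://localhost:8080";

    private EndpointConfigSketch() { }

    /**
     * Returns the value of {@code name} from {@code env} without surrounding
     * whitespace or a trailing slash, or {@code fallback} if it is unset or blank.
     */
    static String resolve(Map<String, String> env, String name, String fallback) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            return fallback;
        }
        String trimmed = value.trim();
        return trimmed.endsWith("/") ? trimmed.substring(0, trimmed.length() - 1) : trimmed;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        System.out.println("Zipkin:    " + resolve(env, "ZIPKIN", DEFAULT_ZIPKIN_URL));
        System.out.println("Inventory: " + resolve(env, "INVENTORY_SERVICE", DEFAULT_INVENTORY_URL));
    }
}
```

Passing the environment in as a map keeps the lookup easy to exercise locally without redeploying, which is handy when checking that a URL was entered correctly.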
Everything is ready for us to see things in action…
Test drive
Access the Zipkin server — note the app URL e.g. https://zipkin-<mydomain>.apaas.us2.oraclecloud.com
Happy path
Start off by invoking the Product service endpoint a couple of times
e.g. curl -X GET https://product-ocloud100.apaas.us2.oraclecloud.com/product/iPhoneX
and curl -X GET https://product-ocloud100.apaas.us2.oraclecloud.com/product/AppleWatch
The product service internally invokes the inventory service to return a JSON response
{"name":"iPhoneX","description":"Description for iPhoneX","stock":{"inventory":8,"node":"7e8127f0-c1a6-41db-b893-b786b773590b_67000ba4-ee1f-405c-9249-37ecd56b705d"}}
Let’s hop over to the Zipkin dashboard and query for latest traces (by clicking on Find Traces)
Two separate transactions (highlighted) were generated corresponding to our invocations
Noteworthy points
- each transaction is broken into 2 spans
- each span is produced by the service hop i.e. product service calling inventory service
- you can also see exactly how much time the inventory service contributed to the total, i.e. in the first transaction the inventory service took 7137 ms (7.137 secs) out of the 8.06 secs spent on the invocation
- filtering by the service will give you the percentage of time spent
Filter just by the inventory service and then query Zipkin, this is what you’ll see — about 88% of the time was spent in the inventory service alone (in the first transaction)
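The percentage is simple arithmetic on the span durations. As a worked example using the numbers from the first transaction (7137 ms in the inventory service out of roughly 8060 ms overall), the sketch below computes the share, which comes to a little over 88%. The class and method names are hypothetical, and this is not how Zipkin itself computes the figure.

```java
/**
 * Illustrative sketch only: computes how much of a trace's total duration
 * was spent in one span. Names are hypothetical, not part of Zipkin.
 */
final class SpanShareSketch {

    private SpanShareSketch() { }

    /** Returns the span's duration as a percentage of the trace's total duration. */
    static double percentOfTotal(long spanMillis, long totalMillis) {
        if (totalMillis <= 0) {
            throw new IllegalArgumentException("totalMillis must be positive");
        }
        if (spanMillis < 0 || spanMillis > totalMillis) {
            throw new IllegalArgumentException("spanMillis must be between 0 and totalMillis");
        }
        return 100.0 * spanMillis / totalMillis;
    }

    public static void main(String[] args) {
        long inventoryMillis = 7137; // inventory span from the first transaction
        long totalMillis = 8060;     // total time of the first transaction (8.06 secs)
        double share = percentOfTotal(inventoryMillis, totalMillis);
        System.out.println("inventory share of total time (%): " + share);
    }
}
```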
Let’s look deeper into a specific transaction by clicking on the first one — this will now give you a detailed split up and the sequence of invocation is obvious
Clicking on the product span will give more details like invocation timelines and HTTP request information
Notice the parent transaction ID in the below screenshot
So far so good — let’s stop the inventory service, see what happens and turn to Zipkin for help!
Failure case
To stop (using the CLI) — psm accs stop -n inventory
Invoke the product service again (a couple of times): curl -X GET https://product-ocloud100.apaas.us2.oraclecloud.com/product/iPhoneX
You should see an HTTP 500 response
{
"timestamp": 1517114482192,
"status": 500,
"error": "Internal Server Error",
"exception": "org.springframework.web.client.HttpServerErrorException",
"message": "504 Gateway Time-out",
"path": "/product/MotoZ"
}
I like red, but not in this case since it denotes danger. Looking further into a specific transaction will reveal more
Additional considerations
These are items which haven’t been covered in this post but do deserve to be mentioned
- writing custom spans
- tracing other systems (e.g. DB)
Alright, that’s all for this blog post!
Don’t forget to…
- check out the tutorials for Oracle Application Container Cloud — there is something for every runtime!
- other blogs on Application Container Cloud
Cheers!
The views expressed in this post are my own and do not necessarily reflect the views of Oracle.