By Charles Li

New site reliability engineers often ask: where should I begin when troubleshooting a problem in a cloud environment? I always tell them to begin by understanding the problem. Let me demonstrate why, and how, with a real troubleshooting case.

There are many applications behind www.ebay.com. Each application serves a unique subset of the URLs, such as /a/* and /b/* in the following example. Layer 7 policies on the load balancer distribute the HTTP requests to the applications:

Policy A: if request = http://www.ebay.com/a/*, then send traffic to application-a

Policy…
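
To make the routing idea concrete, here is a minimal sketch of how such layer 7 policies might be evaluated. It is an illustration only, not the load balancer's actual configuration or implementation: the `POLICIES` table and the `route` function are hypothetical names, only the /a/* rule comes from the text, and the /b/* entry with its target name `application-b` is assumed by analogy.

```python
from urllib.parse import urlsplit

# Hypothetical policy table: (path prefix, target application).
# Only the /a/* rule appears in the article; the /b/* rule and the
# name "application-b" are assumptions made for illustration.
POLICIES = [
    ("/a/", "application-a"),
    ("/b/", "application-b"),
]


def route(url, host="www.ebay.com"):
    """Return the application that should receive the request, or None."""
    parts = urlsplit(url)
    # A layer 7 policy can inspect the host and the path of the request.
    if parts.hostname != host:
        return None
    # The first policy whose prefix matches the request path wins.
    for prefix, application in POLICIES:
        if parts.path.startswith(prefix):
            return application
    return None


if __name__ == "__main__":
    print(route("http://www.ebay.com/a/item/1"))
    print(route("http://www.ebay.com/c/item/1"))
```

A request that matches no policy returns `None` in this sketch; what a real load balancer does in that case (a default pool, an error response) depends on its configuration and is not stated here.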
