Have you ever wondered what happens under the hood when we make a simple Google search? or How Google makes sure that we get all those numbers of accurate search results within a snap of a finger? , Ever thought of how do they manage such a large scale distributed systems?
Before every other thing, What the heck is a Distributed Systems?
A distributed system is basically a network of autonomous systems/servers connected using a middleware which can share resources, capabilities, files and so on. The goal is to make the entire network work as a single computer.
Distributed Programming is the art of solving the same problem that you can solve on a single computer using multiple computers — Mikito Takada
Let me explain with an example of a simple google search. Whenever We make a search on Google, a front end service may distribute the query to hundreds of query servers, each searching within its own index. The query may also be sent to a number of other subsystems which process advertisements, checks spelling, look for specialized results like images, videos, etc. Results from all these services are selectively combined in the results page. We call this model a “Universal Search”. In general thousands of machines and many different services might be needed for one universal search.
On a monthly basis, Google handles over a billion search requests. To handle these requests google uses over a million computers.
Need for a Tracing Infrastructure in Distributed Systems
In such complex systems, poor performance in any of the subsystem would affect the entire search cycle. An Engineer looking at the overall latency would know there is a problem but may not be able to guess which service is at fault since new services would be added or removed every day and a small issue in one application might affect the performance of other. Moreover, each service would have been developed by different teams and perhaps in different languages. Taking all these into an account, a system/tool which can help us understand the system behavior and reasoning of latency would be invaluable.
What’s so special about Dapper?
Although there are other tracing systems like Magpie and X-Trace, certain design changes which were made in…