The virtual glue for distributed programming — Observing

Last time, I wrote an article broaching the subject of distributed programming and Event-Driven Architecture. We talked about how you have to think differently about what you’re sending and who could be receiving those events. We also discussed sending references rather than object graphs, considered the details of our objects’ lifecycles, and agreed not to assume either timely or guaranteed delivery of those events. You can dive into the whole thing here.

Well, let’s say we want to eavesdrop on this barrage of messages to see what the neighbors are talking about and figure out what all the gossip is about.

Wait a minute, weirdo… why would anyone want to do that?

There are lots of reasons to listen to everything going on in your neighborhood… I mean your network. For example, a healthy system tends to send out roughly the same volume of messages from each server, so if one server’s volume diverges from its peers, that is a strong hint that something is awry with the software or the load balancing. This comes in really handy once you start throwing machine learning and AI at your data: you can predict when your system is getting ready to spin down and retire resources early to reduce costs, or spot that it is about to spiral out of control and trace the original root cause rather than just the aftermath. A detailed recording of how your systems behave is priceless when it comes to hardening and evolving them.
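To make the volume-comparison idea concrete, here is a minimal Python sketch, not a production anomaly detector. It assumes you already count messages per server over some time window; `find_outlier_servers` and its `tolerance` threshold are hypothetical names, and comparing each server against the fleet median is just one simple heuristic, chosen because a single misbehaving server barely moves the median.

```python
from statistics import median


def find_outlier_servers(counts: dict[str, int], tolerance: float = 0.5) -> list[str]:
    """Return servers whose message count in one window strays from the fleet median.

    `tolerance` is a hypothetical threshold: 0.5 flags any server sending more
    than 50% above or below the median.
    """
    if not counts:
        return []
    typical = median(counts.values())
    if typical == 0:
        # At least half the servers are silent, so any server that is sending stands out.
        return sorted(server for server, count in counts.items() if count > 0)
    return sorted(
        server
        for server, count in counts.items()
        if abs(count - typical) / typical > tolerance
    )


# Usage: message counts per server over the same time window.
window = {"app-1": 1020, "app-2": 985, "app-3": 1003, "app-4": 140}
print(find_outlier_servers(window))  # ['app-4']
```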
Another reason is to literally audit everything going on in your systems. This is sometimes a regulatory requirement, or it could be needed for a security-critical application that calls for some level of forensic analysis of the activity. Consider a security breach that requires a detailed root cause analysis of exactly what happened and how it occurred; exhaustive details about all the data flows between the systems would prove invaluable in that scenario.

So, obviously, this can be a ton of data, and storing all of it in a searchable manner isn’t the cheapest thing in the world. Furthermore, how can we determine who did what, and how the different messages relate to each other? A little bit of metadata goes a long way: correlation identifiers, timestamps, and other information help us figure out who did what and when, and whether we should keep it.
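One way to carry that metadata is to wrap every event in a small envelope. In the sketch below, the `EventEnvelope` class, its field names, and the `new_envelope` helper are hypothetical; they illustrate a correlation identifier, an occurrence timestamp, and the emitting service travelling alongside a reference to the payload rather than the payload itself.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class EventEnvelope:
    """Hypothetical envelope: metadata travels with a reference to the payload."""
    event_type: str       # what happened, e.g. "order.placed"
    source: str           # which service emitted it (the "who")
    correlation_id: str   # shared by every event that belongs to the same activity
    payload_ref: str      # a reference to the data, not the object graph itself
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def new_envelope(event_type: str, source: str, payload_ref: str,
                 correlation_id: Optional[str] = None) -> EventEnvelope:
    """Start a new correlation chain unless the caller continues an existing one."""
    return EventEnvelope(
        event_type=event_type,
        source=source,
        correlation_id=correlation_id or str(uuid.uuid4()),
        payload_ref=payload_ref,
    )


# Usage: the payment event reuses the order's correlation id.
first = new_envelope("order.placed", "checkout-service", "orders/1234")
follow = new_envelope("payment.captured", "payment-service", "payments/987",
                      correlation_id=first.correlation_id)
```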
For example, you can create a common “session identifier” that correlates everything that happens as part of the same activity. This also helps with our storage issue, because we can create a single collection object per session and add the individual events to it. That object can then be inserted into a searchable store such as MongoDB, Elasticsearch, or Hadoop.
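Here is a minimal sketch of folding events into one document per session. It assumes each event is a plain dict with hypothetical `session_id`, `event_type`, and `occurred_at` keys, and that timestamps are ISO-8601 UTC strings in a single fixed format so they sort chronologically as text; `group_by_session` is an illustrative name, not a library function.

```python
from collections import defaultdict


def group_by_session(events):
    """Fold individual events into one document per session identifier."""
    sessions = defaultdict(list)
    for event in events:
        sessions[event["session_id"]].append(event)

    documents = []
    for session_id, items in sessions.items():
        # ISO-8601 UTC strings in one fixed format sort chronologically as text.
        items.sort(key=lambda e: e["occurred_at"])
        documents.append({
            "session_id": session_id,
            "started_at": items[0]["occurred_at"],
            "ended_at": items[-1]["occurred_at"],
            "event_count": len(items),
            "events": items,
        })
    return documents


events = [
    {"session_id": "s-42", "event_type": "cart.updated", "occurred_at": "2024-03-01T10:00:05Z"},
    {"session_id": "s-7", "event_type": "login", "occurred_at": "2024-03-01T09:59:00Z"},
    {"session_id": "s-42", "event_type": "login", "occurred_at": "2024-03-01T10:00:00Z"},
]
docs = group_by_session(events)
```

Each resulting dict is shaped like a single searchable document; with MongoDB's Python driver, for instance, a list like this could be passed to a collection's `insert_many` method.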
Another critical piece of information is the timestamp of when the event occurred. It tells you when something happened and, by extension, how useful it still is to you. For example, detailed behavioral event information is probably not that useful a week later; at that point you can summarize the data and throw the details away. Lots of services provide simple data aging policies that let you move data to cheaper cold storage or delete it altogether.
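A summarize-then-discard aging step might look like the sketch below. The seven-day `detail_ttl`, the per-day, per-event-type counts, and the `age_events` name are all assumptions for illustration; in practice, the deletion side could be left to your store's own retention policy.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def age_events(events, now, detail_ttl=timedelta(days=7)):
    """Keep recent events in full and collapse older ones into per-day counts.

    Returns (recent_events, summary) where summary maps
    (ISO date, event_type) -> number of events that were summarized away.
    The default seven-day detail_ttl is an assumed retention window.
    """
    cutoff = now - detail_ttl
    recent = [e for e in events if e["occurred_at"] >= cutoff]
    summary = Counter(
        (e["occurred_at"].date().isoformat(), e["event_type"])
        for e in events
        if e["occurred_at"] < cutoff
    )
    return recent, dict(summary)


now = datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc)
events = [
    {"event_type": "order.placed", "occurred_at": datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc)},
    {"event_type": "order.placed", "occurred_at": datetime(2024, 3, 1, 17, 45, tzinfo=timezone.utc)},
    {"event_type": "login", "occurred_at": now - timedelta(days=1)},
]
recent, summary = age_events(events, now)
print(summary)  # {('2024-03-01', 'order.placed'): 2}
```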
So, when you’re observing all the traffic, be cautious and proactive about not overwhelming your listener, especially since the data can arrive in bursts at any time. You can do that by buffering the messages and inspecting the metadata attached to each one to decide which ones you want and which you don’t.
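The buffering-plus-filtering idea can be sketched with a bounded `collections.deque`. `BufferedObserver`, its `capacity` and `wanted_types` knobs, and the `meta`/`payload` message shape are hypothetical; the key point is that the filter reads only the metadata header and never opens the payload.

```python
from collections import deque


class BufferedObserver:
    """Absorb bursts in a bounded buffer, filtering on metadata only.

    capacity and wanted_types are hypothetical knobs; when the buffer is full,
    the oldest message is evicted and counted as dropped.
    """

    def __init__(self, capacity, wanted_types):
        self.buffer = deque(maxlen=capacity)
        self.wanted_types = set(wanted_types)
        self.dropped = 0
        self.ignored = 0

    def offer(self, message):
        # Inspect only the metadata header; the payload is never opened here.
        if message["meta"]["event_type"] not in self.wanted_types:
            self.ignored += 1
            return
        if len(self.buffer) == self.buffer.maxlen:
            self.dropped += 1  # append() on a full deque(maxlen=...) evicts the oldest entry
        self.buffer.append(message)

    def drain(self, max_items):
        """Hand the consumer at most max_items messages, oldest first."""
        batch = []
        while self.buffer and len(batch) < max_items:
            batch.append(self.buffer.popleft())
        return batch


observer = BufferedObserver(capacity=3, wanted_types={"order.placed"})
for i in range(5):
    observer.offer({"meta": {"event_type": "order.placed", "id": i},
                    "payload": {"order_ref": f"orders/{i}"}})
for _ in range(2):
    observer.offer({"meta": {"event_type": "heartbeat"}, "payload": None})
batch = observer.drain(max_items=10)
```

Evicting the oldest message when the buffer is full favors fresh data, which suits a health-monitoring listener; an auditing listener that must not lose anything would instead need to push back on the producers or spill the overflow to durable storage.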
Next time I’ll talk about actually consuming these events and doing something meaningful with their payloads. You might think that it’s the same, but you’ll have to come back to find out.

If you liked this article and want to be notified when we put new ones out, go ahead and click that bell on the top right of the page. If you liked it, give us a clap or two… or fifty! Nothing says “I was mildly amused” more than rewarding us with fake internet currency! Finally, give us a shout, or simply shout at us, in the comments section below.