Deviations from the norm

Kaushik Mukherjee
Published in engineering-udaan
Aug 13, 2023

An object at rest will stay at rest, and an object in motion will stay in motion with a constant velocity unless acted upon by an external force.
- Newton’s first law of motion

An external stimulus is often necessary to change the status quo. Sometimes, though, deviations occur without any deliberate impetus, driven instead by environmental triggers: intensified competition, political upheaval, and pandemics are all examples of such extraneous impulses. Organizations tend to depend heavily on “metrics” to make sense of these changes. However, the purist in me is not always fully convinced by the attributions. Attributions are sometimes incomplete, and on occasion incorrect or misleading. This, coupled with our tendency to interpret metrics through our own biases, can compound the problem.

Deviation over a timeline

For instance, while environmental awareness has increased in the past century, so has global warming. Does that mean a growth in environmental awareness increases the likelihood of global warming? Correlations tempt us to believe that actions lead to consequences (causation). Comprehensive, evidence-based attribution is necessary to reach sufficiently deterministic conclusions. Not convinced? Here is another example: if you launch a new feature on your app that replaces an existing feature in the same real estate (say, the top section of the app), then just by virtue of being new, the feature will end up getting a lot more engagement. That does not imply the feature is inherently better than the one it replaced. Over time, it is quite possible that its engagement tapers off as the novelty wears off, and it becomes as good as, or worse than, the feature it replaced. This phenomenon is a regression towards the mean.
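To make the novelty effect concrete, here is a toy simulation. Every number in it is invented for illustration; it is not data from any real feature:

```python
import random

# Invented numbers for illustration only.
BASELINE_CTR = 0.040     # the new feature's true long-run engagement rate
OLD_FEATURE_CTR = 0.045  # the feature it replaced
NOVELTY_BOOST = 0.030    # extra engagement from sheer newness
DECAY = 0.7              # novelty fades ~30% per week

for week in range(1, 9):
    boost = NOVELTY_BOOST * (DECAY ** (week - 1))
    ctr = BASELINE_CTR + boost + random.gauss(0, 0.001)  # measurement noise
    verdict = "beats" if ctr > OLD_FEATURE_CTR else "trails"
    print(f"week {week}: ctr={ctr:.3f} ({verdict} the old feature)")
```

Early weeks make the new feature look like a clear win; by around week six it has regressed to its real baseline and trails the feature it replaced.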

While it is easy to pontificate, it is in fact incredibly hard to accurately identify and attribute a root cause to a change. In this context, would it not be easier if a framework or tool could unravel some of these vagaries?

An illustration

Imagine this simple use case of adding an item to the cart in the context of an e-commerce app.

Different paths to get to the cart

The illustration above considers a destination, such as the cart, and the various paths through which one can get to it. In this case, p1 is the busiest (most popular) path, followed by p2 and then p3; hence p1 > p2 > p3. Furthermore, regardless of external circumstances, the norm would continue to be p1 > p2 > p3. The ratios could of course change, but the overarching relationship would not.

Now imagine a change was introduced into the system that deviated from this norm: the relationship p1 > p2 > p3 no longer held true, or the p1:p2:p3 ratios changed. Such deviations are usually very hard to identify, let alone attribute, unless they are obvious. The obvious ones would be removing the add-to-cart button from the search page, or an outage on the product page. Or maybe an additional path p4 was introduced?
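As a rough sketch of what checking this norm could look like, consider the snippet below. The path names, baseline shares, and tolerance are hypothetical; this is not percept insight's actual API:

```python
# Check the p1 > p2 > p3 norm against a day's add-to-cart counts.
NORM_ORDER = ["p1", "p2", "p3"]  # the expected popularity ranking
RATIO_TOLERANCE = 0.25           # allowed relative drift in a path's share

def check_norm(counts: dict, baseline_shares: dict) -> list:
    """Return human-readable deviations from the norm."""
    total = sum(counts.values())
    issues = []
    # 1. Ordering invariant: the ranking itself should not flip.
    observed = sorted(counts, key=counts.get, reverse=True)
    if observed[: len(NORM_ORDER)] != NORM_ORDER:
        issues.append(f"ranking changed: {observed}")
    # 2. Ratio drift: each path's share should stay near its baseline.
    for path, baseline in baseline_shares.items():
        share = counts.get(path, 0) / total
        if abs(share - baseline) / baseline > RATIO_TOLERANCE:
            issues.append(f"{path} share {share:.2f} vs baseline {baseline:.2f}")
    # 3. A new path (the hypothetical p4) shows up as an unknown key.
    issues += [f"unexpected new path: {p}" for p in counts if p not in baseline_shares]
    return issues

print(check_norm({"p1": 500, "p2": 450, "p3": 200, "p4": 90},
                 {"p1": 0.55, "p2": 0.30, "p3": 0.15}))
```

Even this toy version shows why attribution is the hard part: it can tell you that p1's share drifted and that p4 appeared, but not why.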

While there are tools that can figure out such anomalous patterns with some degree of accuracy, attributing why they occurred, or what caused the change, needs analysis that is almost always very time-consuming.
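A common ingredient in such tools is flagging points that sit far from a rolling baseline. A minimal z-score sketch (again with invented numbers) might look like this:

```python
import statistics

def anomalies(series, window=7, threshold=3.0):
    """Flag indices whose value sits more than `threshold` standard
    deviations away from the mean of the preceding `window` points."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window : i]
        mean = statistics.mean(history)
        stdev = statistics.stdev(history) or 1e-9  # guard against zero spread
        if abs(series[i] - mean) / stdev > threshold:
            flagged.append(i)
    return flagged

# p1's daily share of add-to-cart events over two weeks (made-up numbers).
p1_share = [0.55, 0.54, 0.56, 0.55, 0.53, 0.55, 0.56,
            0.54, 0.55, 0.56, 0.55, 0.41, 0.40, 0.39]
print(anomalies(p1_share))  # -> [11], the day the share suddenly dropped
```

Detection is the easy half; the output is an index, not an explanation.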

Further dissections

Despite the advances in software technology, AI, networking, and data engineering, we still do not have efficient mechanisms to make attributions conclusive. It is not as though the benefits of proper attribution are not understood. Yet we uneasily accept the output metrics, often relying on our intuition to tell us whether the analysis done was sufficient. In my view, the reasons behind this could be the following:

1. Lack of proper coverage when it comes to instrumentation

2. Incomplete data governance

3. Limited analytical tools that focus on the problem of attribution

A possible solution

Given that the problem statements above are fairly universal, and even more so in an e-commerce organisation where there are many different influencing factors, both internal and external, udaan considered building a tool that attempted to reduce some of these ambiguities.

The way we went about it was to figure out event-based dependencies and have them both instrumented and observed. For instance, if an event like tapping a button led to an add to cart, the source that generated the event would be captured. As a result, attributions started to become clearer. These relationships were later graphed and represented pictorially. The next logical step was to integrate all of this into a tool. We realised there wasn't a tool that addressed all of the concerns I had listed, so we went about building something in-house. We called it percept insight.
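As a concrete sketch of the idea of capturing an event together with its source, consider the snippet below. The event names, fields, and the in-memory graph are illustrative assumptions, not percept insight's actual schema:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Event:
    name: str    # e.g. "add_to_cart"
    source: str  # what generated it, e.g. "search_page.buy_button"

# Edges of the dependency graph: source -> events it triggered.
graph = defaultdict(set)

def record(event: Event) -> None:
    """Instrumentation hook: capture the event along with its source."""
    graph[event.source].add(event.name)

# A few illustrative events flowing through the system.
record(Event("add_to_cart", source="search_page.buy_button"))
record(Event("add_to_cart", source="product_page.buy_button"))
record(Event("order_placed", source="cart.checkout_button"))

for source, events in graph.items():
    print(f"{source} -> {sorted(events)}")
```

Once every event carries its source, the question of which path produced an add to cart stops being guesswork, and the resulting graph can be rendered pictorially.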

I will perhaps discuss percept insight (pi) in a separate blog, as this one isn't about pi. What pi did help with in this context was showing the variances in relationships as new features got introduced. We would see some features cannibalising existing ones while the net result remained unchanged. While this was no doubt interesting, it also told us that no single feature alone can move the needle significantly; that it takes time for habits to build, for features to gain acceptance, and for other features to wear out. But we already knew all of that, didn't we? What pi added was showing exactly how those feature sets changed over time, allowing clear attributions to form and leading to more specific actions.

For example, let us assume that we start seeing a sudden dip in orders placed from a certain geography. This could be due to a plethora of reasons (a sketch of how a tool might triage them follows the list):
Was there a sudden change in pricing? (internal or external factor)
Were the warehouses in that region not operational? (external factor)
Did the tech infrastructure in that geography have issues? (internal factor)
Was there a new feature or workflow introduced recently? (internal factor)

There could be others as well.
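With attributed events in place, a tool can walk such candidate causes mechanically rather than by gut feel. The signal names and thresholds below are hypothetical stand-ins, not what netra or pi actually expose:

```python
# Triage a dip in orders from one geography against candidate causes.
# Each check would be backed by an instrumented, attributed event stream.
def triage(signals: dict) -> list:
    candidate_causes = {
        "pricing change": signals["price_delta_pct"] > 5,
        "warehouse outage": not signals["warehouses_operational"],
        "infra issues": signals["error_rate"] > 0.02,
        "recent feature rollout": signals["recent_rollouts"] > 0,
    }
    return [cause for cause, triggered in candidate_causes.items() if triggered]

# Made-up snapshot of signals for the affected region.
print(triage({
    "price_delta_pct": 1.2,
    "warehouses_operational": True,
    "error_rate": 0.06,     # elevated server errors in that geography
    "recent_rollouts": 1,   # one workflow changed last week
}))  # -> ['infra issues', 'recent feature rollout']
```

The output is a shortlist of plausible attributions to investigate, rather than a single bare anomaly flag.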

pi is powered by netra, udaan's observability platform. What is built on top of it are data structures that allow for specific attributions, leading to better explainability. How? That is a topic for later.
For now though, I am happy to have a little more clarity into what is actually causing the deviations from the norm.

In conclusion

In an environment where variances are large, it becomes important to have leaf-level events instrumented and observed. The input metric definitions should also add up to the macro output metrics, so that deviations can be observed and attributed accurately (a small sketch of such a reconciliation check follows). While it is good to conduct A/B experiments and such, there are inherent issues in looking at metrics in isolation, and unless it is a long-running A/B, the attributions can be suspect due to recency biases.
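As an illustration of input metrics adding up to a macro output metric, a reconciliation check could look like the following. The metric names, numbers, and tolerance are assumptions made for the sketch:

```python
# Leaf-level input metrics should reconcile to the macro output metric.
input_metrics = {
    "orders_via_search": 4200,
    "orders_via_recommendations": 1500,
    "orders_via_reorder": 800,
}
output_metric = 6900   # total orders reported by the macro dashboard
TOLERANCE = 0.01       # allow 1% slack for late-arriving events

gap = abs(sum(input_metrics.values()) - output_metric) / output_metric
if gap > TOLERANCE:
    # A gap means an unattributed path: instrumentation is missing somewhere.
    print(f"reconciliation failed: {gap:.1%} of orders unattributed")
else:
    print("inputs reconcile with the output metric")
```

When the inputs do not add up, the gap itself points at the missing instrumentation, which is exactly where unexplained deviations tend to hide.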
Lastly, a tool or framework that takes care of instrumentation, visualisation, and detection of these deviations helps in ensuring proper explainability.
