Cutting MongoDB Cost at Capillary: Unveiling the Space-Saving Power of Restructuring!

Pankaj Kumar Singh
Capillary Technologies
6 min read · Sep 6, 2023

Capillary Technologies, a leading provider of enterprise-level CRM solutions, runs a highly configurable Loyalty system that processes millions of transactions every day. Brands can tailor loyalty rules to their specific business needs without any changes to the core product. Our systems use these rules to evaluate and carry out actions for every user transaction, and each of these operations is logged as an evaluation record in Mongo, giving us a comprehensive record of all proceedings.

The Wallet Buster: How MongoDB Costs Became Our Critical Bottleneck!

Venturing into New Frontiers: Amid the many challenges of onboarding new brands, we faced a significant hurdle: escalating MongoDB costs.

To address this issue, we embarked on a journey of document optimization. Based on the nature of our application, we ultimately saw up to a 60% reduction in storage space, which in turn translated into 40–60% cost savings.

What makes MongoDB the Ideal Choice for this Use Case?

Consider a scenario where a brand has configured the rules below to be evaluated for every transaction.

For each transaction, say a customer makes a purchase of 100 rupees from brand A:

* Award 10 redeemable points, i.e., points the customer can use when conducting a new transaction with brand A.

* Award 10 promised points, i.e., points that work like redeemable points, except the customer must wait a certain number of days (decided by the brand) before using them.

These rules keep evolving over time, hence it is challenging to define a fixed schema. Furthermore, the collection experiences high write activity. In this context, MongoDB’s strengths come to the forefront. Its schemaless nature allows for flexible data structures, accommodating dynamic rule changes seamlessly. Additionally, MongoDB’s robust support for high write operations makes it exceptionally well-suited for our use case.
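
As a rough illustration (hypothetical records, not our actual schema), two evaluation records with different shapes can live side by side in the same collection, so evolving rules never require a schema migration:

```python
# Hypothetical execution-log records; MongoDB stores both shapes in
# the same collection without any schema change.
log_v1 = {
    "txnId": "t-1001",
    "rules": [{"rId": 1, "points": 10, "type": "REDEEMABLE"}],
}
log_v2 = {
    "txnId": "t-1002",
    "rules": [{"rId": 2, "points": 10, "type": "PROMISED", "lockInDays": 30}],
}

# With pymongo, both inserts succeed against the same collection:
#   db.execution_logs.insert_one(log_v1)
#   db.execution_logs.insert_one(log_v2)
```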

Unraveling the Enigma: The Astonishing Surge in Execution Log Size!

Under typical circumstances, most brands had a limited number of rules configured in our loyalty system, resulting in execution logs (execution logs record rule-evaluation results from transactions in the Capillary system in a MongoDB collection) ranging from 200 KB to 500 KB. However, some brands experienced a substantial surge in configured rules, leading to a significant increase in the size of execution-log documents, now ranging from 13 MB to 16 MB: an average 40-fold increase.

Unleashing the Magic: How We Masterfully Optimized MongoDB Collection Size!

After observing a considerable increase in document size for a couple of brands, from 200 KB to 13 MB, our initial response was to restructure the document in a way that would reduce its size while keeping the meaning of the data intact.

Arbitrary restructuring alone wouldn’t change the size of the JSON document, so below we highlight the specific strategies we employed to achieve this reduction in our scenario.

1. Minimize key duplication.
  • In the first document shown below, the key ‘aId’ is repeated twice. As the document size or the number of objects increases, the repetition of the ‘aId’ key grows proportionally, following a linear relationship.
  • Conversely, in the second document, despite the data’s continuous growth, the key ‘aId’ appears only once throughout the entire document. Similarly for ‘aPerformed’. A minimal sketch of the transformation follows.
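
A rough illustration in Python, with hypothetical field names and values rather than our production schema:

```python
import json

# Pre-revamp: every object repeats the 'aId' and 'aPerformed' keys,
# so the bytes spent on key names grow linearly with the data.
pre_revamp = {
    "actions": [
        {"aId": 101, "aPerformed": True},
        {"aId": 102, "aPerformed": False},
        {"aId": 103, "aPerformed": True},
    ]
}

# Post-revamp: each key name is stored exactly once; index i of each
# parallel array describes the same action.
post_revamp = {
    "aId": [101, 102, 103],
    "aPerformed": [True, False, True],
}

# The saving widens as the number of actions grows.
print(len(json.dumps(pre_revamp)), len(json.dumps(post_revamp)))
```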

2. Reduce data redundancy.

  • In the first image, ‘invoice.basketIncludes("pants","jeans")’ and ‘invoice.basketIncludes("shirts","polo")’ are each duplicated, i.e., the same text appears in both the rules and the notes. By transforming the collection as shown in the second image, we eliminate this repetition; a sketch follows.
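
A minimal sketch of the same idea, again with hypothetical field names: the expression text is stored once and referenced by id instead of being copied into the notes.

```python
# Pre-revamp: the rule expression appears twice per entry (hypothetical shape).
pre_revamp = {
    "rules": ['invoice.basketIncludes("pants","jeans")'],
    "notes": ['invoice.basketIncludes("pants","jeans") evaluated to true'],
}

# Post-revamp: keep a single copy of each expression and refer to it by id.
post_revamp = {
    "expressions": {"e1": 'invoice.basketIncludes("pants","jeans")'},
    "rules": ["e1"],
    "notes": [{"exprRef": "e1", "result": True}],
}
```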

3. Exploring One-to-Many Relationships within a MongoDB Document.

  • Upon examining the two collections presented below, a distinct one-to-many relationship emerges within the first: a single rSetId is associated with multiple rIds. In the pre-revamp document we were storing rSetId in every JSON object; in the post-revamp document we extract it out, avoiding the repetition (see the sketch below).
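
A minimal sketch of hoisting the shared parent key, with hypothetical ids:

```python
# Pre-revamp: the parent 'rSetId' is repeated inside every rule object.
pre_revamp = {
    "rules": [
        {"rSetId": 7, "rId": 1},
        {"rSetId": 7, "rId": 2},
        {"rSetId": 7, "rId": 3},
    ]
}

# Post-revamp: model the one-to-many relationship directly by keeping
# the shared 'rSetId' once and listing only the child rule ids.
post_revamp = {
    "rSetId": 7,
    "rIds": [1, 2, 3],
}
```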

4. Enums can be substituted with unique identifiers (IDs).

  • As a standard practice, we typically store enumerated values (RULE_EVALUATE, EVALUATE, ACTION_EVALUATED) in our MongoDB collections for readability.
  • We can streamline the enumeration by introducing a mapping of unique identifiers where, for example, 1 corresponds to RULE_EVALUATE, 2 to EVALUATE, and 3 to ACTION_EVALUATED, as sketched below.
  • Trading off readability is acceptable in this case since it reduces the size of the MongoDB document and speeds up search queries.
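
A minimal sketch of such a mapping (the integer assignments are illustrative):

```python
# Hypothetical enum-to-id mapping, kept in application code.
EVENT_IDS = {"RULE_EVALUATE": 1, "EVALUATE": 2, "ACTION_EVALUATED": 3}
EVENT_NAMES = {v: k for k, v in EVENT_IDS.items()}

# Pre-revamp: readable but bulky string stored per document.
pre_revamp = {"event": "ACTION_EVALUATED"}

# Post-revamp: store the compact id; translate back at read time.
post_revamp = {"event": EVENT_IDS["ACTION_EVALUATED"]}  # {"event": 3}
assert EVENT_NAMES[post_revamp["event"]] == "ACTION_EVALUATED"
```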

Eureka: Unleashing 30–60% Size Reduction in Our MongoDB Document!

Upon incorporating the previously mentioned recommendations into our MongoDB document, an impressive size reduction of 30 to 60 percent was successfully attained. Herein, we present a JSON document that showcases the application of the aforementioned techniques for restructuring.

Pre-Revamp (Size = 628 Bytes)
Post-Revamp (Size = 370 Bytes)

While we cannot provide an exact estimate of the potential size reduction for every document, we can certainly anticipate a substantial decrease. In the example shared above, the reduction was approximately 41.08%, calculated as [(628 − 370) / 628] × 100. Likewise, we noted reductions of around 30 percent in some cases, and up to 60 percent in others, depending on the nature of the data repetition.
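
For anyone who wants to reproduce the comparison, a small sketch; it uses compact JSON length as a rough proxy for stored size, since the exact BSON size depends on the driver and field types:

```python
import json

def approx_size_bytes(doc: dict) -> int:
    # Length of the compact JSON encoding; actual BSON size on disk
    # differs, but the relative saving is comparable.
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))

pre_bytes, post_bytes = 628, 370        # sizes from the example above
reduction = (pre_bytes - post_bytes) / pre_bytes * 100
print(f"Reduction: {reduction:.2f}%")   # -> Reduction: 41.08%
```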

Conclusion.

  1. Even though Mongo is schema-less and lets you dump unstructured data, you should, where possible, explore restructuring the data to avoid this type of high write volume to the Mongo server.
  2. With this restructuring, we were able to save up to 60% of the space for mongo documents, resulting in lower infra costs.
  3. MongoDB attempts to keep frequently accessed data in memory to speed up query performance. With a smaller dataset, more data can fit into memory, reducing the need for disk reads and improving search speed.
  4. In larger datasets, search operations may require more frequent disk reads, which can be slower compared to in-memory operations. Smaller datasets are more likely to be entirely or mostly cached in memory, resulting in faster access times.
