Persistence Layer Library — Boosting Relational DB Performance

Gal Koren
skai engineering blog
5 min readJul 28, 2020

Experienced developers are familiar with the following tradeoff:

Which means: when you optimize your code, it naturally becomes more obfuscated.
We, at Kenshoo, were able to completely smash that coupling to the dust by developing a library dedicated just for that.
And now, it is open source.

We have business rules

Kenshoo’s main product is a campaign management service. Each campaign typically contains millions of search engine keywords which users can edit, remove, or create.

We used to have a KeywordService class responsible for persisting a keyword object and applying business rules such as validations.
A validator is a business rule that checks the input and aborts the mutation if it isn’t valid. For example:

  • The text of the keyword must not exceed some maximum length.
  • The bid of the keyword must not exceed the budget of the campaign.

Client-driven need

One of our popular features is a bulk-import tool where clients can upload large CSV files with keywords to be ingested in the database. Well, not directly in the database… This feature must support all of the above-mentioned business rules. The output is a report with the failed entities and the reason for each.

We are also obligated to provide an SLA of 1,000 keyword entities per second.

And… oops. Guess what? We received an angry phone call one day. Apparently, the performance does not optimize itself as the codebase continuously grows.

Too many SQL queries

Using a profiler, it quickly became evident that more than 90% of the time was spent executing SQL queries.
This usually happens when the business layer is simple, as follows:

We have around 50 validators. Some 10 of these need to query the database to retrieve additional context. For instance, some keyword validator needs to access the budget of the campaign. A different validator needs to access the permissions of the user. These are not properties of the keyword object but rather properties of related entities.

So for 1,000 keywords, we access the database as follows:

  • 10,000 read queries (not necessarily for keywords)
  • 1,000 write queries

What if I told you we could do it with only 2 queries?

  • 1 read query to fetch all the contexts required by all the validators for all keywords.
  • 1 write query to save all the keywords in bulk.

Bulk, bulk, bulk

I can’t say it enough. Persisting one entity at a time is a performance killer.
All the Object Oriented books give you this classic example:

And that’s all very well — for NoSQL servers like Cassandra, or when all you have is a UI where the user updates one item at a time, or for a low scale system.
However, that’s no good for us.
We needed something like this:

The Persistence Layer (PL) Library

It’s easy to say “use bulk”, but here is a partial list of what you’d have to do:

  • Each validator should declare which fields it needs from the database. These could be fields of the validated entity itself or fields of related entities. Recalling our keyword example, these fields would be the keyword bid, the campaign budget and the user permissions.
  • Collect the validators triggered by the requested commands. Collect the fields required by these validators and construct an SQL query. This query needs to support JOIN for fields located on related tables.
  • Collect the results into a lookup map so you can validate each item with the correct context.
  • Generate a batch query for the UPDATE or INSERT.

And that’s what the PL library is for.

How fast is it?

Thanks to the PL library, the keyword bulk-import process got 50 times faster.
Other processes became at least 3 times faster.

But why is the code neater now?

We already embraced the 3-tier architecture even before PL was introduced. The diagram above is an example of some classes implementing our keyword entity. It’s pretty neat already, isn’t it? Well, not exactly… This diagram is somehow deceiving because it doesn’t show you that the DAL and Service Layer are very thin while more than 90% of the code is within the Business layer. What’s going on inside the Business layer? A wild wild west…
PL deals with the Business layer. It provides interfaces for the business rules which are plugged together into a thin class called KeywordPersistence replacing the fat KeywordService we had before, thus observing the Open/closed principle.

Below is a schematic 3-tier diagram of the classes involved in our keyword entity today. The green classes/interfaces are from the PL library, while the gray classes are specific keyword rules that we plugged in.

Is it another ORM like Hibernate?

As the adoption of PL increased in Kenshoo, more and more features were added to support every scenario we encountered, such as supporting one-to-many relations (entities having child entities, recursively).

And yet, the answer to the above question is “no”, because:

  • It is based on a command pattern (see example).
  • There is no POJO to represent a record. Instead, PL is declarative — its classes represent the structure (not the data) of the records, and allow users to define what should be updated instead of mutating the state of a mutable POJO.
  • There is no session. Instead, PL executes each bulk within a new transaction.
  • There are no “eager” or “lazy” fetching strategies.
    Instead of a predefined fetching strategy, the flow, which is the set of business rules, precalculates an execution plan to fetch exactly the fields required by the flow.
  • A major part of PL deals with business rules, which are not an aspect of ORM.

Try it online

Here is a small demo that could be run using a web browser.

Got questions?

Leave a comment, and I’d be more than happy to answer.

--

--