The Power of Aerospike Expressions - Version 5.2+

What is New?

Since I wrote a blog post a couple of months ago covering The Power of Aerospike Predicates, Predicate Filters have moved to an EOL status with legacy support to be continued for the foreseeable future. They have been replaced with Aerospike Expressions. This post is an update for my previous blog post, replacing the Predicate syntax with the Expression syntax.

Why use Expressions?

Aerospike is a high throughput/low latency distributed NoSQL database. Records may be retrieved on the basis of a primary key, secondary index query, or full scan using the most basic API functionality. As Aerospike only supports retrieval of records via a primary key, a secondary index, a record UDF function, or a full set (table) or namespace scan, what if you require greater selectivity than one of those basic retrieval methods? Obviously, the selected records may be filtered further by executing code within your client application. But via the utilization of Aerospike Expressions, which are an adjunct to the basic APIs, you can avoid unnecessary network traffic and take advantage of the available CPUs in the cluster of servers in the Aerospike database.

Reducing network traffic is particularly important if your client applications aren’t physically colocated with your Aerospike DB. When the clients are in a different Amazon AWS Availability Zone than the Aerospike DB servers, then they are implicitly in separate physical locations. The cost for data traffic between Availability Zones can be significant. As of the date of this blog, the AWS on-demand cost for data traffic between Availability Zones in the same Region is $.01/GB.

The Aerospike feature that provides the additional selectivity for database API calls is referred to as Expressions. Filtering of results via Expressions is performed in parallel across all the servers on the cluster, providing the desired selectivity, while minimizing network traffic and leaving less work for the individual client applications to perform. And, of course, this leads to lower latencies and higher throughput for the applications than the alternative of filtering within the client applications.

Expressions use record metadata and record bin values. They are may be used in conjunction to secondary index queries, scans, batch, read, write, delete functions as well as record UDF functions. They can include the following data values in their expressions:

  • Record storage size
  • Record last update time
  • Record expiration time
  • Record digest modulo
  • Integer, string and GeoJSON bin values
  • List and map element values

The available operators include:

  • Logical AND, OR and NOT expressions
  • Integer <, <=, ==, !=, >=, and >
  • String == and !=
  • GeoJSON “within” and “contains” comparisons
  • String regular expressions
Photo by Avery Evans on Unsplash

Credit Card Transaction Example

As an example of the utilization of Expressions, let’s assume that your Aerospike database contains credit card transactions. All transactions for an individual within a given month are stored with a Complex Data Type (Map or List) within a record keyed formed using the concatenation of the unique ID for the user along with the month.

When a transaction is initiated using the credit card for a large amount, say $1500, you want to view all transactions that have been made on that same credit card over $300 within the past 12 months. This is to determine whether the owner of the card regularly initiates large transactions or instead, this is an anomaly that needs to be flagged for special attention/validation for potential fraud/theft of the credit card.

Assuming that your database contains one record per user per month, there are 12 potential records to review. But, rather than the client application program retrieving all 12 records from the Aerospike DB and then scan through all 12 records, we take advantage of the distributed Aerospike DB servers to perform the desired filtering. A single batch read API, augmented with the Expression is executed, only returning the records for the months with the desired records (transactions greater than $300).

In this example, the primary key for the records is a concatenation of the credit card number and the month separated by a vertical bar (“|”). So for credit card #869387936, the primary key for the January record would be “869387936|Jan”. The bin within the records is a List with the amounts of all the credit card transactions against that credit card for the month of January stored as integers in units of cents. It is important that you only store the values exclusively as integers. We will only be looking at the records for 5 months. The records in the demonstration DB look like the following:

+-----------------+-----------------------------------+
| PK | xacts |
+-----------------+-----------------------------------+
| "869387936|Jan" | LIST('[1995, 595, 52500]') |
| "869387936|Feb" | LIST('[2150, 1995, 25578, 9829]') |
| "869387936|Mar" | LIST('[1750]') |
| "869387936|Apr" | LIST('[1995, 2578, 3650]') |
| "869387936|May" | LIST('[26500, 1995, 575]') | |+-----------------+-----------------------------------+

The Java logic fragment required to retrieve the desired records is as follows :

The output is as follows:

[1995, 595, 52500]
No record for month
No record for month
No record for month
No record for month

A Simple Variation

Let’s try something different; retrieval of all records that include at least one value of between $250 and $500 within the transaction List. The syntax for that may be defined as:

The output when using that expression is:

No record for month
[2150, 1995, 25578, 9829]
No record for month
No record for month
[26500, 1995, 575]

Documentation

The documentation regarding Aerospike Expressions can be found at:
https://www.aerospike.com/docs/guide/expressions/

--

--

--

Curated articles about Aerospike covering application development, data modeling, solutions and operations.

Recommended from Medium

Hungarian Software Testing Forum #HUSTEF2017

And there you have it. Golang’s feature set is a result of a

Passwordless Sign-In With Google’s 1 Tap for Web

Roblox Vehicle Simulator Codes (February 2021) — New Vehicle Update!

Kubernetes hands on series: Labels and Selectors

Systemd Systemctl a Poem

A Short Introduction to SQL

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Reuven Kaswin

Reuven Kaswin

More from Medium

Batch Operations in Aerospike

Myntra Hacker-Ramp Journey

Redis vs Memcached 比較

Failure Modes for distributed applications

FMECA