Sorted Results from a Secondary Index Query — Part II

Piyush Gupta
Aerospike Developer Blog
4 min readOct 2, 2019

Aerospike, a distributed NoSQL database, stores records distributed evenly across all nodes of the cluster using a hash value derived from the user’s key. While this scheme gives awesome performance at scale for record reads and writes, when running secondary index queries on a set of records in a namespace, it presents the challenge of getting sorted results.

In Part I, we explored using Map data type to sort results from a Secondary Index query in Aerospike.

In this blog, Part II, we continue the same example using the List instead of the Map data type for sorting. Using the ordered List data type, we can append the results as they are retrieved from the secondary index query run on records in a persistent namespace (data on disk) into a scratch record in a pure in-memory namespace. The appended results are automatically inserted in the sort order in the scratch record based on the first item in the list. We can then retrieve the sorted results, paginating as necessary using the List Operation APIs for consumption by the client.

Maps can be used if the sort key value is unique for each record and you are sure that the value associated with the key will not be overwritten. A list of the list value type, as shown in this example, allows you to sort based on the first value of list element, as long as each list value has the same number of entries. You can opt to keep or remove duplicate values.

Example: We have data on disk on people — name, house#, street name, city, county — millions of records. We want to search for all the people on a given street by doing a partial match on street name. However, we want the result set sorted and paginated by house number. Multiple people may live in the same house, so multiple record entries may be present for the same house number.

  • Create the Secondary Index (SI) query or scan the entire data set and filter the results to a desired predicate filter:

Example data:

Starting Data Set Sample
Statement stmt = new Statement();
stmt.setSetName(sn); // “people”
stmt.setNamespace(ns); // “test”, a disk storage namespace
//If SI filtering is needed ...
stmt.setIndexName(“county_idx”);
stmt.setFilter(Filter.equal(“county”,”Santa Clara”));
stmt.setPredExp(
PredExp.stringBin(“street”),
//any prefix- any suffix partial string match,
//ignore case or newline
PredExp.stringValue(“.*”+street+”.*”),
PredExp.stringRegex(RegexFlag.ICASE | RegexFlag.NEWLINE)
);
  • Consume the results in a call back, one record at a time and store back in an in-memory namespace for sorting and paginating:

SI on county query, records returned where street = “mary”:

Qualifying records after filtering on SI and Predicate Filter
RecordSet recordSet = client.query(queryPolicy, stmt);

...
...
while (recordSet.next()) {
record = recordSet.getRecord();
vl.add(record.getString(“housenum”));
vl.add(record.getString(“street”));
vl.add(record.getString(“city”));
vl.add(record.getString(“county”));
vl.add(record.getString(“name”));
tempRec = client.operate(null, tempKey,
ListOperation.append(lPolicy, “tempBin”,Value.get(vl)));
vl.clear();
}
  • Read back a sub-set (paginate) or all results from the in-memory namespace, sorted by desired sort key:

Sorted results stored as a list of lists in in-memory namespace:

for(int i=0; i<size; i++) {
List<?> lItem = (List<?>)(ll.get(i));
str = “”;
for(int j=0; j<5; j++) {
if(lItem.get(j)!=null) {str+=lItem.get(j).toString()+” “;}
}
lPeople.add(str);
}

(Total number of records is equal to size. To paginate, say in sets of 10, loop over i in sets of 10, till size is exhausted.)

With fail on cluster change and Strong Consistency mode guarantees, this can be enhanced to account with code exceptions for timeouts and consistent performance under cluster network events.

Putting it together… sample code for the method is shown below:

public ArrayList<String> getPeopleByStreetUsingLists(String ns, String sn, String street, String queryId) { ArrayList<String> lPeople = new ArrayList<String>();
if(street.isEmpty()) return lPeople;
if(queryId.isEmpty()) return lPeople;
Record record = null;
Statement stmt = new Statement();
stmt.setSetName(sn); // “people”
stmt.setNamespace(ns); // “test”, a disk storage namespace
//If SI filtering is needed ...
stmt.setIndexName(“county_idx”); //SI pre-built
stmt.setFilter(Filter.equal(“county”,”Santa Clara”));
stmt.setPredExp(
PredExp.stringBin(“street”),
//any prefix- any suffix partial string match,
//ignore case or newline
PredExp.stringValue(“.*”+street+”.*”),
PredExp.stringRegex(RegexFlag.ICASE | RegexFlag.NEWLINE)
);
QueryPolicy queryPolicy = null;
RecordSet recordSet = client.query(queryPolicy, stmt);
Key tempKey = new Key(“scratch”, “temp”, queryId);
ListPolicy lPolicy = new ListPolicy(ListOrder.ORDERED,
ListWriteFlags.DEFAULT);
//We will sort people data values as list with
//first item in the list being housenum.
//Must use append() with ORDERED, needed to retrieve sorted values.
int size = 0;
Record tempRec = null;
List<String> vl = new ArrayList<String>();
client.delete(null, tempKey); while (recordSet.next()) {
record = recordSet.getRecord();
vl.add(record.getString(“housenum”));
vl.add(record.getString(“street”));
vl.add(record.getString(“city”));
vl.add(record.getString(“county”));
vl.add(record.getString(“name”));
tempRec = client.operate(null, tempKey,
ListOperation.append(lPolicy, “tempBin”,Value.get(vl)));
vl.clear();
}
if(tempRec!=null) {
size = tempRec.getInt(“tempBin”);
//Use size to paginate if desired. If expected size
//is within 128KiB boundary, entire record can be retrieved.
//Single server read operation fetches sorted data.
tempRec = client.operate(null, tempKey,
ListOperation.getByIndexRange(“tempBin”,0,size,
ListReturnType.VALUE));
List<?> ll = tempRec.getList(“tempBin”);
String str;
for(int i=0; i<size; i++) {
List<?> lItem = (List<?>)(ll.get(i));
str = “”;
for(int j=0; j<5; j++) {
if(lItem.get(j)!=null) {str+=lItem.get(j).toString()+” “;}
}
lPeople.add(str);
}
if(lPeople.size()==0) {
lPeople.add(“No match found.”);
}
return lPeople;
}

--

--