Firestore/Datastore: unlock the query filter capabilities in Go

guillaume blaquiere
Google Cloud - Community
5 min readMay 26, 2020

When you have document-oriented data to store, a document database is the natural choice. Google Cloud Platform offers two document database engines: Datastore and Firestore. The two products are very similar but have different histories: Datastore comes from App Engine, while Firestore comes from the Firebase suite (part of its documentation still lives under the Firebase domain).

Datastore/Firestore overview

Both products are serverless and scale automatically with traffic. You reach them through an API, and you can create documents with transaction and batch capabilities.

One powerful feature is the capability to trigger events on the creation, update, or deletion of a document. You can plug a Cloud Function into these events to perform additional processing, for example to sanitize blog post messages.
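As an illustration of such trigger-driven processing, here is a minimal sketch of a sanitizing function. The event shape, the `message` field, and the tag-stripping logic are all assumptions for the example, not part of the article or of the real Firestore event payload:

```go
package main

import (
	"context"
	"fmt"
	"regexp"
)

// FirestoreEvent is a minimal, assumed shape of the payload delivered by a
// Firestore trigger; real deployments use the full event structure from the
// Google Cloud documentation.
type FirestoreEvent struct {
	Fields map[string]string
}

var tagPattern = regexp.MustCompile(`<[^>]*>`)

// sanitize strips HTML tags from a message. This is a deliberately simple
// placeholder for real sanitization logic.
func sanitize(s string) string {
	return tagPattern.ReplaceAllString(s, "")
}

// SanitizePost is a possible Cloud Function entry point: it reacts to a
// document write and cleans the hypothetical "message" field.
func SanitizePost(ctx context.Context, e FirestoreEvent) error {
	clean := sanitize(e.Fields["message"])
	fmt.Println("sanitized:", clean)
	// A real function would write the cleaned value back to Firestore here.
	return nil
}

func main() {
	e := FirestoreEvent{Fields: map[string]string{"message": "<b>hello</b> world"}}
	_ = SanitizePost(context.Background(), e) // prints "sanitized: hello world"
}
```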

Another similarity: you are charged on the number of operations performed (read/write/delete) and on the volume of data stored (index size included).

Query limitations

The two products are very similar in their strengths and pricing, but also in their weaknesses. The main one is their query limitations.

  • You can’t exclude a value with !=.
    You have to create 2 range conditions (> and <) to achieve the same result
  • If you use > and/or < range conditions, they must all be on the same field. It’s not possible to apply > or < conditions to several different fields
  • The equivalent of the IN clause is limited to 10 elements maximum
  • When you filter on a range on one field AND on equality on another field, you have to create a composite index beforehand
  • It’s impossible to filter on nested fields
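For instance, the 10-element IN limit is typically worked around by splitting the value list into chunks of 10 and issuing one query per chunk, then merging the results client side. A minimal chunking helper (the Firestore query itself is only sketched in a comment; the collection and field names are invented for the example):

```go
package main

import "fmt"

// chunk splits values into slices of at most size elements, so that each
// slice can be passed to one Firestore "in" query (limited to 10 values).
func chunk(values []string, size int) [][]string {
	var out [][]string
	for size < len(values) {
		out = append(out, values[:size])
		values = values[size:]
	}
	return append(out, values)
}

func main() {
	ids := make([]string, 23)
	for i := range ids {
		ids[i] = fmt.Sprintf("id-%d", i)
	}
	batches := chunk(ids, 10)
	fmt.Println(len(batches)) // 3 batches: 10 + 10 + 3 elements
	// Each batch would then be used in its own query, for example:
	//   client.Collection("posts").Where("id", "in", batch).Documents(ctx)
	// and the per-batch results merged client side.
}
```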

Post processing solution

In my Go project, I ran into these limitations and struggled to meet the product owner’s expectations:

The consumer must be able to filter on any field, with any condition

That’s why I developed the JsonFilter library, which post-processes Datastore/Firestore results (and can, in fact, be applied to any array of structs).

The standard process is the following:

  • Get the filter string and “compile” it against the target struct of your query
  • Query the data in Firestore/Datastore and map the result into the struct array (not performed by the library)
  • Apply the filter to eliminate the array entries that don’t match the filter.

Let’s go deeper into the usage of the library, its filter capabilities, and how to customize it.

Filter compilation

The filter string is submitted to the library for compilation against the struct type of the array of structs (typically the Firestore/Datastore query result).

type structExample struct {...}

filter := jsonFilter.Filter{}

if filterValue != "" {
	err := filter.Init(filterValue, structExample{})
	if err != nil {
		//TODO error handling
	}
}

The compilation checks, by reflection, whether the fields contained in the filter definition exist in the struct type passed as parameter (here the structExample type), and whether the types are compliant; for example, you can’t apply a range filter to a string field. The possible errors are described in the documentation.

Complex and nested types are taken into account (pointers, references to other types, maps/arrays/matrices of structs).

Customizable filter format

For my needs, I defined a format for the filter. This format fits well in a string provided through HTTP GET URL parameters (my use case):

key1=val1,val2:key2.subkey=val3

Where:

  • key1 is the JSON field name to filter on. You can use a composed key to browse your JSON tree, like key2.subkey
  • = is the operator; !=, > and < are also available
  • val1, val2, val3 are the values to compare
  • The values are separated by a comma ,
  • The different fields are separated by a colon :
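Parsing this format boils down to a few successive splits. Here is a naive parser sketch, not the library’s implementation: only the = operator is handled, and error handling and nested-key resolution are omitted:

```go
package main

import (
	"fmt"
	"strings"
)

// parse splits a filter string like "key1=val1,val2:key2.subkey=val3"
// into a map from key to accepted values. Only the = operator is handled
// in this sketch; the real format also supports !=, > and <.
func parse(filter string) map[string][]string {
	out := map[string][]string{}
	for _, part := range strings.Split(filter, ":") {
		kv := strings.SplitN(part, "=", 2)
		if len(kv) != 2 {
			continue // malformed part, silently skipped in this sketch
		}
		out[kv[0]] = strings.Split(kv[1], ",")
	}
	return out
}

func main() {
	f := parse("key1=val1,val2:key2.subkey=val3")
	fmt.Println(f["key1"])        // [val1 val2]
	fmt.Println(f["key2.subkey"]) // [val3]
}
```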

However, this opinionated format may not match all use cases. You may also want to limit the scan depth of your nested fields. For this, you can customize the filter options:

filter := jsonFilter.Filter{}

o := &jsonFilter.Options{
	MaxDepth:                     4,
	EqualKeyValueSeparator:       "=",
	GreaterThanKeyValueSeparator: ">",
	LowerThanKeyValueSeparator:   "<",
	NotEqualKeyValueSeparator:    "!=",
	ValueSeparator:               ",",
	KeysSeparator:                ":",
	ComposedKeySeparator:         ".",
}

filter.SetOptions(o)

Apply the filter

You can apply the filter to an array of structs (of the same struct type as provided in the compilation/init step). Typically, the Firestore/Datastore query result has to be mapped into an array of structs before applying the filter.

The operation scans all the elements of the array and evicts the ones that don’t match.

results := getDummyExamples()

if filterValue != "" {
	ret, err := filter.ApplyFilter(results)
	if err != nil {
		//TODO error handling
	}
	results = ret.([]structExample)
}

That’s all. All the unwanted values are removed from the result, following these rules:

  • Each element of the filter must match for the entry to be kept
  • The equality =, comparable to the SQL IN clause: at least one value must match.
  • The inequality !=, comparable to the SQL NOT IN clause: no value may match.
  • The greater than > and the lower than <: only applicable to numeric fields.
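These rules can be expressed as small matching functions. This is a sketch over flat string/number values with invented names; the library itself applies the same logic by reflection over arbitrary structs:

```go
package main

import "fmt"

// matchEqual implements the = rule: at least one value must match (IN).
func matchEqual(field string, values []string) bool {
	for _, v := range values {
		if field == v {
			return true
		}
	}
	return false
}

// matchNotEqual implements the != rule: no value may match (NOT IN).
func matchNotEqual(field string, values []string) bool {
	return !matchEqual(field, values)
}

// matchGreater implements the > rule, only defined on numeric fields.
func matchGreater(field, bound float64) bool {
	return field > bound
}

func main() {
	fmt.Println(matchEqual("go", []string{"go", "rust"}))    // true
	fmt.Println(matchNotEqual("go", []string{"go", "rust"})) // false
	fmt.Println(matchGreater(42, 10))                        // true
}
```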

Issue tackled and limits

The goal of the library is to offer more advanced filtering than Firestore/Datastore provides by default. It unlocks the native limitations:

  • Filter on any type and nested type (map, array, map of array/object, array of map/object,…)
  • Use several IN filters on the data set
  • Use more than 10 elements in an IN condition
  • Use several NOT IN filters on the data set
  • Compare several ranges with the > and < operators
  • No composite index creation required

With that, all the native limitations are lifted!

Performance concern

Because the library scans all the Firestore/Datastore results, performance could be a concern. In practice, it shouldn’t be.

Indeed, the filter should only be applied to a small array of structs, where the filtering overhead is very small. If you read thousands of documents only to discard most of them, you will pay a lot for nothing!

In addition, your API response time will increase because of the high number of documents to retrieve and the filtering duration.

The library is designed to offer filtering convenience to the consumer; it must not be used as a core database feature.

Any wish or ideas?

I have ideas for updates to the library that I haven’t implemented because I didn’t need them. For example:

  • There is no wildcard capability, like *, to replace any JSON field name, or part of it.
  • There is no wildcard, like *, or regex capability on filter values.

Depending on my needs, I will implement these evolutions or others.

I will be happy to discuss and improve this library if you have special requirements. Don’t hesitate to open an issue on the GitHub project and to contribute!


GDE cloud platform, Group Data Architect @Carrefour, speaker, writer and polyglot developer, Google Cloud platform 3x certified, serverless addict and Go fan.