Start querying your Azure Service Fabric Reliable Collections today
Reliable Collections are a powerful option for storing data in Service Fabric applications. But while their persistence and replication make them highly available, there is no good way for admins, ops, and external services to look at data in IReliableCollections without writing another stateless service to expose a custom REST API.
This blog post is about what you can do today to make verifying and querying the data in your Reliable Dictionaries easy, without writing custom code or a new service. Beyond carrying out CRUD operations against them, we are going to be able to send our Reliable Dictionaries SQL-like WHERE clauses, and even use indexing to speed up those queries, all from the command line or Python.
With that, here are some quick notes before we start:
- The queryable middleware currently only works on ASP.NET stateful services, so you need to be building on ASP.NET.
- The queryable middleware uses Reverse Proxy to receive requests from the client. You will have to enable Reverse Proxy on your cluster, but you will not have to expose a public endpoint.
- You can follow along using the sample application BasicApp.
Step 1: Making a queryable stateful service
The basis of this experience comes from the querying middleware and indexed IReliableDictionary written by Jesse Benson. By adding in the middleware, you are creating endpoints on Reverse Proxy that take OData queries, which are received by an OData controller on the service. That controller then parses your query, scans the dictionary, and returns your result, including forwarding your query to other partitions.
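To make the endpoint idea concrete, here is a minimal Python sketch that builds the kind of Reverse Proxy URL an OData query would be sent to. It assumes Reverse Proxy is listening on localhost at port 19081 and that the middleware exposes collections under a `$query` path segment. That route, the service name ProductSvc, and the collection name products are assumptions made for illustration, so check them against your own service before relying on them.

```python
from urllib.parse import quote, urlencode


def build_query_url(app, service, collection, odata_filter=None, top=None,
                    host="localhost", port=19081):
    """Build a Reverse Proxy URL for an OData query against one collection.

    The "/$query/<collection>" route is an assumption for this sketch,
    not something confirmed by this post.
    """
    service_path = "/".join(quote(part, safe="") for part in (app, service))
    url = f"http://{host}:{port}/{service_path}/$query/{quote(collection, safe='')}"

    params = {}
    if odata_filter is not None:
        params["$filter"] = odata_filter
    if top is not None:
        params["$top"] = str(top)
    if params:
        # Keep "$", "/" and "'" readable; encode spaces as %20 rather than "+".
        url += "?" + urlencode(params, safe="$/'", quote_via=quote)
    return url


# Hypothetical names: BasicApp is the sample app, the rest are made up.
url = build_query_url("BasicApp", "ProductSvc", "products",
                      odata_filter="Value/Age ge 20", top=10)
print(url)
```

The function only builds a string, so you can inspect the URL before sending it with whatever HTTP client you prefer.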
To start off, you can download the sample application here, or you can make a new Stateful ASP.NET service.
- Navigate to MyApplication > Add > New Service Fabric Service
- Create a new service: .NET Core 2.0 > Stateful ASP.NET Core > API
- Add the NuGet package: MyASPSvc > Manage NuGet Packages > Install ServiceFabric.Extensions.Services.Queryable
- In Startup.cs, add: using ServiceFabric.Extensions.Services.Queryable;
- In Startup.cs > Configure(), add: app.UseODataQueryable();
- In MySvc > RunAsync, create a dictionary or indexed dictionary to query against and try populating it with some entries.
Step 2: Interact with your service using the Reliable Collections CLI
I have built out a command line interface (rcctl) you can use to interact with your service. If you have Python and pip, installing it is easy:
pip install rcctl
To get started querying your reliable collection, read the rcctl walkthrough, which is very comprehensive. You can query your cluster and execute updates to values. You can use the -h flag to find out more about any command (e.g. rcctl dictionary query -h).
While rcctl is a great start, if you find yourself wanting more control, rcctl is based on a Python library called sfquery, which you can use directly or through an interface in a Python notebook.
Step 3: Add indexing to use LINQ in your application and to make your external queries fast
One of the great recent improvements to Queryable is the addition of indexing logic.
This lets you create IReliableIndexedDictionaries, which are IReliableDictionaries with indexes that map from a property of the value to the set of original keys whose values have that property value. For example, a dictionary
Dictionary <ProductSku, ProductPackage>
can have an index:
Dictionary <ProductPackage.Weight, ProductSku>
What that means is queries that include a “WHERE Weight = x” can be drastically faster.
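As a rough illustration of the idea (not the library's implementation), here is a small Python sketch using plain in-memory dictionaries. The sample SKUs, the property names, and the build_index helper are all made up for this example.

```python
from collections import defaultdict

# Primary dictionary: ProductSku -> ProductPackage (made-up sample data).
packages = {
    "sku-1": {"Weight": 5, "Name": "small box"},
    "sku-2": {"Weight": 12, "Name": "crate"},
    "sku-3": {"Weight": 5, "Name": "envelope"},
}


def build_index(dictionary, prop):
    """Map each value of `prop` to the set of primary keys that carry it."""
    index = defaultdict(set)
    for key, value in dictionary.items():
        index[value[prop]].add(key)
    return index


# Index: ProductPackage.Weight -> set of ProductSku
weight_index = build_index(packages, "Weight")

# "WHERE Weight = 5" becomes one index lookup plus one primary lookup
# per matching key, instead of a pass over every package.
matches = {sku: packages[sku] for sku in weight_index.get(5, set())}
print(sorted(matches))  # ['sku-1', 'sku-3']
```

The cost of the filtered query now depends on how many keys match rather than on the size of the whole dictionary, at the price of keeping the index up to date on every write.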
We take advantage of indexing both for external queries and for LINQ queries in your application against IReliableIndexedDictionaries, which you can run without having to send REST calls.
Consider the following query, where the dictionary has type <UserName, UserProfile> and Age and Email are properties of UserProfile:
The way the middleware works is that if qdict is a plain IReliableDictionary, it calls CreateEnumerableAsync() on that dictionary and, for each value, checks whether the clause is met. This means that every query requires a full iteration over the values of the dictionary.
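The full-scan behavior can be pictured with a short Python sketch. This is only an analogy over an in-memory dict, with made-up profiles and a hypothetical scan_filter helper; the real middleware enumerates the reliable dictionary asynchronously in C#.

```python
# UserName -> UserProfile (made-up sample data).
profiles = {
    "alice": {"Age": 19, "Email": "alice@example.com"},
    "bob": {"Age": 35, "Email": "bob@example.com"},
    "carol": {"Age": 20, "Email": "carol@example.com"},
}


def scan_filter(dictionary, predicate):
    """Check the predicate against every value and count how many were visited."""
    visited = 0
    results = {}
    for key, value in dictionary.items():
        visited += 1
        if predicate(value):
            results[key] = value
    return results, visited


results, visited = scan_filter(
    profiles,
    lambda p: p["Age"] >= 20 and p["Email"] == "carol@example.com",
)
print(sorted(results), visited)  # ['carol'] 3
```

Even though only one entry matches, all three values are visited, and that count grows with the dictionary no matter how selective the filter is.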
If we want an IReliableIndexedDictionary instead, we can make one using
- using ServiceFabric.Extensions.Data.Indexing.Persistent;
Now, if you carry out the same query against the indexed dictionary
- $filter=Value/Age ge 20 and Value/Email eq 'email@example.com' and Value/Age le 20
the middleware will scan your WHERE clause and find that it is filterable, because you are doing operations on Age and Email, both of which you have indexes on. Then it will:
- Do a lookup on Index<Age, UserName>, getting the set of keys with Age ≥ 20
- Do a lookup on Index<Email, UserName>, getting the set of keys with Email = 'email@example.com'
- Intersect the two sets, keeping only the keys that appear in both
- For each Key in the final set, do a lookup against the original <UserName, UserProfile> dictionary to get the value.
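The steps above can be sketched in Python as follows. This is a simplified in-memory analogy under stated assumptions, not the middleware's code: the sample profiles and the helpers build_index, keys_at_least, and indexed_query are hypothetical, and sorting the index keys on every call stands in for an index that is kept ordered.

```python
from bisect import bisect_left
from collections import defaultdict

# UserName -> UserProfile (made-up sample data).
profiles = {
    "alice": {"Age": 19, "Email": "alice@example.com"},
    "bob": {"Age": 35, "Email": "bob@example.com"},
    "carol": {"Age": 20, "Email": "carol@example.com"},
}


def build_index(dictionary, prop):
    """Map each value of `prop` to the set of primary keys that carry it."""
    index = defaultdict(set)
    for key, value in dictionary.items():
        index[value[prop]].add(key)
    return index


age_index = build_index(profiles, "Age")      # Index<Age, UserName>
email_index = build_index(profiles, "Email")  # Index<Email, UserName>


def keys_at_least(index, lower):
    """Range lookup: all primary keys whose indexed value is >= lower."""
    ordered = sorted(index)  # a persistent ordered index would avoid this sort
    keys = set()
    for indexed_value in ordered[bisect_left(ordered, lower):]:
        keys |= index[indexed_value]
    return keys


def indexed_query(min_age, email):
    by_age = keys_at_least(age_index, min_age)   # lookup on the Age index
    by_email = email_index.get(email, set())     # lookup on the Email index
    candidates = by_age & by_email               # intersect the key sets
    return {key: profiles[key] for key in candidates}  # fetch the values


print(indexed_query(20, "bob@example.com"))
```

Only the keys that survive the intersection are read from the primary dictionary, which is where the savings over a full scan come from when the filter is selective.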
What does that mean for speed? For queries whose values are randomly dispersed in a dictionary, here is what we observed:
To find out more about indexing, visit the repository here.
To learn how to use LINQ in your application, read the snippet here.
While indexing and querying support today is not multi-framework or quite built into the product, we already have the tools to start experimenting. We can build indexing logic straight into our application code and do fast lookups using LINQ. Externally, we can use rcctl or Python to query both regular and indexed dictionaries, making it easy to validate our data without having to write stateless services.
- Jesse Benson wrote the foundation on which this experience is based
- Shalabh Mohan Shrivastava helped me write the Queryable-Indexing integration and rcctl