Co-op Source
Sep 30, 2016


You should take a look at the Rete algorithm. Imagine if your database could know, in advance, the structure of all the queries that will ever be asked of it (not the complete queries with parameter values, just the structure of the scans, joins, etc.). It could build a custom index structure (a graph) optimized for exactly those queries and no others. This is roughly what Rete does. The rules/queries you give it build up a graph of alpha and beta nodes (filter and join nodes) that is very efficient at incremental updates and at determining which “materialized views” need to be updated.
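Here is a rough sketch of the alpha/beta node idea in Python, assuming facts are plain dicts; the class and method names are illustrative, not taken from any particular Rete implementation:

```python
class AlphaNode:
    """Filter node: tests each incoming fact against a single-fact predicate."""
    def __init__(self, predicate):
        self.predicate = predicate
        self.memory = []      # facts that have passed the filter
        self.successors = []  # (beta_node, side) pairs fed by this node

    def assert_fact(self, fact):
        if self.predicate(fact):
            self.memory.append(fact)
            for beta, side in self.successors:
                beta.activate(fact, side)


class BetaNode:
    """Join node: pairs facts arriving on its left and right inputs."""
    def __init__(self, join_test, on_match):
        self.memories = {"left": [], "right": []}
        self.join_test = join_test  # test over a (left, right) pair
        self.on_match = on_match    # the update to the "materialized view"

    def activate(self, fact, side):
        self.memories[side].append(fact)
        other_side = "right" if side == "left" else "left"
        # Incremental update: join only the *new* fact against stored facts,
        # instead of recomputing the whole join from scratch.
        for other in self.memories[other_side]:
            left, right = (fact, other) if side == "left" else (other, fact)
            if self.join_test(left, right):
                self.on_match(left, right)
```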

The Rete graph uses structural sharing, similar to how immutable data structures are implemented, to avoid unnecessary recomputation during updates. The firing of rules is analogous to the incremental event sourcing in your description: think of the LHS of a rule as a query over the data, and the RHS as the event that “fires” whenever new data matches the query.
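Continuing the sketch above, a hypothetical rule might look like this; the LHS is the filter-and-join wiring, the RHS is the `on_match` callback (the rule itself, and names like `premium_customers`, are invented for illustration):

```python
# Hypothetical rule: "when an order arrives for a premium customer, fire."
premium_customers = AlphaNode(
    lambda f: f.get("type") == "customer" and f.get("premium"))
orders = AlphaNode(lambda f: f.get("type") == "order")

rule = BetaNode(
    join_test=lambda c, o: c["id"] == o["customer_id"],          # LHS join
    on_match=lambda c, o: print(f"RHS fired: order {o['id']} for {c['name']}"),
)
premium_customers.successors.append((rule, "left"))
orders.successors.append((rule, "right"))

# Each assertion updates only the joins it can affect, like an event
# incrementally updating a materialized view.
premium_customers.assert_fact(
    {"type": "customer", "id": 1, "name": "Ada", "premium": True})
orders.assert_fact({"type": "order", "id": 99, "customer_id": 1})
# -> RHS fired: order 99 for Ada
```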

I won’t go into the details, but I think you’ll see a very similar pattern to what you are describing. There was some database theory work (at UT Austin, I think) where Rete and similar algorithms were paired with, or used as, database query engines. Given that mainstream database design is starting to treat main-memory-only databases as a real possibility, Rete (or a similar algorithm) is a good candidate for replacing the SQL query engines so common in secondary-storage scenarios. At some point we will start co-locating the data with the queries/indexes. Maybe…

Also, take a look at Datomic. It has an interesting architecture that moves query processing out to the “peers” to offload read-side work from the transactor and indexer. Its schema attributes encourage denormalized schema design, and it allows for ACID transactions; the main tradeoff is write throughput.
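To make the peer/transactor split concrete, here is a conceptual sketch (this is not Datomic’s actual API, just the shape of the idea): a single writer serializes transactions into an immutable log, and peers pull the log and answer queries from a local index, so reads never touch the writer.

```python
import itertools


class Transactor:
    """Single writer: assigns transaction ids, appends to an immutable log."""
    def __init__(self):
        self.log = []
        self._tx_ids = itertools.count()

    def transact(self, datoms):
        tx = (next(self._tx_ids), tuple(datoms))
        self.log.append(tx)
        return tx


class Peer:
    """Reader: builds a local index from the log and queries it locally."""
    def __init__(self, transactor):
        self.transactor = transactor
        self.index = {}  # attribute -> list of (entity, value)
        self.cursor = 0

    def sync(self):
        # Pull only the log entries we haven't seen yet.
        for _, datoms in self.transactor.log[self.cursor:]:
            for entity, attribute, value in datoms:
                self.index.setdefault(attribute, []).append((entity, value))
        self.cursor = len(self.transactor.log)

    def query(self, attribute):
        return self.index.get(attribute, [])


t = Transactor()
p = Peer(t)
t.transact([(1, ":user/name", "Ada")])
p.sync()
print(p.query(":user/name"))  # [(1, 'Ada')]
```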

Thanks for the post. I’m very much interested in finding ways of keeping the business logic sane and the data store fast.
