Extending Datomic Pull Queries

Published in

AdStage Engineering

5 min readOct 19, 2018

Datomic is a hosted database, made by the creators of Clojure. It provides a pleasant and powerful interface for developers and extends the principles of Clojure into the database layer. Read the introduction to Datomic for more information.

We use Datomic as the database of several of our products at AdStage. One of those products, Report, is a data visualization dashboarding tool for online advertisers which we built with Datomic, Fulcro, Clojure, and ClojureScript. Unfortunately, Datomic doesn’t support blobs (binary large objects) and ordered collections which are both needed by Report.

This post details the custom solution we built to work around Datomic’s lack of blob store and ordered collections for our Report product.

Building the Blob Store

When users make a new data visualization in Report, it requests the metrics and metadata of the relevant ads from our Data API. Report calls a single data visualization a widget and the data populating it a data-stream. Widgets collate data from our many sources of advertising metrics and metadata, and as a result each data-stream contains very specific and unique data. Here is one of our rendered charts next to the data-stream containing its data.

;; CTR by Month data-stream
{:data-stream/series
 {:plots
  {:0-metrics-ctr
   {:plot-name    "Facebook",
    :tooltip-name "Facebook CTR",
    :data-points  [1.59458465538327 2.00280548628429 1.33447208284403],
    :legend-name  "Facebook CTR",
    :network      "facebook",
    :metric       :ctr},
   :1-metrics-ctr
   {:plot-name    "AdWords",
    :tooltip-name "AdWords CTR",
    :data-points
    [0.556948274421325 0.487455307016367 0.398452503720537],
    :legend-name  "AdWords CTR",
    :network      "adwords",
    :metric       :ctr}},
  :meta {:currency-code "USD", :currency-symbol "$"},
  :time-range
  ["2018–02–01T00:00:00Z"
   "2018–03–01T00:00:00Z"
   "2018–04–01T00:00:00Z"]},
 :data-stream/start-date "2018–02–01T00:00:00Z",
 :data-stream/end-date   "2018–04–30T23:59:59Z"}

Data-streams can contain a lot of data. We don’t need to do advanced queries on them and we never mutate them, so it was a natural fit to store them as blobs. Datomic, currently, does not support blobs.

We chose Postgres for our blob store since it has native blob support and our Datomic instance is hosted on top of it. We created a table in Postgres with two columns, the id and blob data, and serialize/deserialize them from Report with the library nippy. Below is how we store our data-streams:

Datomic’s FAQ explicitly recommends against storing blobs, so we felt confident in our decision to use a different data store. Using a different data store, however, means we lose out on Datomic’s very expressive and concise query languages!

Pluck over Pull

Datomic Pull queries are our primary method of querying data used to render the UI. They accept a tree like data structure of attribute names and return data in the same shape. Here is a sample query for the data that our widgets need to render:

(require '[datomic.api :as d])
(d/pull db
        [:db/id
         :widget/title
         :widget/data-stream]
        entity-id);; => {:db/id              17592186088168
;;     :widget/title       "CTR by Month"
;;     :widget/data-stream 17592186088371}

When queried, the attribute :widget/data-stream returns a reference to the data-stream in the blob store but not the blob of the data-stream itself. We needed a simple mechanism to query for the data in Datomic as well as the blob store. From this need came Pluck, a layer that sits on top of Datomic Pull query.

Pluck consists of two parts: a depth first tree traversal of the Pull request and a multimethod that is extendable by adding callback functions for each attribute you want to override or add. When pluck reaches an individual attribute (a leaf node) of the query, it dispatches to the specific pluck implementation. Each pluck method implementation is passed the attribute name, database, and the Pull result.

To integrate our blob store with the rest of our data in Datomic, we implemented the :widget/data-stream pluck method:

(defmethod -pluck :widget/data-stream
  [k {:keys [db blob-store] :as env} pull-result]
  (let [data-stream (:widget/data-stream pull-result)
        blob-id     (:data-stream/id (d/entity db data-stream))]
    ;; fetch the data-stream from the blob store
    (data-stream/find-by-id blob-store blob-id)))

When this method is called, we grab the data-stream id from Datomic and use data-stream/find-by-id to extract and deserialize the blob data from Postgres. The deserialized blob is returned with the rest of the widget data joined to the :widget/data-stream attribute. With the Pluck layer included, querying the widget data looks like this:

Pluck is an effective way to extend Datomic Pull queries in a consistent manner. It also allows us to work around many of Datomic’s other limitations, including a lack of ordered collections.

Let’s Plucking Order those Vectors

Datomic lacks native ordering, so applications have to create an order attribute and do their own sorting. Our users create dashboards to organize their widgets. By default, a query for our dashboard’s widgets returns an unordered vector:

(let [conn    (d/connect (e/env :db-url))
      db      (d/db conn)
      dash-id 17592186072163]
  (d/pull db
          '[:db/id
            {:dashboard/widgets [:widget/order]}]
          dash-id));; => {:db/id 17592186072163,
;;     :dashboard/widgets
;;     [{:db/id 17592186072165, :widget/order 2}
;;      {:db/id 17592186072169, :widget/order 0}
;;      {:db/id 17592186072173, :widget/order 1}
;;      …]}

This fails to solve our user’s needs as most users don’t want their widgets in a seemingly nondeterministic order! We extended our original :dashboard/widgets query with Pluck to order them.

(defmethod -pluck :dashboard/widgets [k {:keys [db]} pull-result]
  (sort-by :widget/order (:dashboard/widgets pull-result)))(let [conn (d/connect (e/env :db-url))
      db   (d/db conn)]
  (p/pluck {:db db}
           '[:db/id
             {:dashboard/widgets [:widget/order]}]
           dash-id));; => {:db/id 17592186072163,
;;     :dashboard/widgets
;;     [{:db/id 17592186072169, :widget/order 0}
;;      {:db/id 17592186072173, :widget/order 1}
;;      {:db/id 17592186072165, :widget/order 2}
;;      …]}

With this Pluck method, our dashboard queries return an ordered list. By switching from Datomic Pull to our Pluck API, we’ve been able to keep Datomic’s powerful query syntax while splicing in new data types and adding properties to our data that Datomic doesn’t natively support.

Conclusion

Choosing Datomic has required investing resources to build custom solutions, but the custom solutions were straightforward to build, integrate, and maintain. Furthermore, since Datomic queries take reified data structures as input, we have been able to leverage Clojure effectively to easily extend them. This has enabled Report to be even more powerful than it would have been if we had just used Datomic as-is, and has helped us deliver value to our customers.

If you are interested in Datomic, Clojure/ClojureScript, Fulcro, or any combination thereof check out AdStage, we’re hiring!

Extending Datomic Pull Queries

Building the Blob Store

Pluck over Pull

Let’s Plucking Order those Vectors

Conclusion

Written by Spencer Apple