EventSync: the event-driven management missing piece.
Event driven has been becoming very popular those past years especially with cloud platforms and their elasticity; led by serverless products.
Even if there are many different event management products on Google Cloud, there is one missing piece. But, before going to the pain point, I would like to recap from the basics.
Actually, events are pretty simple:
- Something occurs somewhere
- The event is produced with its context
- Services consume the event
The event production
On Google Cloud, services like Cloud Storage or Firestore can generate native events.
Many others cannot, but Eventarc has been designed and released to listen to the audit logs and generate events when the pattern matches.
The event consumption
Before reaching the consumer, the events are transported through a channel. The medium used on Google Cloud is Cloud PubSub.
Cloud PubSub can deliver messages, the events in that case, by HTTP push. Thereby, Cloud Run, Cloud Functions, App Engine, and, in fact, any other HTTP servers are able to receive and consume the events.
In case of high throughput, it’s recommended to pull the messages from PubSub. The developers have a better customization on the message flow control.
The fan-out pattern
One strength of PubSub is the fan-out, i.e. the fact to duplicate messages to several destinations.
In detail, the messages, or events, are published on a PubSub topic.
Then, on a topic, one or several subscriptions can be created.
The consumers get the messages from their subscription.
Every message published in a topic is duplicated in each subscription.
Like that, the fan-out pattern is achieved!
The missing fan-in pattern
However, the opposite pattern, i.e. the fact to wait several events to generate a new one after synchronization, is missing.
So, how to synchronise several events on Google Cloud?
The event orchestration option
One solution is to not take the events on-the-fly, but to orchestrate all the operations, i.e. to always have the control on each step and event in the runtime environment.
This solution implies to manage the fan-out AND the fan-in in the same pipeline. Like that, the pipeline knows exactly what it has been created and therefore what it has to wait.
You can find 2 great videos made by Martin Omander with Mete Atamel about “Choreography” and with Guillaume Laforge about “Orchestration”
Orchestration does not fit all use cases
As always, one-size-does-not-fit-all, and it’s the same with orchestration and that solution doesn’t always work.
I will take a real world example from my company.
The Carrefour group is composed of several countries.
Each country has their own local data platform.
The group also has its own data platform that aggregates the data from the different countries.
One of our use cases is to aggregate the products sold in our stores and to group them by suppliers to track which products have been sold, where, when, how many,…
The suppliers like this king of aggregates and are ready to buy them!
Because our stores do not close at the same time, especially because there are different time zones (between France and Brazil for example), we would like to perform the aggregate, only once, and when all the local data platforms have been updated with the daily data.
Here, the solution is to wait for an event from each local data platform, and, when all the events have been received, to trigger a new one to start the aggregation processing.
You can easily imagine that it is not possible to set up a global orchestrator that will run all the data ingestion in all countries, and then wait for completion!
Here is the missing piece; and EventSync fills the gap!
The EventSync Project
To solve the fan-in pattern, I usually always implement the same solution based on Cloud Run and Firestore.
- I log the events received on a dedicated endpoint in Cloud Firestore
- I check after each event storage, if there are all the expected events.
- If so, I publish a new PubSub message
- Else, do nothing
The EventSync project is only a generic, customizable and ready-to-use version of that implementation I repeated several times.
Event synchronization use case
The first purpose is to wait for different types of events and when all are received, an event sync message is generated.
The events must all be received in a defined time window (named
Get events over a period use case
Sometimes, you don’t want an automatic trigger, but a tool that aggregates the events over an
observationPeriod and delivers them on demand.
That’s the second use case. The
/trigger API generates an event sync message on demand, even if the trigger conditions are not met.
How does it work?
The application is designed to run on Cloud Run. The container is publicly available here:
You can also use the version if you want a specific one. Refer to the GitHub tags section
You must provide a JSON config in the
CONFIG environment variable to define the number of event sources that you expect (also called Endpoints) and other app options.
The endpoint’s configuration contains an
eventKey that will determine the URL endpoint to reach by each event source, typically
The security is ensured by the Cloud Run platform and the IAM service.
Be sure to have granted the correct permission to the Cloud Run runtime service account (firestore, firestore admin and PubSub publisher) and deploy the app.
More detail in the project documentation
To send event to the endpoints, you have to configure your event sources:
- With PubSub, configure the push subscription to the HTTP URL
- With Eventarc, provide the name and the path of the Cloud Run service
- With Cloud Scheduler, Cloud Task, Cloud Workflow and any other HTTP request producers, set the HTTP URL!
However, be sure to have granted the correct permission (roles/run.invoker) on the event source environment to be able to reach the EventSync Cloud Run service, and therefore, the endpoints.
As output, when an event sync message is generated, all the events in the
observationPeriod are included in the output message, grouped by endpoint.
The consumer app will be able to extract the useful value from that structure. An integration report in our case, to know the quality and freshness of the data source.
More details in the project documentation
EventSync is here now, for free!
You have now a solution for your fan-in use cases. It could help you in many situations.
By default, it costs nothing. The free tier of Cloud Run, Firestore and PubSub largely cover the cost of this app.
I already planned several updates and increments on the product (see bellow).
If it does not fit your expectations, don’t hesitate to open an issue on GitHub to discuss it for a future implementation.
Or even to contribute to that project!
The limitations are documented here
Advanced config and miscellaneous
Here a list of current and future feature of the service:
- Automatic trigger deactivation (Included): in that case, only the
/triggerAPI can deliver events to the target
- keepEventAfterTrigger (Included): is set to false by default. It means the events won’t be taken into account and included in the event sync message. But you can override that behavior and keep the events even after a trigger
- Reset the context (Included): you can discard all the existing events in the current observation period by using the
/resetAPI. Interesting to clean up the context, after configuration update for example
- Endpoint minOccurence (per endpoint): (planned) consider that the endpoint meet trigger condition if there is a min number of occurrence logged on that endpoint
- Accepted HTTP Verb (per endpoint): (planned) consider valid an event only if it has been submitted with a specific HTTP verb (prevent pre-flight operation)
- Keep first/last/all events (per endpoint): (planned) on an endpoint, the event sync message will include only the first, the last or all the messages received on the endpoint
- Low latency event submission: (planned) perform a HTTP response to the client immediately after the correct save in Firestore on the event. The trigger’s conditions check and the event sync message generation (if any) are performed asynchronously. This feature required Cloud Run deployed with throttling equal to false. Can incur additional charges
- Filter conditions on Endpoint: (TbD) check the event content (header and body) to validate if it meets conditions to accept it as a valid event. Interesting if you receive logs status event, and you want to keep only the status=OK events
- Event sync message as Eventarc custom event: (TbD) in addition to PubSub target, another target could be Eventarc custom event for a better integration in Google Cloud event ecosystem. I’m waiting for the official release of custom events for Eventarc before going further.