Event Driven Salesforce

Simon Harrison
Published in onefinestay tech
Sep 7, 2016

2016 was the year that onefinestay would finally answer the increasingly desperate cries of our sales team to upgrade to Salesforce, and that a platform team with almost zero Salesforce experience would step forward to take up the challenge.

In this post I’ll briefly describe the event-driven solution we delivered, then explore some of the more interesting design challenges we had to tackle during its development.

But first, if you take away just one piece of advice from this post, take this: know your integration points! By this I mean having a decent understanding of the Salesforce objects you will be using (e.g. Lead, Opportunity, Order, Account), exactly which objects in your application’s sales flow they correspond to — and whether those objects even exist yet. Get this right the first time and you’ll save a lot of time down the line!
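
In practice the mapping exercise can be as simple as writing down, and agreeing on, a table like the one below. The pairings here are purely illustrative, not our actual mapping:

    # Illustrative mapping of platform objects to Salesforce objects.
    # These pairings are hypothetical -- agree your own with stakeholders
    # before writing any integration code.
    SALESFORCE_OBJECT_MAP = {
        'enquiry': 'Lead',           # an enquiry that has not yet converted
        'quote': 'Opportunity',      # a priced, in-progress sales conversation
        'guest': 'Account',          # the person (or company) paying for the stay
        'booking': 'Order',          # a confirmed booking
        'booking_item': 'OrderItem',
    }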

Now, with that little gem of wisdom out of the way, let’s see how onefinestay did it!

Our Salesforce Microservice

Onefinestay has dozens of microservices that provide a platform for our website, mobile apps, operations and other internal systems. These publish events whenever some state changes, e.g. a booking, a home, a calendar or a payment, and so the architecture for our Salesforce integration seemed obvious — a new service subscribing to the interesting “sales” events that would push this state to Salesforce.com.

All of our interactions with Salesforce.com would then be event driven, but we found no case studies of this to learn from — so we had to learn the fun way!

The salesforce microservice was built with Python on our own microservice framework nameko and uses the simple-salesforce REST client for Salesforce’s Force.com REST API calls.

We encountered a number of challenges that were particular to our event-driven approach that I’ll now go through. Note that the example code shown from here on has been truncated and simplified for demonstration purposes.

Concurrent Event Handling

We handle a lot of events. When a booking changes, multiple services will publish events microseconds apart: calendaring, finance, payments and the booking service itself. All of these stampede towards the salesforce service and may lead to event handlers trying to work on the same remote object at the same time.

fix: The Force API supports HTTP PATCH so we only need to update the fields on a record that have changed — not the entire object. Salesforce will let you concurrently update multiple fields on the same object.

fix: Bail out of the Force API request at the earliest opportunity. This is a rule we always obey, not only to reduce stress on the applications but to minimise our Salesforce API requests — which have a daily limit. I’ll say more on this principle later.

    @event_handler('booking', 'booking_updated')
    def handle_booking_updated(self, event_data):
        booking_ref = event_data['booking_ref']  # assuming the payload carries the booking reference

        # only sync fields we care about, and never blacklisted ones
        order_updates = {
            change: event_data[change] for change
            in event_data['changes']
            if change in SalesforceOrder.sync_fields and
            change not in UPDATE_FIELDS_BLACKLIST
        }

        # bail out before hitting the Force API if there is nothing to do
        if not order_updates:
            return

        order = self.get_salesforce_order(booking_ref=booking_ref)
        self.salesforce_client.Order.update(
            order.sf_id, order_updates
        )

Our services are distributed across multiple VMs and each service runs many concurrent worker threads handling incoming requests, so sometimes two workers do try to work on the same remote object field at the same time. We saw this as a symptom of bad design, but with a deadline approaching we decided to address the symptom rather than the cause — which I’ll return to later.

fix: Mark certain event handlers to have exclusive access on a remote object, achieved under the hood by a lock on a database table row.

    @exclusive_lock('payments')
    @event_handler('payments', 'payment_updated')
    def sync_payments(self, event_data):
        pass
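
The decorator’s internals aren’t shown here; a minimal sketch of the idea, assuming a SQLAlchemy session on the service (self.db) and a locks table with one row per lock name (both hypothetical), could look like this:

    from functools import wraps

    from sqlalchemy import text

    def exclusive_lock(lock_name):
        """Serialise event handlers sharing ``lock_name`` via a database row lock."""
        def decorator(func):
            @wraps(func)
            def wrapper(self, *args, **kwargs):
                # SELECT ... FOR UPDATE blocks until any other worker holding
                # the same row lock commits or rolls back its transaction.
                self.db.execute(
                    text('SELECT name FROM locks WHERE name = :name FOR UPDATE'),
                    {'name': lock_name},
                )
                try:
                    return func(self, *args, **kwargs)
                finally:
                    self.db.commit()  # committing releases the row lock
            return wrapper
        return decorator

A database-native alternative on PostgreSQL would be an advisory lock (pg_advisory_xact_lock), which avoids the need for a dedicated locks table.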

Too Many Events

We’d often get two or three services publishing events at almost the same time, all telling us roughly the same thing, and we were left with duplicated API calls, race conditions and wasted API rate limit.

fix: More fine-grained events that allowed the salesforce service to ignore some of the louder noises, such as ignoring BookingUpdated events in favour of BookingFooBar or a really specific fact such as:

    class BookingOrderOutOfSync(Event):
        def to_event_payload(self):
            return {
                'fields_to_sync': [...],
                'data': {...},
            }

But we are also attempting to sync a large number of business objects to Salesforce (I’ll come back to this too), and by struggling to handle events we don’t really need, we risk failing to handle the core business objects.

potential fix: Find alternative ways to provide guest sales with the platform data that they need.

Event Order

There is no guarantee that when the salesforce service receives an event as part of a stream it arrives in the correct order. We might process a booking change that is actually older than the next one to come in.

fix: Event correlation paths. Each event now carries metadata holding its creation time and its path through the platform. In many cases this data tells us that we can ignore the payload entirely, e.g. by conditioning on the source of the event stream or on whether it was started by an automated process.
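
As a rough illustration (the field names and the get_last_synced_at helper here are hypothetical, not our actual metadata schema), a handler can use that metadata to discard stale or uninteresting payloads before doing any work:

    from datetime import datetime

    IGNORED_SOURCES = {'nightly_batch_import'}  # hypothetical automated sources

    def should_handle(self, event_data):
        """Decide from event metadata alone whether a payload is worth syncing."""
        meta = event_data['meta']  # assumed metadata envelope on every event

        # ignore events originating from automated processes we don't care about
        if meta['source'] in IGNORED_SOURCES:
            return False

        # ignore events older than the last change we already pushed to Salesforce
        created_at = datetime.fromisoformat(meta['created_at'])
        last_synced = self.get_last_synced_at(event_data['booking_ref'])
        return last_synced is None or created_at > last_synced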

strategy: Our approach here has actually been to treat this as an edge case: collect logging data on the event streams and error reports from guest sales to determine the extent of any problem before we act. The ELK stack has proved a vital tool throughout our Salesforce project.

potential fix: A request pipeline. We discussed putting all incoming event payloads into a queue which is processed in the background, using the event meta-data for prioritisation.

Salesforce Downtime and other Network Issues

Salesforce does go down for maintenance and if you’re responding to events in real time you’re going to have to handle these moments. You may also experience your own network issues or downtime, and under such scenarios you may want to replay any failed Salesforce API call.

fix: Using Sentry and Logstash to alert us and give us visibility into all errors was vital.

fix: Database queue for failed requests. A cron job runs every 10 minutes or so and retries each failed request up to 10 times, giving us over an hour’s worth of resilience. This is a simple try/except around our Salesforce API connections, implemented with a decorator.

    @event_handler('booking', 'booking_confirmed')
    @retry_on_error(retry_for=(TimeOutError, SalesforceError))
    def handle_booking_confirmed(self, event_data):
        pass
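
The decorator itself isn’t shown here; a minimal sketch of the try/except-plus-queue idea, assuming a failed_requests table and the same hypothetical self.db session as above, might be:

    import json
    from functools import wraps

    from sqlalchemy import text

    def retry_on_error(retry_for):
        """Catch the given exceptions and queue the event for a later retry."""
        def decorator(func):
            @wraps(func)
            def wrapper(self, event_data):
                try:
                    return func(self, event_data)
                except retry_for:
                    # store enough context for the cron job to replay the call;
                    # the cron job retries each row up to 10 times
                    self.db.execute(
                        text(
                            'INSERT INTO failed_requests (handler, payload, attempts) '
                            'VALUES (:handler, :payload, 0)'
                        ),
                        {'handler': func.__name__, 'payload': json.dumps(event_data)},
                    )
                    self.db.commit()
            return wrapper
        return decorator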

standby fix: In the short term we also had some scripts on standby to manually sync data that we saw had not made it to Salesforce.

The Result

We built a monitored, resilient, event-driven microservice that syncs platform data to Salesforce.com in real-time.

The Retrospective

The core lessons learnt were:

  • We went into the exercise without properly exploring the Salesforce toolkit, principally relying on the Force.com REST API. This resulted in some non-optimal solutions to design problems which have now become tech-debt.
  • We did not plan our integration as well as we could have. Mappings between business objects and Salesforce objects should have been done upfront, agreed with the stakeholders and then stuck to. Getting this wrong has a high development cost, but leaving them “fuzzy” does too because this allows developers to make their own assumptions — and these will inevitably be wrong!
  • We did not think carefully enough about what data actually needs to be stored on Salesforce or how else it could have been provided to the user. Consequently, we now have unanticipated storage and API rate limit demands.
  • The sheer volume of “sales” events turned out to be higher than expected, resulting in more development cycles (and thus a higher cost) to reach a stable application.
  • We built a system that handled bookings from London, Paris, New York, LA and Rome, but because of the demands these already place on storage and rate limits, we are aware that we face scaling challenges ahead.

A Better Way

We have no doubts about our fundamental architecture: microservices and event-driven synchronisation. Our guest sales team needs real-time updates to bookings, and our platform architecture was built to provide that. But we made mistakes, and there are better ways to address our individual concerns.

Better Way #1: Lightning Connect
Over half the data we sync to Salesforce is a waste of our time. Salesforce already provides features to visualise data stored outside of Salesforce.com. With an OData adapter we could have set up a custom tab on the guest sales UI that listed all the platform data they needed.

Better Way #2: TreeSave API
We sync data from event payloads one by one, but in fact we could have bundled related data into a single API call, e.g. constructing a Tree for a collection of Order and Order Item changes to avoid the “exclusive lock” pattern.
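
A hypothetical sketch of what that could look like with the SObject Tree composite resource (placeholder ids and field values, and a plain requests call rather than our actual client) is below; it creates an Order and its Order Items in a single round trip:

    import requests

    # ``instance_url`` and ``access_token`` would come from your existing
    # Salesforce session; the ids and values below are placeholders.
    payload = {
        'records': [
            {
                'attributes': {'type': 'Order', 'referenceId': 'order1'},
                'AccountId': '001xx000003DGb2AAG',
                'EffectiveDate': '2016-09-07',
                'Status': 'Draft',
                'OrderItems': {
                    'records': [
                        {
                            'attributes': {'type': 'OrderItem', 'referenceId': 'item1'},
                            'PricebookEntryId': '01uxx0000008vVxAAI',
                            'Quantity': 3,
                            'UnitPrice': 250.00,
                        },
                    ],
                },
            },
        ],
    }

    response = requests.post(
        instance_url + '/services/data/v37.0/composite/tree/Order',
        json=payload,
        headers={'Authorization': 'Bearer ' + access_token},
    )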

Better Way #3: Composite Resources
Useful to bear in mind: we could have improved our application’s performance by minimising the number of round trips between our service and Salesforce.com.
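
For instance (again a hypothetical sketch, reusing the instance_url and access_token from the previous example), the composite batch resource lets several independent sub-requests share one round trip:

    # Up to 25 independent sub-requests can share a single API call.
    batch_payload = {
        'batchRequests': [
            {
                'method': 'PATCH',
                'url': 'v37.0/sobjects/Order/801xx000003GKdHAAW',  # placeholder id
                'richInput': {'Status': 'Activated'},
            },
            {
                'method': 'PATCH',
                'url': 'v37.0/sobjects/Account/001xx000003DGb2AAG',  # placeholder id
                'richInput': {'Phone': '+44 20 0000 0000'},
            },
        ],
    }

    response = requests.post(
        instance_url + '/services/data/v37.0/composite/batch',
        json=batch_payload,
        headers={'Authorization': 'Bearer ' + access_token},
    )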

With the above 3 changes we could remove the defensive locking of event handlers, massively increase available storage and substantially reduce API calls, allowing us (hopefully) to rapidly scale up our bookings.

A Happy Ending?

To be continued…
