Earlier this year our company selected Hubspot Enterprise as the platform to consolidate our content publishing, inbound marketing, CRM, and sales systems. We're very happy with the decision, and in recent months we've begun deeper integrations with legacy systems in preparation for further consolidation and retirement. After adopting workflows and webhooks, however, we discovered duplicate entries in a legacy database, which our telesales team had been manually de-duplicating.
Our first instinct was that something was wrong with the web forms and that we needed to prevent users from double-clicking submit buttons. Upon further research, however, we discovered that Hubspot was resubmitting webhooks using an exponential backoff, causing the duplicate entries. One of our engineers stumbled upon this forum discussion that alluded to the problem: webhooks have a short timeout, and if your service exceeds it, Hubspot considers the attempt a failure and retries with exponential backoff.
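To make the failure mode concrete, here is a minimal sketch (not Hubspot's actual code) of that timeout-and-retry pattern: each attempt that doesn't get a timely success response doubles the wait before the next try, so a slow consumer that eventually does process the request ends up seeing the same event several times. The function name and parameters are illustrative.

```python
import time

def deliver_with_backoff(send, max_attempts=4, base_delay=0.01):
    """Sketch of webhook redelivery with exponential backoff.

    `send` stands in for one delivery attempt and returns True on a
    timely success response. A timeout or error counts as a failure,
    and the delay doubles each retry (base, 2x base, 4x base, ...).
    Returns (attempt number that succeeded or None, delays slept).
    """
    delays = []
    for attempt in range(max_attempts):
        if send():
            return attempt + 1, delays
        delays.append(base_delay * (2 ** attempt))
        time.sleep(delays[-1])
    return None, delays
```

Note that from the sender's side this is perfectly reasonable behavior; the duplicates appear only because our service sometimes finished the work *after* the sender had already given up on the attempt.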
We'd already built a microservice "proxy" that brokers API and webhook interactions between Hubspot and our other systems, but those interactions were, for the most part, synchronous, since the legacy apps are .NET and PHP applications backed by SQL Server and MySQL, respectively. We had to wait for each app's response before logging the attempt in Elasticsearch, our new central store for analytics and real-time monitoring. Unfortunately, some of those requests, particularly geo queries, can exceed the 2-second timeout, thus causing the duplicate data.
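The shape of the original synchronous handler, roughly, was the problem: nothing could be acknowledged to Hubspot until both the legacy call and the Elasticsearch write had finished. This sketch uses illustrative names (`call_legacy_api`, `log_to_elasticsearch` are stand-ins, not our real functions):

```python
import time

def handle_webhook_sync(event, call_legacy_api, log_to_elasticsearch):
    """Sketch of the original serial flow: the webhook response isn't
    sent until the legacy API call and the Elasticsearch write both
    complete, so a slow geo query pushes the total past the sender's
    ~2-second timeout and triggers a retry (hence the duplicates)."""
    start = time.monotonic()
    result = call_legacy_api(event)        # may be slow (geo queries)
    log_to_elasticsearch(event, result)    # only runs after the API responds
    return {"status": "ok", "elapsed_s": time.monotonic() - start}
```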
Using Google PubSub and Cloud Functions To Go Async
To solve the issue, we decided to break the synchronous, serial operations into asynchronous ones. The "proxy" now merely accepts the webhook from Hubspot and immediately publishes a message to a PubSub topic (topic A). A cloud function triggered by topic A makes the request to the legacy API, awaits the response, and publishes that response to a second topic (topic B). Another cloud function, triggered by topic B, stores the data in Elasticsearch.
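The pipeline shape can be sketched end to end with an in-memory stand-in for PubSub. This is purely illustrative: topic names, handler names, and the `FakePubSub` class are assumptions for the sake of a runnable example; in production the `publish` calls go through the google-cloud-pubsub client, the two subscribers are deployed as Cloud Functions with topic triggers, and delivery is asynchronous rather than an in-process call.

```python
from collections import defaultdict

class FakePubSub:
    """In-memory stand-in for PubSub, just to show the decoupling.
    Real PubSub delivers asynchronously; here publish() invokes
    subscribers inline so the example is self-contained."""
    def __init__(self):
        self.subscribers = defaultdict(list)
    def subscribe(self, topic, fn):
        self.subscribers[topic].append(fn)
    def publish(self, topic, message):
        for fn in self.subscribers[topic]:
            fn(message)

bus = FakePubSub()
store = []  # stands in for Elasticsearch

def proxy_webhook_handler(event):
    # Step 1: the proxy ACKs the webhook immediately; its only work
    # is publishing to topic A, so it never nears the 2-second timeout.
    bus.publish("webhook-events", event)
    return 200

def call_legacy_api(message):
    # Step 2: triggered by topic A; calls the (slow) legacy API and
    # publishes the response to topic B. Result payload is illustrative.
    result = {"event": message, "legacy_status": "created"}
    bus.publish("legacy-responses", result)

def index_in_elasticsearch(message):
    # Step 3: triggered by topic B; persists the outcome for analytics.
    store.append(message)

bus.subscribe("webhook-events", call_legacy_api)
bus.subscribe("legacy-responses", index_in_elasticsearch)
```

The key property is that the webhook acknowledgment is decoupled from the slow legacy call: Hubspot gets a fast response every time, so its retry logic never fires, regardless of how long the downstream work takes.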
I'm pleased to share that the revised design works great: it eliminates the risk of timeouts (provided publishing to PubSub doesn't encounter issues) and prepares the system to scale in the future. Better yet, our telesales team no longer has to manually de-duplicate records, and we've reduced the strain on Hubspot's systems (hopefully so they can send us many more leads).