1 Million Leads: Lessons in PostgreSQL Data Modeling

Stephen Grable
Broadlume Product Development
5 min readMay 5, 2021

--

If you’re in the business of building an application to help your clients manage and sell more customers at scale, then this article is for you.

At Broadlume, we work with thousands of small and medium size flooring dealers to help them manage online leads from potential customers.

Last week, we hit over 1,000,000 leads in our PostgreSQL database.

Currently we process about 60,000 leads per month

Our application, DealerHQ, is the central aggregation for customer outreach through online free sample ordering, website form submissions, phone calls, website chat widgets and more.

We’ve been in development on this leads management system for a little over 2 years and we’ve learned a lot. Some of those lessons and learnings I’ll expand on later in this article.

Our Ecosystem

As a company, Broadlume provides a suite of high quality products that help our clients gather and sell potential customers.

All customer outreach funnels into a single application called DealerHQ. In DealerHQ, our clients can view detailed lead information, use advanced lead filtering, search lead contents using PostgreSQL text search vectors and more.

When we process leads, we immediately attempt to match it with an existing customer’s information. If we find a match, we automatically associate that lead to a centralized customer in our database.

This matching system allows dealers to view all outreach from a single contact without doing any extra work. This concept is built on an internal matching system and a unique PostgreSQL database model which I will describe in more technical depth later in this article.

Leads Processing Walk-Through

Let’s walk through an ideal case of processing a phone call lead via a tracking number on our client’s website.

Step 1.

The customer calls a tracking number on the client’s website to ask about a product they saw online.

(You can use products like Twilio or CallTrackingMetrics to generate these tracking numbers and capture these types of leads)

Step 2.

When the call ends, a webhook sends the phone recording and lead data to our DealerHQ REST API endpoint.

Step 3.

Our system then creates a lead and will associate it with a retailer and a specific store based on the number they called (if our client has multiple physical store locations).

Step 4.

After the lead is created in our database, we then search for a record in our system with the same phone number. If one exists, we associate the lead to a central database record we call the inferred_prospect. If we don’t find a match, we will create a new inferred_prospect for this phone number.

These four steps are the result of years of backend work within our application code and a very close relationship with customers who helped steer us in the right direction.

The keystone to this entire application, from the database to the user interface, is an entity we call the inferred_prospect.

Inferred Prospects: The Data-Rich Join Table

Our (currently in beta) contacts index page utilizing the inferred_prospect table

inferred_prospects is a database table we use as a join table between our leads table and the contacts table. This is not your regular join table that simply matches foreign keys, this is a data-rich join table that powers our entire DealerHQ application.

We treat the inferred_prospects table in our PostgreSQL database as the source of truth about a single contact or potential customer.

The inferred_prospects table has many associations but we do not edit or override its main data attributes, which are the email and phone_number columns.

We keep this data immutable because we use these columns as the matching mechanism to aggregate leads together. Keeping these main table attributes “locked” is integral to our data integrity and the system as a whole.

Contacts: The User Generated Table

contacts is our mutable table that is user generated and has a one-to-one relationship with inferred_prospects. Within our interface, users can create a contact from any lead. When that occurs, we immediately pull in all associated leads into the contact view through the shared inferred_prospect.

Users can then edit the contacts table safely because we know that the inferred_prospects entry will stay intact with the important data that a user might want to re-expose later.

Database Modeling and Architecture

Let’s spec out how you would setup your database to recreate this system.

Below is a diagram of the steps our system takes to process leads from multiple channels. The matching mechanism outlined in Step 2 is out of scope of this article but is key to the system as a whole:

Below is a basic diagram of our database modeling set up in PostgreSQL:

The inferred_prospect table is never visible to the user and only serves as a persistent datasource

User Experience

This database model allows us to create an incredibly simple and flexible user interface for our users. With this data model, users can manage repeated outreach from a single customer, clean up their dashboards by deleting contacts and maintain important customer data on the inferred prospect that can always be re-exposed if a user chooses.

This three table setup has allowed us as an engineering team to build an intuitive leads management application. This gives our clients a straightforward way of managing millions of leads.

--

--