SEO At Sesame

Wesley Connor
Sesame Engineering
Apr 6, 2021


Scaling our sitemap from 50 to 6,000,000 URLs

Sesame is disrupting healthcare in the United States, injecting modern consumer standards into an outdated industry by connecting patients with quality doctors at affordable cash prices. The company is growing and, as a result, doctors are now offering more services in more locations and treating more patients every day. An important part of our growth is ensuring search engines like Google have an up-to-date index of Sesame’s URLs. As part of our SEO efforts, we needed to upgrade our sitemap from a hard-coded list of static URLs to a dynamic and fast new implementation. Here is how my team did it.

First, we built a sitemap.

Sounds simple, right? Wrong. Our content is appointment-based, so we needed to regenerate these pages frequently to keep them accurate and up to date. On top of that, we had the added challenge of making sure Google continues to recognize the most recent information. The content of the links is easily and securely accessible from our microservice backend, so we needed to build a tool that could transform that data into the sitemap XML format.

Enter cloud functions. One of the great things about working at Sesame is the freedom to choose the right tool for the job. Cloud functions are lightweight, serverless containers where you run short-lived code. They were the perfect solution to our problem because we could schedule them as frequently as we wanted without worrying about the associated resources. Our services and locations split along US state boundaries quite well, so we decided that running a function per state would let us run many jobs in parallel and give us the flexibility to update some states more often than others.

The code, therefore, needed to ingest data, transform it to XML, and write it to a Google Cloud Storage bucket.
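In sketch form, the per-state function looks something like this. The examples in this post are TypeScript, since it maps naturally onto the Node.js cloud-function runtime; the bucket name, payload shape, and helper names are illustrative assumptions, not our production code:

```typescript
// Hypothetical sketch of the per-state cloud function. The bucket name,
// message payload, and helper are assumptions for illustration.
import { Storage } from "@google-cloud/storage";

const storage = new Storage();
const BUCKET = "sesame-sitemap"; // assumed bucket name

// buildUrlsForState is sketched in the next code block.
declare function buildUrlsForState(state: string): Promise<string[]>;

interface PubSubMessage {
  data: string; // base64-encoded JSON payload, e.g. '{"state":"NY"}'
}

// Pub/Sub-triggered: build the URL list for one state and write it to
// GCS as JSON, where the combiner function will pick it up later.
export async function buildStateSitemap(message: PubSubMessage): Promise<void> {
  const { state } = JSON.parse(Buffer.from(message.data, "base64").toString());
  const urls = await buildUrlsForState(state);
  await storage
    .bucket(BUCKET)
    .file(`sitemaps/${state}.json`)
    .save(JSON.stringify(urls), { contentType: "application/json" });
}
```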

Here’s how it works. Our services publish all changes to the entities they own to Pub/Sub. Using these messages, we construct a combined, denormalized view of our inventory that is easier to query. By combining our services, locations, and availability into a single microservice, we can quickly query for the right services in each state. With one call to our search microservice and another to our location microservice for the relevant cities, the cloud function simply has to iterate over the lists, transforming all the combinations into URLs.
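Sketching that combination step, with hypothetical microservice clients and a made-up URL scheme standing in for the real paths on sesamecare.com:

```typescript
// Hypothetical clients for the search and location microservices; the
// real calls and URL scheme on sesamecare.com differ.
interface Service { slug: string; }
interface City { slug: string; }

declare function fetchBookableServices(state: string): Promise<Service[]>;
declare function fetchRelevantCities(state: string): Promise<City[]>;

const BASE_URL = "https://www.sesamecare.com";

// One call for services, one for cities, then the cross product:
// every (city, service) pair becomes a crawlable URL.
export async function buildUrlsForState(state: string): Promise<string[]> {
  const [services, cities] = await Promise.all([
    fetchBookableServices(state),
    fetchRelevantCities(state),
  ]);

  const urls: string[] = [];
  for (const city of cities) {
    for (const service of services) {
      urls.push(`${BASE_URL}/${city.slug}/${service.slug}`); // assumed URL shape
    }
  }
  return urls;
}
```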

Due to the constantly shifting nature of doctors’ appointments, it’s important that the sitemap is as up-to-date as possible. Telling robots (and users) that an appointment is available when it isn’t is a bad experience for everyone involved.

To avoid this, we trigger an update to the index every 30 minutes using a cron job, and we want these jobs to run in parallel for speed. Running all states sequentially would take about 15 minutes and would probably exhaust the memory of the cloud function. The beauty of the cloud-function implementation is that it actually listens to a Pub/Sub queue of its own. The original cron trigger spawns a new Pub/Sub message for each state: 1 message becomes 50, each containing an individual state to process. The cloud function can then read from the queue and process individual states as fast as possible.
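A minimal sketch of the fan-out function, assuming a Pub/Sub topic called sitemap-states that the per-state workers subscribe to:

```typescript
// The cron-facing function: the scheduler invokes it on a timer, and it
// fans out one Pub/Sub message per state. The topic name is assumed.
import { PubSub } from "@google-cloud/pubsub";

const pubsub = new PubSub();
const STATES = ["AL", "AK", "AZ", /* ...the other 45... */ "WI", "WY"];

export async function fanOutStates(): Promise<void> {
  const topic = pubsub.topic("sitemap-states");
  // 1 trigger becomes 50 messages; each worker handles a single state.
  await Promise.all(
    STATES.map((state) => topic.publishMessage({ json: { state } }))
  );
}
```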

In practice, we limit parallel runs of the cloud function so as not to overwhelm our backend microservices. Running 10 in parallel, we still finish building all states in under 2 minutes.
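One way to enforce that cap is at deploy time: Cloud Functions’ --max-instances flag limits how many copies of a function can run concurrently (the names below follow the hypothetical sketches above):

```bash
# One possible cap, using the function and topic names from the sketches
# above: never run more than 10 instances at once.
gcloud functions deploy buildStateSitemap \
  --runtime=nodejs18 \
  --trigger-topic=sitemap-states \
  --max-instances=10
```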

Once all the states have written their URLs as JSON to the Google Cloud Storage (GCS) bucket, another cloud function runs 30 minutes later, finds all the JSON documents, converts them to XML, and groups them into a single sitemap-index.xml file. This separation of concerns means multiple microservices can produce URLs in parallel without having to worry about synchronization.
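Here is a sketch of that combiner, again with assumed names. One wrinkle at this scale: the sitemap protocol caps each sitemap file at 50,000 URLs, so the combiner has to shard each state’s list into multiple XML files and point the index at all of them:

```typescript
// Hypothetical combiner. Assumes the sitemaps/<STATE>.json layout from the
// earlier sketches and shards output to respect the sitemap protocol's
// 50,000-URL-per-file limit.
import { Storage } from "@google-cloud/storage";

const storage = new Storage();
const BUCKET = "sesame-sitemap"; // assumed bucket name
const MAX_URLS_PER_SITEMAP = 50000;

export async function combineSitemaps(): Promise<void> {
  const bucket = storage.bucket(BUCKET);
  const [files] = await bucket.getFiles({ prefix: "sitemaps/" });

  const sitemapUrls: string[] = [];
  for (const file of files.filter((f) => f.name.endsWith(".json"))) {
    const [contents] = await file.download();
    const urls: string[] = JSON.parse(contents.toString());

    // Shard each state into chunks of at most 50,000 URLs.
    for (let i = 0; i * MAX_URLS_PER_SITEMAP < urls.length; i++) {
      const chunk = urls.slice(i * MAX_URLS_PER_SITEMAP, (i + 1) * MAX_URLS_PER_SITEMAP);
      const name = file.name.replace(".json", `-${i}.xml`);
      const body =
        `<?xml version="1.0" encoding="UTF-8"?>\n` +
        `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
        chunk.map((u) => `  <url><loc>${u}</loc></url>`).join("\n") +
        `\n</urlset>`;
      await bucket.file(name).save(body, { contentType: "application/xml" });
      // Assumed public URL for objects in the bucket.
      sitemapUrls.push(`https://storage.googleapis.com/${BUCKET}/${name}`);
    }
  }

  // The index simply points at every generated sitemap file.
  const index =
    `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
    sitemapUrls.map((u) => `  <sitemap><loc>${u}</loc></sitemap>`).join("\n") +
    `\n</sitemapindex>`;
  await bucket
    .file("sitemap-index.xml")
    .save(index, { contentType: "application/xml" });
}
```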

Public access

Having our sitemap in a Google storage bucket allows fast, reliable access for search robots. Still, the URL is not ideal. Ideally, a sitemap index is available from the root domain, e.g. https://www.sesamecare.com/sitemap-index.xml, but this is not the case for a Google bucket. Ambassador is a highly configurable, high-performance, load-balancing API gateway. It is central to our infrastructure and is perfect for configuring a redirect from our main domain to the GCS bucket.
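As a hypothetical example (our actual configuration differs), an Ambassador v2 Mapping can route the root-domain path through to the bucket:

```yaml
# Hypothetical Ambassador v2 Mapping: requests for /sitemap-index.xml on
# the main domain are rewritten and proxied to the GCS bucket.
apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  name: sitemap
spec:
  prefix: /sitemap-index.xml
  rewrite: /sesame-sitemap/sitemap-index.xml
  service: https://storage.googleapis.com
  host_rewrite: storage.googleapis.com
```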

From there, we just needed to update our robots.txt to refer to the new link, and the site was live! 6 million URLs, updated every 60 minutes.
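The robots.txt change itself is one standard Sitemap directive:

```
Sitemap: https://www.sesamecare.com/sitemap-index.xml
```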

The future

As we expand into more cities in the United States and continue growing the number of services our doctors offer to patients nationwide, this sitemap will only get bigger, faster. Tens of millions of links is likely, and we will start to hit the constraints of the cloud-function setup. With this in mind, we will need to look toward a streaming-based technology so we can update the sitemap in real time.
