Scaling Ancestry.com: Providing Billions of Personalized Hints

Published in

Ancestry Product & Technology

4 min read · Jul 21, 2022

When I arrived at Ancestry more than 20 years ago, my new coworkers were wearing t-shirts proudly proclaiming “200,000 subscribers”! It was a tremendous milestone for the company, but we also had a bigger vision. Our mantra was “prepare for 10x growth.” Little did we know that genealogy would become one of the most popular hobbies in the U.S.

Over the years, we would be challenged to scale our features for far more than 10x the growth we were imagining then. We have always been excited to hear feedback from our customers when they have discovered new and interesting family connections through our service.

Our customers build online “family trees” on Ancestry, where each node in a tree represents a person, typically one of the customer’s ancestors. Ancestry also hosts a wide variety of Content Collections (historical records such as census, birth, marriage, death, military and immigration records). From these we create a recommendation, or “Hint”: information from Ancestry that is likely to be relevant to a person in a user’s family tree.

Creating Hints means scanning a vast amount of data to find personalized, relevant information for a person in a customer’s family tree. On top of the Content Collections we already have, new content is added at an amazing pace! As a user of the site, I am constantly surprised at the volume of new content we are adding; a recent example is the 1950 U.S. census collection.

Hints

When a user edits their family tree, for example by adding a new person or updating an existing one, we provide Hints (e.g. a Content Collection record associated with that person) applicable to those changes in near real time. In addition to our existing Content Collections, the Ancestry ecosystem is continually updated with new user-generated content, which includes photos, stories and records. This new information must be readily accessible as a Hint. We have 30+ billion historical records, 13+ billion ancestral profiles and more than 800 million photos and stories. Our technology must manage both the volume and quality of the information to ensure that the most useful Hints are provided to a user in a matter of seconds.

Preparing Hints for the User

In the early days at Ancestry, we had a fantastic home-grown search engine that could search through our large Content Collections very quickly, and we used it very effectively to find Hints. As our user base increased, and as millions of historical records grew into tens of billions, we realized that we had to move to a different technology to accomplish our goals.

We realized that by storing relationships to Ancestry content in a graph database, we could find new Hints for our users in a much more efficient and performant way.

The graph database knows all about the historical records and the relationships within the family trees, but it has no knowledge of Ancestry users or how they have interacted with past Hints. We wanted an experience that is personalized to the way each customer uses our product, so we decided to prepare (pre-generate) Hints for the user and store them in a relational database. In addition to the “prepared” Hints, we also save a user’s reactions to Hints so we can better serve Hints and customize the experience for individual users. We prepare a lot of Hints: nearly 60 million a day.
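To make the idea of prepared Hints plus saved reactions concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, status values and function names are all assumptions made for illustration; the article does not describe Ancestry's actual schema or database product.

```python
import sqlite3


def open_hint_db(path=":memory:"):
    """Create a tiny, hypothetical Hints table (not Ancestry's real schema)."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS hints (
            hint_id   INTEGER PRIMARY KEY,
            tree_id   TEXT NOT NULL,
            person_id TEXT NOT NULL,
            record_id TEXT NOT NULL,
            status    TEXT NOT NULL DEFAULT 'pending',
            UNIQUE (tree_id, person_id, record_id)
        );
        """
    )
    return conn


def save_hint(conn, tree_id, person_id, record_id):
    """Store a prepared Hint; preparing the same Hint twice is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO hints (tree_id, person_id, record_id) "
        "VALUES (?, ?, ?)",
        (tree_id, person_id, record_id),
    )
    conn.commit()


def record_reaction(conn, tree_id, person_id, record_id, status):
    """Save the user's reaction so the Hint is not served again as new."""
    if status not in ("accepted", "rejected"):
        raise ValueError("status must be 'accepted' or 'rejected'")
    conn.execute(
        "UPDATE hints SET status = ? "
        "WHERE tree_id = ? AND person_id = ? AND record_id = ?",
        (status, tree_id, person_id, record_id),
    )
    conn.commit()


def pending_hints(conn, tree_id, person_id):
    """Return record ids of Hints the user has not reacted to yet."""
    rows = conn.execute(
        "SELECT record_id FROM hints "
        "WHERE tree_id = ? AND person_id = ? AND status = 'pending' "
        "ORDER BY hint_id",
        (tree_id, person_id),
    ).fetchall()
    return [row[0] for row in rows]
```

Because reactions live next to the prepared Hints, serving a person's Hints is a single indexed lookup, and a rejected Hint simply stops appearing in `pending_hints`.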

This generates lots and lots of data, “Big Data” at its best. The size and scope of the data set are a technical challenge in themselves. We found that breaking up our large data set into manageable pieces was a good approach to the problem.

We decided the best approach for breaking up our data would be to implement a custom sharding algorithm at the application layer.

This was the only way we could continue to use our Hints database. It also created a huge increase in development complexity, because the application now needs additional sharding logic to know exactly where to find our distributed Hints.
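As an illustration of what application-layer sharding can look like, the sketch below routes every Hint to a shard by hashing the tree id. The shard count, the choice of tree id as the shard key, and the class and function names are assumptions for this example; the article does not say how Ancestry's own algorithm picks a shard.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count, not a figure from the article


def shard_for(tree_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a tree id to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used to keep routing repeatable.
    """
    digest = hashlib.sha256(tree_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedHintStore:
    """Routes Hint reads and writes to one of several stand-in 'databases'.

    Each shard is a plain dict here; in a real system each would be a
    separate relational database instance.
    """

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.num_shards = num_shards
        self.shards = [dict() for _ in range(num_shards)]

    def add_hint(self, tree_id: str, person_id: str, record_id: str) -> None:
        shard = self.shards[shard_for(tree_id, self.num_shards)]
        shard.setdefault((tree_id, person_id), []).append(record_id)

    def hints_for(self, tree_id: str, person_id: str) -> list:
        shard = self.shards[shard_for(tree_id, self.num_shards)]
        return list(shard.get((tree_id, person_id), []))
```

Keying on the tree id keeps all Hints for one tree on the same shard, so a page showing a single tree never has to query more than one database. The trade-off is the one described above: every caller must go through the routing function, and changing the shard count later means moving data.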

Keeping the prepared Hints relevant and fresh for our customers is one of our biggest challenges. It isn’t practical to scan the billions of people in our online family trees to find out when a user needs new, more relevant Hints. Instead, we focus Hint preparation on the places where the customer is actively working in their online family tree. We receive user activity events through a well-partitioned queuing system, which helps us keep the Hints up to date.
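The event-driven approach can be sketched with a small in-memory stand-in for a partitioned queue. This is only an illustration: the article does not name the queuing system, and the partition key (tree id), event shape and class name below are assumptions.

```python
import zlib
from collections import deque


class PartitionedQueue:
    """In-memory stand-in for a partitioned activity-event queue."""

    def __init__(self, num_partitions: int = 4) -> None:
        self.partitions = [deque() for _ in range(num_partitions)]

    def partition_for(self, tree_id: str) -> int:
        # A stable checksum keeps every event for one tree in one partition,
        # so a single consumer sees that tree's edits in order.
        return zlib.crc32(tree_id.encode("utf-8")) % len(self.partitions)

    def publish(self, tree_id: str, person_id: str) -> None:
        """Record that a user touched a person in a tree."""
        index = self.partition_for(tree_id)
        self.partitions[index].append((tree_id, person_id))

    def drain(self, partition: int) -> list:
        """Pop all events from one partition.

        Returns the distinct (tree_id, person_id) pairs in first-seen
        order, so each edited person is refreshed only once per batch.
        """
        seen = {}
        queue = self.partitions[partition]
        while queue:
            seen.setdefault(queue.popleft(), None)
        return list(seen)
```

A consumer would call `drain` on its partition and regenerate Hints only for the people returned, which is far less work than rescanning every tree.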

We also prepare Hints in areas of the customer’s family tree where our predictive models say the customer is most likely to visit next. This allows us to create Hints that enable our customers to grow their trees most efficiently. This is an ongoing effort for us and lots of fun.

Continuous Improvement

Be it finding Hints, adding new types of Hints, or delivering Hints, we are always trying to do it faster, cheaper, and more intelligently. It is very challenging (and sometimes even aggravating) but our team loves having complex and interesting problems that need innovative solutions.

If you’re interested in joining Ancestry, we’re hiring! Feel free to check out our careers page for more info.

Written by Davehallmeyer