Single Customer View — A Data Architecture Case Study

10 min readNov 8, 2023

This is the second delivery of a series of data architecture case studies designed to prep both current and aspiring data architects for their technical interviews. This specific case is about Unique Customer View (UCV) or Single Customer View (SCV)

These scenarios aren’t mere theories; they reflect genuine situations I’ve faced professionally. Whether I was the interviewer or the interviewee, the essence of these challenges persists. Note I’ve made a few tweaks for clarity and confidentiality but the core challenges remain the same.

A quick heads up: Most companies I’ve worked with (or for) believe there’s no single “right” or “wrong” answer. It’s not just about the solution, but the journey you took to get there and the reasoning behind your choices. So, while I’m sharing one possible approach, there are countless other viable ones out there. If you have an alternative method or find an oversight in mine, please join the conversation! Through collaborative discourse and shared wisdom, we all evolve.

Travel4You — Case Statement

Travel4You is a digital travel agency with over two decades of active business. Despite its vast offerings in flight, hotel, and other travel services, its IT setup remains outdated. The company is now prioritising an IT renovation, especially in areas like Customer Relations and Sales. This case study focuses on our efforts in the Customer domain. Currently, Travel4You uses a unique identity system and multiple customer views. Their plan is to adopt leading Out-Of-The-Box (OOTB) solutions for better identity management and a unified customer view, aiming to enhance user experience and increase revenue.

The solution for this case should be focused on Creating a Unified Customer View explaining the data model behind the solution. Support it with the data entities, relationships, and the overarching architecture.

The original case also calls for us to provide a data strategy for creating a consolidated view of the customer. This strategy needs to outline its fit with the organisation's defined functional and non-functional requirements, paying special attention to scalability, security, and performance. However, since this topic is quite big, we will focus this article only in the data model, the approach and the different challenges and decisions that have to be made to logically design the Single Customer View. I’ll delve into the second part of this discussion in an upcoming piece.

Proposed Solution

Introduction

In retail businesses that have expanded their product lines, whether through growth or acquisitions, they frequently encounter significant data management issues. They often get a fragmented view of their customer information, and occasionally, they see duplicate data, which can lead to disorganisation and confusion. This gets pretty frustrating for customers as well. They’re asked to input their details multiple times, and the company doesn’t seem to recognise them when they move on to check out different products.

These companies may maintain a central database where each customer has one record, but with various details scattered across different systems. Or they might deal with multiple systems that redundantly store the same customer’s information. With the first case, the task is straightforward: simply compile all the information coherently in one spot. However, with the second one, the challenge is considerably greater, requiring the company to piece together information from various systems to form a single, unified customer view.

Current Data Model

For our Travel4you scenario, let’s assume a simplified version of their current flights and hotels customer data entities. As for the ‘others’ system, we’ll treat it as if it’s specific to car rentals.

The very simplified view of a flights system database would be just the customer main information and the IDs added by the customer on this system, like passport or national identity ID.

Note there is an association of the user with the customer. There are multiple ways of relating customer with its user but we will assume in this case that it’s specified in the customer data entity.

For the Hotel booking system database the customer record is even simpler. Just the main contact information and one document that the customer provides just to verify the customer identity at check-in.

This other system also specifies the user ID at customer level.

The Car Rental system shows different entities too. In this case the customer needs to provide proof of a driving license and it could hold different driving license records when they are expired so it needs a separated data entity.

Also the car rental line of business requires emergency contacts. This is mandatory and they are stored in another entity.

Lastly, we assume the user ID is part of the customer data entity too.

Solution Data Model

Our solution proposal for the SCV (Single Customer View) database has the main customer data entities and their relationships.

As we can see, there are some existing data entities along with some new ones. This has resulted from a process of “merging” and “refactoring.” We’ve combined the Customer data entity because all systems have a structure that’s pretty much alike, and we’ve reworked other data entities since we need to store all the data in a way that’s easy to understand and consistent.

In designing this data model, I’ve made the following assumptions. Keep in mind, there are plenty of other Single Customer View (SCV) models based on different business assumptions; these are the ones I used for my model:

We’ve merged the Customer data entity at the row level, not the attribute level. Doing it by attribute could have made this model way too complex. In real-life scenarios, you might use a hybrid approach where you combine customer records using various attributes from different systems.
Each record in the merged or refactored data entities will have two additional fields: ‘Data Source’ and ‘Data Source Customer Number’. These will identify which system the record is from and its corresponding reference. We’ve kept it simple by just using the Customer Number, but we could’ve gone for a more complex design with three fields: ‘Data Source’, ‘Data Source Ref Type’, and ‘Data Source Ref ID’. In some cases, this could point to other entities and identifiers.
We’re ensuring no data is lost. The Customer Profiles data entity will hold any customer information that doesn’t fit into the Customer data entity.
To keep the model straightforward, we’re assuming there’s just one IAM (Identity and Access Management) system for all three systems. We acknowledge different user IDs may be linked to the same customers across different systems, even if they represent the same person.
The business has set a hierarchy for system data relevance as follows: 1) Flights System, 2) Car Rental System, and 3) Hotel Booking System. Other criteria could also dictate priority, like the most recent connection date, last update, or contact verification date, but we’ve kept it simple.
The business wants to keep all customer contact details, like various phone numbers or email addresses. These will need verification: mobile numbers through SMS and emails via a sent code. Postal addresses can remain unverified for now.
We’re assuming all customer-entered data is accurate and valid. Another team will address data validity and quality, but that topic is too extensive to cover in detail here.

Now, let’s take a closer look at the different data entities from the model mentioned above.

A Note About the Customer Number

The three distinct systems each use their own sequences for ‘Customer Number’, and it would be quite risky to use the original customer numbers directly, as this could lead to sequence duplication. To prevent such duplication, a new sequence for the SCV (Single Customer View) customer number should be created. A mapping table won’t be necessary because the original records will already be linked via the ‘Customer’ and ‘Customer Profile’ entities. While there may be some complexities concerning the customer sequence and the mapping if the source systems are still operational, we’d prefer not to make this model more complicated than it needs to be.

Customer

The ‘Customer’ is a “Merged” data entity that displays a single row from one of the systems, selected based on the system’s relevance order. So, if a person is listed in all three systems, we should see only one row for them in the Customer data entity, with ‘Data Source’ set to Flights System, and the ‘User ID’ would be the specific one they use for that system. If a person only shows up in the Hotel Booking System, then their record, including the ‘User ID’, would come from that system, making ‘Data Source’ the Hotel Booking System.

Customer Contact Detail

This is a refactored data entity, operating under the assumption of “no data loss,” which means we need to retain all verified customer contact details from across the various systems. It’s important to maintain the data source to pinpoint which system logged these customer details. For instance, if the same customer appears in two systems with identical information but different telephone numbers, we would have the primary customer record, including contact details, in the ‘Customer’ entity, and an additional ‘Customer Contact Detail’ record would be created with the relevant data source and a reference to the data source customer number.

Customer Documentation

The ‘Customer Documentation’ entity will contain all official identification documents, which includes driving licenses. Therefore, the separate entity for ‘Customer Driving Licenses’ in the Car Rental System will be eliminated, and its data will be transferred to the ‘Customer Documentation’ entity. We’ll just include a new ‘Document Type’ in the lookup — “Driving License” — and migrate the relevant data to this entity.

Emergency Contact Detail

I considered incorporating this data entity into ‘Customer Contacts’, but for the sake of clarity in the model, I decided it’s best to keep it separate. As it stands, this entity isn’t present in any other system, so it essentially remains as it was initially.

Customer Profile

We’ve established this entity to hold data from the source systems that can’t be accommodated within the model due to various reasons, but we still want to preserve for future use. This approach will prevent data loss.

What Happen With the Missing Data

Another aspect to consider is missing data. Let’s say we have a customer who exists only in the Car Rental system. According to our guidelines, this customer’s information would be included in the SCV system. However, when we add this data, we might realise we don’t have an ‘Address’ field; we’ve never captured this data in this system. We could develop some strategies for handling ‘missing data’, such as setting default values or establishing rules. Alternatively, we may simply acknowledge that some customers will have incomplete records. This issue also arises with the fields ‘ID Number’ and ‘ID Type’ in the customer records of the Hotels and Car Rental systems. Even if we wish to incorporate this information into the Customer Documentation table, we lack the expiration date. Therefore, we opt to include this information in the Customer Profile as incomplete information, rather than as a complete record in the Customer Documentation table.

And The Most Complicated Part

This scenario is often a major headache for data and privacy consultants, as well as customer operations teams: figuring out which customer records in different systems belong to the same person. It’s crucial to be extremely certain that Record A in System 1 is the same as Record B in System 2, because any mistake could lead to sensitive information being exposed to the wrong individual.

The data privacy team will decide what constitutes sufficient evidence to determine that two records represent the same person. We could consider using the following pieces of information to confirm someone’s identity:

1. First and last name (take care with spelling corrections!)
2. Date of birth
3. Address
4. National identity number, driver’s license number, or passport number
5. Phone number
6. Email address

I would recommend these rules for identifying records that belong to the same person:

Rule 1 — If records have the same national ID number, driving license number, or passport number, and one other matching field, we can be pretty sure they’re the same person. This confirms there wasn’t a data entry error with the document ID, giving us high confidence in the accuracy across different systems.

Rule 2 — When a document ID isn’t available, we should match several fields to ensure accuracy. My suggestion for nearly 100% certainty is to use the first name, last name, date of birth, address, and at least one contact detail like a phone number or email address.

We should apply Rule 1 where possible; otherwise, we default to Rule 2. If we find very few matches, it’s necessary to analyse the data to understand where the discrepancies lie and why.

Conclusion

I’m a big fan of this Case Study. It’s a fascinating project in Data Architecture, and I’ve had a great time working on and building out the solution. If you’ve got comments or questions, don’t hesitate to drop them below. Wishing you the best of luck!