The trailblazer approach to data quality projects

MIO - the data experts
22 min read · Mar 30, 2022


A collage of illustrations of a compass, a caution barrier, hand tools, and bears roaming outside a wooden fence during a lightning storm
© MIO

Planning for data quality at a business scale means navigating a tricky pair of truths:

  • A data quality plan can break down — maybe even catastrophically — if it’s based on assumptions that turn out to be incorrect.
  • To avoid assumptions, the data must be parsed, explored, and understood… which would mean starting the data quality process as a prerequisite of planning for data quality.

But the worst effects of this catch-22 can be prevented, simply by avoiding the standard “plan it all, then do it all” top-down project planning tactic.

So instead, we use what we call the trailblazer approach:

  • Enough top-down planning is done to give the initiative direction, but not so much that it fossilizes the details of the journey.
  • Each project of the initiative is relatively small-scale, so ROI from the project is returned quickly and any problems stemming from a wrong assumption in the plan are limited in scope.
  • Knowledge gains from prior projects combine to form a solid foundation for large-scale data quality.

We named this the trailblazer approach because it’s what successful, real-life trailblazers actually did.

Consider the explorers Lewis and Clark.

Their expedition had a narrow goal, and they explored only a fraction of the land west of the Missouri River.

But the information that they brought back informed future expeditions. And those expeditions, in turn, informed others, leading to the American settlement of the west.

With trailblazer data quality, you can launch data quality in a sustainable, realistic, and targeted way: a way that’s rooted in fact and experience, not assumptions and suppositions. Here’s how.

WHY IS DATA QUALITY HARD?

NO RELIABLE MAP

Making a plan requires knowing what you want to achieve. So what’s the goal of a data quality project?

The obvious answer is something like “to get high-quality data.” But what does that mean, exactly?

Definitions of data quality often include a phrase along the lines of making data “fit for purpose”¹ or “fit for use”².

But if we set aside vague ideals like “serving customers” or “innovation,” neither purpose nor use is a monolith across an organization.

Data scientists and front-line customer service representatives, for example, use data in very different ways:

  • A data scientist needs the data to be fit for deep learning, artificial intelligence, and predictive modeling across your entire customer base.
  • A customer service representative needs the data to be fit for immediately and accurately responding to a single individual customer at a specific moment in time.
An illustration of a woman sitting behind a desk talking to a person across the desk. The desk has a nameplate that says “customer service” on it
© MIO

These are very different purposes, and they approach the data and the entities (people, places, and things) in it at very different scales.

An illustration of a man looking thoughtfully at a laptop which is showing a cluster graph
© MIO

And that’s not even the end of your options. What if the purpose of your data quality is to:

  • Make you compliant with the European Union’s General Data Protection Regulation (GDPR) so that you don’t get fined into insolvency, by:
    - Being able to find all relevant data.
    - Being able to put data in the required formats.
    - Being able to delete relevant data.

These examples all define data quality in terms of the use and purpose featured in those definitions. They all tell you what data quality is supposed to help you achieve, in a business sense: analytical success, operational success, compliance success.

What they don’t tell you is how to do it.

And they don’t warn you that you may be missing important information about the data you’ll be working with. Because you probably are, even if you don’t know it yet.

CHALLENGES

Planning a project usually presupposes that you have, or can get, the information about the object of the project before actually undertaking it.

Data quality planning has to be an exception, because the umbrella of data quality extends into those very preparatory activities.

So building a plan for data quality faces some challenges that other projects won’t. At least, they won’t once data quality is in place.

SURPRISE DATA

An illustration of an exclamation point
© MIO

It’s not uncommon for an organization to not know about all of the data it has available.

Of course, that usually doesn’t mean that literally no one at the company knows about it. Instead, it’s that the data isn’t located where the “official” record would expect it. The people who work with that data on the ground do know where it is.

Sometimes this happens because the system doesn’t fully meet end users’ needs: the proper place for a particular piece of needed data might be missing from the system, or it might be too difficult to access given the user workflow.

To cope, end users introduce a workaround. Often this consists of putting the data in an unexpected place: a Notes field, Excel, a shared drive, or even an actual piece of paper in a physical filing cabinet.

Those users can find the data, but at other levels of the organization, the location — and maybe even the existence — of the data isn’t clear at all.

In very subtle cases, the data is exactly where you might expect it to be, but it doesn’t conform to the expected specifications. In these scenarios, the existence of the data isn’t the surprise — it’s the form.

In other cases, data is unknown because there’s no data management or governance structure to enable data discovery.

Without that support, the people in charge of an operational system have no way to discover when a system that they don’t oversee contains data related to theirs, except by chance. If that related data is being stored using a workaround, that goes double.

Real-life example:

An organization recorded an identifier number for each entity it knew about. In actual operations, there were various special cases for this identifier, some time-limited and some not.

The system that front-line data entry staff used only had one field for the identifier, and the official specification for the field covered only the basic case. But staff used that field for all forms of the identifier anyway. At the beginning of the project, the organization didn’t realize that the spec was incomplete.
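A lightweight pattern profile is often all it takes to surface this kind of surprise early. Here is a minimal sketch in Python; the column name and sample values are hypothetical stand-ins, not the real case:

import re
from collections import Counter

import pandas as pd

def value_shape(value):
    """Reduce a value to a character-class shape: digits become 9, letters become A."""
    shape = re.sub(r"\d", "9", str(value))
    return re.sub(r"[A-Za-z]", "A", shape)

# Hypothetical extract of the single identifier field.
df = pd.DataFrame({"identifier": ["123456", "T-123456", "123456/2021", "999999"]})

for shape, count in Counter(df["identifier"].map(value_shape)).most_common():
    print(f"{shape!r}: {count} rows")

Any shape that doesn’t match the official spec is a candidate undocumented variant, and each one is a question to take back to the people who enter the data.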

STUCK DATA

An illustration of a closed padlock
© MIO

Another challenge is when you know about data, but can’t get to it.

Inaccessible data is rarely literally inaccessible. It can be, of course: for example, if it’s encrypted and you don’t have the key.

But more often, data is functionally inaccessible.

Accessing the data may take so much time and effort that, by the time it’s finally available, it’s too out-of-date to be useful.

Data can also become functionally inaccessible when the company lacks the technical resources to get to it. This happens particularly when the data is available within the system of origin (often a legacy or custom system), but the company doesn’t have the technology to make it available outside that system.

Real-life example:

An organization was transitioning to a new software product. One of the old systems contained essential data, but did not produce output compatible with the new system. The system’s developer did not provide a transition service, and a service provider had already turned the project down for being too complicated.

BIG DATA

A third challenge is the sheer amount of data you need to work with.

To get high-quality data, especially at the micro level required by operational initiatives, you have to do the following for each source (a minimal sketch of these checks follows the list):

  • Access it.
  • Identify and verify the data you expected to be there.
  • Identify and verify the data you didn’t expect to be there.
  • Identify the data you expected to be there, but wasn’t.
  • Establish the reliability of the source with respect to the other sources you have.
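Here is the minimal sketch promised above, covering the first four checks for a single source. It’s written in Python with pandas, and the file name and expected column list are hypothetical placeholders for a real spec:

import pandas as pd

# Hypothetical official spec for one source.
EXPECTED_COLUMNS = {"customer_id", "name", "email", "country"}

def profile_source(path):
    df = pd.read_csv(path)  # 1. access the source
    actual = set(df.columns)
    return {
        "expected_and_present": sorted(EXPECTED_COLUMNS & actual),  # 2. expected data that is there
        "unexpected": sorted(actual - EXPECTED_COLUMNS),            # 3. data you didn't expect
        "expected_but_missing": sorted(EXPECTED_COLUMNS - actual),  # 4. expected data that isn't there
        "null_rates": df.isna().mean().round(3).to_dict(),          # a first input to judging reliability
    }

# Usage against a hypothetical extract:
# print(profile_source("crm_export.csv"))

Establishing reliability across sources takes more than a script, but even this much gives each source a factual baseline instead of an assumed one.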

This is no small feat, because even an SMB can have a dozen or more operational systems. Larger organizations, or organizations with particularly high data volumes, can have hundreds or even thousands of sources of data.

An illustration of a man in a business suit as an avalanche of balls falls on him from above
© MIO

It’s difficult enough to wrangle data at that volume when every other aspect is straightforward. Usually it isn’t. Almost every data quality project also involves a large variety of data, and many will also bring some degree of velocity, adding more dimensions of challenge to the project.

Real-life example:

A company wanted to control their entity resolution process internally, instead of periodically sending it out to a service. But they had thousands of sources of data, representing tens of millions of individuals, and many of those sources had their own formats. The labor required to manually carry out their complex process on all of those sources was prohibitive internally.

ROUGH TERRAIN AHEAD

So put the lack of a reliable map to data quality together with all those challenges: surprise data, stuck data, big data.

They introduce a significant degree of uncertainty into any data quality project. You need to expect challenges. And not a series of predictable, one-at-a-time challenges, like an obstacle course.

In data quality projects, the expected and the unexpected can pile up on each other, creating something much more like the rugged terrain of an unexplored wilderness. That’s what you need to be ready for.

TRAILBLAZING: BUILD TO LAST

So how do you create objectives, define process, and start implementing a project when you don’t know exactly what you’ll be working with or what types of problems you’ll encounter?

A top-down approach is truly stuck in the dilemma discussed earlier. Until you start the data quality work, you can only base your plan for it on assumptions, estimations, and best guesses.

That’s not the kind of foundation anyone wants for a high-value project. But, somewhat incredibly, the pure top-down approach is still the route some organizations choose for data quality.

You can imagine the reasoning:

  • Data quality is important.
  • Our important company activities are surrounded by complex thought and process.
  • Data quality should be surrounded by complex thought and process.

The fundamental problem with this logic is that it forgets about growth and politics.

Companies don’t spring out of the ground fully-formed, with a library of handbooks and an office building full of middle managers already in place.

The complex thought and process that the top-down approach presumes developed over months, years, and decades of growth. Not to mention countless hours of discussion, disagreement, and decision-making over what those thoughts and processes should be.

That’s how you build something that lasts. True, you don’t have years to get data quality into place. But incremental doesn’t mean slow.

3 illustrations of a building, 1/3 built, 2/3 built, and fully built
© MIO

Think of Lewis and Clark. Each day, they moved only a short distance. They gained a little bit of knowledge. But their movement was purposeful, and their knowledge accumulated.

That movement and accumulation created a collected body of knowledge. It started from assumptions, but the end result was based on experience. It was infinitely more reliable for future travelers than “Here be dragons (probably)” ever could have been.

The trailblazer approach takes that natural evolution of institutional structures and intentionally accelerates it. Working with purpose, you can make progress faster.

WHY TRAILBLAZE?

True, Lewis and Clark didn’t face problems identical to yours. You won’t have to camp in one place all winter, there are no bears in your server room, and no matter how many whitepapers you read about the cloud, a thunderstorm won’t develop in your office.

An illustration of an encampment with a wooden fence, that has bears roaming outside of it in the snow and a lightning storm overhead
© MIO

But the challenges that they faced are thematically very similar to yours:

“Yet at the very moment of doing this [Lewis] knew that much of what was offered was based on nothing more than guesswork, dimly understood Indian tales, or academic logic concocted as a substitute for actual observation. On occasion he must have felt completely adrift: how could he stake his success on the reliability of the very charts he was supposed to correct during his travels?”³

Lewis knew that he had little reliable information, and he still had to plan an important venture. This is the same dilemma data quality project planners face.

This dilemma is why the top-down, all-up-front approach has the clear flaws that we discussed previously. Using the trailblazer approach can mitigate them.

Here are some of the ways the trailblazer approach can help:

  • It better equips you for encounters with the “unknown unknowns.”
  • It can help mitigate the negative effects of intra-company politics on projects.
  • It can help you deliver short- and medium-term results, not just long-term.

The first two of these three benefits deal with your response to factors that are largely out of your control. This ability to adapt to the uncontrollable is a key advantage of the trailblazer approach.

All of these benefits rest on the same basic foundation: the trailblazer approach’s incremental setup.

Remember, incremental is not a synonym for slow. It means cumulative, or additive. Piece by piece.

You can move quickly and incrementally at the same time. That’s what the trailblazer approach makes possible.

COPE WITH THE UNKNOWN

The phrase “unknown unknowns” initially inspired ridicule⁴ when Donald Rumsfeld thrust it into the public eye (or ear) in 2002⁵, but it’s since been accepted as a reasonable description of a particular scenario.

Specifically, the scenario in which something you never thought to prepare for comes crashing out of the blue and ruins your day.

An illustration of a piece of paper labeled “Things to do today” that has a smoking hole through the middle of it
© MIO

The very nature of this unknown makes it hard to describe in detail before it happens; presumably, if you’d had any idea what it might be, you would have prepared for it.

It could be something like a new regulation, a natural disaster, or a change in direction from the organization’s leadership.

For illustrative purposes, let’s consider a natural disaster.

Imagine that your data quality plan calls for you to introduce a particular source system into the project as the next step. Just days before you begin, a wildfire starts near the source’s data center. The area is evacuated and the data center shut down. Normal operations may not resume for weeks.

If you committed to the trailblazer approach, you have two factors working in your favor in this scenario:

  • You (hopefully) equipped yourself with tools and process that are flexible, expecting the unexpected. That better prepares you to respond now.
  • Each of your projects has a tightly-defined scope, including this one. That means you have fewer resources at loose ends as you figure out the best way to adjust.

MITIGATE POLITICS

You have to endure office politics no matter what kind of project you’re working on, but that goes double when your project touches multiple people’s areas of authority.

Data quality projects will inevitably do this, introducing plenty of opportunities for politics to impede your work.

High-level politics

There’s the kind of intervention that disrupts the whole project: someone high-up comes in with a late-breaking opinion and the power to put the brakes on everything… and does.

An illustration of an unplugged power cord
© MIO

You can try to avert this kind of intervention ahead of time. But if it occurs, this kind of political delay can be treated more or less as a natural disaster.

The benefits of the trailblazer approach in this situation are the same ones outlined in the previous section: preparedness to adapt and limitations on the effects of the disruption.

Mid- and low-level politics

Mid- and low-level politics can also cause issues for data quality projects.

One of the most basic issues is simply disagreement on some point between stakeholders: what source is most important, or how a particular piece of data should be defined, or anything along those lines.

If your project requires every last conflict to be ironed out before you try to implement anything, you’ll be arguing hypotheticals for a long time. Probably long past the point when your competitors have put their data quality in place and started to see its benefits.

An illustration of a caution barrier
© MIO

The trailblazer approach helps alleviate these kinds of derailments in multiple ways.

For one, consider the scenario where stakeholders disagree on some fundamental aspect of the project. With the trailblazer approach, each project should have a tightly-defined scope.

When disagreement stalls one project, other projects where there is agreement can still proceed. The overall initiative doesn’t get hung up on a single dispute.

The trailblazer approach can also help resolve disagreements that revolve around probable outcomes: when a point of contention is discovered, you design a project that will reveal the answer.

The trailblazer approach already encourages this kind of focused project, so embarking on the new project requires no change to the general approach. With the debate set to be settled by the project’s outcome, the discussion around the project can move on.

GET FASTER RESULTS

When skeptical stakeholders (and/or auditors) ask questions about what kind of return you’re seeing from data quality, it’s a lot better to have tangible results than to ask them to come back in 12 months.

With the trailblazer approach, you can start getting results almost immediately. All this entails is defining your initial project scopes so that they are compatible with the timeframe you want to see returns in.

An illustration of a running cheetah
© MIO

Those initial results not only provide a more solid foundation for your future projects, but act as assurance to the skeptical that data quality isn’t vaporware.

Faster results also allow trailblazer users to achieve a level of self-driven course correction that isn’t possible during projects with a longer results cycle. This is another aspect of the trailblazer approach that can appeal to skeptical stakeholders.

HOW TO TRAILBLAZE

Real-life explorers didn’t all follow the same template for trailblazing. You could easily occupy your next year reading about their efforts.

Some of those real examples you don’t want to follow.

In 1878, for instance, George De Long and James Gordon Bennett Jr. were 100% sure there was an open polar sea, and they laid their plans for an Arctic voyage to the North Pole accordingly. Their ship was crushed in ice and sank, and 20 members of the 33-person crew died. They never reached their goal.⁶

An illustration of a three-masted ship trapped in ice
© MIO

Successful examples of trailblazing — the ones you want to follow — have some things in common:

  • A commitment to goals over process
  • Defined accomplishments
  • A sense of realism
  • Embracing the need for problem-solving

COMMIT TO GOALS, NOT PROCESS

Just like incremental doesn’t mean slow, exploring doesn’t mean aimless wandering.

The trailblazer process is all about setting goals. What it’s not about is deciding on a specific process, route, and direction for reaching that goal, then clinging to that decision no matter what happens along the way.

Our example real-life trailblazers Lewis and Clark had a very specific goal. Thomas Jefferson spelled it out:

The object of your mission is to explore the Missouri river, & such principal stream of it, as, by it’s course & communication with the waters of the Pacific Ocean, may offer the most direct & practicable water communication across this continent, for the purposes of commerce.⁷

Later in his instructions, Jefferson acknowledged the uncertainty surrounding the expedition:

As it is impossible for us to foresee in what manner you will be recieved by those people, whether with hospitality or hostility, so it is impossible to prescribe the exact degree of perseverance with which you are to pursue your journey… if a superior force… should be arrayed against your further passage… you must decline it’s further pursuit, and return.

He was clear about his reasoning:

…in the loss of yourselves, we should lose also the information you will have acquired. by returning safely with that, you may enable us to renew the essay with better calculated means.⁸

The primary thing that sets the trailblazer approach apart from the all-up-front approach is its prioritization of the goal over the process. When the process is secondary to the goal, it is acceptable — in fact, obligatory — to change the process.

This is useful in the short term, of course. If you’ve invested in a project and it’s not working, the trailblazer approach says you should cut your losses, consider everything you learned, and use that to guide you as you try again. It’s a much more appealing prospect than indefinitely sinking resources into something that isn’t working.

An illustration of a compass pointing north-northeast
© MIO

But it’s also useful in the long term. If some aspect of your organization changes so that the old process no longer works, you’re prepared to change the process to keep up. With the trailblazer approach, your data quality can evolve over the long term as the company does, rather than holding the company back.

Real-life example:

A company wanted to accomplish a particular kind of entity resolution. They assumed other organizations would already have tried the same thing and established an approach to solving the problem, but none had. Those who had tried something similar had created solutions with such tunnel vision that they couldn’t be applied to any other problem.

The company realized they’d need to figure things out on their own, and that they’d need a tool whose functionality wasn’t locked down to a particular use case. The trailblazer approach brought them the broad set of capabilities they needed, in a way they could shape to address their particular challenges.

DEFINE ACCOMPLISHMENTS

Lewis and Clark’s party didn’t disperse into ones and twos in order to cover as much ground as humanly possible, but they also didn’t refuse to split up at any cost.

With an eye on the main mission, they divided their efforts in the way that made sense toward achieving their overall goal, pivoting the goals of a particular day or undertaking to meet unexpected conditions.

Starting your project with a scoped, focused goal will get you much better results than a vague objective to “get data quality.” That means clear, measurable, and above all meaningful KPIs.

An illustration of three dials
© MIO

This is uniquely important to the trailblazer approach precisely because the commitment to a particular process is flexible. When the success of the project can’t be measured in terms of adherence to the predefined process, it has to be measured with respect to the goal.

In addition, data quality projects aren’t carried out by single individuals. They’re the product of a team, and having clear KPIs allows every member of the team to have a common vision of what the project is supposed to achieve.

It’s essential to remember that the KPIs should be meaningful — that they should really indicate an accomplishment. Sometimes, that means that a simple quality score for a data set isn’t enough.

You need to be thoughtful about choosing your KPIs and ensure that they reflect the accomplishment of the business goal, not just the completion of the mechanics of data quality.

Real-life example:

A company wanted to perform data quality on financial account data. The number of accounts with high-quality data was an obvious choice for a KPI. But this KPI alone was incomplete.

To reflect the company’s real goals, they also needed a KPI measuring the dollar amount associated with accounts that had low-quality data. This KPI could show the severity of the effects that low data quality had on the company.
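As a rough sketch of how the two KPIs might sit side by side, here are a few lines of Python; the columns, balances, and the quality rule are hypothetical, not the client’s actual logic:

import pandas as pd

# Hypothetical account extract.
accounts = pd.DataFrame({
    "account_id": [1, 2, 3, 4],
    "balance_usd": [12_000.0, 350.0, 98_500.0, 4_200.0],
    "owner_name": ["Ada Ltd", None, "Grace LLC", "Linus AB"],
    "iban_valid": [True, True, False, True],
})

# Hypothetical rule: low quality means a missing owner or a failed IBAN check.
low_quality = accounts["owner_name"].isna() | ~accounts["iban_valid"]

print(f"Accounts with low-quality data: {int(low_quality.sum())}")
print(f"Dollar amount exposed to low-quality data: ${accounts.loc[low_quality, 'balance_usd'].sum():,.2f}")

The first number tells you how much work there is; the second tells you why the business should care.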

BE REALISTIC

Data quality, once in place, makes exciting things possible. But the more resources you put into it, the more important it is to make sure you’re directing those resources appropriately.

Lewis and Clark’s expedition was not the first time that anyone of European descent ventured west of the Missouri River (in fact, they encountered the British at least once on their journey⁹), and of course the Louisiana Territory and beyond was well populated with Native Americans.

Lewis wasn’t going in completely blind. He knew about major features of the terrain: the existence of the Columbia River, that it emptied into the Pacific Ocean, and that there was a mountain range they would likely need to cross.

But he didn’t have information that he could truly rely on, not to the extent of trusting his own life and those of his fellow explorers to it.

An illustration of mountains, a river, and the mouth of a river
© MIO

If Lewis and Clark had chosen a purely top-down strategy, they would have picked a chart, or made some amalgamation of them, and decided to trust that. They would have equipped their party based on the distances, geological features, and timespans indicated by that chart. They would have left some space for the unexpected, but would otherwise have assumed that they were fully informed.

In the context of literal wilderness explorers, this pure top-down approach sounds obviously like a bad idea.

It’s just as misguided to deploy it in a data quality context, although you shouldn’t throw out every aspect of top-down:

  • If you have information about the types of challenges you’ll encounter, it’s smart to prepare assuming that you’ll have to confront those challenges, even if the information isn’t completely reliable.
  • The very act of defining your ultimate goals for data quality requires you to take a top-down perspective.

The key to being realistic in your use of top-down is to avoid overcommitment.

As an example, one of the most hazardous ways to overcommit comes, unfortunately, at the beginning of your data quality journey: when you pick a tool.

You should be guided by your top-down perspective when you consider factors like:

  • Type of tool. Do you want software, services, or both?
  • Technical needs. Is there existing data governance or other programs it needs to work with? What kinds of sources will it have to access?
  • Known problems. Do you know you’ll have to perform international address cleansing, or find errors in goAML files?

But you have to remember that your plan isn’t perfect, and your tool needs to be ready to help you with what you aren’t expecting. A tool that can do exactly what you think you need isn’t enough, at least not for long-term success.

Your first data quality effort can’t have the goal of overcoming all the data quality challenges of your entire company. That idea is as unrealistic as it would have been for Lewis and Clark to attempt to map the entire continent in one expedition.

An illustration of menacing-looking mountains
© MIO

Instead, choosing realistic, achievable goals opens the doors to success, both now and long-term.

Your first project should give you insight: into your data, your systems, and the realities of how deploying data quality in your organization’s culture will work. That puts you in the strongest position to achieve your ultimate long-term goals.

Real-life example (and warning):

We’ve seen it more than once: a successful pilot project concludes, but the next initiative is supposed to have a much larger scope, making it “important enough” for the top-down approach. Planning for that larger project then stalls out indefinitely.

PROBLEM-SOLVE

Uncertainty is inherent in data quality. For the best chance of success, you need to embrace opportunities to solve problems on the fly.

Some “theoretical geographers”¹⁰ of Lewis and Clark’s time believed that the Missouri River’s headwaters were very close to the Columbia River. But Lewis had also heard the opposite — he “had read somewhere that many miles of treeless plains bordered the upper Missouri.”¹⁰

This presented a problem. The expedition would set out on the Missouri River by boat. If there were no trees once they reached its headwaters, a portage between rivers, not to mention building boats for the trip down the Columbia, would be impossible.

So Lewis designed a collapsible iron boat frame. The frame was to be carried on land, and covered in hide when it was needed. It was a strong example of problem-solving spirit, although it didn’t work as he’d hoped.¹¹

The Lewis and Clark expedition was also prepared to live off the land. They had some specialty equipment, like the iron boat, but they also had the basics like adzes, axes, and chisels, so they could create what they needed.¹²

To succeed at data quality, you need to equip yourself with the tools that you’ll need to problem-solve — especially for the problems you don’t foresee.

An illustration of a screwdriver, a hammer, and an adjustable wrench
© MIO

If you don’t have the fundamental capabilities to perform data quality at the most foundational levels, encountering the unexpected will be a serious setback.

No matter how confident your theoretical geographers are in their specialty tool recommendations, the fundamentals — finding data, exploring it, profiling it, extracting it — are what will let you bail yourself out when the unexpected happens.
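As one small illustration of those fundamentals, even a few lines of exploratory inspection can tell you what an unfamiliar source actually contains before any specialty tooling gets involved. This sketch assumes a SQL database you can already reach, and the connection string is a placeholder:

from sqlalchemy import create_engine, inspect, text

engine = create_engine("postgresql://user:password@host/dbname")  # placeholder connection string
inspector = inspect(engine)

with engine.connect() as conn:
    for table in inspector.get_table_names():
        columns = [col["name"] for col in inspector.get_columns(table)]
        row_count = conn.execute(text(f'SELECT COUNT(*) FROM "{table}"')).scalar()
        print(f"{table}: {row_count} rows, columns: {columns}")

It isn’t sophisticated, and it isn’t meant to be: it’s the kind of basic capability that keeps an unexpected source from becoming a dead end.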

Real-life example:

An organization needed data quality to support a project that was driven by regulatory objectives and had to respond to several government bodies. Although initial requirements were set, the organization knew that regulations are prone to change.

When searching for the tool, they recognized the value of flexibility for meeting future requirements.

Within six months of initial deployment they needed to generate new metrics and add to their reporting. The flexibility of the trailblazer approach to data quality in their project was instrumental in making the right information available.

CONCLUSION

Data quality can be transformative for an organization and its operations.

But planning for data quality isn’t straightforward. A purely top-down approach will be based on assumptions, inviting delays and derailment when something unforeseen comes up — which is inevitable.

Yet resolving all those assumptions before planning means carrying out data quality in order to plan data quality. And a purely bottom-up approach doesn’t provide enough structure or framework to support a long-term deployment, particularly in an enterprise environment.

The answer lies in following the example of real-life successful trailblazers like the Lewis and Clark expedition.

With the trailblazer approach, you set top-down goals that are guides, while remaining flexible on the route you’ll take to get there. You use small-scale projects to build knowledge and experience, rather than risking all your resources on one project.

The trailblazer approach is an approach of flexibility and problem-solving, committed to accomplishing specific goals rather than carrying out a specific process.

With trailblazer data quality, you can launch informed data quality in a way that minimizes the risk of the inevitable “unknown unknowns” you’ll face, and prepares you for sustainable, long-term success.

¹ “Ensuring the quality of ‘fit for purpose’ data”: John Ladley, https://www.cio.com/article/3124402/analytics/ensuring-the-quality-of-fit-for-purpose-data.html, 17 October 2016.

² Gartner Magic Quadrant for Data Quality Tools: Mei Yang Selvage, Saul Judah, Ankush Jain, 24 October 2017

³ “The Way to the Western Sea: Lewis and Clark Across the Continent”: David Lavender, 1998. University of Nebraska Press. p 29.

⁴ “Rumsfeld’s unknown unknowns take prize”: John Ezard, https://www.theguardian.com/world/2003/dec/02/usa.johnezard, 1 December 2003.

⁵ “DoD News Briefing — Secretary Rumsfeld and Gen. Myers”: Federal News Service Inc., http://archive.defense.gov/Transcripts/Transcript.aspx?TranscriptID=2636, 12 February 2002

⁶ “In the Kingdom of Ice: The Grand and Terrible Polar Voyage of the USS Jeannette”: Hampton Sides, 2015. Doubleday.

⁷ “The Way to the Western Sea: Lewis and Clark Across the Continent”: David Lavender, 1998. University of Nebraska Press. p 390.

⁸ “The Way to the Western Sea: Lewis and Clark Across the Continent”: David Lavender, 1998. University of Nebraska Press. p 392.

⁹ “The Way to the Western Sea: Lewis and Clark Across the Continent”: David Lavender, 1998. University of Nebraska Press. p 152.

¹⁰ “The Way to the Western Sea: Lewis and Clark Across the Continent”: David Lavender, 1998. University of Nebraska Press. p 25.

¹¹ “The Way to the Western Sea: Lewis and Clark Across the Continent”: David Lavender, 1998. University of Nebraska Press. p 226–228.

¹² “The Way to the Western Sea: Lewis and Clark Across the Continent”: David Lavender, 1998. University of Nebraska Press. p 24.


MIO - the data experts

We provide boutique data consulting for companies that need real, practical solutions. Our experts can handle even the stickiest data problems.