Blockchain Tracing & Determinism

Much like phrenology, blockchain tracing has seeped into jurisprudence and been undeservedly elevated to the level of forensic science. But is it sufficiently deterministic to be employed so unabashedly?

DataFinnovation - ChainArgos - 4AC
Published in ChainArgos
Sep 5, 2024


A deterministic system is one where “trying again” ought to be an exercise in futility.

When blockchain tracing encounters a legal system there is often some discussion concerning whether the methods are “deterministic” or not.

But the word “deterministic” has a very specific definition that does not map well to everyday experience or colloquial use of the term.

It is common to see practitioners of “blockchain tracing” use the term “deterministic” loosely, which only generates more confusion, not less.

Here we are going to explore this confusion around the term “deterministic,” clarify concepts so we can move beyond colloquial vocabulary, and discuss how most blockchain tracing tools in use are deterministic under some definitions and non-deterministic under others.

To oversimplify, a deterministic system is one where the same set of inputs to the system yields the same set of outputs.

For instance, a DNA test shouldn’t produce two different results from one saliva sample. If this were the case, DNA evidence would be inadmissible in court, because such a test would not be sufficiently deterministic.

Similarly, it is entirely possible for a blockchain tracing methodology to be deterministic only in parts, which, strictly speaking, makes it non-deterministic in its entirety.

For instance, if a blockchain tracing system were deterministic, we would expect the same blockchain address to return the same traces regardless of when the tracing was conducted.

In the end, as we will see, many components of blockchain tracing tools are deterministic — but many tracing workflows are not.

While this difference may appear subtle, it is critical when considering how to evaluate the reliability of both blockchain tracing tools and methodologies, especially for criminal cases.

A blockchain tracing algorithm can be deterministic while, simultaneously, the tracing process built around it is not.

For the American system of jurisprudence, at least, that subtle difference where the algorithm is deterministic but the blockchain tracing process itself is not could be sufficient to create “reasonable doubt.”

Just as we don’t expect the defendant’s DNA test to come back with a different result depending on the day of the week it was taken, courts shouldn’t rush to admit blockchain tracing testimony in court until we know whether the day of the week affects the trace results.

Deterministic

But first, let’s understand what the term “deterministic” means. Wikipedia defines it as follows:

a deterministic system is a system in which no randomness is involved in the development of future states of the system.

This is clear as far as it goes.

An easily-testable consequence of a system being deterministic is that when presented with the same inputs it always generates the same outputs.

That sort of behavior is not the definition of “deterministic” — but an algorithm in which there is zero randomness will always give the same output for the same input.

We suspect that when most people hear that something is “deterministic” this is what they understand.

Note to programmers: strictly speaking, if your program depends on any undefined or unspecified behavior (like dictionary ordering in older versions of Python, or some uses of goroutines in Go), then it is not deterministic. There are programs that were not deterministic and became deterministic simply through a change of system version. This is why it is important to be careful when making statements about determinism that carry legal implications.
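As a small illustration of how easily this happens, consider the following Python sketch. Since Python 3.3, string hashing is randomized per interpreter process unless PYTHONHASHSEED is fixed, so any code that quietly depends on the iteration order of a set of strings can behave differently across runs. The addresses here are made up.

```python
# Accidental non-determinism in ordinary code: string hashing is randomized
# per process (unless PYTHONHASHSEED is fixed), so the iteration order of a
# set of strings can differ from one run of this script to the next.

addresses = {"0xabc", "0xdef", "0x123", "0x456"}  # made-up addresses

# Membership tests are deterministic...
print("0xabc" in addresses)   # always True

# ...but code that quietly depends on iteration order is not.
print(next(iter(addresses)))  # may print a different address on each run

# Running with PYTHONHASHSEED=0 (or any fixed seed) restores run-to-run
# reproducibility without changing a single line of the program.
```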

Non-Deterministic

Now let’s see what the opposite of “deterministic” means. Again, a good place to start is the Wikipedia definition:

a nondeterministic algorithm is an algorithm that, even for the same input, can exhibit different behaviors on different runs, as opposed to a deterministic algorithm

However, being “non-deterministic” is a far more subtle property than many realize.

And as we will see, there are startlingly close connections between the way these subtleties are treated in theoretical computer science and the way they arise for blockchain tracing practitioners.

In Savage’s standard intro theory textbook we get a brief history lesson and discussion on the subject:

Nondeterministic models — models that may have more than one possible next state for the current state and input — were introduced during this time as a way to classify languages.

One might ask if such a model has any use, especially since to the untrained eye a nondeterministic machine would appear to be a dysfunctional deterministic one. The value of an NFSM [non-deterministic finite state machine] is that it may recognize languages with fewer states and in less time than needed by a DFSM [deterministic finite state machine].

This question of whether non-deterministic machines can be faster is related to the famous P vs NP problem and is likely why we have such a deep and detailed analysis to fall back on. A common construction used to define and study this problem is described in the same text as:

An NFSM can be viewed as a purely deterministic finite-state machine that has two inputs, as suggested in Fig. 3.7. The first, the standard input, a, accepts the user’s data. The second, the choice input, c, is used to choose a successor state when there is more than one.

The “choice input” is a second input source supplied by someone who is, in essence, a “good guesser.”

Note that something similar exists for Turing Machines and all the other models of computation.

The key point here is that theoretical computer scientists model the “non” in non-deterministic systems as a second input source.

Here we find our connection to blockchain tracing, because the “choice input” looks a lot like someone vetting blockchain address tags and keeping the system up to date. It is a second pair of hands that also exercises some control over the system.
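The construction is easy to sketch in code. The toy transition table below is our own invention, not taken from Savage’s text; the point is only that the step function itself is perfectly deterministic once the choice input is supplied.

```python
# A sketch of the textbook construction: a nondeterministic machine viewed
# as a deterministic machine with a second, "choice" input.

# delta maps (state, input symbol) to the set of possible next states.
delta = {
    ("q0", "a"): {"q0", "q1"},
    ("q1", "a"): {"q2"},
}

def step(state, symbol, choice):
    """Purely deterministic: the same (state, symbol, choice) triple always
    yields the same next state. All of the nondeterminism lives in whoever
    supplies the choice input."""
    successors = sorted(delta.get((state, symbol), set()))
    if not successors:
        return None
    return successors[choice % len(successors)]

# Identical user data, different choice inputs, different runs.
print(step("q0", "a", 0))  # q0
print(step("q0", "a", 1))  # q1
```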

Tracing

Notice that under the definitions above we are always talking about algorithm input.

But the input is not merely the user’s question.

If a blockchain tracing system relies on any tags or labels or associations that are maintained outside the tracing algorithm then the tracing algorithm can be viewed as treating that information as a “choice input.”

This presents us with two options — the set of blockchain address tags is either part of the “input” or part of the “algorithm.”

And while that choice may sound like a technicality, it has a material impact on how the output of any blockchain trace ought to be interpreted.

Let’s look at each case.

Where the Address Tagging is Part of the “Input”

If the blockchain address tagging is part of the “input,” then the tracing algorithm is non-deterministic from the perspective of someone who only does tracing.

The tags could be changed at any time; recall that the tags are part of the “input” (i.e., one of the sets of variables fed to the blockchain tracing algorithm).

It may be possible for a given blockchain tracing system to be configured such that tags are frozen as-of some time for a given trace.

But unless that happens — and is clearly documented along the way — the tags serve as the choice input in what is now a non-deterministic setup under standard theoretical computer science definitions.
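To make the “tags as input” framing concrete, here is a minimal Python sketch. The tag history, the tags_as_of helper, and the toy trace function are all our own illustrative assumptions, not any vendor’s data model; the point is only that freezing the tag input as of a timestamp restores reproducibility.

```python
from datetime import datetime, timezone

# Hypothetical tag history: address -> list of (effective_time, label).
TAG_HISTORY = {
    "0xabc": [(datetime(2024, 1, 1, tzinfo=timezone.utc), "exchange"),
              (datetime(2024, 6, 1, tzinfo=timezone.utc), "mixer")],
}

def tags_as_of(when):
    """Return the tag set frozen at time `when` (the 'choice input')."""
    snapshot = {}
    for addr, history in TAG_HISTORY.items():
        labels = [label for ts, label in history if ts <= when]
        if labels:
            snapshot[addr] = labels[-1]
    return snapshot

def trace(address, tags):
    """Toy 'tracing' step: deterministic in (address, tags)."""
    return tags.get(address, "unknown")

# Same address, different snapshots of the tag input, different answers.
print(trace("0xabc", tags_as_of(datetime(2024, 3, 1, tzinfo=timezone.utc))))  # exchange
print(trace("0xabc", tags_as_of(datetime(2024, 7, 1, tzinfo=timezone.utc))))  # mixer
```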

Where the Tags Are Part of the Tracing Algorithm

If, instead, the address tags are considered part of the algorithm then we have a different problem.

In this case, a different algorithm is being applied each time the tags are updated.

So while the algorithm is “deterministic” — the investigators who use the blockchain tracing tool are working with a constantly-changing algorithm (because the tags are constantly being updated and changed).

Test results, certifications and other claims about system performance would need to be re-checked often.

How often?

As we will discuss below, this question itself requires a scientific process to answer.
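If instead the tags are treated as part of the algorithm, one modest piece of bookkeeping is to fingerprint the tag set, so that it is at least possible to say when the effective algorithm changed and therefore when prior test results stopped applying. The sketch below is our own illustration, not a description of any existing tool.

```python
import hashlib
import json

def tagset_fingerprint(tags):
    """Content hash of the tag database. If the tags are considered part of
    the algorithm, a new fingerprint means a new algorithm, and any prior
    validation results technically apply to a different system."""
    canonical = json.dumps(tags, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

v1 = tagset_fingerprint({"0xabc": "exchange"})
v2 = tagset_fingerprint({"0xabc": "exchange", "0xdef": "mixer"})
print(v1 == v2)  # False: the 'algorithm' changed when the tag was added
```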

So?

You may be wondering if this is really a problem.

Who cares if occasionally more OFAC addresses are tagged in the system? But that is not the right question to ask.

Rather, consider what happens if address tags are updated as a result of findings made with the tool itself.

What if an investigator, while looking at a case, updates tags relevant to that case?

This is not deterministic at all.

A human made a machine-assisted decision which changed the outcome of future queries presented to the system.

Humans are not deterministic!

And if human decisions are inserted between two runs of an otherwise-deterministic algorithm in ways that can influence the results of the second run, the overall process is non-deterministic.

An Example

Imagine an investigator looking at a darknet marketplace or mixer service or some other blockchain-hosted illicit service with reasonable complexity and a lot of addresses.

Now assume the government seizes the servers used to run this illicit service and hands a spreadsheet of addresses involved in the illicit service to a blockchain analytics company.

For our purposes let’s just pretend the list of addresses was stored on the servers in a clearly-marked spreadsheet and not talk about how the process of searching those servers may itself be non-deterministic.

That blockchain analytics company then runs a deterministic algorithm to identify users of the service.

So far this is unquestionably a deterministic process.

Now what if the government goes and arrests a few large users and finds they all have interactions with unknown addresses which all transact with entries on that spreadsheet?

What if these users transacted with those intermediary addresses believing them to belong to the service originally being investigated? Now we are making decisions based on human-to-human discussions and inferences.

If an investigator has these new addresses tagged as part of the darknet service is the system deterministic? Or if they are tagged as a new darknet service is that deterministic?

Or what if an investigator identifies addresses which only transact with entries from the spreadsheet and has them tagged as part of the service as well? Maybe they had a discussion with their supervisor about whether this meets the required standards to tag? All of these are human decisions.

The tracing software may be deterministic — but when relevant tags are changed during analysis the overall process may not be. The more back-and-forth there is the worse it gets.

Creative readers are probably already working through the conditions under which future traces should and should not be properly called deterministic.

As none of these rules are codified anywhere, people will talk past each other until these issues are resolved.

To compound the problem now let’s introduce a second investigator.

If the second investigator’s queries depend on any of the decisions made by the first investigator, is the second investigator’s use of these blockchain tracing tools deterministic? Do they even know?

One solution, of course, is for each investigator, before starting a new case, to note the date and time and then — assuming the system can even do this — supply that timestamp with every query that concerns the case.

This may feel heavy-handed, but without some such discipline anyone working with continually-updated software — particularly where some updates stem from the results of prior use of the same software — is not undertaking a deterministic investigative process.
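In code, that discipline might look something like the following minimal sketch. It reuses the hypothetical tags_as_of helper from the earlier example; the class and its methods are illustrative assumptions, not a feature of any particular tracing product.

```python
from datetime import datetime, timezone

class CaseSession:
    """Pin every query in a case to one as-of timestamp so that later tag
    edits cannot silently change answers within the same investigation."""

    def __init__(self, case_id, tag_snapshot_fn, as_of=None):
        self.case_id = case_id
        self.as_of = as_of or datetime.now(timezone.utc)
        # Freeze the 'choice input' once, when the case is opened.
        self.tags = tag_snapshot_fn(self.as_of)

    def trace(self, address):
        # Deterministic for the life of the case: same address, same answer.
        return self.tags.get(address, "unknown")

# case = CaseSession("case-042", tags_as_of)
# case.trace("0xabc")  # stable no matter how the live tag database changes
```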

System Reliability & Testing

All of this has a huge impact on how we should think about reliability and testing blockchain tracing.

For a deterministic system tests are easy — the same input should always yield the same output.

So whenever changes are made such that the same inputs no longer yield the same outputs — say a bug is fixed, or new algorithms are deployed (nobody expects systems to remain static forever) — any reliability testing that was done previously needs to be redone.

This includes error analyses like precision and recall, sensitivity and specificity, and any other statistical measures that would be of interest to users of these blockchain tracing tools.
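For concreteness, recomputing two of those measures takes only a few lines of code. The sketch below is generic and the example values are invented; it simply shows what “redoing” an error analysis after a change means in practice.

```python
def precision_recall(predicted, ground_truth):
    """predicted / ground_truth: sets of addresses the tool, respectively the
    truth, says belong to the service. Any change to tags or algorithm means
    these numbers must be recomputed before old claims are reused."""
    true_pos = len(predicted & ground_truth)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Invented example values, purely for illustration.
pred = {"0xabc", "0xdef", "0x123"}
truth = {"0xabc", "0x123", "0x456"}
print(precision_recall(pred, truth))  # (0.666..., 0.666...)
```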

And this is precisely how traditional crime labs work with respect to both periodic testing of existing equipment and the qualification of new equipment.

Further, there is a large body of research which tells us how often microscopes and PCR machines require recalibration and maintenance.

The US Federal Aviation Administration has extensive rules which control when an airplane design change requires a full recertification or a smaller “supplemental” process.

These are hard problems and we should all hope a lot of science was involved in drafting the criteria for the “Changes to type certificates affecting fuel tank flammability” and “Aging Airplane Safety — Damage Tolerance Data for Repairs and Alterations” rules.

Blockchain tracing is, conceptually, no different.

Now notice that for a system where the algorithms are deterministic but the tags are constantly changing, we are going to need to redo a lot of testing (if any testing was even done to begin with).

How many tag changes necessitate a full re-run of all sensitivity analysis?

Who knows?

That in and of itself requires study.

Realities

We are not suggesting that every time a tag is changed, a feature added, or a bug fixed, all confidence in a blockchain tracing tool should be reset to zero until a full battery of tests is completed.

But, at the same time, we also do not think most blockchain tracing procedures can be properly called “deterministic.”

There is also plenty to suggest most blockchain tracing systems have woefully inadequate reliability testing, or worse still, none at all.

The default position should be that any non-trivial change to a blockchain tracing system necessitates a full re-test.

But evidence that certain types of changes have limited or no impact on results should also be taken into account.

Adding tags for a service that never appears during an entire investigation is not going to render that system non-deterministic with respect to that specific investigation.

But adding tags for the service under investigation, as discussed above, surely does.

Note also that reliability standards — backed by actual as opposed to pseudo-science — are needed to evaluate test results.

Software changes will sometimes result in output changes, but it cannot be the case that two different sets of outputs from the same deterministic process are both 100% correct.

Further, a change that moves from 95% to 96% “accuracy,” on whatever measure, is probably an improvement but there need to be standards and rules.

Similarly a small change to a blockchain tracing algorithm may not impact reliability.

Or it might.

This needs to be tested.

Very small changes in software can cause massive changes in results.

A one-line change could have prevented Ariane 5 Flight 501 from exploding or allowed Mariner I to reach Venus.

A single line could also have saved the Mars Climate Orbiter.

Of course these problems are not unique to rockets.

But space programs represent such giant and high-profile uses of public funds that failures will always make the newspaper and investigations will be both thorough and public.

And, because it is impossible to fully test a function that only runs ten months after launch and hundreds of millions of kilometers away, mistakes will get through.

If blockchain tracing is to be used in court alongside DNA and fingerprint analysis the vocabulary and testing procedures need to get a lot closer to rockets than they are today.
