#SSI101: Attest, Identify, Authenticate, and Verify

The core “functions” or verbs of today’s and tomorrow’s identity systems

Juan Caballero
Spherity
13 min read · Nov 11, 2019


Identity systems can be very abstract, heady things to grasp intuitively. One good trick for making glossaries more concrete and human-scaled is to avoid defining nouns and put the emphasis on verbs: who does what, and how, in each system?

To answer this question, we will walk through four key verbs (attest, identify, authenticate, and check), two more esoteric ones (federate and verify), and the unsung hero of all six, register. Together they offer seven insights into what’s “centralized” about today’s traditional and federated identity systems, and what “decentralized” identity means in practical, concrete terms.


1.) Attest

Attest is a fancy word for “say,” a little more deliberate and substantial than “say” but not quite as concrete or consequential as “testify” or “swear,” which are two specific ways of attesting that stake the attestor’s reputation on the accuracy of what is attested, or attach consequences to getting it wrong. Statements like “ice cream tastes good,” “this mushroom is deadly,” “I once scored a perfect 300 bowling,” or even “I pronounce you man and wife” are all attestations: statements the speaker believes to be true, or at least claims to believe. Attestations can be inaccurate about things the speaker doesn’t really know, or they can be lies, but they are said as if they were true by someone who believes them or at least reasonably impersonates a believer. Attestations all carry the implicit fine print that the internet often abbreviates as “AFAIK”: “as far as I know.”

Attestations make up the lion’s share of information and data

Despite a common popular misconception to the contrary, very little information in the real world, and very little data in the digital world, could be called “factual” in the colloquial sense of completely trustworthy or uncontroversially true. Instead, the overwhelming majority of recorded information is made up of attestations, each a grain of sand in a Sahara of information that’s true as far as anyone knows.

Our modern, digital life is no exception. Most of the “hard data” we hear so much about are attestations as well: readings from fallible sensors, outputs from imperfect software networked insecurely, opinions of experts, estimated calculations. Robots, hardware, and software are wrong, mistaken, miscalibrated, out of context, or malicious about as often as humans are, so computer science has historically borrowed from law and philosophy many fine-grained distinctions for relative measures of truth and trustworthiness. Keeping straight who said what, and which machines were the source of what attestations, and then using that context to weigh and interpret data accordingly, is perhaps the most important tool in the toolkits of data science, machine learning, and business intelligence. Or to put it another way, data science has just as big an identity problem as the internet economy does.

The problem with attestations is that without an identity attached to them, they are not worth very much. After all, the “I” in AFAIK needs to be definitively ascertained for any attestation to have enough context and scope to be trusted or analyzed. Unidentified attestations are just anonymous rumors, information impossible to trace or verify. The first step in evaluating an attestation is identifying who said it.
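To make the point concrete, here is a minimal sketch of an attestation as a data record. The `Attestation` class, its field names, and the `can_be_evaluated` helper are illustrative assumptions, not part of any particular identity system; the only idea taken from the text is that a claim without an identified attestor cannot be weighed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attestation:
    """A statement plus the minimum context needed to weigh it (hypothetical model)."""
    claim: str               # what was said
    attestor: Optional[str]  # who (or which machine) said it; None if unknown


def can_be_evaluated(attestation: Attestation) -> bool:
    # Without an identified attestor, the claim is just an anonymous rumor.
    return attestation.attestor is not None


rumor = Attestation(claim="this mushroom is deadly", attestor=None)
expert = Attestation(claim="this mushroom is deadly", attestor="mycologist-42")
```

Both records carry the same claim; only the second gives a reader anything to trace, trust, or dispute.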

2.) Identify

We started off this glossary looking at identities, which are the basic unit of any identity system, and used the example of mushroom identification. In that context, you know you are holding a mushroom, but you want to know exactly which kind. This leads to a series of measurements and yes/no questions that can help you pin down the genus and species for safety or culinary purposes. This is a fairly typical method for identifying anything.

In the worlds of business and data, however, finding out what kind of human or machine you are dealing with is usually trivial or given by context; identification processes are overwhelmingly geared instead towards the unique and individual identity of your interlocutor. It doesn’t really matter whether a powerful corporation is trying to authenticate its interlocutor or two humans are haggling over the attestations made in an eBay listing: the process is essentially the same, at its lowest level. It begins by securing a communication channel and trying to identify who or what is on the other side. For humans, this usually means finding a name: a “legal” and confirmed name if money is changing hands, or a pseudonym or “handle” in contexts where less trust is required. For machines or software, a unique manufacturer’s serial number or other immutable identifier is usually best, since anything not verifiably fixed in hardware is easily falsified (or “spoofed” in cybersec terminology).

Human identity in identity systems is always a round peg in a square hole

So with humans and machines, just as with mushrooms, identifying your interlocutor is some combination of asking it questions, measuring it, and analyzing it for “fingerprints” and other mathematical indexes of its unique or categorical identity. Online and elsewhere in the digital world, the analysis and the measurements usually happen automatically and quietly in the background, looking at the context of communication, the identity of middlemen like browser agents or intermediary IP addresses that helped secure the communication channel, etc. These background checks mostly establish context and the kind of human or machine involved. The fundamental and most important question, however, is asked directly of the interlocutor: who are you? How they answer is largely on the honor system, unless you require that interlocutor to authenticate itself.

3.) Authenticate

‘Authenticate’ is a fancy word for “prove you are who you say you are.” This can take many forms, and increasingly in today’s world, it takes multiple forms at once, in so-called “two-factor” or “multi-factor authentication.” When logging in to an online system or physically presenting yourself at a manned security gate, you are often asked to prove you are the same person today that you were last week by providing a username and password or a state-issued identification, but also by confirming current control of a phone number, a credit card, a biometric such as a fingerprint or a facial scan, a magnetic fob, an authentic badge, a cryptographically generated, time-specific PIN spat out by a token-based authenticator like Authy or Google Authenticator, etc. Traditionally, these are called “factors” because they take different verbs: one thing you are, one thing you know, and/or one thing you hold (i.e. possess).
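The time-specific PIN mentioned above is commonly produced with the TOTP algorithm (RFC 6238). What follows is a minimal sketch using only the Python standard library, assuming the usual defaults of HMAC-SHA1 and a 30-second time step; the function name `totp` and its parameters are illustrative, and a production system should rely on a vetted library rather than this sketch.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Derive a time-based one-time PIN from a shared secret (RFC 6238 style)."""
    if at_time is None:
        at_time = time.time()
    # Number of whole time steps since the Unix epoch.
    counter = int(at_time // step)
    # HMAC over the counter, encoded as an 8-byte big-endian integer.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

With the test secret from RFC 6238 (`b"12345678901234567890"`) and `at_time=59`, an 8-digit code comes out as `94287082`. Note that the secret is the pre-registered part: both the token and the registry must already hold it, which is the “thing you hold” being proven.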

Humans authenticate themselves by showing tokens to registries, mostly.

You could say that all of these authentication “factors” or “vectors” are variations on the theme of holding up a key. To extend that metaphor, we could say that each such key only unlocks a pre-existing and pre-defined lock: a phone number, credit card, fob, etc., has to be pre-registered in a database and bound to the identity of the person, as does the token that generates a time-specific PIN or the biometric measurement against which a fresh biometric reading is compared.

In a sense, then, all forms of authentication require you to check someone’s keys against a lock specially designed and encoded to open only for that key, i.e., for the identity of which that key provides proof.

4.) Check (aka “Look up”)

Having been presented with the right kind(s) of key(s), the identifying party checks them to confirm the authentication. Against what do you check someone’s keys, to see if they match the claimed identity? Why, against an identity registry of course.
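As a rough sketch of that check, the snippet below compares a presented key against a pre-registered row. The in-memory `registry` dictionary and the `register` and `check` functions are hypothetical stand-ins for a real identity database; the salted PBKDF2 hashing is one common way to avoid storing the key itself, not something the text prescribes.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory registry: identity -> (salt, derived key).
registry = {}


def _derive(secret: str, salt: bytes) -> bytes:
    # Store a slow, salted hash rather than the secret itself.
    return hashlib.pbkdf2_hmac("sha256", secret.encode("utf-8"), salt, 100_000)


def register(identity: str, secret: str) -> None:
    """Create the 'lock': bind a key to an identity ahead of time."""
    salt = os.urandom(16)
    registry[identity] = (salt, _derive(secret, salt))


def check(identity: str, presented_secret: str) -> bool:
    """Look up the pre-existing row and test the presented key against it."""
    row = registry.get(identity)
    if row is None:
        return False  # no row, no lock, no authentication
    salt, stored = row
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(stored, _derive(presented_secret, salt))


register("alice", "correct horse battery staple")
```

Nothing can be checked for an identity that was never registered, which is why registration quietly precedes every other verb.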

These take many forms, but any registry of personal information that can be used to correlate any of the above keys to an individual person (or machine) is an identity registry, in functional terms and GDPR terms as well. Identity functions should comprise a layer that is “missing” from the modern software stack, a gap filled by a “patchwork of stop-gap solutions,” to quote Kim Cameron’s influential 2005 essay. Even before the age of software, it was already the weakest layer in the “legal stack” of modern economics, so paper-based identity wasn’t that effective or free beforehand. Whether physical, digital, or paper-based, all identity registries are at once highly powerful, increasingly valuable, and perennially fragile accumulations of data. If that sounds morally nefarious or legally daunting, it should: identity registries concentrate immense value and incur immense risk! Anyone who tells you otherwise is probably trying to sell you something, and that something is probably a new and improved identity registry that they have patented.

Tokens correspond to a pre-existing row in a database, in most cases.

Most corporations do not want to administer their own identity registries, and outsourcing these liabilities (and responsibilities) to national and global “identity providers” is usually an obvious choice for any business not predicated on exclusive or direct access to these functions and metadata. These large “identity providers” can build identity infrastructure on an economy of scale and invest accordingly in security and maintenance. Trusting a third party to authenticate is so commonplace we don’t even think about it much.

Think, for instance, of the ubiquity of the real-world requirement to prove your identity with a “State-issued identification,” in situations where a nation-state’s word can be assumed to provide adequate assurance. Online, this is a problematic requirement not only because paper credentials are hard to authenticate remotely, but because the internet is global enough to raise the question of which nation-state. Online, it is more common to rely on identity giants like Google, Facebook or Microsoft for identity assurance and correlation, or to rely on conventional commercial banks for higher-risk authentication when [regulated] transactions are at play.

This division of labor brings its own problems of centralization and security, however. Over the course of the history of the internet, these companies have grown fewer in number and larger in size, as well as growing increasingly powerful and central to the software industry. Their security and maintenance liabilities have also grown over the years, as identity theft and other forms of abuse have “industrialized” in reaction to this concentration of power and information worth stealing.

5.) Federate

In recent years, even these identity giants have increasingly been “outsourcing” and dividing their labors between one another, in what’s called a federated identity model. By agreeing to pool resources and trust each other’s registries wherever certain generic standards are met, identity providers in a “federation” (typically built on protocols such as OAuth, OpenID Connect, or SAML) allow end-users to authenticate directly (to a registry they own and manage) or indirectly, via any other fully-compliant member of their federation of registries. In a sense, this greatly extends the number of “locks” a given key can open, and shares overhead between members of such a federation.
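The direct and indirect paths can be sketched in a few lines. The `Registry` and `Federation` classes below are hypothetical and heavily simplified: in real federation protocols the user is redirected to their own identity provider, so the relying service never sees the key itself. The sketch only shows how trust in one member’s registry extends to the others.

```python
from typing import Dict, Iterable, Optional


class Registry:
    """A single identity provider's registry (keys stored in the clear for brevity)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._keys: Dict[str, str] = {}

    def register(self, identity: str, key: str) -> None:
        self._keys[identity] = key

    def authenticate(self, identity: str, key: str) -> bool:
        return identity in self._keys and self._keys[identity] == key


class Federation:
    """Members agree to accept each other's authentication results."""

    def __init__(self, members: Iterable[Registry]) -> None:
        self.members = list(members)

    def authenticate(self, home: Registry, identity: str, key: str) -> Optional[str]:
        """Return the name of the registry that vouched for the identity, or None."""
        if home.authenticate(identity, key):  # direct: the service's own registry
            return home.name
        for member in self.members:           # indirect: any other member
            if member is not home and member.authenticate(identity, key):
                return member.name
        return None


shop = Registry("shop")
big_idp = Registry("big-idp")
shop.register("bob", "bob-key")
big_idp.register("alice", "alice-key")
federation = Federation([shop, big_idp])
```

Here `federation.authenticate(shop, "alice", "alice-key")` succeeds even though the shop never registered Alice: her key opens a lock that lives in someone else’s database.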

Linking databases together can make authentication (against a distant registry) feel effortless and natural.

The only problem is that it still centralizes power and hackworthy identity data in relatively few registries. While federation models do encourage collaboration on standards and flexibility, each individual registry is still in a position of power over each registree, or in some ways over the registrees of the entire federation, since that power still includes censorship, vendor lock-in, and other monopolistic business practices. Federation can seem expansive and almost global in scale, but there is still an inside and an outside, without it being obvious who made the decision when you suddenly find your files locked or your identity unlisted. Furthermore, the members of such federations share integrity and security risks as well as costs, meaning each member’s assets are only as safe from fraud and abuse as the least-secure member of their federation.

All of the steps above authenticate individual actors and their keys or tokens. Once these steps have securely identified the attestor, how do we verify the attestation, particularly if it was not made directly to us, but documented and circulated after the fact?

6.) Verify

Often, we care more about the veracity of the documented attestation than the individual making it; documented attestations can be verified in ways analogous to how individuals are, through signatures, keys, and registries. When we do not want to or cannot communicate with the attestor directly, we might take an attestation straight to a trusted registry and query it about the attestation and/or the attestor.

Nowadays, most verification processes are based on the so-called “phone home” principle. This can be metaphorical in the digital world, but often in the physical world it consists of literally calling (or emailing) someone. It could mean emailing someone’s references or previous employers, logging into a bank portal to confirm someone has an account, calling the registrar of a university to validate that so-and-so actually received a degree in 19XX, etc. In a paper-based world, there were proprietary watermarking and paper-making technologies to make paper credentials more costly and difficult to forge; these methods kept fraud relatively rare in most times and places. In cases requiring a high degree of certainty, however, “phoning home” to the credential-issuing institution has still been the historical standard of due diligence to trust any important credentials.

Most of the time, we verify credentials by “phoning home” to whomever issued them.

Digital credentials do not work all that differently in most cases, except that in some cases the “phoning home” function might be automated, if a secure-enough channel can be established over the internet (noting, of course, that this opens yet another vector for impersonation, privacy violation, or fraud). Take, for example, the classic case of a university degree. To confirm either the authenticity of a diploma (i.e., its “signature,” the proof that it was issued by that university’s legitimate representative, and thus the identity of the issuing institution) or the authenticity of the degree it attests to (the identity of the recipient), you essentially have to call the university. So in a sense, the diploma may be “held” by the graduate, but if the university goes out of business and stops answering its phones, it becomes an unverifiable credential whose value is limited to the domain of its fading familiarity.
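An automated version of “phoning home” can be sketched with a keyed hash (HMAC). This is an illustrative assumption rather than a description of any real registrar’s system: the `Issuer` class, its `online` flag, and the credential strings are hypothetical. Because only the issuer holds the secret key, only the issuer can confirm a credential, which is exactly the dependency the diploma example describes.

```python
import hashlib
import hmac
from typing import Tuple


class Issuer:
    """Hypothetical credential issuer, e.g. a university registrar."""

    def __init__(self, secret_key: bytes) -> None:
        self._secret_key = secret_key  # never leaves the issuer
        self.online = True             # flips to False if the issuer shuts down

    def _tag(self, credential: str) -> str:
        return hmac.new(self._secret_key, credential.encode("utf-8"),
                        hashlib.sha256).hexdigest()

    def issue(self, credential: str) -> Tuple[str, str]:
        """Hand the holder a credential plus a tag only the issuer can check."""
        return credential, self._tag(credential)

    def verify(self, credential: str, tag: str) -> bool:
        """The 'phone call': a verifier must reach the issuer to check anything."""
        if not self.online:
            raise ConnectionError("issuer no longer answers")
        return hmac.compare_digest(self._tag(credential), tag)


university = Issuer(secret_key=b"registrar-only-secret")
diploma, tag = university.issue("BA in History awarded to J. Doe")
```

While `university.online` is true, `university.verify(diploma, tag)` confirms the diploma and rejects altered ones. Once it is false, the same call raises an error: the graduate still holds the diploma, but nobody can verify it.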

Modern cryptography makes the actual process of verifying data simpler once the communication channel is established with the administrators of the registry. Nowadays, this is mostly just being used to “phone home” in a slightly more sophisticated way, confirming keys and signatures with a centralized identity registry of a new kind, usually administered in the same way by the usual suspects. In many ways, a server is just a fancy kind of phone, and both get disconnected when the companies that own them go out of business.

But we are starting to see widespread use of the building blocks that let documents verify themselves cryptographically, or to put it another way, that minimize reliance on registries or other proprietary data sources. In the most decentralized version of verification, issuers of credentials do not need to answer the phones or maintain a server for the credentials they issue to be verified and trusted; the credentials they’ve issued can be verified against public, universal records that are persistent, and that grow more reliable over time as their information is verified by others. How can attestations and identities be verified without relying on siloed, contingent, revocable, and fragile registries? Well, they have to be registered to those universal, public records instead.
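One simple way to picture verification against a public record is hash anchoring: the issuer publishes a fingerprint of each credential once, and any verifier can later recompute and compare it without contacting the issuer. The sketch below is an assumption-laden illustration, not how any specific SSI platform works. The `public_record` list stands in for a shared ledger, and a real system would also need public-key signatures to prove who wrote each entry.

```python
import hashlib

# Hypothetical public, append-only record that anyone can read.
public_record = []  # entries are (issuer_id, fingerprint) pairs


def fingerprint(credential: str) -> str:
    # Only the hash is published, so the credential's contents stay private.
    return hashlib.sha256(credential.encode("utf-8")).hexdigest()


def anchor(issuer_id: str, credential: str) -> None:
    """Done once by the issuer, at issuance time."""
    public_record.append((issuer_id, fingerprint(credential)))


def verify(issuer_id: str, credential: str) -> bool:
    """Done by anyone, at any time, without phoning the issuer."""
    return (issuer_id, fingerprint(credential)) in public_record


anchor("university-x", "BA in History awarded to J. Doe")
```

After anchoring, `verify("university-x", "BA in History awarded to J. Doe")` holds whether or not the university still exists, while any altered credential produces a different fingerprint and fails.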

0.) Register

In a sense, the “registry” is the actor that we take for granted the most, and “register” is the function we think about the least, partly because it happens before all other events. Indeed, conventional identity systems and modern federated ones alike function the way they do because every person, machine, and other kind of actor is registered on their way in, and stays registered until they are thrown out. Whoever initially registered these entities, and manages the registries where their identities “live” thereafter, took great power at step 0, even before those entities started interacting with each other, or making and challenging attestations. Verification without these kinds of centralized, omnipotent registries requires us to rethink how and where we register identities!

The ideal state for decentralizing identity is to have a neutral, cooperative global ledger on which entities register themselves rather than applying to a registry authority and having a registration in that registry granted, denied, or, worst of all, granted for a time but later revoked. Not just registration but also de-registration (or “revocation”) and being forgotten should probably also remain within the power of each holder of an identity. And perhaps one giant global ledger large enough for every entity on earth is a little impractical and hyperbolic, like something out of Jorge Luis Borges or some other sophist philosopher; maybe a federation of independent and yet interoperable ledgers is a little more organic, reasonable, and decentralized as a goal.

Identities registered on blockchains can verify information without phoning anyone.

In many ways, the design breakthrough of SSI is that identities “register themselves” on a blockchain rather than in a storage system controlled by any one party. Once registered, an entity stays in “control” of its registration via cryptographic keys that can be used to move the data or sever all ties to it and fade into a tomb of encrypted anonymity. Blockchain technology can often be narrativized as a kind of magic that “solves” everything, but one thing it does really well is storing small amounts of information immutably in a record that all can read, yet which only the holders of cryptographic keys can update or revoke. Blockchains can be expensive and complex to maintain and govern in cooperation with the many different stakeholders storing their data in them, particularly compared to a modern cloud database. In the context of an identity registry with different power relationships and controls, however, that extra overhead translates to a different relationship between the administrators of a registry and the holders of identities or data registered in it.
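A toy version of such a record can be sketched with a hash-chained, append-only list. Everything here is a simplifying assumption: the `Ledger` class, the `did:example:123` identifier, and especially the control mechanism, which uses a hash commitment (revealing a secret whose hash was registered) as a standard-library stand-in for the public-key signatures a real ledger would use.

```python
import hashlib
import json


def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Ledger:
    """Append-only record: everyone can read it, only key holders can revoke."""

    def __init__(self) -> None:
        self.entries = []

    def _append(self, payload: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = json.dumps(payload, sort_keys=True)
        # Each entry commits to the one before it, so edits are detectable.
        self.entries.append({"payload": payload, "prev": prev,
                             "hash": h((prev + body).encode("utf-8"))})

    def _current(self, identifier: str):
        for entry in reversed(self.entries):
            if entry["payload"]["id"] == identifier:
                return entry["payload"]
        return None

    def register(self, identifier: str, commitment: str) -> None:
        """Self-registration: no authority grants or denies the entry."""
        if self._current(identifier) is not None:
            raise ValueError("identifier already registered")
        self._append({"id": identifier, "commitment": commitment, "revoked": False})

    def revoke(self, identifier: str, secret: bytes) -> bool:
        """Only whoever knows the committed secret can revoke the identity."""
        state = self._current(identifier)
        if state is None or state["revoked"] or h(secret) != state["commitment"]:
            return False
        self._append({"id": identifier, "commitment": state["commitment"],
                      "revoked": True})
        return True

    def is_active(self, identifier: str) -> bool:
        state = self._current(identifier)
        return state is not None and not state["revoked"]

    def is_intact(self) -> bool:
        """Recompute the chain; any rewritten entry breaks it."""
        prev = "0" * 64
        for entry in self.entries:
            body = json.dumps(entry["payload"], sort_keys=True)
            if entry["prev"] != prev or entry["hash"] != h((prev + body).encode("utf-8")):
                return False
            prev = entry["hash"]
        return True


ledger = Ledger()
holder_secret = b"known only to the holder"
ledger.register("did:example:123", h(holder_secret))
```

A guess such as `ledger.revoke("did:example:123", b"guess")` is rejected, while the holder’s own secret revokes the identity. Revocation is recorded as a new entry rather than an edit, so the history stays readable and `is_intact()` keeps returning true unless someone rewrites an old entry.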

Which brings us to our next topic: so what exactly does an SSI platform like Spherity write on (and off) a blockchain or a more private digital ledger? How can these “data trails” be analyzed to correlate, track, vouchsafe, or forget an identity? How can data about people and corporate data be encoded into a blockchain, while minimizing the risk of any personal or proprietary data being leaked, compromised, or correlated?


Juan Caballero
Spherity

Juan is Communications lead for Spherity, a software startup in Berlin pioneering nonhuman identity, SSI, and digital twins. Personal acct: @by_caballero