What is Digital Identity?
“What’s in a name? That which we call a rose by any other word would smell as sweet.”
-William Shakespeare, Romeo and Juliet
On April 10th, Mattereum hosted the third Internet of Agreements® (IoA) conference at the Google Campus in London, dedicated to the topic of “Identity”. This is an introductory article that examines various models, along with their methods and implications, from multiple angles in order to construct a practical framing of identity systems past, present, and future. You can find all materials from the conference here.
The Sapir–Whorf Hypothesis, also known as the principle of linguistic relativity, posits that language constructs our reality and worldview. While the hypothesis has been contested over the years, language is unarguably fundamental to the models of the world we build in our heads — and in our systems.
Society is a sea of identity. In order to orient ourselves in a world of physical and mental objects in constant motion, we use language to give things stable names. Unsurprisingly, it has been a staggeringly complex and messy process over the years. Governments, corporations, social media behemoths, communities, trust-minimized distributed networks of value; we have created many systems with as many different models of identity, all with different perspectives and aims. In contrast to the individual experience of self, when we talk about digital identity we are not talking about a singular, global identifier for an individual across all of their online activities but rather the means by which people can securely interact and transact in a specific context online.
Trust, and the Act of Naming
One of the commonest reasons for creating identity systems has been the establishment of trust and accountability: for the most part, society prefers not to deal with ghosts.
An examination of identity systems is an examination of names referencing entities or resources, the construction of namespaces, and the formation of trust frameworks. There are various interesting questions to ask about these systems. Who assigns the name? Who/what is being named? How is the name assigned? What capabilities does the name provide? What are the relationships between the names? Who or what gives the name efficacy and meaning? And can the user trust the identity provider with their privacy and the way their virtual identity is presented to the world?
Naming is the practice. Trust is the goal.
For a brief history of digital identity systems, see A Brief History of Digital Identity.
Before we attempt to construct a useful ontology for identity, let’s take a look at some different schools of thought. This will let us explore various frameworks and perspectives, and help us set the scope of the problem.
I am that I am
Our notion of self is a complicated affair, as a moment’s introspection will reveal. From a developmental psychology perspective, we have no identity in our early years: it is completely outside us. The main focal points of our world are our parents or guardians, who nurture us and protect us and occasionally cast a particular word in our direction to get our attention: our name. Our sense of self vs other starts to take shape around the age of three.
We are embedded within social structures from the moment of our birth. We are given a name. Our reality tunnels encompass incredibly little at first but are then shaped by others: our parents, our schools, our friends, our media, our churches, our communities. Our given or self-assigned name helps us connect, bond, and interact with other entities in the world in order to establish trust. Without trust, we lack the means of co-creation and co-existence that is one of our defining traits as a species.
We are works in progress; there is no final definition of who we are, thus making identity difficult to put in a box.
This personal view of identity is important to consider as we explore the others since it underlies them all. If we cannot consistently answer the question “Who am I?” from context to context, then the systemization of identity is a colossal challenge, if not an impossible one.
Seeing Like a State
The state’s view of identity is that a person is a unique collection of facts derived from various source documents: birth certificate, Social Security Number, bank accounts, driver’s license, passport, etc. This view of a person is reductive and definite; simplification and certainty are desired properties of any system of control. It is difficult to make sure millions or even billions of people are paying taxes or otherwise held to account within a certain region if they are not easily catalogued. Unfortunately, it is precisely this cataloguing of the human that makes most of us uneasy. It clashes with our personal experience of identity: are we not more than the sum of our parts or words on paper?
While some state entities around the world simply containerize the documents mentioned above so that they can be moved around using secure networks, others have taken a different approach. Let’s take a brief look at two: Iceland’s kennitala and India’s Aadhaar.
The Icelandic kennitala (Icelandic for “mark number”, in the sense of a “distinguishing mark”) is a unique ten-digit ID number assigned to all individuals at birth, as well as registered foreign nationals and corporate entities. In the personal kennitala, the first six digits are the individual’s birth date in day–month–year format, the seventh and eighth are a random number between 20 and 99, the ninth is a checksum digit, and the tenth indicates the century of birth. This allows for up to 80 people born on the same day; since the total population of Iceland is about 335,000, the actual figure was under 30 even in 1960, and is now nearer to 10; even allowing for centenarians, that’s enough!
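The layout described above can be checked mechanically. What follows is a minimal sketch in Python, covering personal numbers only. The checksum weights and the mapping of the final digit to a century are not given in this article; they are assumed here from the commonly documented scheme, and the sample number is purely illustrative.

```python
from datetime import date

# Weights for the first eight digits. Assumption: taken from the commonly
# documented kennitala checksum scheme, not from this article.
WEIGHTS = (3, 2, 7, 6, 5, 4, 3, 2)

# Assumed meaning of the tenth digit: 8 = 1800s, 9 = 1900s, 0 = 2000s.
CENTURY = {"8": 1800, "9": 1900, "0": 2000}


def check_digit(first_eight: str):
    """Return the expected ninth digit, or None if no valid digit exists."""
    total = sum(int(d) * w for d, w in zip(first_eight, WEIGHTS))
    digit = (11 - total % 11) % 11
    return None if digit == 10 else digit


def parse_kennitala(kt: str) -> date:
    """Validate a personal kennitala and return the birth date it encodes."""
    digits = kt.replace("-", "").replace(" ", "")
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected ten digits")
    if digits[9] not in CENTURY:
        raise ValueError("unknown century digit")
    expected = check_digit(digits[:8])
    if expected is None or expected != int(digits[8]):
        raise ValueError("checksum mismatch")
    day, month, yy = int(digits[0:2]), int(digits[2:4]), int(digits[4:6])
    # date() itself raises ValueError for impossible dates such as 31 February.
    return date(CENTURY[digits[9]] + yy, month, day)
```

Under these assumptions, `parse_kennitala("120174-3399")` returns 12 January 1974, while changing any single digit causes a `ValueError`. Note how much the bare number discloses: a birth date is readable by anyone who sees it, with no database lookup required.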
While other European countries have their own national ID systems, the Icelandic system is unique in how pervasive it is. Government. Companies. Schools. The system is so thorough and complete that accurate, always-up-to-date population statistics can be obtained with a database query, whereas a census gives only a snapshot, perhaps once a decade.
The Indian Aadhaar system is the largest digital identity effort in the world (1.19 billion registered) and the first rollout of a nationwide biometrics-based ID system. The 12-digit Aadhaar number is linked to a central database entry that contains biometric data, including ten fingerprints, an iris scan, and a face scan, together with biographic data on region/place of birth. Here we have another instance of cataloguing the human, this time by means of physiological data.
Not only is this massive database susceptible to hacks due to its centrality, design, and use in society, it has also been used as a tool for systemic corruption. In India, the dead walk among the living. The Uttar Pradesh State Government over the years has falsely declared living individuals dead in order to obtain their property rights. The Uttar Pradesh Association of Dead People is a pressure group seeking to combat this corruption and to reclaim its members’ lives and associated rights. With the same ease with which population statistics can be obtained by querying a database in Iceland, the powerful and connected of India could have someone declared dead. With property rights being the domain of the state, the victims are effectively dead within the reality of the system, cut off from all manner of services as if by sympathetic magic. It is a real-life reversal of Dead Souls, Nikolai Gogol’s 19th-century novel in which the protagonist buys deceased serfs from their owners to use as collateral for a loan; because censuses were infrequent at the time, the dead serfs remained a significant tax burden on those owners.
The trust structure within state-issued identity systems is clearly top-down, and there is no such thing as trickle-down trust.
The purpose is control. The model is reductionist. The game is power.
The Commodity of You
Another form of institutional reality lives in the market: the corporation. The earliest databases, such as Dun & Bradstreet in the US and Companies House in the UK, are massive silos of commercial information decades in the making. Each has its own namespace: the DUNS number, for example, is a unique nine-digit number assigned to a corporate entity within the D&B database. While the purpose of Companies House is primarily to register companies operating within the UK as a government-mandated duty, D&B stores information intended to be used for commercial decision-making. While these two are examples of massive corporate namespaces, they are relatively benign. It is when the contents of the database are a facsimile of ourselves, used to market to us, that the game changes altogether: enter the social network.
Facebook is a database disguised as a network. Over two billion users have joined, bringing along with them their social graph: all of the accumulated edges connecting them with their friends, families, institutions, ideologies, interests, etc. It is, unfortunately, only a slight exaggeration to say that Facebook can use this information to create a simulacrum, or homunculus, of you within its system. The algorithms and the data sets you unwittingly feed them can literally be used to change your reality and emotional well-being. If the recent Cambridge Analytica scandal and the 2014 Emotional Contagion Experiment are any indication, our data has become valuable, even powerful, in ways we never anticipated. As with the state, trust is centralized in the service provider on whom we have come to rely to stay connected to others in the 21st century.
This commoditization of our online identities leads to the next model of identity, that of property.
Data as Private Property
The public discourse around the privacy and custody of our personal data is largely a result of distaste for the corporate practices mentioned in the previous section. Facebook became one of the largest companies in the world by extracting untold value from the populace, under the guise of providing a free service. According to Jaron Lanier, it is the choice of these social media companies not to charge for their service that has led to the subliminal, extractive revenue models that have caused so much ire and controversy.
In order to keep these companies in check, regulation is currently being enacted in different parts of the world, notably the General Data Protection Regulation (GDPR) in the EU, essentially to flip the script on data ownership. Whether these efforts will lead to significant change in corporate practice is uncertain, and there are underlying technical challenges to consider. Facebook, in light of the Cambridge Analytica scandal and the imminent enactment of GDPR and other legislation, has said that it would take up to three years for the company to fully adapt its design.
However, systems of data ownership and custody alternative to the aforementioned models have not been fully realized. While GDPR and other regulations can apply the right pressures to companies in order to make them more accountable, trust still matters. Perhaps Facebook’s recent interest in blockchain will result in solutions which offer more transparency and accountability among the network’s participants and less centralization of trust.
Identity Through Community
As we established earlier, the lived experience of identity is not constrained to the individual but extends outward to their community. The social networks uploaded those connections (the social graph), but at the cost of users’ privacy and of exposure to abuse by corporate and political interests. Can we look to our communities to provide a trust structure suitable for a secure identity system?
Two previous attempts at identity through a community were PGP’s web of trust and CAcert’s governed approach to certificate issuance; however, these were largely the domain of the tech-savvy and were unsuitable for wider adoption, despite solving some of the problems with the longstanding PKI/CA model. We can extract two interesting elements from these projects: web of trust’s user custody of cryptographic key pairs with accumulating signatures and CAcert’s method of holding key holders to account via arbitration. The first highlights the connections between individuals in the web of trust and the second implements an accountability structure on top of the web of trust.
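To make the idea of accumulating signatures concrete, here is a minimal sketch in Python. It treats each signature as a directed edge from signer to signed key and looks for a short chain of vouching between two parties. The names, the data, and the hop limit are all hypothetical, and the sketch deliberately ignores PGP's actual trust levels and any cryptographic verification.

```python
from collections import deque

# signer -> set of people whose keys the signer has signed (hypothetical data)
signatures = {
    "alice": {"bob"},
    "bob": {"carol"},
    "carol": {"dave"},
    "mallory": {"mallory"},  # a self-signed key nobody else vouches for
}


def trust_path(signatures, start, target, max_hops=3):
    """Return the shortest chain of signatures from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 == max_hops:
            continue  # do not extend chains beyond the hop limit
        for signed in sorted(signatures.get(path[-1], ())):
            if signed not in seen:
                seen.add(signed)
                queue.append(path + [signed])
    return None
```

With this data, `trust_path(signatures, "alice", "carol")` finds the chain alice, bob, carol, while no chain reaches mallory at all. What the sketch cannot express is what each edge means or what happens when a signer vouches wrongly, which is the gap CAcert's arbitration was meant to fill.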
Another community-based, bottom-up identity model currently in the works is ChamaPesa. A chama is an informal cooperative society whose members, typically neighbours, pool and invest regular savings. Originating in Kenya, chamas now control approximately 40% of Africa’s GDP. ChamaPesa is an application which provides transparent accounting and allows chamas to save in cryptocurrency, taking them out of the expensive and possibly corrupt banking system altogether. The interesting element regarding identity here is that the application takes advantage of the existing social structure of the chama, in which members know and vouch for each other. ChamaPesa can provide portability of these community-backed identities from chama to chama, place to place. The system is still in development, but because it does not rely on technical savvy, it could achieve more traction than web of trust or CAcert. Time will tell.
These models all offer different ways to represent people within a social and/or technical system. Most of them break down because of their insistence on definitive identification and other design flaws. The CAs, which have played a fundamental role in mapping identities and resources on the Internet over the years, have been compromised time and again, which in turn compromised their users. Social networks, while seeing unprecedented adoption, have effectively captured our social graph, yet have done so at the cost of privacy and susceptibility to corporate and political interests. The only nationwide attempt at using biometrics to create a citizen avatar and ID number, the Aadhaar, takes on the vulnerabilities of centrality, systemic corruption, and the unreliability of physiological data over time. All of these models are predicated on the idea of identity as definitive. If identity is in fact probabilistic in practice, then the game changes altogether.
Identity as Correlation and Delineation
Our identity systems, as well as our informal notions of identity, are less about defining a singular, unique entity (which is how we usually conceptualize them) and more about the correlation of information about an entity across contexts. In “Identity Crisis: Clearer Identity through Correlation,” a paper from the Web of Trust II: ID2020 Design Workshop, Joe Andrieu et al. beautifully clear up the identity discourse by suggesting we replace the word “identity” with “correlation,” and it is surprisingly effective. Credentials, attestations, and identifiers can construct a composite entity with some degree of assurance. While the goal remains to get as close to defining a unique individual as possible within particular contexts, collecting these “edges” so that an entity takes shape is what constitutes an identity in practice. (See also Identity in Depth by Ian Grigg.) The challenge then lies in ensuring the integrity of those edges and the information along them.
By viewing identity in practice as a correlation of facts between entities — a fine-tuned social graph — we can literally map out a composite identity.
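As a rough illustration of what "some degree of assurance" might look like, the following Python sketch collects attestations as edges and combines those that support the same claim. Everything in it is an assumption made for illustration: the issuers, the claims, the confidence figures, and above all the combination rule, which treats issuers as independent. Neither the paper nor this article prescribes it.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Attestation:
    issuer: str        # who vouches (hypothetical labels)
    claim: str         # what is asserted about the subject
    confidence: float  # assumed reliability of the issuer, between 0 and 1


def composite(attestations):
    """Map each claim to a combined assurance, assuming independent issuers."""
    by_claim = {}
    for a in attestations:
        by_claim.setdefault(a.claim, []).append(a.confidence)
    # The claim fails only if every attestation for it is wrong.
    return {claim: 1 - prod(1 - c for c in cs) for claim, cs in by_claim.items()}


edges = [
    Attestation("bank", "name is J. Doe", 0.9),
    Attestation("employer", "name is J. Doe", 0.8),
    Attestation("chama", "lives in Nairobi", 0.7),
]
profile = composite(edges)
```

Here two moderately reliable attestations of the same name combine to an assurance of roughly 0.98, higher than either alone, while the single-source claim stays at 0.7. The independence assumption is doing a lot of work: if the employer simply copied the bank's record, the second edge adds nothing, which is why the integrity of the edges matters more than their number.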
Keys to the Kingdom
Public key cryptography has been a fundamental component of digital identity systems since public key infrastructure was first implemented to map entities to cryptographic keys. Web of trust attempted to bootstrap the system directly from individuals, but there was a lack of clarity as to the meaning of the collected signatures. Are they attesting to the identity or to the meeting? Simple Public Key Infrastructure (SPKI) aimed to eliminate some of the ambiguity by annotating the keys with relevant information such as human-readable names and authorizations, but never made it past the IETF’s drafting process.
With the advent of blockchains and cryptocurrencies, there is a massive resurgence of user-generated key pairs, except now they are collectively handling hundreds of billions of dollars’ worth of assets. Despite the immense value at stake, there is a complete lack of key semantics, which could drastically reduce the uncertainty and lack of trust. In the world of blockchain, there are not many means by which you can confidently know the entity behind a public key, or at least the authorizations of the key. You may be dealing with a ghost.
This is the challenge facing the current trend towards self-sovereign identity solutions in the blockchain space: in a largely anonymous or pseudonymous environment, how can people participate in commerce with any confidence or certainty?
This section will briefly discuss two methods of mitigating risk between interacting entities across a variety of online contexts. One has been implemented in the past and the other is a more recent proposal. The purpose of this section is to introduce a radical notion to the identity problem: there is no absolute need to have a perfect identity system if the entities interacting and transacting with one another can hold each other and the administrators (if present) to account when a dispute arises, and thereby mitigate risk in transactions.
Since identity providers such as certificate authorities are never liable for the quality of the service, this liability is pushed down to the users, who absorb all of the risk. In a meeting of minds such as a social network or parties to a commercial agreement, if someone is wronged then they need to have a means to seek recourse. One method is to give the community access to dispute resolution. If a mediator in the form of a mutual third party cannot get the entities to compromise, then the dispute can go to binding arbitration. As mentioned earlier, this is what CAcert implemented to bring more trust and accountability into their online community. This model could be implemented in any online business context and is certainly apt for commerce conducted using blockchain-based accounting and smart-contract–driven agreements.
Identity as Insurable Risk
Here at Mattereum, we propose a solution to the digital identity conundrum in commerce: treat it as an insurable risk. Can I trust this person to follow through with their end of the deal? In this case, the emphasis is placed on the performance of the deal rather than the people within it. Having in place insurance or indemnity can go a long way in securing commerce between identities which are probabilistic at best. The challenge then lies not in the identifying of the transacting parties but in minimizing transaction risk. This can be done in a peer-to-peer fashion with social proofs or via trusted third parties who can attest to one’s identity.
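A back-of-the-envelope calculation shows why this reframing is attractive. The Python sketch below is not Mattereum's pricing model; it uses the textbook notion of expected loss with a flat loading, and every figure in it is hypothetical.

```python
def expected_loss(p_default: float, exposure: float) -> float:
    """Probability that the counterparty fails to perform, times the amount at risk."""
    return p_default * exposure


def premium(p_default: float, exposure: float, loading: float = 0.25) -> float:
    """Expected loss plus a proportional loading for the insurer's costs and margin."""
    return expected_loss(p_default, exposure) * (1 + loading)


deal_value = 10_000  # hypothetical amount at risk in the transaction

# An unknown key with no attestations: assume a 5% chance of non-performance.
anonymous = premium(0.05, deal_value)

# The same deal where social proofs or a third party lower the estimate to 1%.
attested = premium(0.01, deal_value)
```

With these made-up numbers, covering the deal costs about 625 for the anonymous counterparty and about 125 for the attested one. Better identity information never has to reach certainty; it only has to lower the estimated probability of default enough to pay for itself in a cheaper premium.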
Despite the many broken ontologies of identity systems over the years, it is possible to examine these models in order to extrapolate a clearer framework. Our lived experience of identity cannot be directly translated into our systems, as we are not too certain of it ourselves, and who’s to decide that for us? Institutional identity is simply a way to keep an eye on everyone for some purpose or another. Bottom-up identity has tried to solve the problem but has so far been constrained by its own design. Blockchain, being a potentially new infrastructure layer for global finance and trade, has so far not realized its potential to put identity within the domain of the individual. Perhaps if we look at identity as being inherently probabilistic in nature and more of a composite structure, or as Grigg says, an “edge protocol,” rather than a singular, absolute thing, then we can start to map out these edges, improve their quality, and create better trust frameworks in our systems, focussing less on the identities themselves and more on the interactions and transactions between them.