A Brief History of Digital Identity

On April 10th, Mattereum hosted the third Internet of Agreements® (IoA) conference at the Google Campus in London, dedicated to the topic of “Identity”. This is an introductory article that explores the history of efforts to solve the identity problem — with databases, various ID systems, biometrics, communities, social networks and more. You can find all materials from the conference here.

Identity has become one of the core challenges to technological innovation. Ultimately, the multifarious social systems and structures we build aim to make social interactions easier, more efficient, or less uncertain. Proving identity across time and space is hard, and human identity in particular is complex and multi-faceted. The recent surge in the use and importance of digital systems and their corresponding versions of identity have only compounded the difficulty.

This article tells the story of various efforts to solve this problem, a story involving many different institutions and individuals attempting to represent existence within a certain context for a certain purpose, usually trust and accountability. We focus on the evolution of practice, leaving the messy ontologies of digital identity aside.

The Database and Institutional Reality

Many histories of digital identity start at the advent of the Internet, but the construction of name spaces is much older.

Identity in our social systems is less concerned with encapsulating the human and more with the act of naming. As Silicon Valley’s Jared Dunn says in a moment of sad wisdom, “a name is just a sound that somebody makes when they need you.” The purpose of these names (or numbers) is to prove the uniqueness of a particular individual, to ensure accountability, to establish some trust between individuals and institutions, and to provide points of reference for the framework of laws and other social contracts that run our society.

However, naming is only part of the process. Names in our systems and societies have power to endow certain capabilities in the world of matter and the world of data. To carry weight, these strings of characters must have some significance in both physical and digital domains.

Let’s start with databases.

Long before computers could effectively communicate with one another, there were massive databases, intended to preserve a model of a certain institutional reality. These were owned and operated by governments, corporations, and banks in order to better manage and access accumulated data on citizens, companies, employers, employees, customers, etc.

A notable example of a foundational database that is still around is Dun & Bradstreet. The origins of D&B date from 1841 with the founding of the Mercantile Agency by an entrepreneur named Lewis Tappan. The intent was to provide American merchants with reliable credit information on businesses in order to enhance commercial decision-making. This is the origin of the credit reporter as a profession; four US presidents were formerly D&B credit reporters. The company grew rapidly, taking over competitors and winning customers including the US government and UN, and in 1963 D&B implemented the DUNS number, a unique nine-digit number that to this day functions as the foundation of their entire system. As early as the 1970s, D&B fully computerized all operations surrounding their data. Their database contains detailed information on over 285 million businesses, making them one of the key providers of analytics and other services layered on top of over a century of commercial data.

Similarly, across the Atlantic Ocean, the United Kingdom has Companies House, the state registrar of companies. The roots of Companies House go back to the Joint Stock Companies Act of 1844, which allowed for the formation of shareholder companies. The act required all companies to be recorded on a public register maintained by the Registrar of Joint Stock Companies. Presently there are more than 3.5 million companies registered with Companies House, with half a million new companies being registered annually. The Personal Information Charter sets a standard for the personal data that the Registrar holds regarding persons associated with a company; for example, company directors are required to have on record their name, address, occupation, nationality, and date of birth.

Dun & Bradstreet and Companies House are examples of corporate and state entities that have created and maintained models of institutional reality that essentially digitize paper records. Companies, or the people that constitute them, have certain attributes that are deemed important, which are then catalogued for the purposes of an institution. There are two facets to this: the optimization of internal operations, and knowledge for commercial decision-making. The similar timelines and structures of D&B and Companies House show how a hierarchical organizational structure coupled with a certain way of thinking fundamentally affects the systems they build. This is emblematic of Conway’s Law: “organizations which design systems…are constrained to produce designs which are copies of the communication structures of these organizations.”

State-issued ID systems run on similar logic, although the US relies on social security numbers (SSNs) and other identifiers that are not primarily designed for ID. The passport, the identity document intended to represent you across the borders of the world, is nothing but a collection of facts assembled and approved by a central authority. Many state-run digital identity systems simply repackage the information contained in a passport in digital form, and move it about securely (one hopes).

As Above, So Below

Also important, but often overlooked, are the notions of identity built into the computer systems that house the databases. Many of the fundamental mechanisms were developed for early systems in the 1960s and 70s, and remain the same today: as notions of digital identity evolve, these primitives remain the building blocks with which new systems must be built, and to which they may be well- or ill-suited.

The two best-known operating-system level paradigms are access control lists (ACLs) and capabilities. The former is pervasive; the latter’s potential has yet to be fully realized.

An ACL is a list of permissions attached to a data object (often a file) that specifies which users or system processes may access it, and what operations are allowed (typically read, write, and execute). A capability-based system instead grants unforgeable tokens of authority, known as capabilities, to agents such as programs running on behalf of a particular user or of the system; possession of a capability confers the right to perform specific operations.

Here is a physical analogy: a person (user) is attempting to enter (permission, right to access) a room (data object) via a door (access mechanism). In the ACL model, a door-keeper checks the user’s ID against a list of authorized users and, if the person is on the door’s list, admits them. In a capability-based model, the user carries a key that opens that particular door, and simply unlocks it.
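The door analogy can be sketched in code. This is a toy model, not any real operating system API: the room, the users, and the use of an unguessable random token to stand in for an unforgeable capability are all illustrative.

```python
import secrets

# ACL model: the *door* (data object) holds the list of who may do what.
acl = {"server_room": {"alice": {"enter"}, "bob": set()}}

def acl_allows(user, action, room):
    return action in acl.get(room, {}).get(user, set())

# Capability model: the *user* holds an unforgeable key for the door.
capabilities = {}  # token -> (room, action)

def mint_capability(room, action):
    token = secrets.token_hex(16)       # unguessable, hence "unforgeable"
    capabilities[token] = (room, action)
    return token

def capability_allows(token, action, room):
    return capabilities.get(token) == (room, action)

alice_key = mint_capability("server_room", "enter")
```

Note the shift in perspective: the ACL check asks “who are you, and are you on the list?”, while the capability check asks only “do you hold a valid key?”.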

Most modern operating systems use the ACL model, and many organizations implement Role-Based Access Control (RBAC), which is a sort of higher-level hybrid of ACLs and capabilities: subjects (such as users or software agents) are assigned roles, which in turn are given permissions to perform certain operations on given resources. RBAC is a flexible but complex model with many possible implementations. This has a strong influence on how identity is conceptualized and implemented in modern systems; in particular, it is behind the ubiquitous username-and-password schema: this person is allowed to do this thing within this context because this entity said so. These design patterns tend to stick around.
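A minimal RBAC sketch, with hypothetical roles and subjects (the names are illustrative, not drawn from any real system): subjects map to roles, and roles map to permitted (resource, operation) pairs.

```python
# Roles carry the permissions; subjects only carry role assignments.
roles = {
    "editor": {("article", "read"), ("article", "write")},
    "viewer": {("article", "read")},
}
subject_roles = {"alice": {"editor"}, "bob": {"viewer"}}

def is_allowed(subject, resource, operation):
    # A subject may act if any of its roles grants the permission.
    return any((resource, operation) in roles[r]
               for r in subject_roles.get(subject, set()))
```

The indirection through roles is the point: granting or revoking a role changes many permissions at once, which is what makes RBAC manageable at organizational scale.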

ARPANET, the Internet, and DNS: A Decentralized Global Network

In the early 1960s, a director at the United States Department of Defense’s ARPA (Advanced Research Projects Agency) named J.C.R. Licklider proposed to his colleagues the concept of an “Intergalactic Computer Network,” a global electronic commons that would be “the main and essential medium of informational interaction for governments, institutions, corporations, and individuals.” The vision he laid out is remarkably similar to the current-day Internet. The first iteration of this network was called ARPANET, established in 1969. In the early days, the network consisted of terminals at various universities and government institutions. As the network grew in size, more advanced protocols became necessary to scale. In 1982, TCP/IP was implemented, a foundational protocol for the era of personal computing; later versions are still fundamental to today’s internet, in particular underpinning the World Wide Web.

TCP/IP assigns numerical addresses to internet “hosts” (computers). Since the ARPANET days there has been a need to map these to human-meaningful names. The Stanford Research Institute (now SRI International) maintained a single text file, HOSTS.TXT, which mapped names to addresses. The entire namespace of the proto-Internet was managed via a single text file maintained by a single institution. Of course this would not scale technically or politically as the internet grew, so an alternative was proposed: the Domain Name System (DNS). The Internet Engineering Task Force (IETF) published the first specification of this system in 1983. DNS is operated via a distributed system of servers, with the root zone managed by ICANN, the Internet Corporation for Assigned Names and Numbers, a US non-profit. Until 2016, ICANN was answerable to the US Department of Commerce; since then, it has operated independently.
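The HOSTS.TXT arrangement amounted to a single shared lookup table. A simplified sketch (the real file carried more fields per entry, and the addresses and host names below are invented for illustration):

```python
# The entire proto-Internet namespace: one flat file, one maintainer.
HOSTS_TXT = """\
10.0.0.5   SRI-NIC
10.1.0.12  MIT-AI
"""

def parse_hosts(text):
    """Build a name -> address table from a hosts-file-style listing."""
    table = {}
    for line in text.splitlines():
        addr, name = line.split()
        table[name] = addr
    return table

hosts = parse_hosts(HOSTS_TXT)
```

Every host on the network had to fetch a fresh copy of this one file to stay current, which is precisely the bottleneck DNS was designed to remove: DNS distributes the table across a hierarchy of servers, each authoritative for its own zone.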

Public Key Cryptography: The Foundation of Secure Public Networks

Public-key cryptography is perhaps one of the most important technological innovations of the past century. Without it, we would not have been able to secure the public networks on which global communication and commerce rely. The method was first discovered in the early 1970s by British government cryptographers, who kept it secret; in 1976, US researchers Whitfield Diffie and Martin Hellman independently discovered it and published a paper describing it.

The system uses a linked pair of keys, one public, one private. The public key can be widely shared while the private key is used to decrypt messages encrypted by the public key. The crucial feature is that it is computationally infeasible to deduce the private key from the public key, so that while anyone can encrypt messages using the public key, only the private key holder can decrypt them.

The keys can also be used the other way around: if the private key holder performs the decryption operation on a plain-text message, the result is gobbledegook that anyone holding the public key can “encrypt” to recover the original message. Since only the private key holder could have produced such a piece of gobbledegook, this proves that the message was produced by the private key holder, thus acting as a digital “signature”.
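The signature mechanism described above can be shown in a few lines using textbook RSA with deliberately tiny, insecure numbers; this is an illustrative sketch only. Real systems use keys of 2048 bits or more and sign a hash of the message rather than the message itself.

```python
# Textbook RSA with toy parameters (never use such sizes in practice).
p, q = 61, 53
n = p * q                        # public modulus (3233)
e = 17                           # public exponent
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (requires Python 3.8+)

message = 1234                   # stand-in for a message digest; must be < n

# "Decrypt" the plaintext with the private key: this is the signature.
signature = pow(message, d, n)

# Anyone can "encrypt" the signature with the public key to recover
# the original, proving the private key holder produced it.
recovered = pow(signature, e, n)
```

The asymmetry does all the work: producing `signature` requires `d`, which never leaves the signer, while checking it requires only the public pair `(n, e)`.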

For many uses, it is not enough to know which key a message is encrypted with; it is crucial to know who owns that key: public keys, and thereby implicitly private keys, must be linked to identities. This need resulted in the creation of public key infrastructures (PKI), systems which facilitate the issuance and storage of digital certificates, which are used to verify that a public key belongs to a particular entity. A trusted third party, or certificate authority (CA), uses its own private key to sign a certificate binding a public key to a particular entity. In this relationship, users must trust the CAs, which are solely responsible for maintaining the integrity of the link between the entity and the public key.
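A minimal sketch of the CA relationship, reusing the same toy RSA parameters as above (illustrative only; real certificates follow the X.509 standard and bind far more metadata): the CA signs a digest of the name-to-key binding with its private key, and anyone holding the CA’s public key can check the binding.

```python
import hashlib

# Toy CA key pair (tiny, insecure parameters for illustration only).
ca_p, ca_q, ca_e = 61, 53, 17
ca_n = ca_p * ca_q
ca_d = pow(ca_e, -1, (ca_p - 1) * (ca_q - 1))

def digest(name, pubkey):
    # Hash the binding, shrunk to fit the toy modulus.
    h = hashlib.sha256(f"{name}:{pubkey}".encode()).digest()
    return int.from_bytes(h, "big") % ca_n

def issue_certificate(name, pubkey):
    # Only the CA (holder of ca_d) can produce this signature.
    return {"name": name, "pubkey": pubkey,
            "sig": pow(digest(name, pubkey), ca_d, ca_n)}

def verify_certificate(cert):
    # Anyone with the CA's public key (ca_n, ca_e) can check the binding.
    return pow(cert["sig"], ca_e, ca_n) == digest(cert["name"], cert["pubkey"])

cert = issue_certificate("alice.example", pubkey=99991)
```

The trust concentration is visible in the code: whoever holds `ca_d` can vouch for any binding at all, which is exactly the centralization the web-of-trust approaches discussed below sought to avoid.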

Public key cryptography has been the bedrock of identity on the Internet, from the longstanding practice of PKI and CAs, to PGP’s web of trust experiment, to the entire blockchain ecosystem.

The World Wide Web: A Computer Network for Humans

Tim Berners-Lee’s groundbreaking work with hypertext made networked computers generally usable and useful, and kickstarted the age of personal network computing. The first graphical web browser was released publicly in August 1991, underpinned by HTTP (the HyperText Transfer Protocol) and HTML (HyperText Markup Language) and the first web servers.

The early days of the Web can be described as an Internet of Ideas, in which people from all over the world could communicate across time and space. People could “find the others” as Timothy Leary once advised, and establish communities that transcended spatial limitations.

Next, PKI/CA technology was added to the web to create “secure HTTP” (HTTPS), enabling web sites whose identity users could trust, and with whom they could share information securely, over encrypted channels. This was sufficient to allow credit card transactions (the user could trust that their details were being sent securely to the intended merchant), and the Internet of Commerce was born, which gave rise to some of the largest companies in the world at the turn of the century.

So online identity was now a solved problem, right?

Identity through Community: PGP’s Web of Trust and CAcert

Not exactly. The certificate authority model that came out of public key infrastructure and became the Internet standard was limited by its centralization of trust. The CAs could be compromised, or have questionable integrity with regard to the issuance and signing of certificates, and there was little that individual users could do about it.

Foreseeing the inherent flaw of a centralized CA model, PGP creator Phil Zimmermann introduced the concept of a “web of trust” in 1992, but it did not gain traction until the early 2000s. Webs of trust remove the central reliance on a certificate authority and replace it with a peer-to-peer approach in which each user has their own public and private keys, and can sign the keys of people they know or whose identity they can verify personally (for example, by checking government-issued ID). Keys signed by other keys which themselves carry many signatures are considered more trustworthy. Webs of trust are often grown and strengthened by “key-signing parties” held at conferences and other meetups.
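The signature graph at the heart of a web of trust can be sketched as follows. The names and the fixed-depth walk are illustrative simplifications: real PGP trust models weight signatures, distinguish marginal from full trust, and cap certification path lengths.

```python
# signer -> set of keys that signer has personally verified and signed
signatures = {
    "me":    {"alice"},
    "alice": {"bob"},
    "bob":   {"carol"},
}

def trusted_keys(root, max_depth=2):
    """Keys reachable from `root` via at most `max_depth` signature hops."""
    trusted, frontier = {root}, {root}
    for _ in range(max_depth):
        frontier = {k for s in frontier
                    for k in signatures.get(s, set())} - trusted
        trusted |= frontier
    return trusted
```

With a depth limit of two, “bob” is trusted (me signed alice, alice signed bob) but “carol” is not; raising the limit extends trust further along the chain, at the cost of relying on more intermediaries.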

While this model worked for some small technical communities, and has been used successfully to underpin some large free software projects, such as the Debian operating system, it required a level of technical sophistication and a degree of work beyond average consumers, and a degree of decentralization that made businesses unhappy.

In 2003, a community-driven organization launched a CA named CAcert, which provided a hybrid approach to certificate issuance and signing. The first layer was a web of trust system in which people would meet face-to-face to verify each other’s identity. This would result in the accumulation of assurance points. The assurer role could be obtained by frequently participating in the assurance process and passing a test. Once a user’s ID was sufficiently assured, CAcert’s certificate authority would sign the user’s keys à la PKI.

While the hybrid method of the peer-to-peer assurance model and the traditional PKI model helped to create a high trust environment, an important innovation was the CAcert user agreement, which included an arbitration clause. Having this dispute resolution mechanism in place in the event of a system error or foul play or any other mess that arises when people are involved reassured community members and users that the system was accountable and correctable.

The web of trust and CAcert showed that a more bottom-up identity system was possible. While both projects lacked the intuitive design necessary for user adoption at scale, they demonstrated that identity systems could be constructed without the centralization of trust.

The Social Networks

While many of the “identity” schemas of the eighties and nineties were based on the CA model in some shape or form, the next major shift in digital identity was precipitated by social networking sites. While there have been a few social media players that have attained a significant degree of popularity over the years, we will focus on the site that has 2.2 billion monthly active users: Facebook.

Facebook’s membership was initially limited to the Harvard student body; it then spread to the Ivy League schools and similar institutions internationally, and then to high schools. The market entry was quite calculated, targeting the up-and-coming generation that would grow with the service. The name itself is a reference to the face book directories of university students.

Facebook operates by allowing users to put their social graph, the mapping of their relations with other people, institutions, ideologies, interests, etc., online. This is quite powerful: the accumulated experience gives a sense of presence that identity certificates lack. However, if this social graph seems like just another database, that is because it is. While Facebook presents itself as a network, it is really a massive centrally controlled database that houses and runs algorithms over data provided by a third of the world’s population.

The reach of Facebook extends far beyond the site and mobile apps. For many websites and online services, a Facebook profile is a sufficient identity for login purposes. This practice of an extensible digital identity connected to a centralized corporate entity like Facebook, LinkedIn, or Google, which amasses data based on your activity and exploration, has become perhaps one of the most widely used digital identity models. Who wants to create a new username and password for each website that requires an identity? Cornering the market on convenience goes a long way.

The social network holds a special place in this story, as it presents a pivotal shift in how the average person views their identity. Before social media, interacting and transacting online essentially involved a login to a website or the use of email. It was more purposeful. With services like Facebook, there is an emotional current. The notion of the status update is different from other messaging. To many, the online presence is a truly cultivated extension of the self.

Biometric Identity

The practice of identifying an individual based on recordable physical traits, such as fingerprints, goes back to the nineteenth century. Sir William James Herschel, Chief Magistrate of a district in India, asked a local businessman to add his handprint to a contract to make it harder to repudiate. In the years that followed, others would further explore the use of the fingerprint in identifying individuals, particularly criminals.

Modern biometrics use digital abstractions of physiological and even behavioural traits to identify individuals. The largest biometric identity system ever implemented is the Aadhaar in India, which currently stores the personally identifiable information of over 1.19 billion individuals in a centralized database. The Unique Identification Authority of India (UIDAI) is mandated to assign an Aadhaar, a 12-digit unique ID number, to every Indian resident. The number refers to a database record containing biometric information, including a photograph, two iris scans, and all available fingerprints, as well as demographic data.

Since 2009, the Aadhaar has become the identity anchor for Indian citizens across a variety of services, especially government services. As with all biometric ID systems, the Aadhaar has been criticized for its security, efficacy, and its uncomfortable cataloging of the individual. Are we not more than the sum of our parts?

IoT and The Identity of Things

For the longest time, the only entities that communicated on the Internet were individuals and commercial entities comprised of individuals, but proliferating gadgetry has created an entire ecosystem of “smart” objects that connect wirelessly to the Internet to share data with each other and on our behalf. Household devices such as Amazon Alexa and the plethora of smart appliances available are the household deities of the 21st century. Wishes uttered in the space of your home can be granted: indoor temperature, favorite playlist, news of the day. This is often called the Internet of Things (IoT). Devices that capture and relay industrial data are also prevalent in agriculture, supply chains, and robotics: the Industrial Internet of Things.

There has been a lot of concern about the security and integrity of the many connections that smart devices can form over time: person-to-machine, machine-to-machine, machine-to-network. IoT has often been the point of failure in operational security, with seemingly benign devices like WiFi-connected printers being the Achilles heel of an entire enterprise. According to a report by Gartner Research, “by 2020 over 25 percent of identified attacks in enterprises will involve IoT, although IoT will account for less than 10 percent of IT security budgets.”

The IoT field is growing fast as networked devices become ubiquitous, and there has yet to be clear standardization of best practices for security and identity within the space. Some services, like DigiCert, employ a PKI/CA model to identify these devices. Other platforms, like ForgeRock and Gigya, advocate the use of protocols like OAuth and its lightweight identity layer OpenID Connect to handle authentication and authorization.
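OpenID Connect carries identity claims as a JSON Web Token (JWT): three base64url-encoded segments separated by dots. As a hedged illustration, with a hypothetical `sub` claim and a deliberately empty signature segment, the sketch below decodes the payload using only the standard library. A real client must verify the token’s signature against the provider’s published keys before trusting any claim.

```python
import base64
import json

def b64url_decode(segment):
    pad = "=" * (-len(segment) % 4)   # restore the stripped base64 padding
    return base64.urlsafe_b64decode(segment + pad)

def unverified_claims(jwt):
    """Decode a JWT payload WITHOUT verifying it; illustration only."""
    header, payload, _sig = jwt.split(".")
    return json.loads(b64url_decode(payload))

# A hand-made token with hypothetical claims and no signature:
payload = base64.urlsafe_b64encode(
    json.dumps({"sub": "device-42", "iss": "https://id.example"}).encode()
).decode().rstrip("=")
token = f"eyJhbGciOiJub25lIn0.{payload}."
```

The `sub` (subject) claim is the device’s or user’s identifier and `iss` names the identity provider; the security of the whole scheme rests entirely on the signature segment that this sketch deliberately ignores.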

The identity of our machines does not enter the discourse as much as it should given the rise of automation and a growing machine presence — virtual and physical — in our society.

Maybe by the time smart cities become a reality and our toasters start giving us relationship advice these identity models and namespaces will be secure and operate on a consensual basis in regards to our data and privacy.

Blockchain and the Quest for Self-Sovereign Identity

The notion of self-sovereign identity dates back to the early nineties with the Cypherpunks, an informal network of computer scientists and activists who valued peer-to-peer networks and the right to anonymity amidst the growing centralization and surveillance of the Internet. “True Names” by Vernor Vinge was a seminal text that influenced programmers and computer scientists alike (many of them likely Cypherpunks) with its band of hackers, or “warlocks,” who keep their identities secret from each other and the Great Adversary, the United States government. If one’s true name was known to another, he or she would be under their control. As Eric Hughes writes in A Cypherpunk’s Manifesto, “Privacy is the power to selectively reveal oneself to the world.”

It was this culture which planted the seeds that would result in years of work on digital cash systems, none of which succeeded, due to the reliance on a central server or trusted third party, and the indifference or active hostility of financial regulators. Bitcoin was the first peer-to-peer electronic cash system that gained any sort of traction. The structure of the Bitcoin network, the first widespread mutual distributed ledger, opened up an array of use cases impossible with previous centralized systems. While distributed ledgers go back at least to 1978, the consensus protocol of Bitcoin’s underlying blockchain allows the network to function without the direction of a sovereign entity such as a central bank or government. The anonymous inventor(s) of Bitcoin, Satoshi Nakamoto, introduced a radically new concept of a peer-to-peer, trust-minimized financial system.

Aside from financial applications, one of the most widely discussed use cases of blockchain and distributed ledger technology is identity. One of the reasons for the enthusiasm around digital identity and blockchain is the fault-tolerance of the distributed data structure. If you want a certain set of facts to persist into the future, then blockchains are well-suited if said facts are not compromised by transparency. Namecoin, a decentralized DNS and identity system built on a fork of the Bitcoin network, operates under this methodology. Use your key pair to map identities to domains and other data, publish it onto the chain, and let the subsequent blocks cement the data in time like a fly in amber (to use Nick Szabo’s great analogy). Projects such as Namecoin claim to solve Zooko’s Triangle, which states that name systems can have only two out of three desired properties: secure, decentralized, and human-meaningful. But how can privacy be maintained in a transparent ledger of transaction data? This requires zero-knowledge proofs, which allow one party to prove to another that a statement is true, or that they know a secret, without revealing the secret itself. Some projects, such as Sovrin and uPort, record attestations on the blockchain rather than the identity data itself.
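A toy sketch of Namecoin-style name registration, assuming a bare-bones chain of hash-linked blocks (no mining, consensus, or transactions; the name and address are invented): each block commits to its predecessor’s hash, so earlier name records harden as the chain grows, like the fly in amber above.

```python
import hashlib
import json

def block_hash(block):
    # Canonical JSON so the same block always hashes identically.
    return hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()

chain = [{"prev": "0" * 64, "records": {}}]   # genesis block

def register(name, value):
    prev = chain[-1]
    chain.append({"prev": block_hash(prev), "records": {name: value}})

def resolve(name):
    for block in reversed(chain):             # most recent registration wins
        if name in block["records"]:
            return block["records"][name]
    return None

register("example.bit", "10.0.0.7")
register("other.bit", "10.0.0.8")
```

Rewriting an old record would change that block’s hash and break every later block’s `prev` link, which is what makes tampering detectable; real systems add proof-of-work so that forging a consistent replacement chain is also prohibitively expensive.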

The challenges facing identity solutions based on distributed ledgers arise from the same properties that make the technology attractive in the first place. Surely if we are able to freely transact digital assets with one another using wallet interfaces to key pairs then we ought to be able to assign and maintain our identities on these networks? But this opens up many problems such as how to keep false or inaccurate information from being crystallized in the ledger, how to deal with lost or compromised keys, establishing certainty in a “trustless” environment, etc.

The Story So Far…

This survey of digital identity systems has spanned the namespaces of institutional databases from the sixties, the architecture and language of the systems that house them, the advent of the Internet and public key infrastructure, different approaches to fixing the reliance on CAs, social networking behemoths, biometric national ID, the Identity of Things, and the latest efforts towards self-sovereign identity via blockchain. The very multitude of attempts at representing people and other forms of matter within our systems has proven to be a long process of trial and error. The naming of things and the establishment of trust remain a persistent conundrum for our society. Within the structures we build around ourselves, we still have trouble representing ourselves, our capabilities, and our materials to each other within all the machinery.

There is much work to be done.

Further Reading

Vinay Gupta on Identity

Ian Grigg on Identity

David Birch’s Identity is the New Money (US, UK)

IETF and the Domain Name System