A new Identity Model [with Prototype]

The Internet consists of a complex, integrated ecosystem of hierarchical layers and technologies that enable the network to function and allow for frictionless data transmission between two parties. Only because of this layered structure, which breaks big problems down into smaller sub-problems, was the Internet able to overcome its technological challenges. Each layer has its own focus and its own interest groups that concentrate exclusively on creating the best solution for that layer’s problems. A quick overview of the Internet’s layers (from bottom to top):

  • Link Layer: Offers the physical connection between nodes in the network
  • IP Layer: For identification of nodes in the network and allowing data to be routed from source to destination
  • TCP Layer: Establishes and maintains the communication (and thus, data transmission) between nodes
  • Applications Layer: Specifies the protocols and interfacing methods with which the nodes can communicate with each other

Each of these layers performs its function exceptionally well, so well that we often don’t realize how far and how fast these data packets had to travel to display a simple website in our browser.

Instead of hierarchical layers, identity is broken down into intertwined processes that form an identity cycle. The basic premise of identity is that you can enroll (create) and hold an unlimited number of identities, can authenticate and grant access to these identities to people or service providers of your choice, and can modify (update) the identities. In simplified form, the identity cycle consists of three steps: enrollment, authentication and modification.

The first step in the process is enrolling a new identity. The creation of identities is not limited, neither in the number of identities that can be created nor in the attributes that define an identity. This supports the need for privacy and for being able to use a pseudonym during online communication (think of Reddit or a forum). Enrollment also includes a very important step: the issuance of a unique identifier associated with the identity. As with entries in a database, each identity gets its own unique identifier so that it can be looked up unambiguously.

Authentication is the process through which the user represents a predefined identity in her online activities and data exchange. The authentication process helps determine the authenticity of the person accessing and representing the identity. The methods for authenticating an identity range from simple passwords to more advanced biometric methods such as iris or finger vein scans (which, when used as the only factor, are probably too insecure). For some authentication methods there are known attack vectors that undermine the authenticity of a claim on an identity (phishing being the best example). Therefore a strong authentication method, preferably one that is unique to each person and difficult to steal (e.g. fingerprint or iris), needs to be chosen to ensure security.

No identity is permanent, and we constantly change (aka update) ourselves. Modifying one’s identity attributes is therefore a core part of identity. Some identities (birth certificate, citizenship, driver’s license) are static and immutable, but they are rarely used during online communication and serve little purpose apart from verification.

This model for identity is a very simplified version of the closed identity system we have today. Each service provider has to manage the identities of its users itself. It ultimately gets to decide what kind of information needs to be submitted in order to access certain areas of its website, and it ultimately decides what happens with the personal information of its users. The user has to accept the premise that the service provider can be trusted to handle the identity and has no malicious intentions (such as selling your data). Often this trust is misplaced, and the result is privacy intrusion and identity theft, which, as we’ve seen recently, can have serious consequences.

In this system, identity is a “one time thing” and cannot be revoked. When you sign up for a service, your identity with that service provider is permanent and the service provider has unlimited access to it. You may be able to change certain attributes of your identity, but you can never completely delete it. Even when a service allows you to “delete” your identity, the question of which data entries remain on its centralized servers is left open. Additionally, in today’s online information exchange, anonymity or even pseudonymity is rarely supported (the best example of an exception is Reddit, where users can still create arbitrary identities).

Entities, Identities and Attributes

Before introducing a more complete identity model, we need to explain the difference between entities, identities and attributes. An entity is basically your entirety. It is the meta-object that holds the “you” (i.e. your identities). An entity maps to all of your identities and allows you to represent these identities in information exchanges. What makes entities important is that each one can only be linked to a single person, i.e. you, the holder and creator.

Identities are what define you. The question “Who am I?” is basically an attempt to map your entity to a specific identity which you think best defines/describes you. Therefore, an identity crisis is the non-existence of the correct identity that defines you. The identity you represent is constantly changing and the one you are representing at a given moment depends on many variables (the situation you are in, the impression you are trying to give, the information you have of the person you are in communication with, and so on). Through these rotating identities you are able to adapt dynamically to situations and achieve your desired outcome. The identities which you can create are basically infinite and modern science is definitely pushing boundaries even further, considering you can even modify your identity’s physical appearance.

Identities themselves consist of attributes that describe the identity. Attributes are values of variables and give a better overview of the context. Every identity has at least one attribute, but the number of attributes for any given identity is limited (even when attempting to give the most detailed description of yourself, you will run out of attributes quite soon). Attributes not only help in creating context, they also play a role in identification and in proving the authenticity of an identity claim. The entity can be compared with the identity itself to determine how authentic the identity claim is. A good example is a liquor store, where the store owner has to compare the identity presented (your ID card) with your entity (you) to determine whether the identity claim is authentic and the entity actually owns the identity. There can also be behavioral comparisons (think of a cool handshake greeting that only you and your friends know). But probably the most robust way to determine the authenticity of an identity claim is through a person’s biometric attributes (fingerprints, finger veins and iris).
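The relationship between entities, identities and attributes can be sketched as a small data model. This is only an illustration and not the prototype's code: the class names, the `enroll` and `update` methods and the choice of a UUID as the unique identifier are all assumptions made for this example.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Identity:
    """One persona of an entity, described by a set of attributes."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    # Unique identifier issued at enrollment (hypothetical choice: a UUID4).
    identifier: str = field(default_factory=lambda: uuid.uuid4().hex)

    def update(self, key: str, value: str) -> None:
        """Add a new attribute or change an existing one."""
        self.attributes[key] = value


@dataclass
class Entity:
    """The single holder that maps to all of its identities."""
    identities: Dict[str, Identity] = field(default_factory=dict)

    def enroll(self, name: str, **attributes: str) -> Identity:
        """Create a new identity; every identity needs at least one attribute."""
        if not attributes:
            raise ValueError("an identity needs at least one attribute")
        identity = Identity(name=name, attributes=dict(attributes))
        self.identities[identity.identifier] = identity
        return identity


# One entity holding two unrelated identities.
me = Entity()
forum = me.enroll("forum", nickname="satoshi_fan")
work = me.enroll("work", full_name="Jane Doe", employer="Example Corp")
work.update("employer", "Another Corp")
```

Keeping the identities in a mapping keyed by identifier mirrors the idea that an entity only references its identities, while each identity carries its own attributes.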

The Bitcoin Blockchain: Secure record-keeping and validation

Even though Bitcoin itself is still in its infancy and its use as “the money for the internet” is open to question, we can still agree that Bitcoin as a network, with its immutable, publicly verifiable ledger, does one thing really well: record-keeping and validation of entries. Protected by a huge cluster of powerful miners that spend millions each year on electricity to secure the network, it is arguably the most powerful tool available for record-keeping and validation of entries or, more specifically, of transactions.

Factom, for example, uses the Bitcoin Blockchain to record the merkle roots it creates from land registry entries and ownership titles. This is a perfect use case: it makes a lot of sense to place important information into a publicly verifiable and secure ledger so that the whole world can see it and verify the validity of a claim. The same goes for identity. Precisely because of this security feature, an identity layer of the future benefits greatly from using the Bitcoin Blockchain to record hashed information (or merkle roots) about entities and identities for the safe-keeping and validation of that information.
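To make the idea of a merkle root more concrete, here is a minimal sketch that condenses a list of records into a single 32-byte value. It is not Factom's implementation. The sample records are invented, a single SHA-256 is used for brevity (Bitcoin's own block merkle trees use double SHA-256), and duplicating the last node on odd levels simply follows Bitcoin's convention.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(records: List[bytes]) -> bytes:
    """Hash every record, then hash pairs of nodes until one root remains."""
    if not records:
        raise ValueError("need at least one record")
    level = [sha256(record) for record in records]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd number of nodes: pair the last node with itself.
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical land registry entries.
records = [
    b"parcel 1: owner alice",
    b"parcel 2: owner bob",
    b"parcel 3: owner carol",
]
root = merkle_root(records)
print(root.hex())
```

Changing any single record changes the root, so only the root needs to be written to the Blockchain to commit to the whole set.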

An important note is that the Blockchain today is not designed for data storage. It is merely there for recording proofs of information, such as a cryptographic hash digest of the data. That is exactly why Proof of Existence only places a hash digest of a file into the Blockchain and not the file itself. This still works perfectly for publicly verifying the claim that a specific file or piece of information existed, since it is computationally infeasible to come up with a different file that has the same digest.
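As a rough illustration of the digest-only approach, the following sketch hashes a file with SHA-256 and later checks the file against the digest that was recorded. The helper names `file_digest` and `matches_anchor` are made up for this example, and the temporary file merely stands in for a real document.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_anchor(path: str, anchored_digest: str) -> bool:
    """Check a file against a digest that was recorded earlier."""
    return file_digest(path) == anchored_digest


# Demo with a throwaway file standing in for a real document.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"my important document")
    document_path = tmp.name

anchored = file_digest(document_path)  # this value would go into the Blockchain
unchanged = matches_anchor(document_path, anchored)

with open(document_path, "ab") as handle:
    handle.write(b" (edited)")
tampered = matches_anchor(document_path, anchored)

os.remove(document_path)
```

Here `unchanged` is true and `tampered` is false: the recorded digest vouches for exactly one version of the file.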

A more complete identity model

With this knowledge, we are able to create a more detailed identity model.

The basic building block of identity is the entity itself. It holds the most basic information about you, including a reference to all of the identities that correspond to it. An entity is created through a specific enrollment process, which may make use of biometrics either as an additional authentication factor for the entity or as an attribute of the identity itself. An important step in the creation of an entity, though, is its “eternification” through the Bitcoin Blockchain.

Eternification consists of placing important, hashed information about the entity into the Blockchain through a transaction (e.g. utilizing OP_RETURN). Through this process the entity not only becomes officially documented, the transaction also serves as proof in the future that the entity exists and is owned by a specific person.
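The data-carrying part of such a transaction can be sketched in a few lines. The example below only builds the output script bytes (the OP_RETURN opcode 0x6a followed by a single data push) and does not create, sign or broadcast a transaction. The entity record and the "ID" prefix are invented for illustration, and node relay policy limits how much data such an output may carry, so current limits should be checked before relying on a payload size.

```python
import hashlib

OP_RETURN = 0x6A
MAX_DIRECT_PUSH = 75  # opcodes 0x01..0x4b push that many bytes directly


def op_return_script(payload: bytes) -> bytes:
    """Build an output script of the form: OP_RETURN <payload>."""
    if not 1 <= len(payload) <= MAX_DIRECT_PUSH:
        raise ValueError("payload must be between 1 and 75 bytes for a direct push")
    return bytes([OP_RETURN, len(payload)]) + payload


# Hypothetical serialized entity information; only its hash goes on-chain.
entity_record = b"entity: created 2015, owner key fingerprint ..."
digest = hashlib.sha256(entity_record).digest()

# "ID" is a made-up two-byte marker so that such entries can be recognized later.
script = op_return_script(b"ID" + digest)
print(script.hex())
```

The resulting script is 36 bytes long: one opcode byte, one length byte, the two-byte marker and the 32-byte digest.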

With this entity, new identities can be created. As specified previously, the creation of an identity is not limited and an identity can contain arbitrary information (to support privacy), although certain service providers may only accept identities which have been certified. Certification is a process in which an independent party or a government agency certifies the legitimacy of an identity. A web of trust is another way to certify such a claim. An identity, similar to an entity, is placed into the Blockchain for verification purposes.

Identification can rely on various predetermined methods: a simple password (boring), a picture, a file or biometric scans of oneself (e.g. fingerprint, iris, finger veins, voice recognition, etc.). By identifying yourself you prove the claim that you own a specific identity and can represent it, and can thus proceed to the next step, which is authorization. Authorizing a new service provider consists of a basic handshake (similar to how SSL works today) in which information is exchanged between the service provider and you. The service provider sends you information about its certificate and trust level, while you submit the identity with which you want to be identified by the service provider. An important part is that you yourself decide which personal information you submit to the service provider.

Once you are identified by a service provider you are able to get a more personalized user experience. To add to this even further, you and the service provider could interact with independent smart contracts (or decentralized applications) whose sole purpose is to perform certain operations (calculations and analysis) on your personal data, the result of which is then output and visualized, similar in spirit to Zero Knowledge Proofs. An example would be age verification: instead of submitting your date of birth, you simply submit a “True” or “False” to the question “Are you over 18?”. Another example is AML (Anti-Money-Laundering), which only needs to provide the service provider with a “True” or “False” about whether you appear on money laundering lists, fugitive lists, sanction screening lists, PEP lists and other lists (as a side note, KYC on the Blockchain is a bit more tricky). All of this happens without the user submitting the sensitive information to the service provider itself. It goes instead to a smart contract that independently performs these operations, sends the output to the service provider and permanently deletes the information you submitted.
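The age verification example boils down to a function that sees the private data but only ever returns a boolean. The sketch below is a plain Python function standing in for the smart contract. It is not a cryptographic zero-knowledge proof, and the names `age_in_years` and `is_over` as well as the sample data are assumptions made for illustration.

```python
from datetime import date


def age_in_years(date_of_birth: date, today: date) -> int:
    """Full years elapsed between date_of_birth and today."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


def is_over(private_data: dict, minimum_age: int, today: date) -> bool:
    """Answer the question 'are you over N?' without returning the birth date."""
    return age_in_years(private_data["date_of_birth"], today) >= minimum_age


# The private data stays with the caller; the service provider only sees the answer.
private_data = {"name": "Jane Doe", "date_of_birth": date(2000, 6, 15)}
answer_before = is_over(private_data, 18, today=date(2018, 6, 14))
answer_after = is_over(private_data, 18, today=date(2018, 6, 15))
```

With these inputs `answer_before` is false (one day short of the 18th birthday) and `answer_after` is true.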

Through interactions with service providers, organizations and other individuals in the identity model, you gradually enhance your identity. Similar to PGP’s Web of Trust model, the more interactions you have with different kinds of people and service providers, the more you enhance your identity. What this basically means is that the credibility and reputation of your identity increase the more often you use it to represent yourself during online communication. The more trustworthy an identity is, the more widely accepted it will be, which further increases its trust score. You get the idea.

Since identities can and should be altered, the final step includes the modification of the identity. Modification basically means to add new attributes or change existing ones. This step closes the identity cycle and makes it start anew.

This identity system is open and agnostic, since the user creates a predefined identity she wants to use in an online information exchange with a service provider. She authorizes the service provider with access to specific attributes of that identity. This is a huge difference from the closed identity system described earlier, since here the user is in control of her identity and she can decide what information gets submitted and what stays hidden.

Comments on this identity model

This identity model is still far from being a final version and will surely change in the coming weeks, but it still provides at least some insight into what an identity model of the future could look like. It also leaves some open questions:

  • Identity Storage. If not stored on the Blockchain, where are identities stored then? Since the client-server model is flawed, the only other solution is a new P2P system that enables nodes to store encrypted identities and submit them for identification and authentication purposes to the origin claim.

  • Certification. This is one of the big questions, mainly because today’s certification model with centralized Certificate Authorities is flawed. The Web of Trust model could be adopted, where an identity gradually certifies itself, but this process is too slow. During the creation phase the user could of course submit more personal information (e.g. Facebook, Twitter or LinkedIn accounts, identity cards or passports, driver’s license, etc.) to prove her claim on an identity, but this too carries risks of fraud. Most likely a government authority and a few stakeholders (banks, insurance companies, etc.) would have to work together to issue certificates in a more “trustless” way, where each provides an independent conclusion on an identity claim and the results are then combined for the final certification.
  • Reputation outside Web of Trust. Even though PGP’s Web of Trust model applies well to identity, there has to be another way to build reputation and trust scores. This requires inputs from both the online and the offline world. Specifically, your reputation score would be based on transactions you had online or offline. For example, renting an apartment and paying regularly will surely increase your reputation score and make you more trustworthy as a tenant. Similarly, if you rent out an apartment on Airbnb or use Couchsurfing, you (hopefully) get a positive review, which further enhances your identity by increasing your trust score. Such a trust system not only enhances a single identity, but the entire entity. The difficulty lies in designing a system that cannot be cheated easily. Considering how easy it is to get good reviews on Amazon, Yelp and the like, a very robust and fair reputation algorithm needs to be created.

Prototype of the Identity Model

To demonstrate the concept of the identity model we just discussed, we developed a simple prototype in Python (https://github.com/domschiener/identity-on-bitcoin) that allows users to enroll new entities, create identities, anchor them into the Bitcoin Blockchain and grant service providers access to specific attributes of an identity.

Enrolling an entity works with a password, an image, a file or biometric information about yourself. Since passwords are too boring and images/files are not as secure, we are going to use a fingerprint scanner that was set up with a Raspberry Pi to scan the unique characteristics of my finger and generate a SHA256 hash from them, which will be the Bitcoin private key. (As an important side note, relying purely on biometrics for the authentication of your identity is incredibly insecure. This is simply a Proof of Concept; we could also combine fingerprints with other authentication methods such as passwords and XOR the characteristics together to create an even more secure authentication pattern.) From the private key we then generate the public key and public address as well as a PGP key pair, which will later be used for interacting with a service provider. Here is the corresponding public address of my fingerprint:

https://www.blocktrail.com/BTC/address/1Jjta5ugjTyLtryiSmVwXYs8EnY3LtmQqY
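The step from fingerprint characteristics to a private key can be sketched as follows. This is a guess at the general shape and not the prototype's actual code. It assumes the scanner delivers exactly the same bytes on every scan, the function name and sample template are hypothetical, and the optional password is mixed in by XOR-ing the two digests as suggested above. Deriving the public key and address from the result needs secp256k1 elliptic-curve arithmetic and is left out.

```python
import hashlib

# Order of the secp256k1 curve group; valid private keys lie in [1, order - 1].
SECP256K1_ORDER = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141


def derive_private_key(fingerprint_template: bytes, password: str = "") -> bytes:
    """Hash the fingerprint characteristics into a 32-byte private key.

    If a password is given, its SHA-256 digest is XOR-ed into the result so
    that the fingerprint alone is no longer sufficient.
    """
    key = hashlib.sha256(fingerprint_template).digest()
    if password:
        password_digest = hashlib.sha256(password.encode("utf-8")).digest()
        key = bytes(a ^ b for a, b in zip(key, password_digest))
    if not 1 <= int.from_bytes(key, "big") < SECP256K1_ORDER:
        raise ValueError("derived value is not a valid secp256k1 private key")
    return key


# Hypothetical stand-in for the characteristics returned by the scanner.
template = b"hypothetical-characteristics-from-the-scanner"
key_from_finger = derive_private_key(template)
key_with_password = derive_private_key(template, "correct horse")
```

Because the derivation is deterministic, scanning the same finger again (and entering the same password) reproduces the same key, which is what makes later authentication possible.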

Since our goal is to publicly verify that an entity was created at a specific date, we are going to make a Bitcoin transaction with an OP_RETURN output that contains basic information about the entity. This is an example of such a transaction:

https://www.blocktrail.com/BTC/tx/a8ee777c62df54b6cb4d9f2b346224b49eefc109fc6217a4ea2f132e2ee888c9 (scroll to the bottom to see the OP_RETURN output)

Now our entity has officially been created and can be audited in the future through the Bitcoin Blockchain (all I have to do is show the txid). Since an entity holds additional information, we need to be able to store it. That’s why we’re using Python’s pickle module to store the information locally. Obviously the goal would be to store this information in a peer-to-peer network to give ourselves and service providers easy access to our identities.
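Local storage with pickle could look roughly like this. The dictionary layout and the function names are assumptions for illustration rather than the prototype's actual format; the address and txid are the ones shown above. Keep in mind that pickle files should only ever be loaded from a trusted source, because unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile

# Hypothetical layout of the locally stored entity record.
entity = {
    "address": "1Jjta5ugjTyLtryiSmVwXYs8EnY3LtmQqY",
    "txid": "a8ee777c62df54b6cb4d9f2b346224b49eefc109fc6217a4ea2f132e2ee888c9",
    "identities": [],
}


def save_entity(record: dict, path: str) -> None:
    """Serialize the entity record to a local file."""
    with open(path, "wb") as handle:
        pickle.dump(record, handle)


def load_entity(path: str) -> dict:
    """Load a previously saved entity record (trusted files only)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    entity_path = os.path.join(folder, "entity.pickle")
    save_entity(entity, entity_path)
    restored = load_entity(entity_path)
```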

Now we can authenticate ourselves with the same fingerprint again. The scanner reads the characteristics of my index finger again and determines whether they match the characteristics of a fingerprint it has stored in its local flash storage. If they do, it sends the characteristics to our main.py file and we are able to generate the same, unique SHA256 hash again, which is our private key.

Let’s create a new identity, which is simply a Python dictionary. The identity’s attributes are encrypted with AES, using your private key. Later, when you want to grant a service provider access to your identity, you can decide which information you would like to decrypt and which you would like to keep encrypted. As with the entity, we are going to anchor the identity into the Bitcoin Blockchain. Here is the transaction:

https://www.blocktrail.com/BTC/tx/3eb06d7bcc4165d7894e04e10b61a38e111aa3c13843ddb636f26bbc40a098a3

With this identity we can now grant service providers access to chosen attributes of our identity. To illustrate the information exchange, we are simply using PGP. Right now everything is manual, but the goal for the future is to enable this flow with the click of a button. You can authorize a new service provider by simply entering its PGP public key. The information about the service provider is then stored in your local keyring. Then you can generate an “access token” (basically the encrypted message), choosing which service provider to use and which attributes you would like to send decrypted to that service provider. The simple-page.html example is a locked website that requires you to confirm that you are over 18 to see a certain area of the website. Determining whether you are over 18 is done client-side, but the goal would be to use a smart contract in the future that independently comes to the conclusion (True or False) whether you are over 18 and forwards this information to the service provider.
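The selective disclosure behind such an access token can be sketched without any cryptography. The example below only assembles the plaintext payload from the attributes the user chose to reveal. In the prototype this message would additionally be encrypted with the service provider's PGP public key, which is omitted here. The function name, the payload layout and the sample identity are assumptions made for illustration.

```python
import json
from typing import Iterable


def build_access_payload(identity: dict, disclosed: Iterable[str], provider: str) -> str:
    """Return a JSON payload containing only the attributes chosen for disclosure."""
    disclosed = list(disclosed)
    unknown = [key for key in disclosed if key not in identity]
    if unknown:
        raise KeyError(f"identity has no attribute(s): {unknown}")
    payload = {
        "provider": provider,
        "attributes": {key: identity[key] for key in disclosed},
    }
    return json.dumps(payload, sort_keys=True)


# Hypothetical identity; only the over-18 flag is revealed to the provider.
identity = {"name": "Jane Doe", "date_of_birth": "1990-01-01", "over_18": True}
token_plaintext = build_access_payload(identity, ["over_18"], provider="simple-page")
print(token_plaintext)
```

Everything that is not listed in `disclosed` never enters the payload, so the service provider learns nothing about the name or the date of birth.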