What we learned from designing an academic certificates system on the blockchain

Overview of our digital certification architecture

Over the past year, we have been working on a set of tools to issue, display, and verify digital credentials using the Bitcoin blockchain and the open badges specification. Today we are releasing version 1 of our code under the MIT open-source license to make it easier for others to start experimenting with similar ideas. In addition to opening up the code, we also want to share some of our thinking behind the design, as well as some of the interesting questions about managing digital reputations that we plan to continue working on.

You can find links to our source code, documentation, and discussion on our project homepage: http://certificates.media.mit.edu.

The overall design of the certification architecture is fairly simple. A certificate issuer signs a well-structured digital certificate and stores its hash within a blockchain transaction. A transaction output is assigned to the recipient.

Working on this project, we have not only learned a lot about the blockchain, but also about the way that technology can shape socioeconomic practices around the concept of credentials. We hope that sharing some of the things we have grappled with and the decisions we made (and why) will be useful for other developers and institutions interested in developing digital credential systems that make use of blockchain architectures.

Many of the most interesting challenges we encountered were not technical in nature, but they cannot easily be separated from the technology because small design decisions can fundamentally shape behavior. That is why we have taken small experimental steps, tested our system with actual users, and continue to make changes based on what we are learning. The blockchain is a relatively new technology and its complexity and immutability make it even more important to carefully consider the long-term effects of design decisions.

Version 1 of our tools is intended as a useful starting point for other researchers and experimental projects. For institutions looking to roll out digital credential systems, we recommend waiting for version 2. We have already started a pretty fundamental redesign, and we will also release future versions of the project under the same MIT open-source license.

Issuer, Viewer, Schema

The following three repositories make up our digital certificates architecture:

Cert-schema describes the data standard for digital certificates. A digital certificate is essentially a JSON file containing the fields our cert-issuer code needs to place it on the blockchain. We tried to keep the schema as close to the open badges specification as possible and expect to be even more closely aligned with the next version of the specification.
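To make the idea of "a JSON file with the necessary fields" concrete, here is a minimal sketch in Python. The field names and values are hypothetical and chosen only for illustration; the actual cert-schema repository defines the real structure. The deterministic serialization step is our own assumption for this sketch, added so that the same certificate always produces the same bytes before hashing.

```python
import json

# Hypothetical certificate contents. These field names are NOT taken from
# cert-schema; they only illustrate the kind of data a certificate holds.
certificate = {
    "issuer": {"name": "Example University", "id": "https://example.edu/issuer"},
    "recipient": {"name": "Jane Learner", "pubkey": "<recipient Bitcoin address>"},
    "badge": {"name": "Workshop Completion", "description": "Completed the example workshop."},
    "issuedOn": "2016-06-01",
}


def serialize(cert: dict) -> bytes:
    """Serialize with sorted keys and fixed separators (an assumption of this
    sketch) so that key order and whitespace cannot change the bytes."""
    return json.dumps(cert, sort_keys=True, separators=(",", ":")).encode("utf-8")


cert_bytes = serialize(certificate)
```

Whatever serialization is used, issuer and verifier have to agree on it, because even a single changed byte produces a different hash.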

Cert-issuer takes a JSON certificate, creates a hash (a short string that can be used to uniquely identify a larger digital file) of the certificate, and issues a certificate by broadcasting a Bitcoin transaction from the issuing institution’s address to a recipient’s address with the hash embedded within the OP_RETURN field.
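The hashing and embedding step can be sketched as follows. This is not the cert-issuer implementation: it assumes SHA-256 as the hash function (the text above does not name one) and only builds the raw output script. Creating, signing, and broadcasting the full transaction is left out, and the function names are hypothetical.

```python
import hashlib

OP_RETURN = 0x6A  # Bitcoin script opcode that marks a data-carrying, unspendable output


def certificate_hash(cert_bytes: bytes) -> bytes:
    """Return the SHA-256 digest of the certificate bytes (assumed hash function)."""
    return hashlib.sha256(cert_bytes).digest()


def op_return_script(payload: bytes) -> bytes:
    """Build the script OP_RETURN <payload> using a single direct data push.

    Script opcodes 0x01 to 0x4b push that many following bytes, so payloads
    of up to 75 bytes need no extra push opcode.
    """
    if not 1 <= len(payload) <= 75:
        raise ValueError("payload must be between 1 and 75 bytes for a direct push")
    return bytes([OP_RETURN, len(payload)]) + payload


script = op_return_script(certificate_hash(b'{"example": "certificate"}'))
```

The resulting script is 34 bytes long: one byte for OP_RETURN, one for the push length, and 32 for the digest.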

Cert-viewer is used to display and verify digital certificates after they have been issued. The viewer code also provides the ability for users to request certificates and to generate a new Bitcoin identity.

Digital certificate examples from various deployments of our cert-viewer codebase. Left to right: MIT Media Lab alumni; Learning Machine employee; MIT Global Entrepreneurship Bootcamp participant; Laboratorio para la Ciudad workshop participant.

The Importance of Digital Credentials

We won’t repeat our general thinking about credentials and recognition here, but please have a look at our original post if you would like more background. Our interest in designing new solutions in this space is driven by the limitations we see in current approaches. When certification systems do not work well, the consequences range from inefficient, such as the cumbersome and expensive process of requesting a university transcript, to disastrous, such as when a refugee is unable to provide a certificate of completed study and is therefore prevented from continuing her education. Digital systems could help in both of these situations.

(Beyond the) Hype

A cautionary note about the blockchain hype. During the year that we have been working on this project, blockchain-based certification systems have become a hot topic (type the term into Google and see for yourself). Needless to say, much of the rhetoric has been exaggerated (and the same is true for some of the criticism). One important takeaway for us has been that the blockchain is a lot more complicated than most people make it out to be. Building applications on top of it–which is what we did–is getting easier, but there are still very few people who deeply understand its inner workings (and we don’t consider ourselves part of that group). The blockchain is not a simple solution that will fix everything that is wrong with today’s credentials. But it does offer some possibilities for improving the system we have today–and that’s what we are excited to explore.

Why the Bitcoin Blockchain? Why not Ethereum?

The easy answer is that when we started out, Ethereum was a mere whiff of an idea (no pun intended). The other part of the answer is that Bitcoin has been the most tested and reliable blockchain to date; in addition, the relatively robust self-interest of miners, and the financial investment made into Bitcoin (and Bitcoin related companies) make it likely that it will be around for a good while longer. Our solution is not locked to one particular blockchain–it would be easy to also start publishing our credentials to other blockchains, but for most of what we want to do, the functionality of the Bitcoin blockchain continues to be sufficient. That is not to say that we are not curious about the potential of smart contracts, and we are discussing the potential of Ethereum-based side-chains to reduce transaction cost and expand functionality.

Dealing with Public/Private Key Pairs

Our system uses public/private key pairs to authenticate an issuer as well as a recipient. While that’s a powerful concept, we found it was a bit of a headache to implement in practice. Ideally, certificate recipients (such as graduates or workshop participants) would create their own key pairs and then share their public key with us in order to request a certificate. But the amount of technical sophistication required to do this makes a broad roll-out prohibitive. For now, the ability to share a simple link to a certificate is convenient, but in the future, we will need better ways for non-technical users to create and manage their own keys. The best solution would be a wallet for academic credentials that works like the wallets used to hold and transact Bitcoin. An alternative would be to use a paper-based system of pre-creating and sharing keys (and then destroying them). But that requires a higher level of trust in the institution that issues the certificates.

Certificate Revocation

We wanted to reserve the possibility of revoking a certificate, partly because everyone is concerned about it, and partly because we were worried that we might miss a fundamental flaw in our design and need to invalidate our first attempts. Revocation in our current system (version 1) is not actually a deletion–no information can ever be deleted from the blockchain–but it is a flag that either the issuer or the recipient can set to signal that they don’t acknowledge the certificate to be valid. In more technical terms, we create two outputs containing $0.01, with one assigned to the recipient and the other to the issuer. To revoke a certificate, either party just spends the output they control. In that sense it works more like a convention that all users have to agree on. Our viewer code follows this convention and checks if the revocation flag has been set, but other viewers could choose to ignore it. That is a design choice we are reconsidering, and for version 2 we are exploring other revocation approaches, which could reduce the ability of viewers to show or validate revoked certificates. Two possible directions are versioning and maintaining a revocation list. Versioning (e.g., following a chain of spends to the most current version of a certificate) sounds tedious. A revocation list, on the other hand, is a common pattern with other certificate issuers, such as open badges and X.509 certificates.
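The spend-to-revoke convention described above can be sketched in a few lines of Python. The data structure and function names are hypothetical, and the sketch assumes the caller has already looked up whether each of the two outputs has been spent (for example through a blockchain explorer); it does not query the blockchain itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RevocationOutput:
    """One of the two small outputs created when a certificate is issued."""
    owner: str   # "issuer" or "recipient"
    spent: bool  # spend status, as looked up by the caller


def is_revoked(outputs: List[RevocationOutput]) -> bool:
    """By convention, a certificate is revoked once either output is spent."""
    return any(output.spent for output in outputs)


def revoked_by(outputs: List[RevocationOutput]) -> List[str]:
    """Return which parties have set the revocation flag, in sorted order."""
    return sorted({output.owner for output in outputs if output.spent})


outputs = [
    RevocationOutput(owner="issuer", spent=False),
    RevocationOutput(owner="recipient", spent=True),
]
```

Because this is only a convention, the check lives entirely in the viewer: a viewer that never calls something like is_revoked will keep showing the certificate as valid.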

Privacy

While much has been made of the ability to conduct shady business on the blockchain, it is inherently a public and immutable space–everyone has access to its contents and nothing can be erased. At the same time, certificates are only useful when they can be tied to a person. That’s why protecting private data is so important. On one hand, learners need to be able to show evidence that they (and not somebody else) received a particular certificate. At the same time, they should be able to disclose this information to one employer, without having to also share it with every other employer. Some of our colleagues at MIT are working on systems that will provide more sophisticated ways of managing private data, but these are still in the early development stages. In our current solution, we try to balance obfuscation (making it hard for non-authorized users to find information they shouldn’t have access to) with usability so that institutions or learners that lack advanced technical sophistication will not be prevented from using the credentials. We do this by hashing the certificate (which contains a learner’s personal information) and only placing the hash on the blockchain. If someone wants to verify the validity of a certificate, they need the learner to disclose both the certificate itself and where the hash of the certificate is located on the blockchain.
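The verification step at the end of this section can be sketched as follows. The sketch assumes SHA-256 as the hash function and assumes the verifier has already fetched the raw OP_RETURN output script from the transaction the learner pointed to. Both function names are hypothetical and this is not the verify code from cert-viewer.

```python
import hashlib

OP_RETURN = 0x6A  # Bitcoin script opcode marking a data-carrying output


def payload_from_op_return(script: bytes) -> bytes:
    """Extract the data from a script of the form OP_RETURN <single direct push>."""
    if len(script) < 2 or script[0] != OP_RETURN:
        raise ValueError("not an OP_RETURN script")
    length = script[1]
    if length > 75 or len(script) != 2 + length:
        raise ValueError("unsupported or malformed data push")
    return script[2:]


def verify_certificate(cert_bytes: bytes, script: bytes) -> bool:
    """Recompute the certificate hash and compare it with the hash on the blockchain."""
    return hashlib.sha256(cert_bytes).digest() == payload_from_op_return(script)


# Example: the learner discloses the certificate and the transaction location;
# here the on-chain script is simulated instead of being fetched.
disclosed = b'{"example": "certificate"}'
onchain_script = bytes([OP_RETURN, 32]) + hashlib.sha256(disclosed).digest()
```

Someone who only sees the blockchain learns nothing about the certificate's contents from the hash, while someone who holds the certificate can confirm that it has not been altered since it was issued.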

The process of verifying a digital certificate. You can verify a certificate manually or by using our verify code in the cert-viewer codebase.

The Right to Curation

Should learners be able to choose what parts of their history they share with others? With traditional certificates, learners have been able to construct different narratives of their experiences for different purposes. For example, a learner who has interests in food and writing may highlight a certain set of experiences when applying for jobs as a journalist, and a different set of qualifications when applying to work as a sous chef. She may also talk about these experiences in different ways in her interviews. Some employers might prefer stronger requirements for full transparency, but in most cases, there is no good reason to require her to share all of her accomplishments in the same way. And without better safeguards in place to protect the further sharing of such personal information, the risks of requiring disclosure outweigh the benefits. It’s a tricky question, because you would want to know about previous DUI convictions before you hire a new driver, but we believe our legal and social mechanisms are better equipped to deal with it than a new technological system.

Some people choose to broadcast their academic history (e.g., display it on LinkedIn), others prefer to disclose it only when needed. We aim to give the learner similar flexibility when using digital credentials. When a learner chooses to share a certificate with a potential employer, only the contents of that specific certificate are shared. It is possible to search the blockchain for other certificates that the learner may have received, but only their hashes are recorded there, so their contents stay hidden.

There are shortcomings to this design. For example, if an issuer only issues one type of certificate it is possible to search for all transactions this issuer has made on the blockchain, and deduce who else may have received them. That is why we are working on a fundamental technical change moving from version 1 to 2, to make traceability much harder.

Tracking Use and Value

Tracking the use of credentials as a way to document their value to the individual is an area where we see a lot of potential, but don’t have a clear design proposal yet. If there were a public record of which degrees employers pay attention to (beyond the obvious list of Ivy League institutions), it would help students decide which programs to enroll in. Two possible solutions to extend our architecture in this way are “transactional disclosure” and “disclosure by proxy.” The first would implement the process of disclosing a certificate as a type of transaction that is publicly recorded, generating metadata that others can use. The second solution would rely on users verifying certificates through a third-party service (e.g., disclosing them to an employer via a website–in most cases this could be the issuing institution) which keeps a record of the disclosures. We don’t plan to add either of these into our version 1, but it’s something we are thinking about for version 2 of the code.

Version 2

We mentioned above that version 1 was for experimental users and researchers. For version 2 we are making some architectural changes, but we are also focusing on documentation and deployment, to make it easier for other institutions to get started. The biggest technical change is how we will store the certificate data. In version 1, each certificate corresponds to a transaction on the Bitcoin blockchain. While that provides a nice metaphor for the process of issuing a certificate (there is an actual transaction going from the issuer to the recipient) it is unnecessarily wasteful. In version 2 we will store certificate data in a Merkle tree (a cryptographic construct that allows more efficient storage) while preserving the ability for individual users to point to their individual certificates (without having access to or control of other certificates). The Merkle root will still be recorded on the Bitcoin blockchain to preserve the benefits of using a blockchain. This presents other interesting challenges, because Merkle trees are more likely to be maintained by issuing institutions than by recipients, but as we mentioned before, a wallet-based approach to managing credentials (and storing references to the certificate data on the blockchain) would still give recipients full control over their credentials.
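To illustrate how a Merkle tree lets many certificates share one blockchain record while each recipient can still prove membership individually, here is a minimal sketch. It is not the version 2 design: it assumes SHA-256, assumes that an odd node at any level is paired with a copy of itself, and uses hypothetical function names.

```python
import hashlib
from typing import List, Tuple

Proof = List[Tuple[bytes, bool]]  # (sibling hash, sibling_is_left)


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _parents(level: List[bytes]) -> List[bytes]:
    """Hash adjacent pairs; level must have an even number of nodes."""
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: List[bytes]) -> bytes:
    """Return the root that would be recorded once for the whole batch."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the odd node
        level = _parents(level)
    return level[0]


def merkle_proof(leaves: List[bytes], index: int) -> Proof:
    """Collect the sibling hashes on the path from one leaf to the root."""
    if not 0 <= index < len(leaves):
        raise IndexError("leaf index out of range")
    level, proof = list(leaves), []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1  # neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = _parents(level)
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: Proof, root: bytes) -> bool:
    """Rebuild the path to the root using only the leaf and its proof."""
    node = leaf
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Five hypothetical certificates issued in one batch.
leaves = [h(b"certificate-%d" % i) for i in range(5)]
root = merkle_root(leaves)
proof_for_third = merkle_proof(leaves, 2)
```

A recipient only needs their own certificate and its proof, which grows with the logarithm of the batch size, to show that the certificate is covered by the recorded root. The other certificates in the batch are never revealed.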

Getting Started/Involved

The new project home will be at http://certificates.media.mit.edu, where you will find links to all of our source code, documentation, and example implementations. If you are experimenting in this space, please consider joining the GitHub repository for the issuer functionality (cert-issuer), where we plan to move most of the technical discussion regarding version 2 over the next few weeks. For project-specific feature requests, bugs, or other issues, we recommend opening a GitHub issue against the project, or submitting a pull request.

Acknowledgements

We’ve talked about some of the considerations we made in designing these certificates, but haven’t mentioned the many people who helped us in the process. When we started working on this project, we knew next to nothing about the blockchain and we are greatly indebted to the many people at MIT and elsewhere who helped us get started. Nate Otto helped us wrangle the open badges specification to make it work with the blockchain (and we are proud to be OBI compliant). Guy Zyskind, one of the creators of Enigma, and Jeremy Rubin, Chelsea Barabas, Brian Forde, Neha Narula, and Conner Fromknecht from the Media Lab’s Digital Currency Initiative helped with all things Bitcoin and blockchain and have been terrific partners. David Anderton from MIT Global Entrepreneurship Bootcamp and Daniel Tello and Sofía Bosch from the Laboratorio para la Ciudad were fearless first adopters of the digital certificates. And very early on in the process we started collaborating with Dan Hughes and Chris Jagers from Learning Machine, who have a wealth of experience in how institutions deal with academic credentials (and a seemingly endless list of obscure academic treatises on the history of university admissions). All of the above have been incredibly helpful, sometimes by pointing out flaws in our ideas, and always by trying to identify better solutions. And we have been fortunate to do this work within the freedom of the MIT Media Lab, which allowed us (pushed us in fact) to not only think about digital credentials, but actually issue them to our Director’s Fellows and alumni, allowing us to gain valuable real-world experience and user feedback.

— — — — — — — — — — — — — 
Juliana Nazaré (@ju1es_) is a graduate student in the Program in Media Arts and Sciences at the MIT Media Lab. Kim Hamilton Duffy (@kimdhamilton) is a Principal Engineer at Learning Machine. J. Philipp Schmidt (@schmidtphi) is Director of Learning Innovation at the MIT Media Lab. http://1l2p.net