Message Security Layer: A Modern Take on Securing Communication
Netflix serves audio and video to millions of devices and subscribers across the globe. Each device has its own hardware and software, with differing security properties and capabilities. The communication between these devices and our servers must be secured to protect both our subscribers and our service.
When we first launched the Netflix streaming service we used a combination of HTTPS and a homegrown security mechanism called NTBA to provide that security. Over time, however, this combination started exhibiting growing pains. With the advent of HTML5, along with the Media Source Extensions and Encrypted Media Extensions, we needed something new that would be compatible with that platform. We took this as an opportunity to address many of the shortcomings of the earlier technology. The Message Security Layer (MSL) was born from these dual concerns.
Problems with HTTPS
One of the largest problems with HTTPS is the public key infrastructure (PKI). There were a number of short-lived incidents in which a renewed server certificate caused outages. We had no good way of handling revocation: our attempts to leverage CRL and OCSP technologies resulted in a complex set of workarounds to deal with infrastructure downtime and configuration mistakes, which ultimately led to a worse user experience and a brittle security mechanism with little insight into errors. Recent security breaches at certificate authorities, and the issuance of intermediate certificate authority certificates, mean that placing trust in one actor requires placing trust in a whole chain of actors, not all of whom necessarily deserve that trust.
Another significant issue with HTTPS is the requirement for accurate time. The X.509 certificates used by HTTPS contain two timestamps (notBefore and notAfter), and if the validating software thinks the current time is outside that window the connection is rejected. The vast majority of devices do not know the correct time and have no way of securely learning it.
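To make the clock dependency concrete, here is a minimal sketch of the validity-window test an X.509 validator performs (RFC 5280 treats the window as inclusive at both ends). The certificate dates and the "factory reset" device clock are assumed values chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def within_validity(not_before: datetime, not_after: datetime, now: datetime) -> bool:
    """Return True if `now` lies in the inclusive window [notBefore, notAfter]."""
    return not_before <= now <= not_after


# Hypothetical certificate valid for 365 days from 2014-01-01 UTC.
not_before = datetime(2014, 1, 1, tzinfo=timezone.utc)
not_after = not_before + timedelta(days=365)

# A correct clock vs. a device clock that reset to an assumed factory-default date.
correct_clock = datetime(2014, 10, 31, tzinfo=timezone.utc)
reset_clock = datetime(2000, 1, 1, tzinfo=timezone.utc)

print(within_validity(not_before, not_after, correct_clock))  # True
print(within_validity(not_before, not_after, reset_clock))    # False: a valid certificate is rejected
```

The certificate itself is perfectly valid in both cases; only the device's notion of "now" differs, which is why a device without trustworthy time cannot reliably complete the check.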
Being tied to SSL and TLS, HTTPS also suffers from fundamental security issues unknown at the time of their design. Examples include padding attacks and the use of MAC-then-Encrypt, which is less secure than Encrypt-then-MAC.
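The difference between the two orderings is where the MAC sits. The sketch below shows Encrypt-then-MAC: the tag is computed over the ciphertext, and the recipient verifies it before doing any decryption, so tampered input never reaches the decryption (or, with a block cipher, padding-removal) code that padding attacks exploit. Because Python's standard library has no block cipher, the "cipher" here is a toy SHA-256 counter-mode keystream used purely to show the ordering; it is an assumption of this sketch and not something to deploy.

```python
import hashlib
import hmac
import os

NONCE_LEN, TAG_LEN = 16, 32


def _toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # TOY keystream (SHA-256 in counter mode) for illustration only; use a vetted cipher in practice.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(NONCE_LEN)
    stream = _toy_keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, stream))
    # Encrypt-then-MAC: the tag covers the nonce and ciphertext and is computed after encryption.
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def unseal(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("message too short")
    nonce, ciphertext, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    # Verify first: tampered input is rejected before any decryption work is done.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    stream = _toy_keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))


enc_key, mac_key = os.urandom(32), os.urandom(32)  # separate keys for encryption and MAC
blob = seal(enc_key, mac_key, b"license request")
print(unseal(enc_key, mac_key, blob))
```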
There are other less obvious issues with HTTPS. Establishing a connection requires extra network round trips and depending on the implementation may result in multiple requests to supporting infrastructure such as CRL distribution points and OCSP responders in order to validate a certificate chain. As we continually improved application responsiveness and playback startup time this overhead became significant, particularly in situations with less reliable network connectivity such as Wi-Fi or mobile networks.
Even ignoring these issues, integrating new features and behaviors into HTTPS would have been extremely difficult. The specification is fixed and mandates certain behaviors. Leveraging specific device security features would require hacking the SSL/TLS stack in unintended ways: imagine generating some form of client certificate that used a dynamically generated set of device credentials.
Before starting to design MSL we had to identify its high-level goals. Beyond general best practices for protocol design, the following objectives are particularly important given the scale of deployment, the fact that MSL must run on multiple platforms, and the knowledge that it will be used for future, as-yet-unknown use cases.
- Automatic error recovery. With millions of devices and subscribers we need devices that enter a bad state to be able to automatically recover without compromising security.
- Performance. We do not want our application performance and responsiveness to be limited any more than necessary. Network round trips are by far the largest performance cost.
- Flexible and extensible. Whenever possible we want to take advantage of security features provided by devices and their software. Likewise if something no longer provides the security we need then there needs to be a migration path forward.
- Standards compatible. Although related to being flexible and extensible, we paid particular attention to being standards compatible. Specifically we want to be able to leverage the Web Crypto API now available in the major web browsers.
MSL is a modern cryptographic protocol that takes into account the latest cryptography technologies and knowledge. It supports the following basic security properties.
- Integrity protection. Messages in transit are protected from tampering.
- Encryption. Message data is protected from inspection.
- Authentication. Messages can be trusted to come from a specific device and user.
- Non-replayable. Messages containing non-idempotent data can be made non-replayable.
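Integrity protection and non-replayability work together: a MAC stops an attacker from altering a message, and a message identifier covered by that MAC lets the recipient reject a captured message that is sent again. The following is a minimal sketch of that idea, assuming a shared session key and a per-session counter that only increases; `make_tag` and `Recipient` are hypothetical names, and this is not MSL's actual wire format or replay algorithm. It also assumes in-order delivery, since any ID at or below the highest one seen is refused.

```python
import hashlib
import hmac


def make_tag(key: bytes, msg_id: int, payload: bytes) -> bytes:
    # The message ID is inside the MAC input, so it cannot be altered without detection.
    return hmac.new(key, msg_id.to_bytes(8, "big") + payload, hashlib.sha256).digest()


class Recipient:
    """Hypothetical receiver combining integrity protection with replay rejection."""

    def __init__(self, key: bytes):
        self.key = key
        self.highest_seen = 0  # largest non-replayable ID accepted in this session

    def accept(self, msg_id: int, payload: bytes, tag: bytes) -> bool:
        if not hmac.compare_digest(tag, make_tag(self.key, msg_id, payload)):
            return False  # tampered or forged
        if msg_id <= self.highest_seen:
            return False  # replay (or out-of-order delivery, which this sketch does not tolerate)
        self.highest_seen = msg_id
        return True


key = bytes(32)  # all-zero demo key; never do this in practice
server = Recipient(key)
tag = make_tag(key, 1, b"start playback")
print(server.accept(1, b"start playback", tag))  # True: first delivery
print(server.accept(1, b"start playback", tag))  # False: exact replay
print(server.accept(2, b"start playback", tag))  # False: renumbered copy fails the MAC
```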
MSL supports two different deployment models, which we refer to as MSL network types. A single device may participate in multiple MSL networks simultaneously.
- Trusted services network. This deployment consists of a single client device and multiple servers. The client authenticates against the servers. The servers have shared access to the same cryptographic secrets and therefore each server must trust all other servers.
- Peer-to-peer. This is a typical peer-to-peer arrangement in which both sides of the communication are mutually authenticated.
A typical MSL message consists of a header and one or more application payload chunks. Each chunk is individually protected which allows the sender and recipient to process application data as it is transmitted. A message stream may remain open indefinitely, allowing large time gaps between chunks if desired.
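Because each chunk is protected on its own, a recipient can verify and hand over data chunk by chunk instead of buffering the whole message. The sketch below illustrates this, assuming a shared key and a hypothetical framing in which each chunk's MAC binds the message ID, a sequence number, and an end-of-message flag; binding those values stops chunks from being reordered, spliced in from another message, or silently truncated. The framing and function names are illustrative, not MSL's real payload format.

```python
import hashlib
import hmac
from typing import Iterable, Iterator, List, Tuple

Frame = Tuple[int, bool, bytes, bytes]  # (sequence number, end-of-message flag, data, tag)


def _chunk_mac(key: bytes, message_id: int, seq: int, eom: bool, data: bytes) -> bytes:
    header = message_id.to_bytes(8, "big") + seq.to_bytes(8, "big") + (b"\x01" if eom else b"\x00")
    return hmac.new(key, header + data, hashlib.sha256).digest()


def protect_chunks(key: bytes, message_id: int, chunks: List[bytes]) -> List[Frame]:
    """Tag each chunk independently; the last chunk carries the end-of-message flag."""
    frames = []
    for seq, data in enumerate(chunks, start=1):
        eom = seq == len(chunks)
        frames.append((seq, eom, data, _chunk_mac(key, message_id, seq, eom, data)))
    return frames


def receive_chunks(key: bytes, message_id: int, stream: Iterable[Frame]) -> Iterator[bytes]:
    """Yield each chunk's data as soon as it verifies, so it can be processed immediately."""
    expected_seq = 1
    for seq, eom, data, tag in stream:
        good = _chunk_mac(key, message_id, seq, eom, data)
        if seq != expected_seq or not hmac.compare_digest(tag, good):
            raise ValueError(f"chunk {seq} failed verification")
        yield data
        if eom:
            return
        expected_seq += 1
    raise ValueError("stream ended before the end-of-message chunk")


key = bytes(32)  # demo key only
frames = protect_chunks(key, 7, [b"chunk-a", b"chunk-b", b"chunk-c"])
for data in receive_chunks(key, 7, frames):
    print(data)  # each chunk is released as soon as it verifies
```

Since nothing in the framing depends on when a chunk arrives, the same logic works whether chunks follow each other immediately or with long gaps in between.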
MSL has pluggable authentication and may leverage any number of device and user authentication types for the initial message. The initial message provides authentication, integrity protection, and encryption to the extent the device authentication type supports them. Subsequent messages use the session keys established as a result of the initial communication.
If the recipient encounters an error when receiving a message it will respond with an error message. Error messages consist of a header that indicates the type of error that occurred. Upon receipt of the error message the original sender can attempt to recover and retransmit the original application data. For example, if the message recipient believes one side or the other is using incorrect session keys the error will indicate that new session keys should be negotiated from scratch. Or if the message recipient believes the device or user credentials are incorrect the error will request the sender re-authenticate using new credentials.
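The recovery flow amounts to a bounded retry loop on the sender: read the error, repair the relevant state, and retransmit the same application data. Below is a minimal sketch under stated assumptions. The three error categories mirror the examples above but are hypothetical names rather than MSL's actual error codes, the transport is a plain callable, and credentials are stand-in strings. The attempt limit keeps a device that cannot recover from retrying forever.

```python
from enum import Enum
from typing import Callable, Optional


class MslError(Enum):
    # Hypothetical error categories modeled on the examples in the text.
    KEYX_REQUIRED = "renegotiate session keys"
    ENTITY_REAUTH = "re-authenticate device"
    USER_REAUTH = "re-authenticate user"


class Client:
    def __init__(self, send: Callable[[dict], Optional[MslError]]):
        self._send = send              # transport: returns None on success, or the recipient's error
        self.session_keys = "cached"   # stand-in for real key material
        self.user_creds = "cached"

    def _recover(self, err: MslError) -> None:
        if err in (MslError.KEYX_REQUIRED, MslError.ENTITY_REAUTH):
            self.session_keys = None   # drop keys; the next message negotiates from scratch
        if err is MslError.USER_REAUTH:
            self.user_creds = "re-entered"  # placeholder for obtaining fresh user credentials

    def deliver(self, app_data: bytes, max_attempts: int = 3) -> int:
        """Send app_data, recovering and retransmitting on error; return the attempts used."""
        for attempt in range(1, max_attempts + 1):
            err = self._send({"data": app_data, "keys": self.session_keys, "user": self.user_creds})
            if err is None:
                return attempt
            self._recover(err)
        raise RuntimeError("recovery failed; giving up")


def fake_server(msg: dict) -> Optional[MslError]:
    # Rejects the stale cached keys once, then accepts the renegotiated session.
    return MslError.KEYX_REQUIRED if msg["keys"] == "cached" else None


client = Client(fake_server)
print(client.deliver(b"play"))  # 2: one error, one successful retransmission
```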
To minimize network round trips, MSL attempts to perform authentication, key negotiation, and renewal operations while it is also transmitting application data (Figure 2). As a result, in the common case MSL imposes no additional network round trips and only minimal data overhead.
This is not always possible, in which case an MSL handshake must occur first, after which sensitive data such as user credentials and application data may be transmitted (Figure 3).
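The choice between the two flows can be summarized as a small decision function. The sketch below is one plausible reading of the behavior described above, not MSL's actual logic: if session keys already exist they are simply reused; otherwise authentication and the key request ride along with the data when the device authentication scheme can encrypt (or the data is not sensitive), and a separate handshake is needed only when sensitive data would otherwise go out unprotected. The scheme names and the `encrypts` flag are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DeviceAuthScheme:
    name: str
    encrypts: bool  # can the initial message be encrypted under this scheme?


def plan_messages(scheme: DeviceAuthScheme, have_session_keys: bool, sensitive: bool) -> List[str]:
    """Return the messages a client would send, one per round trip (hypothetical logic)."""
    if have_session_keys:
        return ["application data under existing session keys"]
    if scheme.encrypts or not sensitive:
        # Authentication and the key request ride along with the data: no extra round trip.
        return ["authentication + key request + application data"]
    # Sensitive data may not be sent unprotected, so a handshake must complete first.
    return ["authentication + key request (handshake)", "application data under new session keys"]


with_keys = DeviceAuthScheme("embedded-device-keys", encrypts=True)
no_keys = DeviceAuthScheme("unauthenticated", encrypts=False)
print(len(plan_messages(with_keys, have_session_keys=False, sensitive=True)))  # 1
print(len(plan_messages(no_keys, have_session_keys=False, sensitive=True)))    # 2
```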
Once session keys have been established they may be reused for future communication. Session keys may also be persisted to allow reuse between application executions. In a trusted services network the session keys resulting from a key negotiation with one server can be used with all other servers.
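Both kinds of reuse fall out naturally if the client caches keys per MSL network rather than per server and writes them to storage. The sketch below assumes a JSON file and a network name chosen for illustration; a real client would keep keys in protected storage rather than a plain file, and `SessionKeyStore` is a hypothetical name.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class SessionKeyStore:
    """Hypothetical client cache of session keys, persisted as JSON so keys survive restarts.
    Keys are indexed by MSL network, not by server, so in a trusted services network every
    server can be addressed with the same keys."""

    def __init__(self, path: str):
        self.path = path
        self.keys: Dict[str, str] = {}
        if os.path.exists(path):
            with open(path) as f:
                self.keys = json.load(f)

    def put(self, network: str, key_hex: str) -> None:
        self.keys[network] = key_hex
        # Plain JSON on disk is for illustration only; real clients need protected storage.
        with open(self.path, "w") as f:
            json.dump(self.keys, f)

    def get(self, network: str) -> Optional[str]:
        return self.keys.get(network)


path = os.path.join(tempfile.mkdtemp(), "msl-keys.json")
SessionKeyStore(path).put("trusted-services", "a1b2c3d4")

restarted = SessionKeyStore(path)  # simulates a later application execution
for server in ("server-1", "server-2"):
    print(server, restarted.get("trusted-services"))  # same keys for every server
```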
Whenever possible we would like to take advantage of the security features provided by a specific platform. Doing so often provides stronger security than is possible without leveraging those features.
Some devices may already contain cryptographic keys that can be used to authenticate and secure initial communication. Likewise some devices may have already authenticated the user and it is a better user experience if the user is not required to enter their email and password again.
MSL has a plug-in architecture that allows easy integration of different device and user authentication schemes, session key negotiation schemes, and cryptographic algorithms. This also means that the security of any MSL deployment heavily depends on the mechanisms and algorithms it is configured with.
The plug-in architecture also means new schemes and algorithms can be incorporated without requiring a protocol redesign.
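One common way to build such an extension point is a registry that maps scheme names to verifier functions, so adding a scheme means registering a function rather than changing the dispatch code. The sketch below assumes that design; the registry, the `PRESHARED` scheme name, its HMAC-over-a-challenge proof, and the provisioned key table are all hypothetical and are not MSL's actual plug-in interfaces. Unknown schemes are rejected outright so a peer cannot request something weaker than what is configured.

```python
import hashlib
import hmac
from typing import Callable, Dict

Verifier = Callable[[dict], bool]
_SCHEMES: Dict[str, Verifier] = {}


def register(scheme: str) -> Callable[[Verifier], Verifier]:
    """Decorator that plugs a verifier for `scheme` into the registry."""
    def decorator(fn: Verifier) -> Verifier:
        _SCHEMES[scheme] = fn
        return fn
    return decorator


def authenticate(auth_data: dict) -> bool:
    verifier = _SCHEMES.get(auth_data.get("scheme", ""))
    # Unknown schemes are rejected rather than downgraded to something weaker.
    return verifier is not None and verifier(auth_data)


_PROVISIONED = {"device-123": bytes(range(32))}  # hypothetical per-device keys


@register("PRESHARED")
def _verify_preshared(auth_data: dict) -> bool:
    key = _PROVISIONED.get(auth_data["identity"])
    if key is None:
        return False
    expected = hmac.new(key, auth_data["challenge"], hashlib.sha256).digest()
    return hmac.compare_digest(expected, auth_data["proof"])


challenge = b"nonce-42"
proof = hmac.new(bytes(range(32)), challenge, hashlib.sha256).digest()
print(authenticate({"scheme": "PRESHARED", "identity": "device-123",
                    "challenge": challenge, "proof": proof}))  # True
print(authenticate({"scheme": "UNKNOWN"}))  # False
```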
- Time independence. MSL does not require time to be synchronized between communicating devices. However, certain authentication or key negotiation schemes may impose their own time requirements.
- Service tokens. Service tokens are similar to HTTP cookies: they allow applications to attach arbitrary data to messages. However, service tokens can be cryptographically bound to a specific device and/or user, which prevents data from being migrated without authorization.
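Binding a token to a device or user can be as simple as including those identities in the token's MAC and checking them on receipt. The sketch below assumes a single server-side HMAC key and plain string identifiers for the device and user; the function names, token layout, and key are hypothetical and do not reflect MSL's actual service token format. Fields are length-prefixed, with "unbound" encoded distinctly, so no two different tokens produce the same MAC input.

```python
import hashlib
import hmac
from typing import Optional, Union

TOKEN_KEY = bytes(32)  # hypothetical server-side key; all zeros for the demo only


def _field(value: Union[str, bytes, None]) -> bytes:
    # Length-prefix each field and mark "unbound" distinctly to keep MAC inputs unambiguous.
    if value is None:
        raw = b"\x00"
    else:
        raw = b"\x01" + (value if isinstance(value, bytes) else value.encode())
    return len(raw).to_bytes(4, "big") + raw


def _mac(name: str, data: bytes, device: Optional[str], user: Optional[str]) -> bytes:
    blob = b"".join(_field(v) for v in (name, data, device, user))
    return hmac.new(TOKEN_KEY, blob, hashlib.sha256).digest()


def issue(name: str, data: bytes, device: Optional[str] = None, user: Optional[str] = None) -> dict:
    return {"name": name, "data": data, "device": device, "user": user,
            "mac": _mac(name, data, device, user)}


def accept(token: dict, device: str, user: Optional[str]) -> bool:
    # A bound token is honored only when presented by the device and/or user it names.
    if token["device"] is not None and token["device"] != device:
        return False
    if token["user"] is not None and token["user"] != user:
        return False
    expected = _mac(token["name"], token["data"], token["device"], token["user"])
    return hmac.compare_digest(token["mac"], expected)


tok = issue("playback-prefs", b"subtitles=en", device="tv-1", user="alice")
print(accept(tok, "tv-1", "alice"))                        # True
print(accept(tok, "tv-2", "alice"))                        # False: wrong device
print(accept(dict(tok, device="tv-2"), "tv-2", "alice"))   # False: rebinding breaks the MAC
```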
To learn more about MSL and find out how you can use it for your own applications visit the Message Security Layer repository on GitHub.
MSL Today and Tomorrow
With MSL we have eliminated many of the problems we faced with HTTPS and platform integration. Its flexible and extensible design means it will be able to adapt as Netflix expands and as the cryptographic landscape changes.
We are already using MSL on many different platforms, including our HTML5 player, game consoles, and upcoming CE devices. MSL can be used just as effectively to secure internal communications. In the future we envision using MSL over WebSockets to create long-lived secure communication channels between our clients and servers.
We take security seriously at Netflix and are always looking for the best people to join our team. If you are also interested in attacking the challenges of the fastest-growing online streaming service in the world, check out our job listings.
— Wesley Miaw & Mitch Zollinger, Security Engineering
Originally published at techblog.netflix.com on October 31, 2014.