What Is/Are a WebSockets?
What We Talk About When We Talk About the Real-Time Web
--
I first heard the term “WebSockets” long before I had any clue what they were. Was it possible to have a singular WebSocket? I’d heard they were a “game-changer,” something to do with the “real-time” web. But it was hard to understand where they fit. Was it a library, or an API, or a protocol? Or was it a “transport”? As it turns out, it’s a little bit of all of these but none of them in particular.
WebSockets have quickly become an integral part of the “Internet of Things.” Most of us have come to take the “real-time” web (e.g., push notifications, chat, etc.) for granted, but it took a lot of doing to get from static HTML pages to the dynamic web apps we now expect. And while WebSockets weren’t the beginning of that movement, they were a huge step forward. As with indoor plumbing and the right to vote, to understand what makes them so great, we need to understand what came before.
The Web Before WebSockets
Hypertext Transfer Protocol (HTTP) remains the backbone of the internet as we know it. Beginning web developers quickly come to understand the power and constraints of the request-response cycle. But HTTP was built to do something comparatively simple: transfer hypertext, that is, good, old, static HTML. For these purposes, HTTP’s request-response cycle makes good sense. Ask and you shall receive. You want a document, you ask the server to serve it up, and the server responds the best it can. The limitation here is that everything we get from the server has to be requested.
Then came JavaScript, allowing for dynamic web pages with little things like dropdown menus and photo carousels. Soon, programmers were finding ways to update a page while avoiding a reload. This led to the adoption of HTTP polling, in which the client asks the server for updates at a regular interval. Then came AJAX, which made “long polling” possible. Long polling involved sending a request to a server but asking the server not to respond until there was something new to send. The server would wait, send the new data, and close the connection. Then, as part of handling the response, the client would immediately send another request, again asking the server to respond only when something had changed. Finally, we were getting close to “real-time” two-way communication between clients and servers.
Still, these weren’t formalized practices. Solutions like long polling bent HTTP to achieve something it was never meant to do. Then, in 2006, Alex Russell introduced the term “Comet” to describe a standardized practice for such two-way web apps (as a fun aside, Comet is apparently not an acronym but a pun responding to AJAX, since both Comet and Ajax are names of household cleaners).
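To make the long-polling loop described above concrete, here is a minimal sketch in JavaScript. The endpoint, the onData handler, and the options (fetchImpl, maxRequests) are all hypothetical names chosen for illustration, and the sketch assumes a server that holds each request open until it has news and then answers with JSON.

```javascript
// A long-polling sketch. Assumptions: the server keeps each request open
// until it has something new, then responds with a JSON body.
// fetchImpl and maxRequests are illustrative knobs, not part of any standard.
async function longPoll(url, onData, { fetchImpl = fetch, maxRequests = Infinity } = {}) {
  let sent = 0;
  while (sent < maxRequests) {
    sent += 1;
    try {
      // This await only resolves once the server finally decides to respond.
      const response = await fetchImpl(url);
      if (response.ok) {
        onData(await response.json());
      }
    } catch (err) {
      // Network error or timeout: pause briefly before asking again.
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
    // Loop around and immediately send the next request.
  }
}
```

In a browser this might be started with something like longPoll('/updates', render), where both the path and the render function are placeholders. Every update costs a complete HTTP request and response, which is the overhead WebSockets were designed to avoid.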
Enter WebSocket
By 2006, then, developers had a working solution to the problem of bidirectional communication, but it was a duct tape solution to a problem that would only become more common as the Internet of Things expanded. The standardized Comet model didn’t last long before Ian Hickson and Michael Carter pitched their new solution, which they dubbed “WebSockets.” They introduced the concept in June 2008, and by December 2009, Google Chrome was offering full support for the protocol. By December 2011, WebSockets were enabled by default in multiple browsers and The WebSocket Protocol had been published by the IETF as RFC 6455.
So what are they?
When people talk about WebSockets, plural, or a WebSocket, what they’re generally referring to are individual server connections via the WebSocket Protocol. Sophie Debenedetto explains:
“Connections form the foundation of the client-server relationship. For every WebSocket accepted by the server, a connection object is instantiated. This object becomes the parent of all the channel subscriptions that are created from there on. The connection itself does not deal with any specific application logic beyond authentication and authorization.”
WebSocket connections look just like HTTP connections except that instead of “http://yourapp.com”, it’s “ws://yourapp.com.” WebSockets also have a secure scheme, “wss,” just like “https.” Like HTTP, WebSocket is a layer of abstraction built on top of TCP/IP sockets (hence the name). The hostname in the URL we type into our browsers is resolved by DNS servers to an IP address like 123.45.67.89, and the connection is made to a port on that address, written after a colon as in 123.45.67.89:80. Port 80 is the default for “http” and “ws” connections, and port 443 is the default for “https” and “wss.” A WebSocket connection stays plugged into that port for as long as the connection persists.
FUN FACT: While it is called a protocol, some engineers argue WebSocket is more technically a transport. It’s a mail truck, not the mail. This is because although there are strict rules for establishing a connection and enveloping data, there is no rule on how a payload, once wrapped, is structured. To structure the messages, the client and server can agree on what the RFC calls a “subprotocol.” The payload could be JSON or XML, or a messaging protocol such as MQTT or WAMP layered on top. As long as the server and client agree, WebSocket doesn’t care.
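As a rough illustration of the scheme and port rules, the sketch below uses the standard URL class (available in browsers and in Node) to pull a WebSocket URL apart. The function name describeWebSocketUrl is made up for this example.

```javascript
// Sketch: work out the host, port, and security of a WebSocket URL.
// describeWebSocketUrl is a hypothetical helper, not part of any API.
function describeWebSocketUrl(rawUrl) {
  const url = new URL(rawUrl);
  if (url.protocol !== 'ws:' && url.protocol !== 'wss:') {
    throw new Error(`Not a WebSocket URL: ${rawUrl}`);
  }
  const secure = url.protocol === 'wss:';
  // url.port is an empty string when the URL relies on the scheme's default.
  const defaultPort = secure ? 443 : 80;
  const port = url.port === '' ? defaultPort : Number(url.port);
  return { host: url.hostname, port, secure };
}

console.log(describeWebSocketUrl('ws://yourapp.com').port);  // 80
console.log(describeWebSocketUrl('wss://yourapp.com').port); // 443
```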
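Since WebSocket leaves the payload format up to us, here is one possible convention sketched in JavaScript: a small JSON envelope with a type and a payload. The envelope shape and the function names are assumptions for illustration only; nothing in the protocol requires them.

```javascript
// Sketch of an application-level message format carried over a WebSocket.
// The { type, payload } envelope is a hypothetical convention.
function encodeMessage(type, payload) {
  return JSON.stringify({ type, payload });
}

function decodeMessage(raw) {
  const message = JSON.parse(raw);
  if (message === null || typeof message !== 'object' || typeof message.type !== 'string') {
    throw new Error('Message is missing a string "type" field');
  }
  return message;
}

// What one side would pass to ws.send(), and what the other side would
// read from event.data in its message handler:
const wire = encodeMessage('chat', { text: 'hello' });
const received = decodeMessage(wire);
console.log(received.type, received.payload.text); // chat hello
```

If the two sides want to negotiate a named subprotocol during the handshake, the browser's WebSocket constructor accepts an optional second argument listing the subprotocols the client is willing to speak.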
The Handshake
WebSocket can use any authentication method HTTP can use. What WebSocket really cares about is the handshake. The handshake is the method for establishing a connection, and it looks very similar to an HTTP request. In fact, it starts out as an HTTP request. Per the RFC:
The opening handshake is intended to be compatible with HTTP-based server-side software and intermediaries, so that a single port can be used by both HTTP clients talking to that server and WebSocket clients talking to that server. To this end, the WebSocket client’s handshake is an HTTP Upgrade request:
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13
The handshake is an ordinary HTTP GET request with a few additional headers: a Connection header to let the server know the client wants to upgrade the connection, and an Upgrade header to let the server know that the type of connection to upgrade to is WebSocket. The Sec-WebSocket-Key and Sec-WebSocket-Version headers are required too, but in the browser the rest is handled by the WebSocket Web API.
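To see what a server looks for in that request, here is a minimal sketch of the check in JavaScript. The function name isWebSocketUpgrade is hypothetical, and the sketch assumes header names have already been lowercased (as Node's http module does). A real server would go on to compute a Sec-WebSocket-Accept response header, which is left out here.

```javascript
// Sketch: decide whether an incoming HTTP request is a WebSocket handshake.
// Assumes `headers` is a plain object with lowercased header names.
function isWebSocketUpgrade(method, headers) {
  const upgrade = (headers['upgrade'] || '').toLowerCase();
  // The Connection header may list several comma-separated tokens.
  const connection = (headers['connection'] || '')
    .toLowerCase()
    .split(',')
    .map((token) => token.trim());
  return (
    method === 'GET' &&
    upgrade === 'websocket' &&
    connection.includes('upgrade') &&
    Boolean(headers['sec-websocket-key']) &&
    headers['sec-websocket-version'] === '13'
  );
}

// The sample request from the RFC, as a headers object:
const sampleHeaders = {
  host: 'server.example.com',
  upgrade: 'websocket',
  connection: 'Upgrade',
  'sec-websocket-key': 'dGhlIHNhbXBsZSBub25jZQ==',
  origin: 'http://example.com',
  'sec-websocket-protocol': 'chat, superchat',
  'sec-websocket-version': '13',
};
console.log(isWebSocketUpgrade('GET', sampleHeaders)); // true
```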
Conclusion
The term “WebSockets” can refer to a few things. There is WebSocket (no article), the protocol, and there is a WebSocket, a connection over such a protocol. A user can have multiple tabs or windows open to the same endpoint, which would mean they could have multiple WebSockets.
In practice, using WebSockets is becoming easier and easier. Browsers ship with a WebSocket Web API by default. Making a WebSocket in JavaScript is as simple as:
const ws = new WebSocket('ws://yourapp.com');
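Creating the socket is only the first step. A slightly fuller sketch, using the browser API's standard open, message, and close events, might look like the following. The function name openSocket, the greeting it sends, and the injectable WebSocketImpl parameter are assumptions added for illustration.

```javascript
// Sketch: open a WebSocket and react to its lifecycle events.
// WebSocketImpl defaults to the browser's built-in WebSocket; it is a
// parameter only so the function can be exercised with a stand-in.
function openSocket(url, onMessage, WebSocketImpl = WebSocket) {
  const ws = new WebSocketImpl(url);

  ws.addEventListener('open', () => {
    // The handshake succeeded, so it is now safe to send.
    ws.send('hello');
  });

  ws.addEventListener('message', (event) => {
    // The server can push data at any time, with no request from us.
    onMessage(event.data);
  });

  ws.addEventListener('close', () => {
    console.log('connection closed');
  });

  return ws;
}

// In a browser:
// openSocket('ws://yourapp.com', (data) => console.log(data));
```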
Backend frameworks also have native implementations for WebSockets, like Ruby on Rails’ Action Cable, which was announced in 2015 and shipped with Rails 5. You can find more information in the docs.
Resources:
[whatwg] TCPConnection feedback from Michael Carter, 2008-06-18 (whatwg.org)