Stealing Secrets from Developers using Websockets
This is a story of a convoluted, not-very-useful method for extracting codez from unwitting JavaScript developers working on top-secret projects.
A couple of articles have appeared recently about websites abusing websocket functionality to port-scan users’ computers. For example: https://nullsweep.com/why-is-this-website-port-scanning-me/.
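These techniques work because browsers let pages from public origins open websocket connections to localhost with few protections. The websocket handshake is not subject to CORS: the browser sends an Origin header, but it is left to the local server to check it.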
This got me thinking. I know that popular JavaScript frameworks use websockets in development to reload pages automatically when content changes. Could a malicious website eavesdrop on that traffic and find out when developers are saving their code?
The reality was slightly worse than I had thought.
The Scheme
- Either set up, or inject advertising malware into, a popular site that front-end developers tend to frequent. Let’s say http://frontend-overflowstack.com/
- On this page, add code that tries to open websocket connections to common ports (scanning 10k ports takes a second or so, so you can be quite generous here)
- If the page manages to open a connection, hold it open, and forward all messages received to your secret database of nefariousness.
- ?
- Profit
Does it work?
I’ve hosted a very simple page at http://frontend-overflowstack.com/. On load, it tries to connect a websocket to every port between 2,000 and 10,000 on the visitor’s computer (barring a few that Firefox doesn’t allow). If a port connects, it listens on that port and outputs any messages received. The page does not save or otherwise transmit any captured data; it is only displayed temporarily on screen.
If any output appears on this page, an actually malicious site could easily send that output to any server it wants.
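For anyone curious what the listening half of such a page involves, here is a minimal JavaScript sketch. It is not the code behind frontend-overflowstack.com. The function name listenOnLocalPorts, the root path in the ws:// URL, and the partial blocked-port list are assumptions; a real dev server may only accept upgrades on a specific path. The WebSocket constructor is passed in so the logic can be exercised outside a browser, and messages only go to a callback which, as in the demo, just displays them.

```javascript
// Ports the browser will refuse to connect to anyway. This is a partial copy
// of the Fetch standard's "bad port" list, restricted to 2000-10000; check the
// spec for the current, complete list.
const BLOCKED_PORTS = new Set([
  2049, 3659, 4045, 5060, 5061, 6000, 6566, 6665, 6666, 6667, 6668, 6669, 6697,
]);

// Try to open a websocket to every port in [from, to] on localhost.
// WebSocketImpl: the WebSocket constructor (window.WebSocket in a browser).
// onMessage(port, data): called for every message a connected socket receives.
// Returns the sockets that were created so the caller can close them later.
function listenOnLocalPorts(from, to, WebSocketImpl, onMessage) {
  const sockets = [];
  for (let port = from; port <= to; port++) {
    if (BLOCKED_PORTS.has(port)) continue;
    let socket;
    try {
      // Assumption: the server accepts upgrades on the root path.
      socket = new WebSocketImpl(`ws://localhost:${port}/`);
    } catch (err) {
      // Some browsers may throw synchronously for ports they refuse to use.
      continue;
    }
    // Ports with nothing listening just fail the handshake; ignore them.
    socket.onerror = () => {};
    // If the handshake succeeds, the socket stays open and every message it
    // receives is handed to the callback.
    socket.onmessage = (event) => onMessage(port, event.data);
    sockets.push(socket);
  }
  return sockets;
}

// Browser usage (display only, like the demo page):
// listenOnLocalPorts(2000, 10000, WebSocket, (port, data) => {
//   const line = document.createElement("pre");
//   line.textContent = `${port}: ${data}`;
//   document.body.append(line);
// });
```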
Generating Data
To test this concept out, we need a simple web server that uses hot-reloading. This is the simplest I could come up with:
Which, when run, starts up a server on port 3000 and a websocket server on port 9856, and sends the message “reload” to any connected websocket clients every 5 seconds.
If we fire up our sniffer site, the following appears:
So frontend-overflowstack.com is directly eavesdropping on reload messages being sent by a local dev server to my local browser.
At this stage, it’s possible to sit back and gleefully count how many times each visitor to our site makes changes to their local JavaScript code, but can we use this to get more info?
The plot thickens
The majority of front-end development these days seems to involve React, and that typically means running webpack-dev-server, which has its own, fancier websocket interface.
This server shares much more (only slightly interesting) data over its websocket. Demonstrating this is as simple as invoking create-react-app:
$ npx create-react-app test
$ cd test/
$ npm start
If we run this and look at our evil site again:
Instantly there’s more data being shown: we’re getting hashes and status messages, all the useless infos.
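webpack-dev-server sends its updates as JSON objects with a type and a data field (a normal rebuild produces messages such as hash and ok), so a sniffer can sort what it receives. The sketch below assumes that shape; summarizeDevServerMessage is a hypothetical name, and because the layout of error entries differs between versions, both plain strings and objects with a message field are accepted.

```javascript
// Render a message's data field as display text.
function dataToText(data) {
  if (data === undefined) return "";
  return typeof data === "string" ? data : JSON.stringify(data);
}

// Turn one raw websocket message into { type, text } for display.
// Assumption: webpack-dev-server-style messages are JSON objects of the form
// {"type": "...", "data": ...}; anything else is treated as plain text.
function summarizeDevServerMessage(raw) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return { type: "text", text: String(raw) };
  }
  if (msg === null || typeof msg !== "object" || typeof msg.type !== "string") {
    return { type: "text", text: String(raw) };
  }
  // Compile problems: data is assumed to be an array whose entries are either
  // strings or objects with a `message` field (the shape differs by version).
  if (msg.type === "errors" || msg.type === "warnings") {
    const entries = Array.isArray(msg.data) ? msg.data : [];
    const text = entries
      .map((entry) =>
        entry && typeof entry.message === "string" ? entry.message : dataToText(entry)
      )
      .join("\n\n");
    return { type: msg.type, text };
  }
  // Everything else (e.g. "hash", "ok") is shown as-is.
  return { type: msg.type, text: dataToText(msg.data) };
}
```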
But what happens when the developer makes a typo? It turns out webpack-dev-server helpfully tries to send a bunch of debugging and stack information to the developer’s screen by way of its websocket connection.
Luckily, our evil site can see this too:
Now things are getting juicy. We’ve got code snippets, paths to files, locations, all sorts of bits of useful info.
It gets even better if the developer eventually makes a typo on a line containing useful data:
Now we’ve got a copy of this developer’s AWS dev credentials. Quick, fire up the bitcoin miners!
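Part of the reason a single typo leaks so much is that compile errors are usually reported with a code frame: the offending line plus a few lines of context on either side. The hypothetical codeFrame function below approximates that format; the defaults of two lines above and three below are an assumption modelled loosely on Babel-style frames, and real overlays vary.

```javascript
// Build a code frame around a 1-based error line: `above` lines before it,
// `below` lines after it, each prefixed with its line number, and ">" marking
// the line the error points at.
function codeFrame(source, errorLine, above = 2, below = 3) {
  const lines = source.split("\n");
  const start = Math.max(1, errorLine - above);
  const end = Math.min(lines.length, errorLine + below);
  const width = String(end).length;
  const out = [];
  for (let n = start; n <= end; n++) {
    const marker = n === errorLine ? ">" : " ";
    out.push(`${marker} ${String(n).padStart(width)} | ${lines[n - 1]}`);
  }
  return out.join("\n");
}
```

With a typo on the last line of a small config module (using AWS’s published example credentials as stand-ins), the frame still includes the secretAccessKey line two lines earlier, even though that line is perfectly valid.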
Anatomy of the “Attack”
No technical design is complete without some form of diagram. Here’s how this works, graphically:
(To simplify the diagram, I have omitted the local web server and pretended that the websocket server runs directly inside the editor.)
A malicious web page in some browser tab silently connects to open websocket servers on the user’s machine. When sensitive data is sent over such a socket (by a process that expects to be communicating over a local-only channel), the website receives that message data and can forward it to any external database.
Threat Assessment
Limiting factors
In all seriousness, the odds of this attack paying off are pretty slim. You’ve got to tempt unwitting users to visit your site, and to stay on it while they’re developing JS code.
You’ve then got to wait and hope to glean morsels of data from their coding mistakes, and maybe find an opening that lets you profit from that data.
Compounding Considerations
However, we’ve already seen that various sites are using websocket port-scanning without much general awareness among developers. Given that JS tooling tends to use a small number of well-known ports (create-react-app serves on 3000 by default, webpack-dev-server on 8080), writing a script to subtly exfiltrate React dev-server traffic would not be particularly hard.
Imagine an internal developer at Twitbook simply pressing save in their editor and causing an access token or internal server address to leak to the wrong audience.
The slightly scary aspect of this is that a reasonable developer would expect pressing save in their code editor of choice to have effectively zero chance of leaking data to a third-party web service. This attack raises that chance enough to be a tiny bit concerning.
Remediation
I went after JavaScript hot-reloading mechanisms because they’re really the only general use of websockets I’m familiar with. Discord also uses websockets, but a passing glance at it didn’t yield any obvious results, as that channel is designed with the public internet in mind.
It’s worrying that this one simple use case, a one-way communication channel for reloading, has exposed so much potential information to malicious websites.
Given this, other uses of websockets (for data not designed for the public internet) are likely to be similarly exposed.
Arguably, webpack-dev-server should perform some authentication (at minimum, checking the Origin header that browsers send with every websocket handshake), or hot-reloading should use an alternative browser communication channel (I believe this is already being planned for other reasons).
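As a rough illustration of the server-side option, the sketch below shows two checks a local websocket server could run before accepting a handshake: that the Origin header names a page on this machine, and that the request carries a per-session token which the dev server would embed in the page it serves. This is not how webpack-dev-server implements anything; the function names, the token query parameter and the set of local hostnames are assumptions.

```javascript
const crypto = require("crypto");

// Hostnames a local dev page is expected to be served from (assumption).
const LOCAL_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

// True if the Origin header names a page served from this machine. Browsers
// send Origin with every websocket handshake, so a handshake started by
// http://frontend-overflowstack.com/ fails this check.
function isLocalOrigin(origin) {
  if (typeof origin !== "string") return false;
  try {
    return LOCAL_HOSTS.has(new URL(origin).hostname);
  } catch {
    return false; // unparsable origins, e.g. the literal "null"
  }
}

// Hypothetical per-session secret that the dev server would embed in the page
// it serves; pages from other origins cannot read that page, so they cannot
// learn it.
const SESSION_TOKEN = crypto.randomBytes(16).toString("hex");

// Decide whether to accept an upgrade request (`req` as passed to the
// "upgrade" event of Node's http.Server): local origin AND matching token.
function shouldAcceptUpgrade(req, token = SESSION_TOKEN) {
  if (!isLocalOrigin(req.headers.origin)) return false;
  const url = new URL(req.url, "http://localhost");
  return url.searchParams.get("token") === token;
}
```

In a Node server this would sit at the top of the http server’s “upgrade” handler, destroying the socket when shouldAcceptUpgrade(req) returns false. The Origin check alone stops a page such as http://frontend-overflowstack.com/ from connecting; the token adds a second, independent check.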
It certainly feels like the way browsers and web standards apply origin policies to websockets is surprising, and it is resulting in software designed for local-only development being exposed to the public internet in a non-ideal way.
I would expect any fix to focus on implementing extra controls in the browsers.