Building a SOCKS Proxy with Node.js (Part 1)

Node possesses some powerful core libraries, though they’re often obscured behind more all-encompassing frameworks like Hapi or Express. If you’re primarily a web developer, you most likely work with those higher level frameworks and aren’t exposed to the lower level minutiae those frameworks are actually taking care of for you. Even if you’ve worked with the core http library, chances are that you’ve never touched the lower level net library. I recently had cause to explore net more closely, which inspired me to create this post.

What We Talk About When We Talk About Protocols

In our context, a protocol is a standardized system of rules that makes it easier for data to be encoded, transferred, decoded, and acted upon by multiple parties. The collection of protocols most relevant to us as web developers is known as the Internet Protocol Suite, which specifies protocols for addressing (IP), email (SMTP), hypertext (HTTP), and others. These protocols operate at different layers of abstraction.

Figure 1. Comparison of the institutionalized OSI model and common interpretation of the more practical TCP/IP abstraction layers (source: macsstuff.net)

Not to go into too much depth (at least not yet), but essentially higher level protocols get wrapped in lower level protocols as data is sent out of your system and similarly unwrapped at the receiving end. So, an HTTP request is encapsulated within a TCP segment (or a series of segments), which is contained within an IP packet, which is tucked inside an Ethernet frame.
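To make the wrapping and unwrapping a little more tangible, here is a toy sketch. The bracketed "headers" are placeholder labels I made up for illustration, not real TCP, IP, or Ethernet header formats, and the function names are hypothetical. It only shows the ordering: the innermost wrapper goes on first, and the receiver peels the layers off in reverse.

```javascript
// Toy model of encapsulation. The "[X header]" strings are placeholders,
// not real header layouts. Layers are listed innermost wrapper first.
const layers = ['TCP', 'IP', 'Ethernet'];

// Sending side: each layer prepends its own header to whatever it is given.
function encapsulate(payload) {
  return layers.reduce((data, layer) => `[${layer} header]${data}`, payload);
}

// Receiving side: strip headers in reverse order, outermost (Ethernet) first.
function decapsulate(frame) {
  return [...layers].reverse().reduce((data, layer) => {
    const header = `[${layer} header]`;
    if (!data.startsWith(header)) {
      throw new Error(`expected ${layer} header`);
    }
    return data.slice(header.length);
  }, frame);
}

const httpRequest = 'GET / HTTP/1.1\r\nHost: example.com\r\n\r\n';
const frame = encapsulate(httpRequest);

console.log(JSON.stringify(frame));
console.log(decapsulate(frame) === httpRequest);
```

In a real stack each header carries addressing and bookkeeping fields, and the operating system does this work for us; the sketch is only meant to show the nesting.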

HTTP is an application layer protocol, i.e., it is one of many possible formats for communication between Internet applications. It is the agreed-upon standard for web communication and the basis for most implementations of RESTful data transfer used by web and native applications. The http library gives us nicely parsed and interpreted HTTP requests that we can program around and respond to. The net library operates at a lower level, the transport layer, and concerns itself with the Transmission Control Protocol (TCP), which, as mentioned before, encapsulates HTTP requests and responses before they are sent on their way. However, HTTP is not the only protocol that can be carried over TCP.

(It’s important to mention that TCP is not the only protocol available at the transport layer; UDP is the other common one.)
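To see the difference between http and net for ourselves, here is a minimal sketch of a net server that answers an HTTP request by hand. The helper names (parseRequestLine, createRawServer) and port 8080 are my own choices for illustration, and the sketch assumes the whole request line arrives in the first chunk of data, which is not guaranteed on a real connection.

```javascript
const net = require('net');

// Hypothetical helper: pull the method, target, and version out of the
// first line of a raw HTTP/1.x request. The http module does this
// (and a great deal more) for us.
function parseRequestLine(chunk) {
  const text = chunk.toString('latin1');
  const firstLine = text.split('\r\n', 1)[0];
  const [method, target, version] = firstLine.split(' ');
  return { method, target, version };
}

// A bare TCP server: net hands us a socket and raw bytes, nothing more.
function createRawServer() {
  return net.createServer((socket) => {
    // Assumption: the first chunk contains at least the full request line.
    socket.once('data', (chunk) => {
      console.log('raw bytes:', chunk);
      console.log('request line:', parseRequestLine(chunk));

      // We have to write the HTTP response text ourselves.
      const body = 'hello from net\n';
      socket.end(
        'HTTP/1.1 200 OK\r\n' +
        'Content-Type: text/plain\r\n' +
        `Content-Length: ${Buffer.byteLength(body)}\r\n` +
        'Connection: close\r\n' +
        '\r\n' +
        body
      );
    });
  });
}

// To try it, uncomment the next line and run: curl http://localhost:8080/
// createRawServer().listen(8080);
```

With http, the callback would receive request and response objects that are already parsed. With net, all we get is a Buffer, and everything above the TCP stream is our responsibility.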

SOCKS

SOCKS is an application layer protocol (technically considered a session layer protocol in the OSI model) that essentially relays TCP traffic between a client and a destination by way of an intermediary proxy server. Though traditionally meant to let users behind a firewall reach the wider internet, a SOCKS proxy can also be useful for masking your IP address. In my use case, we are running some web scraping tools that we don’t want to trigger any IP-related filtering, so spreading requests across an array of SOCKS proxies will hopefully help prevent that. There is actually a well-regarded SOCKS implementation built with Node called socks, but I did not grasp how to actually use it when I first set out. After experimenting with Dante, a SOCKS implementation for *nix systems, I decided to try my hand at building one, and hopefully learn some things about net and networking in the process.
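As a small taste of what a SOCKS conversation looks like on the wire, here is a sketch of the first exchange, assuming SOCKS version 5 as described in RFC 1928 (this post has not committed to a version, so treat that as my assumption). The client opens with a greeting of VER, NMETHODS, and then one byte per authentication method it supports; the server answers with VER and the method it picked. The function names are hypothetical.

```javascript
// SOCKS5 greeting (RFC 1928): VER | NMETHODS | METHODS (one byte per method).
function parseGreeting(buf) {
  if (buf.length < 2) throw new Error('greeting too short');
  const version = buf[0];
  const nMethods = buf[1];
  if (buf.length < 2 + nMethods) throw new Error('greeting truncated');
  const methods = [...buf.subarray(2, 2 + nMethods)];
  return { version, methods };
}

// Server reply: VER | METHOD.
// 0x00 means "no authentication required"; 0xff means "no acceptable methods".
function chooseMethod(greeting) {
  const NO_AUTH = 0x00;
  const NONE_ACCEPTABLE = 0xff;
  const ok = greeting.version === 5 && greeting.methods.includes(NO_AUTH);
  return Buffer.from([0x05, ok ? NO_AUTH : NONE_ACCEPTABLE]);
}

// A client offering two methods: no auth (0x00) and username/password (0x02).
const greeting = parseGreeting(Buffer.from([0x05, 0x02, 0x00, 0x02]));
console.log(greeting);
console.log(chooseMethod(greeting));
```

This sketch only accepts unauthenticated clients. A real proxy would go on to read the client's connection request and then start relaying bytes in both directions.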

In the next part, we will start to look at setting up a barebones net server, and examine the data that gets passed to it from a SOCKS client.