I learned about socat a few years ago and am generally surprised more developers don’t know about it. Perhaps I appreciate it all the more since I saw it being used for the first time to fix a production issue, and these sorts of incidents leave a lasting impression on one’s mind.
socat has a bit of a learning curve compared to tools such as netcat. While I still often use netcat and friends (in no small part due to muscle memory), socat truly is the Swiss Army Knife of network debugging tools.
What is socat?
socat stands for SOcket CAT. It is a utility for data transfer between two addresses. What makes socat so versatile is the fact that an address can represent a network socket, any file descriptor, a Unix domain datagram or stream socket, TCP and UDP (over both IPv4 and IPv6), SOCKS 4/4a over IPv4/IPv6, SCTP, PTY, datagram and stream sockets, named and unnamed pipes, raw IP sockets, OpenSSL, or on Linux even any arbitrary network device.
The way I learn CLI tools is by first learning the usage of the tool, followed by committing a few simple commands to muscle memory. Usually, I can get by with those just all right. If I need to do something a little more involved, I can look at the man page, or failing that, I can Google it.
The most “basic” socat invocation would be:
socat [options] <address> <address>
A more concrete example would be:
socat -d -d - TCP4:www.example.com:80
-d -d would be the options
- would be the first address
TCP4:www.example.com:80 would be the second address
At first glance, this might seem like a lot to take in (and the examples in the man page are, if anything, even more inscrutable), so let’s break each component down a bit more.
Let’s start with the address, since the address is the cornerstone aspect of socat.
In order to understand socat it’s important to understand what addresses are and how they work.
An address is something that the user provides via the command line. Invoking socat without any addresses results in:
2018/09/22 19:12:30 socat E exactly 2 addresses required (there are 0); use option "-h" for help
An address comprises three components:
- the address type, followed by a :
- zero or more required address parameters separated by :
- zero or more address options separated by ,
The type is used to specify the kind of address we need. Popular types are TCP4, CREATE, EXEC, GOPEN, STDIN, STDOUT, PIPE, PTY, UDP4, etc., where the names are pretty self-explanatory.
However, in the example we saw in the previous section, a socat command was represented as
socat -d -d - TCP4:www.example.com:80
where - was said to be one of the two addresses. This doesn’t look like a fully formed address that adheres to the aforementioned convention.
This is because certain address types have aliases. - is one such alias used to represent STDIO. Another alias is TCP, which stands for TCPv4. The man page of socat lists all other aliases.
Immediately after the type comes zero or more required address parameters separated by :
The number of address parameters depends on the address type. For example, TCP4 requires a server specification and a port specification (number or service name). A valid address of type TCP4 established with port 80 of host www.example.com would be TCP4:www.example.com:80.
Another example of an address would be UDP-RECVFROM:9125, which creates a UDP socket on port 9125, receives one packet from an unspecified peer and may send one or more answer packets to that peer.
The type (here, UDP-RECVFROM) is sometimes optional.
Address specifications starting with a number are assumed to be of type FD (raw file descriptor). Similarly, if a / is found before the first : or , then the address type is assumed to be GOPEN (generic file open).
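The defaulting rules above can be sketched in a few lines of Python. This is only an illustration: infer_address_type is a hypothetical helper of mine, not socat’s real parser, and it covers just the three shortcuts mentioned so far (the - alias, a leading digit, and a / before the first : or ,).

```python
import re


def infer_address_type(spec: str) -> str:
    """Guess the address type of a socat-style address specification.

    Hypothetical helper: mirrors only the rules described in the text,
    not socat's actual parser.
    """
    if spec == "-":
        return "STDIO"          # "-" is an alias for STDIO
    if spec[:1].isdigit():
        return "FD"             # leading digit: raw file descriptor
    # Text before the first ":" or "," (the whole spec if neither appears).
    head = re.split(r"[:,]", spec, maxsplit=1)[0]
    if "/" in head:
        return "GOPEN"          # looks like a path: generic file open
    return head.upper()         # otherwise the explicit type keyword


for spec in ["-", "3", "/tmp/app.log,append", "TCP4:www.example.com:80"]:
    print(spec, "->", infer_address_type(spec))
```

The path /tmp/app.log is an arbitrary example; the point is only that a bare path or a bare number is enough for the type to be inferred.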
Address parameters can be further enhanced with options, which govern how the opening of the address is done or what the properties of the resulting bytestreams will be.
Options are specified after address parameters and they are separated from the last address parameter by a , (the , indicates where the address parameters end and where the options begin). Options can be specified either directly or with an option=value pair.
Extending the previous example, we can specify the option retry=5 on the address to set the number of times the connection to www.example.com should be retried:
TCP4:www.example.com:80,retry=5
Similarly, an address of type TCP4-LISTEN with the fork option sets up a TCP listening socket and forks a child process to handle each incoming client connection.
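To see how the three components fit together, here is a minimal Python sketch that splits an address into its type, parameters, and options. The Address tuple and the parse_address function are names I made up for illustration; real socat also handles aliases, quoting, and escaping, all of which this ignores.

```python
from typing import NamedTuple


class Address(NamedTuple):
    type: str
    parameters: list
    options: dict


def parse_address(spec: str) -> Address:
    """Split 'TYPE:param:param,opt,opt=value' into its three components.

    Simplified on purpose: aliases, quoting and escaping are ignored.
    """
    # Everything before the first "," is the type plus its parameters.
    head, *raw_options = spec.split(",")
    addr_type, *parameters = head.split(":")
    options = {}
    for raw in raw_options:
        name, sep, value = raw.partition("=")
        # Options given "directly" (no "=") are stored with the value True.
        options[name] = value if sep else True
    return Address(addr_type, parameters, options)


print(parse_address("TCP4:www.example.com:80,retry=5"))
print(parse_address("OPEN:file.txt,creat,trunc"))
```

The first call shows an option given as a name=value pair, the second shows options given directly by name.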
Each option belongs to one option group. Every address type has a set of option groups, and only options belonging to the option groups for the given address type are permitted. It is not, for example, possible to apply options reserved for sockets to regular files.
For example, the creat option belongs to the OPEN option group. The creat option can only be used with those address types (such as GOPEN, OPEN and PIPE) that have OPEN as a part of their option group set.
The OPEN option group allows for the setting of flags with the open() system call. Using creat as an option on an address of type OPEN sets the O_CREAT flag when open() is invoked.
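What creat boils down to can be shown with Python’s os.open, which is a thin wrapper over the same system call. The file name and scratch directory below are arbitrary, and the sketch only assumes that creat maps to O_CREAT as described above.

```python
import os
import tempfile

# Scratch directory so the sketch does not touch real files.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "file.txt")

# Without O_CREAT, opening a missing file fails.
try:
    os.close(os.open(path, os.O_WRONLY))
    opened_without_creat = True
except FileNotFoundError:
    opened_without_creat = False

# With O_CREAT (what the creat option requests), the file is created.
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
os.write(fd, b"hello\n")
os.close(fd)

print("opened without O_CREAT:", opened_without_creat)
print("exists after O_CREAT:", os.path.exists(path))
```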
Now that we have a slightly better understanding of what addresses are, let’s see how data is transferred between the two addresses.
socat establishes two unidirectional bytestreams between the two addresses.
For the first stream, the first address acts as the data source and the second address is the data sink. For the second bytestream, the second address is the data source and the first address is the data sink.
Invoking socat with -u ensures that the first address can only be used for reading and the second address can only be used for writing.
In the following example,
socat -u STDIN STDOUT
the first address (STDIN) is only used for reading data and the second address (STDOUT) is only used for writing. This can be verified by looking at what socat prints out when it is invoked:
$ socat -u STDIN STDOUT
2018/10/14 14:18:15 socat N using stdin for reading
2018/10/14 14:18:15 socat N using stdout for writing
By contrast, socat STDIN STDOUT opens both addresses for reading and writing.
$ socat STDIN STDOUT
2018/10/14 14:19:48 socat N using stdin for reading and writing
2018/10/14 14:19:48 socat N using stdout for reading and writing
Single Address Specification and Dual Addresses
An address of the sort we’ve seen until now that conforms to the aforementioned format is known as a single address specification.
socat -d -d - TCP4:www.example.com:80
A socat command with two single address specifications ends up establishing two unidirectional bytestreams between the two addresses. In this case, the first address can write data to the second address and the second address would write data back to the first address. However, it’s possible that we might not want the second address to write data back to the first address (STDIO), but might want it to write the data to a different address (a file, for example) instead.
Two single address specifications can be combined with !! to form a dual type address for one bytestream. Here, the first address is used by socat for reading data, and the second address for writing data.
A very simple example would be the following:
socat -d -d READLINE\!\!OPEN:file.txt,creat,trunc SYSTEM:'read stdin; echo $stdin'
In this example, the first address is a dual address specification (READLINE!!OPEN:file.txt,creat,trunc), where data is being read from a source (READLINE) and written to a different sink (OPEN:file.txt,creat,trunc).
socat reads the input from READLINE and transfers it to the second address (SYSTEM, which forks a child process and executes the specified shell command). The second address returns data to the other sink of the first address, which in this example is OPEN:file.txt,creat,trunc.
This was a fairly trivial example where only one of the two addresses was a dual address specification. It’s possible that both of the addresses might be dual address specifications.
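The idea of a dual address can be modelled in Python as an object whose reads and writes go to two different places. This is a toy model under my own assumptions: DualAddress and echo_service are hypothetical names, in-memory buffers stand in for READLINE and OPEN:file.txt, and nothing here reflects how socat is implemented internally.

```python
import io


class DualAddress:
    """Hypothetical stand-in for socat's reader!!writer pairing."""

    def __init__(self, reader, writer):
        self.reader = reader    # used only as a data source
        self.writer = writer    # used only as a data sink

    def read(self, size=-1):
        return self.reader.read(size)

    def write(self, data):
        return self.writer.write(data)


def echo_service(data: bytes) -> bytes:
    """Plays the role of the second address: returns what it was sent."""
    return data


source = io.BytesIO(b"hello from the reader\n")   # stands in for READLINE
sink = io.BytesIO()                               # stands in for OPEN:file.txt
first = DualAddress(source, sink)

# first -> second: data is read from the reader half.
reply = echo_service(first.read())
# second -> first: the reply lands in the writer half, not back at the source.
first.write(reply)

print(sink.getvalue())
```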
Address options versus socat options
It’s important to understand that the options that apply to an address are different from the options socat itself is invoked with, which govern how the socat tool behaves (as opposed to shaping how the address behaves).
We’ve already seen that adding the option -u to socat ensures that the first address is only opened for reading and the second address for writing.
Another useful option to know about is the -d option, which can be used with socat to print warning messages, in addition to fatal and error messages. -d -d will print the fatal, error, warning, and notice messages. I usually have socat aliased to socat -d -d in my dotfiles.
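The way each extra -d widens the set of printed messages can be written down as a small Python lookup. The function name printed_levels is mine, and the info and debug levels for a third and fourth -d come from my reading of the man page rather than from anything stated above, so treat those two as an assumption.

```python
# Message levels from most to least severe.
LEVELS = ["fatal", "error", "warning", "notice", "info", "debug"]


def printed_levels(d_count: int) -> list:
    """Return the message levels printed when -d is given d_count times."""
    # Fatal and error messages are always printed; each -d adds one level.
    return LEVELS[: 2 + max(0, min(d_count, 4))]


for n in range(3):
    print("-d " * n or "(no -d)", "->", printed_levels(n))
```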
Similarly, if one invokes socat -h or socat -hh, one will be presented with a wall of information which can be a little overwhelming and not all of which is required for getting started.
The Lifecycle of a socat instance
A socat process goes through four phases. The first phase consists of parsing the command line options and initializing logging. The second phase consists of opening the first address followed by the second address. Opening the addresses is usually a blocking operation, and per the man page, “especially for complex address types like SOCKS, connection requests or authentication dialogs must be completed before the next step is started.”
The third phase, the transfer phase, consists of socat monitoring the read and write file descriptors of both streams. It does so via the select system call (I’ve written about file descriptors previously). socat monitors the read file descriptors of both the addresses. When data is available at any of the sources and the corresponding sink of the other address is ready to accept a write, socat goes on to read the data, perform newline character conversions if required, and write the data to the write file descriptor of the sink. It then continues to monitor the read file descriptors of both the addresses.
The closing phase begins when one of the two bytestreams reaches EOF. socat detects the EOF of one bytestream and tries to shut down the write file descriptor of the other bytestream. This paves the way for a graceful termination of the other bytestream. For a defined time socat continues to transfer data in the other direction, but then closes all remaining file descriptors and terminates.
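The transfer and closing phases can be approximated with a short select loop in Python. This is a rough sketch under several assumptions of mine: the relay function name is invented, two socketpair() pairs stand in for the two addresses, writes simply block instead of being checked for readiness, and there is no timeout after the first EOF (the loop just waits until both directions have ended).

```python
import select
import socket


def relay(a: socket.socket, b: socket.socket, bufsize: int = 4096) -> None:
    """Copy bytes a->b and b->a until both directions have reached EOF."""
    peer = {a: b, b: a}
    open_sources = {a, b}
    while open_sources:
        # Wait until at least one source has data (or EOF) to read.
        readable, _, _ = select.select(list(open_sources), [], [])
        for src in readable:
            data = src.recv(bufsize)
            if data:
                peer[src].sendall(data)
            else:
                # EOF on this stream: stop reading it and half-close the
                # sink so the other side sees EOF too.
                open_sources.discard(src)
                peer[src].shutdown(socket.SHUT_WR)


# Demo: two socket pairs stand in for the two addresses.
left_app, left = socket.socketpair()
right, right_app = socket.socketpair()

left_app.sendall(b"ping")
left_app.shutdown(socket.SHUT_WR)
right_app.sendall(b"pong")
right_app.shutdown(socket.SHUT_WR)

relay(left, right)
print(right_app.recv(16), left_app.recv(16))
```

The half-close with shutdown(SHUT_WR) is the part that corresponds to the closing phase: the peer learns that no more data is coming, while the opposite direction stays usable until it reaches EOF as well.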
The man page for socat has a good list of examples, though I found understanding how addresses are constructed to be a prerequisite to understanding the examples.
With a better understanding of addresses, I hope the following examples (lifted from the man page) should be fairly straightforward to follow.
socat - tcp:www.blackhat.org:31337,readbytes=1000 connects to an unknown service and prevents being flooded.
socat -U TCP:target:9999,end-close TCP-L:8888,reuseaddr,fork merges data arriving from different TCP streams on port 8888 to just one stream to target:9999. The end-close option prevents the child processes forked off by the second address from terminating the shared connection to 9999 (close(2) just unlinks the inode which stays active as long as the parent process lives; shutdown(2) would actively terminate the connection).
Why use socat when you can use netcat
Netcat is a fantastic tool for network debugging and exploration, but it’s mostly limited to TCP and UDP connections. socat, in comparison, supports a very wide variety of address types.
Yet another limitation of netcat is that the lifetime of a netcat connection is constrained by the socket close time. What this means is that when one of the ends closes the socket, the netcat connection ends. With socat, as detailed in the section on the lifecycle of a socat instance, the closing phase paves the way for a more graceful termination of the bytestreams.
A War Story
I first learned about socat while watching an ex-colleague fix what was a minor outage. At a previous job, we had a public-facing API service A that spoke to another service B for a very, very, very small fraction of requests.
We used Consul for service discovery, so all of our services discovered each other dynamically by setting a Consul watch.
All, except for service B.
For reasons that are outside the scope of this post, service B (we only ran one instance of service B) ran on a static host (let’s call this host X) on a fixed port, whereas all of our other services did a bind 0 and were scheduled dynamically. The IP address for service B was hardcoded in service A’s codebase instead of being discovered from Consul. This was supposed to be a stopgap measure; service B was in the process of being decommissioned and its replacement was expected to be deployed and configured the way all of our other services were (i.e., dynamically). It’s also worth mentioning that service B was very rarely deployed.
Late one evening, all requests originating from service A to service B started failing. Our monitoring alerted me to this fact, and upon investigation, I realized that service B had been deployed earlier that day and it wasn’t running on host X anymore but was running on host Y.
I could’ve updated service A’s codebase to hardcode the new IP address of service B followed by redeploying service A. Except I wasn’t the developer primarily responsible for service A (in fact, I’d never worked with it previously) and I wasn’t entirely up to speed with how service A was deployed either.
Now in an ideal world, deployments of all our services would’ve been uniform, all our runbooks up to date and all services impeccably documented. However, in the real world, this isn’t always the case (is it ever the case?). The developer responsible for service A was out for the day and didn’t respond immediately to a Slack message. Our SRE, however, mentioned that it was fairly easy to fix with socat. It was the first time I’d heard the name of this tool.
I’m not entirely sure what the magical incantation was, but I assume it must’ve been fairly trivial to set up a socat server on host X that listened for all incoming TCP connections on port 8004 and forwarded them to host Y.
socat -d -d
Now far be it from me to argue that this was an ideal way to solve this problem, but it worked all the same. The next day, service B was decommissioned entirely, its replacement was deployed and was discovered by service A through Consul.
While I might’ve learned about socat for the first time while watching an SRE troubleshoot and fix a production issue, I personally use socat less as a production debugging tool than for troubleshooting local development issues (especially when developing network services).
Along with lsof (which I’ve written about previously), socat is the tool I’ve reached for the most for troubleshooting networking issues.