
the trouble with socket timeout

Danilo Moret
moret1979

--

originally published on 2014-09-03 at https://moret1979.wordpress.com/

Hi. We’re currently upgrading a Ruby driver on our platform at work. At the socket level, the old version of this driver uses IO.select, which boils down to the OS’s select system call. It’s a tried and true solution that works as expected in every scenario: it waits for a certain time, and if the time runs out it simply returns nothing and execution resumes. So if a client connects to a server and the server stops responding without closing the connection, the client can decide what to do about it. Here’s an example of that:
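Something along these lines; the port, file names and messages are just placeholders. The server deliberately sleeps for 5s before replying:

```ruby
# server.rb -- a toy server for the examples below (port 2000 is arbitrary).
require 'socket'

server = TCPServer.new(2000)
loop do
  client = server.accept
  sleep 5                              # deliberately stall before answering
  client.puts 'hello after 5 seconds'
  client.close
end
```

And client-io-select.rb, waiting on the socket with IO.select and a 2s timeout:

```ruby
# client-io-select.rb -- waits for the socket to become readable for up to 2s.
# IO.select returns nil on timeout, so the client gets to decide what to do.
require 'socket'

socket = TCPSocket.new('localhost', 2000)
if IO.select([socket], nil, nil, 2)
  puts socket.gets
else
  puts 'timed out waiting for the server'
end
socket.close
```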

Run the server, and then run client-io-select.rb. As expected, it will time out after 2s while the server is deliberately sleeping for 5s. Change the client timeout to 6s and it will print the server reply. The new version of the driver changed that implementation in favour of setting the timeout value as an option on the socket, as specified in the socket man page and elsewhere. So instead of using IO.select, it uses Socket’s setsockopt method before connecting to set both SO_RCVTIMEO and SO_SNDTIMEO, which translate to the OS’s socket options. After connecting it calls the socket’s read method directly, trusting Ruby and the OS to handle the timeout, which sounds nice. However, we found that support for those options is somewhat inconsistent across Ruby MRI versions (I didn’t test other Ruby implementations) and across operating systems. An example of a client using this approach:
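A sketch of a client along those lines, pointed at the same placeholder server on port 2000; the timeval packing uses the usual two-native-longs idiom:

```ruby
# client-setsockopt.rb -- sets the timeout as socket options instead of
# using IO.select, then reads directly and trusts Ruby/the OS to enforce it.
require 'socket'

# struct timeval with tv_sec = 2, tv_usec = 0, packed as two native longs
timeout = [2, 0].pack('l_2')

socket = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_RCVTIMEO, timeout)
socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_SNDTIMEO, timeout)
socket.connect(Socket.pack_sockaddr_in(2000, 'localhost'))

# If the option is honoured, this should give up after ~2s while the
# server sleeps for 5s; otherwise it waits for the full reply.
puts socket.read
socket.close
```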

We ran that client on Ruby 1.8.7-p374, 1.9.3-p545 and 2.1.2 on Mac OS X 10.9.4, all of them installed via rvm, against the same server as in the first example. On old Ruby 1.8 the client timed out as expected. On the other Ruby versions it waited for the server response instead. Before reaching that conclusion, we also ran some tests in C, because we suspected that different operating systems might or might not honour those socket options. Here is the C client we wrote to test it:
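A sketch of such a client, again pointed at the placeholder server on 127.0.0.1:2000:

```c
/* client-setsockopt.c -- sets SO_RCVTIMEO/SO_SNDTIMEO to 2s, then calls
 * recv() while the server deliberately sleeps for 5s. If the option is
 * honoured, recv() fails after ~2s instead of waiting for the reply. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int main(void) {
    struct timeval timeout;
    struct sockaddr_in addr;
    char buffer[1024];
    ssize_t received;
    int fd;

    timeout.tv_sec = 2;   /* give up after 2 seconds */
    timeout.tv_usec = 0;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout));
    setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &timeout, sizeof(timeout));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(2000);
    addr.sin_addr.s_addr = inet_addr("127.0.0.1");

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    received = recv(fd, buffer, sizeof(buffer) - 1, 0);
    if (received < 0) {
        perror("recv"); /* expected after ~2s if SO_RCVTIMEO is honoured */
    } else {
        buffer[received] = '\0';
        printf("%s", buffer);
    }

    close(fd);
    return 0;
}
```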

We ran that client on Mac OS X 10.9.4 with LLVM 5.1, on Ubuntu 14.04 with GCC 4.8.2 and on CentOS 5.8 with GCC 4.1.2, against the same server as in the first example. On OS X the client timed out as expected, but on Ubuntu and CentOS it didn’t. Don’t forget to test it yourself, especially with newer Ruby versions: one of the posts we found while investigating this described a different behaviour because it was based on Ruby 1.8 five years ago. I couldn’t find the reason behind the difference between Ruby versions; it might be a build option that had a different default before, but I can’t pinpoint it without better knowledge of the Ruby codebase. The same applies to the different operating systems. But the lesson is: setting timeout options on sockets in those Ruby builds does not currently produce the expected behaviour.

--
