Prevent concurrent exception in Ohm/Redic

Igor Guzak
2 min readNov 17, 2019

--

On one of the projects we catch a few exceptions in a short period of time which pointed to Ohm::Model objects with the message RuntimeError: not connected. Ohm is a gem that provides objects mapping for Redis and the issue in the first place was fixed with some tweaks around Redis connections, but the reason for the issue was not clear. Why connection to Redis was not established? Or maybe the connection was lost due to some reasons? So it was decided to clarify where such exception comes from and why.

Ohm by default use Redic adapter for connecting to Redis. In Redic client implementation lib/redic/client.rb:

def connect
establish_connection unless connected?
@semaphore.synchronize do
yield
end
rescue Errno::ECONNRESET
@connection = false
retry
end
def establish_connection
begin
@connection = Redic::Connection.new(@uri, @timeout)
rescue StandardError => err
raise err, “Can’t connect to: %s” % @uri
end
# ...
end

we could see that StandardError exception during creation of connection instance is handled and exception message raised in result is different from what we’ve got, at the same time we handle Errno::ECONNRESET exception which expects to re-establish connection via retry if it to was lost. So, there is no reason why we could have RuntimeError: not connected, to clarify it we need dive dipper into implementation of Redic::Connection which simply wraps Hiredis::Connection from hredis-rb gem.

Moving to Hiredis we could find connection implementation which by default lead us to Hiredis::Ext::Connection which wraps C implementation of connection used for better performance and maps EOFError to already spotted in Redic exception Errno::ECONNRESET

module Hiredis
module Ext
class Connection
def read
_read
rescue ::EOFError
raise Errno::ECONNRESET
end
end
end
end

moving down the line to C-extension we could find definition of read method https://github.com/redis/hiredis-rb/blob/master/ext/hiredis_ext/connection.c#L459

static VALUE connection_read(VALUE self) {
// ....
if (__get_reply(pc,&reply) == -1)
parent_context_raise(pc);

which means that if a connection is lost we invokeparent_context_raise(pc) which expects that connection will be nullified and raised `EOFError` exception https://github.com/redis/hiredis-rb/blob/master/ext/hiredis_ext/connection.c#L61

static void parent_context_raise(redisParentContext *pc) {
// ...
parent_context_try_free(pc);
// ...
case REDIS_ERR_EOF:
/* Raise native Ruby EOFError */
rb_raise(rb_eEOFError,"%s",errstr);

as we already have seen EOFError later is transformed to Errno::ECONRESET and expect to be handled by Redic via creating a new connection, but if we will try to send data to such broken connection RuntimeError: not connected will be raised, so why the connection was not re-established but instead we send data to bad one? It happens simply because Redic establish_connection was involved outside of @semaphore.synchronize block which expects to sync all execution, as a result, while Errno::ECONNRESET was handled by one thread, others were trying to send data to old broken one and resulted in exception.

As conclusion, the issue was spotted and synchronization was fixed in PR to Redic https://github.com/amakawa/redic/pull/15

--

--