SignalR Core: Heartbeat and Redis

Alim Özdemir
Dec 23, 2019 · 3 min read

Realtime applications are hard to design. In the .NET world, we have SignalR Core, which gives us a painless interface for developing such applications. SignalR Core is still a very new library, and here I will discuss a problem that came up with its latest versions.

This story comes from a problem that I faced. I have been storing client information in Redis. My application records client information in the OnConnectedAsync method and removes it in the OnDisconnectedAsync method. We also have an object called _clientList for inserting and removing clients, and for updating a client's information.

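public override async Task OnConnectedAsync()
{
    ...
    // Record the new client when the connection is established.
    _clientList.CreateUser(Context.ConnectionId);
    ...
}

public override async Task OnDisconnectedAsync(Exception ex)
{
    ...
    // Remove the client when the connection is closed.
    _clientList.RemoveUser(Context.ConnectionId);
    ...
}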

This seems good enough when nothing fails. However, we have to consider the worst case. The data read from _clientList must be reliable, which means we should always be able to see the most up-to-date status of the online users.

Note: the code in this story does not include a Redis implementation.
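
Because the Redis implementation is left out, the following is only a minimal in-memory sketch of what the _clientList object might look like. The interface name IClientList and the method names CreateUser, RemoveUser and LatestPing come from the hub code in this story, but their signatures are assumed. The ClientInfo type, the InMemoryClientList class and the GetAll method are hypothetical names used only for illustration. A real implementation would talk to Redis instead of a dictionary.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical record of one connected client.
public class ClientInfo
{
    public string ConnectionId { get; set; }
    public DateTime ConnectedAt { get; set; }
    public DateTime LatestPing { get; set; }
}

// Method names follow the hub code in this story; the signatures are assumed.
public interface IClientList
{
    void CreateUser(string connectionId);
    void RemoveUser(string connectionId);
    void LatestPing(string connectionId);
    IReadOnlyCollection<ClientInfo> GetAll(); // hypothetical helper
}

// In-memory stand-in for the Redis-backed store (an assumption, not the author's code).
public class InMemoryClientList : IClientList
{
    private readonly ConcurrentDictionary<string, ClientInfo> _clients =
        new ConcurrentDictionary<string, ClientInfo>();

    public void CreateUser(string connectionId)
    {
        var now = DateTime.UtcNow;
        _clients[connectionId] = new ClientInfo
        {
            ConnectionId = connectionId,
            ConnectedAt = now,
            LatestPing = now
        };
    }

    public void RemoveUser(string connectionId)
    {
        _clients.TryRemove(connectionId, out _);
    }

    // Records the time of the most recent heartbeat for a known client.
    public void LatestPing(string connectionId)
    {
        if (_clients.TryGetValue(connectionId, out var client))
        {
            client.LatestPing = DateTime.UtcNow;
        }
    }

    // Returns a snapshot copy of all known clients.
    public IReadOnlyCollection<ClientInfo> GetAll()
    {
        return new List<ClientInfo>(_clients.Values);
    }
}
```

With a Redis-backed version, every one of these methods becomes a network call that can throw, which is exactly what the scenarios below are about.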

Scenario 1

Redis has crashed and a new connection is established. The _clientList.CreateUser method will fail as well, so we can't see the new client in Redis.

Scenario 2

Redis was working without any problem, and then it crashed or a network problem occurred between the server and Redis. So, what will happen in this case?

You guessed it right: _clientList.RemoveUser will fail too, and our reliable data won't be reliable anymore.

You have to cover these scenarios for better realtime applications. Let’s start.

Solutions

Scenario 1 has a basic solution. You can refuse new connections while Redis is down. To accomplish this, call Context.Abort() when recording the client fails.

public override async Task OnConnectedAsync()
{
    try
    {
        ...
        _clientList.CreateUser(Context.ConnectionId);
        ...
    }
    catch (Exception ex)
    {
        // put your logger logic here
        // Close the connection, because the client could not be recorded.
        Context.Abort();
    }
}

After that, you can show the user a message such as 'Server is offline'. The reliable data problem is solved. The next step would be to fix the problem in Redis itself.

For advanced usage, this is not a proper solution. You have to make sure that Redis is always online, or have a backup plan in order to avoid losing server uptime.

Scenario 2 is the main point of this story. The data gets stuck in Redis, and we can't rely on it anymore. SignalR sends pings to the clients to test their connections. This is called the heartbeat.

The old SignalR used an extra interface, ITransportHeartbeat, to control existing connections. It holds a collection of the existing connections, so we can remove stuck data by iterating over it. I won't delve into the details here; you can refer to David Fowler's code here. However, fetching all connections and processing them might cause performance issues. To avoid this, SignalR Core takes a different approach: it lets us hook a new method into its own ping/pong cycle on the backend.

ASP.NET Core has connection features, and one of them is the IConnectionHeartbeatFeature interface. The following code must be used inside a hub. Let's see the code.

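// Requires: using Microsoft.AspNetCore.Connections.Features;
//           using Microsoft.Extensions.DependencyInjection;
private void Heartbeat()
{
    var heartbeat = Context.Features.Get<IConnectionHeartbeatFeature>();
    // The callback is called on each heartbeat of this connection.
    heartbeat.OnHeartbeat(state =>
    {
        (HttpContext context, string connectionId) = ((HttpContext, string))state;
        // Resolve the service from DI through the connection's HttpContext.
        var clientList = context.RequestServices.GetService<IClientList>();
        clientList.LatestPing(connectionId);
    }, (Context.GetHttpContext(), Context.ConnectionId));
}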

Call the Heartbeat method from OnConnectedAsync. Also, if you want to use a service from DI, you have to request it with context.RequestServices.GetService. Then we just update the client's LatestPing value, which is stored in Redis.

To verify the results, check http://localhost:5000/api/clients

Result

So now we know each client's latest ping time. We can loop through the entries in Redis and remove any client from which we have not received a ping for at least 5 minutes. The threshold should depend on your reconnect time span settings, because a reconnection might happen after a natural disconnection.
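
The cleanup loop described above could be sketched as follows. This is only an illustration under stated assumptions: the StaleClientCleaner class and its FindStale and RemoveStale methods are hypothetical names, and a plain dictionary of connection id to latest ping time stands in for the data kept in Redis.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; a dictionary stands in for the Redis data.
public static class StaleClientCleaner
{
    // Returns the connection ids whose latest ping is at least `timeout` old.
    public static List<string> FindStale(
        IDictionary<string, DateTime> latestPings, DateTime now, TimeSpan timeout)
    {
        return latestPings
            .Where(entry => now - entry.Value >= timeout)
            .Select(entry => entry.Key)
            .ToList();
    }

    // Removes the stale entries and returns how many were removed.
    public static int RemoveStale(
        IDictionary<string, DateTime> latestPings, DateTime now, TimeSpan timeout)
    {
        // FindStale returns a separate list, so removing while looping is safe.
        var stale = FindStale(latestPings, now, timeout);
        foreach (var connectionId in stale)
        {
            latestPings.Remove(connectionId);
        }
        return stale.Count;
    }
}
```

A call such as StaleClientCleaner.RemoveStale(pings, DateTime.UtcNow, TimeSpan.FromMinutes(5)) could then be run periodically, for example from a timer. Passing the current time in as a parameter keeps the check easy to test.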

Conclusion

To sum up, we can use the latest ping time to detect and clean up stale client data in many cases. Of course, SignalR Core itself needs more features for edge cases, and this solution is not perfect. Depending on the system design, a completely different service approach might be required.

You can check the code on GitHub. I used the ChatSample from the SignalR_samples repository, and it is just a demo.

Thanks to Halit Uyanık

Written by Alim Özdemir

ITU. CE. LCWaikiki