Devlog 19 — Networking rabbit hole

Jakub Neruda
6 min read · May 25, 2024


Welcome to a devlog for my retro arena FPS game called Rend. This devlog is one of many, and you can access the other ones from this list. These devlogs provide insight into the development process and challenges of writing a game engine in C++ from scratch. The game is available for free on Itch.io.

Networking is giving me hell. Plugging the network stack into the existing game architecture was surprisingly easy, but that’s where easy ended. With this devlog, I want to share the network architecture used in Rend and the challenges I am currently facing with the protocol performance and stability. And hopefully, I’ll figure out what to do next.

The article is structured into three sections that peel off the layers of complexity of the network stack.

Top layer

The high-level view is a fairly standard server-client relationship. Each client sends its inputs to the server; the server aggregates them and then sends all inputs back to all clients at once.
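As a rough sketch of that relay idea (all names here are illustrative, not the actual Rend code), the per-tick server job boils down to concatenating the per-client input queues into one batch:

#include <cstddef>
#include <vector>

// Illustrative stand-in; the real InputData is shown in the middle
// layer section below.
struct InputData
{
    std::size_t clientId;
    std::size_t tick;
};

// One server tick of the relay: aggregate everything that arrived
// since the last tick into a single batch. In the real game, the
// batch would then be serialized and broadcast to every client.
std::vector<InputData> aggregateInputs(
    std::vector<std::vector<InputData>>& perClientQueues)
{
    std::vector<InputData> batch;
    for (auto& queue : perClientQueues)
    {
        batch.insert(batch.end(), queue.begin(), queue.end());
        queue.clear();
    }
    return batch;
}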

The server also tracks the state of the communication, differentiating between the following states:

  • Lobby
  • Map loading
  • Ingame

When all peers are ready, the server switches from lobby to map loading, prompting the peers to initialize the game. Once all peers have the game initialized, they send a second ready signal and the game can start.
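In code, that transition logic can be sketched roughly like this (using the types shown in the next section; the actual checks in Rend may differ):

#include <algorithm>
#include <vector>

enum class ServerState { Lobby, MapLoading, GameInProgress };

enum class ClientState
{
    Disconnected,
    Connected,
    ConnectedAndReady,
    ConnectedAndMapReady
};

struct ClientData { ClientState state = ClientState::Connected; };

// Advance the server state machine once every occupied slot has
// reported the appropriate ready signal.
ServerState advance(ServerState current, const std::vector<ClientData>& clients)
{
    auto allAtLeast = [&clients](ClientState wanted)
    {
        return std::all_of(clients.begin(), clients.end(),
            [wanted](const ClientData& c)
            {
                // Abandoned slots must not block the transition.
                return c.state == ClientState::Disconnected
                    || c.state >= wanted;
            });
    };

    if (current == ServerState::Lobby
        && allAtLeast(ClientState::ConnectedAndReady))
        return ServerState::MapLoading;

    if (current == ServerState::MapLoading
        && allAtLeast(ClientState::ConnectedAndMapReady))
        return ServerState::GameInProgress;

    return current;
}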

Once a match finishes, the server goes back to the lobby state; the clients, however, are not returned to the initial lobby. They only see an intermission screen from which they can quit the game or make themselves ready for the next map.

To make the server maintenance as simple as possible, the server is not a standalone binary — instead, it is started automatically on a parallel thread when a player enters the game creation screen. While on that screen (or on the intermission screen), other players can join the lobby if they know the IP address.

The server could be a standalone binary, I just don’t assume anybody wants to learn its command-line parameters in this day and age. However, it would be trivial to add such a feature in the future (try saying that out loud three times in quick succession).
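A minimal sketch of the embedded approach (runServerTick is a hypothetical stand-in for the real server loop body) could use std::jthread so the server shuts down cleanly together with the client:

#include <chrono>
#include <stop_token>
#include <thread>

// Placeholder for one iteration of the real server loop.
void runServerTick() { /* poll sockets, relay inputs, ... */ }

class EmbeddedServer
{
public:
    // Called when the player enters the game creation screen.
    void start()
    {
        worker = std::jthread([](std::stop_token token)
        {
            while (!token.stop_requested())
            {
                runServerTick();
                std::this_thread::sleep_for(std::chrono::milliseconds(16));
            }
        });
    }

    // std::jthread also requests stop and joins in its destructor,
    // so forgetting to call this is not fatal.
    void stop() { worker.request_stop(); }

private:
    std::jthread worker;
};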

Middle layer

So what does the communication protocol look like? The server packet holds the most data, as it aggregates all the info from the clients and sends it back to them:

struct [[nodiscard]] ServerUpdateData final
{
    ServerState state = ServerState::Lobby;
    LobbySettings lobbySettings;
    std::vector<ClientData> clients;
    std::vector<InputData> inputs;
};

enum class [[nodiscard]] ServerState
{
    Lobby,
    MapLoading,
    GameInProgress
};

struct [[nodiscard]] LobbySettings final
{
    // Here goes stuff like:
    // * how many bots will be in the game
    // * point limits
    // * which maps to play and in which order
};

struct [[nodiscard]] ClientData final
{
    ClientState state = ClientState::Connected;
    // Here go properties like:
    // * name
    // * input preferences
    // * and so on
};

enum class [[nodiscard]] ClientState
{
    Disconnected,
    Connected,
    ConnectedAndReady,
    ConnectedAndMapReady
};

struct [[nodiscard]] InputData
{
    PlayerIdxType clientId;
    size_t tick;
    InputSchema input; // structure representing the currently pressed buttons
};

The ClientState::Disconnected state allows me to mark a slot in clients as no longer used (so all existing indices remain valid); such a slot is then reused by the next newly connected client.
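The slot-reuse logic itself can be as simple as this sketch (illustrative, not the actual Rend code):

#include <cstddef>
#include <vector>

enum class ClientState
{
    Disconnected,
    Connected,
    ConnectedAndReady,
    ConnectedAndMapReady
};

struct ClientData { ClientState state = ClientState::Connected; };

// Find a slot for a newly connected client. Recycling the index of a
// disconnected client keeps all existing indices stable.
std::size_t allocateClientSlot(std::vector<ClientData>& clients)
{
    for (std::size_t i = 0; i < clients.size(); ++i)
    {
        if (clients[i].state == ClientState::Disconnected)
        {
            clients[i] = ClientData{}; // reset the recycled slot
            return i;
        }
    }

    clients.emplace_back();
    return clients.size() - 1;
}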

The client side is less complicated:

struct [[nodiscard]] ClientMessage final
{
    ClientMessageType type;
    std::string jsonData;
    size_t tick;
};

enum class [[nodiscard]] ClientMessageType : uint8_t
{
    ConnectionRequest,
    PeerSettingsUpdate,
    LobbySettingsUpdate,
    ReportPeerReady,
    ReportMapReady,
    ReportInput,
    ReportMapEnded,
    Disconnect
};

In many cases, the client just needs to send the appropriate ClientMessageType on its own; only in some instances is a payload necessary, like when sending the current ClientData, LobbySettings, or InputData.
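To illustrate the difference (the helper names are mine, and the JSON serialization itself is elided), constructing the two kinds of messages looks something like this:

#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

enum class ClientMessageType : std::uint8_t
{
    ConnectionRequest,
    PeerSettingsUpdate,
    LobbySettingsUpdate,
    ReportPeerReady,
    ReportMapReady,
    ReportInput,
    ReportMapEnded,
    Disconnect
};

struct ClientMessage
{
    ClientMessageType type;
    std::string jsonData;
    std::size_t tick;
};

// A payload-free signal: the type tag alone carries the meaning.
ClientMessage makeReadyMessage(std::size_t tick)
{
    return { ClientMessageType::ReportPeerReady, "", tick };
}

// A payload-carrying message: the serialized InputData rides along
// in jsonData.
ClientMessage makeInputMessage(std::size_t tick, std::string serializedInput)
{
    return { ClientMessageType::ReportInput, std::move(serializedInput), tick };
}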

One extra thing that I’ll be adding in the future is a protocol version, to guard against mismatches between updated builds. The code is not released to the public at the moment, so I don’t have to worry about a protocol mismatch yet.

Bottom layer

This architecture works fine, although a proper networking engineer would likely view it as primitive and blunt. Please don’t judge me, this is the second time I have ever written a network protocol, and I have no idea what I am doing.

Let’s move on to the stuff that is NOT working. I’ve mentioned in many earlier devlogs that Rend uses a rollback networking approach. Instead of having an authoritative server that maintains the state of the world and just tells the clients where they are and whether their inputs have been accepted, my server is only a message relay.

It has a vague concept of Lobby-Loading-Game states to be able to filter out certain kinds of messages when they shouldn’t happen, but aside from that, it just synchronizes the clients.

Peers work on the premise of a deterministic simulation: as long as all peers have the same inputs at the same frame, the simulation behaves identically on each of them. The simplest implementation of this concept would be to wait at the start of each frame until inputs from all other peers have arrived and only then continue.

That is called the lockstep model, and it is heavily dependent on network performance. The first time I ever wrote a network protocol, I used lockstep, and the result was only playable over a LAN cable. A local Wi-Fi network is way too spotty for lockstep to be playable.

With rollback, I keep the last N states of the game in a circular buffer. The local simulation doesn’t wait for inputs from other players. Instead, when a packet is accepted, it knows the index of a state it should apply the packet to, so it rolls back to that state and resimulates from there.
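A stripped-down sketch of that buffer (GameState, InputFrame, and simulateOneTick are placeholders for the real simulation types) could look like this:

#include <cstddef>
#include <deque>

struct GameState { /* full snapshot of the deterministic world */ };
struct InputFrame { /* aggregated inputs of all peers for one tick */ };

// The deterministic step function; only declared here, since its body
// is the entire game.
GameState simulateOneTick(const GameState& state, const InputFrame& inputs);

struct RollbackBuffer
{
    std::deque<GameState> states;  // last N states, oldest first
    std::deque<InputFrame> inputs; // inputs[i] turns states[i] into states[i + 1]
    std::size_t oldestTick = 0;    // tick that states.front() belongs to

    // A late packet patches the inputs at its tick; everything after
    // that point is then resimulated from the rolled-back state.
    bool applyLateInput(std::size_t tick, const InputFrame& corrected)
    {
        if (tick < oldestTick)
            return false; // fell out of the window -> desync

        if (tick >= oldestTick + inputs.size())
            return false; // packet from the future, handled separately

        const std::size_t offset = tick - oldestTick;
        inputs[offset] = corrected;

        for (std::size_t i = offset; i + 1 < states.size(); ++i)
            states[i + 1] = simulateOneTick(states[i], inputs[i]);

        return true;
    }
};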

Such an approach can, in theory, deal with a network lag of up to N * 1000 / FPS milliseconds, assuming that you can resimulate up to N frames without negatively impacting your frame rate. A disadvantage of this approach is that when it is implemented naïvely (guilty as charged), you cannot connect new players mid-game, as they don’t have the current state of the simulation.

Perhaps the clients could upload “confirmed states” (states for which they got inputs from all peers) to the server, so it could give a new player the latest confirmed state. At this point, that would be too much hassle, so I am just going to prevent mid-game connections instead.

I have a more pressing problem to solve: for some reason, the client machine that also runs the server runs slightly slower than all the other clients, eventually desyncing those other peers once they receive a packet that no longer fits into their rollback window.

The “server-side client” is plagued by the opposite problem: it is getting packets from the future! However, this problem is easier to solve, as I can just put these future packets into a buffer and process them once their frame arrives.
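A sketch of such a holding buffer (names are illustrative):

#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct InputFrame { /* peer inputs for one tick */ };

struct FuturePacketBuffer
{
    std::map<std::size_t, std::vector<InputFrame>> pending;

    // Park a packet that references a tick we haven't reached yet.
    void park(std::size_t tick, InputFrame frame)
    {
        pending[tick].push_back(std::move(frame));
    }

    // Called once per frame with the current local tick; returns all
    // parked packets that are now due for processing.
    std::vector<InputFrame> drainUpTo(std::size_t currentTick)
    {
        std::vector<InputFrame> due;
        const auto end = pending.upper_bound(currentTick);
        for (auto it = pending.begin(); it != end; ++it)
            due.insert(due.end(), it->second.begin(), it->second.end());
        pending.erase(pending.begin(), end);
        return due;
    }
};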

It’s very hard to determine what causes the time displacement between the peers: whether it is the extra network communication, thread switching between the client and server code, or some kind of snowball effect, I dunno. But right now it is THE problem that is keeping me from having a working net game.

And I have no idea how to fix this situation! I once heard the term “frame lengthening” used for dealing with longer network lag (and this case can kind of be treated as a network lag). I am not sure how it is supposed to be implemented, but I am thinking about just putting extra Sleeps into the code of any client that is receiving packets near the rollback limit, to slow down the client that is running too fast.
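A naive version of that idea (all thresholds here are untested guesses) might look like this:

#include <chrono>
#include <cstddef>
#include <thread>

// If incoming packets reference ticks close to the edge of the
// rollback window, this client is running ahead of a slow peer and
// briefly stalls to let it catch up.
void maybeLengthenFrame(std::size_t localTick,
                        std::size_t newestRemoteTick,
                        std::size_t rollbackWindow)
{
    if (localTick <= newestRemoteTick)
        return; // we are not ahead, nothing to do

    const std::size_t lead = localTick - newestRemoteTick;

    // Only react once the lead eats a significant part of the window.
    if (lead > rollbackWindow / 2)
    {
        // Stall proportionally to how far ahead we are.
        std::this_thread::sleep_for(std::chrono::milliseconds(2 * lead));
    }
}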

Summary

And that is my most important to-do until the next devlog: eliminating the time displacement between the clients. Once that is done, I am pretty sure I will start running into desyncs because the simulation won’t behave exactly the same on both machines, but you know what they say: one step at a time. See you then!
