Building Scalable Go Microservices
Scale stateless and stateful services with HAProxy
One of web applications’ most critical non-functional requirements is that they should be scalable. Scalability is an application’s ability to handle an increased load without a decrease in performance or availability. And the primary goal of the business is to scale to earn more money. So, nowadays, it’s required to build applications able to scale. But sometimes, it takes a lot of work. In this post, I will describe how to scale different web applications.
Scaling Types
There are two scaling types:
- Vertical scaling (or “scaling up”) — adding more resources (CPU, memory, bandwidth, or storage) to the existing servers;
- Horizontal scaling (or “scaling out”) — adding more servers to distribute the traffic.
Vertical Scaling
It’s the easiest way to scale up your application. You add more computing or/and storage resources based on the needs. The application logic doesn’t care about machine resources, so nothing special is needed to perform scaling up. Every application can scale up. But this type of scaling is not enough. There are several reasons why:
- Machine resources are limited, so you won’t be able to scale up endlessly;
- Powerful machines cost a lot of money;
- In case your application is deployed to a single machine, in case this machine goes down, your application becomes unavailable;
- It’s hard to deploy the application to a single machine without downtime;
- Machine maintenance is impossible without application downtime.
You can’t also name your application “scalable” if it can scale only vertically. As you can see, applications deployed to a single machine cannot be highly available. From the very beginning, you need to be able to host your application on at least two machines. To be able to do it, your application should support horizontal scalability.
Horizontal Scaling
The ability for web applications to scale out is crucial, but it comes with more complexity. First, if your application is hosted on several machines, traffic between these machines should be balanced. It’s possible to traffic balance on the client and the server side.
Client-side traffic balancing:
- Domain Name System (DNS) — you can use multiple IP addresses for a single domain name and balance traffic between these IPs in different algorithms (round-robin is most popular);
- Client-side — client needs to know server addresses and balance traffic between them by itself. It usually involves a library developed by server application owners that clients can use. Clients won’t implement traffic balancing on their own.
Server-side traffic balancing:
- Load balancer — a server that distributes incoming traffic across multiple servers. There are hardware-based and software-based load balancers. Hardware-based load balancing involves using a physical device to spread incoming traffic. Software-based load balancing involves using software to spread incoming traffic;
- Content Delivery Network (CDN) — a network of servers distributed worldwide designed to deliver content to users from the server closest to them. CDN can distribute traffic between multiple servers and cache content to improve user experience and performance.
It depends on the application you’re developing and your balancing strategy. The most popular technique for microservices involves using load balancers because client interaction with the server looks like interaction with a single machine by a single domain name. You can scale up and down target applications without changes in clients.
Hardware-based load balancers involve changes in the network on the physical level that are not possible when you host your application in the cloud. So, the popular option involves using software-based load balancers. The most popular software-based load balancers are:
- HAProxy — free, open-source, reliable, high-performance TCP/HTTP load balancer;
- NGINX — a web server and reverse proxy that can be used as a load balancer;
- Apache HTTP Server — a web server that can be used as a load balancer;
Now, let’s dive deep into how to scale different applications. But first, let’s start with the two types of web applications.
Web Application State
In scope of state, web applications can be stateless or stateful.
A stateless web application is an application that does not store data on the server between requests. Usually, data is stored in the external database, or each request contains all the information necessary to process the request.
A stateful web application is an application that stores data on the server. Data is stored in the machine storage or cached in the memory.
Different development approaches exist to build scalable-stateless and scalable-stateful web applications. Let’s build simple key-value storage that provides HTTP API for CRUD operations. But first, let’s use an external database to store data (build a stateless application), and then, build a database inside our application (build a stateful application).
Scalable Stateless Web Application
You can use many databases to build your web applications: relational, NoSQL, in-memory, columnar, time-series databases, etc. They all have their pros&cons for different scenarios. I’ll use one of the most popular key-value in-memory databases, named Redis, to build simple key-value storage.
Redis supports three operations that can be used to develop current applications:
GET key
— get the value ofkey
;SET key value
— setkey
to hold the stringvalue
.DEL key
— removes the specifiedkey
.
I will host a web server using the popular Go web framework Fiber. Let’s write an application code:
package main
import (
"context"
"os"
"github.com/gofiber/fiber/v2"
"github.com/redis/go-redis/v9"
)
type KeyValue struct {
Key string `json:"key"`
Value string `json:"value"`
}
func main() {
app := fiber.New()
addr := os.Getenv("REDIS_ADDR")
if addr == "" {
addr = "localhost:6379"
}
rdb := redis.NewClient(&redis.Options{
Addr: addr,
})
_, err := rdb.Ping(context.Background()).Result()
if err != nil {
panic(err)
}
app.Get("/health", func(c *fiber.Ctx) error {
return c.SendStatus(fiber.StatusOK)
})
app.Get("/key/:key", func(c *fiber.Ctx) error {
key := c.Params("key")
val, err := rdb.Get(c.Context(), key).Result()
if err != nil {
return c.Status(fiber.StatusNotFound).JSON(fiber.Map{
"error": "Key not found",
})
}
return c.JSON(KeyValue{Key: key, Value: val})
})
app.Post("/key", func(c *fiber.Ctx) error {
var kv KeyValue
if err := c.BodyParser(&kv); err != nil {
return c.Status(fiber.StatusBadRequest).JSON(fiber.Map{
"error": "Invalid request body",
})
}
err = rdb.Set(c.Context(), kv.Key, kv.Value, 0).Err()
if err != nil {
return c.Status(fiber.StatusInternalServerError).JSON(fiber.Map{
"error": "Failed to set key-value pair",
})
}
return c.JSON(kv)
})
app.Put("/key/:key", func(c *fiber.Ctx) error {
key := c.Params("key")
var kv KeyValue
if err := c.BodyParser(&kv); err != nil {
return c.Status(fiber.StatusBadRequest).JSON(fiber.Map{
"error": "Invalid request body",
})
}
err = rdb.Set(c.Context(), key, kv.Value, 0).Err()
if err != nil {
return c.Status(fiber.StatusInternalServerError).JSON(fiber.Map{
"error": "Failed to update key-value pair",
})
}
return c.JSON(KeyValue{Key: key, Value: kv.Value})
})
app.Delete("/key/:key", func(c *fiber.Ctx) error {
key := c.Params("key")
err := rdb.Del(c.Context(), key).Err()
if err != nil {
return c.Status(fiber.StatusInternalServerError).JSON(fiber.Map{
"error": "Failed to delete key-value pair",
})
}
return c.SendStatus(fiber.StatusNoContent)
})
if err := app.Listen(":4000"); err != nil {
panic(err)
}
}
Application has next API:
GET /health
— health check;GET /key/:key
— returns key-value pair if key exists in Redis. Otherwise — Not Found;POST /key
— saves key-value pair to Redis;PUT /key/:key
— updates key-value pair if it exists. Otherwise — saves key-value pair. Logic is the same as in previous method;DELETE /key/:key
— deletes key-value pair from Redis.
What happens if you start several copies of your application (I will call it “cluster”)? All of them will use the same Redis server (it might be a cluster as well) to store/retrieve data. The only remaining question is how to balance traffic between the cluster nodes. HAProxy can solve this issue.
Let’s start with several copies of our application with Docker Compose. But first, let’s define Dockerfile
for the Go application:
FROM golang:1.21
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o /out/app main.go
WORKDIR /out
CMD ["./app"]
Now, let’s define compose file with three instances of Go web server:
version: '3.9'
services:
app1:
restart: always
build:
context: .
dockerfile: Dockerfile
environment:
REDIS_ADDR: redis:6379
expose:
- '4000'
depends_on:
- redis
app2:
restart: always
build:
context: .
dockerfile: Dockerfile
environment:
REDIS_ADDR: redis:6379
expose:
- '4000'
depends_on:
- redis
app3:
restart: always
build:
context: .
dockerfile: Dockerfile
environment:
REDIS_ADDR: redis:6379
expose:
- '4000'
depends_on:
- redis
redis:
image: redis:7
volumes:
- redis-data:/data
expose:
- '6379'
volumes:
redis-data:
Now let’s balance traffic between cluster. To do it, let’s define HAProxy configuration:
defaults
mode http
timeout connect 5000
timeout client 50000
timeout server 50000
frontend http-in
bind *:80
default_backend app
backend app
balance roundrobin
option httpchk GET /health
server app1 app1:4000 check
server app2 app2:4000 check
server app3 app3:4000 check
This configuration defines three sections:
defaults
— default settings for all sections;frontend
— defines how incoming requests are handled;backend
—defines how requests are forwarded to servers.
So, HAProxy will listen to 80 HTTP port and balance traffic between cluster nodes. Also, HAProxy will perform periodic health checks, and if the server does not respond successfully to the health checks, it will be removed from the balancer. Health checks are done to exclude unhealthy or unavailable instances. Now let’s update the compose definition:
version: '3.9'
services:
app1:
restart: always
build:
context: .
dockerfile: Dockerfile
environment:
REDIS_ADDR: redis:6379
expose:
- '4000'
depends_on:
- redis
app2:
restart: always
build:
context: .
dockerfile: Dockerfile
environment:
REDIS_ADDR: redis:6379
expose:
- '4000'
depends_on:
- redis
app3:
restart: always
build:
context: .
dockerfile: Dockerfile
environment:
REDIS_ADDR: redis:6379
expose:
- '4000'
depends_on:
- redis
haproxy:
image: haproxy
volumes:
- ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
ports:
- '8081:80'
depends_on:
- app1
- app2
- app3
redis:
image: redis:7
volumes:
- redis-data:/data
expose:
- '6379'
volumes:
redis-data:
To start everything, execute next command:
docker compose up -d
Now, if you try to execute the next request, you will go through HAPRoxy to one of the instances of application:
curl --location 'localhost:8081/key' \
--header 'Content-Type: application/json' \
--data '{
"key": "hello",
"value": "world"
}'
Now try to stop app1
with next command:
docker compose stop app1
Check HAProxy logs in few seconds with next command:
docker compose logs haproxy
You will see something like that:
go-stateless-service-haproxy-1 | [WARNING] (8) : Server app/app1 is DOWN, reason: Layer4 timeout, check duration: 2003ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
HAProxy detected an unavailable server with the health check and removed it from the balancer. It means your requests will never go to app1
through HAPRoxy until it is dead. Now start app1
:
docker compose start app1
In few seconds you will see in HAProxy logs next:
go-stateless-service-haproxy-1 | [WARNING] (8) : Server app/app1 is UP, reason: Layer7 check passed, code: 200, check duration: 0ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
It means HAProxy detected healthy app1
and returned it to the load balancer.
The state is stored by Redis. You have no issues with horizontal application scaling until all the state is stored in the external database.
Let’s see what’s the difference with stateful application.
Scalable Stateful Web Application
Because the state is stored on the runtime machine storage or in memory, scaling a stateful application requires replicating this state across all the machines. And there are a lot of questions you can ask:
- What algorithms are used to replicate the state across the cluster?
- If the user saves some data to machine 1 and performs a GET request immediately after it to machine 2, what response will he see?
- How would the conflict be resolved if user 1 saved pair
key-value1
to machine 1 and user 2 saved pairkey-value2
to machine 2 simultaneously? - and so on…
So let’s start from the beginning.
Several cluster configurations are used to coordinate the distributed system:
- Single-Leader — A single node in a cluster acts as a leader node and accepts all write requests. Requests are replicated across the cluster nodes. This configuration is used in most relational databases because it ensures consistency, but the leader can become a bottleneck if it becomes overloaded;
- Multi-Leader — several nodes in the cluster can act as a leader. Each leader is responsible for a subset of the data. Writes are sent to the appropriate leader and then replicated across the cluster. This configuration improves availability and performance, but conflicts are possible;
- Leaderless — there is no designated leader and all nodes are equal. As well it improves availability and performance, but conflicts are possible.
Consensus algorithms ensure all nodes in the distributed system agree on a single state. The most popular consensus algorithm is Raft. If you want some details on how it works, I recommend reading this article:
The process of finding servers in a cluster is called service discovery. You can use dynamic or static service discovery. Static configuration is provided when the service is started with a configuration file or environment variables. Dynamic configuration involves using a service registry to discover services automatically. The most popular service registries are Consul, etcd and ZooKeeper.
Next, we need to talk about the CAP theorem. CAP theorem (either Consistent or Available when Partitioned) is a concept that achieving all these three states in distributed systems is impossible. You can achieve only two: consistent and partition tolerant when sacrificing availability or highly available and partition tolerant but eventually consistent. In most systems, it’s better to be highly available than consistent, so AP (Available and Partition tolerant) is preferable to CP (Consistent and Partition tolerant), but not always. For financial systems, consistency is the main requirement.
So, let’s implement the same key-value storage, but this time without Redis. Inside the application, let’s implement the Raft protocol by using the HashiCorp library:
A web application is going to be eventually consistent with single leader, so it’s possible to receive outdated value after the save.
To implement Raft, you need to implement Finite State Machine (FSM):
package fsm
import (
"encoding/json"
"go-stateful-service/internal/model"
"io"
"log"
"sync"
"github.com/hashicorp/raft"
)
type FSM struct {
mu sync.RWMutex
kv map[string]string
}
type InternalFSM interface {
raft.FSM
Read(string) (string, bool)
}
var _ InternalFSM = (*FSM)(nil)
func NewFSM() InternalFSM {
return &FSM{kv: make(map[string]string)}
}
func (f *FSM) Restore(snapshot io.ReadCloser) error {
data, err := io.ReadAll(snapshot)
if err != nil {
return err
}
var snapshotData map[string]string
if err := json.Unmarshal(data, &snapshotData); err != nil {
return err
}
f.mu.Lock()
defer f.mu.Unlock()
f.kv = make(map[string]string)
for k, v := range snapshotData {
f.kv[k] = v
}
return nil
}
func (f *FSM) Snapshot() (raft.FSMSnapshot, error) {
f.mu.RLock()
defer f.mu.RUnlock()
snapshot := make(map[string]string)
for k, v := range f.kv {
snapshot[k] = v
}
return &FSMSnapshot{snapshot: snapshot}, nil
}
func (f *FSM) Apply(l *raft.Log) interface{} {
if l.Type == raft.LogCommand {
var kv model.KeyValue
if err := json.Unmarshal(l.Data, &kv); err != nil {
log.Printf("Failed to unmarshal log data: %v", err)
return nil
}
f.mu.Lock()
defer f.mu.Unlock()
if kv.Value == nil {
delete(f.kv, kv.Key)
} else {
f.kv[kv.Key] = *kv.Value
}
}
return nil
}
func (f *FSM) Read(key string) (string, bool) {
f.mu.RLock()
defer f.mu.RUnlock()
val, ok := f.kv[key]
return val, ok
}
Raft FSM should implement three methods:
Apply
— is called once a log entry is committed by a majority of the cluster;Snapshot
— returns an FSMSnapshot used to support log compaction, to restore the FSM to a previous state, or to bring out-of-date followers up to a recent log index.Restore
— used to restore an FSM from a snapshot. It is not called concurrently with any other command. The FSM must discard all previous state before restoring the snapshot.
Now let’s update the main function:
package main
import (
"encoding/json"
"go-stateful-service/internal/fsm"
"go-stateful-service/internal/model"
"log"
"net"
"os"
"path"
"strings"
"time"
"github.com/gofiber/fiber/v2"
"github.com/hashicorp/raft"
raftboltdb "github.com/hashicorp/raft-boltdb"
)
func main() {
raftPath := os.Getenv("RAFT_PATH")
if _, err := os.Stat(raftPath); os.IsNotExist(err) {
if os.Mkdir(raftPath, 0755) != nil {
log.Fatal("failed to create Raft path")
}
}
raftAddr := os.Getenv("RAFT_ADDR")
raftID := os.Getenv("RAFT_ID")
nodeAddrsStr := os.Getenv("RAFT_NODE_ADDRS")
nodeAddrs := strings.Split(nodeAddrsStr, ",")
if len(nodeAddrs) == 0 {
log.Fatal("no Raft node addresses specified")
}
serverAddr := os.Getenv("SERVER_ADDR")
config := raft.DefaultConfig()
config.LocalID = raft.ServerID(raftID)
config.ElectionTimeout = 2 * time.Second
config.HeartbeatTimeout = 500 * time.Millisecond
logStore, err := raftboltdb.NewBoltStore(path.Join(raftPath, "raft-log"))
if err != nil {
log.Fatal(err)
}
stableStore, err := raftboltdb.NewBoltStore(path.Join(raftPath, "raft-stable"))
if err != nil {
log.Fatal(err)
}
addr, err := net.ResolveTCPAddr("tcp", raftAddr)
if err != nil {
log.Fatal(err)
}
transport, err := raft.NewTCPTransport(addr.String(), addr, 3, 10*time.Second, os.Stderr)
if err != nil {
log.Fatal(err)
}
fsmStore := fsm.NewFSM()
snapshotStore, err := raft.NewFileSnapshotStore(path.Join(raftPath, "raft-snapshots"), 1, os.Stderr)
if err != nil {
log.Fatal(err)
}
raftNode, err := raft.NewRaft(config, fsmStore, logStore, stableStore, snapshotStore, transport)
if err != nil {
log.Fatal(err)
}
var servers []raft.Server
for _, nodeAddr := range nodeAddrs {
s := strings.Split(nodeAddr, "@")
if len(s) != 2 {
log.Fatalf("invalid Raft node address: %s", nodeAddr)
}
servers = append(servers, raft.Server{
ID: raft.ServerID(s[0]),
Address: raft.ServerAddress(s[1]),
})
}
f := raftNode.BootstrapCluster(raft.Configuration{
Servers: servers,
})
if err := f.Error(); err != nil {
log.Println(err)
}
app := fiber.New()
app.Get("/health", func(c *fiber.Ctx) error {
return c.SendStatus(fiber.StatusOK)
})
app.Get("/leader", func(c *fiber.Ctx) error {
state := raftNode.State()
if state != raft.Leader {
return c.Status(fiber.StatusServiceUnavailable).JSON(fiber.Map{
"error": "Not the leader",
})
}
return c.SendStatus(fiber.StatusOK)
})
app.Get("/key/:key", func(c *fiber.Ctx) error {
key := c.Params("key")
val, ok := fsmStore.Read(key)
if !ok {
return c.Status(fiber.StatusNotFound).JSON(fiber.Map{
"error": "Key not found",
})
}
return c.JSON(model.KeyValue{Key: key, Value: &val})
})
app.Post("/key", func(c *fiber.Ctx) error {
var kv model.KeyValue
if err := c.BodyParser(&kv); err != nil {
return c.Status(fiber.StatusBadRequest).JSON(fiber.Map{
"error": "Invalid request body",
})
}
b, err := json.Marshal(kv)
if err != nil {
return c.Status(fiber.StatusInternalServerError).JSON(fiber.Map{
"error": "Failed to marshal key-value pair",
})
}
if raftNode.State() != raft.Leader {
return c.Status(fiber.StatusServiceUnavailable).JSON(fiber.Map{
"error": "Not the leader",
})
}
if err := raftNode.Apply(b, 500*time.Millisecond).Error(); err != nil {
return c.Status(fiber.StatusInternalServerError).JSON(fiber.Map{
"error": "Failed to set key-value pair",
})
}
return c.JSON(kv)
})
app.Put("/key/:key", func(c *fiber.Ctx) error {
key := c.Params("key")
var kv model.KeyValue
if err := c.BodyParser(&kv); err != nil {
return c.Status(fiber.StatusBadRequest).JSON(fiber.Map{
"error": "Invalid request body",
})
}
b, err := json.Marshal(kv)
if err != nil {
return c.Status(fiber.StatusInternalServerError).JSON(fiber.Map{
"error": "Failed to marshal key-value pair",
})
}
if raftNode.State() != raft.Leader {
return c.Status(fiber.StatusServiceUnavailable).JSON(fiber.Map{
"error": "Not the leader",
})
}
if err := raftNode.Apply(b, 500*time.Millisecond).Error(); err != nil {
return c.Status(fiber.StatusInternalServerError).JSON(fiber.Map{
"error": "Failed to update key-value pair",
})
}
return c.JSON(model.KeyValue{Key: key, Value: kv.Value})
})
app.Delete("/key/:key", func(c *fiber.Ctx) error {
key := c.Params("key")
kv := model.KeyValue{
Key: key,
Value: nil,
}
b, err := json.Marshal(kv)
if err != nil {
return c.Status(fiber.StatusInternalServerError).JSON(fiber.Map{
"error": "Failed to marshal key-value pair",
})
}
if raftNode.State() != raft.Leader {
return c.Status(fiber.StatusServiceUnavailable).JSON(fiber.Map{
"error": "Not the leader",
})
}
if err := raftNode.Apply(b, 500*time.Millisecond).Error(); err != nil {
return c.Status(fiber.StatusInternalServerError).JSON(fiber.Map{
"error": "Failed to delete key-value pair",
})
}
return c.SendStatus(fiber.StatusNoContent)
})
if err := app.Listen(serverAddr); err != nil {
panic(err)
}
}
Now, the main function creates a Raft node and bootstraps the Raft cluster by providing cluster information. One important thing is that the cluster has a single leader that can accept write requests. Other nodes are read-only.
Now let’s see the changes in compose file:
version: '3.9'
services:
app1:
restart: always
build:
context: .
dockerfile: Dockerfile
environment:
RAFT_PATH: /app/data
RAFT_ADDR: app1:9000
RAFT_ID: node1
RAFT_NODE_ADDRS: node1@app1:9000,node2@app2:9000,node3@app3:9000
SERVER_ADDR: :4000
expose:
- '4000'
- '9000'
volumes:
- app1_data:/app/data
app2:
restart: always
build:
context: .
dockerfile: Dockerfile
environment:
RAFT_PATH: /app/data
RAFT_ADDR: app2:9000
RAFT_ID: node2
RAFT_NODE_ADDRS: node1@app1:9000,node2@app2:9000,node3@app3:9000
SERVER_ADDR: :4000
expose:
- '4000'
- '9000'
volumes:
- app2_data:/app/data
app3:
restart: always
build:
context: .
dockerfile: Dockerfile
environment:
RAFT_PATH: /app/data
RAFT_ADDR: app3:9000
RAFT_ID: node3
RAFT_NODE_ADDRS: node1@app1:9000,node2@app2:9000,node3@app3:9000
SERVER_ADDR: :4000
expose:
- '4000'
- '9000'
volumes:
- app3_data:/app/data
haproxy:
image: haproxy
volumes:
- ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
ports:
- '8081:80'
depends_on:
- app1
- app2
- app3
volumes:
app1_data:
app2_data:
app3_data:
This time every application node comes with next configuration:
RAFT_PATH
— storage path to save logs and snapshots generated by Raft;RAFT_ADDR
— address that will be used by Raft protocol;RAFT_ID
— node identifier;RAFT_NODE_ADDRS
— other nodes in the cluster;SERVER_ADDR
— HTTP address
As well, each node needs a volume to store its state.
A node should know other nodes to join to start the cluster. So they are provided with RAFT_NODE_ADDRS
variable. After the node joins the cluster, the election begins, and the cluster votes for the leader. After the leader is elected, the cluster can accept incoming requests.
But this time, HAProxy should send write requests only to the leader. To do it, I’ve extended the application API with one endpoint — GET /leader
. It returns OK
if the current node is the leader node or Service Unavailable
if the node is a follower.
Now let’s update HAProxy configuration:
defaults
mode http
timeout connect 5000
timeout client 50000
timeout server 50000
frontend http-in
bind *:80
acl write_methods method POST DELETE PUT
use_backend app-write if write_methods
default_backend app-read-only
backend app-read-only
balance roundrobin
option httpchk GET /health
server app1 app1:4000 check
server app2 app2:4000 check
server app3 app3:4000 check
backend app-write
balance roundrobin
option httpchk GET /leader
server app1 app1:4000 check
server app2 app2:4000 check
server app3 app3:4000 check
This time there are two backends:
app-read-only
— used for read requests. It usesGET /health
check to balance the traffic. Ideally, all nodes can accept incoming traffic;app-write
— used for write requests. It usesGET /leader
check to balance traffic. Only leader node can accept incoming traffic.
Let’s start the cluster with next command:
docker compose up -d
If you check the logs of every node, you probably will see that one of the nodes is Leader:
go-stateful-service-app1-1 | 2023-09-21T15:20:44.673Z [WARN] raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
go-stateful-service-app1-1 | 2023-09-21T15:20:44.673Z [INFO] raft: entering candidate state: node="Node at 172.19.0.2:9000 [Candidate]" term=3
go-stateful-service-app1-1 | 2023-09-21T15:20:44.721Z [DEBUG] raft: voting for self: term=3 id=node1
go-stateful-service-app1-1 | 2023-09-21T15:20:44.780Z [DEBUG] raft: asking for vote: term=3 from=node2 address=app2:9000
go-stateful-service-app1-1 | 2023-09-21T15:20:44.780Z [DEBUG] raft: asking for vote: term=3 from=node3 address=app3:9000
go-stateful-service-app1-1 | 2023-09-21T15:20:44.780Z [DEBUG] raft: calculated votes needed: needed=2 term=3
go-stateful-service-app1-1 | 2023-09-21T15:20:44.780Z [INFO] raft: duplicate requestVote for same term: term=3
go-stateful-service-app1-1 | 2023-09-21T15:20:44.780Z [DEBUG] raft: vote granted: from=node1 term=3 tally=1
go-stateful-service-app1-1 | 2023-09-21T15:20:44.893Z [DEBUG] raft: vote granted: from=node3 term=3 tally=2
go-stateful-service-app1-1 | 2023-09-21T15:20:44.893Z [INFO] raft: election won: term=3 tally=2
go-stateful-service-app1-1 | 2023-09-21T15:20:44.893Z [INFO] raft: entering leader state: leader="Node at 172.19.0.2:9000 [Leader]"
go-stateful-service-app1-1 | 2023-09-21T15:20:44.893Z [INFO] raft: added peer, starting replication: peer=node2
go-stateful-service-app1-1 | 2023-09-21T15:20:44.893Z [INFO] raft: added peer, starting replication: peer=node3
go-stateful-service-app1-1 | 2023-09-21T15:20:44.894Z [INFO] raft: pipelining replication: peer="{Voter node3 app3:9000}"
go-stateful-service-app1-1 | 2023-09-21T15:20:44.915Z [INFO] raft: pipelining replication: peer="{Voter node2 app2:9000}"
Another nodes will be followers:
go-stateful-service-app2-1 | 2023-09-21T15:20:44.673Z [WARN] raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
go-stateful-service-app2-1 | 2023-09-21T15:20:44.674Z [INFO] raft: entering candidate state: node="Node at 172.19.0.3:9000 [Candidate]" term=3
go-stateful-service-app2-1 | 2023-09-21T15:20:44.721Z [DEBUG] raft: asking for vote: term=3 from=node1 address=app1:9000
go-stateful-service-app2-1 | 2023-09-21T15:20:44.721Z [DEBUG] raft: voting for self: term=3 id=node2
go-stateful-service-app2-1 | 2023-09-21T15:20:44.809Z [DEBUG] raft: asking for vote: term=3 from=node3 address=app3:9000
go-stateful-service-app2-1 | 2023-09-21T15:20:44.810Z [DEBUG] raft: calculated votes needed: needed=2 term=3
go-stateful-service-app2-1 | 2023-09-21T15:20:44.810Z [INFO] raft: duplicate requestVote for same term: term=3
go-stateful-service-app2-1 | 2023-09-21T15:20:44.810Z [DEBUG] raft: vote granted: from=node2 term=3 tally=1
go-stateful-service-app2-1 | 2023-09-21T15:20:44.915Z [INFO] raft: entering follower state: follower="Node at 172.19.0.3:9000 [Follower]" leader-address=172.19.0.2:9000 leader-id=node1
In my case, app1
is leader so only this node can accept write requests. Let’s try to stop it by performing next command:
docker compose stop app1
In few seconds, new node will be promoted as a leader. In my case it’s app2
:
go-stateful-service-app2-1 | 2023-09-21T15:23:57.800Z [WARN] raft: heartbeat timeout reached, starting election: last-leader-addr=172.19.0.2:9000 last-leader-id=node1
go-stateful-service-app2-1 | 2023-09-21T15:23:57.800Z [INFO] raft: entering candidate state: node="Node at 172.19.0.3:9000 [Candidate]" term=4
go-stateful-service-app2-1 | 2023-09-21T15:23:57.817Z [DEBUG] raft: asking for vote: term=4 from=node1 address=app1:9000
go-stateful-service-app2-1 | 2023-09-21T15:23:57.817Z [DEBUG] raft: voting for self: term=4 id=node2
go-stateful-service-app2-1 | 2023-09-21T15:23:57.818Z [ERROR] raft: failed to make requestVote RPC: target="{Voter node1 app1:9000}" error=EOF term=4
go-stateful-service-app2-1 | 2023-09-21T15:23:57.849Z [DEBUG] raft: asking for vote: term=4 from=node3 address=app3:9000
go-stateful-service-app2-1 | 2023-09-21T15:23:57.849Z [DEBUG] raft: calculated votes needed: needed=2 term=4
go-stateful-service-app2-1 | 2023-09-21T15:23:57.849Z [DEBUG] raft: vote granted: from=node2 term=4 tally=1
go-stateful-service-app2-1 | 2023-09-21T15:23:58.066Z [INFO] raft: duplicate requestVote for same term: term=4
go-stateful-service-app2-1 | 2023-09-21T15:24:00.498Z [WARN] raft: Election timeout reached, restarting election
go-stateful-service-app2-1 | 2023-09-21T15:24:00.498Z [INFO] raft: entering candidate state: node="Node at 172.19.0.3:9000 [Candidate]" term=5
go-stateful-service-app2-1 | 2023-09-21T15:24:00.515Z [DEBUG] raft: asking for vote: term=5 from=node1 address=app1:9000
go-stateful-service-app2-1 | 2023-09-21T15:24:00.515Z [DEBUG] raft: voting for self: term=5 id=node2
go-stateful-service-app2-1 | 2023-09-21T15:24:00.551Z [DEBUG] raft: asking for vote: term=5 from=node3 address=app3:9000
go-stateful-service-app2-1 | 2023-09-21T15:24:00.552Z [DEBUG] raft: calculated votes needed: needed=2 term=5
go-stateful-service-app2-1 | 2023-09-21T15:24:00.552Z [DEBUG] raft: vote granted: from=node2 term=5 tally=1
go-stateful-service-app2-1 | 2023-09-21T15:24:00.655Z [DEBUG] raft: vote granted: from=node3 term=5 tally=2
go-stateful-service-app2-1 | 2023-09-21T15:24:00.655Z [INFO] raft: election won: term=5 tally=2
go-stateful-service-app2-1 | 2023-09-21T15:24:00.655Z [INFO] raft: entering leader state: leader="Node at 172.19.0.3:9000 [Leader]"
go-stateful-service-app2-1 | 2023-09-21T15:24:00.655Z [INFO] raft: added peer, starting replication: peer=node1
go-stateful-service-app2-1 | 2023-09-21T15:24:00.655Z [INFO] raft: added peer, starting replication: peer=node3
go-stateful-service-app2-1 | 2023-09-21T15:24:00.655Z [INFO] raft: pipelining replication: peer="{Voter node3 app3:9000}"
Now, if you start app1
, it can be promoted as a follower:
go-stateful-service-app1-1 | 2023-09-21T15:25:37.852Z [INFO] raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:node1 Address:app1:9000} {Suffrage:Voter ID:node2 Address:app2:9000} {Suffrage:Voter ID:node3 Address:app3:9000}]"
go-stateful-service-app1-1 | 2023-09-21T15:25:37.853Z [INFO] raft: entering follower state: follower="Node at 172.19.0.2:9000 [Follower]" leader-address= leader-id=
go-stateful-service-app1-1 | 2023/09/21 15:25:37 bootstrap only works on new clusters
go-stateful-service-app1-1 | 2023-09-21T15:25:38.471Z [WARN] raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
go-stateful-service-app1-1 | 2023-09-21T15:25:38.471Z [INFO] raft: entering candidate state: node="Node at 172.19.0.2:9000 [Candidate]" term=4
go-stateful-service-app1-1 | 2023-09-21T15:25:38.479Z [DEBUG] raft: voting for self: term=4 id=node1
go-stateful-service-app1-1 | 2023-09-21T15:25:38.481Z [DEBUG] raft: asking for vote: term=4 from=node2 address=app2:9000
go-stateful-service-app1-1 | 2023-09-21T15:25:38.481Z [DEBUG] raft: asking for vote: term=4 from=node3 address=app3:9000
go-stateful-service-app1-1 | 2023-09-21T15:25:38.481Z [DEBUG] raft: calculated votes needed: needed=2 term=4
go-stateful-service-app1-1 | 2023-09-21T15:25:38.481Z [DEBUG] raft: vote granted: from=node1 term=4 tally=1
go-stateful-service-app1-1 | 2023-09-21T15:25:38.483Z [DEBUG] raft: newer term discovered, fallback to follower: term=5
go-stateful-service-app1-1 | 2023-09-21T15:25:38.486Z [INFO] raft: entering follower state: follower="Node at 172.19.0.2:9000 [Follower]" leader-address=172.19.0.3:9000 leader-id=node2
HAProxy changes the write-only backend traffic to the leader each time the leader changes by performing a GET /leader
request.
All HTTP API methods work the same as for stateless service from the user’s perspective. Under the hood, the Raft algorithm replicates data across the cluster.
As you can see, it’s still possible to horizontally scale stateful service, but this time, you need to think about a lot of details carefully.
Application Code
You can find it here, don’t forget to star the repository!
Conclusion
The ability to scale web applications is one of the most important non-functional requirements nowadays. It affects the availability and performance of the application.
If an application can be scaled only vertically, it’s not enough to call the application “scalable”. Web applications should be scaled horizontally as well.
As you can see, there are different approaches to horizontal scale, and there are a lot of details you need to consider. It’s always possible to scale out a stateless application, so do not store any state inside the application if possible.
For modern HTTP API, it’s preferable to use external databases to store the state instead of storing it inside the application. But if you are developing a database, you probably don’t want to use an external one and need to think about many details when developing modern distributed applications.