ElastiCache client connection handling
Client connection handling recommendations: connection pooling, pipelining, timeout settings, and backoff retry settings. This note covers best practices for client applications handling connections, along with common customer issues.
Connection pooling
Redis Clients Handling: https://redis.io/topics/clients
Connection pooling means that connections are reused rather than created each time a connection is requested. To facilitate connection reuse, a memory cache of server connections, called a connection pool, is maintained by a connection pooling module in the supported client application. Connection pools create a set of connections which you can use as needed; when you are done, the connection is returned to the pool for further reuse.
Connection pooling works by creating a number of connections in advance. When a Redis operation is performed, an already-established connection is taken from the pool, and when the operation completes the connection is not released but kept for subsequent Redis operations. This avoids the overhead of repeatedly establishing and tearing down Redis connections, and therefore improves performance.
Pipelining
Using pipelining to speed up Redis queries: https://redis.io/topics/pipelining
Pipelining is a technique in which a client application sends multiple commands to the server without waiting for the replies at all, and finally reads all the replies in a single step. It reduces the latency cost of round-trip time and greatly increases the total number of operations you can perform per second on a given Redis server. When pipelining is used, many commands are usually read with a single read() system call, and multiple replies are delivered with a single write() system call. As a result, the number of queries performed per second can increase almost tenfold compared with not using pipelining.
If a Redis client opens and closes a connection for every request, executing many commands wastes most of its time on that overhead. A Redis pipeline instead sends multiple commands at once and returns their results together, saving command-dispatch and connection-handling time and improving efficiency.
With Redis pipelining, the client can keep sending requests before the server has responded, and finally read all of the server's responses at once. Pipelining is very useful in certain scenarios: when multiple commands need to be submitted promptly, have no dependencies on one another's results, and do not require immediate responses, a pipeline acts as a "batching" tool. It can also deliver a significant performance gain, mainly because it reduces the number of round-trip interactions on the TCP connection.
Key points for usage!
1) Before pipelining, check whether the commands depend on one another's results; if they do, pipelining is not recommended.
2) The main benefit is fewer TCP round trips, so it can improve cross-AZ latency and may mitigate brief latency spikes between AZs.
3) Heavy use of pipelining does not yield proportional performance gains. Remember that Redis is single-threaded and still executes commands one at a time, so sending a large batch through a pipeline is not guaranteed to be faster: 100 commands are not necessarily faster than 50, nor 50 faster than 10.
Q: Why should a customer use pipelining? What are the advantages?
A: The server is able to process new requests even if the client has not yet read the old responses. This way it is possible to send multiple commands to the server without waiting for the replies at all, and finally read the replies in a single step.
Timeout setting
A timeout below 2 seconds is not recommended: instances and the network can experience jitter or packet loss, so an overly low timeout tends to cause problems rather than prevent them. A reasonable timeout value combined with a sensible retry setting is the recommended approach.
Backoff retry setting
Error retries and exponential backoff in AWS — https://docs.aws.amazon.com/general/latest/gr/api-retries.html
The backoff algorithm: the idea behind exponential backoff is to use progressively longer waits between retries for consecutive error responses. You should implement a maximum delay interval as well as a maximum number of retries. The maximum delay interval and maximum number of retries are not necessarily fixed values, and should be set based on the operation being performed, as well as other local factors such as network latency.
Exponential backoff example: retry 10 seconds after the first error, 12 seconds after the second, 15 seconds after the third, and so on with progressively longer waits. This prevents all front-end clients from reconnecting to the backend at the same moment once the network recovers, which would otherwise cause a connection storm on the Redis server.
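A sketch of exponential backoff with full jitter in Python (the base delay, cap, and retry count are illustrative values; the random jitter spreads reconnect attempts out in time so clients do not retry in lockstep):

```python
import random
import time

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Random wait between 0 and min(cap, base * 2**attempt) seconds
    before retry number `attempt` (full jitter)."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def call_with_retries(fn, max_retries=5, base=1.0, cap=30.0):
    for attempt in range(max_retries):
        try:
            return fn()
        except ConnectionError:
            time.sleep(backoff_delay(attempt, base, cap))  # longer waits each time
    return fn()  # final attempt; let the error propagate
```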
Hands-on test: Connection pooling
■ Using connection pooling can increase requests per second, and also reduce latency.
■ With connection pooling (-k 1) you can confirm that more commands are processed per second while latency actually drops.
■ Target ElastiCache Redis: cache.r6g.large
■ Testing EC2 client: r6g.large
Test log
Test 1: No connection pooling | No pipelining
$ redis-benchmark -h xxx.cache.amazonaws.com -n 10000 -t set,get -k 0 -P 1
SET: throughput summary: 569.31 requests per second
latency summary (msec):
avg min p50 p95 p99 max
47.441 0.576 3.495 9.231 2007.039 2010.111
GET: throughput summary: 562.68 requests per second
latency summary (msec):
avg min p50 p95 p99 max
46.528 0.576 3.687 10.071 2008.063 2012.159
Test 2: With connection pooling | No pipelining
$ redis-benchmark -h xxx.cache.amazonaws.com -n 10000 -t set,get -k 1 -P 1
SET: throughput summary: 67567.57 requests per second ← significant improvement
latency summary (msec):
avg min p50 p95 p99 max
0.700 0.560 0.687 0.863 0.927 1.271
GET: throughput summary: 66666.66 requests per second ← significant improvement
latency summary (msec):
avg min p50 p95 p99 max
0.711 0.552 0.695 0.887 1.007 1.215
Hands-on test: Pipelining
■ Using pipelining together with connection pooling increases requests per second, but latency also increases. With -P 16 you can clearly see more commands processed per second, at the cost of higher latency (connection pooling -k 1 is already in use).
■ Pipelining too many commands at once (-P 256 or -P 4096) does not increase total requests per second, while latency becomes higher than with -P 16.
Test log
■ Target ElastiCache Redis: cache.r6g.large
■ Testing EC2 client: r6g.large
Test 3: With connection pooling | Pipelining = 16
$ redis-benchmark -h xxx.cache.amazonaws.com -n 10000 -t set,get -k 1 -P 16
SET: throughput summary: 714285.69 requests per second
latency summary (msec):
avg min p50 p95 p99 max
1.044 0.624 1.015 1.511 1.759 1.815
GET: throughput summary: 909090.94 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.841 0.584 0.815 1.119 1.375 1.479
■ Using pipelining and connection pooling increases requests per second, but latency also increases.
■ With -P 16 you can clearly see more commands processed per second, at the cost of higher latency (connection pooling -k 1 is already in use).
Test 4: With connection pooling | Pipelining = 256
$ redis-benchmark -h xxx.cache.amazonaws.com -n 10000 -t set,get -k 1 -P 256
SET: throughput summary: 731428.56 requests per second
latency summary (msec):
avg min p50 p95 p99 max
12.199 0.976 12.455 13.127 13.191 13.247
GET: throughput summary: 1024000.00 requests per second
latency summary (msec):
avg min p50 p95 p99 max
7.431 0.824 6.983 9.511 9.551 9.551
■ Pipelining too many commands at once (-P 256 or -P 4096) does not increase total requests per second, while latency becomes higher than with -P 16.
■ With -P 256, requests per second are close to -P 16, but latency is much higher (connection pooling -k 1 is already in use).
Test 5: With connection pooling | Pipelining = 4096
$ redis-benchmark -h xxx.cache.amazonaws.com -n 10000 -t set,get -k 1 -P 4096
SET: throughput summary: 864571.38 requests per second
latency summary (msec):
avg min p50 p95 p99 max
1.649 0.904 1.951 2.143 2.143 2.143
GET: throughput summary: 1075636.38 requests per second
latency summary (msec):
avg min p50 p95 p99 max
1.703 1.064 1.991 2.095 2.095 2.095
■ Pipelining too many commands at once (-P 256 or -P 4096) does not increase total requests per second, while latency becomes higher than with -P 16.
■ With -P 4096, requests per second are close to -P 16, but latency is still higher than with -P 16 (connection pooling -k 1 is already in use).
Hands-on test: Connection pooling (Python code)
### Test results ###
■ Run time: 4m23.798s → 1m52.722s (greatly reduced)
■ NewConnections greatly reduced
■ EngineCPUUtilization load is about the same, but the run time is greatly reduced.

### Without connection pooling ###
$ time python redis-no-connection-pool-single.py
real 4m23.798s
user 0m52.587s
sys 1m11.068s
---
$ cat redis-no-connection-pool-single.py
import logging
import random
import string

import redis

def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))

def connect():
    # A brand-new TCP connection is established on every call
    r = redis.Redis(host='xxx.usw2.cache.amazonaws.com', port=6379, db=0)
    return r

def task():
    KeyId = id_generator(10)
    KeyValue = id_generator(20)
    r = connect()
    r.set(KeyId, KeyValue, 7200)  # third positional argument is ex (TTL in seconds)
    r.get(KeyId)

logging.basicConfig(level=logging.INFO)
for i in range(0, 100000):
    task()

### With connection pooling ###
$ time python redis-connection-pool-single.py
INFO:root:Connected to Redis
real 1m52.722s
user 0m37.186s
sys 0m3.741s
---
$ cat redis-connection-pool-single.py
import logging
import random
import string

import redis

def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))

def connect():
    # One shared pool; established connections are reused across tasks
    redis_pool = redis.ConnectionPool(max_connections=5, host='xxx.usw2.cache.amazonaws.com', port=6379, db=0)
    return redis_pool

def task(c):
    redis_conn = redis.Redis(connection_pool=c)
    KeyId = id_generator(10)
    KeyValue = id_generator(20)
    redis_conn.set(KeyId, KeyValue, 7200)
    redis_conn.get(KeyId)

logging.basicConfig(level=logging.INFO)
r = connect()
logging.info("Connected to Redis")
for i in range(0, 100000):
    task(r)
Frequently asked questions
Q: Why do our applications in different AZs see different latency when accessing the same ElastiCache Redis node?
A: Availability Zones are physically separate locations, so there are latency differences between them. Reference values:
Client (AZ-A) -> Redis (AZ-A) latency: ~1 ms
Client (AZ-B) -> Redis (AZ-A) latency: 3 ms+
Q: Our application clients see connection latency to Redis. Any recommendations?
A: We recommend 1) connection pooling, 2) pipelining, and 3) placing the client in the same AZ as the node.
Q: If the number of new connections (NewConnections) is very high, how do we reduce it?
A: Use connection pooling.
Q: Is there an upper limit on connections per node (maxclients)?
A: Yes. ElastiCache nodes have a maxclients limit of 65,000 connections, and too many connections can easily cause service latency, so we recommend connection pooling and avoiding excessive connections.
[+] Redis-specific parameters — Redis 2.6.13 parameters — https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/ParameterGroups.Redis.html#ParameterGroups.Redis.2-6-13
maxclients: This value applies to all instance types except those explicitly specified.
Default: 65000
Q: Will the server proactively close connections (tcp-keepalive)?
A: No. By default, ElastiCache does not proactively close client connections.
[+] Redis-specific parameters — Redis 2.6.13 parameters — https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/ParameterGroups.Redis.html#ParameterGroups.Redis.2-6-13
tcp-keepalive
Default: 0. If this is set to a nonzero value (N), node clients are polled every N seconds to ensure that they are still connected. With the default setting of 0, no such polling occurs.
Q: How do I kill idle connections to Redis?
redis 127.0.0.1:6379> CLIENT LIST
id=9115408 addr=10.0.1.252:56348 fd=95 name= age=23 idle=23 flags=N db=14 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=get
id=9115336 addr=10.0.2.151:52018 fd=37 name= age=107 idle=107 flags=N db=14 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=get

$ sudo pip install redis
$ cat killredisconn.py
----
import re

import redis

idle_max = 300  # seconds

r = redis.Redis(host="localhost", port=6379, password=None)
cl = r.execute_command("client", "list")
if isinstance(cl, bytes):  # raw reply is bytes on Python 3
    cl = cl.decode()
pattern = r"addr=(.*?) .*? idle=(\d*)"
regex = re.compile(pattern)
for match in regex.finditer(cl):
    if int(match.group(2)) > idle_max:
        r.execute_command("client", "kill", match.group(1))
----
$ python killredisconn.py
Q: How do I connect across VPCs?
Prerequisite: the two VPCs are in the same region.

Option 1:
Use VPC Peering or Transit Gateway to connect the two VPCs, so that your Redis packets can travel from VPC-A, where the instance lives, to VPC-B [+].
[+] Access patterns for accessing an ElastiCache cluster in an Amazon VPC — Accessing an ElastiCache cluster when it and the Amazon EC2 instance are in the same Amazon VPC:
https://docs.aws.amazon.com/zh_tw/AmazonElastiCache/latest/red-ug/elasticache-vpc-accessing.html#elasticache-vpc-accessing-same-vpc

Option 2:
You can access the cluster through AWS PrivateLink together with an NLB [+], but this approach requires you to additionally configure the NLB and a Lambda function, so that when an AWS-managed service (for example RDS) scales in, scales out, or replaces a node, the corresponding node IPs are updated automatically.
[+] AWS Blog — Access Amazon RDS across VPCs using AWS PrivateLink and Network Load Balancer:
https://aws.amazon.com/tw/blogs/database/access-amazon-rds-across-vpcs-using-aws-privatelink-and-network-load-balancer/

!!! Note !!!
Option 2 only works with a single-node architecture, requires a VPC endpoint, and must be combined with AWS PrivateLink, so it is currently not recommended.