doc: read the redis cluster documentation

2025-09-28 16:30:47 +08:00
parent e9a8e69327
commit 97c4a830a9
1 changed files with 129 additions and 2 deletions
--- a/中间件/redis/redis.md
+++ b/中间件/redis/redis.md
@@ -200,6 +200,18 @@
        - [connected to the majority of masters](#connected-to-the-majority-of-masters)
        - [connected to the minority of masters](#connected-to-the-minority-of-masters)
      - [Availability](#availability)
+      - [Performance](#performance-1)
+      - [Why merge operations are avoided](#why-merge-operations-are-avoided)
+    - [Overview of Redis Cluster main components](#overview-of-redis-cluster-main-components)
+      - [key distribution model](#key-distribution-model)
+      - [hash tags](#hash-tags)
+        - [global-style patterns](#global-style-patterns)
+      - [Cluster node attributes](#cluster-node-attributes)
+        - [node ID](#node-id)
+        - [node Attributes](#node-attributes)
+        - [CLUSTER NODES](#cluster-nodes)
+      - [Cluster bus](#cluster-bus)
+      - [Cluster topology](#cluster-topology)


 # redis
@@ -3255,7 +3267,122 @@ redis cluster is unavailable on the `minority side of partition`. On the `majority side

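 Redis Cluster currently has a `replicas migration` feature: replicas migrate to `orphaned masters` (masters no longer having replicas), which improves the cluster's availability in many scenarios. After each failure event is handled, the cluster therefore rearranges its replicas layout so that it can better withstand the next failure.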

-
-
+#### Performance 
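+Redis Cluster nodes do not proxy commands to the node in charge of a given key; instead, they redirect the client to that node.
+
+Eventually, clients obtain an up-to-date view of the cluster, which records the set of keys each node serves, so during normal operation a client contacts the right node directly.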
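
The redirect-then-cache behaviour described above can be sketched with a toy in-memory model. Everything here (`ToyNode`, `ToyClient`, the tuple-shaped replies) is hypothetical and only illustrates the idea; a real node answers with a `MOVED <slot> <host:port>` error, and real clients typically refresh their whole slot map rather than one slot at a time.

```python
import binascii

SLOTS = 16384


def key_slot(key: str) -> int:
    # CRC16 (XMODEM variant) mod 16384; hash tags are ignored in this sketch.
    return binascii.crc_hqx(key.encode(), 0) % SLOTS


class ToyNode:
    """Hypothetical stand-in for a master node that owns slots lo..hi."""

    def __init__(self, name, lo, hi):
        self.name, self.lo, self.hi = name, lo, hi
        self.data = {}

    def owns(self, slot):
        return self.lo <= slot <= self.hi

    def execute(self, cluster, op, key, value=None):
        slot = key_slot(key)
        if not self.owns(slot):
            # No proxying: tell the client which node to contact instead.
            owner = next(n for n in cluster if n.owns(slot))
            return ("MOVED", slot, owner.name)
        if op == "SET":
            self.data[key] = value
            return ("OK", None)
        return ("OK", self.data.get(key))


class ToyClient:
    """Hypothetical client that learns slot ownership from redirects."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.by_name = {n.name: n for n in cluster}
        self.slot_cache = {}  # slot -> node name
        self.redirects = 0

    def call(self, op, key, value=None):
        slot = key_slot(key)
        # Without cached knowledge, fall back to an arbitrary node.
        name = self.slot_cache.get(slot, self.cluster[0].name)
        status, *rest = self.by_name[name].execute(self.cluster, op, key, value)
        if status == "MOVED":
            name = rest[1]  # follow the redirect once
            self.redirects += 1
            status, *rest = self.by_name[name].execute(self.cluster, op, key, value)
        self.slot_cache[slot] = name  # later calls go straight to the owner
        return rest[0]
```

With three toy nodes that split the slot range, the first command for a key may cost one redirect, while later commands for the same slot reach the owning node directly.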
+
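+Because Redis Cluster uses asynchronous replication, a node that receives a write does not wait for acknowledgments from other nodes before replying.
+
+In addition, multi-key operations are only available when all the keys hash to the same slot. Data never moves between nodes unless resharding takes place.
+
+Normal operations are handled exactly as in a single Redis instance. In a `Redis Cluster with N master nodes`, the theoretical performance is therefore roughly `cluster_with_n_nodes_performance = single_instance_performance * N`; that is, Redis Cluster performance scales linearly.
+
+A query is also usually completed in a single `round trip`, and clients usually keep persistent connections to the nodes, so latency is roughly the same as in the single-instance case.
+
+The goal of Redis Cluster is: `provide high performance and scalability while preserving weak but reasonable forms of data safety and availability`.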
+
+#### Why merge operations are avoided
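+The Redis Cluster design avoids having multiple nodes write to the same key. Values in Redis can be very large; for example, lists or sorted sets may contain millions of elements. Redis data types can also be structurally complex. Transferring and merging such values could become a major performance bottleneck, so Redis Cluster does not adopt that design.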
+
+### Overview of Redis Cluster main components
+#### key distribution model
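+The cluster key space is split into 16384 (2^14) slots, so a cluster can contain at most `16384` master nodes (the suggested maximum cluster size is about 1000 nodes).
+
+Each master node in the cluster handles a subset of the 16384 hash slots. The cluster is stable when no `cluster reconfiguration` is in progress (that is, no hash slots are being moved from one node to another). In a stable cluster, each slot is served by exactly one master node. (A master node may have several replicas, and a replica may take over for the master during net splits or failures.)
+
+The algorithm that maps keys to hash slots is: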
+```
+HASH_SLOT = CRC16(key) mod 16384
+```
+
+#### hash tags
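+There is one exception to the hash slot computation, which is used to implement `hash tags`. `Hash tags` are a mechanism for ensuring that `multiple keys are allocated to the same hash slot`, which is what makes multi-key operations possible in Redis Cluster.
+
+To implement hash tags, when a key contains a `{...}` pattern, only the substring between `{` and `}` is hashed. Because a key may contain several `{` or `}` characters, the exact rule is:
+- IF the key contains a `{` character
+- AND IF there is a `}` to the right of `{`
+- AND IF there are one or more characters between the first occurrence of `{` and the following first occurrence of `}`
+
+then only what is between the first occurrence of `{` and the following first occurrence of `}` is hashed, instead of the whole key.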
+
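+Examples:
+- `{user1000}.following` and `{user1000}.followers` hash to the same slot, because both slots are computed from `user1000`
+- For `foo{}{bar}`, there are no characters between the first `{` and the first `}` that follows it, so the whole key `foo{}{bar}` is hashed
+- For `foo{{bar}}zap`, the substring `{bar` is hashed, because it is the text between the first `{` and the first `}` that follows it
+- For `foo{bar}{zap}`, the substring `bar` is hashed
+- When binary data is used as a key, starting the key with `{}` guarantees that the slot is computed from the whole key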
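
The mapping and the hash tag rule above can be sketched in a few lines. This is a minimal sketch rather than the Redis implementation: the function name `hash_slot` is made up for illustration, and it assumes that the CRC16 variant is XMODEM (polynomial 0x1021, initial value 0), which Python's standard `binascii.crc_hqx` computes when the initial value is 0.

```python
import binascii


def hash_slot(key: bytes) -> int:
    """Return the hash slot for a key, honouring the hash tag rule."""
    start = key.find(b"{")
    if start != -1:
        # Look for the first "}" to the right of the first "{".
        end = key.find(b"}", start + 1)
        # Hash only the tag if it exists and is non-empty.
        if end != -1 and end > start + 1:
            key = key[start + 1:end]
    # CRC16 (XMODEM) of the selected bytes, modulo the number of slots.
    return binascii.crc_hqx(key, 0) % 16384
```

For instance, `hash_slot(b"{user1000}.following")` and `hash_slot(b"{user1000}.followers")` return the same slot as `hash_slot(b"user1000")`.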
+
+##### global-style patterns
+Commands that accept a glob-style pattern, such as `KEYS`, `SCAN`, and `SORT`, are optimized for `patterns that imply a single slot`. That is, `if all keys that can match a pattern must belong to a specific slot`, only that slot is searched for keys matching the pattern. This pattern slot optimization was introduced in Redis 8.0.
+
+The optimization applies when the pattern meets the following conditions:
+- pattern contains a hash tag
+- there are no wildcards or escape characters before the hash tag, and
+- the hash tag within curly braces doesn't contain any wildcards or escape characters
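
The three conditions can be checked with a short helper. This is only a sketch of the rule as stated, not the Redis source: the name `single_slot_tag` is hypothetical, and treating `*`, `?`, `[` and `\` as the wildcard and escape characters is an assumption based on common glob syntax.

```python
from typing import Optional

# Assumed glob metacharacters: wildcards plus the escape character.
GLOB_SPECIAL = set("*?[\\")


def single_slot_tag(pattern: str) -> Optional[str]:
    """Return the hash tag if the pattern implies a single slot, else None."""
    start = pattern.find("{")
    if start == -1:
        return None  # no hash tag at all
    end = pattern.find("}", start + 1)
    if end == -1 or end == start + 1:
        return None  # no closing brace, or an empty tag
    prefix, tag = pattern[:start], pattern[start + 1:end]
    # Wildcards or escapes before or inside the tag defeat the optimization.
    if any(ch in GLOB_SPECIAL for ch in prefix + tag):
        return None
    return tag
```

A pattern such as `{user1000}*` yields the tag `user1000`, so only the slot of that tag would need to be scanned, while `*{user1000}` yields `None` because a wildcard precedes the tag.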
+
+#### Cluster node attributes
+##### node ID
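+Every node has a unique name in the cluster. The node name is the `hex` representation of a `160 bit` random number, obtained the first time the node is started (usually from /dev/urandom). The node saves its ID in its configuration file and keeps reusing it until the configuration file is deleted or a reset is forced via `CLUSTER RESET`.
+
+The node ID identifies each node across the whole cluster. A node can change its IP address while keeping the same node ID. The cluster can also detect IP/port changes and reconfigure itself through the gossip protocol.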
+
+##### node Attributes
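+The `node ID` is not the only information associated with each node, but it is the only attribute that is always globally consistent. Every node in the cluster also carries a set of associated information. Some of it concerns the `cluster configuration detail of this specific node` and is eventually consistent.
+
+- Globally consistent attributes: the node ID is globally consistent at all times; every node in the cluster has the same view of it
+- Eventually consistent attributes: attributes such as `slot ownership and master-replica relationships`; nodes may briefly disagree, but their views converge over time
+- Locally stored attributes: attributes private to a node, stored only on that node and neither shared nor synchronized between nodes, such as `the local connection count and the last ping time`
+
+> In the description above:
+> - `globally consistent` means all nodes in the cluster hold exactly the same view of an attribute `at all times`
+> - `eventually consistent` means nodes may hold different views of an attribute `at some moments`, but the views converge in the long run
+> - `instead local to each node` means the attribute is kept locally on the node and is not shared between nodes
+
+In the cluster, `every node maintains the following information about other nodes that it is aware of in the cluster`:
+- the node's `ID, IP, port`
+- a set of flags
+- the node's master, if the node is flagged as a replica
+- last time the node was pinged
+- last time the pong was received
+- current configuration epoch of the node
+- link state
+- the set of hash slots served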
+
+##### CLUSTER NODES
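+The `CLUSTER NODES` command can be sent to any node in the cluster. It reports the state of the cluster and the information about each node.
+
+The following is sample output of `CLUSTER NODES` sent to a master node: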
+```redis-cli
+$ redis-cli cluster nodes
+d1861060fe6a534d42d8a19aeb36600e18785e04 127.0.0.1:6379 myself - 0 1318428930 1 connected 0-1364
+3886e65cc906bfd9b1f7e7bde468726a052d1dae 127.0.0.1:6380 master - 1318428930 1318428931 2 connected 1365-2729
+d289c575dcbc4bdd2931585fd4339089e461a27d 127.0.0.1:6381 master - 1318428931 1318428931 3 connected 2730-4095
+```
+In the sample above, the fields are listed in this order:
+- node id
+- address:port
+- flags
+- master id (`-` when the node is not a replica)
+- last ping sent
+- last pong received
+- configuration epoch 
+- link state
+- slots
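
One line of the sample output can be split into named fields as follows. This is a sketch with hypothetical names (`NodeInfo`, `parse_cluster_nodes_line`). It assumes whitespace-separated fields in the order shown in the sample, where the column printed as `-` holds the master's node ID when the node is a replica, and it does not handle extended address formats.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeInfo:
    node_id: str
    address: str
    flags: List[str]
    master_id: Optional[str]  # None when the column is "-"
    ping_sent: int
    pong_received: int
    config_epoch: int
    link_state: str
    slots: List[str]  # slot ranges such as "0-1364", kept as text


def parse_cluster_nodes_line(line: str) -> NodeInfo:
    """Parse one whitespace-separated line of CLUSTER NODES output."""
    f = line.split()
    return NodeInfo(
        node_id=f[0],
        address=f[1],
        flags=f[2].split(","),
        master_id=None if f[3] == "-" else f[3],
        ping_sent=int(f[4]),
        pong_received=int(f[5]),
        config_epoch=int(f[6]),
        link_state=f[7],
        slots=f[8:],  # everything after the link state is a slot entry
    )
```

Applied to the second line of the sample, the parser gives `flags == ["master"]`, `config_epoch == 2`, and `slots == ["1365-2729"]`.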
+
+#### Cluster bus
+Every Redis Cluster node has an additional TCP port for receiving `incoming connections from other redis cluster nodes`. By default this port is the `data port` `+10000`; it can also be set explicitly with the `cluster-port` config.
+
+For example, if a node listens for client connections on port `6379` and `cluster-port` is not set in `redis.conf`, the cluster bus port is `16379`.
+
+Likewise, if `cluster-port` is set to `20000` in `redis.conf`, the cluster bus port is `20000`.
+
+`Node-to-node` communication happens exclusively over the `cluster bus` using the `cluster bus protocol`, a binary protocol composed of `frames of different types and sizes`.
+
+#### Cluster topology
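+Redis Cluster is a `full mesh`: every node in the cluster keeps a separate TCP connection to every other node.
+
+In a cluster of N nodes, every node has `N-1` outgoing TCP connections and `N-1` incoming connections.
+
+These TCP connections are kept alive all the time and are not created on demand. When a node expects a pong reply from another node, then before `waiting long enough to mark the node as unreachable` it tries to refresh the connection by reconnecting to that node from scratch. (After reconnecting, the PING is sent again.)
+
+Although Redis Cluster nodes form a full mesh, nodes use a `gossip protocol` and a `configuration update mechanism` to avoid `exchanging too many messages between nodes`. As a result, the number of messages exchanged does not grow exponentially.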
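
The connection counts for a full mesh follow from simple arithmetic, sketched below. The function name `full_mesh_links` is made up for illustration, and the total assumes that each node's outgoing connection is counted once (as another node's incoming connection).

```python
def full_mesh_links(n: int) -> dict:
    """Count cluster bus TCP connections in a full mesh of n nodes."""
    if n < 1:
        raise ValueError("a cluster needs at least one node")
    per_node = n - 1  # one link to every other node
    return {
        "outgoing_per_node": per_node,
        "incoming_per_node": per_node,
        # Every outgoing link is some other node's incoming link.
        "total_connections": n * per_node,
    }
```

For a 6-node cluster this gives 5 outgoing and 5 incoming connections per node and 30 connections in total, so the number of connections grows quadratically with the number of nodes.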