doc: read the Redis Cluster documentation

2025-09-28 10:57:52 +08:00
parent c478a97a99
commit 67818df180
1 changed files with 41 additions and 0 deletions
--- a/中间件/redis/redis.md
+++ b/中间件/redis/redis.md
@@ -196,6 +196,9 @@
      - [Redis Cluster Goals](#redis-cluster-goals)
      - [Implemented subset](#implemented-subset)
    - [Client and Server roles in the Redis cluster protocol](#client-and-server-roles-in-the-redis-cluster-protocol)
+      - [write safety](#write-safety)
+        - [connected to the majority of masters](#connected-to-the-majority-of-masters)
+        - [connected to the minority of masters](#connected-to-the-minority-of-masters)


 # redis
@@ -3201,3 +3204,41 @@ Redis Cluster does not support multiple databases like the standalone version of Redis,

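 All nodes in the cluster are connected through a TCP bus, using a binary protocol called the `Redis Cluster Bus`. The `TCP bus` here is a logical structure: every node in the cluster is connected to every other node.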

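+Nodes use a gossip protocol to propagate information about the cluster, in order to `discover new nodes, send ping packets to make sure all the other nodes are working properly, and send cluster messages needed to signal specific conditions`. The cluster bus is also used to `propagate Pub/Sub messages across the cluster and to orchestrate manual failovers when requested by users`.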
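
A toy model of how gossip spreads membership information through the cluster. This is a simplified push-pull exchange of node sets, assumed for illustration only; it is not the actual cluster bus packet format (real nodes piggyback gossip sections on their ping/pong messages). The function name `gossip_until_converged` and the round structure are hypothetical.

```python
import random


def gossip_until_converged(views, seed=0, max_rounds=100):
    """Toy push-pull gossip over membership views (not the real cluster bus format).

    views maps node_id -> set of node_ids that node currently knows about.
    In every round each node contacts one random known peer, and the two
    nodes merge their views. Returns the number of rounds until every node
    knows every node, or None if that has not happened after max_rounds.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    all_nodes = set(views)
    for round_no in range(1, max_rounds + 1):
        for node in sorted(views):
            peers = sorted(views[node] - {node})
            if not peers:
                continue
            peer = rng.choice(peers)
            merged = views[node] | views[peer]
            views[node] = set(merged)
            views[peer] = set(merged)
        if all(view == all_nodes for view in views.values()):
            return round_no
    return None


# Five nodes that already know each other, plus a new node "f" that has only
# been introduced to "a" (similar in spirit to CLUSTER MEET).
views = {name: set("abcde") for name in "abcde"}
views["a"].add("f")
views["f"] = {"a", "f"}
print(gossip_until_converged(views))
```

The point of the sketch is that no node needs a global broadcast: knowledge of `f` reaches every node after a few rounds of pairwise exchanges.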
+
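+Because cluster nodes cannot proxy requests, clients may be redirected to other nodes using `redirection errors (MOVED, ASK)`. In theory a client is free to send requests to any node in the cluster and get redirected when needed, `so the client is not required to hold the state of the cluster`. However, a client that caches the mapping between keys and nodes can improve performance considerably.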
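
A minimal client-side sketch of this redirection handling. It assumes the reply formats `MOVED <slot> <host>:<port>` and `ASK <slot> <host>:<port>` and the key-to-slot rule `CRC16(key) mod 16384` (with `{...}` hash tags) from the Redis Cluster specification; `SlotCache`, `node_for`, and `on_error` are hypothetical names. A `MOVED` reply updates the cached slot owner, while `ASK` is a one-off redirect: the client retries on the given node after sending `ASKING` and leaves its cache unchanged.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 (XMODEM variant: poly 0x1021, initial value 0), as used for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Map a key to one of the 16384 hash slots, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:  # only a non-empty tag counts
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


class SlotCache:
    """Hypothetical client-side cache of slot -> node ("host:port")."""

    def __init__(self, seed_node: str):
        self.seed_node = seed_node  # used for slots we know nothing about yet
        self.slots = {}

    def node_for(self, key: bytes) -> str:
        return self.slots.get(key_slot(key), self.seed_node)

    def on_error(self, error: str):
        """Return (node, needs_asking) for the retry, or None if not a redirect."""
        parts = error.split()
        if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
            return None
        kind, slot, node = parts[0], int(parts[1]), parts[2]
        if kind == "MOVED":
            # The slot has a new owner: remember it for future requests.
            self.slots[slot] = node
            return node, False
        # ASK: one-off redirect during slot migration; the caller should send
        # ASKING first, and the cache is deliberately NOT updated.
        return node, True


cache = SlotCache("127.0.0.1:7000")
slot = key_slot(b"user:1")
print(cache.on_error(f"ASK {slot} 127.0.0.1:7002"))    # ('127.0.0.1:7002', True)
print(cache.node_for(b"user:1"))                       # 127.0.0.1:7000
print(cache.on_error(f"MOVED {slot} 127.0.0.1:7001"))  # ('127.0.0.1:7001', False)
print(cache.node_for(b"user:1"))                       # 127.0.0.1:7001
```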
+
+#### write safety
+Redis Cluster uses asynchronous replication between nodes, with `last failover wins` as its `implicit merge function`.
+
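+> `last failover wins implicit merge function` means that when a node is elected as the new master of a shard, its dataset becomes the authoritative data for that shard: all other replicas of the shard replicate from the newly elected master and overwrite their previous data, so after a failover every replica is consistent with the master.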
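
A minimal sketch of the `last failover wins` merge, assuming a shard is modeled as a dict of per-node datasets (`Shard` and `failover` are hypothetical names). The merge never compares contents: whichever node is promoted last becomes authoritative, and every other node is overwritten with a copy of its data, even a node that held newer writes.

```python
class Shard:
    """Hypothetical model: one shard, each node holding its own copy of the data."""

    def __init__(self, master, datasets):
        self.master = master
        self.datasets = datasets  # node_id -> dict

    def failover(self, new_master):
        # Promote new_master; its dataset becomes authoritative and every other
        # node (including the old master) is overwritten with a copy of it.
        self.master = new_master
        authoritative = self.datasets[new_master]
        for node_id in self.datasets:
            if node_id != new_master:
                self.datasets[node_id] = dict(authoritative)


shard = Shard("A", {
    "A": {"counter": 3, "late": "only on A"},  # old master: has unreplicated writes
    "A1": {"counter": 2},                      # replica that lagged a bit
    "A2": {"counter": 1},                      # replica that lagged more
})
shard.failover("A1")
print(shard.datasets)
# {'A': {'counter': 2}, 'A1': {'counter': 2}, 'A2': {'counter': 2}}
```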
+
+During a `network partition` there is always a window of time in which some writes can be lost. How large that window is, however, depends on whether:
+- the client is connected to the majority of masters, or
+- the client is connected to the minority of masters
+
+##### connected to the majority of masters
+Redis Cluster `tries harder` to retain writes that are performed by clients connected to the majority of masters, compared to writes performed in the minority side.
+
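+> Here, `tries harder` is a consequence of Redis Cluster's failover election mechanism combined with the `last failover wins` policy: writes sent to the `minority of masters` are discarded once the majority side fails over, whereas most writes sent to the `majority of masters` are retained.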
+
+The following cases illustrate the `loss of acknowledged writes received in the majority partitions during failures`:
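+- A write may reach the master and be acknowledged to the client before it has been propagated to the replicas through asynchronous replication. If the master fails at that moment and one of its replicas is promoted, the write is lost forever.
+- Another, theoretically possible, write-loss mode:
+  - the master becomes unreachable because of a `network partition`
+  - one of its replicas is promoted by a failover
+  - after some time, the master becomes reachable again
+  - `A client with an out-of-date routing table` may send writes to the old master; if `the old master has not yet been converted into a replica`, these writes may be lost as well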
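
Both cases can be replayed with a small single-shard timeline. This is a sketch under simplifying assumptions: replication runs only when `replicate()` is called, and `Node`, `Shard`, `promote_replica`, and `rejoin_as_replica` are hypothetical names, not Redis internals.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.backlog = []  # acknowledged writes not yet sent to the replica

    def write(self, key, value):
        # Apply locally and acknowledge immediately; replication happens later.
        self.data[key] = value
        self.backlog.append((key, value))
        return "OK"


class Shard:
    def __init__(self):
        self.master = Node("A")
        self.replica = Node("A1")

    def replicate(self):
        # Asynchronous replication (here: whenever called) ships the backlog.
        for key, value in self.master.backlog:
            self.replica.data[key] = value
        self.master.backlog.clear()

    def promote_replica(self):
        # Failover: the replica becomes master. The old master is returned
        # because it may still be reachable by clients with stale routing tables.
        old_master = self.master
        self.master, self.replica = self.replica, None
        return old_master

    def rejoin_as_replica(self, node):
        # The old master is reconfigured as a replica and its dataset is
        # replaced by the new master's (last failover wins).
        node.data = dict(self.master.data)
        node.backlog.clear()
        self.replica = node


shard = Shard()
shard.master.write("x", 1)
shard.replicate()               # "x" reaches the replica
shard.master.write("y", 2)      # acknowledged, never replicated (case 1)
old = shard.promote_replica()   # master unreachable -> replica promoted
old.write("z", 3)               # stale client writes to the old master (case 2)
shard.rejoin_as_replica(old)
print(shard.master.data)        # {'x': 1}
print(old.data)                 # {'x': 1}
```

Both `y` and `z` were acknowledged with `OK`, yet neither survives the failover.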
+
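+The second scenario is unlikely. Master nodes that are `unable to communicate with the majority of the other masters for enough time to be failed over` (i.e. they have been unreachable long enough to trigger a failover) stop accepting writes, and even after the `network partition` heals, writes are still refused for a short time so that other nodes can inform the old master about the cluster configuration change. The scenario additionally requires that the client's routing table has not yet been updated.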
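
The reasoning above can be condensed into a hypothetical write gate for an isolated master. The length of the post-heal refusal period is not given here, so `rejoin_grace` is an assumed parameter, and `demoted_at` (when the node learns it was failed over) is likewise part of the model, not a Redis setting.

```python
def old_master_reply(now, partition_start, heal_time, node_timeout,
                     rejoin_grace, demoted_at=None):
    """Reply of an isolated master to a write arriving at time `now` (seconds).

    Hypothetical model: rejoin_grace is the extra time writes stay refused
    after the partition heals; demoted_at is when the node learns it was
    failed over and becomes a replica (None = not learned yet).
    """
    if demoted_at is not None and now >= demoted_at:
        return "MOVED"    # now a replica: redirect the client
    if now < partition_start:
        return "OK"       # no partition yet
    if now < heal_time:
        # Partitioned: accept writes only until NODE_TIMEOUT has elapsed.
        return "OK" if now - partition_start < node_timeout else "REFUSED"
    was_cut_off = heal_time - partition_start >= node_timeout
    if was_cut_off and now < heal_time + rejoin_grace:
        return "REFUSED"  # healed, waiting for configuration updates
    return "OK"


args = dict(partition_start=0, heal_time=60, node_timeout=15, rejoin_grace=5)
for t in (5, 20, 61, 63):
    print(t, old_master_reply(t, demoted_at=62, **args))
# prints: 5 OK / 20 REFUSED / 61 REFUSED / 63 MOVED
```

In this model the stale-client loss needs an `OK` after the heal without a demotion, e.g. `old_master_reply(70, **args)`: the configuration update must arrive later than the refusal period, and the client must still be routing to the old master.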
+
+##### connected to the minority of masters
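+Writes sent to the `minority side of a partition` have a larger `window of lost writes`. For example, if `the minority of masters has one or more clients`, Redis Cluster can lose a non-trivial number of writes during a `network partition`, because all writes sent to the `minority of masters` may be lost once `the masters are failed over in the majority side`.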
+
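+Specifically, a master can only be failed over after it has been unreachable from the `majority of masters` for at least `NODE_TIMEOUT`:
+- if the network partition heals before that limit, no writes are lost
+- if the `network partition` lasts longer than `NODE_TIMEOUT`, all writes performed on the `minority side` up to that point (i.e. up to `NODE_TIMEOUT`) may be lost
+  - the window is bounded: once `NODE_TIMEOUT` has elapsed without contact with the majority, the minority side starts refusing writes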
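
A sketch of the resulting loss window for writes sent to a minority-side master. It assumes, as a simplification, that the majority fails the master over as soon as the partition outlasts `NODE_TIMEOUT` (a real election adds some delay); `classify_minority_writes` is a hypothetical helper.

```python
def classify_minority_writes(write_offsets, partition_duration, node_timeout):
    """Fate of writes sent to a master on the minority side of a partition.

    write_offsets: seconds since the partition started, each < partition_duration.
    Simplifying assumption: the majority fails the master over as soon as the
    partition has lasted longer than node_timeout (real elections take longer).
    """
    fate = {"kept": [], "lost": [], "refused": []}
    failed_over = partition_duration > node_timeout
    for t in sorted(write_offsets):
        if not failed_over:
            fate["kept"].append(t)     # partition healed in time: nothing lost
        elif t < node_timeout:
            fate["lost"].append(t)     # accepted, then discarded by the failover
        else:
            fate["refused"].append(t)  # master already stopped accepting writes
    return fate


print(classify_minority_writes([1, 5, 9], partition_duration=10, node_timeout=15))
# {'kept': [1, 5, 9], 'lost': [], 'refused': []}
print(classify_minority_writes([1, 5, 14, 20, 30], partition_duration=40, node_timeout=15))
# {'kept': [], 'lost': [1, 5, 14], 'refused': [20, 30]}
```

However long the partition lasts, `lost` never contains offsets at or beyond `NODE_TIMEOUT`, which is the upper bound on the window.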
+
+
+