From df43af8900f756e72b53601b555ee2eafc9de97a Mon Sep 17 00:00:00 2001
From: asahi <mikiyashiki@outlook.com>
Date: Mon, 22 Sep 2025 14:16:28 +0800
Subject: [PATCH] doc: read Bloom filter documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 中间件/redis/redis.md | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/中间件/redis/redis.md b/中间件/redis/redis.md
index 53f669c..b9e9196 100644
--- a/中间件/redis/redis.md
+++ b/中间件/redis/redis.md
@@ -139,6 +139,8 @@
       - [Bloom Filter](#bloom-filter)
         - [Example](#example-1)
         - [Reserving Bloom filters](#reserving-bloom-filters)
+        - [Total size of Bloom filter](#total-size-of-bloom-filter)
+        - [Performance](#performance)
 
 
 # redis
@@ -2394,4 +2396,26 @@ BF.RESERVE {key} {error_rate} {capacity} [EXPANSION expansion] [NONSCALING]
   - The default EXPANSION is 2.
   > When a new sub-filter is added, it is allocated more hash functions than the previous sub-filter.
 
-- `NONSCALING`: To disable scaling, specify `NONSCALING`. If the initially assigned 
\ No newline at end of file
+- `NONSCALING`: To disable scaling, specify `NONSCALING`. Once the filter reaches its initially assigned capacity, the error rate starts to grow.
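To illustrate why each new sub-filter ends up with more hash functions, here is a minimal sketch of a sub-filter growth schedule. It assumes each new sub-filter's capacity is the previous one's multiplied by `EXPANSION`, and that its target error rate is halved (a tightening ratio of `0.5`, which is an assumption about RedisBloom's internals, not something stated in these notes). `sub_filter_schedule` is a hypothetical helper; the hash-function count uses the `ceil(-ln(error_rate) / ln(2))` formula from the sizing section.

```python
import math


def hash_count(error_rate: float) -> int:
    """Optimal number of hash functions: ceil(-ln(error_rate) / ln(2))."""
    return math.ceil(-math.log(error_rate) / math.log(2))


def sub_filter_schedule(capacity: int, error_rate: float,
                        expansion: int = 2, layers: int = 3,
                        tightening: float = 0.5):
    """Hypothetical helper: (capacity, error_rate, hash_count) per sub-filter.

    Assumes capacity grows by `expansion` and the error rate shrinks by
    `tightening` for every new sub-filter (illustrative values only).
    """
    schedule = []
    for i in range(layers):
        cap = capacity * expansion ** i
        err = error_rate * tightening ** i
        schedule.append((cap, err, hash_count(err)))
    return schedule


for cap, err, k in sub_filter_schedule(1000, 0.01):
    print(f"capacity={cap:>5}  error_rate={err:.4f}  hash_functions={k}")
```

Under these assumptions the three sub-filters get 7, 8 and 9 hash functions: the tighter per-layer error rate is what drives the count up.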
+
+##### Total size of Bloom filter
+The memory a Bloom filter actually uses is a function of the chosen error rate:
+- The optimal number of hash functions is `ceil(-ln(error_rate) / ln(2))`
+  - That is, the lower the required error_rate, the more hash functions are needed
+- Given the desired error_rate and the optimal number of hash functions, the number of bits needed per item is `-ln(error_rate) / ln(2)^2`, where `ln(2)^2` means `(ln 2)^2`
+- Therefore, the total number of bits a Bloom filter needs is `capacity * -ln(error_rate) / ln(2)^2`
+  - At a `1%` error rate: `7` hash functions and `9.585` bits per item
+  - At a `0.1%` error rate: `10` hash functions and `14.378` bits per item
+  - At a `0.01%` error rate: `14` hash functions and `19.170` bits per item
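The formulas above can be checked with a short Python sketch. The function names are hypothetical; the code simply evaluates the three expressions and reproduces the per-error-rate figures in the list.

```python
import math

LN2 = math.log(2)


def optimal_hash_functions(error_rate: float) -> int:
    # ceil(-ln(error_rate) / ln(2))
    return math.ceil(-math.log(error_rate) / LN2)


def bits_per_item(error_rate: float) -> float:
    # -ln(error_rate) / ln(2)^2, where ln(2)^2 means (ln 2) squared
    return -math.log(error_rate) / LN2 ** 2


def total_bits(capacity: int, error_rate: float) -> float:
    # capacity * -ln(error_rate) / ln(2)^2
    return capacity * bits_per_item(error_rate)


for rate in (0.01, 0.001, 0.0001):
    print(f"{rate:.2%}: k={optimal_hash_functions(rate)}, "
          f"bits/item={bits_per_item(rate):.3f}")
```

This prints `k=7, 10, 14` with `9.585`, `14.378` and `19.170` bits per item, matching the values listed above.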
+
+By comparison, using a Redis set for membership testing instead of a Bloom filter requires the following amount of memory:
+```
+memory_with_sets = capacity*(192b + value)
+```
+For IP addresses this comes to roughly `40 bytes` (320 bits) per item, whereas a Bloom filter with a `0.01%` error rate needs only `19.170` bits per item.
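As a rough back-of-the-envelope comparison, the sketch below estimates memory for a hypothetical 1,000,000 IP addresses. It assumes the `192b` in the set formula means 192 bits (24 bytes) of per-entry overhead and takes about 16 bytes per IPv4 string value; together these give the ~40 bytes (320 bits) per item quoted above. Both numbers are assumptions for illustration, not measured Redis figures.

```python
import math

CAPACITY = 1_000_000      # hypothetical number of IP addresses
SET_OVERHEAD_BITS = 192   # assumption: "192b" read as 192 bits per entry
VALUE_BITS = 16 * 8       # assumption: ~16 bytes per IPv4 string value
ERROR_RATE = 0.0001       # 0.01% false-positive rate


def set_memory_bits(capacity: int, value_bits: int) -> int:
    # memory_with_sets = capacity * (192b + value)
    return capacity * (SET_OVERHEAD_BITS + value_bits)


def bloom_memory_bits(capacity: int, error_rate: float) -> float:
    # capacity * -ln(error_rate) / ln(2)^2
    return capacity * -math.log(error_rate) / math.log(2) ** 2


set_mb = set_memory_bits(CAPACITY, VALUE_BITS) / 8 / 1_000_000
bloom_mb = bloom_memory_bits(CAPACITY, ERROR_RATE) / 8 / 1_000_000
print(f"set: {set_mb:.1f} MB, bloom filter: {bloom_mb:.2f} MB")
```

Under these assumptions the set needs about 40 MB while the Bloom filter needs about 2.4 MB, roughly a 17x reduction, at the cost of a 0.01% false-positive rate.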
+
+##### Performance
+Inserting an item into a Bloom filter takes `O(K)` time, where `K` is the number of hash functions.
+
+Checking whether an item exists takes `O(K)` time, or `O(K*n)` for stacked filters, where `n` is the number of stacked filters.
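The two complexity bounds can be seen in a minimal stacked (scalable) Bloom filter sketch. Everything here is hypothetical and for illustration only, not RedisBloom's implementation: `SubFilter` and `StackedBloomFilter` are made-up names, bit positions come from double hashing over a SHA-256 digest, each new layer's error rate is assumed to be halved, and duplicates are not detected before insertion. An insert touches `K` bits of the newest layer only, while a lookup may have to probe all `n` layers.

```python
import hashlib
import math


class SubFilter:
    """One fixed-size Bloom filter layer: k hash functions over m bits."""

    def __init__(self, capacity: int, error_rate: float):
        self.capacity = capacity
        self.error_rate = error_rate
        self.count = 0
        # k = ceil(-ln(e) / ln(2)); m = capacity * -ln(e) / ln(2)^2
        self.k = math.ceil(-math.log(error_rate) / math.log(2))
        self.m = math.ceil(capacity * -math.log(error_rate) / math.log(2) ** 2)
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        # O(K): set k bits.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def might_contain(self, item: str) -> bool:
        # O(K): all k bits must be set.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class StackedBloomFilter:
    """Hypothetical scalable filter built from a stack of SubFilter layers."""

    def __init__(self, capacity: int, error_rate: float,
                 expansion: int = 2, tightening: float = 0.5):
        self.expansion = expansion
        self.tightening = tightening  # assumed per-layer error-rate ratio
        self.layers = [SubFilter(capacity, error_rate)]

    def add(self, item: str) -> None:
        last = self.layers[-1]
        if last.count >= last.capacity:
            # Grow: larger capacity, tighter error rate, so more hash functions.
            last = SubFilter(last.capacity * self.expansion,
                             last.error_rate * self.tightening)
            self.layers.append(last)
        last.add(item)  # O(K) on the newest layer only

    def might_contain(self, item: str) -> bool:
        # O(K * n): may probe every one of the n layers.
        return any(layer.might_contain(item) for layer in self.layers)


bf = StackedBloomFilter(capacity=100, error_rate=0.01)
for i in range(500):
    bf.add(f"item-{i}")
print(len(bf.layers), [layer.k for layer in bf.layers])  # 3 [7, 8, 9]
print(bf.might_contain("item-42"))  # True
```

Keeping all writes on the newest layer is what keeps inserts at `O(K)`; the price is that lookups grow linearly with the number of layers, which is one reason to size the initial capacity generously.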
+