doc: read the Cuckoo Filter documentation
@@ -143,6 +143,9 @@
- [Performance](#performance)
- [Cuckoo filter](#cuckoo-filter)
- [User Cases](#user-cases)
- [Example](#example-2)
- [Cuckoo vs Bloom Filter](#cuckoo-vs-bloom-filter)
- [Sizing Cuckoo filters](#sizing-cuckoo-filters)

# redis
@@ -2433,4 +2436,58 @@ Bloom filter and Cuckoo filter are implemented as follows:

##### User Cases
In applications, Cuckoo filters have the following use cases:
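- `Targeted ad campaigns`: here the Cuckoo filter answers the question `has this user already taken part in the given campaign?`. Use one Cuckoo filter per campaign and add the ids of the targeted users to it. On every user visit, run the following checks:
    - If the user id is in the Cuckoo filter, the user has not yet taken part in the campaign, so show the ad
    - If the user clicks the ad and takes part, remove the user id from the Cuckoo filter
    - If the user id is not in the Cuckoo filter, the user has already taken part in the campaign, so try the next ad/Cuckoo filter
- `Discount code`: here the Cuckoo filter answers the question `has this discount code/coupon already been used?`. Load all discount codes into the Cuckoo filter and check the filter on every redemption attempt:
    - If the Cuckoo filter reports that the code does not exist, validation fails: the code is invalid or has already been used
    - If the Cuckoo filter reports that the code exists, validate it against the main database as well (`to cover false positives`). If the main database check passes, also remove the code from the Cuckoo filter
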
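The two flows above can be sketched in Python. This is only a minimal illustration under stated assumptions: `SetBackedFilter` is a hypothetical in-memory stand-in for a real Cuckoo filter (it is exact, so it never produces false positives), and `redeem`, `should_show_ad` and the `database` set are names invented for this sketch, not part of Redis.

```python
class SetBackedFilter:
    """Hypothetical stand-in for a Cuckoo filter: exact membership only."""

    def __init__(self):
        self._items = set()

    def add(self, item):      # plays the role of CF.ADD
        self._items.add(item)

    def exists(self, item):   # plays the role of CF.EXISTS
        return item in self._items

    def delete(self, item):   # plays the role of CF.DEL
        self._items.discard(item)


def redeem(code, code_filter, main_database):
    """Return True if the discount code is accepted, consuming it."""
    if not code_filter.exists(code):
        # Filter says "no": the code is invalid or already used.
        return False
    if code not in main_database:
        # Filter said "maybe" but the database disagrees (false positive).
        return False
    main_database.remove(code)
    code_filter.delete(code)
    return True


def should_show_ad(user_id, campaign_filter):
    """Show the ad only while the user is still in the campaign's filter."""
    return campaign_filter.exists(user_id)


codes = SetBackedFilter()
database = {"SPRING10"}          # hypothetical main database of valid codes
codes.add("SPRING10")
first = redeem("SPRING10", codes, database)
second = redeem("SPRING10", codes, database)
print(first, second)

campaign = SetBackedFilter()
campaign.add("user:42")
shown_before = should_show_ad("user:42", campaign)
campaign.delete("user:42")       # the user clicked the ad and took part
shown_after = should_show_ad("user:42", campaign)
print(shown_before, shown_after)
```

The database check is what makes the discount-code flow safe: with a real Cuckoo filter, a "yes" answer is only probabilistic, so the code is deleted from the filter only after the main database has confirmed it.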
##### Example
```redis-cli
> CF.RESERVE bikes:models 1000
OK
> CF.ADD bikes:models "Smoky Mountain Striker"
(integer) 1
> CF.EXISTS bikes:models "Smoky Mountain Striker"
(integer) 1
> CF.EXISTS bikes:models "Terrible Bike Name"
(integer) 0
> CF.DEL bikes:models "Smoky Mountain Striker"
(integer) 1
```
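The example above shows basic Cuckoo filter usage: `CF.RESERVE` creates a Cuckoo filter with an initial capacity of 1000.

> If the key does not exist, calling `CF.ADD` directly also creates a new Cuckoo filter automatically, but creating it with `CF.RESERVE` lets you specify the capacity you need.
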
##### Cuckoo vs Bloom Filter
When inserting items, Bloom filters usually offer better performance and scalability. `However, Cuckoo filters perform check operations faster and also support deletion`.

##### Sizing Cuckoo filters
In a Cuckoo filter, a bucket can hold multiple entries, and each entry stores one fingerprint. If every entry in the filter holds a fingerprint, there is no empty slot left for new elements, and the Cuckoo filter is considered `full`.

When using a Cuckoo filter, you should keep a certain share of its space free.

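To make the terms bucket, entry, fingerprint and `full` concrete, here is a toy Cuckoo filter in Python. It is a sketch for illustration only and is not how Redis implements the structure: the class `ToyCuckooFilter`, the SHA-256 based hashing, the power-of-two bucket count and the random eviction policy are all assumptions of this example.

```python
import hashlib
import random


class ToyCuckooFilter:
    """Toy model of buckets, entries and fingerprints (not Redis's code)."""

    def __init__(self, num_buckets, bucket_size=2, max_iterations=20, seed=0):
        # num_buckets must be a power of two so the XOR step stays in range.
        assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_iterations = max_iterations
        self.buckets = [[] for _ in range(num_buckets)]
        self._rng = random.Random(seed)

    def _hash(self, data):
        return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

    def _fingerprint(self, item):
        # 8-bit fingerprint taken from a separate hash of the item.
        return hashlib.sha256(b"fp:" + item.encode()).digest()[0]

    def _alternate(self, index, fp):
        # XOR with a hash of the fingerprint; applying it twice returns index.
        return index ^ (self._hash(bytes([fp])) % self.num_buckets)

    def _candidates(self, item):
        fp = self._fingerprint(item)
        first = self._hash(item.encode()) % self.num_buckets
        return fp, first, self._alternate(first, fp)

    def add(self, item):
        fp, first, second = self._candidates(item)
        for index in (first, second):
            if len(self.buckets[index]) < self.bucket_size:
                self.buckets[index].append(fp)
                return True
        # Both candidate buckets are full: evict fingerprints to make room.
        index = self._rng.choice((first, second))
        for _ in range(self.max_iterations):
            slot = self._rng.randrange(self.bucket_size)
            fp, self.buckets[index][slot] = self.buckets[index][slot], fp
            index = self._alternate(index, fp)
            if len(self.buckets[index]) < self.bucket_size:
                self.buckets[index].append(fp)
                return True
        # Gave up: the filter treats itself as full. A real implementation
        # must not lose the displaced fingerprint; this toy simply drops it.
        return False

    def exists(self, item):
        fp, first, second = self._candidates(item)
        return fp in self.buckets[first] or fp in self.buckets[second]

    def delete(self, item):
        fp, first, second = self._candidates(item)
        for index in (first, second):
            if fp in self.buckets[index]:
                self.buckets[index].remove(fp)
                return True
        return False


roomy = ToyCuckooFilter(num_buckets=64, bucket_size=4)
for name in ("a", "b", "c"):
    roomy.add(name)
print(all(roomy.exists(name) for name in ("a", "b", "c")))  # True

# Only 2 slots in total, so at most two of these three inserts can succeed.
tiny = ToyCuckooFilter(num_buckets=2, bucket_size=1)
results = [tiny.add(name) for name in ("a", "b", "c")]
print(results)
```

The `tiny` filter shows why free space matters: once every entry is occupied, an insert can only shuffle fingerprints between buckets until the iteration limit is hit, and then it fails.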
When creating a new Cuckoo filter, you specify its capacity and bucket size:
```redis-cli
CF.RESERVE {key} {capacity} [BUCKETSIZE bucketSize] [MAXITERATIONS maxIterations]
[EXPANSION expansion]
```
- `capacity`:
    - capacity can be calculated with the formula `n * f / a`
        - `n` is the `number of items`
        - `f` is the `fingerprint length in bits`, which is 8 here
        - `a` is the `fill rate or load factor (0<=a<=1)`
    - Because of how Cuckoo filters work, the filter declares itself full before capacity is reached, so the fill rate never reaches 100%
- `BUCKETSIZE`:
    - bucket size is the number of items each bucket can store. A larger bucket size gives a higher fill rate, but also a higher error rate and slightly slower performance
    - the `error rate` is calculated as `error_rate = (buckets * hash_functions)/2^fingerprint_size = (buckets*2)/256`, where `buckets` is the bucket size
    - with a bucket size of 1, the fill rate is 55% and the false positive rate is `2/256`, about `0.78%`, which is the lowest false positive rate achievable
    - as the bucket size grows, the error rate increases linearly and the fill rate also improves. With a bucket size of 3, the false positive rate is about `2.34%` and the fill rate about 80%. With a bucket size of 4, the false positive rate is about `3.12%` and the fill rate about 95%
- `EXPANSION`:
    - `when the filter self-declares itself full`, it expands automatically by creating an additional sub-filter, at the cost of lower performance and a higher error rate. The capacity of the new `sub-filter` is `prev_sub_filter_size * EXPANSION`
    - the default value is 1
- `MAXITERATIONS`:
    - `MAXITERATIONS` is `the number of attempts to find a slot for incoming fingerprint`. Once the filter is full, a larger `MAXITERATIONS` makes inserts slower
    - the default value is 20
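
The sizing formulas above are easy to check with a few lines of Python. This is a small sketch that only restates the formulas from the list; the function names `error_rate`, `capacity` and `sub_filter_sizes` are made up for this example, and `capacity` mirrors `n * f / a` exactly as written without claiming anything about how Redis interprets the result.

```python
FINGERPRINT_BITS = 8   # fingerprint length f from the text
HASH_FUNCTIONS = 2     # hash_functions in the error rate formula


def error_rate(bucket_size):
    """error_rate = (bucket_size * hash_functions) / 2^fingerprint_size."""
    return bucket_size * HASH_FUNCTIONS / 2 ** FINGERPRINT_BITS


def capacity(n_items, fill_rate):
    """n * f / a, exactly as the formula is written in the list above."""
    return n_items * FINGERPRINT_BITS / fill_rate


def sub_filter_sizes(initial_size, expansion, count):
    """Sizes of successive sub-filters: each is the previous one * EXPANSION."""
    sizes = [initial_size]
    for _ in range(count - 1):
        sizes.append(sizes[-1] * expansion)
    return sizes


for bucket_size in (1, 3, 4):
    print(bucket_size, error_rate(bucket_size))
# 1 0.0078125
# 3 0.0234375
# 4 0.03125

print(sub_filter_sizes(1000, 1, 3))  # [1000, 1000, 1000]
print(sub_filter_sizes(1000, 2, 3))  # [1000, 2000, 4000]
```

With the default `EXPANSION` of 1, every new sub-filter has the same size as the previous one, so a filter that keeps growing accumulates many sub-filters; a larger value trades memory for fewer sub-filters.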