doc: read the Redis t-digest documentation

asahi
2025-09-23 14:27:27 +08:00
parent 1a01da4c34
commit 3ed69c87a0


@@ -146,6 +146,15 @@
- [Example](#example-2)
- [Cuckoo vs Bloom Filter](#cuckoo-vs-bloom-filter)
- [Sizing Cuckoo filters](#sizing-cuckoo-filters)
- [t-digest](#t-digest)
- [Use Cases](#use-cases)
- [Examples](#examples-1)
- [Estimating fractions or ranks by values](#estimating-fractions-or-ranks-by-values)
- [Estimating values by fractions or ranks](#estimating-values-by-fractions-or-ranks)
- [trimmed mean](#trimmed-mean)
- [TDIGEST.MERGE](#tdigestmerge)
- [Retrieving sketch information](#retrieving-sketch-information)
- [Resetting a sketch](#resetting-a-sketch)
# redis
@@ -2491,3 +2500,107 @@ CF.RESERVE {key} {capacity} [BUCKETSIZE bucketSize] [MAXITERATIONS maxIterations
- `MAXITERATIONS`:
- `MAXITERATIONS` is `the number of attempts to find a slot for incoming fingerprint`. Once the filter is full, a larger `MAXITERATIONS` makes insertion slower.
- The default value is 20.
#### t-digest
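t-digest is a `probabilistic data structure` that estimates the `percentile point` of a data set without storing and sorting all of the elements in the set. It therefore suits questions such as: `What's the average latency for 99% of my database operations`
- Without `t-digest`, answering this question requires storing the average latency for every user, sorting those values, dropping the last 1%, and averaging the rest. This process is time-consuming.
- t-digest addresses this problem.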
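t-digest can also answer other percentile-related questions, such as `trimmed means`:
- `A trimmed mean is the mean value from the sketch, excluding observation values outside the low and high cutoff percentiles.` For example, a `0.1 trimmed mean` is the mean computed after excluding the lowest 10% and the highest 10% of the values.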
##### Use Cases
- `Hardware/Software monitoring`:
- When measuring online server response latency, you may want to observe metrics such as:
- What are the 50th, 90th, and 99th percentiles of the measured latencies
- Which fraction of the measured latencies are less than 25 milliseconds
- What is the mean latency, ignoring outliers? or What is the mean latency between the 10th and the 90th percentile?
- `Online Gaming`:
- When an online gaming platform serves millions of users, you may want to report metrics such as:
- Your score is better than x percent of the game sessions played.
- There were about y game sessions where people scored larger than you.
- To have a better score than 90% of the games played, your score should be z
- `Network traffic`:
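- When monitoring IP packets in network traffic, for example to detect DDoS attacks, you may want to ask:
- Did the number of packets in the last second exceed 99% of all previously observed packet counts?
- How many packets should I expect to see under normal network conditions?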
##### Examples
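The example below creates a t-digest with a `compression` of 100 and adds items to it. In the t-digest data structure, the `compression` parameter trades memory consumption against accuracy. `The default compression is 100`; a larger compression value gives higher accuracy.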
```redis-cli
> TDIGEST.CREATE bikes:sales COMPRESSION 100
OK
> TDIGEST.ADD bikes:sales 21
OK
> TDIGEST.ADD bikes:sales 150 95 75 34
OK
```
##### Estimating fractions or ranks by values
A useful t-digest feature is `CDF` (definition of rank), which gives the fraction of observations smaller than or equal to a given value. It answers questions such as `What's the percentage of observations with a value lower or equal to X`.
> More precisely, `TDIGEST.CDF will return the estimated fraction of observations in the sketch that are smaller than X plus half the number of observations that are equal to X.`
> You can also use the `TDIGEST.RANK` command, which returns the rank of a value instead of a fraction.
```redis-cli
> TDIGEST.CREATE racer_ages
OK
> TDIGEST.ADD racer_ages 45.88 44.2 58.03 19.76 39.84 69.28 50.97 25.41 19.27 85.71 42.63
OK
> TDIGEST.CDF racer_ages 50
1) "0.63636363636363635"
> TDIGEST.RANK racer_ages 50
1) (integer) 7
> TDIGEST.RANK racer_ages 50 40
1) (integer) 7
2) (integer) 4
```
t-digest also supports the `TDIGEST.REVRANK` command, which returns `the number of (observations larger than a given value + half the observations equal to the given value)`.
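As a rough illustration of the definitions quoted in this section, the sketch below computes the exact quantities that `TDIGEST.CDF`, `TDIGEST.RANK`, and `TDIGEST.REVRANK` estimate, using a plain Python list in place of a sketch. The helper names (`exact_rank`, `exact_revrank`, `exact_cdf`) are hypothetical and are not part of Redis. Edge cases for values outside the observed range, and how Redis rounds half-counts to an integer, are not modeled.

```python
# Exact (non-sketch) counterparts of TDIGEST.CDF / RANK / REVRANK.
# All helper names are hypothetical; a real t-digest only approximates these.

RACER_AGES = [45.88, 44.2, 58.03, 19.76, 39.84, 69.28,
              50.97, 25.41, 19.27, 85.71, 42.63]


def exact_rank(values, x):
    """Observations smaller than x, plus half of those equal to x."""
    smaller = sum(1 for v in values if v < x)
    equal = sum(1 for v in values if v == x)
    return smaller + equal / 2


def exact_revrank(values, x):
    """Observations larger than x, plus half of those equal to x."""
    larger = sum(1 for v in values if v > x)
    equal = sum(1 for v in values if v == x)
    return larger + equal / 2


def exact_cdf(values, x):
    """Rank expressed as a fraction of all observations."""
    return exact_rank(values, x) / len(values)


print(exact_cdf(RACER_AGES, 50))      # 7 of the 11 ages are below 50
print(exact_rank(RACER_AGES, 50))     # rank of 50
print(exact_rank(RACER_AGES, 40))     # rank of 40
print(exact_revrank(RACER_AGES, 50))  # ages above 50
```

On this small data set the exact values agree with the `redis-cli` output shown above (a fraction of 7/11 and ranks of 7 and 4); on large data sets t-digest returns estimates rather than exact counts.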
##### Estimating values by fractions or ranks
The `TDIGEST.QUANTILE key fraction ...` command takes a fraction and returns `an estimation of the value (floating point) that is smaller than the given fraction of observations.`
The `TDIGEST.BYRANK key rank ...` command takes a rank and returns `an estimation of the value (floating point) with that rank`.
Usage examples:
```redis-cli
> TDIGEST.QUANTILE racer_ages .5
1) "44.200000000000003"
> TDIGEST.BYRANK racer_ages 4
1) "42.630000000000003"
```
The `TDIGEST.BYREVRANK` command returns the value for a given `reverse rank`.
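The inverse lookups can be sketched the same way on a sorted list. This is a minimal model, assuming 0-based ranks (consistent with `TDIGEST.BYRANK racer_ages 4` returning `42.63` above) and a simple nearest-rank rule for the quantile, which is an assumption of this sketch and not t-digest's actual interpolation. The helper names are hypothetical.

```python
# Exact counterparts of TDIGEST.BYRANK / BYREVRANK / QUANTILE on a sorted list.
# Helper names and the nearest-rank quantile rule are assumptions of this sketch.

RACER_AGES = [45.88, 44.2, 58.03, 19.76, 39.84, 69.28,
              50.97, 25.41, 19.27, 85.71, 42.63]


def exact_byrank(values, rank):
    """Value at the given 0-based rank, counting up from the smallest."""
    if not 0 <= rank < len(values):
        raise IndexError("rank out of range")
    return sorted(values)[rank]


def exact_byrevrank(values, rank):
    """Value at the given 0-based rank, counting down from the largest."""
    if not 0 <= rank < len(values):
        raise IndexError("rank out of range")
    return sorted(values, reverse=True)[rank]


def exact_quantile(values, fraction):
    """Nearest-rank quantile: the element at index floor(fraction * n), capped at n - 1."""
    ordered = sorted(values)
    index = min(int(fraction * len(ordered)), len(ordered) - 1)
    return ordered[index]


print(exact_quantile(RACER_AGES, 0.5))  # median age
print(exact_byrank(RACER_AGES, 4))      # fifth-smallest age
print(exact_byrevrank(RACER_AGES, 0))   # largest age
```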
##### trimmed mean
To compute the `trimmed mean` of a t-digest, use `TDIGEST.TRIMMED_MEAN {key} low_fraction high_fraction`, where the two fractions (each between 0 and 1) are the low and high cutoffs.
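To make the effect of the two cutoffs concrete, here is a minimal sketch of an exact trimmed mean over a plain list. The function name, the sample latencies, and the rounding of the cutoff positions are all assumptions made for illustration; Redis computes an estimate from the sketch and may weight boundary observations differently.

```python
# Exact trimmed mean on a plain list. The function name, the sample data and
# the rounding rule for cutoff positions are assumptions of this sketch.

LATENCIES_MS = [5, 10, 11, 12, 12, 13, 13, 14, 15, 500]


def exact_trimmed_mean(values, low_fraction, high_fraction):
    """Mean of the observations between the low and high cutoff fractions."""
    ordered = sorted(values)
    n = len(ordered)
    start = round(low_fraction * n)   # first index to keep
    stop = round(high_fraction * n)   # one past the last index to keep
    kept = ordered[start:stop]
    if not kept:
        raise ValueError("cutoffs exclude every observation")
    return sum(kept) / len(kept)


plain_mean = sum(LATENCIES_MS) / len(LATENCIES_MS)
trimmed = exact_trimmed_mean(LATENCIES_MS, 0.1, 0.9)
print(plain_mean)  # pulled upward by the 500 ms outlier
print(trimmed)     # lowest and highest 10% excluded
```

With these sample values the plain mean is 60.5 while the trimmed mean is 12.5, which shows why a trimmed mean is useful when outliers are present.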
##### TDIGEST.MERGE
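Sketches can be merged with the `TDIGEST.MERGE` command.
Suppose latency has been measured on 3 servers, and you need to combine the measurements from all of them and obtain the 90th, 95th, and 99th percentile latency of the combined result. `TDIGEST.MERGE` serves this purpose.
The format of the TDIGEST.MERGE command is: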
```redis-cli
TDIGEST.MERGE destKey numKeys sourceKey... [COMPRESSION compression] [OVERRIDE]
```
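When using the command above:
- `If destKey does not exist`, it is created automatically and set to the merged result.
- `If destKey already exists`, its old value is merged together with the `values of source keys`. To overwrite the existing contents of `destKey`, specify the `OVERRIDE` option.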
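The two `destKey` cases can be modeled with a toy in-memory store. This is only a sketch of the semantics described above: each "sketch" is represented as a plain list of observations, and the function and key names are hypothetical. It does not implement t-digest centroids or the `COMPRESSION` option.

```python
# Toy model of TDIGEST.MERGE semantics. Each "sketch" is a plain list of
# observations; function and key names are hypothetical.

def merge(store, dest_key, source_keys, override=False):
    """Merge the source sketches into dest_key.

    Without override, existing contents of dest_key take part in the merge.
    With override, existing contents of dest_key are discarded first.
    """
    merged = [] if override else list(store.get(dest_key, []))
    for key in source_keys:
        merged.extend(store[key])  # raises KeyError if a source is missing
    store[dest_key] = merged
    return merged


store = {
    "latency:s1": [12.0, 15.5],
    "latency:s2": [9.8],
    "latency:s3": [20.1, 11.3],
    "latency:all": [99.0],  # pre-existing destination
}
sources = ["latency:s1", "latency:s2", "latency:s3"]

merge(store, "latency:new", sources)                  # destination is created
merge(store, "latency:all", sources)                  # old value is kept and merged
print(len(store["latency:new"]), len(store["latency:all"]))

merge(store, "latency:all", sources, override=True)   # old value is discarded
print(len(store["latency:all"]))
```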
##### Retrieving sketch information
The `TDIGEST.MIN` and `TDIGEST.MAX` commands return the minimum and maximum values in the sketch. Usage examples:
```redis-cli
> TDIGEST.MIN racer_ages
"19.27"
> TDIGEST.MAX racer_ages
"85.709999999999994"
```
`If the t-digest is empty, both TDIGEST.MIN and TDIGEST.MAX return nan`.
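Client code that consumes these replies has to treat `nan` with care, because NaN never compares equal to itself. The sketch below assumes the client hands back the raw reply as a string such as `"19.27"` or `"nan"`; the helper name `parse_min_max_reply` is hypothetical, and some client libraries may already convert the reply to a float.

```python
import math

# Hypothetical helper: turn a raw TDIGEST.MIN / TDIGEST.MAX reply string into
# a float, or None when the sketch is empty (reply "nan").


def parse_min_max_reply(reply):
    value = float(reply)  # float() accepts the string "nan"
    # NaN != NaN, so an equality check would never detect it; use math.isnan.
    return None if math.isnan(value) else value


print(parse_min_max_reply("19.27"))
print(parse_min_max_reply("nan"))
```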
##### Resetting a sketch
The `TDIGEST.RESET` command resets a sketch. Example:
```redis-cli
> TDIGEST.RESET racer_ages
OK
```