doc: read the Redis t-digest documentation
@@ -146,6 +146,15 @@
- [Example](#example-2)
- [Cuckoo vs Bloom Filter](#cuckoo-vs-bloom-filter)
- [Sizing Cuckoo filters](#sizing-cuckoo-filters)
- [t-digest](#t-digest)
- [Use Cases](#use-cases)
- [Examples](#examples-1)
- [Estimating fractions or ranks by values](#estimating-fractions-or-ranks-by-values)
- [Estimating values by fractions or ranks](#estimating-values-by-fractions-or-ranks)
- [trimmed mean](#trimmed-mean)
- [TDIGEST.MERGE](#tdigestmerge)
- [Retrieving sketch information](#retrieving-sketch-information)
- [Resetting a sketch](#resetting-a-sketch)

# redis
@@ -2490,4 +2499,108 @@ CF.RESERVE {key} {capacity} [BUCKETSIZE bucketSize] [MAXITERATIONS maxIterations
- The default value is 1
- `MAXITERATIONS`:
    - `MAXITERATIONS` is `the number of attempts to find a slot for incoming fingerprint`. Once the filter is full, the larger `MAXITERATIONS` is, the slower insertions become.
    - The default value is 20

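#### t-digest
t-digest is a `probabilistic data structure` that lets you estimate the `percentile point` of a data set without actually storing and sorting every element of the set. It can therefore answer questions such as `What's the average latency for 99% of my database operations`.
- Without `t-digest`, obtaining this metric requires storing the average latency for every user, sorting those averages, discarding the last 1% of the data, and computing the mean of what remains. This process is very time-consuming.
- t-digest addresses this problem.

t-digest can also answer other percentile-related questions, such as `trimmed means`:
- `A trimmed mean is the mean value from the sketch, excluding observation values outside the low and high cutoff percentiles.` For example, a `0.1 trimmed mean` is the mean computed after excluding the lowest 10% and the highest 10% of the values.
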
##### Use Cases
- `Hardware/Software monitoring`:
    - When measuring online server response latency, you may need to observe metrics such as:
        - What are the 50th, 90th, and 99th percentiles of the measured latencies?
        - Which fraction of the measured latencies are less than 25 milliseconds?
        - What is the mean latency, ignoring outliers? Or: what is the mean latency between the 10th and the 90th percentile?
- `Online Gaming`:
    - When an online gaming platform serves millions of users, you may need to report metrics such as:
        - Your score is better than x percent of the game sessions played.
        - There were about y game sessions where people scored larger than you.
        - To have a better score than 90% of the games played, your score should be z.
- `Network traffic`:
    - When monitoring IP packets on a network, for example to detect DDoS attacks, you may need to observe metrics such as:
        - Does the number of packets in the last second exceed 99% of previously observed values?
        - How many packets should you expect to see under normal network conditions?
##### Examples
The following example creates a t-digest with a `compression` of 100 and adds items to it. In the t-digest data structure, the `compression` parameter trades memory consumption against accuracy. The default `compression` value is 100; a larger value gives higher accuracy at the cost of more memory.

```redis-cli
> TDIGEST.CREATE bikes:sales COMPRESSION 100
OK
> TDIGEST.ADD bikes:sales 21
OK
> TDIGEST.ADD bikes:sales 150 95 75 34
OK
```

##### Estimating fractions or ranks by values
A useful feature of t-digest is `CDF` (definition of rank), which gives the fraction of observations smaller than or equal to a given value. It answers questions such as `What's the percentage of observations with a value lower or equal to X`.
> More precisely, `TDIGEST.CDF will return the estimated fraction of observations in the sketch that are smaller than X plus half the number of observations that are equal to X.`

> You can also use the `TDIGEST.RANK` command, which returns the rank of a value instead of a fraction.

```redis-cli
> TDIGEST.CREATE racer_ages
OK
> TDIGEST.ADD racer_ages 45.88 44.2 58.03 19.76 39.84 69.28 50.97 25.41 19.27 85.71 42.63
OK
> TDIGEST.CDF racer_ages 50
1) "0.63636363636363635"
> TDIGEST.RANK racer_ages 50
1) (integer) 7
> TDIGEST.RANK racer_ages 50 40
1) (integer) 7
2) (integer) 4
```
Similarly, t-digest supports the `TDIGEST.REVRANK` command, which returns `the number of (observations larger than a given value + half the observations equal to the given value)`.

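To make the rank and CDF definitions above concrete, here is a minimal Python sketch that computes them exactly on the raw `racer_ages` values. It is not how t-digest works internally (a sketch only estimates these quantities from compressed centroids), and the function names `rank`, `rev_rank`, and `cdf` are illustrative, not part of any Redis client. Edge cases of the real commands, such as values outside the observed range, are not modeled.

```python
# Exact reference computations on the raw observations.
# A real t-digest only estimates these values; this is an illustration.
ages = [45.88, 44.2, 58.03, 19.76, 39.84, 69.28, 50.97, 25.41, 19.27, 85.71, 42.63]


def rank(values, x):
    """Observations smaller than x, plus half of those equal to x."""
    smaller = sum(1 for v in values if v < x)
    equal = sum(1 for v in values if v == x)
    return smaller + equal / 2


def rev_rank(values, x):
    """Observations larger than x, plus half of those equal to x."""
    larger = sum(1 for v in values if v > x)
    equal = sum(1 for v in values if v == x)
    return larger + equal / 2


def cdf(values, x):
    """Rank expressed as a fraction of all observations."""
    return rank(values, x) / len(values)


print(rank(ages, 50))      # 7.0
print(rank(ages, 40))      # 4.0
print(rev_rank(ages, 50))  # 4.0
print(cdf(ages, 50))       # 7/11, about 0.636
```

With only 11 observations the exact results agree with the `TDIGEST.RANK` and `TDIGEST.CDF` replies shown above; on large data sets the sketch's answers are approximations.
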
##### Estimating values by fractions or ranks
The `TDIGEST.QUANTILE key fraction ...` command takes one or more fractions and returns `an estimation of the value (floating point) that is smaller than the given fraction of observations`.

The `TDIGEST.BYRANK key rank ...` command takes one or more ranks and returns `an estimation of the value (floating point) with that rank`.

For example:
```redis-cli
> TDIGEST.QUANTILE racer_ages .5
1) "44.200000000000003"
> TDIGEST.BYRANK racer_ages 4
1) "42.630000000000003"
```
The `TDIGEST.BYREVRANK` command returns the value for a given `reverse rank`.

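The inverse lookups can be illustrated the same way. The Python sketch below works on the sorted raw values. The helper names are hypothetical, and the `quantile` rule used here (index `int(q * n)`) is a simple nearest-rank assumption chosen for illustration; t-digest itself interpolates between centroids, so its results can differ on larger data sets.

```python
# Exact lookups on sorted raw data; t-digest returns estimates instead.
ages = [45.88, 44.2, 58.03, 19.76, 39.84, 69.28, 50.97, 25.41, 19.27, 85.71, 42.63]


def by_rank(values, r):
    """Value at 0-based rank r, counting from the smallest observation."""
    return sorted(values)[r]


def by_rev_rank(values, r):
    """Value at 0-based rank r, counting from the largest observation."""
    return sorted(values, reverse=True)[r]


def quantile(values, q):
    """Nearest-rank quantile (an assumption, not the t-digest algorithm)."""
    ordered = sorted(values)
    index = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[index]


print(quantile(ages, 0.5))   # 44.2
print(by_rank(ages, 4))      # 42.63
print(by_rev_rank(ages, 0))  # 85.71
```
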
##### trimmed mean
To compute the `trimmed mean` of a t-digest sketch, use `TDIGEST.TRIMMED_MEAN {key} low_fraction high_fraction`.

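As a rough illustration of what a trimmed mean computes, the following Python sketch sorts the raw values and averages only those between the two cutoff fractions. The function name and the rounding of cutoff positions are assumptions made for this example; the real command works on the sketch's centroids and may weight boundary observations differently.

```python
def trimmed_mean(values, low_fraction, high_fraction):
    """Mean of the observations between the two cutoff fractions."""
    ordered = sorted(values)
    n = len(ordered)
    lo = round(low_fraction * n)   # first index kept
    hi = round(high_fraction * n)  # first index dropped
    kept = ordered[lo:hi]
    if not kept:
        return float("nan")
    return sum(kept) / len(kept)


# Hypothetical latencies in milliseconds, with one large outlier.
latencies = [12, 15, 11, 14, 13, 16, 10, 17, 18, 900]

print(trimmed_mean(latencies, 0.1, 0.9))  # 14.5
print(sum(latencies) / len(latencies))    # 102.6
```

Dropping the lowest and highest 10% removes the single outlier, so the trimmed mean stays close to the typical latency while the plain mean is dominated by it.
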
##### TDIGEST.MERGE
The `TDIGEST.MERGE` command merges sketches.

Suppose you measured latency on 3 servers and now want to combine the measurements and obtain the 90th, 95th, and 99th latency percentiles of the combined result. `TDIGEST.MERGE` covers this case.

The `TDIGEST.MERGE` command has the following format:
```redis-cli
TDIGEST.MERGE destKey numKeys sourceKey... [COMPRESSION compression] [OVERRIDE]
```

When using this command:
- If `destKey` does not exist yet: it is created automatically and set to the merged result.
- If `destKey` already exists: its old value is merged together with the `values of source keys`. To overwrite the existing contents of `destKey` instead, specify the `OVERRIDE` option.

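The `destKey` and `OVERRIDE` behavior described above can be modeled with plain Python lists standing in for sketches. This is only a conceptual model: the `merge` and `percentile` helpers, the dictionary used as a key space, and the key names are all hypothetical, and a real merge combines centroids rather than raw observations.

```python
def merge(store, dest_key, source_keys, override=False):
    """Model of a merge: lists of raw observations stand in for sketches."""
    merged = []
    if not override:
        # Without OVERRIDE, existing destination data takes part in the merge.
        merged.extend(store.get(dest_key, []))
    for key in source_keys:
        merged.extend(store[key])
    store[dest_key] = sorted(merged)
    return store[dest_key]


def percentile(ordered, q):
    """Nearest-rank percentile on already sorted data (an assumption)."""
    index = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[index]


store = {
    "latency:s1": [10, 12, 11],
    "latency:s2": [20, 22],
    "latency:s3": [15, 300],
    "latency:all": [999],  # pre-existing destination data
}
sources = ["latency:s1", "latency:s2", "latency:s3"]

print(merge(store, "latency:all", sources))
# [10, 11, 12, 15, 20, 22, 300, 999]
print(merge(store, "latency:all", sources, override=True))
# [10, 11, 12, 15, 20, 22, 300]
print(percentile(store["latency:all"], 0.9))  # 300
```
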
##### Retrieving sketch information
The `TDIGEST.MIN` and `TDIGEST.MAX` commands return the minimum and maximum values in the sketch, for example:
```redis-cli
> TDIGEST.MIN racer_ages
"19.27"
> TDIGEST.MAX racer_ages
"85.709999999999994"
```
If the sketch is empty, both `TDIGEST.MIN` and `TDIGEST.MAX` return `nan`.

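Client code that consumes these replies needs to handle the `nan` case explicitly. The Python sketch below models the empty-sketch behavior with hypothetical helpers (`sketch_min`, `sketch_max`) over a plain list, and shows why an equality check is not enough to detect `nan`.

```python
import math


def sketch_min(values):
    """Smallest observation, or nan when there are none."""
    return min(values) if values else float("nan")


def sketch_max(values):
    """Largest observation, or nan when there are none."""
    return max(values) if values else float("nan")


result = sketch_min([])
print(result == result)    # False: nan never compares equal to itself
print(math.isnan(result))  # True: the reliable way to detect it

# A textual "nan" reply can be parsed the same way as any other float.
print(math.isnan(float("nan")))  # True
```
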
##### Resetting a sketch
The `TDIGEST.RESET` command resets a sketch, for example:
```redis-cli
> TDIGEST.RESET racer_ages
OK
```