From 3ed69c87a0dc62eba431169ceecfd8a07061ef19 Mon Sep 17 00:00:00 2001
From: asahi
Date: Tue, 23 Sep 2025 14:27:27 +0800
Subject: [PATCH] =?UTF-8?q?doc:=20=E9=98=85=E8=AF=BBredis=20t-digest?=
 =?UTF-8?q?=E6=96=87=E6=A1=A3?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 中间件/redis/redis.md | 115 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 114 insertions(+), 1 deletion(-)

diff --git a/中间件/redis/redis.md b/中间件/redis/redis.md
index bcb357e..ee382a5 100644
--- a/中间件/redis/redis.md
+++ b/中间件/redis/redis.md
@@ -146,6 +146,15 @@
                 - [Example](#example-2)
                 - [Cuckoo vs Bloom Filter](#cuckoo-vs-bloom-filter)
                 - [Sizing Cuckoo filters](#sizing-cuckoo-filters)
+            - [t-digest](#t-digest)
+                - [Use Cases](#use-cases)
+                - [Examples](#examples-1)
+                - [Estimating fractions or ranks by values](#estimating-fractions-or-ranks-by-values)
+                - [Estimating values by fractions or ranks](#estimating-values-by-fractions-or-ranks)
+                - [trimmed mean](#trimmed-mean)
+                - [TDIGEST.MERGE](#tdigestmerge)
+                - [Retrieving sketch information](#retrieving-sketch-information)
+                - [Resetting a sketch](#resetting-a-sketch)
 
 # redis
 
@@ -2490,4 +2499,108 @@ CF.RESERVE {key} {capacity} [BUCKETSIZE bucketSize] [MAXITERATIONS maxIterations
     - The default value is 1
 - `MAXITERATIONS`:
     - `MAXITERATIONS` is `the number of attempts to find a slot for incoming fingerprint`. Once the filter is full, the larger `MAXITERATIONS` is, the slower insertions become.
-    - The default value is 20
\ No newline at end of file
+    - The default value is 20
+
+#### t-digest
+t-digest is a `probabilistic data structure` that estimates the `percentile point` of a data set without actually storing and sorting `all the data in the set`. It is therefore suited to questions such as: `What's the average latency for 99% of my database operations`
+- Without `t-digest`, answering this question requires storing the average latency of every user, sorting those values, cutting out the last 1%, and averaging the rest. This process is very time-consuming
+- t-digest avoids this cost
+
+t-digest can also answer other percentile-related questions, such as `trimmed means`:
+- `A trimmed mean is the mean value from the sketch, excluding observation values outside the low and high cutoff percentiles.` For example, a `0.1 trimmed mean` is the mean computed after excluding the lowest 10% and the highest 10% of the values
+
+##### Use Cases
+- `Hardware/Software monitoring`:
+    - When measuring online server response latency, you may want to observe metrics such as:
+        - What are the 50th, 90th, and 99th percentiles of the measured latencies
+        - Which fraction of the measured latencies are less than 25 milliseconds
+        - What is the mean latency, ignoring outliers? or What is the mean latency between the 10th and the 90th percentile?
+- `Online Gaming`:
+    - When an online gaming platform has millions of users, you may want to observe metrics such as:
+        - Your score is better than x percent of the game sessions played.
+        - There were about y game sessions where people scored larger than you.
+        - To have a better score than 90% of the games played, your score should be z
+- `Network traffic`:
+    - When monitoring IP packets on the network to detect DDoS attacks, you may want to observe metrics such as:
+        - Did the number of packets in the past second exceed 99% of the previously observed values?
+        - How many packets should be expected under normal network conditions?
+##### Examples
+The following example creates a t-digest with a `compression of 100` and adds items to it. The `compression` parameter trades memory consumption against accuracy. `The default compression is 100`; a larger value gives higher accuracy at the cost of more memory.
+
+```redis-cli
+> TDIGEST.CREATE bikes:sales COMPRESSION 100
+OK
+> TDIGEST.ADD bikes:sales 21
+OK
+> TDIGEST.ADD bikes:sales 150 95 75 34
+OK
+```
+
+##### Estimating fractions or ranks by values
+A useful t-digest feature is `CDF` (definition of rank), which gives the fraction of observations smaller than or equal to a given value. It answers questions such as `What's the percentage of observations with a value lower or equal to X`.
+> More precisely, `TDIGEST.CDF will return the estimated fraction of observations in the sketch that are smaller than X plus half the number of observations that are equal to X.`
+
+> The `TDIGEST.RANK` command can be used as well; instead of a fraction, it returns the rank of the value.
+
+```redis-cli
+> TDIGEST.CREATE racer_ages
+OK
+> TDIGEST.ADD racer_ages 45.88 44.2 58.03 19.76 39.84 69.28 50.97 25.41 19.27 85.71 42.63
+OK
+> TDIGEST.CDF racer_ages 50
+1) "0.63636363636363635"
+> TDIGEST.RANK racer_ages 50
+1) (integer) 7
+> TDIGEST.RANK racer_ages 50 40
+1) (integer) 7
+2) (integer) 4
+```
+Similarly, `TDIGEST.REVRANK` is supported; it returns `the number of (observations larger than a given value + half the observations equal to the given value)`.
+
+##### Estimating values by fractions or ranks
+`TDIGEST.QUANTILE key fraction ...` takes one or more fractions and returns, for each, `an estimation of the value (floating point) that is smaller than the given fraction of observations.`
+
+`TDIGEST.BYRANK key rank ...` takes one or more ranks and returns, for each, `an estimation of the value (floating point) with that rank`.
+
+For example:
+```redis-cli
+> TDIGEST.QUANTILE racer_ages .5
+1) "44.200000000000003"
+> TDIGEST.BYRANK racer_ages 4
+1) "42.630000000000003"
+```
+`TDIGEST.BYREVRANK` returns the value for a given `reverse rank`.
+
+##### trimmed mean
+To compute the `trimmed mean` of a `TDIGEST` sketch, use `TDIGEST.TRIMMED_MEAN {key} low_fraction high_fraction`.
+
+##### TDIGEST.MERGE
+Sketches can be merged with the `TDIGEST.MERGE` command.
+
+Suppose latency was measured on 3 servers and you need to combine the measurements, then read the 90th, 95th, and 99th percentile latencies of the combined result. `TDIGEST.MERGE` covers this case.
+
+The command has the following format:
+```redis-cli
+TDIGEST.MERGE destKey numKeys sourceKey... [COMPRESSION compression] [OVERRIDE]
+```
+
+When using this command:
+- `If destKey does not exist`: it is created automatically and set to the merged result
+- `If destKey already exists`: its old value is merged together with the `values of source keys`. To overwrite the existing contents of `destKey` instead, specify the `OVERRIDE` option
+
+##### Retrieving sketch information
+`TDIGEST.MIN` and `TDIGEST.MAX` return the minimum and maximum values in the sketch, for example:
+```redis-cli
+> TDIGEST.MIN racer_ages
+"19.27"
+> TDIGEST.MAX racer_ages
+"85.709999999999994"
+```
+`If the sketch is empty, both TDIGEST.MIN and TDIGEST.MAX return nan`.
+
+##### Resetting a sketch
+A sketch can be reset with the `TDIGEST.RESET` command, for example:
+```redis-cli
+> TDIGEST.RESET racer_ages
+OK
+```
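
Review note: the `racer_ages` outputs in the patch can be cross-checked against the quoted definitions with an exact, brute-force computation. The sketch below is plain Python, not a Redis client and not how t-digest works internally; `exact_cdf`, `exact_rank`, and `exact_byrank` are hypothetical helper names introduced only for illustration. It assumes rank is the count of smaller observations plus half of the equal ones, as the quoted CDF definition suggests. With only 11 observations, the values shown in the patch coincide with these exact results; on large data sets t-digest returns estimates instead.

```python
from bisect import bisect_left, bisect_right

# Sample data copied from the TDIGEST.ADD racer_ages example in the patch.
RACER_AGES = [45.88, 44.2, 58.03, 19.76, 39.84, 69.28,
              50.97, 25.41, 19.27, 85.71, 42.63]


def exact_rank(values, x):
    """Count of observations smaller than x plus half of those equal to x."""
    ordered = sorted(values)
    below = bisect_left(ordered, x)           # observations strictly below x
    equal = bisect_right(ordered, x) - below  # observations equal to x
    return below + equal / 2


def exact_cdf(values, x):
    """Fraction form of exact_rank: rank divided by the number of observations."""
    return exact_rank(values, x) / len(values)


def exact_byrank(values, rank):
    """Value at a 0-based rank in sorted order (assumed to mirror TDIGEST.BYRANK)."""
    return sorted(values)[rank]


if __name__ == "__main__":
    print(exact_cdf(RACER_AGES, 50))    # 7 of 11 observations are below 50
    print(exact_rank(RACER_AGES, 50))   # compare with TDIGEST.RANK racer_ages 50
    print(exact_rank(RACER_AGES, 40))   # compare with TDIGEST.RANK racer_ages 40
    print(exact_byrank(RACER_AGES, 4))  # compare with TDIGEST.BYRANK racer_ages 4
```

The exact version needs every observation in memory and a sort per query, which is the cost the notes say t-digest is designed to avoid.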