$percentile (aggregation)
Definition
$percentile
New in version 7.0.
Returns an array of scalar values that correspond to specified percentile values.
You can use $percentile as an accumulator in the $group stage or as an aggregation expression.
Syntax
The syntax for $percentile is:
{
   $percentile: {
      input: <expression>,
      p: [ <expression1>, <expression2>, ... ],
      method: <string>
   }
}
Command Fields
$percentile takes the following fields:
Field | Type | Description
input | Expression | $percentile calculates the percentile values of this data. input must be a field name or an expression that evaluates to a numeric type. If the expression does not evaluate to a numeric value, the $percentile calculation ignores it.
p | Expression | $percentile calculates a percentile value for each element in p. The elements represent percentages and must evaluate to numeric values in the range 0.0 to 1.0, inclusive. $percentile returns results in the same order as the elements in p.
method | String | The method that mongod uses to calculate the percentile value. The method must be 'approximate'.
Behavior
You can use $percentile in:
$group stages as an accumulator
$setWindowFields stages as an accumulator
$project stages as an aggregation expression
As an accumulator, $percentile has the following characteristics:
Calculates a single result for all the documents in the stage.
Uses the t-digest algorithm to calculate approximate, percentile-based metrics.
Uses approximate methods to scale to large volumes of data.
As an aggregation expression, $percentile has the following characteristics:
Accepts an array as input.
Calculates a separate result for each input document.
Type of Operation
In a $group stage, $percentile is an accumulator and calculates a value for all documents in the group.
In a $project stage, $percentile is an aggregation expression and calculates values for each document.
In $setWindowFields stages, $percentile returns a result for each document like an aggregation expression, but the results are computed over groups of documents like an accumulator.
Calculation Considerations
In $group stages, $percentile always uses an approximate calculation method.
In $project stages, $percentile might use the discrete calculation method even when the approximate method is specified.
In $setWindowFields stages, the workload determines the calculation method that $percentile uses.
The computed percentiles that $percentile returns might vary, even on the same data sets. This is because the algorithm calculates approximate values.
Duplicate samples can cause ambiguity. If there are a large number of duplicates, the percentile values may not represent the actual sample distribution. Consider a data set where all the samples are the same. All of the values in the data set fall at or below any percentile. A "50th percentile" value would actually represent either 0 or 100 percent of the samples.
$percentile returns the minimum value for p = 0.0.
$percentile returns the maximum value for p = 1.0.
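The boundary and duplicate behavior described above can be illustrated outside the database. The following is a minimal sketch in plain JavaScript of a nearest-rank (discrete) percentile. The helper name nearestRankPercentile is hypothetical, and this is not the t-digest algorithm that mongod uses, so results on large data sets may differ from what $percentile returns.

```javascript
// Hypothetical helper: nearest-rank (discrete) percentile over an array of numbers.
// Illustration only; this is not mongod's t-digest implementation.
function nearestRankPercentile(values, p) {
  if (p < 0 || p > 1) {
    throw new RangeError("p must be in the range 0.0 to 1.0, inclusive");
  }
  // Sort a copy numerically so the caller's array is left unchanged.
  const sorted = [...values].sort((a, b) => a - b);
  if (sorted.length === 0) {
    return null; // returning null for empty input is a choice made for this sketch
  }
  // 1-based rank of the smallest sample that has at least p * n samples
  // at or below it; clamped to 1 so that p = 0.0 selects the minimum.
  const rank = Math.max(Math.ceil(p * sorted.length), 1);
  return sorted[rank - 1];
}

const test01 = [62, 60, 67, 64, 60];
console.log(nearestRankPercentile(test01, 0.0)); // minimum of the samples
console.log(nearestRankPercentile(test01, 1.0)); // maximum of the samples

// With identical samples, every percentile maps to the same value.
const duplicates = [70, 70, 70, 70];
console.log([0.1, 0.5, 0.9].map((p) => nearestRankPercentile(duplicates, p)));
```

The last call shows the ambiguity with duplicates: the 10th, 50th, and 90th percentiles are indistinguishable when every sample has the same value.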
Array Input
If you use $percentile as an aggregation expression in a $project stage, you can use an array as input. The syntax is:
{
   $percentile: {
      input: [ <expression1>, <expression2>, ..., <expressionN> ],
      p: [ <expression1>, <expression2>, ... ],
      method: <string>
   }
}
Window Functions
A window function lets you calculate results over a moving "window" of neighboring documents. As each document passes through the pipeline, the $setWindowFields stage:
Recomputes the set of documents in the current window
Calculates a value for all documents in the set
Returns a single value for that document
You can use $percentile in a $setWindowFields stage to calculate rolling statistics for time series or other related data.
When you use $percentile in a $setWindowFields stage, the input value must be a field name. If you enter an array instead of a field name, the operation fails.
Examples
The following examples use the testScores collection. Create the collection:
db.testScores.insertMany( [
   { studentId: "2345", test01: 62, test02: 81, test03: 80 },
   { studentId: "2356", test01: 60, test02: 83, test03: 79 },
   { studentId: "2358", test01: 67, test02: 82, test03: 78 },
   { studentId: "2367", test01: 64, test02: 72, test03: 77 },
   { studentId: "2369", test01: 60, test02: 53, test03: 72 }
] )
Calculate a Single Value as an Accumulator
Create an accumulator that calculates a single percentile value:
db.testScores.aggregate( [
   {
      $group: {
         _id: null,
         test01_percentiles: {
            $percentile: {
               input: "$test01",
               p: [ 0.95 ],
               method: 'approximate'
            }
         },
      }
   }
] )
Output:
{ _id: null, test01_percentiles: [ 67 ] }
The _id field value is null, so $group selects all the documents in the collection.
The $percentile accumulator takes its input data from the test01 field.
In this example, the percentiles array, p, has one value, so the $percentile operator only calculates one term for the test01 data. The 95th percentile value is 67.
Calculate Multiple Values as an Accumulator
Create an accumulator that calculates multiple percentile values:
db.testScores.aggregate( [
   {
      $group: {
         _id: null,
         test01_percentiles: {
            $percentile: {
               input: "$test01",
               p: [ 0.5, 0.75, 0.9, 0.95 ],
               method: 'approximate'
            }
         },
         test02_percentiles: {
            $percentile: {
               input: "$test02",
               p: [ 0.5, 0.75, 0.9, 0.95 ],
               method: 'approximate'
            }
         },
         test03_percentiles: {
            $percentile: {
               input: "$test03",
               p: [ 0.5, 0.75, 0.9, 0.95 ],
               method: 'approximate'
            }
         },
         test03_percent_alt: {
            $percentile: {
               input: "$test03",
               p: [ 0.9, 0.5, 0.75, 0.95 ],
               method: 'approximate'
            }
         },
      }
   }
] )
Output:
{
   _id: null,
   test01_percentiles: [ 62, 64, 67, 67 ],
   test02_percentiles: [ 81, 82, 83, 83 ],
   test03_percentiles: [ 78, 79, 80, 80 ],
   test03_percent_alt: [ 80, 78, 79, 80 ]
}
The _id field value is null, so $group selects all the documents in the collection.
The $percentile accumulator calculates values for three fields, test01, test02, and test03.
The accumulator calculates the 50th, 75th, 90th, and 95th percentile values for each input field.
The percentile values are returned in the same order as the elements of p. The values in test03_percentiles and test03_percent_alt are the same, but their order is different. The order of elements in each result array matches the corresponding order of elements in p.
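The $group stage in this example repeats the same $percentile specification for each field. As a sketch, a small helper can build that stage from a list of field names. buildPercentileGroup is a hypothetical name, not part of any MongoDB API; it only constructs a plain object that could then be passed to db.testScores.aggregate() in mongosh.

```javascript
// Hypothetical helper: builds a $group stage that computes the same
// percentiles for several fields. It only constructs a plain object
// and does not talk to a database.
function buildPercentileGroup(fields, p) {
  const stage = { _id: null }; // null groups all documents together
  for (const field of fields) {
    stage[`${field}_percentiles`] = {
      $percentile: {
        input: `$${field}`, // field path, for example "$test01"
        p: [...p],          // copy so each accumulator has its own array
        method: "approximate"
      }
    };
  }
  return { $group: stage };
}

const pipeline = [
  buildPercentileGroup(["test01", "test02", "test03"], [0.5, 0.75, 0.9, 0.95])
];
console.log(JSON.stringify(pipeline, null, 2));
// In mongosh, the pipeline could be run as: db.testScores.aggregate(pipeline)
```

Because the result arrays follow the order of p, passing the percentiles in a different order to the helper changes only the order of the returned values.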
Use $percentile in a $project Stage
In a $project stage, $percentile is an aggregation expression and calculates values for each document.
You can use a field name or an array as input in a $project stage. This example uses an array of the three test fields:
db.testScores.aggregate( [
   {
      $project: {
         _id: 0,
         studentId: 1,
         testPercentiles: {
            $percentile: {
               input: [ "$test01", "$test02", "$test03" ],
               p: [ 0.5, 0.95 ],
               method: 'approximate'
            }
         }
      }
   }
] )
Output:
{ studentId: '2345', testPercentiles: [ 80, 81 ] },
{ studentId: '2356', testPercentiles: [ 79, 83 ] },
{ studentId: '2358', testPercentiles: [ 78, 82 ] },
{ studentId: '2367', testPercentiles: [ 72, 77 ] },
{ studentId: '2369', testPercentiles: [ 60, 72 ] }
When $percentile is an aggregation expression, there is a result for each studentId.
Use $percentile in a $setWindowFields Stage
To base your percentile values on local data trends, use $percentile in a $setWindowFields aggregation pipeline stage.
This example creates a window to filter scores:
db.testScores.aggregate( [
   {
      $setWindowFields: {
         sortBy: { test01: 1 },
         output: {
            test01_95percentile: {
               $percentile: {
                  input: "$test01",
                  p: [ 0.95 ],
                  method: 'approximate'
               },
               window: {
                  range: [ -3, 3 ]
               }
            }
         }
      }
   },
   {
      $project: {
         _id: 0,
         studentId: 1,
         test01_95percentile: 1
      }
   }
] )
Output:
{ studentId: '2356', test01_95percentile: [ 62 ] },
{ studentId: '2369', test01_95percentile: [ 62 ] },
{ studentId: '2345', test01_95percentile: [ 64 ] },
{ studentId: '2367', test01_95percentile: [ 67 ] },
{ studentId: '2358', test01_95percentile: [ 67 ] }
In this example, the percentile calculation for each document also incorporates data from every document whose test01 value is within 3 points of its own. The window is defined by a range over the sortBy field, not by a count of neighboring documents.
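To see how the range: [ -3, 3 ] window selects documents, the following plain JavaScript sketch simulates it for the sample data. It assumes a nearest-rank percentile (the hypothetical nearestRank helper), which is not necessarily the method mongod chooses for a given workload. The function names and the simulation itself are illustrative and are not part of MongoDB.

```javascript
// Sample data: studentId and test01 from the testScores collection.
const scores = [
  { studentId: "2345", test01: 62 },
  { studentId: "2356", test01: 60 },
  { studentId: "2358", test01: 67 },
  { studentId: "2367", test01: 64 },
  { studentId: "2369", test01: 60 }
];

// Hypothetical nearest-rank percentile over an already sorted array.
// This is an assumption for illustration, not mongod's algorithm.
function nearestRank(sortedValues, p) {
  const rank = Math.max(Math.ceil(p * sortedValues.length), 1);
  return sortedValues[rank - 1];
}

// Simulates sortBy: { test01: 1 } with a range window [lower, upper]:
// each window holds the documents whose test01 value lies within
// [current + lower, current + upper].
function rollingPercentile(docs, p, lower, upper) {
  const sorted = [...docs].sort((a, b) => a.test01 - b.test01);
  return sorted.map((doc) => {
    const windowValues = sorted
      .map((d) => d.test01)
      .filter((v) => v >= doc.test01 + lower && v <= doc.test01 + upper);
    return {
      studentId: doc.studentId,
      test01_95percentile: [nearestRank(windowValues, p)]
    };
  });
}

const windowResults = rollingPercentile(scores, 0.95, -3, 3);
console.log(windowResults);
```

For the first document (test01 of 60), the window covers values from 57 to 63, so only the scores 60, 60, and 62 contribute to its result.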
Learn More
The $median operator is a special case of the $percentile operator that uses a fixed value of p: [ 0.5 ].
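As a sketch of that relationship, the two $group stages below describe the same statistic for test01. The output field name test01_median is an illustrative choice. Note that $median returns a single value, whereas $percentile returns an array with one element per entry in p.

```javascript
// Stage using $median: there is no p field because the percentile is fixed at 0.5.
const medianStage = {
  $group: {
    _id: null,
    test01_median: {
      $median: { input: "$test01", method: "approximate" }
    }
  }
};

// Stage using $percentile with p: [ 0.5 ]; the result is a one-element array.
const percentileStage = {
  $group: {
    _id: null,
    test01_median: {
      $percentile: { input: "$test01", p: [0.5], method: "approximate" }
    }
  }
};

console.log(JSON.stringify([medianStage, percentileStage], null, 2));
// In mongosh, either stage could be run as: db.testScores.aggregate([ medianStage ])
```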
For more information on window functions, see $setWindowFields.