Database Manual / Reference / Query Language / Accumulators

$percentile (aggregation)(聚合)

Definition定义

$percentile

New in version 7.0.在版本7.0中新增。

Returns an array of scalar values that correspond to specified percentile values.返回与指定百分位值对应的标量值数组。

You can use $percentile as an accumulator in the $group stage or as an aggegation expression.您可以在$group阶段将$percentile用作累加器,也可以将其用作聚集表达式

Syntax语法

The syntax for $percentile is:$percentile的语法是:

{
$percentile: {
input: <expression>,
p: [ <expression1>, <expression2>, ... ],
method: <string>
}
}

Command Fields命令字段

$percentile takes the following fields:采用以下字段:

Field字段Type类型Necessity必要性Description描述
inputExpression表达式Required必需$percentile calculates the percentile values of this data. 计算此数据的百分位值。input must be a field name or an expression that evaluates to a numeric type. 必须是字段名或计算结果为数字类型的表达式。If the expression cannot be converted to a numeric type, the $percentile calculation ignores it.如果表达式无法转换为数值类型,则$percentile计算将忽略它。
pExpression表达式Required必需

$percentile calculates a percentile value for each element in p. The elements represent percentages and must evaluate to numeric values in the range 0.0 to 1.0, inclusive.$percentile计算p中每个元素的百分位值。元素表示百分比,并且必须计算为0.01.0(包括0.01.0)范围内的数值。

$percentile returns results in the same order as the elements in p.$percentile返回的结果与p中的元素顺序相同。

methodString字符串Required必需The method that mongod uses to calculate the percentile value. The method must be 'approximate'.mongod用来计算百分位值的方法。该方法必须是“近似”的。

Behavior行为

You can use $percentile in:您可以在以下情况下使用$percentile

  • $group stages as an accumulator$group阶段中作为累加器
  • $setWindowFields stages as an accumulator阶段中作为累加器
  • $project stages as an aggregation expression阶段中作为聚合表达式

$percentile has the following characteristics as an accumulator, it:作为累加器,它具有以下特点:

  • Calculates a single result for all the documents in the stage.为阶段中的所有文档计算单个结果。
  • Uses the t-digest algorithm to calculate approximate, percentile based metrics.使用t-digest算法计算基于百分位数的近似指标。
  • Uses approximate methods to scale to large volumes of data.使用近似方法扩展到大量数据。

$percentile has the following characteristics as an aggregation expression, it:作为聚合表达式,它具有以下特征:

  • Accepts an array as input接受数组作为输入
  • Calculates a separate result for each input document为每个输入文档计算单独的结果

Type of Operation操作类型

In a $group stage, $percentile is an accumulator and calculates a value for all documents in the window.$group阶段,$percentile是一个累加器,用于计算窗口中所有文档的值。

In a $project stage, $percentile is an aggregation expression and calculates values for each document.$project阶段,$percentile是一个聚合表达式,用于计算每个文档的值。

In $setWindowFields stages, $percentile returns a result for each document like an aggregation expression, but the results are computed over groups of documents like an accumulator.$setWindowFields阶段,$percentile像聚合表达式一样返回每个文档的结果,但结果是像累加器一样在文档组上计算的。

Calculation Considerations计算注意事项

In $group stages, $percentile always uses an approximate calculation method.$group阶段,$percentile始终使用近似计算方法。

In $project stages, $percentile might use the discrete calculation method even when the approximate method is specified.$project阶段,即使指定了近似方法,$percentile也可能使用离散计算方法。

In $setWindowFields stages, the workload determines the calculation method that $percentile uses.$setWindowFields阶段,工作负载决定了$percentile使用的计算方法。

The computed percentiles $percentile returns might vary, even on the same datasets. This is because the algorithm calculates approximate values.即使在相同的数据集上,计算出的百分位数$percentile返回也可能有所不同。这是因为该算法计算近似值。

Duplicate samples can cause ambiguity. If there are a large number of duplicates, the percentile values may not represent the actual sample distribution. Consider a data set where all the samples are the same. All of the values in the data set fall at or below any percentile. 重复的样本可能会导致歧义。如果存在大量重复项,百分位值可能无法代表实际的样本分布。考虑一个所有样本都相同的数据集。数据集中的所有值都处于或低于任何百分位数。A "50th percentile" value would actually represent either 0 or 100 percent of the samples.“第50百分位”值实际上表示样本的0%或100%。

$percentile returns the minimum value for p = 0.0.对于p=0.0$percentile返回最小值。

$percentile returns the maximum value for p = 1.0.对于p=1.0$percentile返回最大值。

Array Input数组输入

If you use $percentile as an aggregation expression in a $project stage, you can use an array as input. The syntax is:如果在$project阶段使用$percentile作为聚合表达式,则可以使用数组作为输入。语法为:

{
$percentile: {
input: [ <expression1, <expression2>, .., <expressionN> ],
p: [ <expression1>, <expression2>, ... ],
method: <string>
}
}

Window Functions窗口函数

A window function lets you calculate results over a moving "window" of neighboring documents. As each document passes though the pipeline, the $setWindowFields stage:窗口函数允许您在相邻文档的移动“窗口”上计算结果。当每个文档通过管道时,$setWindowFields阶段:

  • Recomputes the set of documents in the current window重新计算当前窗口中的文档集
  • calculates a value for all documents in the set计算集合中所有文档的值
  • returns a single value for that document为该文档返回一个值

You can use $percentile in a $setWindowFields stage to calculate rolling statistics for time series or other related data.您可以在$setWindowFields阶段中使用$percentile来计算时间序列或其他相关数据的滚动统计数据。

When you use $percentile in a $setWindowField stage, the input value must be a field name. If you enter an array instead of a field name, the operation fails.$setWindowField阶段使用$percentile时,input值必须是字段名。如果输入数组而不是字段名,则操作失败。

Examples示例

The following examples use the testScores collection. Create the collection:以下示例使用testScores集合。创建集合:

db.testScores.insertMany( [
{ studentId: "2345", test01: 62, test02: 81, test03: 80 },
{ studentId: "2356", test01: 60, test02: 83, test03: 79 },
{ studentId: "2358", test01: 67, test02: 82, test03: 78 },
{ studentId: "2367", test01: 64, test02: 72, test03: 77 },
{ studentId: "2369", test01: 60, test02: 53, test03: 72 }
] )

Calculate a Single Value as an Accumulator将单个值作为累加器进行计算

Create an accumulator that calculates a single percentile value:创建一个计算单个百分位值的累加器:

db.testScores.aggregate( [
{
$group: {
_id: null,
test01_percentiles: {
$percentile: {
input: "$test01",
p: [ 0.95 ],
method: 'approximate'
}
},
}
}
] )

Output:输出:

{ _id: null, test01_percentiles: [ 67 ] }

The _id field value is null so $group selects all the documents in the collection._id字段值为null,因此$group选择集合中的所有文档。

The percentile accumulator takes its input data from the test01 field.$percentile累加器从test01字段获取输入数据。

In this example, the percentiles array, p, has one value so the $percentile operator only calculates one term for the test01 data. The 95th percentile value is 67.在这个例子中,百分位数数组p有一个值,因此$percentile运算符只计算test01数据的一个项。第95百分位值为67。

Calculate Multiple Values as an Accumulator作为累加器计算多个值

Create an accumulator that calculates multiple percentile values:创建一个计算多个百分位值的累加器:

db.testScores.aggregate( [
{
$group: {
_id: null,
test01_percentiles: {
$percentile: {
input: "$test01",
p: [ 0.5, 0.75, 0.9, 0.95 ],
method: 'approximate'
}
},
test02_percentiles: {
$percentile: {
input: "$test02",
p: [ 0.5, 0.75, 0.9, 0.95 ],
method: 'approximate'
}
},
test03_percentiles: {
$percentile: {
input: "$test03",
p: [ 0.5, 0.75, 0.9, 0.95 ],
method: 'approximate'
}
},
test03_percent_alt: {
$percentile: {
input: "$test03",
p: [ 0.9, 0.5, 0.75, 0.95 ],
method: 'approximate'
}
},
}
}
] )

Output:输出:

{
_id: null,
test01_percentiles: [ 62, 64, 67, 67 ],
test02_percentiles: [ 81, 82, 83, 83 ],
test03_percentiles: [ 78, 79, 80, 80 ],
test03_percent_alt: [ 80, 78, 79, 80 ]
}

The _id field value is null so $group selects all the documents in the collection._id字段值为空,因此$group选择集合中的所有文档。

The percentile accumulator calculates values for three fields, test01, test02, and test03.percentile数累加器计算三个字段test01test02test03的值。

The accumulator calculates the 50th, 75th, 90th, and 95th percentile values for each input field.累加器计算每个输入字段的第50、75、90和95百分位值。

The percentile values are returned in the same order as the elements of p. The values in test03_percentiles and test03_percent_alt are the same, but their order is different. The order of elements in each result array matches the corresponding order of elements in p.百分位值的返回顺序与p的元素顺序相同。test03_percentilestest03_ppercent_alt中的值相同,但顺序不同。每个结果数组中元素的顺序与p中元素的相应顺序相匹配。

Use $percentile in a $project Stage$project阶段中使用$percentile

In a $project stage, $percentile is an aggregation expression and calculates values for each document.$project阶段,$percentile是一个聚合表达式,用于计算每个文档的值。

You can use a field name or an array as input in a $project stage.您可以在$project阶段使用字段名或数组作为输入。

db.testScores.aggregate( [
{
$project: {
_id: 0,
studentId: 1,
testPercentiles: {
$percentile: {
input: [ "$test01", "$test02", "$test03" ],
p: [ 0.5, 0.95 ],
method: 'approximate'
}
}
}
}
] )

Output:输出:

{ studentId: '2345', testPercentiles: [ 80, 81 ] },
{ studentId: '2356', testPercentiles: [ 79, 83 ] },
{ studentId: '2358', testPercentiles: [ 78, 82 ] },
{ studentId: '2367', testPercentiles: [ 72, 77 ] },
{ studentId: '2369', testPercentiles: [ 60, 72 ] }

When $percentile is an aggregation expression there is a result for each studentId.$percentile是一个聚合表达式时,每个studentId都有一个结果。

Use $percentile in a $setWindowField Stage$setWindowField阶段中使用$percentile

To base your percentile values on local data trends, use $percentile in a $setWindowField aggregation pipeline stage.要将百分位值基于本地数据趋势,请在$setWindowField聚合管道阶段使用$percentile

This example creates a window to filter scores:此示例创建了一个窗口来筛选分数:

db.testScores.aggregate( [
{
$setWindowFields: {
sortBy: { test01: 1 },
output: {
test01_95percentile: {
$percentile: {
input: "$test01",
p: [ 0.95 ],
method: 'approximate'
},
window: {
range: [ -3, 3 ]
}
}
}
}
},
{
$project: {
_id: 0,
studentId: 1,
test01_95percentile: 1
}
}
] )

Output:输出:

{ studentId: '2356', test01_95percentile: [ 62 ] },
{ studentId: '2369', test01_95percentile: [ 62 ] },
{ studentId: '2345', test01_95percentile: [ 64 ] },
{ studentId: '2367', test01_95percentile: [ 67 ] },
{ studentId: '2358', test01_95percentile: [ 67 ] }

In this example, the percentile calculation for each document also incorporates data from the three documents before and after it.在这个例子中,每个文档的百分位数计算还包含了之前和之后三个文档的数据。

Learn More了解更多

The $median operator is a special case of the $percentile operator that uses a fixed value of p: [ 0.5 ].$median运算符是$percentile运算符的一个特例,它使用固定值p:[0.5]

For more information on window functions, see: $setWindowFields.有关窗口函数的更多信息,请参阅:$setWindowFields