$median (accumulator operator)

Definition

$median

New in version 7.0.

Returns an approximation of the median (the 50th percentile) as a scalar value.

You can use $median as an accumulator in the $group stage or as an aggregation expression.

Syntax

The syntax for $median is:

{
   $median: {
      input: <number>,
      method: <string>
   }
}

Command Fields

$median takes the following fields:

Field | Type | Necessity | Description
input | Expression | Required | $median calculates the 50th percentile value of this data. input must be a field name or an expression that evaluates to a numeric type. If the expression cannot be converted to a numeric type, the $median calculation ignores it.
method | String | Required | The method that mongod uses to calculate the 50th percentile value. The method must be 'approximate'.

Behavior

You can use $median in:

  • $group stages as an accumulator
  • $setWindowFields stages as an accumulator
  • $project stages as an aggregation expression

As an accumulator, $median:

  • Calculates a single result for all the documents in the stage.
  • Uses the t-digest algorithm to calculate approximate, percentile-based metrics.
  • Uses approximate methods to scale to large volumes of data.

As an aggregation expression, $median:

  • Accepts an array as input.
  • Calculates a separate result for each input document.

Type of Operation

In a $group stage, $median is an accumulator and calculates a single value for all documents in the group.

In a $project stage, $median is an aggregation expression and calculates a value for each document.

In $setWindowFields stages, $median returns a result for each document, like an aggregation expression, but computes each result over a group of documents, like an accumulator.

Calculation Considerations

In $group stages, $median always uses an approximate calculation method.

In $project stages, $median might use the discrete calculation method even when the approximate method is specified.

In $setWindowFields stages, the workload determines which calculation method $median uses.

The percentiles that $median computes might vary, even on the same dataset, because the algorithm calculates approximate values.

Duplicate samples can cause ambiguity. If there are a large number of duplicates, the percentile values may not represent the actual sample distribution. Consider a data set where all the samples are the same: every value in the data set falls at or below any percentile, so a "50th percentile" value actually represents either 0 or 100 percent of the samples.
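A minimal plain-JavaScript sketch of the all-duplicates case described above, run outside MongoDB. The data set is hypothetical, and the 50th percentile is taken with the exact nearest-rank rule (an assumption for illustration, not MongoDB's t-digest method). No sample falls strictly below the resulting value, yet every sample falls at or below it.

```javascript
// Hypothetical data set in which every sample has the same value.
const samples = [70, 70, 70, 70, 70, 70];

// Exact nearest-rank 50th percentile: the ceil(0.5 * n)-th smallest value
// (assumption: used as a stand-in for MongoDB's approximate method).
const sorted = [...samples].sort((a, b) => a - b);
const median = sorted[Math.ceil(0.5 * sorted.length) - 1];

// Fraction of samples strictly below the "median", and at or below it.
const below = samples.filter((v) => v < median).length / samples.length;
const atOrBelow = samples.filter((v) => v <= median).length / samples.length;

console.log(median, below, atOrBelow); // 70 0 1
```

The same value is simultaneously the 0th-percentile boundary and the 100th-percentile boundary, which is the ambiguity the paragraph warns about.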

Array Input

If you use $median as an aggregation expression in a $project stage, you can use an array as input. $median ignores non-numeric array values.

The syntax is:

{
   $median: {
      input: [ <expression1>, <expression2>, ..., <expressionN> ],
      method: <string>
   }
}
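The rule that non-numeric array elements are ignored can be sketched in plain JavaScript, outside MongoDB. medianOfArray is a hypothetical helper: JavaScript numbers stand in for BSON numeric types, NaN is dropped for simplicity, and the median uses the exact nearest-rank rule rather than MongoDB's approximation.

```javascript
// Hypothetical helper: keep only numeric elements, then take an exact
// nearest-rank median (not MongoDB's t-digest approximation).
function medianOfArray(input) {
  // Non-numeric values (strings, null, objects) are ignored.
  // NaN is also dropped here for simplicity; MongoDB's NaN handling is not modeled.
  const numbers = input.filter((v) => typeof v === "number" && !Number.isNaN(v));
  if (numbers.length === 0) return null; // no numeric samples in this sketch
  numbers.sort((a, b) => a - b);
  // 1-based nearest rank for the 50th percentile is ceil(0.5 * n).
  return numbers[Math.ceil(0.5 * numbers.length) - 1];
}

console.log(medianOfArray([62, "absent", 81, null, 80])); // 80
```

Only 62, 81, and 80 are considered; sorted, they are 62, 80, 81, so the middle value 80 is returned.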

Window Functions

A window function lets you calculate results over a moving "window" of neighboring documents. As each document passes through the pipeline, the $setWindowFields stage:

  • Recomputes the set of documents in the current window
  • Calculates a value over all documents in that set
  • Returns a single value for the current document
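The three steps above can be sketched in plain JavaScript, outside MongoDB. The helper names windowMedians and medianNearestRank and the inWindow predicate are hypothetical; the median is an exact nearest-rank median rather than the t-digest approximation, and the input documents are assumed to be already in sort order.

```javascript
// Exact nearest-rank median (assumption; not MongoDB's approximate method).
function medianNearestRank(values) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.ceil(0.5 * sorted.length) - 1];
}

// inWindow(current, candidate, i, j) decides whether the document at index j
// belongs to the window of the document at index i (hypothetical predicate).
function windowMedians(docs, field, inWindow) {
  return docs.map((current, i) => {
    // Step 1: recompute the set of documents in the current window.
    const windowDocs = docs.filter((candidate, j) => inWindow(current, candidate, i, j));
    // Step 2: calculate a value over all documents in that set.
    const value = medianNearestRank(windowDocs.map((d) => d[field]));
    // Step 3: return a single value for the current document.
    return { ...current, [`${field}_median`]: value };
  });
}

// Usage: a cumulative window (every document up to and including the current one).
const docs = [{ x: 5 }, { x: 1 }, { x: 9 }, { x: 3 }];
const result = windowMedians(docs, "x", (_current, _candidate, i, j) => j <= i);
console.log(result.map((d) => d.x_median)); // [ 5, 1, 5, 3 ]
```

Passing the window as a predicate keeps the three steps separate: only step 1 changes when a different window definition is used.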

You can use $median in a $setWindowFields stage to calculate rolling statistics for time series or other related data.

When you use $median in a $setWindowFields stage, the input value must be a field name. If you enter an array instead of a field name, the operation fails.

Examples

The following examples use the testScores collection. Create the collection:

db.testScores.insertMany( [
   { studentId: "2345", test01: 62, test02: 81, test03: 80 },
   { studentId: "2356", test01: 60, test02: 83, test03: 79 },
   { studentId: "2358", test01: 67, test02: 82, test03: 78 },
   { studentId: "2367", test01: 64, test02: 72, test03: 77 },
   { studentId: "2369", test01: 60, test02: 53, test03: 72 }
] )

Use $median as an Accumulator

Create an accumulator that calculates the median value of test01:

db.testScores.aggregate( [
   {
      $group: {
         _id: null,
         test01_median: {
            $median: {
               input: "$test01",
               method: 'approximate'
            }
         }
      }
   }
] )

Output:

{ _id: null, test01_median: 62 }

Because the _id field value is null, $group places all the documents in the collection into a single group.

The $median accumulator takes its input from the test01 field and calculates the median value for that field, 62 in this example.
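To see where 62 comes from, the grouping can be reproduced in plain JavaScript without a database. This sketch assumes an exact nearest-rank median (the ceil(0.5 * n)-th smallest value); MongoDB's approximate method may differ on other data sets, and medianNearestRank is a hypothetical helper.

```javascript
// The five documents inserted into testScores above.
const testScores = [
  { studentId: "2345", test01: 62, test02: 81, test03: 80 },
  { studentId: "2356", test01: 60, test02: 83, test03: 79 },
  { studentId: "2358", test01: 67, test02: 82, test03: 78 },
  { studentId: "2367", test01: 64, test02: 72, test03: 77 },
  { studentId: "2369", test01: 60, test02: 53, test03: 72 },
];

// Exact nearest-rank median (assumption, not MongoDB's t-digest method).
function medianNearestRank(values) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  // 1-based nearest rank for the 50th percentile is ceil(0.5 * n).
  return sorted[Math.ceil(0.5 * sorted.length) - 1];
}

// _id: null puts every document in one group, so there is a single result.
const groupResult = {
  _id: null,
  test01_median: medianNearestRank(testScores.map((d) => d.test01)),
};
console.log(groupResult); // { _id: null, test01_median: 62 }
```

Sorted, the test01 values are 60, 60, 62, 64, 67, so the third value, 62, is the median.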

Use $median in a $project Stage$project阶段使用$median

In a $group stage, $median is an accumulator and calculates a single value for all documents. In a $project stage, $median is an aggregation expression and calculates values for each document.$group阶段,$median是一个累加器,为所有文档计算一个值。在$project阶段,$median是一个聚合表达式,用于计算每个文档的值。

You can use a field name or an array as input in a $project stage.您可以在$project阶段使用字段名或数组作为输入。

db.testScores.aggregate( [
   {
      $project: {
         _id: 0,
         studentId: 1,
         testMedians: {
            $median: {
               input: [ "$test01", "$test02", "$test03" ],
               method: 'approximate'
            }
         }
      }
   }
] )

Output:

{ studentId: '2345', testMedians: 80 },
{ studentId: '2356', testMedians: 79 },
{ studentId: '2358', testMedians: 78 },
{ studentId: '2367', testMedians: 72 },
{ studentId: '2369', testMedians: 60 }

When $median is an aggregation expression, there is a result for each studentId.
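The per-document behavior can be reproduced in plain JavaScript as a check on the output above. This is a sketch assuming an exact nearest-rank median of each student's three scores; medianNearestRank is a hypothetical helper, not part of MongoDB.

```javascript
// The five documents inserted into testScores above.
const testScores = [
  { studentId: "2345", test01: 62, test02: 81, test03: 80 },
  { studentId: "2356", test01: 60, test02: 83, test03: 79 },
  { studentId: "2358", test01: 67, test02: 82, test03: 78 },
  { studentId: "2367", test01: 64, test02: 72, test03: 77 },
  { studentId: "2369", test01: 60, test02: 53, test03: 72 },
];

// Exact nearest-rank median (assumption, not MongoDB's t-digest method).
function medianNearestRank(values) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.ceil(0.5 * sorted.length) - 1];
}

// Like the $project stage: one result per document, computed from that
// document's own array of three test scores.
const projected = testScores.map((d) => ({
  studentId: d.studentId,
  testMedians: medianNearestRank([d.test01, d.test02, d.test03]),
}));
projected.forEach((d) => console.log(d));
```

Run with Node.js, this prints the same five documents as the output above, one per line.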

Use $median in a $setWindowField Stage$setWindowField阶段中使用$median

To base your percentile values on local data trends, use $median in a $setWindowField aggregation pipeline stage.要将百分位值基于本地数据趋势,请在$setWindowField聚合管道阶段使用$mediate

This example creates a window to filter scores:此示例创建了一个窗口来筛选分数:

db.testScores.aggregate( [
   {
      $setWindowFields: {
         sortBy: { test01: 1 },
         output: {
            test01_median: {
               $median: {
                  input: "$test01",
                  method: 'approximate'
               },
               window: {
                  range: [ -3, 3 ]
               }
            }
         }
      }
   },
   {
      $project: {
         _id: 0,
         studentId: 1,
         test01_median: 1
      }
   }
] )

Output:

{ studentId: '2356', test01_median: 60 },
{ studentId: '2369', test01_median: 60 },
{ studentId: '2345', test01_median: 60 },
{ studentId: '2367', test01_median: 64 },
{ studentId: '2358', test01_median: 64 }

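In this example, range: [ -3, 3 ] defines a range-based window: the median for each document is calculated over all documents whose test01 value is within 3 points of that document's test01 value. For example, the window for student 2345 (test01: 62) contains the test01 scores 60, 60, 62, and 64.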
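The range-based window can be reproduced in plain JavaScript as a check on the output above. This is a sketch under two assumptions: the median is an exact nearest-rank median (the ceil(0.5 * n)-th smallest value, which for an even count picks the lower middle value), and documents with equal test01 values keep their insertion order after sorting. medianNearestRank is a hypothetical helper.

```javascript
// The five documents inserted into testScores above.
const testScores = [
  { studentId: "2345", test01: 62, test02: 81, test03: 80 },
  { studentId: "2356", test01: 60, test02: 83, test03: 79 },
  { studentId: "2358", test01: 67, test02: 82, test03: 78 },
  { studentId: "2367", test01: 64, test02: 72, test03: 77 },
  { studentId: "2369", test01: 60, test02: 53, test03: 72 },
];

// Exact nearest-rank median (assumption, not MongoDB's t-digest method).
function medianNearestRank(values) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.ceil(0.5 * sorted.length) - 1];
}

// sortBy: { test01: 1 }. Array.prototype.sort is stable, so ties keep insertion order.
const sortedDocs = [...testScores].sort((a, b) => a.test01 - b.test01);

const windowed = sortedDocs.map((current) => {
  // range: [ -3, 3 ] selects documents whose test01 lies in
  // [current.test01 - 3, current.test01 + 3], regardless of their position.
  const inRange = sortedDocs.filter(
    (d) => d.test01 >= current.test01 - 3 && d.test01 <= current.test01 + 3
  );
  return {
    studentId: current.studentId,
    test01_median: medianNearestRank(inRange.map((d) => d.test01)),
  };
});
windowed.forEach((d) => console.log(d));
```

Run with Node.js, this prints the same five results, in the same order, as the output above. Because the window is defined by test01 values rather than by document positions, documents with nearby scores share a window even when they are not adjacent.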

Learn More

The $percentile operator is a more general version of the $median operator that lets you specify one or more percentile values.

For more information on window functions, see $setWindowFields.