$median (aggregation)

Definition

$median

New in version 7.0.

Returns an approximation of the median, the 50th percentile, as a scalar value.

You can use $median as an accumulator in the $group stage or as an aggregation expression.

Syntax

The syntax for $median is:

{
   $median: {
      input: <number>,
      method: <string>
   }
}

Command Fields

$median takes the following fields:

Field | Type | Necessity | Description
input | Expression | Required | $median calculates the 50th percentile value of this data. input must be a field name or an expression that evaluates to a numeric type. If the expression cannot be converted to a numeric type, the $median calculation ignores it.
method | String | Required | The method that mongod uses to calculate the 50th percentile value. The method must be 'approximate'.

Behavior

You can use $median in:

  • $group stages as an accumulator
  • $setWindowFields stages as an accumulator
  • $project stages as an aggregation expression

As an accumulator, $median:

  • Calculates a single result for all the documents in the stage.
  • Uses the t-digest algorithm to calculate approximate, percentile-based metrics.
  • Uses approximate methods to scale to large volumes of data.

As an aggregation expression, $median:

  • Accepts an array as input.
  • Calculates a separate result for each input document.

Type of Operation

In a $group stage, $median is an accumulator and calculates a value for all documents in the group.

In a $project stage, $median is an aggregation expression and calculates values for each document.

In $setWindowFields stages, $median returns a result for each document like an aggregation expression, but the results are computed over groups of documents like an accumulator.

Calculation Considerations

In $group stages, $median always uses an approximate calculation method.

In $project stages, $median might use the discrete calculation method even when the approximate method is specified.

In $setWindowFields stages, the workload determines the calculation method that $median uses.

The computed percentiles $median returns might vary, even on the same datasets. This is because the algorithm calculates approximate values.

Duplicate samples can cause ambiguity. If there are a large number of duplicates, the percentile values may not represent the actual sample distribution. Consider a data set where all the samples are the same. All of the values in the data set fall at or below any percentile. A "50th percentile" value would actually represent either 0 or 100 percent of the samples.
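The ambiguity caused by duplicates can be seen with a few lines of plain JavaScript. The following is a minimal sketch that runs without a MongoDB connection. The helper name rankRange and the sample array are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper: reports the fraction of samples strictly below a
// value and the fraction at or below it. Not part of MongoDB.
function rankRange(samples, value) {
  const below = samples.filter((x) => x < value).length;
  const atOrBelow = samples.filter((x) => x <= value).length;
  return {
    below: below / samples.length,
    atOrBelow: atOrBelow / samples.length,
  };
}

// A data set where every sample is the same.
const allSame = [70, 70, 70, 70, 70];
const duplicateRank = rankRange(allSame, 70);

console.log(duplicateRank); // { below: 0, atOrBelow: 1 }
```

The only candidate median, 70, has none of the samples below it and all of the samples at or below it, so it does not split the data in half.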

Array Input

If you use $median as an aggregation expression in a $project stage, you can use an array as input. $median ignores non-numeric array values.

The syntax is:

{
   $median: {
      input: [ <expression1>, <expression2>, ..., <expressionN> ],
      method: <string>
   }
}
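To illustrate how non-numeric array values are skipped, the following plain JavaScript sketch emulates a discrete median over an array. It is not MongoDB's implementation: the helper name discreteMedian, the choice to return null for an array with no numeric values, and the choice of the lower of the two middle values for even-length input are assumptions made for this sketch.

```javascript
// Hypothetical helper: emulates a discrete median over an array while
// skipping non-numeric entries. Not MongoDB's implementation.
function discreteMedian(values) {
  const nums = values
    .filter((v) => typeof v === "number" && !Number.isNaN(v))
    .sort((a, b) => a - b);
  if (nums.length === 0) return null; // assumption: no numeric input yields null
  // Index of the 50th percentile rank; picks the lower middle value
  // when the count is even.
  const index = Math.ceil(nums.length * 0.5) - 1;
  return nums[index];
}

console.log(discreteMedian([62, 81, 80])); // 80
console.log(discreteMedian([60, "absent", 53, null, 72])); // 60
```

In the second call, the string and the null are dropped before the median is taken, so the result is the median of 60, 53, and 72.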

Window Functions

A window function lets you calculate results over a moving "window" of neighboring documents. As each document passes through the pipeline, the $setWindowFields stage:

  • Recomputes the set of documents in the current window
  • Calculates a value for all documents in the set
  • Returns a single value for that document

You can use $median in a $setWindowFields stage to calculate rolling statistics for time series or other related data.

When you use $median in a $setWindowFields stage, the input value must be a field name. If you enter an array instead of a field name, the operation fails.

Examples

The following examples use the testScores collection. Create the collection:

db.testScores.insertMany( [
{ studentId: "2345", test01: 62, test02: 81, test03: 80 },
{ studentId: "2356", test01: 60, test02: 83, test03: 79 },
{ studentId: "2358", test01: 67, test02: 82, test03: 78 },
{ studentId: "2367", test01: 64, test02: 72, test03: 77 },
{ studentId: "2369", test01: 60, test02: 53, test03: 72 }
] )

Use $median as an Accumulator

Create an accumulator that calculates the median value:

db.testScores.aggregate( [
   {
      $group: {
         _id: null,
         test01_median: {
            $median: {
               input: "$test01",
               method: 'approximate'
            }
         }
      }
   }
] )

Output:

{ _id: null, test01_median: 62 }

The _id field value is null so $group selects all the documents in the collection.

The $median accumulator takes its input from the test01 field. $median calculates the median value for the field, 62 in this example.

Use $median in a $project Stage

In a $group stage, $median is an accumulator and calculates a single value for all documents. In a $project stage, $median is an aggregation expression and calculates values for each document.

You can use a field name or an array as input in a $project stage.

db.testScores.aggregate( [
   {
      $project: {
         _id: 0,
         studentId: 1,
         testMedians: {
            $median: {
               input: [ "$test01", "$test02", "$test03" ],
               method: 'approximate'
            }
         }
      }
   }
] )

Output:

{ studentId: '2345', testMedians: 80 },
{ studentId: '2356', testMedians: 79 },
{ studentId: '2358', testMedians: 78 },
{ studentId: '2367', testMedians: 72 },
{ studentId: '2369', testMedians: 60 }

When $median is an aggregation expression, there is a result for each studentId.

Use $median in a $setWindowFields Stage

To base your percentile values on local data trends, use $median in a $setWindowFields aggregation pipeline stage.

This example computes the median over a window of test01 scores:

db.testScores.aggregate( [
   {
      $setWindowFields: {
         sortBy: { test01: 1 },
         output: {
            test01_median: {
               $median: {
                  input: "$test01",
                  method: 'approximate'
               },
               window: {
                  range: [ -3, 3 ]
               }
            }
         }
      }
   },
   {
      $project: {
         _id: 0,
         studentId: 1,
         test01_median: 1
      }
   }
] )

Output:

{ studentId: '2356', test01_median: 60 },
{ studentId: '2369', test01_median: 60 },
{ studentId: '2345', test01_median: 60 },
{ studentId: '2367', test01_median: 64 },
{ studentId: '2358', test01_median: 64 }

In this example, the window is defined with range: [ -3, 3 ], so the median for each document is calculated over the documents whose test01 value is within 3 of that document's test01 value.
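The output above can be reproduced outside the database to see how a range window behaves. The following plain JavaScript sketch assumes that range: [ -3, 3 ] selects documents whose sortBy value lies within 3 of the current document's value, and that the median of an even number of values is the lower of the two middle values. The helper names lowerMedian and rangeWindowMedian are hypothetical, and the sketch is an emulation, not MongoDB's implementation.

```javascript
// The test01 scores from the testScores collection on this page.
const testScores = [
  { studentId: "2345", test01: 62 },
  { studentId: "2356", test01: 60 },
  { studentId: "2358", test01: 67 },
  { studentId: "2367", test01: 64 },
  { studentId: "2369", test01: 60 },
];

// Hypothetical helper: lower middle value of a non-empty numeric array.
function lowerMedian(nums) {
  const sorted = [...nums].sort((a, b) => a - b);
  return sorted[Math.ceil(sorted.length / 2) - 1];
}

// Hypothetical helper: for each document, takes the median of `field`
// over all documents whose value is within [lower, upper] of its own.
function rangeWindowMedian(docs, field, lower, upper) {
  const sorted = [...docs].sort((a, b) => a[field] - b[field]);
  return sorted.map((doc) => {
    const inWindow = sorted
      .map((d) => d[field])
      .filter((v) => v >= doc[field] + lower && v <= doc[field] + upper);
    return { studentId: doc.studentId, [`${field}_median`]: lowerMedian(inWindow) };
  });
}

const windowed = rangeWindowMedian(testScores, "test01", -3, 3);
console.log(windowed);
```

For the document with test01: 62, the window holds the values 60, 60, 62, and 64, and the sketch returns 60, which matches the result shown for studentId 2345.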

Learn More

The $percentile operator is a more general version of the $median operator that allows you to set one or more percentile values.

For more information on window functions, see: $setWindowFields.
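As a sketch of that relationship, the two pipeline definitions below request the 50th percentile of test01, first with $median and then with $percentile and p: [ 0.5 ]. The variable names and output field names are chosen for illustration only. The block only builds the pipeline arrays and does not connect to a database.

```javascript
// Pipeline that uses $median.
const medianPipeline = [
  {
    $group: {
      _id: null,
      test01_median: {
        $median: { input: "$test01", method: "approximate" },
      },
    },
  },
];

// Pipeline that requests the same percentile through $percentile.
// p is an array, so more percentiles can be added, for example [ 0.5, 0.9 ].
const percentilePipeline = [
  {
    $group: {
      _id: null,
      test01_percentiles: {
        $percentile: { input: "$test01", p: [0.5], method: "approximate" },
      },
    },
  },
];

// In mongosh, either pipeline could be run with, for example:
// db.testScores.aggregate( percentilePipeline )
```

Note that $percentile returns an array with one element for each value in p, while $median returns a scalar.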