Aggregation Pipeline Optimization

Aggregation pipeline operations have an optimization phase that attempts to reshape the pipeline for improved performance.

To see how the optimizer transforms a particular aggregation pipeline, include the explain option in the db.collection.aggregate() method.

Optimizations are subject to change between releases.

In addition to the optimizations performed during the optimization phase, this page shows how to improve aggregation pipeline performance using indexes and document filters. See Improve Performance with Indexes and Document Filters.

Projection Optimization

The aggregation pipeline can determine whether it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline uses only those required fields, reducing the amount of data passing through the pipeline.

Pipeline Sequence Optimization

($project or $unset or $addFields or $set) + $match Sequence Optimization

For an aggregation pipeline that contains a projection stage ($project or $unset or $addFields or $set) followed by a $match stage, MongoDB moves any filters in the $match stage that do not require values computed in the projection stage to a new $match stage before the projection.

If an aggregation pipeline contains multiple projection and/or $match stages, MongoDB performs this optimization for each $match stage, moving each $match filter before all projection stages that the filter does not depend on.

Consider a pipeline of the following stages:

{ $addFields: {
    maxTime: { $max: "$times" },
    minTime: { $min: "$times" }
} },
{ $project: {
    _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
    avgTime: { $avg: ["$maxTime", "$minTime"] }
} },
{ $match: {
    name: "Joe Schmoe",
    maxTime: { $lt: 20 },
    minTime: { $gt: 5 },
    avgTime: { $gt: 7 }
} }

The optimizer breaks up the $match stage into four individual filters, one for each key in the $match query document. The optimizer then moves each filter before as many projection stages as possible, creating new $match stages as needed. Given this example, the optimizer produces the following optimized pipeline:

{ $match: { name: "Joe Schmoe" } },
{ $addFields: {
    maxTime: { $max: "$times" },
    minTime: { $min: "$times" }
} },
{ $match: { maxTime: { $lt: 20 }, minTime: { $gt: 5 } } },
{ $project: {
    _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
    avgTime: { $avg: ["$maxTime", "$minTime"] }
} },
{ $match: { avgTime: { $gt: 7 } } }

The $match filter { avgTime: { $gt: 7 } } depends on the $project stage to compute the avgTime field. The $project stage is the last projection stage in this pipeline, so the $match filter on avgTime could not be moved.

The maxTime and minTime fields are computed in the $addFields stage but have no dependency on the $project stage. The optimizer created a new $match stage for the filters on these fields and placed it before the $project stage.

The $match filter { name: "Joe Schmoe" } does not use any values computed in either the $project or $addFields stages, so it was moved to a new $match stage before both of the projection stages.

Note

After optimization, the filter { name: "Joe Schmoe" } is in a $match stage at the beginning of the pipeline. This has the added benefit of allowing the aggregation to use an index on the name field when initially querying the collection. See Improve Performance with Indexes and Document Filters for more information.
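
The filter placement described in this section can be sketched as a toy model in plain JavaScript. This is not MongoDB's optimizer: the placeFilters function and the computes lists are hypothetical names introduced for illustration, and the model assumes top-level field names only and ignores that a $project stage can also drop fields.

```javascript
// Toy model: decide how far toward the front of the pipeline each $match
// filter can move. Illustration only, not MongoDB's implementation.

// Fields that each projection stage computes (taken from the example above).
const projectionStages = [
  { name: "$addFields", computes: ["maxTime", "minTime"] },
  { name: "$project", computes: ["avgTime"] },
];

const matchFilter = {
  name: "Joe Schmoe",
  maxTime: { $lt: 20 },
  minTime: { $gt: 5 },
  avgTime: { $gt: 7 },
};

// Returns one filter document per slot:
// slot 0 is before all stages, slot i is directly after stage i.
function placeFilters(stages, filter) {
  const slots = Array.from({ length: stages.length + 1 }, () => ({}));
  for (const [field, condition] of Object.entries(filter)) {
    // A filter must stay after the last stage that computes its field.
    let slot = 0;
    stages.forEach((stage, i) => {
      if (stage.computes.includes(field)) slot = i + 1;
    });
    slots[slot][field] = condition;
  }
  return slots;
}

const placed = placeFilters(projectionStages, matchFilter);
console.log(JSON.stringify(placed, null, 2));
```

In this sketch, slot 0 receives the name filter, slot 1 (after $addFields) receives the maxTime and minTime filters, and slot 2 (after $project) keeps the avgTime filter, which mirrors the optimized pipeline shown earlier.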

$sort + $match Sequence Optimization

When you have a sequence with $sort followed by a $match, the $match moves before the $sort to minimize the number of objects to sort. For example, if the pipeline consists of the following stages:

{ $sort: { age : -1 } },
{ $match: { status: 'A' } }

During the optimization phase, the optimizer transforms the sequence to the following:

{ $match: { status: 'A' } },
{ $sort: { age : -1 } }
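
The reason the swap is safe can be illustrated with plain JavaScript arrays standing in for a collection. The sample documents are made up; only the age and status field names come from the example above.

```javascript
// Made-up sample documents; array methods stand in for pipeline stages.
const docs = [
  { name: "a", age: 31, status: "A" },
  { name: "b", age: 45, status: "B" },
  { name: "c", age: 27, status: "A" },
  { name: "d", age: 52, status: "A" },
];

const byAgeDescending = (x, y) => y.age - x.age;
const isStatusA = (doc) => doc.status === "A";

// Original order: sort all four documents, then filter.
const sortThenMatch = [...docs].sort(byAgeDescending).filter(isStatusA);

// Optimized order: filter first, so only three documents are sorted.
const matchThenSort = docs.filter(isStatusA).sort(byAgeDescending);

console.log(sortThenMatch.map((d) => d.name)); // [ 'd', 'a', 'c' ]
console.log(matchThenSort.map((d) => d.name)); // [ 'd', 'a', 'c' ]
```

Both orders return the same documents in the same order, but the second one sorts fewer of them.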

$redact + $match Sequence Optimization

When the pipeline has a $redact stage immediately followed by a $match stage, the aggregation can sometimes add a portion of the $match stage before the $redact stage. If the added $match stage is at the start of the pipeline, the aggregation can use an index as well as query the collection to limit the number of documents that enter the pipeline. See Improve Performance with Indexes and Document Filters for more information.

For example, if the pipeline consists of the following stages:

{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }

The optimizer can add a $match stage containing a portion of the original filter before the $redact stage:

{ $match: { year: 2014 } },
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }

$project/$unset + $skip Sequence Optimization

When you have a sequence with $project or $unset followed by $skip, the $skip moves before the $project or $unset. For example, if the pipeline consists of the following stages:

{ $sort: { age : -1 } },
{ $project: { status: 1, name: 1 } },
{ $skip: 5 }

During the optimization phase, the optimizer transforms the sequence to the following:

{ $sort: { age : -1 } },
{ $skip: 5 },
{ $project: { status: 1, name: 1 } }
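
As a rough illustration with plain JavaScript arrays (made-up documents, not a MongoDB call), skipping before projecting yields the same output while projecting fewer documents. The project helper below is a hypothetical stand-in for the $project stage in the example.

```javascript
// Eight made-up documents, already sorted by age in descending order.
const sortedDocs = Array.from({ length: 8 }, (_, i) => ({
  _id: i,
  name: `user${i}`,
  status: i % 2 === 0 ? "A" : "B",
  age: 60 - i,
  email: `user${i}@example.com`,
}));

// Stand-in for { $project: { status: 1, name: 1 } }, which also keeps _id.
const project = ({ _id, status, name }) => ({ _id, status, name });

const projectThenSkip = sortedDocs.map(project).slice(5); // projects 8 documents
const skipThenProject = sortedDocs.slice(5).map(project); // projects 3 documents

console.log(JSON.stringify(projectThenSkip) === JSON.stringify(skipThenProject)); // true
```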

Pipeline Coalescence Optimization

When possible, the optimization phase coalesces a pipeline stage into its predecessor. Generally, coalescence occurs after any sequence reordering optimization.

$sort + $limit Coalescence

Changed in version 4.0.

When a $sort precedes a $limit, the optimizer can coalesce the $limit into the $sort if no intervening stages modify the number of documents (e.g. $unwind, $group). If any stage between the $sort and the $limit changes the number of documents, MongoDB does not coalesce the $limit into the $sort.

For example, if the pipeline consists of the following stages:

{ $sort : { age : -1 } },
{ $project : { age : 1, status : 1, name : 1 } },
{ $limit: 5 }

During the optimization phase, the optimizer coalesces the sequence to the following:

{
    "$sort" : {
       "sortKey" : {
          "age" : -1
       },
       "limit" : NumberLong(5)
    }
},
{ "$project" : {
         "age" : 1,
         "status" : 1,
         "name" : 1
  }
}

This allows the sort operation to only maintain the top n results as it progresses, where n is the specified limit, and MongoDB only needs to store n items in memory [1]. See $sort Operator and Memory for more information.
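
The idea of keeping only the top n results during a sort can be sketched in plain JavaScript. This is a minimal illustration of a bounded top-n scan, not a description of how MongoDB implements its sort; topNByAgeDescending and the sample ages are hypothetical.

```javascript
// Keep only the n documents with the largest age while scanning the input once.
function topNByAgeDescending(docs, n) {
  const top = []; // always sorted by age descending, never longer than n
  for (const doc of docs) {
    // Find the position where doc belongs among the current top results.
    let i = top.length;
    while (i > 0 && top[i - 1].age < doc.age) i--;
    if (i < n) {
      top.splice(i, 0, doc);
      if (top.length > n) top.pop(); // drop the document that fell out of the top n
    }
  }
  return top;
}

const ages = [34, 71, 18, 52, 65, 29, 80, 47];
const docs = ages.map((age, i) => ({ _id: i, age }));

console.log(topNByAgeDescending(docs, 5).map((d) => d.age)); // [ 80, 71, 65, 52, 47 ]
```

At no point does the top array hold more than n documents, which is the memory benefit the coalesced $sort and $limit aim for.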

Note
Sequence Optimization with $skip

If there is a $skip stage between the $sort and $limit stages, MongoDB will coalesce the $limit into the $sort stage and increase the $limit value by the $skip amount. See $sort + $skip + $limit Sequence for an example.

[1] The optimization will still apply when allowDiskUse is true and the n items exceed the aggregation memory limit.

$limit + $limit Coalescence

When a $limit immediately follows another $limit, the two stages can coalesce into a single $limit where the limit amount is the smaller of the two initial limit amounts. For example, a pipeline contains the following sequence:

{ $limit: 100 },
{ $limit: 10 }

Then the second $limit stage can coalesce into the first $limit stage, resulting in a single $limit stage where the limit amount 10 is the minimum of the two initial limits 100 and 10:

{ $limit: 10 }
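
A minimal sketch of this rule in plain JavaScript, operating on a pipeline represented as an array of stage objects. The coalesceLimits function is a hypothetical helper for illustration, not part of any MongoDB API.

```javascript
// Merge runs of adjacent $limit stages into one stage with the smaller limit.
function coalesceLimits(pipeline) {
  const result = [];
  for (const stage of pipeline) {
    const previous = result[result.length - 1];
    if (previous && "$limit" in previous && "$limit" in stage) {
      previous.$limit = Math.min(previous.$limit, stage.$limit);
    } else {
      result.push({ ...stage }); // copy so the input pipeline is not mutated
    }
  }
  return result;
}

console.log(JSON.stringify(coalesceLimits([{ $limit: 100 }, { $limit: 10 }]))); // [{"$limit":10}]
```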

$skip + $skip Coalescence

When a $skip immediately follows another $skip, the two stages can coalesce into a single $skip where the skip amount is the sum of the two initial skip amounts. For example, a pipeline contains the following sequence:

{ $skip: 5 },
{ $skip: 2 }

Then the second $skip stage can coalesce into the first $skip stage, resulting in a single $skip stage where the skip amount 7 is the sum of the two initial skip amounts 5 and 2:

{ $skip: 7 }
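
The same rule can be sketched in plain JavaScript, with a pipeline represented as an array of stage objects and Array.prototype.slice standing in for $skip. The coalesceSkips function is a hypothetical helper for illustration only.

```javascript
// Merge runs of adjacent $skip stages into one stage that skips the sum.
function coalesceSkips(pipeline) {
  const result = [];
  for (const stage of pipeline) {
    const previous = result[result.length - 1];
    if (previous && "$skip" in previous && "$skip" in stage) {
      previous.$skip += stage.$skip;
    } else {
      result.push({ ...stage }); // copy so the input pipeline is not mutated
    }
  }
  return result;
}

console.log(JSON.stringify(coalesceSkips([{ $skip: 5 }, { $skip: 2 }]))); // [{"$skip":7}]

// Skipping 5 and then 2 leaves the same items as skipping 7 at once.
const items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
console.log(items.slice(5).slice(2), items.slice(7)); // [ 8, 9, 10 ] [ 8, 9, 10 ]
```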

$match + $match Coalescence

When a $match immediately follows another $match, the two stages can coalesce into a single $match combining the conditions with an $and. For example, a pipeline contains the following sequence:

{ $match: { year: 2014 } },
{ $match: { status: "A" } }

Then the second $match stage can coalesce into the first $match stage, resulting in a single $match stage:

{ $match: { $and: [ { "year" : 2014 }, { "status" : "A" } ] } }
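
To see why the coalesced stage is equivalent, the following plain JavaScript sketch filters made-up documents both ways. The matches helper is hypothetical and handles equality conditions only, which is far less than a real $match supports.

```javascript
// Equality-only matcher: enough for this example, not a real $match.
const matches = (doc, query) =>
  Object.entries(query).every(([field, value]) => doc[field] === value);

const first = { year: 2014 };
const second = { status: "A" };
const coalesced = { $and: [first, second] };

const docs = [
  { _id: 1, year: 2014, status: "A" },
  { _id: 2, year: 2014, status: "B" },
  { _id: 3, year: 2013, status: "A" },
];

// Two consecutive $match stages.
const twoStages = docs
  .filter((d) => matches(d, first))
  .filter((d) => matches(d, second));

// One coalesced stage: a document passes only if every $and branch matches.
const oneStage = docs.filter((d) => coalesced.$and.every((q) => matches(d, q)));

console.log(twoStages.map((d) => d._id), oneStage.map((d) => d._id)); // [ 1 ] [ 1 ]
```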

$lookup + $unwind Coalescence

When a $unwind immediately follows a $lookup, and the $unwind operates on the as field of the $lookup, the optimizer can coalesce the $unwind into the $lookup stage. This avoids creating large intermediate documents.

For example, a pipeline contains the following sequence:

{
  $lookup: {
    from: "otherCollection",
    as: "resultingArray",
    localField: "x",
    foreignField: "y"
  }
},
{ $unwind: "$resultingArray"}

The optimizer can coalesce the $unwind stage into the $lookup stage. If you run the aggregation with the explain option, the explain output shows the coalesced stage:

{
  $lookup: {
    from: "otherCollection",
    as: "resultingArray",
    localField: "x",
    foreignField: "y",
    unwinding: { preserveNullAndEmptyArrays: false }
  }
}

$group Optimization

New in version 5.2.

Starting in MongoDB 5.2, MongoDB uses the slot-based execution query engine to execute $group stages when $group is either:

  • The first stage in the pipeline.
  • Part of a series of stages executed by the slot-based engine that occurs at the beginning of the pipeline. For example, if a pipeline begins with $match followed by $group, the $match and $group stages are executed by the slot-based engine.

In most cases, the slot-based engine provides improved performance and lower CPU and memory costs compared to the classic query engine.

To verify that the slot-based engine is used, run the aggregation with the explain option. This option outputs information on the aggregation's query plan.

When the slot-based query execution engine is used for $group, the explain results include:

  • explain.explainVersion: '2'
  • explain.queryPlanner.winningPlan.queryPlan.stage: "GROUP"
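
The two markers listed above could be checked programmatically along these lines. This is a sketch under assumptions: usesSlotBasedGroup is a hypothetical helper, the objects passed to it are hand-written mocks rather than real server output, and the actual shape of explain results can vary between releases and deployments.

```javascript
// Returns true when an explain result carries the two markers listed above.
function usesSlotBasedGroup(explain) {
  return (
    explain?.explainVersion === "2" &&
    explain?.queryPlanner?.winningPlan?.queryPlan?.stage === "GROUP"
  );
}

// Hand-written mock objects for illustration, not real explain output.
const mockSlotBased = {
  explainVersion: "2",
  queryPlanner: { winningPlan: { queryPlan: { stage: "GROUP" } } },
};
const mockClassic = {
  explainVersion: "1",
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
};

console.log(usesSlotBasedGroup(mockSlotBased)); // true
console.log(usesSlotBasedGroup(mockClassic)); // false
```

In practice the argument would be the document returned when the aggregation is run with the explain option.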

Improve Performance with Indexes and Document Filters

The following sections show how you can improve aggregation performance using indexes and document filters.

Indexes

The query planner analyzes an aggregation pipeline to determine if indexes can be used to improve pipeline performance.

The following list shows some pipeline stages that can use indexes:

$match stage
$match can use an index to filter documents if $match is the first stage in a pipeline.
$sort stage
$sort can use an index if $sort is not preceded by a $project, $unwind, or $group stage.
$group stage

$group can potentially use an index to find the first document in each group if:

  • $group is preceded by $sort that sorts the field to group by, and
  • there is an index on the grouped field that matches the sort order, and
  • $first is the only accumulator in $group.

See $group Performance Optimizations for an example.

$geoNear stage
$geoNear can use a geospatial index. $geoNear must be the first stage in an aggregation pipeline.

Starting in MongoDB 4.2, in some cases, an aggregation pipeline can use a DISTINCT_SCAN index plan, which typically has higher performance than IXSCAN.

Indexes can cover queries in an aggregation pipeline. A covered query returns all of its results from an index, without examining the documents themselves, and has high performance.

Document Filters

If your aggregation operation requires only a subset of the documents in a collection, filter the documents first:

  • Use the $match, $limit, and $skip stages to restrict the documents that enter the pipeline.
  • When possible, put $match at the beginning of the pipeline to use indexes that scan the matching documents in a collection.
  • $match followed by $sort at the start of the pipeline is equivalent to a single query with a sort, and can use an index.

Example

$sort + $skip + $limit Sequence

A pipeline contains a sequence of $sort followed by a $skip followed by a $limit:

{ $sort: { age : -1 } },
{ $skip: 10 },
{ $limit: 5 }

The optimizer performs $sort + $limit Coalescence to transform the sequence to the following:

{
   "$sort" : {
      "sortKey" : {
         "age" : -1
      },
      "limit" : NumberLong(15)
   }
},
{
   "$skip" : NumberLong(10)
}

With the reordering, MongoDB increases the $limit amount by the $skip amount, from 5 to 15.
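
The arithmetic behind the increased limit can be checked with plain JavaScript arrays, where slice stands in for $skip and $limit. The sample ages are made up for illustration, and both variants below sort the full array, so the sketch shows only that the results agree, not the memory saving.

```javascript
// Forty made-up documents with distinct ages.
const docs = Array.from({ length: 40 }, (_, i) => ({ _id: i, age: (i * 37) % 101 }));
const byAgeDescending = (x, y) => y.age - x.age;

// Original pipeline: sort, skip 10, then limit 5.
const original = [...docs].sort(byAgeDescending).slice(10).slice(0, 5);

// Optimized pipeline: sort with limit 10 + 5 = 15, then skip 10.
const optimized = [...docs].sort(byAgeDescending).slice(0, 15).slice(10);

console.log(JSON.stringify(original) === JSON.stringify(optimized)); // true
console.log(optimized.length); // 5
```

The limit has to grow by the skip amount because the 10 skipped documents must still be among those the sort retains.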
