Aggregation Pipeline Optimization

Aggregation pipeline operations have an optimization phase which attempts to reshape the pipeline for improved performance.

To see how the optimizer transforms a particular aggregation pipeline, include the explain option in the db.collection.aggregate() method.
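
As a minimal sketch of the explain option mentioned above, the following mongosh-style helper asks for the explain output of a pipeline instead of its results. The helper name explainPipeline and the collection name "orders" are assumptions made for illustration; db is assumed to be a mongosh database handle.

```javascript
// Hypothetical helper (not part of MongoDB): returns the explain output
// for a pipeline. `db` is assumed to be a mongosh database handle.
function explainPipeline(db, collName, pipeline) {
  // Passing { explain: true } asks the server to describe how it would
  // run the pipeline instead of returning the pipeline results.
  return db.getCollection(collName).aggregate(pipeline, { explain: true });
}

// Example call in mongosh (the collection name "orders" is an assumption):
// explainPipeline(db, "orders", [
//   { $sort: { age: -1 } },
//   { $match: { status: "A" } }
// ]);
```

Comparing the stages in the explain output with the stages you wrote shows which of the optimizations on this page were applied.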

Optimizations are subject to change between releases.

In addition to learning about the aggregation pipeline optimizations performed during the optimization phase, you will also see how to improve aggregation pipeline performance using indexes and document filters.

Projection Optimization

The aggregation pipeline can determine if it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline only uses those fields, reducing the amount of data passing through the pipeline.

$project Stage Placement

When you use a $project stage, it should typically be the last stage in your pipeline, used to specify which fields to return to the client.

Using a $project stage at the beginning or middle of a pipeline to reduce the number of fields passed to subsequent pipeline stages is unlikely to improve performance, as the database performs this optimization automatically.

Pipeline Sequence Optimization

($project or $unset or $addFields or $set) + $match Sequence Optimization

For an aggregation pipeline that contains a projection stage ($project or $unset or $addFields or $set) followed by a $match stage, MongoDB moves any filters in the $match stage that do not require values computed in the projection stage to a new $match stage before the projection.

If an aggregation pipeline contains multiple projection and/or $match stages, MongoDB performs this optimization for each $match stage, moving each $match filter before all projection stages that the filter does not depend on.

Consider a pipeline of the following stages:

{ $addFields: {
   maxTime: { $max: "$times" },
   minTime: { $min: "$times" }
} },
{ $project: {
   _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
   avgTime: { $avg: ["$maxTime", "$minTime"] }
} },
{ $match: {
   name: "Joe Schmoe",
   maxTime: { $lt: 20 },
   minTime: { $gt: 5 },
   avgTime: { $gt: 7 }
} }

The optimizer breaks up the $match stage into four individual filters, one for each key in the $match query document. The optimizer then moves each filter before as many projection stages as possible, creating new $match stages as needed. Given this example, the optimizer produces the following optimized pipeline:

{ $match: { name: "Joe Schmoe" } },
{ $addFields: {
   maxTime: { $max: "$times" },
   minTime: { $min: "$times" }
} },
{ $match: { maxTime: { $lt: 20 }, minTime: { $gt: 5 } } },
{ $project: {
   _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
   avgTime: { $avg: ["$maxTime", "$minTime"] }
} },
{ $match: { avgTime: { $gt: 7 } } }

The $match filter { avgTime: { $gt: 7 } } depends on the $project stage to compute the avgTime field. The $project stage is the last projection stage in this pipeline, so the $match filter on avgTime could not be moved.

The maxTime and minTime fields are computed in the $addFields stage but have no dependency on the $project stage. The optimizer created a new $match stage for the filters on these fields and placed it before the $project stage.

The $match filter { name: "Joe Schmoe" } does not use any values computed in either the $project or $addFields stages so it was moved to a new $match stage before both of the projection stages.

Note

After optimization, the filter { name: "Joe Schmoe" } is in a $match stage at the beginning of the pipeline. This has the added benefit of allowing the aggregation to use an index on the name field when initially querying the collection. See Improve Performance with Indexes and Document Filters for more information.
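
The filter-splitting behavior described in this section can be modeled in a few lines of plain JavaScript. This is only an illustrative sketch, not MongoDB's implementation: the function names computedFields and splitMatch are made up, only top-level field names are handled, only $addFields, $set, and $project are considered, and a field is treated as "computed" when its value in the stage is an expression rather than an inclusion flag such as 1.

```javascript
// Illustrative model only; this is not how MongoDB implements the optimizer.
// Returns the fields that a projection stage computes (assumption: a value
// that is an object or a string is an expression, 1/true is an inclusion).
function computedFields(stage) {
  const spec = stage.$addFields || stage.$set || stage.$project || {};
  return Object.keys(spec).filter((key) => {
    const value = spec[key];
    return typeof value === "object" || typeof value === "string";
  });
}

// Moves each $match filter before as many projection stages as possible.
function splitMatch(projections, match) {
  // slots[i] holds filters placed right before projections[i];
  // the last slot holds filters that must stay after every projection.
  const slots = projections.map(() => ({}));
  slots.push({});
  for (const [field, condition] of Object.entries(match.$match)) {
    let position = 0;
    projections.forEach((stage, i) => {
      if (computedFields(stage).includes(field)) position = i + 1;
    });
    slots[position][field] = condition;
  }
  const optimized = [];
  slots.forEach((filters, i) => {
    if (Object.keys(filters).length > 0) optimized.push({ $match: filters });
    if (i < projections.length) optimized.push(projections[i]);
  });
  return optimized;
}

const optimizedPipeline = splitMatch(
  [
    { $addFields: { maxTime: { $max: "$times" }, minTime: { $min: "$times" } } },
    { $project: { _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
                  avgTime: { $avg: ["$maxTime", "$minTime"] } } },
  ],
  { $match: { name: "Joe Schmoe", maxTime: { $lt: 20 },
              minTime: { $gt: 5 }, avgTime: { $gt: 7 } } }
);
// Prints the stage order: $match, $addFields, $match, $project, $match
console.log(optimizedPipeline.map((stage) => Object.keys(stage)[0]).join(", "));
```

Under these assumptions the model reproduces the stage order of the optimized pipeline shown in this section.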

$sort + $match Sequence Optimization

When you have a sequence with $sort followed by a $match, the $match moves before the $sort to minimize the number of objects to sort. For example, if the pipeline consists of the following stages:

{ $sort: { age : -1 } },
{ $match: { status: 'A' } }

During the optimization phase, the optimizer transforms the sequence to the following:

{ $match: { status: 'A' } },
{ $sort: { age : -1 } }
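
To see why this reordering is safe and useful, the following plain JavaScript sketch applies the same sort and filter to an in-memory array in both orders. The sample documents are invented for illustration, and array methods stand in for the pipeline stages.

```javascript
// Sample data invented for illustration.
const docs = [
  { name: "a", age: 30, status: "A" },
  { name: "b", age: 25, status: "B" },
  { name: "c", age: 41, status: "A" },
  { name: "d", age: 19, status: "B" },
];

const byAgeDescending = (x, y) => y.age - x.age; // stands in for { $sort: { age: -1 } }
const hasStatusA = (doc) => doc.status === "A";  // stands in for { $match: { status: 'A' } }

// Original order: sort all four documents, then filter.
const sortThenMatch = docs.slice().sort(byAgeDescending).filter(hasStatusA);

// Optimized order: filter first, so only two documents are sorted.
const matched = docs.filter(hasStatusA);
const matchThenSort = matched.slice().sort(byAgeDescending);

console.log(JSON.stringify(sortThenMatch) === JSON.stringify(matchThenSort)); // true
console.log(docs.length, matched.length); // 4 2
```

Both orders return the same documents, but the second one sorts only the documents that pass the filter.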

$redact + $match Sequence Optimization

When the pipeline has a $redact stage immediately followed by a $match stage, the aggregation can sometimes add a portion of the $match stage before the $redact stage. If the added $match stage is at the start of a pipeline, the aggregation can use an index as well as query the collection to limit the number of documents that enter the pipeline. See Improve Performance with Indexes and Document Filters for more information.

For example, if the pipeline consists of the following stages:

{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }

The optimizer can add a $match stage containing a portion of the same filter before the $redact stage:

{ $match: { year: 2014 } },
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }

$project/$unset + $skip Sequence Optimization

When you have a sequence with $project or $unset followed by $skip, the $skip moves before the $project or $unset. For example, if the pipeline consists of the following stages:

{ $sort: { age : -1 } },
{ $project: { status: 1, name: 1 } },
{ $skip: 5 }

During the optimization phase, the optimizer transforms the sequence to the following:

{ $sort: { age : -1 } },
{ $skip: 5 },
{ $project: { status: 1, name: 1 } }
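
The reordering above can be sketched as a small rewrite over an array of stage documents. This is an illustrative model in plain JavaScript, not MongoDB's optimizer, and the function name moveSkipBeforeProjection is made up.

```javascript
// Illustrative model only; not MongoDB's optimizer.
// Swaps a $project/$unset stage with a $skip that directly follows it,
// repeating until no such adjacent pair remains.
function moveSkipBeforeProjection(pipeline) {
  const out = pipeline.slice();
  const isProjection = (stage) => "$project" in stage || "$unset" in stage;
  let changed = true;
  while (changed) {
    changed = false;
    for (let i = 0; i + 1 < out.length; i++) {
      if (isProjection(out[i]) && "$skip" in out[i + 1]) {
        const projection = out[i];
        out[i] = out[i + 1];
        out[i + 1] = projection;
        changed = true;
      }
    }
  }
  return out;
}

const reordered = moveSkipBeforeProjection([
  { $sort: { age: -1 } },
  { $project: { status: 1, name: 1 } },
  { $skip: 5 },
]);
// Prints the stage order: $sort, $skip, $project
console.log(reordered.map((stage) => Object.keys(stage)[0]).join(", "));
```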

Pipeline Coalescence Optimization

When possible, the optimization phase coalesces a pipeline stage into its predecessor. Generally, coalescence occurs after any sequence reordering optimization.

$sort + $limit Coalescence

Changed in version 4.0.

When a $sort precedes a $limit, the optimizer can coalesce the $limit into the $sort, provided that no stage between them changes the number of documents (for example, $unwind or $group). If such a stage is present, MongoDB does not coalesce the $limit into the $sort.

For example, if the pipeline consists of the following stages:

{ $sort : { age : -1 } },
{ $project : { age : 1, status : 1, name : 1 } },
{ $limit: 5 }

During the optimization phase, the optimizer coalesces the sequence to the following:

{
   "$sort" : {
      "sortKey" : {
         "age" : -1
      },
      "limit" : NumberLong(5)
   }
},
{ "$project" : {
      "age" : 1,
      "status" : 1,
      "name" : 1
   }
}

This allows the sort operation to only maintain the top n results as it progresses, where n is the specified limit, and MongoDB only needs to store n items in memory [1]. See $sort Operator and Memory for more information.
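
The idea of keeping only the top n results while sorting can be sketched in plain JavaScript. This is a simple illustrative model, not MongoDB's sort implementation; the function name topN and the sample values are assumptions.

```javascript
// Illustrative model only; MongoDB's actual sort implementation may differ.
// Scans the input once and keeps at most n items, in sorted order.
function topN(items, compare, n) {
  const best = [];
  for (const item of items) {
    // Find where the item belongs among the items kept so far.
    let position = best.length;
    while (position > 0 && compare(item, best[position - 1]) < 0) position--;
    if (position < n) {
      best.splice(position, 0, item);
      if (best.length > n) best.pop(); // drop the item that fell out of the top n
    }
  }
  return best;
}

const ages = [30, 25, 41, 19, 50];
const descending = (a, b) => b - a;

console.log(topN(ages, descending, 2)); // [ 50, 41 ]
console.log(ages.slice().sort(descending).slice(0, 2)); // [ 50, 41 ]
```

The bounded version returns the same result as a full sort followed by a limit, while retaining only n items between iterations.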

Note

Sequence Optimization with $skip

If there is a $skip stage between the $sort and $limit stages, MongoDB will coalesce the $limit into the $sort stage and increase the $limit value by the $skip amount. See $sort + $skip + $limit Sequence for an example.

[1] The optimization will still apply when allowDiskUse is true and the n items exceed the aggregation memory limit.

$limit + $limit Coalescence

When a $limit immediately follows another $limit, the two stages can coalesce into a single $limit where the limit amount is the smaller of the two initial limit amounts. For example, a pipeline contains the following sequence:

{ $limit: 100 },
{ $limit: 10 }

Then the second $limit stage can coalesce into the first $limit stage and result in a single $limit stage where the limit amount 10 is the minimum of the two initial limits 100 and 10.

{ $limit: 10 }
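
A minimal sketch of this rule in plain JavaScript follows. It is an illustrative model rather than MongoDB's code, and the function name coalesceLimits is made up.

```javascript
// Illustrative model only; not MongoDB's optimizer.
// Merges adjacent $limit stages, keeping the smaller limit.
function coalesceLimits(pipeline) {
  const out = [];
  for (const stage of pipeline) {
    const previous = out[out.length - 1];
    if (previous && "$limit" in previous && "$limit" in stage) {
      out[out.length - 1] = { $limit: Math.min(previous.$limit, stage.$limit) };
    } else {
      out.push(stage);
    }
  }
  return out;
}

console.log(JSON.stringify(coalesceLimits([{ $limit: 100 }, { $limit: 10 }])));
// [{"$limit":10}]
```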

$skip + $skip Coalescence

When a $skip immediately follows another $skip, the two stages can coalesce into a single $skip where the skip amount is the sum of the two initial skip amounts. For example, a pipeline contains the following sequence:

{ $skip: 5 },
{ $skip: 2 }

Then the second $skip stage can coalesce into the first $skip stage and result in a single $skip stage where the skip amount 7 is the sum of the two initial skip amounts 5 and 2.

{ $skip: 7 }
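
The same rule can be sketched in plain JavaScript, together with a quick check on an in-memory array that two skips behave like one skip of the summed amount. This is an illustrative model; the function name coalesceSkips and the sample array are assumptions.

```javascript
// Illustrative model only; not MongoDB's optimizer.
// Merges adjacent $skip stages by adding their skip amounts.
function coalesceSkips(pipeline) {
  const out = [];
  for (const stage of pipeline) {
    const previous = out[out.length - 1];
    if (previous && "$skip" in previous && "$skip" in stage) {
      out[out.length - 1] = { $skip: previous.$skip + stage.$skip };
    } else {
      out.push(stage);
    }
  }
  return out;
}

console.log(JSON.stringify(coalesceSkips([{ $skip: 5 }, { $skip: 2 }])));
// [{"$skip":7}]

// Array.prototype.slice stands in for $skip on ten sample values 0..9.
const values = Array.from({ length: 10 }, (_, i) => i);
const twoSkips = values.slice(5).slice(2);
const oneSkip = values.slice(7);
console.log(twoSkips, oneSkip); // [ 7, 8, 9 ] [ 7, 8, 9 ]
```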

$match + $match Coalescence

When a $match immediately follows another $match, the two stages can coalesce into a single $match combining the conditions with an $and. For example, a pipeline contains the following sequence:

{ $match: { year: 2014 } },
{ $match: { status: "A" } }

Then the second $match stage can coalesce into the first $match stage and result in a single $match stage:

{ $match: { $and: [ { "year" : 2014 }, { "status" : "A" } ] } }
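
As an illustrative model of this coalescence (not MongoDB's implementation; the function name coalesceMatches is made up), two adjacent $match stages can be combined like this in plain JavaScript:

```javascript
// Illustrative model only; not MongoDB's optimizer.
// Merges each $match into a directly preceding $match using $and.
// With more than two adjacent $match stages this simple version nests
// the $and expressions, which is logically equivalent.
function coalesceMatches(pipeline) {
  const out = [];
  for (const stage of pipeline) {
    const previous = out[out.length - 1];
    if (previous && "$match" in previous && "$match" in stage) {
      out[out.length - 1] = { $match: { $and: [previous.$match, stage.$match] } };
    } else {
      out.push(stage);
    }
  }
  return out;
}

const merged = coalesceMatches([
  { $match: { year: 2014 } },
  { $match: { status: "A" } },
]);
console.log(JSON.stringify(merged));
// [{"$match":{"$and":[{"year":2014},{"status":"A"}]}}]
```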

$lookup + $unwind Coalescence

When an $unwind immediately follows a $lookup, and the $unwind operates on the as field of the $lookup, the optimizer can coalesce the $unwind into the $lookup stage. This avoids creating large intermediate documents.

For example, a pipeline contains the following sequence:

{
   $lookup: {
      from: "otherCollection",
      as: "resultingArray",
      localField: "x",
      foreignField: "y"
   }
},
{ $unwind: "$resultingArray" }

The optimizer can coalesce the $unwind stage into the $lookup stage. If you run the aggregation with the explain option, the explain output shows the coalesced stage:

{
   $lookup: {
      from: "otherCollection",
      as: "resultingArray",
      localField: "x",
      foreignField: "y",
      unwinding: { preserveNullAndEmptyArrays: false }
   }
}
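
The condition for this coalescence (the $unwind path must be the as field of the $lookup) can be modeled in plain JavaScript. This is a sketch under stated assumptions: the function name coalesceLookupUnwind is made up, and the unwinding field simply mirrors the explain output shown above; it is produced for illustration and is not something you write in your own pipeline.

```javascript
// Illustrative model only; not MongoDB's optimizer.
// Folds an $unwind into the preceding $lookup when the $unwind path
// is the `as` field of the $lookup; otherwise leaves both stages as-is.
function coalesceLookupUnwind(lookupStage, unwindStage) {
  const spec = unwindStage.$unwind;
  // $unwind accepts either a path string or a document with a `path` field.
  const path = typeof spec === "string" ? spec : spec.path;
  if (path !== "$" + lookupStage.$lookup.as) {
    return [lookupStage, unwindStage];
  }
  const preserve = typeof spec === "object" && spec.preserveNullAndEmptyArrays === true;
  return [{
    $lookup: {
      ...lookupStage.$lookup,
      unwinding: { preserveNullAndEmptyArrays: preserve }, // mirrors the explain output
    },
  }];
}

const lookup = {
  $lookup: { from: "otherCollection", as: "resultingArray", localField: "x", foreignField: "y" },
};
const coalesced = coalesceLookupUnwind(lookup, { $unwind: "$resultingArray" });
console.log(coalesced.length); // 1
console.log(JSON.stringify(coalesced[0].$lookup.unwinding)); // {"preserveNullAndEmptyArrays":false}
```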

Slot-Based Query Execution Engine Pipeline Optimizations

MongoDB can use the slot-based query execution engine to execute certain pipeline stages when specific conditions are met. In most cases, the slot-based execution engine provides improved performance and lower CPU and memory costs compared to the classic query engine.

To verify that the slot-based execution engine is used, run the aggregation with the explain option. This option outputs information on the aggregation's query plan. For more information on using explain with aggregations, see Return Information on Aggregation Pipeline Operation.

The following sections describe:

  • The conditions when the slot-based execution engine is used for aggregation.
  • How to verify if the slot-based execution engine was used.

$group Optimization

New in version 5.2.

Starting in version 5.2, MongoDB uses the slot-based query execution engine to execute $group stages if either:

  • $group is the first stage in the pipeline.
  • All preceding stages in the pipeline can also be executed by the slot-based execution engine.

When the slot-based query execution engine is used for $group, the explain results include:

  • explain.explainVersion: '2'
  • queryPlanner.winningPlan.queryPlan.stage: "GROUP"

    The location of the queryPlanner object depends on whether the pipeline contains stages after the $group stage which cannot be executed using the slot-based execution engine.

    • If $group is the last stage or all stages after $group can be executed using the slot-based execution engine, the queryPlanner object is in the top-level explain output object (explain.queryPlanner).
    • If the pipeline contains stages after $group which cannot be executed using the slot-based execution engine, the queryPlanner object is in explain.stages[0].$cursor.queryPlanner.
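
The checks listed above can be sketched as a small function over an explain document. This is a hedged illustration in plain JavaScript: the function names locateQueryPlanner and usesSlotBasedStage are made up, the sample explain objects are hand-written stand-ins rather than real server output, and only the two queryPlanner locations described above are considered.

```javascript
// Returns the queryPlanner object from either of the two locations
// described above, or undefined if neither is present.
function locateQueryPlanner(explain) {
  if (explain.queryPlanner) return explain.queryPlanner;
  const first = explain.stages && explain.stages[0];
  return first && first.$cursor ? first.$cursor.queryPlanner : undefined;
}

// Checks explainVersion and the stage name reported at
// queryPlanner.winningPlan.queryPlan.stage.
function usesSlotBasedStage(explain, stageName) {
  const planner = locateQueryPlanner(explain);
  if (explain.explainVersion !== "2" || !planner) return false;
  const queryPlan = planner.winningPlan && planner.winningPlan.queryPlan;
  return Boolean(queryPlan && queryPlan.stage === stageName);
}

// Hand-written stand-ins for explain output (not real server output).
const topLevel = {
  explainVersion: "2",
  queryPlanner: { winningPlan: { queryPlan: { stage: "GROUP" } } },
};
const nested = {
  explainVersion: "2",
  stages: [{ $cursor: { queryPlanner: { winningPlan: { queryPlan: { stage: "GROUP" } } } } }],
};

console.log(usesSlotBasedStage(topLevel, "GROUP")); // true
console.log(usesSlotBasedStage(nested, "GROUP"));   // true
console.log(usesSlotBasedStage({ explainVersion: "1" }, "GROUP")); // false
```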

$lookup Optimization

New in version 6.0.

Starting in version 6.0, MongoDB can use the slot-based query execution engine to execute $lookup stages if all preceding stages in the pipeline can also be executed by the slot-based execution engine and none of the following conditions are true:

  • The $lookup operation executes a pipeline on a joined collection. To see an example of this kind of operation, see Join Conditions and Subqueries on a Joined Collection.
  • The $lookup's localField or foreignField specify numeric components. For example: { localField: "restaurant.0.review" }.
  • The from field of any $lookup in the pipeline specifies a view or sharded collection.

When the slot-based query execution engine is used for $lookup, the explain results include:

  • explain.explainVersion: '2'
  • queryPlanner.winningPlan.queryPlan.stage: "EQ_LOOKUP". EQ_LOOKUP means "equality lookup".

    The location of the queryPlanner object depends on whether the pipeline contains stages after the $lookup stage which cannot be executed using the slot-based execution engine.

    • If $lookup is the last stage or all stages after $lookup can be executed using the slot-based execution engine, the queryPlanner object is in the top-level explain output object (explain.queryPlanner).
    • If the pipeline contains stages after $lookup which cannot be executed using the slot-based execution engine, the queryPlanner object is in explain.stages[0].$cursor.queryPlanner.

Improve Performance with Indexes and Document Filters

The following sections show how you can improve aggregation performance using indexes and document filters.

Indexes

An aggregation pipeline can use indexes from the input collection to improve performance. Using an index limits the number of documents a stage processes. Ideally, an index can cover the stage query. A covered query has especially high performance, since the index itself returns all the data needed for the matching documents.

For example, a pipeline that consists of $match, $sort, $group can benefit from indexes at every stage:

  • An index on the $match query field efficiently identifies the relevant data.
  • An index on the sorting field returns data in sorted order for the $sort stage.
  • An index on the grouping field that matches the $sort order returns all of the field values needed for the $group stage, making it a covered query.

To determine whether a pipeline uses indexes, review the query plan and look for IXSCAN or DISTINCT_SCAN plans.
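
Looking for IXSCAN or DISTINCT_SCAN in a query plan can be sketched as a recursive walk over the winning plan. This is an illustrative helper in plain JavaScript, assuming the plan nests its children under queryPlan, inputStage, or inputStages; the function names and the sample plan are made up and are not real server output.

```javascript
// Collects every stage name found in a (possibly nested) plan object.
// Assumption: children appear under queryPlan, inputStage, or inputStages.
function collectStageNames(plan, names = []) {
  if (!plan || typeof plan !== "object") return names;
  if (typeof plan.stage === "string") names.push(plan.stage);
  if (plan.queryPlan) collectStageNames(plan.queryPlan, names);
  if (plan.inputStage) collectStageNames(plan.inputStage, names);
  (plan.inputStages || []).forEach((child) => collectStageNames(child, names));
  return names;
}

// Returns true if the plan contains an IXSCAN or DISTINCT_SCAN stage.
function usesIndex(winningPlan) {
  return collectStageNames(winningPlan).some(
    (name) => name === "IXSCAN" || name === "DISTINCT_SCAN"
  );
}

// Hand-written sample plans for illustration only.
const indexedPlan = { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "status_1" } };
const scanPlan = { stage: "COLLSCAN" };

console.log(usesIndex(indexedPlan)); // true
console.log(usesIndex(scanPlan));    // false
```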

Note

In some cases, the query planner uses a DISTINCT_SCAN index plan that returns one document per index key value. DISTINCT_SCAN executes faster than IXSCAN if there are multiple documents per key value. However, index scan parameters might affect the time comparison of DISTINCT_SCAN and IXSCAN.

For early stages in your aggregation pipeline, consider indexing the query fields. Stages that can benefit from indexes are:

$match stage
During the $match stage, the server can use an index if $match is the first stage in the pipeline, after any optimizations from the query planner.
$sort stage
During the $sort stage, the server can use an index if the stage is not preceded by a $project, $unwind, or $group stage.
$group stage

During the $group stage, the server can use an index to quickly find the $first or $last document in each group if the stage meets both of these conditions:

See $group Performance Optimizations for an example.

$geoNear stage
The server always uses an index for the $geoNear stage, since it requires a geospatial index.

Additionally, stages later in the pipeline that retrieve data from other, unmodified collections can use indexes on those collections for optimization. These stages include:

Document Filters

If your aggregation operation requires only a subset of the documents in a collection, filter the documents first:

  • Use the $match, $limit, and $skip stages to restrict the documents that enter the pipeline.
  • When possible, put $match at the beginning of the pipeline to use indexes that scan the matching documents in a collection.
  • $match followed by $sort at the start of the pipeline is equivalent to a single query with a sort, and can use an index.

Example

$sort + $skip + $limit Sequence

A pipeline contains a sequence of $sort followed by a $skip followed by a $limit:

{ $sort: { age : -1 } },
{ $skip: 10 },
{ $limit: 5 }

The optimizer performs $sort + $limit Coalescence to transform the sequence to the following:

{
   "$sort" : {
      "sortKey" : {
         "age" : -1
      },
      "limit" : NumberLong(15)
   }
},
{
   "$skip" : NumberLong(10)
}

With the reordering, MongoDB increases the $limit amount by the $skip amount, so the coalesced limit is 10 + 5 = 15.
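
The arithmetic behind this example can be checked with a short plain JavaScript sketch. The function name coalesceSortSkipLimit is made up, the output shape mirrors the explain output shown above with plain numbers in place of NumberLong, and array methods on invented sample values stand in for the pipeline stages.

```javascript
// Illustrative model only; not MongoDB's optimizer.
// Folds the limit into the sort and raises it by the skip amount.
function coalesceSortSkipLimit(sortSpec, skip, limit) {
  return [
    { $sort: { sortKey: sortSpec, limit: skip + limit } },
    { $skip: skip },
  ];
}

console.log(JSON.stringify(coalesceSortSkipLimit({ age: -1 }, 10, 5)));
// [{"$sort":{"sortKey":{"age":-1},"limit":15}},{"$skip":10}]

// Twenty distinct sample ages (0..19 in a shuffled order).
const ages = Array.from({ length: 20 }, (_, i) => (i * 7) % 20);
const descending = (a, b) => b - a;

// Original sequence: sort, skip 10, limit 5.
const original = ages.slice().sort(descending).slice(10).slice(0, 5);
// Optimized sequence: sort keeping the top 15, then skip 10.
const optimized = ages.slice().sort(descending).slice(0, 15).slice(10);

console.log(original, optimized); // [ 9, 8, 7, 6, 5 ] [ 9, 8, 7, 6, 5 ]
```

Keeping the top 15 documents is enough because the 5 documents returned after skipping 10 are always among the first 15 in sort order.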

Tip

See also:

explain option in db.collection.aggregate()