Aggregation pipeline operations have an optimization phase that attempts to reshape the pipeline for improved performance.
To see how the optimizer transforms a particular aggregation pipeline, include the explain option in the db.collection.aggregate() method.
Optimizations are subject to change between releases.
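As a minimal sketch of passing the explain option in mongosh: the `orders` collection and the `age` and `status` fields are assumptions made for illustration and do not come from this page.

```javascript
// Hypothetical pipeline; the collection and field names are assumptions.
const pipeline = [
  { $sort: { age: -1 } },
  { $match: { status: "A" } }
];

// { explain: true } asks the server to describe the plan it would run
// instead of returning the aggregation results.
const options = { explain: true };

// `db` and `printjson` exist only inside mongosh, so the call is guarded
// to keep this snippet loadable elsewhere.
if (typeof db !== "undefined") {
  const plan = db.orders.aggregate(pipeline, options);
  printjson(plan);
}
```

Comparing the stage order in the returned plan with the order in `pipeline` shows which of the optimizations described on this page were applied.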
This page describes the optimizations performed during the optimization phase and shows how to improve aggregation pipeline performance using indexes and document filters. See Improve Performance with Indexes and Document Filters.
The aggregation pipeline can determine whether it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline uses only those required fields, reducing the amount of data passing through the pipeline.
($project or $unset or $addFields or $set) + $match
For an aggregation pipeline that contains a projection stage ($project or $unset or $addFields or $set) followed by a $match stage, MongoDB moves any filters in the $match stage that do not require values computed in the projection stage to a new $match stage before the projection.
If an aggregation pipeline contains multiple projection and/or $match stages, MongoDB performs this optimization for each $match stage, moving each $match filter before all projection stages that the filter does not depend on.
Consider a pipeline of the following stages:
{ $addFields: { maxTime: { $max: "$times" }, minTime: { $min: "$times" } } },
{ $project: { _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1, avgTime: { $avg: ["$maxTime", "$minTime"] } } },
{ $match: { name: "Joe Schmoe", maxTime: { $lt: 20 }, minTime: { $gt: 5 }, avgTime: { $gt: 7 } } }
The optimizer breaks up the $match stage into four individual filters, one for each key in the $match query document. The optimizer then moves each filter before as many projection stages as possible, creating new $match stages as needed. Given this example, the optimizer produces the following optimized pipeline:
{ $match: { name: "Joe Schmoe" } },
{ $addFields: { maxTime: { $max: "$times" }, minTime: { $min: "$times" } } },
{ $match: { maxTime: { $lt: 20 }, minTime: { $gt: 5 } } },
{ $project: { _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1, avgTime: { $avg: ["$maxTime", "$minTime"] } } },
{ $match: { avgTime: { $gt: 7 } } }
The $match filter { avgTime: { $gt: 7 } } depends on the $project stage to compute the avgTime field. The $project stage is the last projection stage in this pipeline, so the $match filter on avgTime could not be moved.
The maxTime and minTime fields are computed in the $addFields stage but have no dependency on the $project stage. The optimizer created a new $match stage for the filters on these fields and placed it before the $project stage.
The $match filter { name: "Joe Schmoe" } does not use any values computed in either the $project or $addFields stages, so it was moved to a new $match stage before both of the projection stages.
After optimization, the filter { name: "Joe Schmoe" } is in a $match stage at the beginning of the pipeline. This has the added benefit of allowing the aggregation to use an index on the name field when initially querying the collection. See Improve Performance with Indexes and Document Filters for more information.
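The filter placement described in this example can be approximated in plain JavaScript. This is a simplified sketch, not MongoDB's optimizer: the helper names `computedFields` and `pushDownMatch` are hypothetical, and the sketch assumes that the $match document contains only top-level field names (no `$or`, no dotted paths) and that every filtered field is either computed by a projection stage or passed through unchanged.

```javascript
// Returns the names of the fields whose values a projection stage computes.
function computedFields(stage) {
  const [op] = Object.keys(stage);
  const spec = stage[op];
  if (op === "$addFields" || op === "$set") {
    return Object.keys(spec); // every key is a computed value
  }
  if (op === "$project") {
    // Plain inclusion/exclusion flags pass values through unchanged.
    return Object.keys(spec).filter((k) => ![0, 1, true, false].includes(spec[k]));
  }
  return []; // $unset computes nothing
}

// Places each filter right after the last stage that computes its field.
function pushDownMatch(projections, filter) {
  // slots[0] holds filters that can run first; slots[i + 1] holds filters
  // that must wait until projections[i] has run.
  const slots = projections.map(() => ({}));
  slots.unshift({});
  for (const [field, condition] of Object.entries(filter)) {
    let last = -1;
    projections.forEach((stage, i) => {
      if (computedFields(stage).includes(field)) last = i;
    });
    slots[last + 1][field] = condition;
  }
  const out = [];
  const emit = (f) => {
    if (Object.keys(f).length > 0) out.push({ $match: f });
  };
  emit(slots[0]);
  projections.forEach((stage, i) => {
    out.push(stage);
    emit(slots[i + 1]);
  });
  return out;
}

const optimized = pushDownMatch(
  [
    { $addFields: { maxTime: { $max: "$times" }, minTime: { $min: "$times" } } },
    { $project: { _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1, avgTime: { $avg: ["$maxTime", "$minTime"] } } }
  ],
  { name: "Joe Schmoe", maxTime: { $lt: 20 }, minTime: { $gt: 5 }, avgTime: { $gt: 7 } }
);
console.log(JSON.stringify(optimized, null, 2));
```

For the pipeline on this page, the sketch yields the same five stages as the optimized pipeline shown above: the `name` filter first, the `maxTime` and `minTime` filters after $addFields, and the `avgTime` filter after $project.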
$sort + $match
When a $sort is followed by a $match, the $match moves before the $sort to minimize the number of objects to sort. For example, if the pipeline consists of the following stages:
{ $sort: { age : -1 } }, { $match: { status: 'A' } }
During the optimization phase, the optimizer transforms the sequence to the following:
{ $match: { status: 'A' } }, { $sort: { age : -1 } }
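To see why this reordering is safe and why it helps, the two orders can be compared on an in-memory array. The sample documents below are invented for illustration, and the array methods only stand in for the $sort and $match stages.

```javascript
// Hypothetical sample documents; the names and values are illustrative only.
const docs = [
  { name: "a", age: 31, status: "A" },
  { name: "b", age: 45, status: "B" },
  { name: "c", age: 27, status: "A" },
  { name: "d", age: 52, status: "B" },
  { name: "e", age: 38, status: "A" }
];

const byAgeDesc = (x, y) => y.age - x.age;      // stands in for { $sort: { age: -1 } }
const hasStatusA = (d) => d.status === "A";     // stands in for { $match: { status: 'A' } }

// Original order: sort all five documents, then filter.
const sortThenMatch = [...docs].sort(byAgeDesc).filter(hasStatusA);

// Optimized order: filter first, so fewer documents reach the sort.
const matched = docs.filter(hasStatusA);
const matchThenSort = [...matched].sort(byAgeDesc);

console.log(matched.length, "of", docs.length, "documents are sorted after reordering");
console.log(JSON.stringify(sortThenMatch) === JSON.stringify(matchThenSort));
```

Both orders produce the same documents in the same order; the only difference is how many documents the sort has to handle.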
$redact + $match
When the pipeline has a $redact stage immediately followed by a $match stage, the aggregation can sometimes add a portion of the $match stage before the $redact stage. If the added $match stage is at the start of a pipeline, the aggregation can use an index as well as query the collection to limit the number of documents that enter the pipeline. See Improve Performance with Indexes and Document Filters for more information.
For example, if the pipeline consists of the following stages:
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }
The optimizer can add a $match stage containing the year filter before the $redact stage, while keeping the original $match stage after it:
{ $match: { year: 2014 } },
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }
$project/$unset + $skip
When a $project or $unset is followed by a $skip, the $skip moves before the $project. For example, if the pipeline consists of the following stages:
{ $sort: { age : -1 } }, { $project: { status: 1, name: 1 } }, { $skip: 5 }
During the optimization phase, the optimizer transforms the sequence to the following:
{ $sort: { age : -1 } }, { $skip: 5 }, { $project: { status: 1, name: 1 } }
When possible, the optimization phase coalesces a pipeline stage into its predecessor. Generally, coalescence occurs after any sequence reordering optimization.
$sort + $limit
Changed in version 4.0.
When a $sort precedes a $limit, the optimizer can coalesce the $limit into the $sort if no intervening stages modify the number of documents (e.g. $unwind, $group). MongoDB will not coalesce the $limit into the $sort if there are pipeline stages that change the number of documents between the $sort and $limit stages.
For example, if the pipeline consists of the following stages:
{ $sort : { age : -1 } }, { $project : { age : 1, status : 1, name : 1 } }, { $limit: 5 }
During the optimization phase, the optimizer coalesces the sequence to the following:
{ "$sort" : { "sortKey" : { "age" : -1 }, "limit" : NumberLong(5) } }, { "$project" : { "age" : 1, "status" : 1, "name" : 1 } }
This allows the sort operation to maintain only the top n results as it progresses, where n is the specified limit, so MongoDB only needs to store n items in memory [1]. See $sort Operator and Memory for more information.
Sequence Optimization with $skip
If there is a $skip stage between the $sort and $limit stages, MongoDB will coalesce the $limit into the $sort stage and increase the $limit value by the $skip amount. See $sort + $skip + $limit Sequence for an example.
[1] The optimization still applies when allowDiskUse is true and the n items exceed the aggregation memory limit.
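The memory benefit of coalescing $limit into $sort can be illustrated with a bounded sort that never holds more than n documents. This is a sketch of the general top-n idea, not MongoDB's implementation; the function name `topN` and the sample ages are assumptions.

```javascript
// Keeps only the best `limit` documents seen so far, ordered by `compare`.
function topN(docs, compare, limit) {
  const kept = [];
  for (const doc of docs) {
    // Walk back from the end to find where `doc` belongs in `kept`.
    let i = kept.length;
    while (i > 0 && compare(doc, kept[i - 1]) < 0) i--;
    if (i < limit) {
      kept.splice(i, 0, doc);
      if (kept.length > limit) kept.pop(); // drop the current worst document
    }
  }
  return kept;
}

// Hypothetical input: five documents, of which only the two oldest are kept.
const people = [31, 45, 27, 52, 38].map((age) => ({ age }));
const oldestTwo = topN(people, (x, y) => y.age - x.age, 2);
console.log(JSON.stringify(oldestTwo));
```

The `kept` array never grows beyond `limit` entries, whereas a separate sort followed by a limit would first order every input document.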
$limit + $limit
When a $limit immediately follows another $limit, the two stages can coalesce into a single $limit where the limit amount is the smaller of the two initial limit amounts. For example, a pipeline contains the following sequence:
{ $limit: 100 }, { $limit: 10 }
Then the second $limit stage can coalesce into the first $limit stage and result in a single $limit stage where the limit amount 10 is the minimum of the two initial limits 100 and 10.
{ $limit: 10 }
$skip + $skip
When a $skip immediately follows another $skip, the two stages can coalesce into a single $skip where the skip amount is the sum of the two initial skip amounts. For example, a pipeline contains the following sequence:
{ $skip: 5 }, { $skip: 2 }
Then the second $skip stage can coalesce into the first $skip stage and result in a single $skip stage where the skip amount 7 is the sum of the two initial skip amounts 5 and 2.
{ $skip: 7 }
$match + $match
When a $match immediately follows another $match, the two stages can coalesce into a single $match combining the conditions with an $and. For example, a pipeline contains the following sequence:
{ $match: { year: 2014 } }, { $match: { status: "A" } }
Then the second $match stage can coalesce into the first $match stage and result in a single $match stage:
{ $match: { $and: [ { "year" : 2014 }, { "status" : "A" } ] } }
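The three rules above ($limit + $limit, $skip + $skip, and $match + $match) can be sketched as one pass over a pipeline array. The function name `coalesce` is hypothetical, and the sketch covers only these three adjacent-stage cases rather than everything the optimizer does.

```javascript
// Merges each stage into its predecessor when both are $limit, $skip, or $match.
function coalesce(pipeline) {
  const out = [];
  for (const stage of pipeline) {
    const prev = out[out.length - 1];
    if (prev && "$limit" in prev && "$limit" in stage) {
      // The smaller limit wins.
      out[out.length - 1] = { $limit: Math.min(prev.$limit, stage.$limit) };
    } else if (prev && "$skip" in prev && "$skip" in stage) {
      // Skip amounts add up.
      out[out.length - 1] = { $skip: prev.$skip + stage.$skip };
    } else if (prev && "$match" in prev && "$match" in stage) {
      // Reuse an existing top-level $and so repeated merges stay flat.
      const onlyAnd = Object.keys(prev.$match).length === 1 && Array.isArray(prev.$match.$and);
      const left = onlyAnd ? prev.$match.$and : [prev.$match];
      out[out.length - 1] = { $match: { $and: [...left, stage.$match] } };
    } else {
      out.push(stage);
    }
  }
  return out;
}

console.log(JSON.stringify(coalesce([{ $limit: 100 }, { $limit: 10 }])));
console.log(JSON.stringify(coalesce([{ $skip: 5 }, { $skip: 2 }])));
console.log(JSON.stringify(coalesce([{ $match: { year: 2014 } }, { $match: { status: "A" } }])));
```

Applied to the three example sequences on this page, the sketch yields a single $limit of 10, a single $skip of 7, and a single $match whose $and combines both conditions.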
$lookup + $unwind
When a $unwind immediately follows a $lookup, and the $unwind operates on the as field of the $lookup, the optimizer can coalesce the $unwind into the $lookup stage. This avoids creating large intermediate documents.
For example, a pipeline contains the following sequence:
{ $lookup: { from: "otherCollection", as: "resultingArray", localField: "x", foreignField: "y" } }, { $unwind: "$resultingArray"}
The optimizer can coalesce the $unwind stage into the $lookup stage. If you run the aggregation with the explain option, the explain output shows the coalesced stage:
{ $lookup: { from: "otherCollection", as: "resultingArray", localField: "x", foreignField: "y", unwinding: { preserveNullAndEmptyArrays: false } } }
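The reason a fused $lookup and $unwind avoids large intermediate documents can be shown with an in-memory analogy. This is only a sketch: the generator name `lookupUnwind` and the sample data are assumptions, it compares values with plain equality rather than MongoDB's matching rules, and it mirrors preserveNullAndEmptyArrays: false by dropping documents that have no match.

```javascript
// Emits one output document per matching pair instead of first building
// the full joined array on each input document.
function* lookupUnwind(localDocs, foreignDocs, localField, foreignField, as) {
  for (const doc of localDocs) {
    for (const other of foreignDocs) {
      if (other[foreignField] === doc[localField]) {
        yield { ...doc, [as]: other };
      }
    }
  }
}

// Hypothetical data for the two collections.
const local = [{ _id: 1, x: "a" }, { _id: 2, x: "b" }, { _id: 3, x: "c" }];
const foreign = [{ y: "a", v: 1 }, { y: "a", v: 2 }, { y: "b", v: 3 }];

const joined = [...lookupUnwind(local, foreign, "x", "y", "resultingArray")];
console.log(JSON.stringify(joined));
```

Each emitted document carries a single joined element in `resultingArray`, so no document ever holds the whole array of matches at once.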
$group
New in version 5.2.
Starting in MongoDB 5.2, MongoDB uses the slot-based query execution engine to execute $group stages when $group is either:
The first stage in the pipeline.
Part of a series of stages executed by the slot-based engine that occur at the beginning of the pipeline. For example, if a pipeline begins with $match followed by $group, the $match and $group stages are executed by the slot-based engine.
In most cases, the slot-based engine provides improved performance and lower CPU and memory costs compared to the classic query engine.
To verify that the slot-based engine is used, run the aggregation with the explain option. This option outputs information on the aggregation's query plan.
When the slot-based query execution engine is used for $group, the explain results include:
explain.explainVersion: '2'
explain.queryPlanner.winningPlan.queryPlan.stage: "GROUP"
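A small helper can check an explain document for the two markers listed above. The function name `usesSlotBasedGroup` is hypothetical, the sample objects are hand-written rather than real server output, and the exact shape of explain output may vary between releases and deployments.

```javascript
// Returns true when an explain document contains both markers:
// explainVersion '2' and a winning plan whose queryPlan stage is "GROUP".
function usesSlotBasedGroup(explain) {
  const stage = explain?.queryPlanner?.winningPlan?.queryPlan?.stage;
  return explain?.explainVersion === "2" && stage === "GROUP";
}

// Hand-written stand-ins for explain output, reduced to the relevant fields.
const slotBasedSample = {
  explainVersion: "2",
  queryPlanner: { winningPlan: { queryPlan: { stage: "GROUP" } } }
};
const classicSample = {
  explainVersion: "1",
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } }
};

console.log(usesSlotBasedGroup(slotBasedSample));
console.log(usesSlotBasedGroup(classicSample));
```

In mongosh, the argument would be the document returned by running the aggregation with the explain option.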
The following sections show how you can improve aggregation performance using indexes and document filters.
The query planner analyzes an aggregation pipeline to determine if indexes can be used to improve pipeline performance.
The following list shows some pipeline stages that can use indexes:
$match: $match can use an index to filter documents if $match is the first stage in a pipeline.
$sort: $sort can use an index if $sort is not preceded by a $project, $unwind, or $group stage.
$group: $group can potentially use an index to find the first document in each group if $group is preceded by a $sort that sorts the field to group by, and $first is the only accumulator in $group. See $group Performance Optimizations for an example.
$geoNear: $geoNear can use a geospatial index. $geoNear must be the first stage in an aggregation pipeline.
Starting in MongoDB 4.2, in some cases, an aggregation pipeline can use a DISTINCT_SCAN index plan, which typically has higher performance than IXSCAN.
Indexes can cover queries in an aggregation pipeline. A covered query uses an index to return all of the requested data without reading the documents themselves, and has high performance.
If your aggregation operation requires only a subset of the documents in a collection, filter the documents first:
Use the $match, $limit, and $skip stages to restrict the documents that enter the pipeline.
When possible, put $match at the beginning of the pipeline to use indexes that scan the matching documents in a collection.
$match followed by $sort at the start of the pipeline is equivalent to a single query with a sort, and can use an index.
$sort + $skip + $limit
A pipeline contains a sequence of $sort followed by a $skip followed by a $limit:
{ $sort: { age : -1 } }, { $skip: 10 }, { $limit: 5 }
The optimizer performs $sort + $limit Coalescence to transform the sequence to the following:
{ "$sort" : { "sortKey" : { "age" : -1 }, "limit" : NumberLong(15) } }, { "$skip" : NumberLong(10) }
MongoDB increases the $limit amount with the reordering: the coalesced limit of 15 is the original $limit of 5 plus the $skip of 10.
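The arithmetic behind this transformation can be sketched as a small function. The name `coalesceSortSkipLimit` is hypothetical, and the output uses plain numbers where the explain output shows NumberLong values.

```javascript
// Rewrites $sort + $skip + $limit as a limited $sort followed by $skip.
// The sort must keep skip + limit documents so that `limit` documents
// remain after the first `skip` are discarded.
function coalesceSortSkipLimit(sortKey, skip, limit) {
  return [
    { $sort: { sortKey, limit: skip + limit } },
    { $skip: skip }
  ];
}

const rewritten = coalesceSortSkipLimit({ age: -1 }, 10, 5);
console.log(JSON.stringify(rewritten));

// In-memory check on hypothetical ages 1..20, sorted in descending order.
const ages = Array.from({ length: 20 }, (_, i) => i + 1).sort((a, b) => b - a);
const original = ages.slice(10).slice(0, 5);   // skip 10, then limit 5
const coalesced = ages.slice(0, 15).slice(10); // keep top 15, then skip 10
console.log(JSON.stringify(original) === JSON.stringify(coalesced));
```

Both forms return the same five values, which is why the optimizer can raise the limit inside the sort instead of sorting the full input.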
explain
option in the db.collection.aggregate()