Aggregation Pipeline Optimization
Aggregation pipeline operations have an optimization phase which attempts to reshape the pipeline for improved performance.
To see how the optimizer transforms a particular aggregation pipeline, include the explain option in the db.collection.aggregate() method.
Optimizations are subject to change between releases.
In addition to learning about the aggregation pipeline optimizations performed during the optimization phase, you will also see how to improve aggregation pipeline performance using indexes and document filters.
Projection Optimization
The aggregation pipeline can determine if it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline only uses those fields, reducing the amount of data passing through the pipeline.
$project Stage Placement
When you use a $project stage, it should typically be the last stage in your pipeline, used to specify which fields to return to the client.
Using a $project stage at the beginning or middle of a pipeline to reduce the number of fields passed to subsequent pipeline stages is unlikely to improve performance, as the database performs this optimization automatically.
Pipeline Sequence Optimization
($project or $unset or $addFields or $set) + $match Sequence Optimization
For an aggregation pipeline that contains a projection stage ($project or $unset or $addFields or $set) followed by a $match stage, MongoDB moves any filters in the $match stage that do not require values computed in the projection stage to a new $match stage before the projection.
If an aggregation pipeline contains multiple projection and/or $match stages, MongoDB performs this optimization for each $match stage, moving each $match filter before all projection stages that the filter does not depend on.
Consider a pipeline of the following stages:
{ $addFields: {
maxTime: { $max: "$times" },
minTime: { $min: "$times" }
} },
{ $project: {
_id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
avgTime: { $avg: ["$maxTime", "$minTime"] }
} },
{ $match: {
name: "Joe Schmoe",
maxTime: { $lt: 20 },
minTime: { $gt: 5 },
avgTime: { $gt: 7 }
} }
The optimizer breaks up the $match stage into four individual filters, one for each key in the $match query document. The optimizer then moves each filter before as many projection stages as possible, creating new $match stages as needed. Given this example, the optimizer produces the following optimized pipeline:
{ $match: { name: "Joe Schmoe" } },
{ $addFields: {
maxTime: { $max: "$times" },
minTime: { $min: "$times" }
} },
{ $match: { maxTime: { $lt: 20 }, minTime: { $gt: 5 } } },
{ $project: {
_id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
avgTime: { $avg: ["$maxTime", "$minTime"] }
} },
{ $match: { avgTime: { $gt: 7 } } }
The $match filter { avgTime: { $gt: 7 } } depends on the $project stage to compute the avgTime field. The $project stage is the last projection stage in this pipeline, so the $match filter on avgTime could not be moved.
The maxTime and minTime fields are computed in the $addFields stage but have no dependency on the $project stage. The optimizer created a new $match stage for the filters on these fields and placed it before the $project stage.
The $match filter { name: "Joe Schmoe" } does not use any values computed in either the $project or $addFields stages, so it was moved to a new $match stage before both of the projection stages.
After optimization, the filter { name: "Joe Schmoe" } is in a $match stage at the beginning of the pipeline. This has the added benefit of allowing the aggregation to use an index on the name field when initially querying the collection. See Improve Performance with Indexes and Document Filters for more information.
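The filter-hoisting rule described above can be modelled in a few lines of plain JavaScript. This is only a sketch under simplifying assumptions, not MongoDB's optimizer: the helper names (stageName, canHoistPast, hoistMatchFilters) are hypothetical, only top-level field names are considered, and any stage other than $addFields, $set, or $project is treated as a barrier.

```javascript
// Illustrative model only; not the server's implementation.
const stageName = (stage) => Object.keys(stage)[0];

// True when a filter on `field` may move in front of `stage`.
function canHoistPast(stage, field) {
  const name = stageName(stage);
  const spec = stage[name];
  if (name === "$addFields" || name === "$set") return !(field in spec);
  // For $project, only move past fields that are passed through unchanged.
  if (name === "$project") return spec[field] === 1 || spec[field] === true;
  return false; // every other stage is treated as a barrier in this sketch
}

// Splits a trailing $match into per-key filters and moves each one as far
// towards the start of the pipeline as canHoistPast allows.
function hoistMatchFilters(pipeline) {
  const last = pipeline[pipeline.length - 1];
  if (!last || stageName(last) !== "$match") return pipeline;
  const before = pipeline.slice(0, -1);
  // slots[i] collects filters that end up directly in front of before[i];
  // slots[before.length] holds filters that could not move at all.
  const slots = Array.from({ length: before.length + 1 }, () => ({}));
  for (const [field, condition] of Object.entries(last.$match)) {
    let pos = before.length;
    while (pos > 0 && canHoistPast(before[pos - 1], field)) pos--;
    slots[pos][field] = condition;
  }
  const result = [];
  slots.forEach((filters, i) => {
    if (Object.keys(filters).length > 0) result.push({ $match: filters });
    if (i < before.length) result.push(before[i]);
  });
  return result;
}

const pipeline = [
  { $addFields: { maxTime: { $max: "$times" }, minTime: { $min: "$times" } } },
  { $project: {
      _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
      avgTime: { $avg: ["$maxTime", "$minTime"] }
  } },
  { $match: {
      name: "Joe Schmoe",
      maxTime: { $lt: 20 },
      minTime: { $gt: 5 },
      avgTime: { $gt: 7 }
  } }
];

const optimized = hoistMatchFilters(pipeline);
console.log(JSON.stringify(optimized, null, 2));
```

Running the sketch on the example pipeline yields the same five-stage shape as the optimized pipeline shown above: the name filter first, the maxTime and minTime filters between $addFields and $project, and the avgTime filter last.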
$sort + $match Sequence Optimization
When you have a sequence with $sort followed by a $match, the $match moves before the $sort to minimize the number of objects to sort. For example, if the pipeline consists of the following stages:
{ $sort: { age : -1 } },
{ $match: { status: 'A' } }
During the optimization phase, the optimizer transforms the sequence to the following:
{ $match: { status: 'A' } },
{ $sort: { age : -1 } }
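As a rough illustration of this reordering, the following plain JavaScript sketch moves a $match in front of any $sort stages that directly precede it. The function name moveMatchBeforeSort is hypothetical, and the code models only the rule stated above, not the server's optimizer.

```javascript
// Illustrative model only: swaps a $match with each $sort directly before it.
function moveMatchBeforeSort(pipeline) {
  const out = [...pipeline];
  for (let i = 1; i < out.length; i++) {
    let j = i;
    // Bubble the $match backwards while the previous stage is a $sort.
    while (j > 0 && "$match" in out[j] && "$sort" in out[j - 1]) {
      [out[j - 1], out[j]] = [out[j], out[j - 1]];
      j--;
    }
  }
  return out;
}

const reordered = moveMatchBeforeSort([
  { $sort: { age: -1 } },
  { $match: { status: "A" } }
]);
console.log(JSON.stringify(reordered));
```

Filtering first means the sort only has to order the documents that survive the filter.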
$redact + $match Sequence Optimization
When the pipeline has a $redact stage immediately followed by a $match stage, the aggregation can sometimes add a portion of the $match stage before the $redact stage. If the added $match stage is at the start of a pipeline, the aggregation can use an index as well as query the collection to limit the number of documents that enter the pipeline. See Improve Performance with Indexes and Document Filters for more information.
For example, if the pipeline consists of the following stages:
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }
The optimizer can add a $match stage containing the year filter before the $redact stage:
{ $match: { year: 2014 } },
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }
$project/$unset + $skip Sequence Optimization
When you have a sequence with $project or $unset followed by $skip, the $skip moves before the $project or $unset. For example, if the pipeline consists of the following stages:
{ $sort: { age : -1 } },
{ $project: { status: 1, name: 1 } },
{ $skip: 5 }
During the optimization phase, the optimizer transforms the sequence to the following:
{ $sort: { age : -1 } },
{ $skip: 5 },
{ $project: { status: 1, name: 1 } }
Pipeline Coalescence Optimization
When possible, the optimization phase coalesces a pipeline stage into its predecessor. Generally, coalescence occurs after any sequence reordering optimization.
$sort + $limit Coalescence
Changed in version 4.0.
When a $sort precedes a $limit, the optimizer can coalesce the $limit into the $sort, but only if no intervening stage changes the number of documents (e.g. $unwind, $group).
For example, if the pipeline consists of the following stages:
{ $sort : { age : -1 } },
{ $project : { age : 1, status : 1, name : 1 } },
{ $limit: 5 }
During the optimization phase, the optimizer coalesces the sequence to the following:
{
"$sort" : {
"sortKey" : {
"age" : -1
},
"limit" : NumberLong(5)
}
},
{ "$project" : {
"age" : 1,
"status" : 1,
"name" : 1
}
}
This allows the sort operation to maintain only the top n results as it progresses, where n is the specified limit, so MongoDB only needs to store n items in memory [1]. See $sort Operator and Memory for more information.
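To see why a limit inside the sort saves memory, consider this minimal sketch of a bounded top-n selection. It illustrates the general idea only and makes no claim about how MongoDB implements its sort; the function name topN and the sample documents are made up for the example.

```javascript
// Keeps at most `limit` documents in memory while scanning the input once.
function topN(docs, compare, limit) {
  const buffer = []; // always sorted, never longer than `limit`
  for (const doc of docs) {
    // Find the position that keeps `buffer` sorted.
    let i = buffer.length;
    while (i > 0 && compare(doc, buffer[i - 1]) < 0) i--;
    if (i < limit) {
      buffer.splice(i, 0, doc);
      if (buffer.length > limit) buffer.pop(); // drop the current worst entry
    }
  }
  return buffer;
}

// Equivalent in spirit to { $sort: { age: -1 } } followed by { $limit: 2 }.
const people = [
  { name: "a", age: 30 },
  { name: "b", age: 45 },
  { name: "c", age: 22 },
  { name: "d", age: 51 }
];
const oldestTwo = topN(people, (x, y) => y.age - x.age, 2);
console.log(oldestTwo.map((p) => p.name)); // [ 'd', 'b' ]
```

The buffer never grows past the limit, whereas a full sort followed by a separate limit would have to hold every input document.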
Sequence Optimization with $skip
If there is a $skip stage between the $sort and $limit stages, MongoDB will coalesce the $limit into the $sort stage and increase the $limit value by the $skip amount. See $sort + $skip + $limit Sequence for an example.
[1] The optimization still applies when allowDiskUse is true and the n items exceed the aggregation memory limit.
$limit + $limit Coalescence
When a $limit immediately follows another $limit, the two stages can coalesce into a single $limit where the limit amount is the smaller of the two initial limit amounts. For example, a pipeline contains the following sequence:
{ $limit: 100 },
{ $limit: 10 }
Then the second $limit stage can coalesce into the first $limit stage and result in a single $limit stage where the limit amount 10 is the minimum of the two initial limits 100 and 10.
{ $limit: 10 }
$skip + $skip Coalescence
When a $skip immediately follows another $skip, the two stages can coalesce into a single $skip where the skip amount is the sum of the two initial skip amounts. For example, a pipeline contains the following sequence:
{ $skip: 5 },
{ $skip: 2 }
Then the second $skip stage can coalesce into the first $skip stage and result in a single $skip stage where the skip amount 7 is the sum of the two initial skip amounts 5 and 2.
{ $skip: 7 }
$match + $match Coalescence
When a $match immediately follows another $match, the two stages can coalesce into a single $match combining the conditions with an $and. For example, a pipeline contains the following sequence:
{ $match: { year: 2014 } },
{ $match: { status: "A" } }
Then the second $match stage can coalesce into the first $match stage and result in a single $match stage:
{ $match: { $and: [ { "year" : 2014 }, { "status" : "A" } ] } }
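The three adjacent-stage rules above ($limit + $limit, $skip + $skip, $match + $match) can be summarized in one small plain JavaScript sketch. The function name coalesceAdjacent is hypothetical, and the code is a model of the stated rules rather than the server's implementation.

```javascript
// Illustrative model only: merges each stage into its predecessor when both
// are the same kind of $limit, $skip, or $match stage.
function coalesceAdjacent(pipeline) {
  const out = [];
  for (const stage of pipeline) {
    const prev = out[out.length - 1];
    if (prev && "$limit" in prev && "$limit" in stage) {
      // The smaller limit wins.
      out[out.length - 1] = { $limit: Math.min(prev.$limit, stage.$limit) };
    } else if (prev && "$skip" in prev && "$skip" in stage) {
      // Skip amounts add up.
      out[out.length - 1] = { $skip: prev.$skip + stage.$skip };
    } else if (prev && "$match" in prev && "$match" in stage) {
      // Both conditions must hold, so combine them with $and.
      out[out.length - 1] = { $match: { $and: [prev.$match, stage.$match] } };
    } else {
      out.push(stage);
    }
  }
  return out;
}

const limits = coalesceAdjacent([{ $limit: 100 }, { $limit: 10 }]);
const skips = coalesceAdjacent([{ $skip: 5 }, { $skip: 2 }]);
const matches = coalesceAdjacent([
  { $match: { year: 2014 } },
  { $match: { status: "A" } }
]);
console.log(JSON.stringify(limits), JSON.stringify(skips), JSON.stringify(matches));
```

With more than two consecutive $match stages this sketch produces nested $and expressions, which are logically equivalent to a single flat $and.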
$lookup + $unwind Coalescence
When a $unwind immediately follows a $lookup, and the $unwind operates on the as field of the $lookup, the optimizer can coalesce the $unwind into the $lookup stage. This avoids creating large intermediate documents.
For example, a pipeline contains the following sequence:
{
$lookup: {
from: "otherCollection",
as: "resultingArray",
localField: "x",
foreignField: "y"
}
},
{ $unwind: "$resultingArray"}
The optimizer can coalesce the $unwind stage into the $lookup stage. If you run the aggregation with the explain option, the explain output shows the coalesced stage:
{
$lookup: {
from: "otherCollection",
as: "resultingArray",
localField: "x",
foreignField: "y",
unwinding: { preserveNullAndEmptyArrays: false }
}
}
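The benefit of the coalesced form can be illustrated with in-memory arrays standing in for the two collections. This is a simplified sketch: it uses strict equality on a single top-level field, which does not cover all of $lookup's matching semantics, and the function names and sample data are hypothetical.

```javascript
// Separate stages: build the joined array for each document, then unwind it.
function lookupThenUnwind(localDocs, foreignDocs, { localField, foreignField, as }) {
  const joined = localDocs.map((doc) => ({
    ...doc,
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField])
  }));
  return joined.flatMap((doc) => doc[as].map((match) => ({ ...doc, [as]: match })));
}

// Coalesced form: emit one output document per match and never hold the
// intermediate array of all matches for a document.
function* lookupWithUnwind(localDocs, foreignDocs, { localField, foreignField, as }) {
  for (const doc of localDocs) {
    for (const f of foreignDocs) {
      if (f[foreignField] === doc[localField]) yield { ...doc, [as]: f };
    }
  }
}

const locals = [{ _id: 1, x: "a" }, { _id: 2, x: "b" }, { _id: 3, x: "z" }];
const others = [{ _id: 10, y: "a" }, { _id: 11, y: "a" }, { _id: 12, y: "b" }];
const spec = { localField: "x", foreignField: "y", as: "resultingArray" };

const separate = lookupThenUnwind(locals, others, spec);
const coalesced = [...lookupWithUnwind(locals, others, spec)];
console.log(separate.length, coalesced.length);
```

Both functions return the same documents; the document with x equal to "z" has no match and is dropped, mirroring preserveNullAndEmptyArrays: false.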
Slot-Based Query Execution Engine Pipeline Optimizations
MongoDB can use the slot-based query execution engine to execute certain pipeline stages when specific conditions are met. In most cases, the slot-based execution engine provides improved performance and lower CPU and memory costs compared to the classic query engine.
To verify that the slot-based execution engine is used, run the aggregation with the explain option. This option outputs information on the aggregation's query plan. For more information on using explain with aggregations, see Return Information on Aggregation Pipeline Operation.
The following sections describe:
- The conditions when the slot-based execution engine is used for aggregation.
- How to verify if the slot-based execution engine was used.
$group Optimization
New in version 5.2.
Starting in version 5.2, MongoDB uses the slot-based execution query engine to execute $group stages if either:
- $group is the first stage in the pipeline.
- All preceding stages in the pipeline can also be executed by the slot-based execution engine.
When the slot-based query execution engine is used for $group, the explain results include:
explain.explainVersion: '2'
queryPlanner.winningPlan.queryPlan.stage: "GROUP"
The location of the queryPlanner object depends on whether the pipeline contains stages after the $group stage which cannot be executed using the slot-based execution engine.
- If $group is the last stage or all stages after $group can be executed using the slot-based execution engine, the queryPlanner object is in the top-level explain output object (explain.queryPlanner).
- If the pipeline contains stages after $group which cannot be executed using the slot-based execution engine, the queryPlanner object is in explain.stages[0].$cursor.queryPlanner.
$lookup Optimization
New in version 6.0.
Starting in version 6.0, MongoDB can use the slot-based execution query engine to execute $lookup stages if all preceding stages in the pipeline can also be executed by the slot-based execution engine and none of the following conditions are true:
- The $lookup operation executes a pipeline on a joined collection. To see an example of this kind of operation, see Join Conditions and Subqueries on a Joined Collection.
- The $lookup's localField or foreignField specify numeric components. For example: { localField: "restaurant.0.review" }.
- The from field of any $lookup in the pipeline specifies a view or sharded collection.
When the slot-based query execution engine is used for $lookup, the explain results include:
explain.explainVersion: '2'
queryPlanner.winningPlan.queryPlan.stage: "EQ_LOOKUP"
EQ_LOOKUP means "equality lookup".
The location of the queryPlanner object depends on whether the pipeline contains stages after the $lookup stage which cannot be executed using the slot-based execution engine.
- If $lookup is the last stage or all stages after $lookup can be executed using the slot-based execution engine, the queryPlanner object is in the top-level explain output object (explain.queryPlanner).
- If the pipeline contains stages after $lookup which cannot be executed using the slot-based execution engine, the queryPlanner object is in explain.stages[0].$cursor.queryPlanner.
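The checks described for $group and $lookup can be wrapped in a small helper that inspects an explain document. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the sample objects are hand-written stand-ins rather than real server output, and only the fields named in the text above are examined.

```javascript
// Returns the queryPlanner object from either location described above.
function findQueryPlanner(explain) {
  if (explain.queryPlanner) return explain.queryPlanner;
  const cursor = explain.stages?.[0]?.$cursor;
  return cursor?.queryPlanner ?? null;
}

// True when the explain output reports version '2' and the expected
// slot-based stage name ("GROUP" or "EQ_LOOKUP") at winningPlan.queryPlan.stage.
function usesSlotBasedEngine(explain, expectedStage) {
  const planner = findQueryPlanner(explain);
  return (
    explain.explainVersion === "2" &&
    planner?.winningPlan?.queryPlan?.stage === expectedStage
  );
}

// Hand-written stand-ins for explain output (not captured from a server).
const topLevel = {
  explainVersion: "2",
  queryPlanner: { winningPlan: { queryPlan: { stage: "GROUP" } } }
};
const nested = {
  explainVersion: "2",
  stages: [
    { $cursor: { queryPlanner: { winningPlan: { queryPlan: { stage: "EQ_LOOKUP" } } } } }
  ]
};
const classic = {
  explainVersion: "1",
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } }
};

console.log(usesSlotBasedEngine(topLevel, "GROUP"));    // true
console.log(usesSlotBasedEngine(nested, "EQ_LOOKUP"));  // true
console.log(usesSlotBasedEngine(classic, "GROUP"));     // false
```

In mongosh, the explain document would typically come from running the aggregation with the explain option; its exact shape can differ between releases and deployment types, so treat the field paths here as the ones named in this page only.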
Improve Performance with Indexes and Document Filters
The following sections show how you can improve aggregation performance using indexes and document filters.
Indexes
An aggregation pipeline can use indexes from the input collection to improve performance. Using an index limits the number of documents a stage processes. Ideally, an index can cover the stage query. A covered query has especially high performance, since the index itself supplies all of the data the query needs without fetching the full documents.
For example, a pipeline that consists of $match, $sort, $group can benefit from indexes at every stage:
- An index on the $match query field efficiently identifies the relevant data.
- An index on the sorting field returns data in sorted order for the $sort stage.
- An index on the grouping field that matches the $sort order returns all of the field values needed for the $group stage, making it a covered query.
To determine whether a pipeline uses indexes, review the query plan and look for IXSCAN or DISTINCT_SCAN plans.
In some cases, the query planner uses a DISTINCT_SCAN index plan that returns one document per index key value. DISTINCT_SCAN executes faster than IXSCAN if there are multiple documents per key value. However, index scan parameters might affect the time comparison of DISTINCT_SCAN and IXSCAN.
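Looking for IXSCAN or DISTINCT_SCAN can be automated by walking the plan tree. The sketch below assumes the plan nests its children under inputStage, inputStages, or queryPlan; other child fields may exist in real explain output and are not handled. The function names and sample plans are hypothetical.

```javascript
// Collects every `stage` name found in a (possibly nested) plan object.
function collectStages(plan, found = []) {
  if (!plan || typeof plan !== "object") return found;
  if (plan.stage) found.push(plan.stage);
  // Only these child fields are followed in this sketch.
  const children = [plan.queryPlan, plan.inputStage, ...(plan.inputStages ?? [])];
  for (const child of children) collectStages(child, found);
  return found;
}

// True when the plan contains an index-backed scan.
function usesIndex(winningPlan) {
  return collectStages(winningPlan).some(
    (stage) => stage === "IXSCAN" || stage === "DISTINCT_SCAN"
  );
}

// Hand-written sample plans (not captured from a server).
const indexedPlan = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", indexName: "name_1" }
};
const scanPlan = { stage: "COLLSCAN" };

console.log(usesIndex(indexedPlan)); // true
console.log(usesIndex(scanPlan));    // false
```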
For early stages in your aggregation pipeline, consider indexing the query fields. Stages that can benefit from indexes are:
$match stage
During the $match stage, the server can use an index if $match is the first stage in the pipeline, after any optimizations from the query planner.
$sort stage
During the $sort stage, the server can use an index if the stage is not preceded by a $project, $unwind, or $group stage.
$group stage
During the $group stage, the server can use an index to quickly find the $first or $last document in each group if the stage meets both of these conditions:
- The pipeline sorts and groups by the same field.
- The $group stage only uses the $first or $last accumulator operator.
See $group Performance Optimizations for an example.
$geoNear stage
The server always uses an index for the $geoNear stage, since it requires a geospatial index.
Additionally, stages later in the pipeline that retrieve data from other, unmodified collections, such as $lookup, can use indexes on those collections for optimization.
Document Filters
If your aggregation operation requires only a subset of the documents in a collection, filter the documents first:
- Use the $match, $limit, and $skip stages to restrict the documents that enter the pipeline.
- When possible, put $match at the beginning of the pipeline to use indexes that scan the matching documents in a collection.
- $match followed by $sort at the start of the pipeline is equivalent to a single query with a sort, and can use an index.
Example
$sort + $skip + $limit Sequence
A pipeline contains a sequence of $sort followed by a $skip followed by a $limit:
{ $sort: { age : -1 } },
{ $skip: 10 },
{ $limit: 5 }
The optimizer performs $sort + $limit Coalescence to transform the sequence to the following:
{
"$sort" : {
"sortKey" : {
"age" : -1
},
"limit" : NumberLong(15)
}
},
{
"$skip" : NumberLong(10)
}
MongoDB increases the $limit amount from 5 to 15 (the original limit plus the $skip amount of 10) with the reordering.
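The arithmetic behind this rewrite can be checked with a short plain JavaScript sketch. The function name coalesceSortSkipLimit is hypothetical, plain numbers stand in for NumberLong values, and the code models only the leading $sort, $skip, $limit pattern from this example.

```javascript
// Illustrative model only: rewrites a leading $sort, $skip, $limit sequence.
function coalesceSortSkipLimit(pipeline) {
  const [sort, skip, limit, ...rest] = pipeline;
  const matches =
    sort && "$sort" in sort && skip && "$skip" in skip && limit && "$limit" in limit;
  if (!matches) return pipeline;
  return [
    // The sort must keep skip + limit documents so that enough remain
    // after the $skip stage discards the first `skip` of them.
    { $sort: { sortKey: sort.$sort, limit: skip.$skip + limit.$limit } },
    { $skip: skip.$skip },
    ...rest
  ];
}

const rewrittenPipeline = coalesceSortSkipLimit([
  { $sort: { age: -1 } },
  { $skip: 10 },
  { $limit: 5 }
]);
console.log(JSON.stringify(rewrittenPipeline));

// Sanity check on plain data: both orderings select the same values.
const ages = Array.from({ length: 40 }, (_, i) => i).sort((a, b) => b - a);
const originalResult = ages.slice(10).slice(0, 5);   // sort, skip 10, limit 5
const rewrittenResult = ages.slice(0, 15).slice(10); // sort keeping 15, skip 10
console.log(originalResult, rewrittenResult);
```

Both selections contain the same five values, which is why the limit folded into the sort has to be 15 rather than 5.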
See also: the explain option in db.collection.aggregate()