Aggregation Pipeline Optimization
Aggregation pipeline operations have an optimization phase that attempts to reshape the pipeline for improved performance.
To see how the optimizer transforms a particular aggregation pipeline, include the explain option in the db.collection.aggregate() method.
Optimizations are subject to change between releases.
In addition to learning about the aggregation pipeline optimizations performed during the optimization phase, you will also see how to improve aggregation pipeline performance using indexes and document filters.
Projection Optimization
The aggregation pipeline can determine if it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline only uses those fields, reducing the amount of data passing through the pipeline.
$project Stage Placement
When you use a $project stage, it should typically be the last stage in your pipeline, used to specify which fields to return to the client.
Using a $project stage at the beginning or middle of a pipeline to reduce the number of fields passed to subsequent pipeline stages is unlikely to improve performance, as the database performs this optimization automatically.
Pipeline Sequence Optimization
($project or $unset or $addFields or $set) + $match Sequence Optimization
For an aggregation pipeline that contains a projection stage ($project or $unset or $addFields or $set) followed by a $match stage, MongoDB moves any filters in the $match stage that do not require values computed in the projection stage to a new $match stage before the projection.
If an aggregation pipeline contains multiple projection and/or $match stages, MongoDB performs this optimization for each $match stage, moving each $match filter before all projection stages that the filter does not depend on.
Consider a pipeline of the following stages:
{ $addFields: {
maxTime: { $max: "$times" },
minTime: { $min: "$times" }
} },
{ $project: {
_id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
avgTime: { $avg: ["$maxTime", "$minTime"] }
} },
{ $match: {
name: "Joe Schmoe",
maxTime: { $lt: 20 },
minTime: { $gt: 5 },
avgTime: { $gt: 7 }
} }
The optimizer breaks up the $match stage into four individual filters, one for each key in the $match query document. The optimizer then moves each filter before as many projection stages as possible, creating new $match stages as needed. Given this example, the optimizer produces the following optimized pipeline:
{ $match: { name: "Joe Schmoe" } },
{ $addFields: {
maxTime: { $max: "$times" },
minTime: { $min: "$times" }
} },
{ $match: { maxTime: { $lt: 20 }, minTime: { $gt: 5 } } },
{ $project: {
_id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
avgTime: { $avg: ["$maxTime", "$minTime"] }
} },
{ $match: { avgTime: { $gt: 7 } } }
The $match filter { avgTime: { $gt: 7 } } depends on the $project stage to compute the avgTime field. The $project stage is the last projection stage in this pipeline, so the $match filter on avgTime could not be moved.
The maxTime and minTime fields are computed in the $addFields stage but have no dependency on the $project stage. The optimizer created a new $match stage for the filters on these fields and placed it before the $project stage.
The $match filter { name: "Joe Schmoe" } does not use any values computed in either the $project or $addFields stages, so it was moved to a new $match stage before both of the projection stages.
After optimization, the filter { name: "Joe Schmoe" } is in a $match stage at the beginning of the pipeline. This has the added benefit of allowing the aggregation to use an index on the name field when initially querying the collection. See Improve Performance with Indexes and Document Filters for more information.
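The rewrite described above can be modeled with a small JavaScript sketch. The helpers modifies and pushDownMatch are hypothetical names, not MongoDB code. The sketch assumes top-level field names only (no dotted paths) and a $match document made only of plain field keys (no $and, $or, or $expr). Run on the example stages, it reproduces the optimized pipeline shown above.

```javascript
// Toy model of the projection + $match rewrite; not MongoDB's optimizer.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// True if a projection stage computes, overwrites, or removes `field`.
function modifies(stage, field) {
  const [op, spec] = Object.entries(stage)[0];
  if (op === "$addFields" || op === "$set") return has(spec, field);
  if (op === "$unset") return [].concat(spec).includes(field);
  if (op === "$project") {
    // Listed with 1/true: passed through. Listed with 0/false or an expression: changed.
    if (has(spec, field)) return !(spec[field] === 1 || spec[field] === true);
    if (field === "_id") return false; // _id is kept unless explicitly excluded
    // In inclusion mode, fields that are not listed are dropped.
    const exclusionMode = Object.entries(spec)
      .filter(([k]) => k !== "_id")
      .every(([, v]) => v === 0 || v === false);
    return !exclusionMode;
  }
  throw new Error(`not a projection stage: ${op}`);
}

// Splits `matchStage` into one filter per key and moves each filter before as
// many of the directly preceding `projections` as it does not depend on.
function pushDownMatch(projections, matchStage) {
  const buckets = Array.from({ length: projections.length + 1 }, () => []);
  for (const [field, cond] of Object.entries(matchStage.$match)) {
    let pos = projections.length;
    while (pos > 0 && !modifies(projections[pos - 1], field)) pos--;
    buckets[pos].push([field, cond]);
  }
  const out = [];
  buckets.forEach((filters, i) => {
    if (filters.length > 0) out.push({ $match: Object.fromEntries(filters) });
    if (i < projections.length) out.push(projections[i]);
  });
  return out;
}

const optimized = pushDownMatch(
  [
    { $addFields: { maxTime: { $max: "$times" }, minTime: { $min: "$times" } } },
    { $project: { _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
                  avgTime: { $avg: ["$maxTime", "$minTime"] } } },
  ],
  { $match: { name: "Joe Schmoe", maxTime: { $lt: 20 }, minTime: { $gt: 5 }, avgTime: { $gt: 7 } } }
);
console.log(optimized.map((s) => Object.keys(s)[0]));
// [ '$match', '$addFields', '$match', '$project', '$match' ]
```

Walking backward from the $match and stopping at the first stage that touches the field is what places each filter "before as many projection stages as possible".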
$sort + $match Sequence Optimization
When you have a sequence with $sort followed by a $match, the $match moves before the $sort to minimize the number of objects to sort. For example, if the pipeline consists of the following stages:
{ $sort: { age : -1 } },
{ $match: { status: 'A' } }
During the optimization phase, the optimizer transforms the sequence to the following:
{ $match: { status: 'A' } },
{ $sort: { age : -1 } }
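A minimal sketch of this reordering, assuming pipelines are plain JavaScript arrays of stage objects. moveMatchBeforeSort is a hypothetical helper, not part of any MongoDB API. The swap is safe because $sort only reorders documents and never changes which documents a later filter would keep.

```javascript
// Toy rewrite (not MongoDB source): move each $match ahead of an immediately
// preceding $sort, repeating until no such pair remains.
const op = (stage) => Object.keys(stage)[0];

function moveMatchBeforeSort(pipeline) {
  const out = [...pipeline];
  let changed = true;
  while (changed) {
    changed = false;
    for (let i = 0; i + 1 < out.length; i++) {
      if (op(out[i]) === "$sort" && op(out[i + 1]) === "$match") {
        [out[i], out[i + 1]] = [out[i + 1], out[i]]; // filter first, sort fewer docs
        changed = true;
      }
    }
  }
  return out;
}

const result = moveMatchBeforeSort([{ $sort: { age: -1 } }, { $match: { status: "A" } }]);
console.log(JSON.stringify(result));
// [{"$match":{"status":"A"}},{"$sort":{"age":-1}}]
```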
$redact + $match Sequence Optimization
When the pipeline has a $redact stage immediately followed by a $match stage, the aggregation can sometimes add a portion of the $match stage before the $redact stage. If the added $match stage is at the start of a pipeline, the aggregation can use an index as well as query the collection to limit the number of documents that enter the pipeline. See Improve Performance with Indexes and Document Filters for more information.
For example, if the pipeline consists of the following stages:
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }
The optimizer can add a $match stage containing part of the original filter before the $redact stage:
{ $match: { year: 2014 } },
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }
$project/$unset + $skip Sequence Optimization
When you have a sequence with $project or $unset followed by $skip, the $skip moves before the $project or $unset. For example, if the pipeline consists of the following stages:
{ $sort: { age : -1 } },
{ $project: { status: 1, name: 1 } },
{ $skip: 5 }
During the optimization phase, the optimizer transforms the sequence to the following:
{ $sort: { age : -1 } },
{ $skip: 5 },
{ $project: { status: 1, name: 1 } }
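Why this move is safe can be checked with a toy in-memory model. The helpers project and skip below are hypothetical stand-ins for the stages, and the sample documents are made up and assumed to arrive already sorted. Because a projection produces exactly one output document per input document and keeps their order, skipping before or after it drops the same documents, and skipping first means fewer documents are projected.

```javascript
// Hypothetical in-memory stand-ins for the stages (not a MongoDB API).
const project = (docs, fields) =>
  docs.map((d) => Object.fromEntries(fields.filter((f) => f in d).map((f) => [f, d[f]])));
const skip = (docs, n) => docs.slice(n);

// Made-up documents, assumed to be already sorted by the earlier $sort.
const docs = Array.from({ length: 8 }, (_, i) => ({
  _id: i, name: `user${i}`, status: i % 2 ? "A" : "B", age: 60 - i,
}));

// { status: 1, name: 1 } also keeps _id by default, so include it here.
const fields = ["_id", "status", "name"];
const projectThenSkip = skip(project(docs, fields), 5); // projects 8 documents
const skipThenProject = project(skip(docs, 5), fields); // projects only 3

console.log(JSON.stringify(projectThenSkip) === JSON.stringify(skipThenProject)); // true
```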
Pipeline Coalescence Optimization
When possible, the optimization phase coalesces a pipeline stage into its predecessor. Generally, coalescence occurs after any sequence reordering optimization.
$sort + $limit Coalescence
Changed in version 4.0.
When a $sort precedes a $limit, the optimizer can coalesce the $limit into the $sort if no intervening stages modify the number of documents (e.g. $unwind, $group). MongoDB will not coalesce the $limit into the $sort if there are pipeline stages that change the number of documents between the $sort and $limit stages.
For example, if the pipeline consists of the following stages:
{ $sort : { age : -1 } },
{ $project : { age : 1, status : 1, name : 1 } },
{ $limit: 5 }
During the optimization phase, the optimizer coalesces the sequence to the following:
{
"$sort" : {
"sortKey" : {
"age" : -1
},
"limit" : NumberLong(5)
}
},
{ "$project" : {
"age" : 1,
"status" : 1,
"name" : 1
}
}
This allows the sort operation to only maintain the top n results as it progresses, where n is the specified limit, and MongoDB only needs to store n items in memory [1]. See $sort Operator and Memory for more information.
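The idea of maintaining only the top n results can be sketched as a bounded sort. This is an illustration under stated assumptions, not MongoDB's sort implementation: topN is a hypothetical helper, documents are plain objects, and the comparator plays the role of the $sort specification.

```javascript
// Bounded "top-n" sort: keep at most n documents, always in sorted order.
function topN(docs, n, compare) {
  const kept = []; // sorted by `compare`, never longer than n
  for (const doc of docs) {
    // Binary search for the insertion point; equal keys go after existing
    // ones, which keeps the result stable.
    let lo = 0;
    let hi = kept.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (compare(kept[mid], doc) <= 0) lo = mid + 1;
      else hi = mid;
    }
    if (lo < n) {
      kept.splice(lo, 0, doc);
      if (kept.length > n) kept.pop(); // memory stays bounded by n
    }
  }
  return kept;
}

const byAgeDesc = (a, b) => b.age - a.age; // plays the role of { $sort: { age: -1 } }
const people = [31, 25, 47, 19, 52, 38, 44, 27].map((age, i) => ({ _id: i, age }));
const top5 = topN(people, 5, byAgeDesc); // like $sort + $limit: 5
console.log(top5.map((p) => p.age)); // [ 52, 47, 44, 38, 31 ]
```

Each incoming document either lands among the best n seen so far, pushing the current last one out, or is discarded immediately, so the buffer never grows past n.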
Sequence Optimization with $skip
If there is a $skip stage between the $sort and $limit stages, MongoDB will coalesce the $limit into the $sort stage and increase the $limit value by the $skip amount. See $sort + $skip + $limit Sequence for an example.
[1] The optimization still applies when allowDiskUse is true and the n items exceed the aggregation memory limit.
$limit + $limit Coalescence
When a $limit immediately follows another $limit, the two stages can coalesce into a single $limit where the limit amount is the smaller of the two initial limit amounts. For example, a pipeline contains the following sequence:
{ $limit: 100 },
{ $limit: 10 }
Then the second $limit stage can coalesce into the first $limit stage and result in a single $limit stage where the limit amount 10 is the minimum of the two initial limits 100 and 10.
{ $limit: 10 }
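A minimal sketch of this coalescence over a plain array of stage objects. coalesceLimits is a hypothetical helper, not MongoDB source, and it only merges $limit stages that are directly adjacent.

```javascript
// Toy rewrite (not MongoDB source): fold each run of adjacent $limit stages
// into one $limit with the smaller value. Limiting to 100 and then to 10 can
// never return more than 10 documents, and $limit never reorders documents.
function coalesceLimits(pipeline) {
  const out = [];
  for (const stage of pipeline) {
    const prev = out[out.length - 1];
    if (prev && "$limit" in prev && "$limit" in stage) {
      out[out.length - 1] = { $limit: Math.min(prev.$limit, stage.$limit) };
    } else {
      out.push(stage);
    }
  }
  return out;
}

console.log(JSON.stringify(coalesceLimits([{ $limit: 100 }, { $limit: 10 }])));
// [{"$limit":10}]
```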
$skip + $skip Coalescence
When a $skip immediately follows another $skip, the two stages can coalesce into a single $skip where the skip amount is the sum of the two initial skip amounts. For example, a pipeline contains the following sequence:
{ $skip: 5 },
{ $skip: 2 }
Then the second $skip stage can coalesce into the first $skip stage and result in a single $skip stage where the skip amount 7 is the sum of the two initial skip amounts 5 and 2.
{ $skip: 7 }
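A quick in-memory check of why the amounts add up, using a hypothetical skip helper over made-up documents. Skipping 5 and then 2 more discards exactly the same documents as a single skip of 5 + 2.

```javascript
// Hypothetical stand-in for $skip over an in-memory array (not a MongoDB API).
const skip = (docs, n) => docs.slice(n);
const docs = Array.from({ length: 12 }, (_, i) => ({ _id: i }));

const twoSkips = skip(skip(docs, 5), 2); // { $skip: 5 }, { $skip: 2 }
const oneSkip = skip(docs, 5 + 2);       // coalesced { $skip: 7 }

console.log(twoSkips.map((d) => d._id)); // [ 7, 8, 9, 10, 11 ]
console.log(JSON.stringify(twoSkips) === JSON.stringify(oneSkip)); // true
```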
$match + $match Coalescence
When a $match immediately follows another $match, the two stages can coalesce into a single $match combining the conditions with an $and. For example, a pipeline contains the following sequence:
{ $match: { year: 2014 } },
{ $match: { status: "A" } }
Then the second $match stage can coalesce into the first $match stage and result in a single $match stage:
{ $match: { $and: [ { "year" : 2014 }, { "status" : "A" } ] } }
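A minimal sketch of this coalescence, assuming pipelines are plain arrays of stage objects. coalesceMatches is a hypothetical helper, not MongoDB source. When more than two $match stages are adjacent, the sketch keeps appending to the same $and array.

```javascript
// Toy rewrite (not MongoDB source): merge adjacent $match stages into one
// $match whose conditions are combined with $and.
function coalesceMatches(pipeline) {
  const out = [];
  for (const stage of pipeline) {
    const prev = out[out.length - 1];
    if (prev && "$match" in prev && "$match" in stage) {
      // Reuse an existing $and list so chains of $match stay flat.
      const onlyAnd = Array.isArray(prev.$match.$and) && Object.keys(prev.$match).length === 1;
      const left = onlyAnd ? prev.$match.$and : [prev.$match];
      out[out.length - 1] = { $match: { $and: [...left, stage.$match] } };
    } else {
      out.push(stage);
    }
  }
  return out;
}

const merged = coalesceMatches([{ $match: { year: 2014 } }, { $match: { status: "A" } }]);
console.log(JSON.stringify(merged));
// [{"$match":{"$and":[{"year":2014},{"status":"A"}]}}]
```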
$lookup + $unwind Coalescence
When an $unwind immediately follows a $lookup, and the $unwind operates on the as field of the $lookup, the optimizer can coalesce the $unwind into the $lookup stage. This avoids creating large intermediate documents.
For example, a pipeline contains the following sequence:
{
$lookup: {
from: "otherCollection",
as: "resultingArray",
localField: "x",
foreignField: "y"
}
},
{ $unwind: "$resultingArray"}
The optimizer can coalesce the $unwind stage into the $lookup stage. If you run the aggregation with the explain option, the explain output shows the coalesced stage:
{
$lookup: {
from: "otherCollection",
as: "resultingArray",
localField: "x",
foreignField: "y",
unwinding: { preserveNullAndEmptyArrays: false }
}
}
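The same rewrite, sketched in JavaScript under simplifying assumptions: coalesceLookupUnwind is a hypothetical helper, the unwinding field mirrors the explain output shown above, and the sketch leaves an $unwind alone if it sets includeArrayIndex, purely to keep the example small.

```javascript
// Toy rewrite (not MongoDB source): fold an $unwind into the $lookup right
// before it when the $unwind path is "$" + the $lookup's `as` field.
function coalesceLookupUnwind(pipeline) {
  const out = [];
  for (const stage of pipeline) {
    const prev = out[out.length - 1];
    if (prev && "$lookup" in prev && !prev.$lookup.unwinding && "$unwind" in stage) {
      // $unwind accepts either a path string or a document with a `path` field.
      const spec = typeof stage.$unwind === "string" ? { path: stage.$unwind } : stage.$unwind;
      if (spec.path === "$" + prev.$lookup.as && spec.includeArrayIndex === undefined) {
        out[out.length - 1] = {
          $lookup: {
            ...prev.$lookup,
            unwinding: { preserveNullAndEmptyArrays: spec.preserveNullAndEmptyArrays === true },
          },
        };
        continue;
      }
    }
    out.push(stage);
  }
  return out;
}

const coalesced = coalesceLookupUnwind([
  { $lookup: { from: "otherCollection", as: "resultingArray", localField: "x", foreignField: "y" } },
  { $unwind: "$resultingArray" },
]);
console.log(JSON.stringify(coalesced[0].$lookup.unwinding)); // {"preserveNullAndEmptyArrays":false}
```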
Slot-Based Query Execution Engine Pipeline Optimizations
MongoDB can use the slot-based query execution engine to execute certain pipeline stages when specific conditions are met. In most cases, the slot-based execution engine provides improved performance and lower CPU and memory costs compared to the classic query engine.
To verify that the slot-based execution engine is used, run the aggregation with the explain option. This option outputs information on the aggregation's query plan. For more information on using explain with aggregations, see Return Information on Aggregation Pipeline Operation.
The following sections describe:
- The conditions when the slot-based execution engine is used for aggregation.
- How to verify if the slot-based execution engine was used.
$group Optimization
New in version 5.2.
Starting in version 5.2, MongoDB uses the slot-based query execution engine to execute $group stages if either:
- $group is the first stage in the pipeline.
- All preceding stages in the pipeline can also be executed by the slot-based execution engine.
When the slot-based query execution engine is used for $group, the explain results include:
- explain.explainVersion: '2'
- queryPlanner.winningPlan.queryPlan.stage: "GROUP"
- The location of the queryPlanner object depends on whether the pipeline contains stages after the $group stage which cannot be executed using the slot-based execution engine.
  - If $group is the last stage or all stages after $group can be executed using the slot-based execution engine, the queryPlanner object is in the top-level explain output object (explain.queryPlanner).
  - If the pipeline contains stages after $group which cannot be executed using the slot-based execution engine, the queryPlanner object is in explain.stages[0].$cursor.queryPlanner.
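A small sketch of the check described above, assuming you already have the explain output as a JavaScript object. usedSlotBasedEngine is a hypothetical helper, and the explain objects in the usage lines are trimmed, made-up stand-ins rather than real server output. The same function applies to the EQ_LOOKUP stage name described in the $lookup section.

```javascript
// Reports whether the slot-based engine ran `stageName`, following the
// explainVersion and queryPlanner locations listed above.
function usedSlotBasedEngine(explain, stageName) {
  if (explain.explainVersion !== "2") return false;
  // Top-level location first, then the nested $cursor location.
  const planner = explain.queryPlanner ?? explain.stages?.[0]?.$cursor?.queryPlanner;
  return planner?.winningPlan?.queryPlan?.stage === stageName;
}

// Trimmed, made-up explain objects for illustration only.
const topLevel = {
  explainVersion: "2",
  queryPlanner: { winningPlan: { queryPlan: { stage: "GROUP" } } },
};
const nested = {
  explainVersion: "2",
  stages: [{ $cursor: { queryPlanner: { winningPlan: { queryPlan: { stage: "GROUP" } } } } }],
};
console.log(usedSlotBasedEngine(topLevel, "GROUP"), usedSlotBasedEngine(nested, "GROUP")); // true true
```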
$lookup Optimization
New in version 6.0.
Starting in version 6.0, MongoDB can use the slot-based query execution engine to execute $lookup stages if all preceding stages in the pipeline can also be executed by the slot-based execution engine and none of the following conditions are true:
- The $lookup operation executes a pipeline on a joined collection. To see an example of this kind of operation, see Join Conditions and Subqueries on a Joined Collection.
- The $lookup's localField or foreignField specify numeric components. For example: { localField: "restaurant.0.review" }.
- The from field of any $lookup in the pipeline specifies a view or sharded collection.
When the slot-based query execution engine is used for $lookup, the explain results include:
- explain.explainVersion: '2'
- queryPlanner.winningPlan.queryPlan.stage: "EQ_LOOKUP". EQ_LOOKUP means "equality lookup".
- The location of the queryPlanner object depends on whether the pipeline contains stages after the $lookup stage which cannot be executed using the slot-based execution engine.
  - If $lookup is the last stage or all stages after $lookup can be executed using the slot-based execution engine, the queryPlanner object is in the top-level explain output object (explain.queryPlanner).
  - If the pipeline contains stages after $lookup which cannot be executed using the slot-based execution engine, the queryPlanner object is in explain.stages[0].$cursor.queryPlanner.
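The eligibility conditions for $lookup can be expressed as a checklist. In this sketch, lookupBlockers is a hypothetical helper, and the collection and field names in the usage lines are made up. Because a $lookup stage does not record whether its from collection is a view or sharded, the caller supplies that knowledge as a hypothetical viewsOrSharded set. The requirement that all preceding stages are also eligible is not checked here.

```javascript
// Returns the reasons (from the list above) that would keep this single
// $lookup out of the slot-based engine; an empty array means none apply.
function lookupBlockers(lookupSpec, viewsOrSharded = new Set()) {
  const reasons = [];
  // A numeric component is a dotted-path part made only of digits, e.g. "0".
  const hasNumericPart = (path) =>
    typeof path === "string" && path.split(".").some((part) => /^\d+$/.test(part));
  if (lookupSpec.pipeline !== undefined) reasons.push("runs a pipeline on the joined collection");
  if (hasNumericPart(lookupSpec.localField) || hasNumericPart(lookupSpec.foreignField)) {
    reasons.push("localField or foreignField has a numeric path component");
  }
  if (viewsOrSharded.has(lookupSpec.from)) reasons.push("from is a view or sharded collection");
  return reasons;
}

console.log(lookupBlockers({ from: "otherCollection", localField: "x", foreignField: "y", as: "r" })); // []
console.log(lookupBlockers({ from: "restaurants", localField: "restaurant.0.review", foreignField: "review", as: "r" }));
// [ 'localField or foreignField has a numeric path component' ]
```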
Improve Performance with Indexes and Document Filters
The following sections show how you can improve aggregation performance using indexes and document filters.
Indexes
An aggregation pipeline can use indexes from the input collection to improve performance. Using an index limits the number of documents a stage processes. Ideally, an index can cover the stage query. A covered query has especially high performance, since the index contains every field the query needs, so MongoDB does not have to read the documents themselves.
For example, a pipeline that consists of $match, $sort, $group can benefit from indexes at every stage:
- An index on the $match query field efficiently identifies the relevant data.
- An index on the sorting field returns data in sorted order for the $sort stage.
- An index on the grouping field that matches the $sort order returns all of the field values needed for the $group stage, making it a covered query.
To determine whether a pipeline uses indexes, review the query plan and look for IXSCAN or DISTINCT_SCAN plans.
In some cases, the query planner uses a DISTINCT_SCAN index plan that returns one document per index key value. DISTINCT_SCAN executes faster than IXSCAN if there are multiple documents per key value. However, index scan parameters might affect the time comparison of DISTINCT_SCAN and IXSCAN.
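To look for IXSCAN or DISTINCT_SCAN programmatically, you can walk the plan tree in the explain output. This sketch assumes child plan nodes are nested under inputStage or inputStages, as in typical explain output. planStages is a hypothetical helper and the winningPlan object is a trimmed, made-up example rather than real server output.

```javascript
// Collects the stage names in a query plan tree, depth first.
function planStages(node, acc = []) {
  if (!node || typeof node !== "object") return acc;
  if (node.stage) acc.push(node.stage);
  if (node.inputStage) planStages(node.inputStage, acc); // single child
  for (const child of node.inputStages ?? []) planStages(child, acc); // several children
  return acc;
}

// Trimmed, made-up plan for illustration only.
const winningPlan = {
  stage: "PROJECTION_COVERED",
  inputStage: { stage: "DISTINCT_SCAN", keyPattern: { item: 1 } },
};
const stages = planStages(winningPlan);
console.log(stages); // [ 'PROJECTION_COVERED', 'DISTINCT_SCAN' ]
console.log(stages.some((s) => s === "IXSCAN" || s === "DISTINCT_SCAN")); // true
```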
For early stages in your aggregation pipeline, consider indexing the query fields. Stages that can benefit from indexes are:
- $match stage: During the $match stage, the server can use an index if $match is the first stage in the pipeline, after any optimizations from the query planner.
- $sort stage: During the $sort stage, the server can use an index if the stage is not preceded by a $project, $unwind, or $group stage.
- $group stage: During the $group stage, the server can use an index to quickly find the $first or $last document in each group if the stage meets both of these conditions:
  - The pipeline sorts and groups by the same field.
  - The $group stage only uses the $first or $last accumulator operator.
  See $group Performance Optimizations for an example.
- $geoNear stage: The server always uses an index for the $geoNear stage, since it requires a geospatial index.
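The $match and $sort rules above can be sketched as a simple scan. This is a toy under stated assumptions: indexEligibleStages is a hypothetical helper, it expects a pipeline that has already been optimized (it does not reorder anything itself), and it does not model the $group or $geoNear rules.

```javascript
// Returns the indexes of stages that, per the $match and $sort rules above,
// could use an index. Purely illustrative; the server makes the real decision.
function indexEligibleStages(pipeline) {
  const sortBlockers = new Set(["$project", "$unwind", "$group"]);
  const eligible = [];
  let sortBlocked = false;
  pipeline.forEach((stage, i) => {
    const op = Object.keys(stage)[0];
    if (op === "$match" && i === 0) eligible.push(i); // only as the first stage
    if (op === "$sort" && !sortBlocked) eligible.push(i); // no blocker before it
    if (sortBlockers.has(op)) sortBlocked = true;
  });
  return eligible;
}

console.log(indexEligibleStages([{ $match: { status: "A" } }, { $sort: { age: -1 } }, { $group: { _id: "$age" } }])); // [ 0, 1 ]
console.log(indexEligibleStages([{ $project: { age: 1 } }, { $sort: { age: -1 } }])); // []
```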
Additionally, stages later in the pipeline that retrieve data from other, unmodified collections can use indexes on those collections for optimization. These stages include:
- $lookup
- $graphLookup
- $unionWith
Document Filters
If your aggregation operation requires only a subset of the documents in a collection, filter the documents first:
- Use the $match, $limit, and $skip stages to restrict the documents that enter the pipeline.
- When possible, put $match at the beginning of the pipeline to use indexes that scan the matching documents in a collection.
- $match followed by $sort at the start of the pipeline is equivalent to a single query with a sort, and can use an index.
Example
$sort + $skip + $limit Sequence
A pipeline contains a sequence of $sort followed by a $skip followed by a $limit:
{ $sort: { age : -1 } },
{ $skip: 10 },
{ $limit: 5 }
The optimizer performs $sort + $limit Coalescence to transform the sequence to the following:
{
"$sort" : {
"sortKey" : {
"age" : -1
},
"limit" : NumberLong(15)
}
},
{
"$skip" : NumberLong(10)
}
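MongoDB increases the $limit amount by the $skip amount as part of the reordering: the $sort keeps the top 15 (10 + 5) documents, and the $skip then discards the first 10, leaving the same 5 documents as the original sequence.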
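A sketch of this coalescence, with a check on made-up data that the rewritten form returns the same documents. coalesceSortSkipLimit is a hypothetical helper that assumes it receives exactly a $sort, a $skip, and a $limit in that order. The sortKey and limit field names mirror the optimized output shown above, though the sketch uses plain numbers where the explain output shows NumberLong.

```javascript
// Toy rewrite (not MongoDB source): $sort, $skip s, $limit l becomes a $sort
// that keeps only the top s + l documents, followed by the $skip.
function coalesceSortSkipLimit([sortStage, skipStage, limitStage]) {
  return [
    { $sort: { sortKey: sortStage.$sort, limit: skipStage.$skip + limitStage.$limit } },
    { $skip: skipStage.$skip },
  ];
}

const rewritten = coalesceSortSkipLimit([{ $sort: { age: -1 } }, { $skip: 10 }, { $limit: 5 }]);
console.log(JSON.stringify(rewritten));
// [{"$sort":{"sortKey":{"age":-1},"limit":15}},{"$skip":10}]

// Made-up documents with distinct ages, to compare both forms on data.
const byAgeDesc = (a, b) => b.age - a.age;
const people = Array.from({ length: 40 }, (_, i) => ({ _id: i, age: (i * 17) % 41 }));
const original = [...people].sort(byAgeDesc).slice(10).slice(0, 5); // $sort, $skip 10, $limit 5
const optimized = [...people].sort(byAgeDesc).slice(0, 15).slice(10); // top 15, then $skip 10
console.log(JSON.stringify(original) === JSON.stringify(optimized)); // true
```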
See also:
explain option in the db.collection.aggregate() method