Using Explain Plans
When using the MongoDB Query Language (MQL) to develop queries, it is important to view the explain plan for a query to determine if you've used the appropriate index and if you need to optimise other aspects of the query or the data model. An explain plan allows you to fully understand the performance implications of the query you have created.
The same applies to aggregation pipelines and the ability to view an explain plan for the executed pipeline. However, with aggregations, an explain plan tends to be even more critical because considerably more complex logic can be assembled and run in the database. There are far more opportunities for performance bottlenecks to occur, requiring optimisation.
The MongoDB database engine will do its best to apply its own aggregation pipeline optimisations at runtime. Nevertheless, there could be some optimisations that only you can make. A database engine should never optimise a pipeline in such a way as to risk changing the functional behaviour and outcome of the pipeline. The database engine doesn't always have the extra context that your brain has, relating to the actual business problem at hand. It may not be able to make some types of judgement calls about what pipeline changes to apply to make it run faster. The availability of an explain plan for aggregations enables you to bridge this gap. It allows you to understand the database engine's applied optimisations and detect further potential optimisations you can manually implement in the pipeline.
Viewing An Explain Plan
To view the explain plan for an aggregation pipeline, you can execute commands such as the following:
db.coll.explain().aggregate([{"$match": {"name": "Jo"}}]);
In this book, you will already have seen the convention used to firstly define a separate variable for the pipeline, followed by the call to the aggregate() function, passing in the pipeline argument, as shown here:
db.coll.aggregate(pipeline);
By adopting this approach, it's easier for you to use the same pipeline definition interchangeably with different commands. Whilst prototyping and debugging a pipeline, it is handy for you to be able to quickly switch from executing the pipeline to instead generating the explain plan for the same defined pipeline, as follows:
db.coll.explain().aggregate(pipeline);
As with MQL, there are three different verbosity modes that you can generate an explain plan with, as shown below:
// QueryPlanner verbosity (default if no verbosity parameter provided)
db.coll.explain("queryPlanner").aggregate(pipeline);
// ExecutionStats verbosity
db.coll.explain("executionStats").aggregate(pipeline);
// AllPlansExecution verbosity
db.coll.explain("allPlansExecution").aggregate(pipeline);
In most cases, you will find that running the executionStats variant is the most informative mode. Rather than showing just the query planner's thought process, it also provides actual statistics on the "winning" execution plan (e.g. the total keys examined, the total docs examined, etc.). However, this isn't the default because it actually executes the aggregation in addition to formulating the query plan. If the source collection is large or the pipeline is suboptimal, it will take a while to return the explain plan result.
Note, the aggregate() function also provides a vestigial explain option to ask for an explain plan to be generated and returned. Nonetheless, this is more limited and cumbersome to use, so you should avoid it.
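If you encounter this older form in existing code, it passes explain as an option to the aggregate() call itself. A mongosh fragment shown here for recognition only (it needs a live collection to run, and the explain() helper shown earlier supersedes it):

```javascript
// Vestigial form: explain passed as an aggregate() option (avoid in new code)
db.coll.aggregate([{"$match": {"name": "Jo"}}], {"explain": true});
```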
Understanding The Explain Plan
To provide an example, let us assume a shop's data set includes information on each customer and what retail orders the customer has made over the years. The customer orders collection contains documents similar to the following example:
{
  "customer_id": "elise_smith@myemail.com",
  "orders": [
    {
      "orderdate": ISODate("2020-01-13T09:32:07Z"),
      "product_type": "GARDEN",
      "value": NumberDecimal("99.99")
    },
    {
      "orderdate": ISODate("2020-05-30T08:35:52Z"),
      "product_type": "ELECTRONICS",
      "value": NumberDecimal("231.43")
    }
  ]
}
You've defined an index on the customer_id field. You create the following aggregation pipeline to show the three most expensive orders made by a customer whose ID is tonijones@myemail.com, as shown below:
var pipeline = [
  // Unpack each order from the customer orders array as a new separate record
  {"$unwind": {
    "path": "$orders",
  }},

  // Match on only one customer
  {"$match": {
    "customer_id": "tonijones@myemail.com",
  }},

  // Sort customer's purchases by most expensive first
  {"$sort" : {
    "orders.value" : -1,
  }},

  // Show only the top 3 most expensive purchases
  {"$limit" : 3},

  // Use the order's value as a top level field
  {"$set": {
    "order_value": "$orders.value",
  }},

  // Drop the document's _id and orders sub-document from the results
  {"$unset" : [
    "_id",
    "orders",
  ]},
];
Upon executing this aggregation against an extensive sample data set, you receive the following result:
[
  {
    customer_id: 'tonijones@myemail.com',
    order_value: NumberDecimal("1024.89")
  },
  {
    customer_id: 'tonijones@myemail.com',
    order_value: NumberDecimal("187.99")
  },
  {
    customer_id: 'tonijones@myemail.com',
    order_value: NumberDecimal("4.59")
  }
]
You then request the query planner part of the explain plan:
db.customer_orders.explain("queryPlanner").aggregate(pipeline);
The query plan output for this pipeline shows the following (excluding some information for brevity):
stages: [
  {
    '$cursor': {
      queryPlanner: {
        parsedQuery: { customer_id: { '$eq': 'tonijones@myemail.com' } },
        winningPlan: {
          stage: 'FETCH',
          inputStage: {
            stage: 'IXSCAN',
            keyPattern: { customer_id: 1 },
            indexName: 'customer_id_1',
            direction: 'forward',
            indexBounds: {
              customer_id: [
                '["tonijones@myemail.com", "tonijones@myemail.com"]'
              ]
            }
          }
        },
      }
    }
  },
  { '$unwind': { path: '$orders' } },
  { '$sort': { sortKey: { 'orders.value': -1 }, limit: 3 } },
  { '$set': { order_value: '$orders.value' } },
  { '$project': { _id: false, orders: false } }
]
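A quick way to spot what the optimiser did is to list the stage names from the explain output's top-level stages array. A minimal sketch in plain JavaScript (runnable in mongosh or Node), using sample data shaped like the abridged output above:

```javascript
// Extract the name of each stage in the optimised pipeline from an
// aggregation explain result's top-level "stages" array
function listStageNames(explainStages) {
  // Each entry is an object with a single key naming the stage
  return explainStages.map(stage => Object.keys(stage)[0]);
}

// Sample shaped like the (abridged) explain output shown above
const stages = [
  {"$cursor": {queryPlanner: {}}},
  {"$unwind": {path: "$orders"}},
  {"$sort": {sortKey: {"orders.value": -1}, limit: 3}},
  {"$set": {order_value: "$orders.value"}},
  {"$project": {_id: false, orders: false}},
];

console.log(listStageNames(stages));
// → [ '$cursor', '$unwind', '$sort', '$set', '$project' ]
```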
You can deduce some illuminating insights from this query plan:
- To optimise the aggregation, the database engine has reordered the pipeline, positioning the filter belonging to the $match stage at the top of the pipeline. The database engine moves the content of $match ahead of the $unwind stage without changing the aggregation's functional behaviour or outcome.

- The first stage of the database optimised version of the pipeline is an internal $cursor stage, regardless of the order you placed the pipeline stages in. The $cursor runtime stage is always the first action executed for any aggregation. Under the covers, the aggregation engine reuses the MQL query engine to perform a "regular" query against the collection, with a filter based on the aggregation's $match contents. The aggregation runtime uses the resulting query cursor to pull batches of records. This is similar to how a client application with a MongoDB driver uses a query cursor when remotely invoking an MQL query to pull batches. As with a normal MQL query, the regular database query engine will try to use an index if it makes sense. In this case an index is indeed leveraged, as is visible in the embedded queryPlanner metadata, showing the "stage" : "IXSCAN" element and the index used, "indexName" : "customer_id_1".

- To further optimise the aggregation, the database engine has collapsed the $sort and $limit into a single special internal sort stage which can perform both actions in one go. In this situation, during the sorting process, the aggregation engine only has to track the current three most expensive orders in memory. It does not have to hold the whole data set in memory when sorting, which may otherwise be resource prohibitive in many scenarios, requiring more RAM than is available.
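The reordering optimisation is one you can also make yourself. Placing the $match before the $unwind in your own pipeline definition yields the same outcome while making the intent explicit, and it protects you in situations where the engine cannot safely reorder stages on your behalf. A sketch of the hand-reordered pipeline for this example:

```javascript
// Same pipeline as before, but with the $match filter placed first,
// mirroring the reordering the engine applied automatically
var optimisedPipeline = [
  // Match on only one customer
  {"$match": {
    "customer_id": "tonijones@myemail.com",
  }},

  // Unpack each order from the customer orders array as a new separate record
  {"$unwind": {
    "path": "$orders",
  }},

  // Sort customer's purchases by most expensive first
  {"$sort": {
    "orders.value": -1,
  }},

  // Show only the top 3 most expensive purchases
  {"$limit": 3},

  // Use the order's value as a top level field
  {"$set": {
    "order_value": "$orders.value",
  }},

  // Drop the document's _id and orders sub-document from the results
  {"$unset": [
    "_id",
    "orders",
  ]},
];
```

Because the $match filters on customer_id, a field untouched by $unwind, moving it ahead of the $unwind cannot change the result.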
You might also want to see the execution stats part of the explain plan. The specific new information shown in executionStats, versus the default of queryPlanner, is identical to the normal MQL explain plan returned for a regular find() operation. Consequently, for aggregations, similar principles to MQL apply for spotting things like "have I used the optimal index?" and "does my data model lend itself to efficiently processing this query?".
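Because the statistics are shaped identically, you can generate the explain plan for the equivalent plain MQL query and compare the two outputs side by side. A mongosh fragment (assuming the same customer_orders collection; it needs a live database to run):

```javascript
// Explain the equivalent plain MQL query for comparison
db.customer_orders.explain("executionStats").find(
  {"customer_id": "tonijones@myemail.com"}
);
```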
You ask for the execution stats part of the explain plan:
db.customer_orders.explain("executionStats").aggregate(pipeline);
Below is a redacted example of the output you will see, highlighting some of the most relevant metadata elements you should generally focus on.
executionStats: {
  nReturned: 1,
  totalKeysExamined: 1,
  totalDocsExamined: 1,
  executionStages: {
    stage: 'FETCH',
    nReturned: 1,
    works: 2,
    advanced: 1,
    docsExamined: 1,
    inputStage: {
      stage: 'IXSCAN',
      nReturned: 1,
      works: 2,
      advanced: 1,
      keyPattern: { customer_id: 1 },
      indexName: 'customer_id_1',
      direction: 'forward',
      indexBounds: {
        customer_id: [
          '["tonijones@myemail.com", "tonijones@myemail.com"]'
        ]
      },
      keysExamined: 1,
    }
  }
}
Here, this part of the plan also shows that the aggregation uses the existing index. Because totalKeysExamined and totalDocsExamined match, the aggregation fully leverages this index to identify the required records, which is good news. Nevertheless, the targeted index doesn't necessarily mean the aggregation's query part is fully optimised. For example, if there is the need to reduce latency further, you can do some analysis to determine if the index can completely cover the query. Suppose the cursor query part of the aggregation is satisfied entirely using the index and does not have to examine any raw documents. In that case, you will see totalDocsExamined: 0 in the explain plan.
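These checks are easy to automate when comparing pipeline variants. A minimal sketch in plain JavaScript (runnable in mongosh or Node) that applies the rules of thumb above to the executionStats section of an explain plan; the summary messages are illustrative labels, not part of MongoDB's API:

```javascript
// Apply simple rules of thumb to an explain plan's executionStats section:
// - zero documents examined means the query part was covered by the index
// - keys examined equal to docs examined means the index located every record
// - docs examined above keys examined suggests scanning beyond the index
function summariseExecutionStats(stats) {
  if (stats.totalDocsExamined === 0) {
    return "covered: the query part was satisfied entirely by the index";
  }
  if (stats.totalKeysExamined === stats.totalDocsExamined) {
    return "indexed: every examined document was located via an index key";
  }
  return "check indexes: more documents examined than index keys";
}

// Sample shaped like the (redacted) executionStats output above
const stats = {nReturned: 1, totalKeysExamined: 1, totalDocsExamined: 1};

console.log(summariseExecutionStats(stats));
// → indexed: every examined document was located via an index key
```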