Sharding Considerations分片注意事项
MongoDB Sharding isn't just an effective way to scale out your database to hold more data and support higher transactional throughput. MongoDB 分片不仅仅是扩展数据库以容纳更多数据和支持更高事务吞吐量的有效方法。Sharding also helps you scale out your analytical workloads, potentially enabling aggregations to complete far quicker. 分片还可以帮助您扩展分析工作负载,有可能使聚合更快地完成。Depending on the nature of your aggregation and some adherence to best practices, the cluster may execute parts of the aggregation in parallel over multiple shards for faster completion.根据聚合的性质和对最佳实践的遵守,集群可能会在多个分片上并行执行聚合的部分,以更快地完成。
There is no difference between a replica set and a sharded cluster regarding the functional capabilities of the aggregations you build, except for a minimal set of constraints. 副本集和分片集群在构建聚合的功能方面没有区别,只有一组最小的约束。This chapter's Sharded Aggregation Constraints section outlines these constraints. 本章的分片聚合约束部分概述了这些约束。When it comes to optimising your aggregations, in most cases, there will be little to no difference in the structure of a pipeline when refactoring for performance on a sharded cluster compared to a simple replica set. 当涉及到优化聚合时,在大多数情况下,与简单的副本集相比,在分片集群上进行性能重构时,管道的结构几乎没有差异。You should always adhere to the advice outlined in the chapter Pipeline Performance Considerations. 您应该始终遵守管道性能注意事项一章中列出的建议。The aggregation runtime takes care of distributing the appropriate parts of your pipeline to each shard that holds the required data. 聚合运行时负责将管道的适当部分分发到保存所需数据的每个分片。The runtime then transparently coalesces the results from these shards in the most optimal way possible. 然后,运行时以最优化的方式透明地合并这些分片的结果。Nevertheless, it is worth understanding how the aggregation engine distributes work and applies its sharded optimisations in case you ever suffer a performance problem and need to dig deeper into why.尽管如此,如果您遇到性能问题,需要更深入地了解原因,那么了解聚合引擎是如何分配工作并应用其分片优化的是值得的。
Brief Summary Of Sharded Clusters分片集群简介
In a sharded cluster, you partition a collection of data across multiple shards, where each shard runs on a separate set of host machines. 在一个分片集群中,您将数据集合划分为多个分片,其中每个分片在一组单独的主机上运行。You control how the system distributes the data by defining a shard key rule. 您可以通过定义分片键规则来控制系统如何分发数据。Based on the shard key of each document, the system groups subsets of documents together into "chunks", where a range of shard key values identifies each chunk. 根据每个文档的分片键,系统将文档的子集分组为“chunk”,其中一系列分片键值标识每个chunk。The cluster balances these chunks across its shards.集群在其分片中平衡这些块。
In addition to holding sharded collections in a database, you may also be storing unsharded collections in the same database. 除了在数据库中保存分片集合外,您还可能将未排序的集合存储在同一数据库中。All of a database's unsharded collections live on one specific shard in the cluster, designated as the "primary shard" for the database (not to be confused with a replica set's "primary replica"). 数据库的所有未排序集合都位于集群中的一个特定分片上,该分片被指定为数据库的“主分片”(不要与副本集的“主副本”混淆)。The diagram below shows the relationship between a database's collections and the shards in the cluster.下图显示了数据库的集合和集群中的分片之间的关系。
One or more deployed mongos processes act as a reverse proxy, routing read and write operations from the client application to the appropriate shards. 一个或多个已部署的mongos
进程充当反向代理,将读写操作从客户端应用程序路由到适当的分片。For document write operations (i.e. create, update, delete), a mongos router knows which shard the document lives on and routes the operation to that specific shard. 对于文档写入操作(即创建、更新、删除),mongos
路由器知道文档位于哪个分片上,并将操作路由到该特定分片。For read operations, if the query includes the shard key, the mongos knows which shards hold the required documents to route the query to (called "targeting"). 对于读取操作,如果查询包含分片键,mongos
就会知道哪些分片保存了将查询路由到的所需文档(称为“目标”)。If the query does not include the shard key, it sends the query to all shards using a "scatter/gather" pattern (called "broadcasting"). 如果查询不包括分片键,它会使用“散射/聚集”模式(称为“广播”)将查询发送到所有分片。These are the rules for sharded reads and writes, but the approach for sharded aggregations requires a deeper explanation. 这些是分片读取和写入的规则,但分片聚合的方法需要更深入的解释。Consequently, the rest of this chapter outlines how a sharded cluster handles the routing and execution of aggregations.因此,本章的其余部分概述了分片集群如何处理聚合的路由和执行。
Sharded Aggregation Constraints分片聚合约束
Some of MongoDB's stages only partly support sharded aggregations depending on which version of MongoDB you are running. These stages all happen to reference a second collection in addition to the pipeline's source input collection. MongoDB的某些阶段仅部分支持分片聚合,这取决于您正在运行的MongoDB版本。除了管道的源输入集合之外,这些阶段都碰巧引用了第二个集合。In each case, the pipeline can use a sharded collection as its source, but the second collection referenced must be unsharded (for earlier MongoDB versions, at least). 在每种情况下,管道都可以使用一个分片集合作为其源,但引用的第二个集合必须是非分片的(至少对于早期的MongoDB版本)。The affected stages and versions are:受影响的阶段和版本包括:
-
$lookup
.In MongoDB versions prior to 5.1, the other referenced collection to join with must be unsharded.在5.1之前的MongoDB版本中,要加入的其他引用集合必须取消排序。 -
$graphLookup
.In MongoDB versions prior to 5.1, the other referenced collection to recursively traverse must be unsharded.在5.1之前的MongoDB版本中,要递归遍历的其他引用集合必须取消排序。 -
$out
.In all MongoDB versions, the other referenced collection used as the destination of the aggregation's output must be unsharded.在所有MongoDB版本中,用作聚合输出目的地的其他引用集合必须取消排序。However, you can use a但是,您可以使用$merge
stage instead to output the aggregation result to a sharded collection.$merge
阶段将聚合结果输出到分片集合。
Where Does A Sharded Aggregation Run?分片聚合在哪里运行?
Sharded clusters provide the opportunity to reduce the response times of aggregations. 分片集群提供了减少聚合响应时间的机会。For example, there may be an unsharded collection containing billions of documents where it takes 60 seconds for an aggregation pipeline to process all this data. 例如,可能有一个包含数十亿文档的无记录集合,聚合管道处理所有这些数据需要60秒。Instead, suppose a cluster of four shards is hosting this same collection of evenly balanced data. 相反,假设一个由四个分片组成的集群承载着相同的均衡数据集合。Depending on the nature of the aggregation, it may be possible for the cluster to execute the aggregation's pipeline concurrently on each shard. 根据聚合的性质,集群可能会在每个分片上同时执行聚合的管道。Consequently, the same aggregation's total data processing time may be closer to 15 seconds. 因此,相同聚合的总数据处理时间可能接近15秒。However, this won't always be the case because certain types of pipelines will demand combining substantial amounts of data from multiple shards for further processing. 然而,情况并非总是如此,因为某些类型的管道需要将来自多个分片的大量数据组合起来进行进一步处理。The aggregation's response time could go in the opposite direction in such circumstances, completing in far longer than 60 seconds due to the significant network transfer and marshalling overhead. 在这种情况下,聚合的响应时间可能会朝着相反的方向发展,由于大量的网络传输和编组开销,聚合在60秒内完成。
Pipeline Splitting At Runtime运行时的管道拆分
A sharded cluster will attempt to execute as many of a pipeline's stages as possible, in parallel, on each shard containing the required data. 分片集群将尝试在包含所需数据的每个分片上并行执行尽可能多的管道阶段。However, certain types of stages must operate on all the data in one place. 但是,某些类型的阶段必须在一个地方对所有数据进行操作。Specifically, these are the sorting and grouping stages, collectively referred to as the "blocking stages" (described in the chapter Pipeline Performance Considerations). 具体来说,这些是排序和分组阶段,统称为“阻塞阶段”(在管道性能注意事项一章中描述)。Upon the first occurrence of a blocking stage in the pipeline, the aggregation engine will split the pipeline into two parts at the point where the blocking stage occurs. 在管道中第一次出现阻塞阶段时,聚合引擎将在阻塞阶段出现的点将管道拆分为两部分。The Aggregation Framework refers to the first section of the divided pipeline as the "Shards Part", which can run concurrently on multiple shards. 聚合框架将划分的管道的第一部分称为“分片部分”,它可以在多个分片上同时运行。The remaining portion of the split pipeline is called the "Merger Part", which executes in one location. 拆分管道的剩余部分被称为“合并部分”,在一个位置执行。The following illustration shows how this pipeline division occurs.下图显示了这种管道划分是如何发生的。
One of the two stages which causes a split, shown as stage 3, is a 导致拆分的两个阶段之一,如第3阶段所示,是$group
stage. $group
阶段。The same behaviour actually occurs with all grouping stages, specifically 实际上,所有分组阶段都会发生相同的行为,特别是$bucket
, $bucketAuto
, $count
and $sortByCount
. $bucket
、$bucketAuto
、$count
和$sortByCount
。Therefore any mention of the 因此,本章中提到的$group
stage in this chapter is synonymous with all of these grouping stages. $group
阶段与所有这些分组阶段都是同义的。
You can see two examples of aggregation pipeline splitting in action in the MongoDB Shell screenshots displayed below, showing each pipeline and its explain plan. 您可以在下面显示的MongoDB Shell屏幕截图中看到聚合管道拆分的两个示例,其中显示了每个管道及其解释计划。The cluster contains four shards ("s0", "s1", "s2" and "s3") which hold the distributed collection. The two example aggregations perform the following actions respectively:集群包含四个分片(“s0”、“s1”、“s2”和“s3”),它们保存着分布式集合。两个示例聚合分别执行以下操作:
-
Sharded sort, matching on shard key values and limiting the number of results分片排序,匹配分片键值并限制结果数量 -
Sharded group, matching on non-shard key values with分片组,在非分片键值上与allowDiskUse:true
and showing the total number of records per groupallowDiskUse:true
匹配,并显示每个组的记录总数
You can observe some interesting behaviours from these two explain plans:你可以从这两个解释计划中观察到一些有趣的行为:
-
Shards Part Of Pipeline Running In Parallel.并行运行的管道的分片部分。In both cases, the pipeline's在这两种情况下,管道的分片sPart都在多个分片上执行,如解释计划底部的分片s数组字段所示。shardsPart
executes on multiple shards, as indicated in the shards array field at the base of the explain plan.In the first example, the aggregation runtime targets only three shards.在第一个示例中,聚合运行时仅针对三个分片。However, in the second example, the runtime must broadcast the pipeline's然而,在第二个例子中,运行时必须广播管道的shardsPart
to run on all shards - the section Execution Of The Shards Part Of The Split Pipeline in this chapter discusses why.shardsPart
才能在所有分片上运行——本章中的分割管道的分片部分的执行一节讨论了原因。 -
Optimisations Applied In Shards Part.分片部分的优化应用。For the对于管道拆分的$sort
or$group
blocking stages where the pipeline splits, the blocking stage divides into two.$sort
或$group
阻塞阶段,阻塞阶段分为两个。The runtime executes the first phase of the blocking stage as the last stage of the运行时执行阻塞阶段的第一阶段,作为划分管道的shardsPart
of the divided pipeline.shardsPart
的最后阶段。It then completes the stage's remaining work as the first stage of the然后,它完成该阶段的剩余工作,作为mergerPart
. For a$sort
stage, this means the cluster conducts a large portion of the sorting work in parallel on all shards, with a remaining "merge sort" occurring at the final location.mergePart
的第一阶段。对于$sort
阶段,这意味着集群对所有分片并行执行大部分排序工作,剩下的“合并排序”发生在最终位置。For a对于$group
stage, the cluster performs the grouping in parallel on every shard, accumulating partial sums and totals ahead of its final merge phase.$group
阶段,集群对每个分片并行执行分组,在其最终合并阶段之前累积部分和和总数。Consequently, the runtime does not have to ship masses of raw ungrouped data from the source shards to where the runtime merges the partially formed groups.因此,运行时不必将大量未分组的原始数据从源分片运送到运行时合并部分形成的组的地方。 -
Merger Part Running From A Single Location.合并部件从单个位置运行。The specific location where the runtime executes the pipeline's运行时执行管道的mergerPart
stages depends on several variables.mergePart
阶段的具体位置取决于几个变量。The explain plan shows the location chosen by the runtime in the解释计划显示运行时在其输出的mergeType
field of its output.mergeType
字段中选择的位置。In these two examples, the locations are在这两个例子中,位置分别是mongos
andanyShard
, respectively.mongos
和anyShard
。This chapter's Execution Of The Merger Part Of The Split Pipeline section outlines the rules that the aggregation runtime uses to decide this location.本章的合并的执行——拆分管道的一部分部分概述了聚合运行时用于决定此位置的规则。 -
Final Merge Sorting When The Sort Stage Is Split.拆分排序阶段时的最终合并排序。The第一个管道的mergePart中显示的$sort
's final phase shown in themergerPart
of the first pipeline is not a blocking operation, whereas, with$group
shown in the second pipeline,$group
's final phase is blocking.$sort
的最后阶段不是阻塞操作,而第二个管道中显示的是$group
,$group
的最后阶段是阻塞。This chapter's Difference In Merging Behaviour For Grouping Vs Sorting section discusses why.本章的分组与排序的合并行为差异部分讨论了原因。
Unfortunately, if you are running your aggregations in MongoDB versions 4.2 to 5.2, the explain plan generated by the aggregation runtime erroneously neglects to log the final phase of the不幸的是,如果您在MongoDB 4.2到5.2版本中运行聚合,那么聚合运行时生成的解释计划错误地忽略了在管道的$sort
stage in the pipeline'smergerPart
.mergePart
中记录$sort
阶段的最后阶段。This is caused by a now fixed explain plan bug but rest assured that the final phase of the这是由一个现已修复的解释计划错误引起的,但请放心,在所有MongoDB版本中,$sort
stage (the "merge sort") does indeed happen in the pipeline'smergerPart
in all the MongoDB versions.$sort
阶段的最后阶段(“合并排序”)确实发生在管道的mergerPart
中。
Execution Of The Shards Part Of The Split Pipeline分割管道分片部分的执行
When a mongos receives a request to execute an aggregation pipeline, it needs to determine where to target the shards part of the pipeline. 当mongos
收到执行聚合管道的请求时,它需要确定管道的分片部分的目标位置。It will endeavour to run this on the relevant subset of shards rather than broadcasting the work to all. 它将努力在相关的分片子集上运行这项工作,而不是将工作广播给所有人。
Suppose there is a 假设在管道的开头出现了$match
stage occurring at the start of the pipeline. $match
阶段。If the filter for this 如果这个$match
includes the shard key or a prefix of the shard key, the mongos can perform a targeted operation. $match
的筛选器包括分片键或分片键的前缀,mongos
可以执行有针对性的操作。It routes the shards part of the split pipeline to execute on the applicable shards only.它路由分割管道的分片部分,以便仅在适用的分片上执行。
Furthermore, suppose the runtime establishes that the 此外,假设运行时确定$match
's filter contains an exact match on a shard key value for the source collection. $match
的筛选器包含与源集合的分片键值完全匹配的内容。In that case, the pipeline can target a single shard only, and doesn't even need to split the pipeline into two. 在这种情况下,管道只能针对单个分片,甚至不需要将管道一分为二。The entire pipeline runs in one place, on the one shard where the data it needs lives. 整个管道在一个地方运行,在它需要的数据所在的一个分片上。Even if the 即使$match
's filter only has a partial match on the first part of the shard key (the "prefix"), if this spans a range of documents encapsulated within a single chunk, or multiple chunks on the same shard only, the runtime will just target the single shard.$match
的筛选器在分片键的第一部分(“前缀”)上只有部分匹配,如果这跨越了封装在单个块中的一系列文档,或者仅在同一个分片上的多个块,则运行时将仅针对单个分片。
Execution Of The Merger Part Of The Split Pipeline (If Any)拆分管道合并部分的执行(如有)
The aggregation runtime applies a set of rules to determine where to execute the merger part of an aggregation pipeline for a sharded cluster and whether a split is even necessary. 聚合运行时应用一组规则来确定在哪里执行分片集群聚合管道的合并部分,以及是否需要拆分。The following diagram captures the four different approaches the runtime will choose from.下图捕获了运行时将从中选择的四种不同方法。
The aggregation runtime selects the merger part location (if any) by following a decision tree, with four possible outcomes. 聚合运行时通过遵循决策树来选择合并部件的位置(如果有的话),有四种可能的结果。The list below outlines the ordered decisions the runtime takes. 下面的列表概述了运行时所做的有序决策。However, it is crucial to understand that this order does not reflect precedence or preference. 然而,重要的是要理解,这一顺序并不反映优先级或偏好。Achieving either the Targeted-Shard Execution (2) or Mongos Merge (4) is usually the preferred outcome for optimum performance.实现目标分片执行(2)或Mongos合并(4)通常是获得最佳性能的首选结果。
-
Primary-Shard Merge.主分片合并。When the pipeline contains a stage referencing a second unsharded collection, the aggregation runtime will place this stage in the merger part of the split pipeline.当管道包含引用第二个未排序集合的阶段时,聚合运行时会将此阶段放置在拆分管道的合并部分。It executes this merger part on the designated primary shard, which holds the referenced unsharded collection.它在指定的主分片上执行这个合并部分,主分片保存引用的未排序集合。This is always the case for the stages that can only reference unsharded collections (i.e. for对于只能引用未排序集合的阶段,情况总是如此(即,通常用于$out
generally or for$lookup
and$graphLookup
in MongoDB versions before 5.1).$out
,或者在MongoDB 5.1之前的版本中用于$lookup
和$graphLookup
)。This is also the situation if the collection happens to be unsharded and you reference it from a如果集合恰好是未排序的,并且您从$merge
stage or, in MongoDB 5.1 or greater, from a$lookup
or$graphLookup
stage.$merge
阶段引用它,或者在MongoDB 5.1或更高版本中,从$lookup
或$graphLookup
阶段引用它。 -
Targeted-Shard Execution.目标分片执行。As discussed earlier, if the runtime can ensure the pipeline matches the required subset of the source collection data to just one shard, it does not split the pipeline, and there is no merger part.如前所述,如果运行时可以确保管道仅将源集合数据的所需子集与一个分片匹配,则不会拆分管道,也不会出现合并部分。Instead, the runtime executes the entire pipeline on the one matched shard, just like it would for non-sharded deployments.相反,运行时在一个匹配的分片上执行整个管道,就像在非分片部署中一样。This optimisation avoids unnecessarily breaking the pipeline into two parts, where intermediate data then has to move from the shards part(s) to the merger part.这种优化避免了不必要地将管道分成两部分,其中中间数据必须从分片部分移动到合并部分。The behaviour of pinning to a single shard occurs even if the pipeline contains a即使管道包含引用第二个分片集合的$merge
,$lookup
or$graphLookup
stage referencing a second sharded collection containing records dispersed across multiple shards.$merge
、$lookup
或$graphLookup
阶段,也会出现固定到单个分片的行为,该集合包含分散在多个分片中的记录。 -
Any-Shard Merge.任何分片合并。Suppose you've configured假设您为聚合配置了allowDiskUse:true
for the aggregation to avoid the 100 MB memory consumption limit per stage.allowDiskUse:true
,以避免每个阶段的100MB内存消耗限制。If one of the following two situations is also true, the aggregation runtime must run the merger part of the split pipeline on a randomly chosen shard (a.k.a. "any shard"):如果以下两种情况中的一种也是真的,则聚合运行时必须在随机选择的分片(也称为“任何分片”)上运行拆分管道的合并部分:The pipeline contains a grouping stage (which is where the split occurs), or管道包含一个分组阶段(发生拆分的位置),或者The pipeline contains a管道包含一个$sort
stage (which is where the split occurs), and a subsequent blocking stage (a grouping or$sort
stage) occurs later.$sort
阶段(拆分发生的位置),随后会出现一个后续的阻塞阶段(分组或$sort
阶段)。
For these cases, the runtime picks a shard to execute the merger, rather than merging on the mongos, to maximise the likelihood that the host machine has enough storage space to spill to disk.对于这些情况,运行时会选择一个分片来执行合并,而不是在mongos
上进行合并,以最大限度地提高主机有足够存储空间溢出到磁盘的可能性。Invariably, each shard's host machine will have greater storage capacity than the host machine of a mongos.每个分片的主机总是比mongos
的主机具有更大的存储容量。Consequently, the runtime must take this caution because, with因此,运行时必须注意这一点,因为使用allowDiskUse:true
, you are indicating the likelihood that your pipeline will cause memory capacity pressure.allowDiskUse:true
,可以指示管道可能会导致内存容量压力。Notably, the aggregation runtime does not need to mitigate the same risk by merging on a shard for the other type of blocking stage (a值得注意的是,当$sort
) when$sort
is the only blocking stage in the pipeline.$sort
是管道中唯一的阻塞阶段时,聚合运行时不需要通过在另一种类型的阻塞阶段($sort
)的分片上合并来减轻相同的风险。You can read why a single您可以在本章的分组与排序的合并行为差异一节中阅读为什么单个$sort
stage can be treated differently and does not need the same host storage capacity for merging in this chapter's Difference In Merging Behaviour For Grouping Vs Sorting section.$sort
阶段可以得到不同的处理,并且不需要相同的主机存储容量来进行合并。 -
Mongos
Merge.合并This is the default approach and location.这是默认的方法和位置。The aggregation runtime will perform the merger part of the split pipeline on the mongos that instigated the aggregation in all the remaining situations.聚合运行时将在所有剩余情况下引发聚合的mongos
上执行拆分管道的合并部分。If the pipeline's merger part only contains streaming stages (described in the chapter Pipeline Performance Considerations), the runtime assumes it is safe for the mongos to run the remaining pipeline.如果管道的合并部分只包含流阶段(在管道性能注意事项一章中描述),则运行时会认为mongos
运行其余管道是安全的。A mongos has no concept of local storage to hold data.mongos
没有本地存储来保存数据的概念。However, it doesn't matter in this situation because the runtime won't need to write to disk as RAM pressure will be minimal.然而,在这种情况下这并不重要,因为运行时不需要写入磁盘,因为RAM压力将是最小的。The category of streaming tasks that supports a Mongos Merge also includes the final phase of a split支持Mongos Merge的流式任务类别还包括拆分$sort
stage, which processes data in a streaming fashion without needing to block to see all the data together.$sort
阶段的最后阶段,该阶段以流式方式处理数据,而无需阻塞即可查看所有数据。Additionally, suppose you have defined此外,假设您已经定义了allowDiskUse:false
(the default).allowDiskUse:false
(默认值)。In that case, you are signalling that even if the pipeline has a在这种情况下,您发出的信号是,即使管道有$group
stage (or a$sort
stage followed by another blocking stage), these blocking activities will not need to overspill to disk.$group
阶段(或后面跟着另一个阻塞阶段的$sort
阶段),这些阻塞活动也不需要溢出到磁盘。Performing the final merge on the mongos is the default because fewer network data transfer hops are required to fulfil the aggregation, thus reducing latency compared with merging on "any shard".在mongos
上执行最终合并是默认的,因为完成聚合所需的网络数据传输跳数更少,因此与在“任何分片”上进行合并相比,可以减少延迟。
Regardless of where the merger part runs, the mongos is always responsible for streaming the aggregation's final batches of results back to the client application. 无论合并部分在哪里运行,mongos
始终负责将聚合的最后一批结果流式传输回客户端应用程序。
It is worth considering when no blocking stages exist in a pipeline. 当管道中不存在阻塞阶段时,值得考虑。In this case, the runtime executes the entire pipeline in parallel on the relevant shards and the runtime streams each shard's output directly to the mongos. 在这种情况下,运行时在相关的分片上并行执行整个管道,运行时将每个分片的输出直接流式传输到mongos
。You can regard this as just another variation of the default behaviour (4 - Mongos Merge). 您可以将其视为默认行为的另一种变体(4-Mongos合并)。All the stages in the aggregation constitute just the shards part of the pipeline, and the mongos "stream merges" the final data through to the client. 聚合中的所有阶段只构成了管道的分片部分,mongos
“流合并”最终数据到客户端。
Difference In Merging Behaviour For Grouping Vs Sorting分组与排序合并行为的差异
You will have read in the Pipeline Performance Considerations chapter about 您将在管道性能注意事项一章中阅读到有关$sort
and $group
stages being blocking stages and potentially consuming copious RAM. $sort
和$group
阶段是阻塞阶段,可能会消耗大量RAM的内容。Consequently, you may be confused by the statement that, unlike a 因此,您可能会被以下语句弄糊涂:与$group
stage, when the pipeline splits, the aggregation runtime will finalise a $sort
stage on a mongos even if you specify allowDiskUse:true
. $group
阶段不同,当管道拆分时,聚合运行时将在mongos
上最终确定$sort
阶段,即使您指定allowDiskUse:true
也是如此。This is because the final phase of a split 这是因为拆分$sort
stage is not a blocking activity, whereas the final phase of a split $group
stage is. $sort
阶段的最后阶段不是阻塞活动,而拆分$group
阶段的最终阶段是阻塞活动。For 对于$group
, the pipeline's merger part must wait for all the data coming out of all the targeted shards. $group
,管道的合并部分必须等待来自所有目标分片的所有数据。For 对于$sort
, the runtime executes a streaming merge sort operation, only waiting for the next batch of records coming out of each shard. $sort
,运行时执行流式合并排序操作,只等待来自每个分片的下一批记录。As long as it can see the first of the sorted documents in the next batch from every shard, it knows which documents it can immediately process and stream on to the rest of the pipeline. 只要它能从每个分片中看到下一批排序后的第一个文档,它就知道可以立即处理哪些文档并将其流式传输到管道的其余部分。It doesn't have to block waiting to see all of the records to guarantee correct ordering. 它不必阻止等待查看所有记录以保证正确的排序。
This optimisation doesn't mean that MongoDB has magically found a way to avoid a 这种优化并不意味着MongoDB神奇地找到了一种方法来避免$sort
stage being a blocking stage in a sharded cluster. $sort
阶段成为分片集群中的阻塞阶段。It hasn't. 它没有。The first phase of the $sort
stage, run on each shard in parallel, is still blocking, waiting to see all the matched input data for that shard. $sort
阶段的第一阶段在每个分片上并行运行,仍然处于阻塞状态,等待查看该分片的所有匹配输入数据。However, the final phase of the same 但是,在合并位置执行的相同$sort
stage, executed at the merge location, does not need to block. $sort
阶段的最后阶段不需要阻塞。
Summarising Sharded Pipeline Execution Approaches总结分段管道执行方法
In summary, the aggregation runtime seeks to execute a pipeline on the subset of shards containing the required data only. 总之,聚合运行时试图在仅包含所需数据的分片子集上执行管道。If the runtime must split the pipeline to perform grouping or sorting, it completes the final merge work on a mongos, when possible. 如果运行时必须拆分管道以执行分组或排序,则在可能的情况下,它将完成mongos
上的最终合并工作。Merging on a mongos helps to reduce the number of required network hops and the execution time.在mongos
上进行合并有助于减少所需的网络跳数和执行时间。
Performance Tips For Sharded Aggregations分片聚合的性能提示
All the recommended aggregation optimisations outlined in the Pipeline Performance Considerations chapter equally apply to a sharded cluster. 管道性能注意事项一章中列出的所有推荐聚合优化同样适用于分片集群。In fact, in most cases, these same recommendations, repeated below, become even more critical when executing aggregations on sharded clusters: 事实上,在大多数情况下,在对分片集群执行聚合时,以下重复的这些相同建议变得更加重要:
-
Sorting - Use Index Sort.排序-使用索引排序。When the runtime has to split on a当运行时必须在$sort
stage, the shards part of the split pipeline running on each shard in parallel will avoid an expensive in-memory sort operation.$sort
阶段上进行拆分时,拆分管道中并行运行在每个分片上的分片部分将避免昂贵的内存排序操作。 -
Sorting - Use Limit With Sort.排序-将限制与排序一起使用。The runtime has to transfer fewer intermediate records over the network, from each shard performing the shards part of a split pipeline to the location that executes the pipeline's merger part.运行时必须通过网络传输更少的中间记录,从执行拆分管道的分片部分的每个分片到执行管道合并部分的位置。 -
Sorting - Reduce Records To Sort.排序-减少要排序的记录。If you cannot adopt point 1 or 2, moving a如果不能采用第1点或第2点,那么在管道中尽可能晚地移动$sort
stage to as late as possible in a pipeline will typically benefit performance in a sharded cluster.$sort
阶段通常会提高分片集群的性能。Wherever the无论$sort
stage appears in a pipeline, the aggregation runtime will split the pipeline at this point (unless preceded by a$group
stage which would cause the split earlier).$sort
阶段出现在管道中的哪个位置,聚合运行时都会在此时拆分管道(除非前面有$group
阶段,否则会提前导致拆分)。By promoting other activities to occur in the pipeline first, the hope is these reduce the number of records entering the blocking通过促进其他活动首先在管道中进行,希望这些活动能够减少进入阻塞$sort
stage.$sort
阶段的记录数量。This sorting operation, executing in parallel at the end of shards part of the split pipeline, will exhibit less memory pressure.这种排序操作在分割管道的分片部分的末尾并行执行,将显示出较小的内存压力。The runtime will also stream fewer records over the network to the split pipeline's merger part location.运行时还将通过网络将更少的记录流式传输到拆分管道的合并部分位置。 -
Grouping - Avoid Unnecessary Grouping.分组-避免不必要的分组。Using array operators where possible instead of在可能的情况下使用数组运算符而不是$unwind
and$group
stages will mean that the runtime does not need to split the pipeline due to an unnecessarily introduced$group
stage.$unvent
和$group
阶段将意味着运行时不需要由于不必要地引入$group
阶段而拆分管道。Consequently, the aggregation can efficiently process and stream data directly to the mongos rather than flowing through an intermediary shard first.因此,聚合可以有效地处理数据并将数据直接流式传输到mongos
,而不是首先通过中间分片。 -
Grouping - Group Summary Data Only.分组-仅对汇总数据进行分组。The runtime has to move fewer computed records over the network from each shard performing the shards part of a split pipeline to the merger part's location.运行时必须通过网络将更少的计算记录从执行拆分管道的分片部分的每个分片移动到合并部分的位置。 -
Encourage Match Filters To Appear Early In The Pipeline.鼓励匹配筛选器在管道的早期出现。By filtering out a large subset of records on each shard when performing the shards part of the split pipeline, the runtime needs to stream fewer records to the merger part location.在执行拆分管道的分片部分时,通过筛选掉每个分片上的大量记录子集,运行时需要将更少的记录流式传输到合并部分位置。
Specifically for sharded clusters, there are two further performance optimisations you should aspire to achieve:特别是对于分片集群,您应该进一步实现两个性能优化:
-
Look For Opportunities To Target Aggregations To One Shard Only.寻找机会,只针对一块分片进行聚合。If possible, include a如果可能的话,在分片键值(或分片键值前缀值)上包含一个带筛选器的$match
stage with a filter on a shard key value (or shard key prefix value).$match
阶段。 -
Look For Opportunities For A Split Pipeline To Merge On A Mongos.在Mongos上寻找拆分管道合并的机会。If the pipeline has a如果管道具有导致管道划分的$group
stage (or a$sort
stage followed by a$group
/$sort
stage) which causes the pipeline to divide, avoid specifyingallowDiskUse:true
if possible.$group
阶段(或$sort
阶段后接$group
/$sort
阶段),请尽可能避免指定allowDiskUse:true
。This reduces the amount of intermediate data transferred over the network, thus reducing latency.这减少了通过网络传输的中间数据量,从而减少了延迟。