
Data Partitioning with Chunks

MongoDB uses the shard key associated with the collection to partition the data into chunks, each owned by a specific shard. A chunk consists of a range of sharded data. A range can be a portion of the chunk or the whole chunk. The balancer migrates data between shards. Each chunk has an inclusive lower bound and an exclusive upper bound based on the shard key.

Diagram of the shard key value space segmented into smaller ranges or chunks.

The smallest unit of data a chunk can represent is a single unique shard key value.
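The inclusive-lower, exclusive-upper rule can be illustrated with a minimal sketch. The chunk table, bounds, and shard names below are hypothetical, and -Infinity/Infinity stand in for the minimum and maximum key sentinels; this is an illustration of the boundary rule, not how mongos routes queries internally.

```javascript
// Hypothetical chunk table for a numeric shard key.
// Bounds and shard names are made up for illustration.
const chunks = [
  { min: -Infinity, max: 100, shard: "shardA" },
  { min: 100, max: 200, shard: "shardB" },
  { min: 200, max: Infinity, shard: "shardC" },
];

// Return the chunk whose range contains the key.
// The lower bound is inclusive and the upper bound is exclusive.
function findChunk(chunkTable, key) {
  return chunkTable.find((c) => key >= c.min && key < c.max);
}

console.log(findChunk(chunks, 99).shard);  // shardA
console.log(findChunk(chunks, 100).shard); // shardB (100 is the inclusive lower bound)
```

Because the upper bound is exclusive, a key equal to a boundary value always belongs to exactly one chunk.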

Initial Chunks

Populated Collection

  • The sharding operation creates one large initial chunk to cover all of the shard key values.
  • After the initial chunk creation, the balancer moves ranges off of the initial chunk when it needs to start balancing data.

Empty Collection

  • If you have zones and zone ranges defined for an empty or non-existing collection:

    • The sharding operation creates empty chunks for the defined zone ranges as well as any additional chunks to cover the entire range of the shard key values, and performs an initial chunk distribution based on the zone ranges. This initial creation and distribution of chunks allows for faster setup of zoned sharding.
    • After the initial distribution, the balancer manages the chunk distribution going forward.
  • If you do not have zones and zone ranges defined for an empty or non-existing collection:

    • For hashed sharding:

      • The sharding operation creates empty chunks to cover the entire range of the shard key values and performs an initial chunk distribution. By default, the operation creates 2 chunks per shard and migrates them across the cluster. You can use the numInitialChunks option to specify a different number of initial chunks. This initial creation and distribution of chunks allows for faster setup of sharding.
      • After the initial distribution, the balancer manages the chunk distribution going forward.
    • For ranged sharding:

      • The sharding operation creates a single empty chunk to cover the entire range of the shard key values.
      • After the initial chunk creation, the balancer migrates the initial chunk across the shards as appropriate and manages the chunk distribution going forward.
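
For the hashed sharding case on an empty collection, the default chunk count and the numInitialChunks override can be sketched as follows. The namespace "app.events" and field "deviceId" are hypothetical. The sketch only builds a shardCollection command document as a plain object; in mongosh such a document would typically be passed to db.adminCommand(). Whether numInitialChunks is honored depends on the server version, so check the reference for your release.

```javascript
// Default stated in the text: 2 initial chunks per shard
// for hashed sharding of an empty collection.
const DEFAULT_CHUNKS_PER_SHARD = 2;

function defaultInitialChunks(shardCount) {
  return DEFAULT_CHUNKS_PER_SHARD * shardCount;
}

// Build a shardCollection command document for a hashed shard key.
// numInitialChunks is only included when the caller overrides the default.
function buildShardCollectionCommand(namespace, field, numInitialChunks) {
  const cmd = { shardCollection: namespace, key: { [field]: "hashed" } };
  if (numInitialChunks !== undefined) {
    cmd.numInitialChunks = numInitialChunks;
  }
  return cmd;
}

// Hypothetical three-shard cluster: the default would be 6 chunks.
console.log(defaultInitialChunks(3)); // 6

// Hypothetical namespace and field, overriding the default with 12 chunks.
console.log(buildShardCollectionCommand("app.events", "deviceId", 12));
```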
Tip

See also:

sh.balancerCollectionStatus()

Range Size

The default range size in MongoDB is 128 megabytes. You can increase or reduce the range size. Consider the implications of changing the default range size:

  1. Small ranges lead to a more even distribution of data at the expense of more frequent migrations. This adds overhead at the query routing (mongos) layer.
  2. Large ranges lead to fewer migrations. This is more efficient both from the networking perspective and in terms of internal overhead at the query routing layer. However, these efficiencies come at the expense of a potentially uneven distribution of data.
  3. Range size affects the Maximum Number of Documents Per Range to Migrate.
  4. Range size affects the maximum collection size when sharding an existing collection. Post-sharding, data range size does not constrain collection size.

For many deployments, it makes sense to avoid frequent and potentially spurious migrations at the expense of a slightly less evenly distributed data set.
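
The trade-off above can be made concrete with a rough calculation. The sketch below estimates how many ranges a collection of a given size would occupy at different range sizes, and builds the documents for a range size change. It assumes the setting is stored in the settings collection of the config database under the _id "chunksize" with a value in megabytes; verify this against the reference for your server version. The 10240 MB collection size is a made-up example.

```javascript
const DEFAULT_RANGE_SIZE_MB = 128; // default range size stated in the text

// Rough estimate of how many ranges a collection occupies at a given range size.
function estimateRangeCount(collectionSizeMB, rangeSizeMB = DEFAULT_RANGE_SIZE_MB) {
  return Math.max(1, Math.ceil(collectionSizeMB / rangeSizeMB));
}

// Build filter, update, and options documents for changing the range size.
// Assumption: the setting lives in config.settings with _id "chunksize".
function buildRangeSizeUpdate(sizeMB) {
  return {
    filter: { _id: "chunksize" },
    update: { $set: { _id: "chunksize", value: sizeMB } },
    options: { upsert: true },
  };
}

// Hypothetical 10 GB (10240 MB) collection.
console.log(estimateRangeCount(10240, 64));  // 160
console.log(estimateRangeCount(10240));      // 80
console.log(estimateRangeCount(10240, 256)); // 40
```

Halving the range size doubles the number of ranges the balancer has to track and potentially move, which is the migration cost the list describes.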

Range Migration

MongoDB migrates data ranges in a sharded cluster to distribute the data of a sharded collection evenly among shards. Migrations may be either:

  • Manual. Only use manual migration in limited cases, such as to distribute data during bulk inserts. See Migrating Chunks Manually for more details.
  • Automatic. The balancer process automatically migrates data when there is an uneven distribution of a sharded collection's data across the shards. See Migration Thresholds for more details.

For more information on the sharded cluster balancer, see Sharded Cluster Balancer.
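
As a minimal sketch of a manual migration request, the function below builds a moveChunk command document that identifies the chunk by a shard key value it contains. The namespace, key value, and destination shard name are hypothetical. The sketch only constructs the document; in mongosh it would typically be passed to db.adminCommand() while connected to a mongos.

```javascript
// Build a moveChunk command document.
// `shardKeyQuery` selects the chunk that contains the given shard key value.
function buildMoveChunkCommand(namespace, shardKeyQuery, destinationShard) {
  if (Object.keys(shardKeyQuery).length === 0) {
    throw new Error("shardKeyQuery must name at least one shard key field");
  }
  return { moveChunk: namespace, find: shardKeyQuery, to: destinationShard };
}

// Hypothetical values: move the chunk containing zipcode "53187" to shard0001.
const moveCmd = buildMoveChunkCommand("records.people", { zipcode: "53187" }, "shard0001");
console.log(moveCmd);
```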

Balancing

The balancer is a background process that manages data migrations. If the difference in the amount of data between the largest and smallest shard exceeds the migration thresholds, the balancer begins migrating data across the cluster to ensure an even distribution.

Diagram of a collection distributed across three shards. For this collection, the difference in the number of chunks between the shards reaches the *migration thresholds* (in this case, 2) and triggers migration.
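
The threshold check described above can be sketched as a small model. This is a simplified illustration, not the balancer's actual algorithm: the per-shard data sizes and the 384 MB threshold are hypothetical values, and the real threshold is defined under Migration Thresholds.

```javascript
// Difference in data size between the largest and smallest shard.
function dataImbalanceMB(dataSizeByShardMB) {
  const sizes = Object.values(dataSizeByShardMB);
  return Math.max(...sizes) - Math.min(...sizes);
}

// Return a { from, to } suggestion when the imbalance exceeds the threshold,
// or null when the collection counts as balanced in this simplified model.
function planMigration(dataSizeByShardMB, thresholdMB) {
  if (dataImbalanceMB(dataSizeByShardMB) <= thresholdMB) {
    return null;
  }
  const entries = Object.entries(dataSizeByShardMB);
  const largest = entries.reduce((a, b) => (b[1] > a[1] ? b : a));
  const smallest = entries.reduce((a, b) => (b[1] < a[1] ? b : a));
  return { from: largest[0], to: smallest[0] };
}

// Hypothetical sizes in MB and a placeholder threshold of 384 MB.
console.log(planMigration({ shardA: 900, shardB: 400, shardC: 500 }, 384));
// { from: 'shardA', to: 'shardB' }
console.log(planMigration({ shardA: 500, shardB: 400, shardC: 450 }, 384));
// null
```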

You can manage certain aspects of the balancer. The balancer also respects any zones created as a part of configuring zones in a sharded cluster.

See Sharded Cluster Balancer for more information on the balancer.

Indivisible/Jumbo Chunks

In some cases, chunks can grow beyond the specified chunk size but cannot undergo a split. The most common scenario is when a chunk represents a single shard key value. Since the chunk cannot split, it continues to grow beyond the chunk size, becoming a jumbo chunk. These jumbo chunks can become a performance bottleneck as they continue to grow, especially if the shard key value occurs with high frequency.
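
Why a single-value chunk cannot split follows from the bounds rule: a split point must separate at least two distinct shard key values. The sketch below is a simplified illustration that picks a median distinct value for numeric keys. It is not MongoDB's actual split-point selection, and the sample key values are made up.

```javascript
// Pick a split point among the shard key values in a chunk.
// The split point becomes the inclusive lower bound of the right-hand chunk.
// Returns null when the chunk holds a single distinct value and is indivisible.
function findSplitPoint(shardKeyValues) {
  const distinct = [...new Set(shardKeyValues)].sort((a, b) => a - b);
  if (distinct.length < 2) {
    return null; // every document shares one shard key value
  }
  return distinct[Math.floor(distinct.length / 2)];
}

console.log(findSplitPoint([1, 2, 2, 3, 9])); // 3
console.log(findSplitPoint([7, 7, 7, 7]));    // null
```

In the second call, no number of additional documents with key 7 makes the chunk divisible, which is how it grows into a jumbo chunk.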

Starting in MongoDB 5.0, you can reshard a collection by changing the collection's shard key.

Starting in MongoDB 4.4, MongoDB provides the refineCollectionShardKey command. Refining a collection's shard key allows for a more fine-grained data distribution and can address situations where the existing key's insufficient cardinality leads to jumbo chunks.
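
A refined key extends the existing shard key with one or more suffix fields, so the current key must be a prefix of the new one. The sketch below checks that condition and builds a refineCollectionShardKey command document. The namespace and field names are hypothetical, and the sketch only constructs the document; in mongosh it would typically be passed to db.adminCommand().

```javascript
// True when currentKey is a strict prefix of newKey (same fields, same order,
// same values, and at least one extra suffix field in newKey).
function isStrictPrefixKey(currentKey, newKey) {
  const cur = Object.entries(currentKey);
  const next = Object.entries(newKey);
  if (next.length <= cur.length) {
    return false;
  }
  return cur.every(([field, dir], i) => next[i][0] === field && next[i][1] === dir);
}

// Build the command document, rejecting keys that do not extend the current key.
function buildRefineCommand(namespace, currentKey, newKey) {
  if (!isStrictPrefixKey(currentKey, newKey)) {
    throw new Error("new key must extend the current shard key with suffix fields");
  }
  return { refineCollectionShardKey: namespace, key: newKey };
}

// Hypothetical collection sharded on customer_id, refined with order_id.
const refineCmd = buildRefineCommand(
  "test.orders",
  { customer_id: 1 },
  { customer_id: 1, order_id: 1 }
);
console.log(refineCmd);
```

Adding order_id raises the key's cardinality, so documents that share a customer_id can now fall into different ranges.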

To learn whether you should reshard your collection or refine your shard key, see Change a Shard Key.

For more information, see: