Data Partitioning with Chunks


MongoDB uses the shard key associated with the collection to partition the data into chunks. A chunk consists of a subset of sharded data. Each chunk has an inclusive lower bound and an exclusive upper bound based on the shard key.

Diagram of the shard key value space segmented into smaller ranges or chunks.
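To make the inclusive and exclusive bounds concrete, here is a minimal Python sketch that models chunks as half-open ranges `[lower, upper)` and routes a key to the chunk that holds it. It assumes numeric shard key values and uses ±infinity as stand-ins for MongoDB's MinKey and MaxKey; the `Chunk` and `find_chunk` names are hypothetical, not part of any MongoDB API.

```python
import bisect
from dataclasses import dataclass

# Stand-ins for MongoDB's MinKey/MaxKey bounds (numeric keys assumed).
MIN_KEY = float("-inf")
MAX_KEY = float("inf")


@dataclass
class Chunk:
    lower: float  # inclusive lower bound
    upper: float  # exclusive upper bound
    shard: str

    def contains(self, key: float) -> bool:
        return self.lower <= key < self.upper


def find_chunk(chunks: list[Chunk], key: float) -> Chunk:
    """Return the chunk whose [lower, upper) range holds key.

    Assumes chunks are sorted, contiguous, and cover [MIN_KEY, MAX_KEY).
    """
    lowers = [c.lower for c in chunks]
    # bisect_right - 1 selects the last chunk whose lower bound is <= key.
    idx = bisect.bisect_right(lowers, key) - 1
    if idx < 0 or not chunks[idx].contains(key):
        raise ValueError(f"no chunk covers {key!r}")
    return chunks[idx]


chunks = [
    Chunk(MIN_KEY, 0, "shardA"),
    Chunk(0, 100, "shardB"),
    Chunk(100, MAX_KEY, "shardA"),
]
print(find_chunk(chunks, 0).shard)    # shardB: 0 is the inclusive lower bound
print(find_chunk(chunks, 99).shard)   # shardB
print(find_chunk(chunks, 100).shard)  # shardA: 100 is excluded from [0, 100)
```

Because each upper bound equals the next chunk's lower bound, every key belongs to exactly one chunk.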

MongoDB splits chunks when they grow beyond the configured chunk size. Both inserts and updates can trigger a chunk split.

The smallest range a chunk can represent is a single unique shard key value. A chunk that only contains documents with a single shard key value cannot be split.
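The following sketch, under simplified assumptions, shows why a chunk holding a single shard key value cannot be split: a split point must be a shard key value strictly greater than the chunk's smallest value, so at least two distinct values are needed. `choose_split_point` is a hypothetical helper, and picking the middle distinct key is a simplification rather than how MongoDB actually selects split points.

```python
MB = 1024 * 1024


def choose_split_point(docs, max_chunk_bytes):
    """Return a shard key value to split the chunk at, or None.

    docs holds (shard_key_value, size_in_bytes) pairs for one chunk.
    Picking the middle distinct key is a simplification, not the server's rule.
    """
    total = sum(size for _, size in docs)
    if total <= max_chunk_bytes:
        return None  # still within the configured chunk size
    distinct_keys = sorted({key for key, _ in docs})
    if len(distinct_keys) < 2:
        return None  # one shard key value: the chunk cannot be split
    # The split point becomes the inclusive lower bound of the upper half, so it
    # must be greater than the smallest key for both halves to be non-empty.
    return distinct_keys[len(distinct_keys) // 2]


mixed = [(k, 40 * MB) for k in ("a", "b", "c", "d")]  # 160 MB over four keys
single = [("hot", 40 * MB)] * 4                        # 160 MB under one key
print(choose_split_point(mixed, 128 * MB))   # c
print(choose_split_point(single, 128 * MB))  # None
```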

Initial Chunks

Populated Collection

  • The sharding operation creates the initial chunk(s) to cover the entire range of the shard key values. The number of chunks created depends on the configured chunk size.
  • After the initial chunk creation, the balancer migrates these initial chunks across the shards as appropriate and manages the chunk distribution going forward.
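As a rough illustration of how the chunk size influences the number of initial chunks, the sketch below computes a lower bound: the fewest chunks that could hold the data if no chunk exceeded the configured size (128 MB by default, as described under Chunk Size). The helper name is hypothetical, and the server's actual count can be higher because it chooses its own split points.

```python
import math

MB = 1024 * 1024
GB = 1024 * MB


def min_initial_chunks(collection_bytes: int, chunk_size_mb: int = 128) -> int:
    """Lower bound on chunk count if no chunk may exceed chunk_size_mb.

    The server chooses its own split points, so the real count can be higher.
    """
    return max(1, math.ceil(collection_bytes / (chunk_size_mb * MB)))


print(min_initial_chunks(10 * GB))      # 80 with the 128 MB default
print(min_initial_chunks(10 * GB, 64))  # 160 with 64 MB chunks
```

Halving the chunk size doubles the bound, which is the same trade-off discussed under Chunk Size.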

Empty Collection

  • If you have zones and zone ranges defined for an empty or non-existing collection (available starting in MongoDB 4.0.3):

    • The sharding operation creates empty chunks for the defined zone ranges, as well as any additional chunks needed to cover the entire range of the shard key values, and performs an initial chunk distribution based on the zone ranges. This initial creation and distribution of chunks allows for faster setup of zoned sharding.
    • After the initial distribution, the balancer manages the chunk distribution going forward.
  • If you do not have zones and zone ranges defined for an empty or non-existing collection:

    • For hashed sharding:

      • The sharding operation creates empty chunks to cover the entire range of the shard key values and performs an initial chunk distribution. By default, the operation creates 2 chunks per shard and migrates them across the cluster. You can use the numInitialChunks option to specify a different number of initial chunks. This initial creation and distribution of chunks allows for faster setup of sharding.
      • After the initial distribution, the balancer manages the chunk distribution going forward.
    • For ranged sharding:

      • The sharding operation creates a single empty chunk to cover the entire range of the shard key values.
      • After the initial chunk creation, the balancer migrates the initial chunk across the shards as appropriate and manages the chunk distribution going forward.
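The three empty-collection cases can be sketched as follows. This is a toy model, not MongoDB's implementation: it assumes numeric range bounds with ±infinity standing in for MinKey/MaxKey, assumes the hashed key space is the signed 64-bit integer range cut into equal slices, and assumes a simple round-robin initial distribution. The function names, the `num_initial_chunks` parameter (mirroring the numInitialChunks option), and the `shardN` labels are hypothetical.

```python
MIN_KEY = float("-inf")  # stand-ins for MinKey/MaxKey
MAX_KEY = float("inf")
INT64_MIN = -(2**63)  # assumed hashed key space: signed 64-bit integers


def zoned_initial_chunks(zone_ranges):
    """Chunks for the zone ranges plus filler chunks covering the rest.

    zone_ranges: non-overlapping (lower, upper, zone) tuples.
    Returns (lower, upper, zone_or_None) tuples covering [MIN_KEY, MAX_KEY).
    """
    chunks = []
    cursor = MIN_KEY
    for lower, upper, zone in sorted(zone_ranges):
        if cursor < lower:
            chunks.append((cursor, lower, None))  # gap outside any zone
        chunks.append((lower, upper, zone))
        cursor = upper
    if cursor < MAX_KEY:
        chunks.append((cursor, MAX_KEY, None))
    return chunks


def hashed_initial_chunks(num_shards, num_initial_chunks=None):
    """Equal slices of the hash space, dealt out to shards round-robin.

    Defaults to 2 chunks per shard; num_initial_chunks overrides the count.
    Returns (lower, upper, shard_name) tuples.
    """
    n = num_initial_chunks or 2 * num_shards
    inner = [INT64_MIN + (2**64 * i) // n for i in range(1, n)]
    edges = [MIN_KEY, *inner, MAX_KEY]
    return [(edges[i], edges[i + 1], f"shard{i % num_shards}") for i in range(n)]


def ranged_initial_chunks():
    """Ranged sharding without zones: one empty chunk for the whole key space."""
    return [(MIN_KEY, MAX_KEY, None)]


print(zoned_initial_chunks([(30, 40, "US"), (10, 20, "EU")]))
# [(-inf, 10, None), (10, 20, 'EU'), (20, 30, None), (30, 40, 'US'), (40, inf, None)]
print(len(hashed_initial_chunks(num_shards=3)))  # 6
print(ranged_initial_chunks())                   # [(-inf, inf, None)]
```

In every case the resulting ranges are contiguous and span the whole key space; only the number of chunks and their initial placement differ.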

Chunk Size

The default chunk size in MongoDB is 128 megabytes. You can increase or reduce the chunk size. Consider the implications of changing the default chunk size:

  1. Small chunks lead to a more even distribution of data at the expense of more frequent migrations. This creates expense at the query routing (mongos) layer.
  2. Large chunks lead to fewer migrations. This is more efficient both from the networking perspective and in terms of internal overhead at the query routing layer. However, these efficiencies come at the expense of a potentially uneven distribution of data.
  3. Chunk size affects the Maximum Number of Documents Per Chunk to Migrate.
  4. Chunk size affects the maximum collection size when sharding an existing collection. After sharding, chunk size does not constrain collection size.
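The fourth point can be estimated numerically. The sketch below assumes the approximation given in MongoDB's limits documentation for sharding existing collections: the maximum number of splits is 16,777,216 bytes divided by the average shard key size, and the maximum collection size is roughly that number of splits times half the chunk size. Treat the helper name and the exact formula as assumptions to check against the documentation for your version.

```python
def max_shardable_collection_mb(avg_shard_key_bytes: int, chunk_size_mb: int) -> int:
    """Estimated largest existing collection (in MB) that can be sharded.

    Assumed estimate: the initial split points must fit in one 16 MB BSON
    document, and initial chunks average half the configured chunk size.
    """
    max_splits = 16 * 1024 * 1024 // avg_shard_key_bytes
    return max_splits * chunk_size_mb // 2


print(max_shardable_collection_mb(512, 64))   # 1048576 (about 1 TB)
print(max_shardable_collection_mb(512, 128))  # 2097152 (about 2 TB)
```

Under this estimate, doubling the chunk size doubles the largest collection you can shard, and longer shard key values shrink it proportionally.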

For many deployments, it makes sense to avoid frequent and potentially spurious migrations at the expense of a slightly less evenly distributed data set.

Limitations

Changing the chunk size affects when chunks split, but its effects have some limitations.

  • Automatic splitting only occurs during inserts or updates. If you lower the chunk size, it may take time for all chunks to split to the new size.
  • Splits cannot be "undone". If you increase the chunk size, existing chunks must grow through inserts or updates until they reach the new size.
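Both limitations follow from splits being evaluated on write. The toy model below (a hypothetical `ChunkSizes` class, with an assumed near-equal split rule) shows that changing the setting alone does nothing: lowering it splits only the chunks that later receive writes, and raising it never merges existing chunks.

```python
MB = 1024 * 1024


class ChunkSizes:
    """Toy model: a chunk is only re-checked when a write lands in it."""

    def __init__(self, chunk_size_mb, sizes_mb):
        self.limit = chunk_size_mb * MB
        self.sizes = [s * MB for s in sizes_mb]

    def set_chunk_size(self, chunk_size_mb):
        # Changing the setting alone splits nothing and merges nothing.
        self.limit = chunk_size_mb * MB

    def write(self, index, nbytes):
        """Grow chunk `index`; split it if it now exceeds the limit."""
        self.sizes[index] += nbytes
        size = self.sizes[index]
        if size > self.limit:
            pieces = -(-size // self.limit)  # ceiling division
            base, extra = divmod(size, pieces)
            # Assumption: split into near-equal pieces, none above the limit.
            self.sizes[index:index + 1] = [
                base + (1 if i < extra else 0) for i in range(pieces)
            ]


table = ChunkSizes(128, [100, 100])
table.set_chunk_size(64)
print(len(table.sizes))  # 2: lowering the size splits nothing by itself
table.write(0, 1 * MB)   # 101 MB > 64 MB, so only chunk 0 splits
print(len(table.sizes))  # 3
table.set_chunk_size(256)
print(len(table.sizes))  # 3: raising the size never merges chunks
```

Note that the untouched 100 MB chunk stays above the lowered 64 MB limit until a write reaches it.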

Chunk Splits

Splitting is a process that keeps chunks from growing too large. When a chunk grows beyond the specified chunk size, or if the number of documents in the chunk exceeds the Maximum Number of Documents Per Chunk to Migrate, MongoDB splits the chunk based on the shard key values the chunk represents. A chunk may be split into multiple chunks where necessary. Inserts and updates may trigger splits. Splits are an efficient metadata change: to create splits, MongoDB does not migrate any data or affect the shards.

Diagram of a shard with a chunk that exceeds the default chunk size of 128 MB and triggers a split of the chunk into two chunks.
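The following sketch illustrates why a split is only a metadata change: it rewrites the list of chunk ranges, and every resulting chunk stays on the original shard. `ChunkMeta` and `split_chunk` are hypothetical names; MongoDB's real routing metadata is stored on the config servers and has a different shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkMeta:
    lower: int  # inclusive
    upper: int  # exclusive
    shard: str


def split_chunk(metadata, index, split_points):
    """Return new routing metadata with metadata[index] cut at split_points.

    Every piece keeps the original shard, so only metadata changes and no
    documents move between shards.
    """
    old = metadata[index]
    points = sorted(set(split_points))
    if not all(old.lower < p < old.upper for p in points):
        raise ValueError("split points must lie strictly inside the chunk")
    edges = [old.lower, *points, old.upper]
    pieces = [ChunkMeta(lo, hi, old.shard) for lo, hi in zip(edges, edges[1:])]
    return metadata[:index] + pieces + metadata[index + 1:]


before = [ChunkMeta(0, 300, "shardA"), ChunkMeta(300, 600, "shardB")]
after = split_chunk(before, 0, [100, 200])
for chunk in after:
    print(chunk)  # three shardA chunks, then the untouched shardB chunk
```

Passing several split points shows how one chunk can become multiple chunks in a single step.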

Splits may lead to an uneven distribution of a collection's chunks across the shards. In such cases, the balancer redistributes chunks across shards. See Cluster Balancer for more details on balancing chunks across shards.

Chunk Migration

MongoDB migrates chunks in a sharded cluster to distribute the chunks of a sharded collection evenly among shards. Migrations may be either:

  • Manual. Only use manual migration in limited cases, such as to distribute data during bulk inserts. See Migrating Chunks Manually for more details.
  • Automatic. The balancer process automatically migrates chunks when there is an uneven distribution of a sharded collection's chunks across the shards. See Migration Thresholds for more details.

For more information on the sharded cluster balancer, see Sharded Cluster Balancer.

Balancing

The balancer is a background process that manages chunk migrations. If the difference in the number of chunks between the shard with the most chunks and the shard with the fewest chunks reaches the migration threshold, the balancer begins migrating chunks across the cluster to ensure an even distribution of data.

Diagram of a collection distributed across three shards. For this collection, the difference in the number of chunks between the shards reaches the *migration thresholds* (in this case, 2) and triggers migration.
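A simplified balancing round for one collection might look like the sketch below. The threshold defaults to 2 to match the diagram's example; the stopping rule (keep moving chunks until shard counts differ by at most 1) and the choice of which chunk to move are assumptions, and `balance` is a hypothetical helper rather than the balancer's actual algorithm.

```python
from collections import Counter


def balance(owners, shards, threshold=2):
    """Toy balancing round for one collection.

    owners[i] names the shard holding chunk i. Nothing moves unless the gap
    between the most- and least-loaded shards reaches `threshold`. Then one
    chunk at a time moves from the fullest to the emptiest shard until the
    gap is at most 1 (an assumed stopping rule).
    Returns (new_owners, moves) where moves are (chunk_index, src, dst).
    """
    owners = list(owners)
    moves = []

    def gap():
        counts = Counter({s: 0 for s in shards})
        counts.update(owners)
        src = max(shards, key=lambda s: counts[s])
        dst = min(shards, key=lambda s: counts[s])
        return counts[src] - counts[dst], src, dst

    diff, src, dst = gap()
    if diff < threshold:
        return owners, moves  # below the migration threshold
    while diff > 1:
        i = owners.index(src)  # move the first chunk the fullest shard owns
        owners[i] = dst
        moves.append((i, src, dst))
        diff, src, dst = gap()
    return owners, moves


print(balance(["A"] * 5 + ["B"] * 3 + ["C"] * 2, ["A", "B", "C"])[1])
# [(0, 'A', 'C')]
print(balance(["A"] * 3 + ["B"] * 2 + ["C"] * 2, ["A", "B", "C"])[1])
# []: a gap of 1 is below the threshold of 2
```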

You can manage certain aspects of the balancer. The balancer also respects any zones created as part of configuring zones in a sharded cluster.

See Sharded Cluster Balancer for more information on the balancer.

Indivisible/Jumbo Chunks

In some cases, chunks can grow beyond the specified chunk size but cannot undergo a split. The most common scenario is when a chunk represents a single shard key value. Since the chunk cannot split, it continues to grow beyond the chunk size, becoming a jumbo chunk. These jumbo chunks can become a performance bottleneck as they continue to grow, especially if the shard key value occurs with high frequency.

Starting in MongoDB 5.0, you can reshard a collection by changing the collection's shard key.

Starting in MongoDB 4.4, MongoDB provides the refineCollectionShardKey command. Refining a collection's shard key allows for more fine-grained data distribution and can address situations where insufficient cardinality of the existing key leads to jumbo chunks.
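The effect of refining a shard key can be seen by counting how many documents share the most common shard key value, since those documents can never be split apart. The sketch assumes a hypothetical `orders` collection whose `customer_id` key has one very frequent value; the refined key appends a suffix field (`order_id`, also hypothetical) while keeping the existing key as the prefix, as refineCollectionShardKey requires.

```python
from collections import Counter


def largest_key_group(docs, key_fields):
    """Most documents that share a single shard key value under key_fields.

    All of them must live in one chunk, since a chunk cannot be split
    below a single shard key value.
    """
    counts = Counter(tuple(doc[f] for f in key_fields) for doc in docs)
    return max(counts.values())


# Hypothetical orders collection with one very frequent customer_id.
orders = [{"customer_id": "c1", "order_id": n} for n in range(1000)]
orders += [{"customer_id": "c2", "order_id": n} for n in range(5)]

print(largest_key_group(orders, ["customer_id"]))              # 1000
print(largest_key_group(orders, ["customer_id", "order_id"]))  # 1
```

With the original key, all 1000 documents for `c1` are pinned to one indivisible chunk; with the refined key, each of them has its own shard key value, so that range can be split.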


moveChunk directory

In MongoDB 2.6 and MongoDB 3.0, sharding.archiveMovedChunks is enabled by default. All other MongoDB versions have this disabled by default. With sharding.archiveMovedChunks enabled, the source shard archives the documents in the migrated chunks in a directory named after the collection namespace under the moveChunk directory in the storage.dbPath.
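Before deleting archived files, it can help to list what is there. The sketch below assumes that the archive for a namespace such as `test.orders` is the directory `<dbPath>/moveChunk/test.orders/`, following the description above, and lists every file beneath it without assuming anything about the file names. The helper name, the `/data/db` path, and the `test.orders` namespace are hypothetical.

```python
from pathlib import Path


def archived_chunk_files(db_path, namespace):
    """Files the source shard archived for `namespace` (e.g. "test.orders").

    Assumes the <dbPath>/moveChunk/<namespace>/ layout; the file names
    inside that directory are not assumed.
    """
    archive_dir = Path(db_path) / "moveChunk" / namespace
    if not archive_dir.is_dir():
        return []
    return sorted(p for p in archive_dir.rglob("*") if p.is_file())


for path in archived_chunk_files("/data/db", "test.orders"):
    print(path)
```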

If an error occurs during a migration, these files may be helpful in recovering documents affected during the migration.

Once the migration has completed successfully and there is no need to recover documents from these files, you may safely delete these files. Alternatively, if you have an existing backup of the database that you can use for recovery, you may also delete these files after migration.

To determine if all migrations are complete, run sh.isBalancerRunning() while connected to a mongos instance.
