
Sharded Cluster Balancer

The MongoDB balancer is a background process that monitors the amount of data on each shard for each sharded collection. When the amount of data for a sharded collection on a given shard reaches specific migration thresholds, the balancer attempts to automatically migrate data between shards and reach an even amount of data per shard while respecting the zones. By default, the balancer process is always enabled.
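To confirm the balancer's current state, you can run a quick check from mongosh connected to a mongos. This is a minimal sketch using standard shell helpers:

```javascript
// Run from mongosh connected to a mongos.
sh.getBalancerState()                      // true if the balancer is enabled
db.adminCommand( { balancerStatus: 1 } )   // reports whether a balancing round is in progress
```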

The balancing procedure for sharded clusters is entirely transparent to the user and application layer, though there may be some performance impact while the procedure takes place.

Diagram of a collection distributed across three shards. For this collection, the difference in the number of chunks between the shards reaches the *migration thresholds* (in this case, 2) and triggers migration.

The balancer runs on the primary of the config server replica set (CSRS).

To configure collection balancing for a single collection, see configureCollectionBalancing.
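For example, a hedged sketch of per-collection configuration, assuming MongoDB 6.0 or later and a hypothetical test.orders namespace:

```javascript
// Set a 256 MB chunk size for one collection and request defragmentation.
// The namespace "test.orders" is hypothetical.
db.adminCommand( {
   configureCollectionBalancing: "test.orders",
   chunkSize: 256,
   defragmentCollection: true
} )
```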

To manage the sharded cluster balancer, see Manage Sharded Cluster Balancer.

Balancer Internals

Range migrations carry some overhead in terms of bandwidth and workload, both of which can impact database performance. The balancer attempts to minimize the impact by:

  • Restricting a shard to at most one migration at any given time. Specifically, a shard cannot participate in multiple data migrations at the same time. The balancer migrates ranges one at a time.

    MongoDB can perform parallel data migrations, but a shard can participate in at most one migration at a time. For a sharded cluster with n shards, MongoDB can perform at most n/2 (rounded down) simultaneous migrations.

    See also Asynchronous Range Migration Cleanup.

  • Starting a balancing round only when the difference in the amount of data between the shard with the most data for a sharded collection and the shard with the least data for that collection reaches the migration threshold.

You may disable the balancer temporarily for maintenance. See Disable the Balancer for details.

You can also limit the window during which the balancer runs to prevent it from impacting production traffic. See Schedule the Balancing Window for details.

Note

The specification of the balancing window is relative to the local time zone of the primary of the config server replica set.
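As a minimal sketch, the balancing window is stored in the config database's settings collection; the start and stop times shown here are examples only:

```javascript
// Run from mongosh connected to a mongos; times are interpreted in the
// local time zone of the config server replica set primary.
use config
db.settings.updateOne(
   { _id: "balancer" },
   { $set: { activeWindow: { start: "23:00", stop: "06:00" } } },
   { upsert: true }
)
```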

Adding and Removing Shards from the Cluster

Adding a shard to a cluster creates an imbalance, since the new shard has no data. While MongoDB begins migrating data to the new shard immediately, it can take some time before the cluster balances. See the Add Shards to a Cluster tutorial for instructions on adding a shard to a cluster.

Removing a shard from a cluster creates a similar imbalance, since data residing on that shard must be redistributed throughout the cluster. While MongoDB begins draining a removed shard immediately, it can take some time before the cluster balances. Do not shut down the servers associated with the removed shard during this process.

When you remove a shard in a cluster with an uneven chunk distribution, the balancer first removes the chunks from the draining shard and then balances the remaining uneven chunk distribution.

See the Remove Shards from a Cluster tutorial for instructions on safely removing a shard from a cluster.
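As a rough sketch (the shard and host names are hypothetical), adding and draining shards uses the following commands; rerunning removeShard reports draining progress:

```javascript
// Add a shard by its replica set name and one member's host.
sh.addShard("shardRS4/shard4-node1.example.net:27017")

// Start draining a shard; run the same command again to check progress.
db.adminCommand( { removeShard: "shardRS1" } )
```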

Tip

See also:

sh.balancerCollectionStatus()

Range Migration Procedure

All range migrations use the following procedure:

  1. The balancer process sends the moveRange command to the source shard. (A manual moveRange invocation is sketched after this procedure.)
  2. The source starts the move when it receives an internal moveRange command. During the migration process, operations to the range are sent to the source shard. The source shard is responsible for incoming write operations for the range.
  3. The destination shard builds any indexes required by the source that do not exist on the destination.
  4. The destination shard begins requesting documents in the range and starts receiving copies of the data. See also Range Migration and Replication.
  5. After receiving the final document in the range, the destination shard starts a synchronization process to ensure that it has the changes to the migrated documents that occurred during the migration.
  6. When fully synchronized, the source shard connects to the config database and updates the cluster metadata with the new location for the range.
  7. After the source shard completes the update of the metadata, and once there are no open cursors on the range, the source shard deletes its copy of the documents.

    Note

    If the balancer needs to perform additional chunk migrations from the source shard, the balancer can start the next chunk migration without waiting for the current migration process to finish this deletion step. See Asynchronous Range Migration Cleanup.
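As referenced in step 1, the same migration can be requested manually with the moveRange command. This is a hedged sketch with a hypothetical namespace, bound, and shard name; if max is omitted, the server selects a suitable upper bound:

```javascript
// Request migration of the range starting at the given shard key value.
db.adminCommand( {
   moveRange: "test.orders",
   min: { orderId: 100000 },
   toShard: "shardRS2"
} )
```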

Migration Thresholds

To minimize the impact of balancing on the cluster, the balancer only begins balancing after the distribution of data for a sharded collection has reached certain thresholds.

A collection is considered balanced if the difference in data between shards (for that collection) is less than three times the configured range size for the collection. For the default range size of 128 MB, two shards must have a data size difference for a given collection of at least 384 MB for a migration to occur.

Tip

See also:

sh.balancerCollectionStatus()
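To check whether a specific collection currently meets the threshold, you can call this helper directly (the namespace is hypothetical):

```javascript
// Reports whether the collection is balanced and, if not, why.
sh.balancerCollectionStatus("test.orders")
// The result includes fields such as balancerCompliant and firstComplianceViolation.
```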

Asynchronous Range Migration Cleanup

To migrate data from a shard, the balancer migrates the data one range at a time. However, the balancer does not wait for the current migration's delete phase to complete before starting the next range migration. See Range Migration Procedure for the range migration process and the delete phase.

This queuing behavior allows shards to unload data more quickly in cases of a heavily imbalanced cluster, such as when performing initial data loads without pre-splitting and when adding new shards.

This behavior also affects the moveRange command, and migration scripts that use the moveRange command may proceed more quickly.

In some cases, the delete phase may persist longer. Range migrations are enhanced to be more resilient in the event of a failover during the delete phase. Orphaned documents are cleaned up even if a replica set's primary crashes or restarts during this phase.

The _waitForDelete balancer setting can alter the behavior so that the delete phase of the current migration blocks the start of the next chunk migration. _waitForDelete is generally for internal testing purposes. For more information, see Wait for Delete.
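A minimal sketch of enabling this setting in the config database (for internal testing only):

```javascript
// Make each migration block on its delete phase before the next one starts.
use config
db.settings.updateOne(
   { _id: "balancer" },
   { $set: { _waitForDelete: true } },
   { upsert: true }
)
```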

Range Migration and Replication

During range migration, the _secondaryThrottle value determines when the migration proceeds with the next document in the range.

In the config.settings collection:

  • If the _secondaryThrottle setting for the balancer is set to a write concern, each document move during range migration must receive the requested acknowledgement before proceeding with the next document.
  • If the _secondaryThrottle setting for the balancer is set to true, each document move during range migration must receive acknowledgement from at least one secondary before the migration proceeds with the next document in the range. This is equivalent to a write concern of { w: 2 }.
  • If the _secondaryThrottle setting is unset, the migration process does not wait for replication to a secondary and instead continues with the next document.

To update the _secondaryThrottle parameter for the balancer, see Secondary Throttle for an example.

Independent of any _secondaryThrottle setting, certain phases of the range migration have the following replication policy:

  • MongoDB briefly pauses all application reads and writes to the collection being migrated on the source shard before updating the config servers with the range location. MongoDB resumes application reads and writes after the update. The range move requires all writes to be acknowledged by a majority of the members of the replica set both before and after committing the range move to config servers.
  • When an outgoing migration finishes and cleanup occurs, all writes must be replicated to a majority of servers before further cleanup (from other outgoing migrations) or new incoming migrations can proceed.

To update the _secondaryThrottle setting in the config.settings collection, see Secondary Throttle for an example.
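As a hedged sketch, setting _secondaryThrottle to a majority write concern in the config database looks like this:

```javascript
// Require majority acknowledgement for each document moved during migration.
use config
db.settings.updateOne(
   { _id: "balancer" },
   { $set: { _secondaryThrottle: { w: "majority" } } },
   { upsert: true }
)
```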

Maximum Number of Documents Per Range to Migrate

By default, MongoDB cannot move a range if the number of documents in the range is greater than 2 times the result of dividing the configured range size by the average document size. If MongoDB can move a sub-range of a chunk and reduce its size below that limit, the balancer does so by migrating a range. db.collection.stats() includes the avgObjSize field, which represents the average document size in the collection.
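To estimate that ceiling for a collection, a short mongosh sketch (the collection name is hypothetical and the default 128 MB range size is assumed):

```javascript
// maxDocsPerRange = 2 * (range size / average document size)
const avgObjSize = db.orders.stats().avgObjSize;   // average document size in bytes
const rangeSizeBytes = 128 * 1024 * 1024;          // default range size of 128 MB
const maxDocsPerRange = 2 * (rangeSizeBytes / avgObjSize);
print(maxDocsPerRange);
```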

For chunks that are too large to migrate:

  • The balancer setting attemptToBalanceJumboChunks allows the balancer to migrate chunks too large to move as long as the chunks are not labeled jumbo. See Balance Ranges that Exceed Size Limit for details.

    When issuing moveRange and moveChunk commands, it's possible to specify the forceJumbo option to allow for the migration of ranges that are too large to move. The ranges may or may not be labeled jumbo.
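A hedged sketch of forcing such a migration with moveRange (the namespace, bounds, and shard name are hypothetical):

```javascript
// Attempt to move a range even if it exceeds the migration size limit.
db.adminCommand( {
   moveRange: "test.orders",
   min: { orderId: 0 },
   max: { orderId: 100000 },
   toShard: "shardRS3",
   forceJumbo: true
} )
```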

Range Deletion Performance Tuning

You can tune the performance impact of range deletions with the rangeDeleterBatchSize, rangeDeleterBatchDelayMS, and rangeDeleterHighPriority parameters. For example:

  • To limit the number of documents deleted per batch, you can set rangeDeleterBatchSize to a small value such as 32.
  • To add an additional delay between batch deletions, you can set rangeDeleterBatchDelayMS above the current default of 20 milliseconds.
  • To prioritize range deletions, you can set rangeDeleterHighPriority to true. Range deletions are potentially long-running background tasks that might negatively impact the throughput of user operations when the system is under heavy load.
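As a sketch of the first two adjustments, both parameters can be set at runtime on a shard's mongod (the values shown are examples, not recommendations):

```javascript
// Delete at most 32 documents per batch and wait 100 ms between batches.
db.adminCommand( { setParameter: 1, rangeDeleterBatchSize: 32 } )
db.adminCommand( { setParameter: 1, rangeDeleterBatchDelayMS: 100 } )
```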
Note

If there are ongoing read operations or open cursors on the collection targeted for deletes, range deletion processes may not proceed.

Change Streams and Orphan Documents

Starting in MongoDB 5.3, during range migration, change stream events are not generated for updates to orphaned documents.

Shard Size

By default, MongoDB attempts to fill all available disk space with data on every shard as the data set grows. To ensure that the cluster always has the capacity to handle data growth, monitor disk usage as well as other performance metrics.

See the Change the Maximum Storage Size for a Given Shard tutorial for instructions on setting the maximum size for a shard.

Chunk Size and Balancing

For an introduction to chunkSize, see Modify Range Size in a Sharded Cluster.
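As a brief sketch, the cluster-wide range size is stored in the config database and can be changed there (the value is in megabytes; 256 is only an example):

```javascript
// Change the default range (chunk) size for the cluster to 256 MB.
use config
db.settings.updateOne(
   { _id: "chunksize" },
   { $set: { value: 256 } },
   { upsert: true }
)
```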

The following describes how chunkSize affects defragmentation and balancer operations in different MongoDB versions:

  • MongoDB 6.0 and later: When the collection data shared between two shards differs by three or more times the configured chunkSize setting, the balancer migrates chunks between the shards. For example, if chunkSize is 128 MB and the collection data differs by 384 MB or more, the balancer migrates chunks between the shards.
  • Earlier than MongoDB 6.0: When a chunk grows larger than chunkSize, the chunk is split.

When chunks are moved, split, or merged, the shard metadata is updated after the chunk operation is committed by a config server. Shards not involved in the chunk operation are also updated with new metadata.

The time for the shard metadata update is proportional to the size of the routing table. CRUD operations on the collection are temporarily blocked while the shard metadata is updated, and a smaller routing table means shorter CRUD operation delays.

Defragmenting a collection reduces the number of chunks and the time to update the chunk metadata.

To reduce the system workload, configure the balancer to run only at a specific time using a shard balancing window. Defragmentation runs during the balancing window time period.

You can use the chunkDefragmentationThrottlingMS parameter to limit the rate of split and merge commands run by the balancer.

You can start and stop defragmentation at any time.
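For instance, a hedged sketch of starting defragmentation for one collection (the namespace is hypothetical; see Defragment Sharded Collections for monitoring and stopping):

```javascript
// Begin defragmentation for a single sharded collection.
db.adminCommand( {
   configureCollectionBalancing: "test.orders",
   defragmentCollection: true
} )
```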

You can also set a shard zone. A shard zone is based on the shard key, and you can associate each zone with one or more shards in a cluster.
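A minimal sketch of defining a zone (the shard name, zone name, and key bounds are hypothetical):

```javascript
// Associate a shard with a zone, then assign a shard key range to that zone.
sh.addShardToZone("shardRS1", "ZoneA")
sh.updateZoneKeyRange(
   "test.orders",
   { orderId: MinKey },
   { orderId: 500000 },
   "ZoneA"
)
```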

Starting in MongoDB 6.0, a sharded cluster only splits chunks when chunks must be migrated. This means the chunk size may exceed chunkSize. Larger chunks reduce the number of chunks on a shard and improve performance because the time to update the shard metadata is reduced. For example, you might see a 1 TB chunk on a shard even though you have set chunkSize to 256 MB.

chunkSize affects the following:

  • Maximum amount of data the balancer attempts to migrate between two shards in a single chunk migration operation.
  • Amount of data migrated during defragmentation.

For details about defragmenting sharded collections, see Defragment Sharded Collections.