Defragment Sharded Collections

Fragmentation occurs when a sharded collection's data is broken up into an unnecessarily large number of small chunks. This can increase the time that CRUD operations on that collection take. Defragmentation reduces the number of chunks by merging smaller chunks into larger ones, which lowers CRUD operation times.

If CRUD operation times are acceptable, you don't need to defragment collections.

The following table summarizes defragmentation information for various MongoDB versions.

| MongoDB Version | Description |
| --- | --- |
| MongoDB 7.0 and later | Chunks are automatically merged. Performance improvements from defragmenting a collection in MongoDB 7.0 are lower compared to MongoDB 6.0. Typically, you don't need to defragment collections starting in MongoDB 7.0. |
| MongoDB 6.0 and later, but earlier than 7.0 | Defragment collections only if you experience CRUD operation delays when the balancer migrates chunks or a node starts. Starting in MongoDB 6.0, high write traffic should not cause fragmentation. Chunk migrations cause fragmentation. |
| Earlier than MongoDB 6.0 | Defragment collections only if you experience longer CRUD operation times during metadata updates. For MongoDB versions earlier than 6.0, a sharded collection becomes fragmented when the collection size grows significantly because of many insert or update operations. |

To defragment a sharded collection, use the configureCollectionBalancing command's defragmentCollection option. The option is available starting in MongoDB 6.0.
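As a minimal sketch of the command described above, the following Python builds the command document for configureCollectionBalancing with the defragmentCollection option. The namespace `test.orders` and the helper name `defragment_command` are hypothetical and used only for illustration. With PyMongo, a document like this would typically be passed to `client.admin.command(...)` on a connection to a mongos; that call is left out so the sketch runs without a cluster.

```python
def defragment_command(namespace: str, enable: bool = True) -> dict:
    """Build a configureCollectionBalancing command document.

    The command name must be the first key, so it is inserted first
    (Python dicts preserve insertion order).
    """
    if "." not in namespace:
        raise ValueError("namespace must look like '<database>.<collection>'")
    return {
        "configureCollectionBalancing": namespace,
        "defragmentCollection": enable,
    }


# Hypothetical namespace, used only for illustration.
start_cmd = defragment_command("test.orders")                # start defragmentation
stop_cmd = defragment_command("test.orders", enable=False)   # stop defragmentation

print(start_cmd)
print(stop_cmd)
```

Building the document in a small helper keeps the start and stop requests symmetric: the only difference between them is the boolean passed to defragmentCollection.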

Before You Begin

Consider these issues before you defragment collections:

  • Defragmentation might cause many metadata updates on the shards. If your CRUD operations are already taking longer than usual during migrations, you should only run defragmentation during a shard balancing window to reduce the system workload.
  • If defragmentation is impacting workload and CRUD latency on the cluster, you can reduce the impact using the chunkDefragmentationThrottlingMS parameter.
  • Merged chunks lose their placement history.

    • This means that while defragmentation is running, snapshot reads and, indirectly, transactions could fail with stale chunk history errors.
    • Placement history records the shards that a chunk was stored on. Defragmentation erases the placement history, so some operations could fail, but the failures typically resolve after around five minutes.
  • Defragmentation affects the locality of the documents in a collection by moving data between shards. If a collection has ranges of data that are frequently accessed, it is possible that after defragmentation the frequently accessed data ends up on one shard. This might decrease the performance of CRUD operations by placing the workload on one shard instead of multiple shards.

Tasks

Note

Typically, you should use a shard balancing window to specify when the balancer runs instead of manually starting and stopping defragmentation.

Details

This section describes additional details related to defragmenting sharded collections.

Configure Collection Balancing Status

The defragmentCollection field returned by the configureCollectionBalancing command is true only while defragmentation is running.

After defragmentation automatically ends or you manually stop defragmentation, the defragmentCollection field is removed from the returned document.
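The field semantics above can be sketched as a small check on a returned document. The helper name `is_defragmenting` and both sample documents are hypothetical. The samples only illustrate the presence or absence of the defragmentCollection field, and real replies contain other fields that are not shown here.

```python
def is_defragmenting(reply: dict) -> bool:
    """Return True only when the reply carries defragmentCollection: true.

    A missing field is treated as "not running", because the field is
    removed once defragmentation ends or is stopped.
    """
    return reply.get("defragmentCollection") is True


# Hypothetical reply documents, for illustration only.
running = {"defragmentCollection": True, "ok": 1}
finished = {"ok": 1}

print(is_defragmenting(running))
print(is_defragmenting(finished))
```

Using `.get(...)` rather than indexing avoids a KeyError in the common case where the field is absent.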

Operations

Secondary node reads are permitted during defragmentation, but might take longer to complete until metadata updates on the primary node are replicated to the secondary nodes.

Chunk Size, Balancing, and Defragmentation

For details about the MongoDB balancer, see Sharded Cluster Balancer.

For an introduction to chunkSize, see Modify Range Size in a Sharded Cluster.

The following table describes how chunkSize affects defragmentation and the balancer operations in different MongoDB versions.

| MongoDB Version | Description |
| --- | --- |
| MongoDB 6.0 and later | When the collection data shared between two shards differs by three or more times the configured chunkSize setting, the balancer migrates chunks between the shards. For example, if chunkSize is 128 MB and the collection data differs by 384 MB or more, the balancer migrates chunks between the shards. |
| Earlier than MongoDB 6.0 | When a chunk grows larger than chunkSize, the chunk is split. |
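The threshold rule for MongoDB 6.0 and later in the table above comes down to a single comparison. The sketch below works through that arithmetic. The function name and the per-shard data sizes are hypothetical, and the code models only the rule as stated in the table, not the balancer's actual implementation.

```python
def balancer_should_migrate(shard_a_mb: float, shard_b_mb: float,
                            chunk_size_mb: float = 128) -> bool:
    """Apply the stated rule: migrate when the data held by two shards
    differs by three or more times the configured chunkSize."""
    threshold_mb = 3 * chunk_size_mb
    return abs(shard_a_mb - shard_b_mb) >= threshold_mb


# With chunkSize = 128 MB the threshold is 3 * 128 = 384 MB.
print(balancer_should_migrate(1000, 616))  # difference is exactly 384 MB
print(balancer_should_migrate(1000, 617))  # difference is 383 MB
```

The first call meets the 384 MB threshold and the second falls just short of it, matching the 128 MB example in the table.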

When chunks are moved, split, or merged, the shard metadata is updated after the chunk operation is committed by a config server. Shards not involved in the chunk operation are also updated with new metadata.

The time for the shard metadata update is proportional to the size of the routing table. CRUD operations on the collection are temporarily blocked while the shard metadata is updated, and a smaller routing table means shorter CRUD operation delays.

Defragmenting a collection reduces the number of chunks and the time to update the chunk metadata.

To reduce the system workload, configure the balancer to run only at a specific time using a shard balancing window. Defragmentation runs during the balancing window time period.
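As a rough sketch of configuring a balancing window, the code below builds the filter and update documents for the balancer entry in the `settings` collection of the `config` database. The helper name and the 23:00 to 06:00 window are hypothetical. With PyMongo, the two documents would typically be passed to `client.config.settings.update_one(query, update, upsert=True)` through a mongos; that call is omitted so the sketch runs standalone.

```python
import re

# 24-hour HH:MM, e.g. "23:00" or "06:00".
_HHMM = re.compile(r"([01]\d|2[0-3]):[0-5]\d")


def balancing_window_update(start: str, stop: str) -> tuple:
    """Return (query, update) documents that set the balancer's activeWindow."""
    for value in (start, stop):
        if not _HHMM.fullmatch(value):
            raise ValueError(f"expected HH:MM in 24-hour time, got {value!r}")
    query = {"_id": "balancer"}
    update = {"$set": {"activeWindow": {"start": start, "stop": stop}}}
    return query, update


# Hypothetical overnight window, for illustration only.
query, update = balancing_window_update("23:00", "06:00")
print(query)
print(update)
```

Validating the time strings before sending them makes a malformed window fail in the client rather than leave the balancer with an unexpected schedule.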

You can use the chunkDefragmentationThrottlingMS parameter to limit the rate of split and merge commands run by the balancer.
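A setParameter command document for this parameter might be built as in the sketch below. The helper name and the value of 100 are hypothetical. The sketch assumes the value is a delay in milliseconds, as the parameter's MS suffix suggests, and that the parameter can be changed at runtime with setParameter; check both against the parameter reference for your MongoDB version.

```python
def throttling_command(delay_ms: int) -> dict:
    """Build a setParameter command document for chunkDefragmentationThrottlingMS.

    Assumption: the value is a non-negative number of milliseconds.
    """
    if isinstance(delay_ms, bool) or not isinstance(delay_ms, int) or delay_ms < 0:
        raise ValueError("delay_ms must be a non-negative integer")
    return {"setParameter": 1, "chunkDefragmentationThrottlingMS": delay_ms}


# Hypothetical throttle value, for illustration only.
cmd = throttling_command(100)
print(cmd)
```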

You can start and stop defragmentation at any time.

You can also set a shard zone. A shard zone is based on the shard key, and you can associate each zone with one or more shards in a cluster.

Starting in MongoDB 6.0, a sharded cluster only splits chunks when chunks must be migrated. This means the chunk size may exceed chunkSize. Larger chunks reduce the number of chunks on a shard and improve performance because the time to update the shard metadata is reduced. For example, you might see a 1 TB chunk on a shard even though you have set chunkSize to 256 MB.

chunkSize affects the following:

  • Maximum amount of data the balancer attempts to migrate between two shards in a single chunk migration operation.
  • Amount of data migrated during defragmentation.

Learn More