Defragment Sharded Collections
Fragmentation occurs when a sharded collection's data is broken up into an unnecessarily large number of small chunks. This can increase the time taken by CRUD operations run on that collection. Defragmentation reduces the number of chunks by merging smaller chunks into larger ones, which lowers CRUD operation times.
If CRUD operation times are acceptable, you don't need to defragment collections.
The following table summarizes defragmentation information for various MongoDB versions.
MongoDB Version | |
---|---|
To defragment a sharded collection, use the configureCollectionBalancing command's defragmentCollection option. The option is available starting in MongoDB 6.0.
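As a minimal sketch of what such a command document could look like, the helper below builds it for a given namespace. The function name buildDefragmentCommand and the namespace "test.students" are hypothetical, and the use of defragmentCollection: false to stop defragmentation is an assumption to verify against the command reference for your version.

```javascript
// Build the command document that starts (enable = true) or stops
// (enable = false) defragmentation of one sharded collection.
// "test.students" below is a hypothetical namespace used for illustration.
function buildDefragmentCommand(namespace, enable) {
  if (typeof namespace !== "string" || !/^[^.]+\..+$/.test(namespace)) {
    throw new Error('namespace must look like "<database>.<collection>"');
  }
  return {
    configureCollectionBalancing: namespace,
    defragmentCollection: Boolean(enable),
  };
}

const startCommand = buildDefragmentCommand("test.students", true);
const stopCommand = buildDefragmentCommand("test.students", false);

// In mongosh, connected to a mongos, each document would be passed to
// db.adminCommand(), for example: db.adminCommand(startCommand)
console.log(JSON.stringify(startCommand));
console.log(JSON.stringify(stopCommand));
```

Building the document separately from sending it keeps the sketch runnable without a cluster and makes the command easy to log or review before it is issued.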
Before you Begin
Consider these issues before you defragment collections:
- Defragmentation might cause many metadata updates on the shards. If your CRUD operations already take longer than usual during migrations, run defragmentation only during a shard balancing window to reduce the system workload.
- If defragmentation is impacting workload and CRUD latency on the cluster, you can reduce the impact using the chunkDefragmentationThrottlingMS parameter.
- Merged chunks lose their placement history. Placement history records the shards that a chunk was stored on. Because defragmentation erases the placement history, snapshot reads and, indirectly, transactions could fail with stale chunk history errors while defragmentation is running. These failures typically resolve after around five minutes.
- Defragmentation affects the locality of the documents in a collection by moving data between shards. If a collection has ranges of data that are frequently accessed, the frequently accessed data might end up on one shard after defragmentation. This might decrease the performance of CRUD operations by placing the workload on one shard instead of multiple shards.
Tasks
- Manually start defragmenting a sharded collection
- Monitor defragmentation of a sharded collection
- Manually stop defragmenting a sharded collection
Typically, you should use a shard balancing window to specify when the balancer runs instead of manually starting and stopping defragmentation.
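For the monitoring task, one possible approach is to summarize the reply of the balancerCollectionStatus command. The sketch below assumes a reply shape with balancerCompliant, firstComplianceViolation, and a details sub-document; those field names and the sample values are assumptions for illustration and should be checked against the command reference for your server version. The helper name summarizeBalancerStatus is hypothetical.

```javascript
// Summarize a balancerCollectionStatus reply. The reply shape used here
// (firstComplianceViolation, details.currentPhase, details.progress) is an
// assumption for illustration; confirm it against your server version.
function summarizeBalancerStatus(reply) {
  if (reply.balancerCompliant === true) {
    return "compliant: no balancing or defragmentation work pending";
  }
  if (reply.firstComplianceViolation === "defragmentingChunks") {
    const phase = reply.details?.currentPhase ?? "unknown phase";
    const remaining = reply.details?.progress?.remainingChunksToProcess;
    const suffix =
      remaining === undefined ? "" : `, ${remaining} chunk(s) remaining`;
    return `defragmenting: ${phase}${suffix}`;
  }
  return `not compliant: ${reply.firstComplianceViolation ?? "unknown reason"}`;
}

// Hypothetical reply, hard-coded so the sketch runs without a cluster.
// In mongosh it would come from something like:
//   db.adminCommand({ balancerCollectionStatus: "test.students" })
const sampleReply = {
  balancerCompliant: false,
  firstComplianceViolation: "defragmentingChunks",
  details: {
    currentPhase: "moveAndMergeChunks",
    progress: { remainingChunksToProcess: 1 },
  },
};

console.log(summarizeBalancerStatus(sampleReply));
```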
Details
This section describes additional details related to defragmenting sharded collections.
Configure Collection Balancing Status
The defragmentCollection field returned by the configureCollectionBalancing command is true only when defragmentation is running.
After defragmentation automatically ends or you manually stop defragmentation, the defragmentCollection field is removed from the returned document.
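Because the field disappears rather than turning false, code that inspects the returned document needs to treat a missing field as "not running". A minimal sketch, with a hypothetical helper name and hard-coded sample documents that omit any other fields a real reply may carry:

```javascript
// Interpret the document returned by configureCollectionBalancing.
// Per the text above, defragmentCollection is true only while
// defragmentation runs and is absent afterwards, so a missing field and a
// false value are both treated as "not running".
function isDefragmentationRunning(returnedDocument) {
  return returnedDocument.defragmentCollection === true;
}

// Hypothetical documents; other fields a real reply may carry are omitted.
const whileRunning = { defragmentCollection: true };
const afterCompletion = {};

console.log(isDefragmentationRunning(whileRunning));    // true
console.log(isDefragmentationRunning(afterCompletion)); // false
```

The strict comparison with true avoids reading an absent field as a truthy or falsy surprise.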
Operations
Secondary node reads are permitted during defragmentation, but might take longer to complete until metadata updates on the primary node are replicated to the secondary nodes.
Chunk Size, Balancing, and Defragmentation
For details about the MongoDB balancer, see Sharded Cluster Balancer.
For an introduction to chunkSize, see Modify Range Size in a Sharded Cluster.
The following table describes how chunkSize affects defragmentation and the balancer operations in different MongoDB versions.
MongoDB Version | |
---|---|
When the collection data differs between shards by three or more times the chunkSize setting, the balancer migrates chunks between the shards. For example, if chunkSize is 128 MB and the collection data differs by 384 MB or more, the balancer migrates chunks between the shards. | |
When a chunk grows beyond chunkSize, the chunk is split. | |
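The 128 MB example in the table corresponds to a threshold of three times chunkSize (3 × 128 MB = 384 MB). The arithmetic can be sketched as follows; the function name and constant are hypothetical, and this only models the comparison described in the table, not the balancer's actual implementation.

```javascript
// Threshold factor taken from the table: a difference of three or more
// times chunkSize triggers migration.
const MIGRATION_THRESHOLD_FACTOR = 3;

// Return true when the data difference between shards (in MB) reaches the
// threshold for the given chunkSize (in MB).
function balancerWouldMigrate(dataDifferenceMB, chunkSizeMB) {
  if (!(chunkSizeMB > 0)) {
    throw new Error("chunkSizeMB must be a positive number");
  }
  return dataDifferenceMB >= MIGRATION_THRESHOLD_FACTOR * chunkSizeMB;
}

console.log(balancerWouldMigrate(384, 128)); // true
console.log(balancerWouldMigrate(383, 128)); // false
```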
When chunks are moved, split, or merged, the shard metadata is updated after the chunk operation is committed by a config server. Shards not involved in the chunk operation are also updated with new metadata.
The time for the shard metadata update is proportional to the size of the routing table. CRUD operations on the collection are temporarily blocked while the shard metadata is updated, and a smaller routing table means shorter CRUD operation delays.
Defragmenting a collection reduces the number of chunks and the time to update the chunk metadata.
To reduce the system workload, configure the balancer to run only at a specific time using a shard balancing window. Defragmentation runs during the balancing window time period.
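A balancing window is typically stored as an activeWindow setting on the balancer document in the config database. The sketch below builds the pieces of such an update; the helper name and the 23:00 to 06:00 window are hypothetical, and the exact procedure should be confirmed in Schedule the Balancing Window.

```javascript
// Build the filter, update, and options for setting the balancer's
// activeWindow in the config database's settings collection.
// The 23:00 to 06:00 window below is a hypothetical example.
function buildBalancingWindowUpdate(start, stop) {
  const hhmm = /^([01]\d|2[0-3]):[0-5]\d$/;
  if (!hhmm.test(start) || !hhmm.test(stop)) {
    throw new Error('start and stop must be "HH:MM" in 24-hour time');
  }
  return {
    filter: { _id: "balancer" },
    update: { $set: { activeWindow: { start, stop } } },
    options: { upsert: true },
  };
}

const windowUpdate = buildBalancingWindowUpdate("23:00", "06:00");

// In mongosh, connected to a mongos, this would be applied with:
//   use config
//   db.settings.updateOne(windowUpdate.filter, windowUpdate.update, windowUpdate.options)
console.log(JSON.stringify(windowUpdate));
```

Validating the time strings up front catches malformed values before they reach the cluster configuration.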
You can use the chunkDefragmentationThrottlingMS parameter to limit the rate of split and merge commands run by the balancer.
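As a rough sketch, the parameter could be set with a setParameter command document like the one built below. This assumes the parameter is settable at runtime and takes a non-negative integer number of milliseconds, as its name suggests; the helper name and the 100 ms value are hypothetical, and where the command must be run should be checked in the server parameter reference.

```javascript
// Build a setParameter command for chunkDefragmentationThrottlingMS.
// Assumption: the parameter can be changed at runtime with setParameter;
// the 100 ms value is a hypothetical starting point, not a recommendation.
function buildThrottlingCommand(delayMs) {
  if (!Number.isInteger(delayMs) || delayMs < 0) {
    throw new Error("delayMs must be a non-negative integer");
  }
  return { setParameter: 1, chunkDefragmentationThrottlingMS: delayMs };
}

const throttleCommand = buildThrottlingCommand(100);

// In mongosh this document would be passed to db.adminCommand(throttleCommand).
console.log(JSON.stringify(throttleCommand));
```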
You can start and stop defragmentation at any time.
You can also set a shard zone. A shard zone is based on the shard key, and you can associate each zone with one or more shards in a cluster.
Starting in MongoDB 6.0, a sharded cluster only splits chunks when chunks must be migrated. This means the chunk size may exceed chunkSize. Larger chunks reduce the number of chunks on a shard and improve performance because the time to update the shard metadata is reduced. For example, you might see a 1 TB chunk on a shard even though you have set chunkSize to 256 MB.
chunkSize affects the following:
- Maximum amount of data the balancer attempts to migrate between two shards in a single chunk migration operation.
- Amount of data migrated during defragmentation.
Learn More
- Introduction to sharding, see Sharding
- Partition data with chunks, see Data Partitioning with Chunks
- Configure collection balancing, see configureCollectionBalancing
- Examine balancer collection status, see balancerCollectionStatus
- Configure shard balancing windows, see Schedule the Balancing Window
- Monitor shards using MongoDB Atlas, see Review Sharded Clusters