The ideal shard key allows MongoDB to distribute documents evenly throughout the cluster while facilitating common query patterns. A suboptimal shard key can lead to performance or scaling issues due to uneven data distribution. You can change the shard key for a collection to change the distribution of your data across a cluster.
Starting in MongoDB 8.0, you can reshard a collection on the same shard key, allowing you to redistribute data to include new shards or to different zones without changing your shard key. To reshard to the same shard key, set forceRedistribution to true.
Starting in MongoDB 8.0.10, you can reshard a time series collection. All shards in the time series collection must run version 8.0.10 or later to reshard.
Note
Before resharding your collection, read Troubleshoot Shard Keys for information on common performance and scaling issues and advice on how to fix them.
About this Task
- Only one collection can be resharded at a time.
- writeConcernMajorityJournalDefault must be true.
- To reshard a collection that has a uniqueness constraint, the new shard key must satisfy the unique index requirements for any existing unique indexes.
- The following commands and corresponding shell methods are not supported on the collection that is being resharded while the resharding operation is in progress:
- The following commands and methods are not supported on the cluster while the resharding operation is in progress:
Warning
Using any of the preceding commands during a resharding operation causes the resharding operation to fail.
- If the collection you're resharding uses MongoDB Search, the search index becomes unavailable when the resharding operation completes. You need to manually rebuild the search index once the resharding operation completes.
Before you Begin
Before you reshard your collection, ensure that you meet the following requirements:
- Your application can tolerate a period of two seconds where the affected collection blocks writes. During the time period where writes are blocked, your application experiences an increase in latency. If your workload cannot tolerate this requirement, consider refining your shard key instead.
- Your database meets these resource requirements:
  - Ensure that the available storage space on each shard the collection will be distributed across is at least twice the size of the collection that you want to reshard and its total index size, divided by the number of shards.
    storage_req = ( ( collection_storage_size + index_size ) * 2 ) / shard_count
    For example, consider a collection that contains 2 TB of data and has a 400 GB index distributed across four shards. To perform a resharding operation on this collection, each shard would require 1.2 TB of available storage.
    1.2 TB storage = ( ( 2 TB collection + 0.4 TB index ) * 2 ) / 4 shards
    To meet storage requirements, you may need to upgrade to the next tier of storage during the resharding operation. You can scale down once the operation completes.
  - Ensure that your I/O capacity is below 50%.
  - Ensure that your CPU load is below 80%.
Important
These requirements are not enforced by the database. A failure to allocate enough resources can result in:
- the database running out of space and shutting down
- decreased performance
- the operation taking longer than expected
- If your application has time periods with less traffic, perform this operation on the collection during that time if possible.
- You must rewrite your application's queries to use both the current shard key and the new shard key.
Tip
If your application can tolerate downtime, you can perform these steps to avoid rewriting your application's queries to use both the current and new shard keys:
1. Stop your application.
2. Rewrite your application to use the new shard key.
3. Wait until resharding completes. To monitor the resharding process, use the $currentOp pipeline stage.
4. Deploy your rewritten application.
Before resharding completes, the following queries return an error if the query filter does not include either the current shard key or a unique field (like _id):
- deleteOne()
- findAndModify()
- findOneAndDelete()
- findOneAndReplace()
- findOneAndUpdate()
- replaceOne()
- updateOne()
For optimal performance, we recommend that you also rewrite other queries to include the new shard key. Once the resharding operation completes, you can remove the old shard key from the queries.
- No index builds are in progress. To check for running index builds, use $currentOp:
db.getSiblingDB("admin").aggregate( [
   { $currentOp : { idleConnections: true } },
   { $match: {
         $or: [
            { "op": "command", "command.createIndexes": { $exists: true } },
            { "op": "none", "msg": /^Index Build/ }
         ]
      }
   }
] )
In the result document, if the inprog field value is an empty array, there are no index builds in progress:
{
   inprog: [],
   ok: 1,
   '$clusterTime': { ... },
   operationTime: <timestamp>
}
Note
Resharding is a write-intensive process that can increase the rate at which oplog entries are generated. You may wish to:
- set a fixed oplog size to prevent unbounded oplog growth.
- increase the oplog size to minimize the chance that one or more secondary nodes becomes stale.
See the Replica Set Oplog documentation for more details.
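The per-shard storage requirement listed under the resource requirements in this section can be checked with a short calculation before you start. The following is a minimal sketch of the storage_req formula in plain JavaScript. The function name storageRequiredPerShard and its parameter names are illustrative assumptions and are not part of any MongoDB API. All sizes must use the same unit.

```javascript
// Sketch of the documented formula:
//   storage_req = ((collection_storage_size + index_size) * 2) / shard_count
// The function and parameter names are illustrative, not a MongoDB API.
function storageRequiredPerShard(collectionStorageSize, indexSize, shardCount) {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new RangeError("shardCount must be a positive integer");
  }
  return ((collectionStorageSize + indexSize) * 2) / shardCount;
}

// Worked example from the text: 2 TB of data, a 0.4 TB index, four shards.
const requiredTB = storageRequiredPerShard(2, 0.4, 4);
console.log(requiredTB.toFixed(1) + " TB per shard"); // prints "1.2 TB per shard"
```

Because the database does not enforce these requirements, a check like this only tells you how much free space to look for on each shard. It does not verify that the space is actually available.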
Steps
Important
We strongly recommend that you review the About this Task section and read the Steps section in full before resharding your collection.
In a collection resharding operation, a shard can be a:
- donor, which currently stores chunks for the sharded collection.
- recipient, which stores new chunks for the sharded collection based on the shard keys and zones.
A shard can be a donor and a recipient at the same time.
The config server primary is always the resharding coordinator and starts each phase of the resharding operation.
Disable the Balancer
You must turn off the balancer before you begin resharding a collection. For instructions, see the sharded cluster balancer documentation.
Start the resharding operation.
While connected to the mongos, issue a reshardCollection command that specifies the collection to be resharded and the new shard key:
db.adminCommand({
   reshardCollection: "<database>.<collection>",
   key: <shardkey>
})
MongoDB sets the max number of seconds to block writes to two seconds and begins the resharding operation.
To reshard to the same shard key, set forceRedistribution to true:
db.adminCommand({
   reshardCollection: "<database>.<collection>",
   key: <shardkey>,
   forceRedistribution: true
})
You can also use sh.reshardCollection() to reshard a collection with the same key. For an example, see Redistribute Data to New Shards.
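If you script this step, it can help to build the command document in one place so that forceRedistribution is set only when you intend to reshard on the same key. The following is a minimal sketch. The helper name buildReshardCommand, its options object, and the namespace and key values are hypothetical. Only the reshardCollection, key, and forceRedistribution fields come from the commands shown in this step.

```javascript
// Hypothetical helper: builds the command document for db.adminCommand().
// Only reshardCollection, key, and forceRedistribution come from the text;
// the helper name and its options object are assumptions for illustration.
function buildReshardCommand(namespace, key, options = {}) {
  // The namespace must have the form "<database>.<collection>".
  const dot = namespace.indexOf(".");
  if (dot <= 0 || dot === namespace.length - 1) {
    throw new Error('namespace must look like "<database>.<collection>"');
  }
  const command = { reshardCollection: namespace, key: key };
  // Set forceRedistribution only when resharding on the same shard key.
  if (options.forceRedistribution === true) {
    command.forceRedistribution = true;
  }
  return command;
}

// "sales.orders" and the key fields below are placeholder values.
const newKeyCommand = buildReshardCommand("sales.orders", { customerId: 1, orderId: 1 });
const sameKeyCommand = buildReshardCommand(
  "sales.orders",
  { customerId: 1 },
  { forceRedistribution: true }
);
console.log(JSON.stringify(newKeyCommand));
console.log(JSON.stringify(sameKeyCommand));
```

In mongosh, while connected to the mongos, you would pass the resulting document to db.adminCommand(), for example db.adminCommand(newKeyCommand).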
Monitor the resharding operation.
To monitor the resharding operation, you can use the $currentOp pipeline stage:
db.getSiblingDB("admin").aggregate([
   { $currentOp: { allUsers: true, localOps: false } },
   {
      $match: {
         type: "op",
         "originatingCommand.reshardCollection": "<database>.<collection>"
      }
   }
])
Note
To see updated values, you need to run the preceding pipeline repeatedly.
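Because the values refresh only when the pipeline is run again, a small helper can condense each run into one line per operation. The following is a minimal sketch. The function name summarizeReshardingOps is hypothetical, the sample documents are made-up values, and the code assumes that totalOperationTimeElapsedSecs and remainingOperationTimeEstimatedSecs arrive as plain JavaScript numbers, which may not hold in every shell or driver.

```javascript
// Hypothetical helper: turns the documents returned by the $currentOp
// pipeline into one summary line per resharding operation.
// Assumes the time fields are plain numbers (an assumption, not guaranteed).
function summarizeReshardingOps(ops) {
  return ops.map((op) => {
    const shard = op.shard === undefined ? "unknown shard" : op.shard;
    const remaining = op.remainingOperationTimeEstimatedSecs;
    // -1 is returned when a new resharding operation starts.
    const remainingText =
      remaining === undefined || remaining === -1
        ? "no estimate yet"
        : remaining + "s remaining (estimated)";
    return shard + ": " + op.totalOperationTimeElapsedSecs + "s elapsed, " + remainingText;
  });
}

// Made-up sample documents shaped like the pipeline output.
const sampleOps = [
  { shard: "shard01", totalOperationTimeElapsedSecs: 120, remainingOperationTimeEstimatedSecs: -1 },
  { shard: "shard02", totalOperationTimeElapsedSecs: 120, remainingOperationTimeEstimatedSecs: 45 }
];
summarizeReshardingOps(sampleOps).forEach((line) => console.log(line));
// prints "shard01: 120s elapsed, no estimate yet"
// prints "shard02: 120s elapsed, 45s remaining (estimated)"
```

In mongosh, one way to obtain the input array is to call toArray() on the cursor that the monitoring pipeline returns and pass the result to the helper each time you re-run it.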
The $currentOp pipeline outputs:
- totalOperationTimeElapsedSecs: elapsed operation time in seconds.
- remainingOperationTimeEstimatedSecs: estimated time remaining in seconds for the current resharding operation. It is returned as -1 when a new resharding operation starts.
  Starting in MongoDB 7.0, remainingOperationTimeEstimatedSecs is also available on the coordinator during a resharding operation.
  remainingOperationTimeEstimatedSecs is set to a pessimistic time estimate:
  - The catch-up phase time estimate is set to the clone phase time, which is a relatively long time.
  - In practice, if there are only a few pending write operations, the actual catch-up phase time is relatively short.
[
   {
      shard: '<shard>',
      type: 'op',
      desc: 'ReshardingRecipientService | ReshardingDonorService | ReshardingCoordinatorService <reshardingUUID>',
      op: 'command',
      ns: '<database>.<collection>',
      originatingCommand: {
         reshardCollection: '<database>.<collection>',
         key: <shardkey>,
         unique: <boolean>,
         collation: { locale: 'simple' }
      },
      totalOperationTimeElapsedSecs: <number>,
      remainingOperationTimeEstimatedSecs: <number>,
      ...
   },
   ...
]
Behavior
Minimum Duration of a Resharding Operation
The minimum duration of a resharding operation is always 5 minutes.
Retryable Writes
Retryable writes initiated before or during resharding can be retried during resharding and for up to 5 minutes after the collection has been resharded. After 5 minutes, you may be unable to find the definitive result of the write, and subsequent attempts to retry the write fail with an IncompleteTransactionHistory error.
Error Case
Duplicate _id Values
To avoid corrupting collection data, the resharding operation fails if _id values are not globally unique. Duplicate _id values can also prevent successful chunk migration. If you have documents with duplicate _id values, copy the data from each into a new document, and then delete the duplicate documents.