To remove a shard you must ensure the shard's data is migrated to the remaining shards in the cluster. This procedure describes how to safely migrate data and remove a shard.
About this Task
Creating, sharding, or moving collections while performing this procedure may cause interruptions and lead to unexpected results.
Do not use this procedure to migrate an entire cluster to new hardware. To migrate, see Migrate a Self-Managed Sharded Cluster to Different Hardware.
When you remove a shard in a cluster with an uneven chunk distribution, the balancer first removes the chunks from the draining shard and then balances the remaining uneven chunk distribution.
Removing a shard may cause an open change stream cursor to close, and the closed change stream cursor may not be fully resumable.
You can safely restart a cluster during a shard removal process. If you restart a cluster during an ongoing draining process, draining continues automatically after the cluster components restart.
MongoDB records the shard draining status in the config.shards collection.
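For example, one way to see which shards are currently draining is to query the config.shards collection directly from mongosh. This is a minimal sketch that assumes your user can read the config database:
// Shards that are draining have a "draining" flag set to true
// in their config.shards document.
use config
db.shards.find( { draining: true } )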
Before you Begin
This procedure uses the sh.moveCollection() method to move collections off of the removed shard. Before you begin this procedure, review the moveCollection considerations and requirements to understand the command behavior.
To remove a shard, first connect to one of the cluster's mongos instances using mongosh.
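For reference, a typical way to connect is with a mongosh connection string that points at a mongos router. The host and port below are placeholders, not values taken from this procedure:
mongosh "mongodb://<mongos-host>:<port>"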
Note
When removing multiple shards, remove them simultaneously rather than one at a time. Removing one shard at a time causes the balancer to drain data into other remaining shards. A shard can only participate in one chunk migration at a time, so removing one shard limits the throughput of data migration.
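For example, to drain two shards at the same time, issue removeShard for each of them back to back from the same mongosh session. The shard names below are hypothetical:
db.adminCommand( { removeShard: "shardA" } )
db.adminCommand( { removeShard: "shardB" } )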
Steps
Ensure the balancer is enabled
To migrate data from a shard, the balancer process must be enabled. To check the balancer state, use the sh.getBalancerState() method:
sh.getBalancerState()
If the operation returns true, the balancer is enabled.
If the operation returns false, see Enable the Balancer.
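One way to re-enable the balancer from mongosh is the sh.startBalancer() method; see Enable the Balancer for the full procedure:
sh.startBalancer()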
Determine the name of the shard to remove
To find the name of the shard, run the listShards command:
db.adminCommand( { listShards: 1 } )
The shards._id field contains the shard name.
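If you only want the shard names, a small mongosh snippet like the following extracts the _id values from the listShards output. This is an illustrative sketch, not part of the original procedure:
// Collect just the shard names from the listShards response
db.adminCommand( { listShards: 1 } ).shards.map( shard => shard._id )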
Migrate sharded collection data with the balancer
Run the removeShard command for the shard you want to remove:
db.adminCommand( { removeShard: "<shardName>" } )
Note
mongos converts the write concern of the removeShard command to "majority".
The removeShard operation returns:
{
"msg" : "draining started successfully",
"state" : "started",
"shard" : "<shardName>",
"note" : "you need to call moveCollection for collectionsToMove and afterwards movePrimary for the dbsToMove",
"dbsToMove" : [
"db1",
"db2"
],
"collectionsToMove" : [ "db1.collA" ],
"ok" : 1,
"operationTime" : Timestamp(1575398919, 2),
"$clusterTime" : {
"clusterTime" : Timestamp(1575398919, 2),
"signature" : {
"hash" : BinData(0,"Oi68poWCFCA7b9kyhIcg+TzaGiA="),
"keyId" : Long("6766255701040824328")
}
}
}
The shard enters the draining state and the balancer begins migrating chunks from the removed shard to other shards in the cluster.
These migrations happen slowly to avoid severe impact on the overall cluster. Depending on your network capacity and the amount of data, this operation can take from a few minutes to several days to complete.
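If you want to watch drain progress at a glance, one option is to count chunks per shard from the config database. This is an informal sketch that assumes read access to config.chunks; the authoritative progress check is the removeShard output described later in this procedure:
// Count how many chunks each shard currently owns
db.getSiblingDB( "config" ).chunks.aggregate( [
   { $group: { _id: "$shard", chunkCount: { $sum: 1 } } }
] )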
Tip
While the shard is in the draining state, you can use the reshardCollection command to redistribute data off of the removed shard.
Moving data with reshardCollection can be faster than waiting for the balancer to migrate chunks. The cluster ensures that data is not placed on any draining shards. You can't run moveCollection and reshardCollection operations simultaneously.
For the full procedure, see Resharding for Adding and Removing Shards.
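For illustration only, a reshardCollection call that redistributes a collection without changing its shard key might look like the following. The forceRedistribution option assumes MongoDB 7.2 or later; see the linked procedure for the exact requirements:
db.adminCommand( {
   reshardCollection: "<database>.<collection>",
   key: { <existing shard key> },   // reuse the collection's current shard key
   forceRedistribution: true        // redistribute data even though the key is unchanged
} )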
Move unsharded collections to another shard
Determine what collections need to be moved
To list the unsharded collections on the shard, use the $listClusterCatalog aggregation stage:
use admin
db.aggregate([
{ $listClusterCatalog: { shards: true } },
{ $match: {
$and: [
{ sharded: false },
{ shards: '<shard_to_remove>' },
{ type: { $nin: ["timeseries","view"] } },
{ ns: { $not: { $regex: "^enxcol_\..*(\.esc|\.ecc|\.ecoc|\.ecoc\.compact)$" }}},
{ $or: [{ns: {$not: { $regex: "\.system\." }}}, {ns: {$regex: "\.system\.buckets\."}}]},
{ db: { $ne: 'config' } },
{ db: { $ne: 'admin' } }
]}},
{ $project: {
_id: 0,
ns: {
$cond: [
"$options.timeseries",
{
$replaceAll: {
input: "$ns",
find: ".system.buckets",
replacement: ""
}
},
"$ns"
]
}
}}
])
Move the collections one by one
To move the collection, run sh.moveCollection():
sh.moveCollection( "<database>.<collection>", "<ID of recipient shard>" )
Note
moveCollection fails if you run the command on a namespace that is sharded. If you receive this error message, ignore it and return to step 1 for the next collection.
Return to step 1 to check that there are no remaining unsharded collections on the draining shard.
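If you have many unsharded collections to move, you can script this step. The following mongosh sketch reuses a simplified version of the $listClusterCatalog filter from step 1 and is an illustration only; adjust the filter and the recipient shard to match your cluster:
// Move every unsharded collection that still lives on the draining shard.
const toMove = db.getSiblingDB( "admin" ).aggregate( [
   { $listClusterCatalog: { shards: true } },
   { $match: { sharded: false, shards: "<shard_to_remove>" } }   // simplified filter
] ).toArray()
toMove.forEach( coll => {
   try {
      sh.moveCollection( coll.ns, "<ID of recipient shard>" )
   } catch ( e ) {
      // moveCollection fails for namespaces that are already sharded; skip them
      print( `skipping ${coll.ns}: ${e.message}` )
   }
} )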
Change primary shard
Run the db.printShardingStatus() method:
db.printShardingStatus()
In the databases section of the command output, check the database.primary field. If the primary field is the removed shard, you must move that database's primary to a different shard.
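To find these databases with a direct query instead, you can look them up in the config.databases collection. This is a convenience sketch, not a required part of the procedure:
// Databases whose primary shard is the shard being removed
db.getSiblingDB( "config" ).databases.find( { primary: "<shardName>" } )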
To change a database's primary shard, run the movePrimary command.
Warning
When you run movePrimary, any collections that were not moved in the Move unsharded collections to another shard step are unavailable during the movePrimary process.
db.adminCommand(
{
movePrimary: <dbName>,
to: <shardName>
}
)
Check migration status
To check the progress of the migration, run removeShard from the admin database again:
db.adminCommand( { removeShard: "<shardName>" } )
In the output, the remaining field includes these fields:
chunks | The total number of chunks that currently remain on the shard.
dbs | The total number of databases whose primary shard is the shard. These databases are listed in the dbsToMove output field.
jumboChunks | Of the remaining chunks, the number that are jumbo.
Continue checking the status of the removeShard command until the number of chunks remaining is 0.
db.adminCommand( { removeShard: "<shardName>" } )
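If you prefer to poll from a script rather than re-running the command by hand, a minimal mongosh sketch such as the following works. It assumes mongosh's built-in sleep() helper and checks once per minute; this is an illustration, not part of the official procedure:
// Re-run removeShard until draining finishes
let status = db.adminCommand( { removeShard: "<shardName>" } )
while ( status.state === "ongoing" ) {
   print( `chunks remaining: ${status.remaining.chunks}` )
   sleep( 60 * 1000 )   // wait one minute between checks
   status = db.adminCommand( { removeShard: "<shardName>" } )
}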
Finalize shard removal
To finalize the shard removal process, re-run the removeShard command:
db.adminCommand( { removeShard: "<shardName>" } )
Note
DDL Operations
If you remove a shard while your cluster executes a DDL operation (an operation that modifies a collection, such as reshardCollection), the removeShard operation runs after the concurrent DDL operation finishes.
If the shard is removed, the command output resembles the following:
{
msg: 'removeshard completed successfully',
state: 'completed',
shard: '<shardName>',
ok: 1,
'$clusterTime': {
clusterTime: Timestamp({ t: 1721941519, i: 7 }),
signature: {
hash: Binary.createFromBase64('AAAAAAAAAAAAAAAAAAAAAAAAAAA=', 0),
keyId: Long('0')
}
},
operationTime: Timestamp({ t: 1721941519, i: 7 })
}
Learn More