Remove Shards from a Sharded Cluster

To remove a shard, you must ensure that the shard's data is migrated to the remaining shards in the cluster. This procedure describes how to safely migrate data and remove a shard.

About this Task

  • Creating, sharding, or moving collections while performing this procedure may cause interruptions and lead to unexpected results.
  • Do not use this procedure to migrate an entire cluster to new hardware. To migrate, see Migrate a Self-Managed Sharded Cluster to Different Hardware.
  • When you remove a shard in a cluster with an uneven chunk distribution, the balancer first removes the chunks from the draining shard and then balances the remaining uneven chunk distribution.
  • Removing a shard may cause an open change stream cursor to close, and the closed change stream cursor may not be fully resumable.
  • You can safely restart a cluster during a shard removal process. If you restart a cluster during an ongoing draining process, draining continues automatically after the cluster components restart. MongoDB records the shard draining status in the config.shards collection.
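
Because the draining status is stored in config.shards, it can be inspected after a restart. The helper below is a minimal sketch of that check. It assumes that a draining shard's document carries a `draining: true` field, and the function name `drainingShardNames` is hypothetical rather than part of mongosh.

```javascript
// Hypothetical helper: given documents read from config.shards, return the
// _id of every shard that is marked as draining.
// Assumption: a draining shard's document has the field `draining: true`.
function drainingShardNames(shardDocs) {
  return shardDocs
    .filter((doc) => doc.draining === true)
    .map((doc) => doc._id);
}

// In mongosh, the documents could be read like this:
//   const docs = db.getSiblingDB("config").shards.find().toArray();
//   drainingShardNames(docs);
```

An empty result suggests that no shard is currently draining.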

Before you Begin

  1. This procedure uses the sh.moveCollection() method to move collections off of the removed shard. Before you begin this procedure, review the moveCollection considerations and requirements to understand the command behavior.
  2. To remove a shard, first connect to one of the cluster's mongos instances using mongosh.

Note

When removing multiple shards, remove them simultaneously rather than one at a time. Removing one shard at a time causes the balancer to drain data into other remaining shards. A shard can only participate in one chunk migration at a time, so removing one shard at a time limits the throughput of data migration.
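
One way to start draining several shards together is to issue removeShard for each of them in a single pass. The following is a sketch only: `startDrainingAll` is a hypothetical helper, and `adminDb` stands for any object that exposes adminCommand(), such as `db` in mongosh.

```javascript
// Hypothetical helper: issue removeShard once for every shard in the list so
// that all of them enter the draining state in the same pass.
// `adminDb` is assumed to expose adminCommand(), as `db` does in mongosh.
function startDrainingAll(adminDb, shardNames) {
  return shardNames.map((name) => ({
    shard: name,
    result: adminDb.adminCommand({ removeShard: name }),
  }));
}

// In mongosh:
//   startDrainingAll(db, ["<shardName1>", "<shardName2>"]);
```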

Steps步骤

1

Ensure the balancer is enabled

To migrate data from a shard, the balancer process must be enabled. To check the balancer state, use the sh.getBalancerState() method:

sh.getBalancerState()

If the operation returns true, the balancer is enabled.

If the operation returns false, see Enable the Balancer.
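
The check and the follow-up action can be combined in a small helper. This is a sketch that assumes the balancer is turned on with sh.startBalancer(). The name `ensureBalancerEnabled` is hypothetical, and `shell` stands for an object with the same methods as `sh` in mongosh.

```javascript
// Hypothetical helper: enable the balancer only when it is reported as off.
// `shell` is assumed to expose getBalancerState() and startBalancer(),
// as `sh` does in mongosh.
function ensureBalancerEnabled(shell) {
  if (shell.getBalancerState()) {
    return "already enabled";
  }
  shell.startBalancer();
  // Re-read the state instead of assuming that the call succeeded.
  return shell.getBalancerState() ? "enabled" : "still disabled";
}

// In mongosh:
//   ensureBalancerEnabled(sh);
```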

2

Determine the name of the shard to remove

To find the name of the shard, run the listShards command:

db.adminCommand( { listShards: 1 } )

The shards._id field contains the shard name.
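
To pull only the names out of the listShards result, a short helper such as the one below may be convenient. It is a sketch that assumes the result has a `shards` array whose elements each have an `_id`, as described above. The name `listShardNames` is hypothetical.

```javascript
// Hypothetical helper: extract shard names from a listShards result.
// Assumption: the result looks like { shards: [ { _id: "...", ... } ], ok: 1 }.
function listShardNames(listShardsResult) {
  return (listShardsResult.shards || []).map((shard) => shard._id);
}

// In mongosh:
//   listShardNames(db.adminCommand({ listShards: 1 }));
```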

3

Migrate sharded collection data with the balancer

Run the removeShard command for the shard you want to remove:

db.adminCommand( { removeShard: "<shardName>" } )

Note

mongos converts the write concern of the removeShard command to "majority".

The removeShard operation returns:

{
  "msg" : "draining started successfully",
  "state" : "started",
  "shard" : "<shardName>",
  "note" : "you need to call moveCollection for collectionsToMove and afterwards movePrimary for the dbsToMove",
  "dbsToMove" : [
    "db1",
    "db2"
  ],
  "collectionsToMove" : [ "db1.collA" ],
  "ok" : 1,
  "operationTime" : Timestamp(1575398919, 2),
  "$clusterTime" : {
    "clusterTime" : Timestamp(1575398919, 2),
    "signature" : {
      "hash" : BinData(0,"Oi68poWCFCA7b9kyhIcg+TzaGiA="),
      "keyId" : Long("6766255701040824328")
    }
  }
}

The shard enters the draining state and the balancer begins migrating chunks from the removed shard to other shards in the cluster. These migrations happen slowly to avoid severe impact on the overall cluster. Depending on your network capacity and the amount of data, this operation can take from a few minutes to several days to complete.
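
The removeShard output also lists the work that the balancer does not do for you. As a sketch, the hypothetical helper `pendingManualMoves` below collects the `collectionsToMove` and `dbsToMove` fields from the result so they can be handled in later steps. It assumes both fields may be absent and treats a missing field as an empty list.

```javascript
// Hypothetical helper: summarize the manual follow-up work reported by
// removeShard. Missing fields are treated as empty lists (an assumption).
function pendingManualMoves(removeShardResult) {
  return {
    state: removeShardResult.state,
    collections: removeShardResult.collectionsToMove || [],
    databases: removeShardResult.dbsToMove || [],
  };
}

// In mongosh:
//   pendingManualMoves(db.adminCommand({ removeShard: "<shardName>" }));
```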

Tip

While the shard is in the draining state, you can use the reshardCollection command to redistribute data off of the removed shard.

Moving data with reshardCollection can be faster than waiting for the balancer to migrate chunks. The cluster ensures that data is not placed on any draining shards. You can't run moveCollection and reshardCollection operations simultaneously.

For the full procedure, see Resharding for Adding and Removing Shards.
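
As a rough illustration, the helper below builds a reshardCollection command document that reuses the collection's current shard key. It is a sketch under stated assumptions: that your server version supports the `forceRedistribution` option for resharding to the same key, and that the linked procedure is consulted for the authoritative form. The name `buildRedistributeCommand` is hypothetical.

```javascript
// Hypothetical helper: build a reshardCollection command that keeps the
// existing shard key and asks the cluster to redistribute the data.
// Assumption: the server supports the `forceRedistribution` option.
function buildRedistributeCommand(namespace, currentShardKey) {
  return {
    reshardCollection: namespace,
    key: currentShardKey,
    forceRedistribution: true,
  };
}

// In mongosh:
//   db.adminCommand(buildRedistributeCommand("<database>.<collection>", { <shardKeyField>: 1 }));
```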

4

Move unsharded collections to another shard

  1. Determine what collections need to be moved

    To list the unsharded collections on the shard, use the aggregation stage $listClusterCatalog:

    use admin

    db.aggregate([
      { $listClusterCatalog: { shards: true } },
      { $match: {
        $and: [
          { sharded: false },
          { shards: '<shard_to_remove>' },
          { type: { $nin: ["timeseries","view"] } },
          // Backslashes are doubled because a single backslash before "."
          // is dropped inside a JavaScript string literal.
          { ns: { $not: { $regex: "^enxcol_\\..*(\\.esc|\\.ecc|\\.ecoc|\\.ecoc\\.compact)$" }}},
          { $or: [{ns: {$not: { $regex: "\\.system\\." }}}, {ns: {$regex: "\\.system\\.buckets\\."}}]},
          { db: { $ne: 'config' } },
          { db: { $ne: 'admin' } }
        ]}},
      { $project: {
        _id: 0,
        ns: {
          $cond: [
            "$options.timeseries",
            // Strip ".system.buckets" from the namespace when this branch applies.
            { $replaceAll: { input: "$ns", find: ".system.buckets", replacement: "" } },
            "$ns"
          ]
        }
      }}
    ])
  2. Move the collections one by one

    To move the collection, run sh.moveCollection():

    sh.moveCollection( "<database>.<collection>", "<ID of recipient shard>" )

    Note

    moveCollection fails if you run the command on a namespace that is sharded. If you receive this error message, ignore it and return to step 1 for the next collection.

  3. Return to step 1 to check that there are no remaining unsharded collections on the draining shard.
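
The move loop in this step can be sketched as a helper that walks the namespaces returned by the aggregation and moves them one at a time. The helper name `moveCollections` is hypothetical, `shell` stands for an object with a moveCollection() method such as `sh` in mongosh, and the sketch assumes that a failed move surfaces as a thrown error.

```javascript
// Hypothetical helper: move each namespace to the recipient shard, one by
// one. A failure (for example, the namespace is sharded) is recorded and the
// loop continues with the next namespace.
// Assumption: `shell.moveCollection()` throws when the command fails.
function moveCollections(shell, namespaces, recipientShard) {
  const outcome = { moved: [], skipped: [] };
  for (const ns of namespaces) {
    try {
      shell.moveCollection(ns, recipientShard);
      outcome.moved.push(ns);
    } catch (err) {
      outcome.skipped.push({ ns: ns, reason: String(err.message || err) });
    }
  }
  return outcome;
}

// In mongosh:
//   moveCollections(sh, ["<database>.<collection>"], "<ID of recipient shard>");
```

Re-running the aggregation from step 1 afterwards remains the way to confirm that nothing is left on the draining shard.
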
5

Change primary shard

Run the db.printShardingStatus() method:

db.printShardingStatus()

In the databases section of the command output, check the database.primary field. If the primary field is the removed shard, you must move that database's primary to a different shard.

To change a database's primary shard, run the movePrimary command.

Warning

When you run movePrimary, any collections that were not moved in the Move unsharded collections to another shard step are unavailable during the movePrimary process.

db.adminCommand(
  {
    movePrimary: "<dbName>",
    to: "<shardName>"
  }
)
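
When several databases have the removed shard as their primary, the movePrimary commands can be prepared in bulk. The helper below is a sketch that assumes the database documents come from config.databases and that each has an `_id` (the database name) and a `primary` field. The name `buildMovePrimaryCommands` is hypothetical.

```javascript
// Hypothetical helper: build one movePrimary command per database whose
// primary shard is the shard being removed.
// Assumption: each document has `_id` (database name) and `primary`.
function buildMovePrimaryCommands(databaseDocs, removedShard, recipientShard) {
  return databaseDocs
    .filter((doc) => doc.primary === removedShard)
    .map((doc) => ({ movePrimary: doc._id, to: recipientShard }));
}

// In mongosh:
//   const docs = db.getSiblingDB("config").databases.find().toArray();
//   buildMovePrimaryCommands(docs, "<shardName>", "<recipientShard>")
//     .forEach((cmd) => db.adminCommand(cmd));
```
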
6

Check migration status

To check the progress of the migration, run removeShard from the admin database again:

db.adminCommand( { removeShard: "<shardName>" } )

In the output, the remaining field includes these fields:

Field         Description
chunks        Number of chunks currently remaining on the shard.
dbs           Number of databases whose primary shard is this shard. These databases are specified in the dbsToMove output field.
jumboChunks   Of the total number of chunks, the number that are jumbo.

If jumboChunks is greater than 0, wait until only the jumbo chunks remain on the shard. Once only the jumbo chunks remain, you must manually clear the jumbo flag before the draining can complete. See Clear jumbo Flag.

After the jumbo flag clears, the balancer can migrate these chunks. For details on the migration procedure, see Range Migration Procedure.

Continue checking the status of the removeShard command until the number of chunks remaining is 0.

db.adminCommand( { removeShard: "<shardName>" } )
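
The decision described in this step can be sketched as a small classifier over the removeShard output. The helper name `drainingStatus` and the labels it returns are hypothetical. The sketch also assumes that the counts in `remaining` convert cleanly with Number(), which may matter if the shell returns them as 64-bit integer wrappers.

```javascript
// Hypothetical helper: turn a removeShard result into a next-action label.
// Assumption: remaining.chunks, remaining.dbs, and remaining.jumboChunks
// convert to plain numbers with Number().
function drainingStatus(removeShardResult) {
  if (removeShardResult.state === "completed") {
    return "completed";
  }
  const remaining = removeShardResult.remaining || {};
  const chunks = Number(remaining.chunks || 0);
  const dbs = Number(remaining.dbs || 0);
  const jumbo = Number(remaining.jumboChunks || 0);
  if (jumbo > 0 && jumbo === chunks) {
    // Only jumbo chunks are left, so the balancer cannot make progress.
    return "clear jumbo flag";
  }
  if (chunks === 0 && dbs === 0) {
    return "ready to finalize";
  }
  return "draining";
}

// In mongosh:
//   drainingStatus(db.adminCommand({ removeShard: "<shardName>" }));
```
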
7

Finalize shard removal

To finalize the shard removal process, re-run the removeShard command:

db.adminCommand( { removeShard: "<shardName>" } )

Note

DDL Operations

If you remove a shard while your cluster executes a DDL operation (an operation that modifies a collection such as reshardCollection), the removeShard operation runs after the concurrent DDL operation finishes.

If the shard is removed, the command output resembles the following:

{
  msg: 'removeshard completed successfully',
  state: 'completed',
  shard: '<shardName>',
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1721941519, i: 7 }),
    signature: {
      hash: Binary.createFromBase64('AAAAAAAAAAAAAAAAAAAAAAAAAAA=', 0),
      keyId: Long('0')
    }
  },
  operationTime: Timestamp({ t: 1721941519, i: 7 })
}
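
Re-running removeShard until it reports completion can be wrapped in a bounded loop. The following is a sketch, not a recommended automation. `waitUntilRemoved` is a hypothetical helper, `adminDb` stands for any object that exposes adminCommand(), and `pause` is a caller-supplied function that waits between attempts (in mongosh this could call `sleep()`, which is assumed here to be available).

```javascript
// Hypothetical helper: re-run removeShard until the state is "completed" or
// the attempt limit is reached. `pause` is called between attempts.
function waitUntilRemoved(adminDb, shardName, pause, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = adminDb.adminCommand({ removeShard: shardName });
    if (result.state === "completed") {
      return { completed: true, attempts: attempt };
    }
    pause();
  }
  return { completed: false, attempts: maxAttempts };
}

// In mongosh:
//   waitUntilRemoved(db, "<shardName>", () => sleep(60000), 100);
```

The attempt limit keeps the loop from running indefinitely when draining is blocked, for example by jumbo chunks or by collections that still need to be moved manually.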

Learn More