
Back Up a Sharded Cluster with File System Snapshots

Overview

This document describes a procedure for taking a backup of all components of a sharded cluster. This procedure uses file system snapshots to capture a copy of the mongod instance.

Important

To capture a point-in-time backup from a sharded cluster, you must stop all writes to the cluster. On a running production system, you can only capture an approximation of a point-in-time snapshot.

For more information on backups in MongoDB and backups of sharded clusters in particular, see MongoDB Backup Methods and Backup and Restore Sharded Clusters.

Considerations

Transactions Across Shards

In MongoDB 4.2+, you cannot use file system snapshots for backups that involve transactions across shards because those backups do not maintain atomicity. Instead, use one of the following to perform the backups:

  • MongoDB Atlas
  • MongoDB Cloud Manager
  • MongoDB Ops Manager

Encrypted Storage Engine (MongoDB Enterprise Only)

For encrypted storage engines that use AES256-GCM encryption mode, AES256-GCM requires that every process use a unique counter block value with the key.

For encrypted storage engines configured with the AES256-GCM cipher:

Restoring from Hot Backup
Starting in 4.2, if you restore from files taken via "hot" backup (i.e. the mongod is running), MongoDB can detect "dirty" keys on startup and automatically roll over the database key to avoid IV (Initialization Vector) reuse.
Restoring from Cold Backup

However, if you restore from files taken via "cold" backup (i.e. the mongod is not running), MongoDB cannot detect "dirty" keys on startup, and reuse of the IV voids confidentiality and integrity guarantees.

Starting in 4.2, to avoid the reuse of the keys after restoring from a cold filesystem snapshot, MongoDB adds a new command-line option --eseDatabaseKeyRollover. When started with the --eseDatabaseKeyRollover option, the mongod instance rolls over the database keys configured with the AES256-GCM cipher and exits.
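
For example, a minimal sketch of the rollover invocation, assuming a keyfile-based encryption setup (the paths below are hypothetical; substitute your own encryption and dbPath settings):

# roll over the AES256-GCM database keys after a cold restore (hypothetical paths)
mongod --eseDatabaseKeyRollover --enableEncryption \
  --encryptionCipherMode AES256-GCM \
  --encryptionKeyFile /etc/mongodb/keyfile \
  --dbpath /var/lib/mongodb

The instance performs the rollover and exits; restart it afterwards without the --eseDatabaseKeyRollover option.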

Tip
  • In general, if using filesystem-based backups for MongoDB Enterprise 4.2+, use the "hot" backup feature, if possible.
  • For MongoDB Enterprise versions 4.0 and earlier, if you use AES256-GCM encryption mode, do not make copies of your data files or restore from filesystem snapshots ("hot" or "cold").

Balancer

It is essential that you stop the balancer before capturing a backup.

If the balancer is active while you capture backups, the backup artifacts may be incomplete and/or have duplicate data, as chunks may migrate while the backups are being recorded.

Precision

In this procedure, you will stop the cluster balancer and take a backup of the config database, and then take backups of each shard in the cluster using a file-system snapshot tool. If you need an exact moment-in-time snapshot of the system, you will need to stop all application writes before taking the file system snapshots; otherwise the snapshot will only approximate a moment in time.

For approximate point-in-time snapshots, you can minimize the impact on the cluster by taking the backup from a secondary member of each replica set shard.

Consistency

If the journal and data files are on the same logical volume, you can use a single point-in-time snapshot to capture a consistent copy of the data files.

If the journal and data files are on different file systems, you must use db.fsyncLock() and db.fsyncUnlock() to ensure that the data files do not change, providing consistency for the purposes of creating backups.
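
For example, a minimal sketch of the lock-snapshot-unlock sequence in mongosh (the snapshot itself is taken with your platform's storage tool while the lock is held):

db.fsyncLock()    // flush pending writes to disk and block new writes
// ... take the file system snapshots of the data and journal volumes here ...
db.fsyncUnlock()  // release the lock so writes can resume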

Snapshots with Amazon EBS in a RAID 10 Configuration

If your deployment depends on Amazon's Elastic Block Storage (EBS) with RAID configured within your instance, it is impossible to get a consistent state across all disks using the platform's snapshot tool. As an alternative, you can do one of the following:

  • Flush all writes to disk and create a write lock to ensure a consistent state during the backup process.
  • Configure LVM to run and hold your MongoDB data files on top of the RAID within your system. If you choose this option, perform the LVM backup operation described in Create a Snapshot.

Procedure

1

Disable the balancer.

Connect mongosh to a cluster mongos instance. Use the sh.stopBalancer() method to stop the balancer. If a balancing round is in progress, the operation waits for balancing to complete before stopping the balancer.

use config
sh.stopBalancer()

Starting in MongoDB 6.1, automatic chunk splitting is not performed. This is because of balancing policy improvements. Auto-splitting commands still exist, but do not perform an operation. For details, see Balancing Policy Changes.

In MongoDB versions earlier than 6.1, sh.stopBalancer() also disables auto-splitting for the sharded cluster.

For more information, see the Disable the Balancer procedure.
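
To confirm that the balancer is stopped before proceeding, you can check its state from the same mongosh session:

sh.getBalancerState()   // should return false while backups are in progress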

2

If necessary, lock one secondary member of each replica set.

If your secondary does not have journaling enabled or its journal and data files are on different volumes, you must lock the secondary's mongod instance before capturing a backup.

If your secondary has journaling enabled and its journal and data files are on the same volume, you may skip this step.

Important

If your deployment requires this step, you must perform it on one secondary of each shard and one secondary of the config server replica set (CSRS).

Ensure that the oplog has sufficient capacity to allow these secondaries to catch up to the state of the primaries after finishing the backup procedure. See Oplog Size for more information.
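
As a quick check, you can run rs.printReplicationInfo() on each secondary you plan to lock; the reported oplog window should comfortably exceed the time you expect the backup to take:

rs.printReplicationInfo()   // reports the configured oplog size and the log length, start to end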

Lock shard replica set secondary.

For each shard replica set in the sharded cluster, confirm that the member has replicated data up to some control point. To verify, first connect mongosh to the shard primary and perform a write operation with "majority" write concern on a control collection:

use config
db.BackupControl.findAndModify(
   {
     query: { _id: 'BackupControlDocument' },
     update: { $inc: { counter : 1 } },
     new: true,
     upsert: true,
     writeConcern: { w: 'majority', wtimeout: 15000 }
   }
);

The operation should return the modified (or inserted) control document:

{ "_id" : "BackupControlDocument", "counter" : 1 }

Query the shard secondary member for the returned control document. Connect mongosh to the shard secondary to lock and use db.collection.find() to query for the control document:

rs.secondaryOk();

use config;

db.BackupControl.find(
   { "_id" : "BackupControlDocument", "counter" : 1 }
).readConcern('majority');

If the secondary member contains the latest control document, it is safe to lock the member. Otherwise, wait until the member contains the document or select a different secondary member that contains the latest control document.

To lock the secondary member, run db.fsyncLock() on the member:

db.fsyncLock()
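
If the lock succeeds, db.fsyncLock() returns a document similar to the following (exact fields vary by version); lockCount increments with each nested lock:

{
   "info" : "now locked against writes, use db.fsyncUnlock() to unlock",
   "lockCount" : NumberLong(1),
   "ok" : 1
}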

Lock config server replica set secondary.

If locking a secondary of the CSRS, confirm that the member has replicated data up to some control point. To verify, first connect mongosh to the CSRS primary and perform a write operation with "majority" write concern on a control collection:

use config
db.BackupControl.findAndModify(
   {
     query: { _id: 'BackupControlDocument' },
     update: { $inc: { counter : 1 } },
     new: true,
     upsert: true,
     writeConcern: { w: 'majority', wtimeout: 15000 }
   }
);

The operation should return the modified (or inserted) control document:

{ "_id" : "BackupControlDocument", "counter" : 1 }

Query the CSRS secondary member for the returned control document. Connect mongosh to the CSRS secondary to lock and use db.collection.find() to query for the control document:

rs.secondaryOk();

use config;

db.BackupControl.find(
   { "_id" : "BackupControlDocument", "counter" : 1 }
).readConcern('majority');

If the secondary member contains the latest control document, it is safe to lock the member. Otherwise, wait until the member contains the document or select a different secondary member that contains the latest control document.

To lock the secondary member, run db.fsyncLock() on the member:

db.fsyncLock()
3

Back up one of the config servers.

Note

Backing up a config server backs up the sharded cluster's metadata. You only need to back up one config server, as they all hold the same data. Perform this step against the locked CSRS secondary member.

To create a file-system snapshot of the config server, follow the procedure in Create a Snapshot.
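
As a sketch, if the config server's dbPath resides on an LVM logical volume (the volume group vg0 and volume name mongodb below are hypothetical), the snapshot step from that procedure looks like:

# create a 100MB snapshot volume of the hypothetical /dev/vg0/mongodb volume
lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb

Size the snapshot volume large enough to absorb the writes expected during the backup window.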

4

Back up a replica set member for each shard.

If you locked a member of the replica set shards, perform this step against the locked secondary.

You may back up the shards in parallel. For each shard, create a snapshot using the procedure in Back Up and Restore with Filesystem Snapshots.

5

Unlock all locked replica set members.

If you locked any mongod instances to capture the backup, unlock them.

To unlock the replica set members, use the db.fsyncUnlock() method in mongosh.

db.fsyncUnlock()
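
db.fsyncUnlock() reports the remaining lock count. Because fsync locks nest, a member that was locked more than once must be unlocked the same number of times; repeat the call until lockCount reaches 0. A typical return document looks like:

{ "info" : "fsyncUnlock completed", "lockCount" : NumberLong(0), "ok" : 1 }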
6

Enable the balancer.

To re-enable the balancer, connect mongosh to a mongos instance and run sh.startBalancer().

sh.startBalancer()

Starting in MongoDB 6.1, automatic chunk splitting is not performed. This is because of balancing policy improvements. Auto-splitting commands still exist, but do not perform an operation. For details, see Balancing Policy Changes.

In MongoDB versions earlier than 6.1, sh.startBalancer() also enables auto-splitting for the sharded cluster.
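
As before, you can verify the balancer's state in mongosh:

sh.getBalancerState()   // should return true once the balancer is re-enabled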