Back Up a Sharded Cluster with File System Snapshots
Overview
This document describes a procedure for taking a backup of all components of a sharded cluster. This procedure uses file system snapshots to capture a copy of the mongod instance.
To capture a point-in-time backup from a sharded cluster you must stop all writes to the cluster. On a running production system, you can only capture an approximation of a point-in-time snapshot.
For more information on backups in MongoDB and backups of sharded clusters in particular, see MongoDB Backup Methods and Backup and Restore Sharded Clusters.
Considerations
Transactions Across Shards
In MongoDB 4.2+, you cannot use file system snapshots for backups that involve transactions across shards because those backups do not maintain atomicity. Instead, use one of the following to perform the backups:
- MongoDB Atlas
- MongoDB Cloud Manager
- MongoDB Ops Manager
Encrypted Storage Engine (MongoDB Enterprise Only)
For encrypted storage engines that use AES256-GCM encryption mode, AES256-GCM requires that every process use a unique counter block value with the key.
For encrypted storage engines configured with the AES256-GCM cipher:
Restoring from Hot Backup
Starting in 4.2, if you restore from files taken via "hot" backup (i.e. the mongod is running), MongoDB can detect "dirty" keys on startup and automatically rollover the database key to avoid IV (Initialization Vector) reuse.
Restoring from Cold Backup
However, if you restore from files taken via "cold" backup (i.e. the mongod is not running), MongoDB cannot detect "dirty" keys on startup, and reuse of IV voids confidentiality and integrity guarantees. Starting in 4.2, to avoid the reuse of keys after restoring from a cold filesystem snapshot, MongoDB adds a new command-line option --eseDatabaseKeyRollover. When started with the --eseDatabaseKeyRollover option, the mongod instance rolls over the database keys configured with the AES256-GCM cipher and exits.
In general, if using filesystem based backups for MongoDB Enterprise 4.2+, use the "hot" backup feature, if possible. For MongoDB Enterprise versions 4.0 and earlier, if you use AES256-GCM encryption mode, do not make copies of your data files or restore from filesystem snapshots ("hot" or "cold").
Balancer
It is essential that you stop the balancer before capturing a backup.
If the balancer is active while you capture backups, the backup artifacts may be incomplete and/or have duplicate data, as chunks may migrate while recording backups.
Precision
In this procedure, you will stop the cluster balancer, take a backup of the config database, and then take backups of each shard in the cluster using a file-system snapshot tool. If you need an exact moment-in-time snapshot of the system, you must stop all application writes before taking the file system snapshots; otherwise the snapshot will only approximate a moment in time.
For approximate point-in-time snapshots, you can minimize the impact on the cluster by taking the backup from a secondary member of each replica set shard.
Consistency
If the journal and data files are on the same logical volume, you can use a single point-in-time snapshot to capture a consistent copy of the data files.
If the journal and data files are on different file systems, you must use db.fsyncLock() and db.fsyncUnlock() to ensure that the data files do not change, providing consistency for the purposes of creating backups.
Snapshots with Amazon EBS in a RAID 10 Configuration
If your deployment depends on Amazon's Elastic Block Storage (EBS) with RAID configured within your instance, it is impossible to get a consistent state across all disks using the platform's snapshot tool. As an alternative, you can do one of the following:
- Flush all writes to disk and create a write lock to ensure consistent state during the backup process. If you choose this option, see Back up Instances with Journal Files on Separate Volume or without Journaling.
- Configure LVM to run and hold your MongoDB data files on top of the RAID within your system. If you choose this option, perform the LVM backup operation described in Create a Snapshot.
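As a minimal sketch of the LVM option, a copy-on-write snapshot of the volume holding the data files might be created like this; the volume group (vg0), logical volume (mongodb), and snapshot size are placeholder values for your environment:

```shell
# Create a copy-on-write LVM snapshot of the data volume.
# "vg0" and "mongodb" are placeholder names; size the snapshot
# for the writes expected while it exists.
lvcreate --size 100G --snapshot --name mdb-snap-$(date +%F) /dev/vg0/mongodb
```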
Procedure
Disable the balancer.
Connect mongosh to a cluster mongos instance. Use the sh.stopBalancer() method to stop the balancer. If a balancing round is in progress, the operation waits for balancing to complete before stopping the balancer.
use config
sh.stopBalancer()
Starting in MongoDB 6.1, automatic chunk splitting is not performed. This is because of balancing policy improvements. Auto-splitting commands still exist, but do not perform an operation. For details, see Balancing Policy Changes.
In MongoDB versions earlier than 6.1, sh.stopBalancer() also disables auto-splitting for the sharded cluster.
For more information, see the Disable the Balancer procedure.
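Before proceeding, you can confirm from a shell that the balancer is actually disabled; cluster-mongos is a placeholder for one of your mongos hosts:

```shell
# Check the balancer state from outside mongosh.
# Should print "false" once the balancer is disabled.
mongosh "mongodb://cluster-mongos:27017" --quiet --eval "sh.getBalancerState()"
```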
If necessary, lock one secondary member of each replica set.
If your secondary does not have journaling enabled, or its journal and data files are on different volumes, you must lock the secondary's mongod instance before capturing a backup.
If your secondary has journaling enabled and its journal and data files are on the same volume, you may skip this step.
If your deployment requires this step, you must perform it on one secondary of each shard and one secondary of the config server replica set (CSRS).
Ensure that the oplog has sufficient capacity to allow these secondaries to catch up to the state of the primaries after finishing the backup procedure. See Oplog Size for more information.
Lock shard replica set secondary.
For each shard replica set in the sharded cluster, confirm that the member has replicated data up to some control point. To verify, first connect mongosh to the shard primary and perform a write operation with "majority" write concern on a control collection:
use config
db.BackupControl.findAndModify(
   {
     query: { _id: 'BackupControlDocument' },
     update: { $inc: { counter : 1 } },
     new: true,
     upsert: true,
     writeConcern: { w: 'majority', wtimeout: 15000 }
   }
);
The operation should return the modified (or inserted) control document:
{ "_id" : "BackupControlDocument", "counter" : 1 }
Query the shard secondary member for the returned control document. Connect mongosh to the shard secondary to lock and use db.collection.find() to query for the control document:
rs.secondaryOk();
use config;
db.BackupControl.find(
{ "_id" : "BackupControlDocument", "counter" : 1 }
).readConcern('majority');
If the secondary member contains the latest control document, it is safe to lock the member. Otherwise, wait until the member contains the document or select a different secondary member that contains the latest control document.
To lock the secondary member, run db.fsyncLock() on the member:
db.fsyncLock()
Lock config server replica set secondary.
If locking a secondary of the CSRS, confirm that the member has replicated data up to some control point. To verify, first connect mongosh to the CSRS primary and perform a write operation with "majority" write concern on a control collection:
use config
db.BackupControl.findAndModify(
   {
     query: { _id: 'BackupControlDocument' },
     update: { $inc: { counter : 1 } },
     new: true,
     upsert: true,
     writeConcern: { w: 'majority', wtimeout: 15000 }
   }
);
The operation should return the modified (or inserted) control document:
{ "_id" : "BackupControlDocument", "counter" : 1 }
Query the CSRS secondary member for the returned control document. Connect mongosh to the CSRS secondary to lock and use db.collection.find() to query for the control document:
rs.secondaryOk();
use config;
db.BackupControl.find(
{ "_id" : "BackupControlDocument", "counter" : 1 }
).readConcern('majority');
If the secondary member contains the latest control document, it is safe to lock the member. Otherwise, wait until the member contains the document or select a different secondary member that contains the latest control document.
To lock the secondary member, run db.fsyncLock() on the member:
db.fsyncLock()
Back up one of the config servers.
Backing up a config server backs up the sharded cluster's metadata. You only need to back up one config server, as they all hold the same data. Perform this step against the locked CSRS secondary member.
To create a file-system snapshot of the config server, follow the procedure in Create a Snapshot.
Back up a replica set member for each shard.
If you locked a member of the replica set shards, perform this step against the locked secondary.
You may back up the shards in parallel. For each shard, create a snapshot using the procedure in Back Up and Restore with Filesystem Snapshots.
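Backing the shards up in parallel might be scripted as follows; this is a hypothetical sketch in which the host names, volume group, and logical volume are all placeholders for your environment:

```shell
# Snapshot each shard's locked secondary in parallel over SSH.
# "shardN-secondary" and "/dev/vg0/mongodb" are placeholder names.
for host in shard1-secondary shard2-secondary shard3-secondary; do
  ssh "$host" "lvcreate --size 100G --snapshot \
    --name mdb-snap-\$(date +%F) /dev/vg0/mongodb" &
done
wait   # block until every shard's snapshot has been created
```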
Unlock all locked replica set members.
If you locked any mongod instances to capture the backup, unlock them.
To unlock the replica set members, use the db.fsyncUnlock() method in mongosh.
db.fsyncUnlock()
Enable the balancer.
To re-enable the balancer, connect mongosh to a mongos instance and run sh.startBalancer().
sh.startBalancer()
Starting in MongoDB 6.1, automatic chunk splitting is not performed. This is because of balancing policy improvements. Auto-splitting commands still exist, but do not perform an operation. For details, see Balancing Policy Changes.
In MongoDB versions earlier than 6.1, sh.startBalancer() also enables auto-splitting for the sharded cluster.