This document describes a procedure for taking a backup of all components of a sharded cluster. This procedure uses file system snapshots to capture a copy of the mongod instance.
To capture a point-in-time backup from a sharded cluster you must stop all writes to the cluster. On a running production system, you can only capture an approximation of a point-in-time snapshot.
For more information on backups in MongoDB, and backups of sharded clusters in particular, see MongoDB Backup Methods and Backup and Restore Sharded Clusters.
In MongoDB 4.2+, you cannot use file system snapshots for backups that involve transactions across shards because those backups do not maintain atomicity. Instead, use one of the following to perform the backups:

- MongoDB Atlas
- MongoDB Cloud Manager
- MongoDB Ops Manager
For encrypted storage engines that use AES256-GCM encryption mode, AES256-GCM requires that every process use a unique counter block value with the key.
For encrypted storage engines configured with the AES256-GCM cipher: if you restore from files taken via "hot" backup (that is, while the mongod is running), MongoDB can detect "dirty" keys on startup and automatically rollover the database key to avoid IV (Initialization Vector) reuse. However, if you restore from files taken via "cold" backup (that is, while the mongod is not running), MongoDB cannot detect "dirty" keys on startup, and reuse of IV voids confidentiality and integrity guarantees.
Starting in 4.2, to avoid the reuse of the keys after restoring from a cold filesystem snapshot, MongoDB adds a new command-line option --eseDatabaseKeyRollover. When started with the --eseDatabaseKeyRollover option, the mongod instance rolls over the database keys configured with AES256-GCM cipher and exits.
For encrypted storage engines that use AES256-GCM encryption mode, do not make copies of your data files or restore from filesystem snapshots ("hot" or "cold").

It is essential that you stop the balancer before capturing a backup.
If the balancer is active while you capture backups, the backup artifacts may be incomplete and/or have duplicate data, as chunks may migrate while recording backups.
In this procedure, you will stop the cluster balancer, take a backup of the config database, and then take backups of each shard in the cluster using a file-system snapshot tool. If you need an exact moment-in-time snapshot of the system, you must stop all application writes before taking the file system snapshots; otherwise the snapshot will only approximate a moment in time.
For approximate point-in-time snapshots, you can minimize the impact on the cluster by taking the backup from a secondary member of each replica set shard.
If the journal and data files are on the same logical volume, you can use a single point-in-time snapshot to capture a consistent copy of the data files.
If the journal and data files are on different file systems, you must use db.fsyncLock() and db.fsyncUnlock() to ensure that the data files do not change, providing consistency for the purposes of creating backups.
If your deployment depends on Amazon's Elastic Block Storage (EBS) with RAID configured within your instance, it is impossible to get a consistent state across all disks using the platform's snapshot tool. As an alternative, you can do one of the following:
Flush all writes to disk and create a write lock to ensure consistent state during the backup process. If you choose this option, see Back up Instances with Journal Files on Separate Volume or without Journaling.
Configure LVM to run and hold your MongoDB data files on top of the RAID within your system. If you choose this option, perform the LVM backup operation described in Create a Snapshot.
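As an illustration of the LVM approach, the snapshot step can be sketched with lvcreate. The volume group (vg0), volume name (mongodb), snapshot name, and --size below are placeholders you would adapt to your own system and write load:

```shell
# Create a snapshot named mdb-snap01 of the mongodb logical volume in
# volume group vg0. --size caps how much change the snapshot can absorb
# before it becomes unusable; size it for your expected write volume.
lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb

# Archive the snapshot to a compressed image for off-volume storage.
dd if=/dev/vg0/mdb-snap01 | gzip > mdb-snap01.gz
```

See Create a Snapshot for the full procedure, including mounting and restoring the snapshot.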
Connect mongosh to a cluster mongos instance. Use the sh.stopBalancer() method to stop the balancer. If a balancing round is in progress, the operation waits for balancing to complete before stopping the balancer.
use config
sh.stopBalancer()
Starting in MongoDB 4.2, sh.stopBalancer() also disables auto-splitting for the sharded cluster.
For more information, see the Disable the Balancer procedure.
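Before proceeding, you can confirm the balancer is actually stopped. This is a mongosh sketch to run against the same mongos:

```javascript
// Returns false once the balancer is disabled.
sh.getBalancerState()

// Reports whether a balancing round is currently in progress;
// wait until no round is running before taking snapshots.
sh.isBalancerRunning()
```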
If your secondary does not have journaling enabled, or its journal and data files are on different volumes, you must lock the secondary's mongod instance before capturing a backup.
If your secondary has journaling enabled and its journal and data files are on the same volume, you may skip this step.
If your deployment requires this step, you must perform it on one secondary of each shard and one secondary of the config server replica set (CSRS).
Ensure that the oplog has sufficient capacity to allow these secondaries to catch up to the state of the primaries after finishing the backup procedure. See Oplog Size for more information.
For each shard replica set in the sharded cluster, confirm that the member has replicated data up to some control point. To verify, first connect mongosh to the shard primary and perform a write operation with "majority" write concern on a control collection:
use config
db.BackupControl.findAndModify(
   {
     query: { _id: 'BackupControlDocument' },
     update: { $inc: { counter : 1 } },
     new: true,
     upsert: true,
     writeConcern: { w: 'majority', wtimeout: 15000 }
   }
);
The operation should return the modified (or inserted) control document:
{ "_id" : "BackupControlDocument", "counter" : 1 }
Query the shard secondary member for the returned control document. Connect mongosh to the shard secondary to lock and use db.collection.find() to query for the control document:
rs.secondaryOk();
use config;
db.BackupControl.find(
   { "_id" : "BackupControlDocument", "counter" : 1 }
).readConcern('majority');
If the secondary member contains the latest control document, it is safe to lock the member. Otherwise, wait until the member contains the document, or select a different secondary member that contains the latest control document.
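The "wait until the member contains the document" step can be automated with a small polling loop. This is an illustrative Python sketch, not part of the MongoDB tooling: `fetch_control_doc` is a hypothetical callable you would implement yourself (for example, a driver query against the secondary with majority read concern):

```python
import time

def wait_for_control_doc(fetch_control_doc, expected_counter,
                         timeout_s=15.0, interval_s=0.5):
    """Poll until the secondary reports the latest control document.

    fetch_control_doc: hypothetical callable returning the control
    document as a dict (or None if not yet replicated), e.g. a
    majority-read-concern find() against the secondary.
    Returns True once the document's counter matches expected_counter,
    False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        doc = fetch_control_doc()
        if doc and doc.get("counter") == expected_counter:
            return True
        time.sleep(interval_s)
    return False
```

A return value of False means this secondary has not caught up; pick a different secondary or extend the timeout rather than locking a stale member.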
To lock the secondary member, run db.fsyncLock() on the member:
db.fsyncLock()
If locking a secondary of the CSRS, confirm that the member has replicated data up to some control point. To verify, first connect mongosh to the CSRS primary and perform a write operation with "majority" write concern on a control collection:
use config
db.BackupControl.findAndModify(
   {
     query: { _id: 'BackupControlDocument' },
     update: { $inc: { counter : 1 } },
     new: true,
     upsert: true,
     writeConcern: { w: 'majority', wtimeout: 15000 }
   }
);
The operation should return the modified (or inserted) control document:
{ "_id" : "BackupControlDocument", "counter" : 1 }
Query the CSRS secondary member for the returned control document. Connect mongosh to the CSRS secondary to lock and use db.collection.find() to query for the control document:
rs.secondaryOk();
use config;
db.BackupControl.find(
   { "_id" : "BackupControlDocument", "counter" : 1 }
).readConcern('majority');
If the secondary member contains the latest control document, it is safe to lock the member. Otherwise, wait until the member contains the document, or select a different secondary member that contains the latest control document.
To lock the secondary member, run db.fsyncLock() on the member:
db.fsyncLock()
Backing up a config server backs up the sharded cluster's metadata. You only need to back up one config server, as they all hold the same data. Perform this step against the locked CSRS secondary member.
To create a file-system snapshot of the config server, follow the procedure in Create a Snapshot.
If you locked a member of the replica set shards, perform this step against the locked secondary.
You may back up the shards in parallel. For each shard, create a snapshot, using the procedure in Back Up and Restore with Filesystem Snapshots.
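Backing up the shards in parallel can be sketched with background jobs in a shell script. The hostnames and the snapshot_shard function below are placeholders; substitute your real shard secondary hosts and snapshot tooling:

```shell
# Hypothetical shard secondary hostnames -- replace with your own.
HOSTS="shard0-sec shard1-sec shard2-sec"

snapshot_shard() {
  # Placeholder for the real snapshot command, e.g.:
  #   ssh "$1" lvcreate --size 100M --snapshot --name mdb-snap /dev/vg0/mongodb
  echo "snapshot complete on $1"
}

for h in $HOSTS; do
  snapshot_shard "$h" &   # launch each shard's snapshot in the background
done
wait                      # block until every background snapshot finishes
echo "all shard snapshots finished"
```

Only proceed to unlocking members and restarting the balancer after every shard's snapshot has completed.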
If you locked any mongod instances to capture the backup, unlock them.
To unlock the replica set members, use the db.fsyncUnlock() method in mongosh.
db.fsyncUnlock()
To re-enable the balancer, connect mongosh to a mongos instance and run sh.startBalancer().
sh.startBalancer()
Starting in MongoDB 4.2, sh.startBalancer() also enables auto-splitting for the sharded cluster.