Back Up a Sharded Cluster with File System Snapshots

Overview

This document describes a procedure for taking a backup of all components of a sharded cluster. This procedure uses file system snapshots to capture a copy of the mongod instance.

Important

To capture a point-in-time backup from a sharded cluster you must stop all writes to the cluster. On a running production system, you can only capture an approximation of a point-in-time snapshot.

For more information on backups in MongoDB and backups of sharded clusters in particular, see MongoDB Backup Methods and Backup and Restore Sharded Clusters.

Considerations

Transactions Across Shards

In MongoDB 4.2+, you cannot use file system snapshots for backups that involve transactions across shards because those backups do not maintain atomicity. Instead, use one of the following to perform the backups:

Encrypted Storage Engine (MongoDB Enterprise Only)

For encrypted storage engines that use AES256-GCM encryption mode, AES256-GCM requires that every process use a unique counter block value with the key.

For an encrypted storage engine configured with the AES256-GCM cipher:

  • Restoring from Hot Backup
    Starting in 4.2, if you restore from files taken via "hot" backup (i.e. the mongod is running), MongoDB can detect "dirty" keys on startup and automatically roll over the database key to avoid IV (Initialization Vector) reuse.
  • Restoring from Cold Backup
    However, if you restore from files taken via "cold" backup (i.e. the mongod is not running), MongoDB cannot detect "dirty" keys on startup, and reuse of an IV voids confidentiality and integrity guarantees.

    Starting in 4.2, to avoid the reuse of the keys after restoring from a cold filesystem snapshot, MongoDB adds a new command-line option --eseDatabaseKeyRollover. When started with the --eseDatabaseKeyRollover option, the mongod instance rolls over the database keys configured with AES256-GCM cipher and exits.

Tip
  • In general, if you use filesystem-based backups for MongoDB Enterprise 4.2+, use the "hot" backup feature if possible.
  • For MongoDB Enterprise versions 4.0 and earlier, if you use AES256-GCM encryption mode, do not make copies of your data files or restore from filesystem snapshots ("hot" or "cold").

Balancer

It is essential that you stop the balancer before capturing a backup.

If the balancer is active while you capture backups, the backup artifacts may be incomplete and/or have duplicate data, because chunks may migrate while the backups are being recorded.

Precision

In this procedure, you will stop the cluster balancer and take a backup of the config database, and then take backups of each shard in the cluster using a file-system snapshot tool. If you need an exact moment-in-time snapshot of the system, you will need to stop all application writes before taking the file system snapshots; otherwise the snapshot will only approximate a moment in time.

For approximate point-in-time snapshots, you can minimize the impact on the cluster by taking the backup from a secondary member of each replica set shard.

Consistency

If the journal and data files are on the same logical volume, you can use a single point-in-time snapshot to capture a consistent copy of the data files.

If the journal and data files are on different file systems, you must use db.fsyncLock() and db.fsyncUnlock() to ensure that the data files do not change, providing consistency for the purposes of creating backups.

Snapshots with Amazon EBS in a RAID 10 Configuration

If your deployment depends on Amazon's Elastic Block Storage (EBS) with RAID configured within your instance, it is impossible to get a consistent state across all disks using the platform's snapshot tool. As an alternative, you can do one of the following:

  • Flush all writes to disk and create a write lock to ensure consistent state during the backup process.

    If you choose this option, see Back up Instances with Journal Files on Separate Volume or without Journaling.

  • Configure LVM to run and hold your MongoDB data files on top of the RAID within your system.

    If you choose this option, perform the LVM backup operation described in Create a Snapshot.

Procedure

1

Disable the balancer.

Connect mongosh to a cluster mongos instance. Use the sh.stopBalancer() method to stop the balancer. If a balancing round is in progress, the operation waits for balancing to complete before stopping the balancer.

use config
sh.stopBalancer()

Starting in MongoDB 4.2, sh.stopBalancer() also disables auto-splitting for the sharded cluster.

For more information, see the Disable the Balancer procedure.
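Before moving on, it can help to confirm that the balancer is really off and that no balancing round is still running. The following is a minimal sketch, not part of the official procedure. It assumes the balancerStatus administrative command and its mode and inBalancerRound fields; the helper name balancerIsIdle is hypothetical, and the function takes the shell's db handle as a parameter so that it can be pasted into a mongosh session connected to a mongos.

```javascript
// Hypothetical helper: returns true when the balancer is disabled and
// no balancing round is in progress.
// Assumes `db` is a mongosh database handle connected to a mongos.
function balancerIsIdle(db) {
  const status = db.adminCommand({ balancerStatus: 1 });
  // `mode` is "off" once the balancer is disabled; `inBalancerRound`
  // is true while a round is still migrating chunks.
  return status.mode === "off" && status.inBalancerRound === false;
}
```

In mongosh, a call such as balancerIsIdle(db) after sh.stopBalancer() would be expected to return true before you continue with the next step.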

2

If necessary, lock one secondary member of each replica set.

If your secondary does not have journaling enabled or its journal and data files are on different volumes, you must lock the secondary's mongod instance before capturing a backup.

If your secondary has journaling enabled and its journal and data files are on the same volume, you may skip this step.
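The rule in the two paragraphs above can be written as a single predicate. This is only an illustrative sketch: the function name mustLockBeforeSnapshot and both flag names are hypothetical, and the values are supplied by the operator rather than read from the server.

```javascript
// Hypothetical helper: decides whether a secondary must be locked with
// db.fsyncLock() before its snapshot is taken.
// Both flags are supplied by the operator; nothing is queried from MongoDB.
function mustLockBeforeSnapshot({ journalingEnabled, journalOnDataVolume }) {
  // Locking can be skipped only when journaling is on AND the journal
  // shares a volume with the data files.
  return !(journalingEnabled && journalOnDataVolume);
}
```

For example, a secondary with journaling enabled but with its journal on a separate volume still needs the lock.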

Important

If your deployment requires this step, you must perform it on one secondary of each shard and one secondary of the config server replica set (CSRS).

Ensure that the oplog has sufficient capacity to allow these secondaries to catch up to the state of the primaries after finishing the backup procedure. See Oplog Size for more information.

Lock shard replica set secondary.

For each shard replica set in the sharded cluster, confirm that the member has replicated data up to some control point. To verify, first connect mongosh to the shard primary and perform a write operation with "majority" write concern on a control collection:

use config
db.BackupControl.findAndModify(
   {
     query: { _id: 'BackupControlDocument' },
     update: { $inc: { counter : 1 } },
     new: true,
     upsert: true,
     writeConcern: { w: 'majority', wtimeout: 15000 }
   }
);

The operation should return the modified (or inserted) control document:

{ "_id" : "BackupControlDocument", "counter" : 1 }

Query the shard secondary member for the returned control document. Connect mongosh to the shard secondary to lock and use db.collection.find() to query for the control document:

rs.secondaryOk();
use config;
db.BackupControl.find(
   { "_id" : "BackupControlDocument", "counter" : 1 }
).readConcern('majority');

If the secondary member contains the latest control document, it is safe to lock the member. Otherwise, wait until the member contains the document or select a different secondary member that contains the latest control document.

To lock the secondary member, run db.fsyncLock() on the member:

db.fsyncLock()
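The check-then-lock sequence above can be combined so that the lock is only taken once the control document is visible on the secondary. The following is a sketch under stated assumptions, not the documented procedure: the helper name lockIfCaughtUp is hypothetical, secondaryDb is assumed to be a mongosh database handle connected directly to the secondary with secondary reads already allowed, and controlDoc is the document returned by the findAndModify call on the primary.

```javascript
// Hypothetical helper: locks a secondary only if it already holds the
// control document returned by the majority write on the primary.
function lockIfCaughtUp(secondaryDb, controlDoc) {
  const cursor = secondaryDb
    .getSiblingDB("config")
    .getCollection("BackupControl")
    .find({ _id: controlDoc._id, counter: controlDoc.counter })
    .readConcern("majority");
  if (!cursor.hasNext()) {
    // Not caught up yet: wait and retry, or pick another secondary.
    return false;
  }
  // Flush pending writes to disk and block further writes on this member.
  secondaryDb.fsyncLock();
  return true;
}
```

A return value of false means the member was left unlocked, so the caller can retry later or try a different secondary.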

Lock config server replica set secondary.

If locking a secondary of the CSRS, confirm that the member has replicated data up to some control point. To verify, first connect mongosh to the CSRS primary and perform a write operation with "majority" write concern on a control collection:

use config
db.BackupControl.findAndModify(
   {
     query: { _id: 'BackupControlDocument' },
     update: { $inc: { counter : 1 } },
     new: true,
     upsert: true,
     writeConcern: { w: 'majority', wtimeout: 15000 }
   }
);

The operation should return the modified (or inserted) control document:

{ "_id" : "BackupControlDocument", "counter" : 1 }

Query the CSRS secondary member for the returned control document. Connect mongosh to the CSRS secondary to lock and use db.collection.find() to query for the control document:

rs.secondaryOk();
use config;
db.BackupControl.find(
   { "_id" : "BackupControlDocument", "counter" : 1 }
).readConcern('majority');

If the secondary member contains the latest control document, it is safe to lock the member. Otherwise, wait until the member contains the document or select a different secondary member that contains the latest control document.

To lock the secondary member, run db.fsyncLock() on the member:

db.fsyncLock()

3

Back up one of the config servers.

Note

Backing up a config server backs up the sharded cluster's metadata. You only need to back up one config server, as they all hold the same data. Perform this step against the locked CSRS secondary member.

To create a file-system snapshot of the config server, follow the procedure in Create a Snapshot.

4

Back up a replica set member for each shard.

If you locked a member of a shard's replica set, perform this step against that locked secondary.

You may back up the shards in parallel. For each shard, create a snapshot, using the procedure in Back Up and Restore with Filesystem Snapshots.

5

Unlock all locked replica set members.

If you locked any mongod instances to capture the backup, unlock them.

To unlock the replica set members, use the db.fsyncUnlock() method in mongosh.

db.fsyncUnlock()
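If db.fsyncLock() was issued more than once on a member, a single db.fsyncUnlock() call does not release it, because the server keeps a lock count. The following sketch keeps unlocking until that count reaches zero. The helper name unlockFully is hypothetical, and it assumes that the db.fsyncUnlock() result carries a lockCount field and that db is a mongosh handle connected to a member that is currently locked.

```javascript
// Hypothetical helper: calls db.fsyncUnlock() until the lock count is 0
// and returns how many unlock calls were needed.
// Calling it on a member that is not locked is expected to raise an error.
function unlockFully(db) {
  let calls = 0;
  let remaining;
  do {
    const result = db.fsyncUnlock();
    calls += 1;
    // lockCount may arrive as a 64-bit integer type; coerce it to a number.
    remaining = Number(result.lockCount);
  } while (remaining > 0);
  return calls;
}
```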

6

Enable the balancer.

To re-enable the balancer, connect mongosh to a mongos instance and run sh.startBalancer().

sh.startBalancer()

Starting in MongoDB 4.2, sh.startBalancer() also enables auto-splitting for the sharded cluster.
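Because a failed snapshot step could otherwise leave the balancer disabled, one option is to wrap the backup steps so that the balancer is always restarted. This is a sketch of that idea, not part of the documented procedure: the wrapper name withBalancerStopped and the backupSteps callback are hypothetical, and sh is assumed to be the mongosh sharding helper on a mongos connection.

```javascript
// Hypothetical wrapper: stops the balancer, runs the operator-supplied
// backup steps, and restarts the balancer even if a step throws.
function withBalancerStopped(sh, backupSteps) {
  sh.stopBalancer();
  try {
    return backupSteps();
  } finally {
    // Runs on both success and failure, so the cluster is not left
    // with balancing disabled after an aborted backup.
    sh.startBalancer();
  }
}
```

Any error raised by backupSteps still propagates to the caller after the balancer has been restarted.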
