
Replication

A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments. This section introduces replication in MongoDB as well as the components and architecture of replica sets. The section also provides tutorials for common tasks related to replica sets.

Redundancy and Data Availability

Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication provides a level of fault tolerance against the loss of a single database server.

In some cases, replication can provide increased read capacity as clients can send read operations to different servers. Maintaining copies of data in different data centers can increase data locality and availability for distributed applications. You can also maintain additional copies for dedicated purposes, such as disaster recovery, reporting, or backup.

Replication in MongoDB

A replica set is a group of mongod instances that maintain the same data set. A replica set contains several data-bearing nodes and optionally one arbiter node. Of the data-bearing nodes, one and only one member is deemed the primary node, while the other nodes are deemed secondary nodes.

The primary node receives all write operations. A replica set can have only one primary capable of confirming writes with { w: "majority" } write concern, although in some circumstances another mongod instance may transiently believe itself to also be primary. [1] The primary records all changes to its data sets in its operation log, i.e. oplog. For more information on primary node operation, see Replica Set Primary.

Diagram of default routing of reads and writes to the primary.

The secondaries replicate the primary's oplog and apply the operations to their data sets such that the secondaries' data sets reflect the primary's data set. If the primary is unavailable, an eligible secondary will hold an election to elect itself the new primary. For more information on secondary members, see Replica Set Secondary Members.

Diagram of a 3 member replica set that consists of a primary and two secondaries.

In some circumstances (for example, when you have a primary and a secondary but cost constraints prohibit adding another secondary), you may choose to add a mongod instance to a replica set as an arbiter. An arbiter participates in elections but does not hold data (i.e. does not provide data redundancy). For more information on arbiters, see Replica Set Arbiter.

Diagram of a replica set that consists of a primary, a secondary, and an arbiter.

An arbiter will always be an arbiter, whereas a primary may step down and become a secondary, and a secondary may become the primary during an election.
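The member roles described above can be sketched as a replica set configuration document. This is a minimal illustration, not a recommended deployment: the set name rs0 and the example.net hostnames are hypothetical placeholders, and describeMembers is a helper invented here to show which members hold data.

```javascript
// Hypothetical replica set configuration: the set name and hostnames are
// placeholders, not values taken from this manual.
const exampleConfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db0.example.net:27017" },                    // data-bearing
    { _id: 1, host: "db1.example.net:27017" },                    // data-bearing
    { _id: 2, host: "arb.example.net:27017", arbiterOnly: true }  // votes, holds no data
  ]
};

// Count the members that hold data and the members that only vote.
function describeMembers(config) {
  const arbiters = config.members.filter((m) => m.arbiterOnly === true);
  const dataBearing = config.members.filter((m) => m.arbiterOnly !== true);
  return { dataBearing: dataBearing.length, arbiters: arbiters.length };
}

// In mongosh, a document of this shape would be passed to
// rs.initiate(exampleConfig) on one of the data-bearing members.
```

Which data-bearing member becomes the primary is decided by an election, so the configuration does not name a primary.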

Asynchronous Replication

Secondaries replicate the primary's oplog and apply the operations to their data sets asynchronously. By having the secondaries' data sets reflect the primary's data set, the replica set can continue to function despite the failure of one or more members.

For more information on replication mechanics, see Replica Set Oplog and Replica Set Data Synchronization.

Slow Operations

Starting in version 4.2, secondary members of a replica set log oplog entries that take longer than the slow operation threshold to apply. These slow oplog messages:

  • Are logged for the secondaries in the diagnostic log.
  • Are logged under the REPL component with the text applied op: <oplog entry> took <num>ms.
  • Do not depend on the log levels (either at the system or component level).
  • Do not depend on the profiling level.
  • May be affected by slowOpSampleRate, depending on your MongoDB version:

    • In MongoDB 4.2, these slow oplog entries are not affected by the slowOpSampleRate. MongoDB logs all slow oplog entries regardless of the sample rate.
    • In MongoDB 4.4 and later, these slow oplog entries are affected by the slowOpSampleRate.

The profiler does not capture slow oplog entries.
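As a rough illustration of working with these messages, the sketch below extracts the duration from the message text quoted above. It assumes the plain-text form applied op: <oplog entry> took <num>ms; structured log output would need different handling, and the sample line is invented for the example.

```javascript
// Matches the message text quoted above:
//   applied op: <oplog entry> took <num>ms
// The greedy (.*) lets the pattern bind to the final " took <num>ms" suffix.
const SLOW_OPLOG_PATTERN = /applied op: (.*) took (\d+)ms/;

// Return the oplog entry text and its apply time, or null if the line
// is not a slow oplog message.
function parseSlowOplogMessage(line) {
  const match = SLOW_OPLOG_PATTERN.exec(line);
  if (match === null) {
    return null;
  }
  return { entry: match[1], millis: Number(match[2]) };
}

// Hypothetical log line, for illustration only.
const sampleLine = 'REPL applied op: { op: "i", ns: "test.c" } took 250ms';
const parsedSample = parseSlowOplogMessage(sampleLine);
```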

Replication Lag and Flow Control

Replication lag refers to the amount of time that it takes to copy (i.e. replicate) a write operation on the primary to a secondary. Some small delay period may be acceptable, but significant problems emerge as replication lag grows, including building cache pressure on the primary.

Starting in MongoDB 4.2, administrators can limit the rate at which the primary applies its writes with the goal of keeping the majority committed lag under a configurable maximum value, flowControlTargetLagSeconds.

By default, flow control is enabled.

Note

For flow control to engage, the replica set/sharded cluster must have: featureCompatibilityVersion (fCV) of 4.2 and read concern majority enabled. That is, enabled flow control has no effect if fCV is not 4.2 or if read concern majority is disabled.

With flow control enabled, as the lag grows close to the flowControlTargetLagSeconds, writes on the primary must obtain tickets before taking locks to apply writes. By limiting the number of tickets issued per second, the flow control mechanism attempts to keep the lag under the target.

For more information, see Check the Replication Lag and Flow Control.
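One way to picture replication lag is as the difference between the primary's last applied operation time and each secondary's. The sketch below computes that from an object shaped like the output of rs.status(), using its members[].name, members[].stateStr, and members[].optimeDate fields. The function name and the sample data are assumptions made for illustration.

```javascript
// Compute per-secondary lag, in seconds, from an rs.status()-shaped object.
// Returns null when no member reports itself as PRIMARY (for example,
// while an election is in progress).
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (primary === undefined) {
    return null;
  }
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      lagSeconds: (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000
    }));
}

// Hypothetical status data; in mongosh the input would be rs.status().
const sampleStatus = {
  members: [
    { name: "db0.example.net:27017", stateStr: "PRIMARY", optimeDate: new Date("2024-01-01T00:00:10Z") },
    { name: "db1.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:10Z") },
    { name: "db2.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:05Z") }
  ]
};
const sampleLag = replicationLagSeconds(sampleStatus);
```

With the sample data, the second secondary trails the primary by 5 seconds. Flow control aims to keep the majority committed lag below flowControlTargetLagSeconds.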

Automatic Failover

When a primary does not communicate with the other members of the set for more than the configured electionTimeoutMillis period (10 seconds by default), an eligible secondary calls for an election to nominate itself as the new primary. The cluster attempts to complete the election of a new primary and resume normal operations.

Diagram of an election of a new primary. In a three member replica set with two secondaries, the primary becomes unreachable. The loss of a primary triggers an election where one of the secondaries becomes the new primary

The replica set cannot process write operations until the election completes successfully. The replica set can continue to serve read queries if such queries are configured to run on secondaries while the primary is offline.

The median time before a cluster elects a new primary should not typically exceed 12 seconds, assuming default replica configuration settings. This includes time required to mark the primary as unavailable and call and complete an election. You can tune this time period by modifying the settings.electionTimeoutMillis replication configuration option. Factors such as network latency may extend the time required for replica set elections to complete, which in turn affects the amount of time your cluster may operate without a primary. These factors are dependent on your particular cluster architecture.

Lowering the electionTimeoutMillis replication configuration option from the default 10000 (10 seconds) can result in faster detection of primary failure. However, the cluster may call elections more frequently due to factors such as temporary network latency even if the primary is otherwise healthy. This can result in increased rollbacks for w : 1 write operations.
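A minimal sketch of adjusting this setting follows. It builds a modified copy of a replica set configuration document with a new settings.electionTimeoutMillis value. The helper name withElectionTimeout and the 5000 ms value are illustrative assumptions, not recommendations.

```javascript
// Return a copy of a replica set configuration with a new election timeout.
// The input object is left unmodified.
function withElectionTimeout(config, millis) {
  if (!Number.isInteger(millis) || millis <= 0) {
    throw new RangeError("electionTimeoutMillis must be a positive integer");
  }
  return {
    ...config,
    settings: { ...(config.settings || {}), electionTimeoutMillis: millis }
  };
}

// In mongosh, the current configuration comes from rs.conf() and the
// modified copy would be applied with rs.reconfig(), for example:
//   rs.reconfig(withElectionTimeout(rs.conf(), 5000))
```

As noted above, a lower value trades faster failure detection for a higher chance of unnecessary elections.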

Your application connection logic should include tolerance for automatic failovers and the subsequent elections. MongoDB drivers can detect the loss of the primary and automatically retry certain write operations a single time, providing additional built-in handling of automatic failovers and elections:

Compatible drivers enable retryable writes by default
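For drivers where you want to state these settings explicitly, they can be expressed as connection string options. The sketch below assembles such a string with the replicaSet, retryWrites, and w options. The hostnames, set name, and the helper buildReplicaSetUri are hypothetical.

```javascript
// Assemble a replica set connection string from a list of hosts and options.
function buildReplicaSetUri(hosts, replicaSet, options = {}) {
  const params = new URLSearchParams({ replicaSet, ...options });
  return `mongodb://${hosts.join(",")}/?${params.toString()}`;
}

// Listing more than one member lets the driver discover the set even if
// the first host is unavailable. Hostnames here are placeholders.
const failoverTolerantUri = buildReplicaSetUri(
  ["db0.example.net:27017", "db1.example.net:27017"],
  "rs0",
  { retryWrites: "true", w: "majority" }
);
```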

Starting in version 4.4, MongoDB provides mirrored reads to pre-warm electable secondary members' cache with the most recently accessed data. Pre-warming the cache of a secondary can help restore performance more quickly after an election.

To learn more about MongoDB’s failover process, see:

Read Operations

Read Preference

By default, clients read from the primary [1]; however, clients can specify a read preference to send read operations to secondaries.

Diagram of an application that uses read preference secondary.

Asynchronous replication to secondaries means that reads from secondaries may return data that does not reflect the state of the data on the primary.

Multi-document transactions that contain read operations must use read preference primary. All operations in a given transaction must route to the same member.

For information on reading from replica sets, see Read Preference.
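To make the effect of a read preference concrete, the sketch below models which members are candidates for a read under each read preference mode (primary, primaryPreferred, secondary, secondaryPreferred, nearest). It is a deliberately simplified model written for this page: real drivers also consider tag sets, staleness limits, and network latency, none of which is represented here.

```javascript
// Simplified model of read preference modes. Each member is an object
// with a `state` of "PRIMARY" or "SECONDARY"; other states are ignored.
function eligibleMembers(members, mode) {
  const primaries = members.filter((m) => m.state === "PRIMARY");
  const secondaries = members.filter((m) => m.state === "SECONDARY");
  switch (mode) {
    case "primary":
      return primaries;
    case "primaryPreferred":
      return primaries.length > 0 ? primaries : secondaries;
    case "secondary":
      return secondaries;
    case "secondaryPreferred":
      return secondaries.length > 0 ? secondaries : primaries;
    case "nearest":
      return [...primaries, ...secondaries];
    default:
      throw new Error(`unknown read preference mode: ${mode}`);
  }
}

// Hypothetical three-member set.
const sampleMembers = [
  { host: "db0.example.net:27017", state: "PRIMARY" },
  { host: "db1.example.net:27017", state: "SECONDARY" },
  { host: "db2.example.net:27017", state: "SECONDARY" }
];
```

In this model, a transaction containing reads would always resolve to the single PRIMARY member, matching the requirement stated above.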

Data Visibility

Depending on the read concern, clients can see the results of writes before the writes are durable:

  • Regardless of a write's write concern, other clients using "local" or "available" read concern can see the result of a write operation before the write operation is acknowledged to the issuing client.
  • Clients using "local" or "available" read concern can read data which may be subsequently rolled back during replica set failovers.

For operations in a multi-document transaction, when a transaction commits, all data changes made in the transaction are saved and visible outside the transaction. That is, a transaction will not commit some of its changes while rolling back others.

Until a transaction commits, the data changes made in the transaction are not visible outside the transaction.

However, when a transaction writes to multiple shards, not all outside read operations need to wait for the result of the committed transaction to be visible across the shards. For example, if a transaction is committed and write 1 is visible on shard A but write 2 is not yet visible on shard B, an outside read at read concern "local" can read the results of write 1 without seeing write 2.

For more information on read isolations, consistency and recency for MongoDB, see Read Isolation, Consistency, and Recency.

Mirrored Reads

Mirrored reads reduce the impact of primary elections following an outage or planned maintenance. After a failover in a replica set, the secondary that takes over as the new primary updates its cache as new queries come in. While the cache is warming up, performance can be impacted.

Starting in version 4.4, mirrored reads pre-warm the caches of electable secondary replica set members. To pre-warm the caches of electable secondaries, the primary mirrors a sample of the supported operations it receives to electable secondaries.

The size of the subset of electable secondary replica set members that receive mirrored reads can be configured with the mirrorReads parameter. See Enable/Disable Support for Mirrored Reads for further details.

Note

Mirrored reads do not affect the primary's response to the client. The reads that the primary mirrors to secondaries are "fire-and-forget" operations. The primary doesn't await responses.

Supported Operations

Mirrored reads support the following operations:

  • count
  • distinct
  • find
  • findAndModify (specifically, the filter is sent as a mirrored read)
  • update (specifically, the filter is sent as a mirrored read)

Enable/Disable Support for Mirrored Reads

Starting in MongoDB 4.4, mirrored reads are enabled by default and use a default sampling rate of 0.01. To disable mirrored reads, set the mirrorReads parameter to { samplingRate: 0.0 }:

db.adminCommand( {
   setParameter: 1,
   mirrorReads: { samplingRate: 0.0 }
} )

With a sampling rate greater than 0.0, the primary mirrors supported reads to a subset of electable secondaries. With a sampling rate of 0.01, the primary mirrors one percent of the supported reads it receives to electable secondaries.

Consider a replica set that consists of one primary and two electable secondaries. If the primary receives 1000 operations that can be mirrored and the sampling rate is 0.01, the primary sends about 10 reads to electable secondaries. Each electable secondary receives only a fraction of the 10 reads. Each read that is mirrored is sent to a randomly chosen non-empty selection of electable secondaries.
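The arithmetic in this example can be written out as a small helper. This is only an estimate of the expected count: sampling is random, so the actual number of mirrored reads varies around this value. The function name is an assumption made for illustration.

```javascript
// Expected number of reads the primary mirrors, given how many supported
// operations it receives and the configured sampling rate.
function expectedMirroredReads(supportedOps, samplingRate) {
  return Math.round(supportedOps * samplingRate);
}

// The scenario from the text: 1000 mirrorable operations at a rate of 0.01.
const expectedForExample = expectedMirroredReads(1000, 0.01);
```

For the scenario above, the estimate is 10 mirrored reads in total across the electable secondaries, not 10 per secondary.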

Change the Sampling Rate for Mirrored Reads

To change the sampling rate for mirrored reads, set the mirrorReads parameter to a number between 0.0 and 1.0:

  • A sampling rate of 0.0 disables mirrored reads.
  • A sampling rate between 0.0 and 1.0 results in the primary forwarding a random sample of the supported reads at the specified sample rate to electable secondaries.
  • A sampling rate of 1.0 results in the primary forwarding all supported reads to electable secondaries.

For details, see mirrorReads.
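A sketch of building the command document for a given rate follows, with a range check matching the 0.0 to 1.0 bounds listed above. The helper name mirrorReadsCommand and the 0.05 value are illustrative. The command shape mirrors the setParameter example earlier in this section.

```javascript
// Build a setParameter command document for the mirrorReads parameter.
// Rejects rates outside the documented 0.0 to 1.0 range.
function mirrorReadsCommand(samplingRate) {
  const valid =
    typeof samplingRate === "number" &&
    !Number.isNaN(samplingRate) &&
    samplingRate >= 0.0 &&
    samplingRate <= 1.0;
  if (!valid) {
    throw new RangeError("samplingRate must be a number between 0.0 and 1.0");
  }
  return { setParameter: 1, mirrorReads: { samplingRate } };
}

// In mongosh, the result would be passed to db.adminCommand(), for example:
//   db.adminCommand(mirrorReadsCommand(0.05))
```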

Mirrored Reads Metrics

Starting in MongoDB 4.4, the serverStatus command and the db.serverStatus() shell method return mirroredReads metrics if you specify the field in the operation:

db.serverStatus( { mirroredReads: 1 } )

Transactions

Starting in MongoDB 4.0, multi-document transactions are available for replica sets.

Multi-document transactions that contain read operations must use read preference primary. All operations in a given transaction must route to the same member.

Until a transaction commits, the data changes made in the transaction are not visible outside the transaction.

However, when a transaction writes to multiple shards, not all outside read operations need to wait for the result of the committed transaction to be visible across the shards. For example, if a transaction is committed and write 1 is visible on shard A but write 2 is not yet visible on shard B, an outside read at read concern "local" can read the results of write 1 without seeing write 2.

Change Streams

Starting in MongoDB 3.6, change streams are available for replica sets and sharded clusters. Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a collection or collections.

Additional Features

Replica sets provide a number of options to support application needs. For example, you may deploy a replica set with members in multiple data centers, or control the outcome of elections by adjusting the members[n].priority of some members. Replica sets also support dedicated members for reporting, disaster recovery, or backup functions.

See Priority 0 Replica Set Members, Hidden Replica Set Members and Delayed Replica Set Members for more information.
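The member options mentioned here are plain fields on each entry of the configuration's members array. The sketch below marks one member as a hidden, priority 0 member, such as one dedicated to reporting. The helper name, set name, and hostnames are assumptions for illustration.

```javascript
// Return a copy of a member document configured as a hidden member.
// A hidden member must also have priority 0, so it cannot become primary.
function asHiddenMember(member) {
  return { ...member, priority: 0, hidden: true };
}

// Hypothetical configuration with the third member dedicated to reporting.
const reportingConfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db0.example.net:27017", priority: 2 },  // preferred primary
    { _id: 1, host: "db1.example.net:27017", priority: 1 },
    asHiddenMember({ _id: 2, host: "report.example.net:27017" })
  ]
};
```

A higher members[n].priority makes a member more likely to be elected primary, while priority 0 excludes it from becoming primary.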

[1] In some circumstances, two nodes in a replica set may transiently believe that they are the primary, but at most, one of them will be able to complete writes with { w: "majority" } write concern. The node that can complete { w: "majority" } writes is the current primary, and the other node is a former primary that has not yet recognized its demotion, typically due to a network partition. When this occurs, clients that connect to the former primary may observe stale data despite having requested read preference primary, and new writes to the former primary will eventually roll back.