Database Manual

Replication

A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments. This section introduces replication in MongoDB as well as the components and architecture of replica sets. The section also provides tutorials for common tasks related to replica sets.

You can deploy a replica set in the UI for deployments hosted in MongoDB Atlas.

Redundancy and Data Availability

Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication provides a level of fault tolerance against the loss of a single database server.

In some cases, replication can provide increased read capacity as clients can send read operations to different servers. Maintaining copies of data in different data centers can increase data locality and availability for distributed applications. You can also maintain additional copies for dedicated purposes, such as disaster recovery, reporting, or backup.

Replication in MongoDB

A replica set is a group of mongod instances that maintain the same data set. A replica set contains several data bearing nodes and optionally one arbiter node. Of the data bearing nodes, one and only one member is deemed the primary node, while the other nodes are deemed secondary nodes.

Warning

Each replica set node must belong to one, and only one, replica set. Replica set nodes cannot belong to more than one replica set.

The primary node receives all write operations. A replica set can have only one primary capable of confirming writes with { w: "majority" } write concern; although in some circumstances, another mongod instance may transiently believe itself to also be primary. [1] The primary records all changes to its data sets in its operation log, i.e. the oplog. For more information on primary node operation, see Replica Set Primary.
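For example, assuming a mongosh session connected to a running replica set (the products collection and its fields are hypothetical, for illustration), a write that must be acknowledged by a majority of data-bearing members might look like:

```javascript
// The driver reports success only after a majority of the
// data-bearing members have acknowledged the write, or fails
// if that does not happen within 5 seconds.
db.products.insertOne(
   { sku: "abc123", qty: 100 },
   { writeConcern: { w: "majority", wtimeout: 5000 } }
)
```

The primary records this change in its oplog (the local.oplog.rs collection), from which the secondaries replicate it.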

Diagram of default routing of reads and writes to the primary.

The secondaries replicate the primary's oplog and apply the operations to their data sets such that the secondaries' data sets reflect the primary's data set. If the primary is unavailable, an eligible secondary will hold an election to elect itself the new primary. For more information on secondary members, see Replica Set Secondary Members.

Diagram of a 3 member replica set that consists of a primary and two secondaries.

In some circumstances (such as when you have a primary and a secondary, but cost constraints prohibit adding another secondary), you may choose to add a mongod instance to a replica set as an arbiter. An arbiter participates in elections but does not hold data (i.e. it does not provide data redundancy). For more information on arbiters, see Replica Set Arbiter.
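For example, from a mongosh session connected to the primary, you can add an arbiter with the rs.addArb() helper (the hostname below is hypothetical):

```javascript
// Add a mongod instance running on the named host as an
// arbiter: it votes in elections but holds no data.
rs.addArb("arbiter.example.net:27017")
```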

Diagram of a replica set that consists of a primary, a secondary, and an arbiter.

An arbiter will always be an arbiter, whereas a primary may step down and become a secondary, and a secondary may become the primary during an election.

Asynchronous Replication

Secondaries replicate the primary's oplog and apply the operations to their data sets asynchronously. By having the secondaries' data sets reflect the primary's data set, the replica set can continue to function despite the failure of one or more members.

For more information on replication mechanics, see Replica Set Oplog and Replica Set Data Synchronization.

Slow Operations

Secondary members of a replica set now log oplog entries that take longer than the slow operation threshold to apply. These slow oplog messages:

  • Are logged for the secondaries in the diagnostic log.
  • Are logged under the REPL component with the text applied op: <oplog entry> took <num>ms.
  • Do not depend on the log levels (either at the system or component level).
  • Do not depend on the profiling level.
  • Are affected by slowOpSampleRate.

The profiler does not capture slow oplog entries.

Replication Lag and Flow Control

Replication lag is a delay between an operation on the primary and the application of that operation from the oplog to the secondary. Some small delay period may be acceptable, but significant problems emerge as replication lag grows, including building cache pressure on the primary.

Administrators can limit the rate at which the primary applies its writes with the goal of keeping the majority committed lag under a configurable maximum value flowControlTargetLagSeconds.

By default, flow control is enabled.

With flow control enabled, as the lag grows close to the flowControlTargetLagSeconds, writes on the primary must obtain tickets before taking locks to apply writes. By limiting the number of tickets issued per second, the flow control mechanism attempts to keep the lag under the target.
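As a sketch, you can tighten the flow control target and then inspect ticket activity from mongosh (the 5-second value is hypothetical; the default is 10 seconds):

```javascript
// Lower the target majority-commit lag from the default
// 10 seconds to 5 seconds.
db.adminCommand( { setParameter: 1, flowControlTargetLagSeconds: 5 } )

// Inspect flow control activity, including whether it is
// engaged and how tickets are being issued.
db.serverStatus().flowControl
```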

For more information, see Check the Replication Lag and Flow Control.

Automatic Failover

When a primary does not communicate with the other members of the set for more than the configured electionTimeoutMillis period (10 seconds by default), an eligible secondary calls for an election to nominate itself as the new primary. The cluster attempts to complete the election of a new primary and resume normal operations.

Diagram of an election of a new primary. In a three member replica set with two secondaries, the primary becomes unreachable. The loss of a primary triggers an election where one of the secondaries becomes the new primary

The replica set cannot process write operations until the election completes successfully. The replica set can continue to serve read queries if such queries are configured to run on secondaries while the primary is offline.

The median time before a cluster elects a new primary should not typically exceed 12 seconds, assuming default replica configuration settings. This includes time required to mark the primary as unavailable and call and complete an election. You can tune this time period by modifying the settings.electionTimeoutMillis replication configuration option. Factors such as network latency may extend the time required for replica set elections to complete, which in turn affects the amount of time your cluster may operate without a primary. These factors are dependent on your particular cluster architecture.

Lowering the electionTimeoutMillis replication configuration option from the default 10000 (10 seconds) can result in faster detection of primary failure. However, the cluster may call elections more frequently due to factors such as temporary network latency, even if the primary is otherwise healthy. This can result in increased rollbacks for w: 1 write operations.
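The adjustment above is made through a replica set reconfiguration. A minimal sketch from mongosh, assuming the 5000 ms value is an illustrative choice rather than a recommendation:

```javascript
// Retrieve the current replica set configuration, lower the
// election timeout, and apply the new configuration.
cfg = rs.conf()
cfg.settings.electionTimeoutMillis = 5000   // hypothetical value
rs.reconfig(cfg)
```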

Your application connection logic should include tolerance for automatic failovers and the subsequent elections. MongoDB drivers can detect the loss of the primary and automatically retry certain write operations a single time, providing additional built-in handling of automatic failovers and elections:

Compatible drivers enable retryable writes by default.

MongoDB provides mirrored reads to pre-warm electable secondary members' cache with the most recently accessed data. Pre-warming the cache of a secondary can help restore performance more quickly after an election.

To learn more about MongoDB's failover process, see:

Read Operations

Read Preference

By default, clients read from the primary [1]; however, clients can specify a read preference to send read operations to secondaries.
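For example, a query can be routed to a secondary with the cursor's readPref() method in mongosh (the products collection and filter are hypothetical):

```javascript
// Route this query to a secondary member rather than the primary.
db.products.find({ qty: { $gt: 50 } }).readPref("secondary")
```

With mode "secondary" the query fails if no secondary is available; "secondaryPreferred" falls back to the primary in that case.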

Diagram of an application that uses read preference secondary.

Asynchronous replication to secondaries means that reads from secondaries may return data that does not reflect the state of the data on the primary.

Distributed transactions that contain read operations must use read preference primary. All operations in a given transaction must route to the same member.

For information on reading from replica sets, see Read Preference.

Data Visibility

Depending on the read concern, clients can see the results of writes before the writes are durable:

  • Regardless of a write's write concern, other clients using "local" or "available" read concern can see the result of a write operation before the write operation is acknowledged to the issuing client.
  • Clients using "local" or "available" read concern can read data which may be subsequently rolled back during replica set failovers.
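The difference can be made explicit on a query with the cursor's readConcern() method (the products collection is hypothetical):

```javascript
// A "local" read returns the node's most recent data, which may
// later be rolled back if it has not replicated to a majority.
db.products.find({ sku: "abc123" }).readConcern("local")

// A "majority" read returns only data acknowledged by a majority
// of the replica set members, so it cannot be rolled back.
db.products.find({ sku: "abc123" }).readConcern("majority")
```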

For operations in a multi-document transaction, when a transaction commits, all data changes made in the transaction are saved and visible outside the transaction. That is, a transaction will not commit some of its changes while rolling back others.

Until a transaction commits, the data changes made in the transaction are not visible outside the transaction.

However, when a transaction writes to multiple shards, not all outside read operations need to wait for the result of the committed transaction to be visible across the shards. For example, if a transaction is committed and write 1 is visible on shard A but write 2 is not yet visible on shard B, an outside read at read concern "local" can read the results of write 1 without seeing write 2.

For more information on read isolation, consistency and recency for MongoDB, see Read Isolation, Consistency, and Recency.

Mirrored Reads

Mirrored reads reduce the impact of primary elections following an outage or planned maintenance. After a failover in a replica set, the secondary that takes over as the new primary updates its cache as new queries come in. While the cache is warming up, performance can be impacted.

Mirrored reads pre-warm the caches of electable secondary replica set members. To pre-warm the caches of electable secondaries, the primary mirrors a sample of the supported operations it receives to electable secondaries.

The size of the subset of electable secondary replica set members that receive mirrored reads can be configured with the mirrorReads parameter. See Enable/Disable Support for Mirrored Reads for further details.

Note

Mirrored reads do not affect the primary's response to the client. The reads that the primary mirrors to secondaries are "fire-and-forget" operations. The primary doesn't await responses.

Targeted Mirrored Reads

Starting in MongoDB 8.2, you can selectively mirror read operations to specific servers that need their caches warmed up by tagging the nodes for read mirroring. Unlike general mirrored reads, targeted read mirroring allows you to target hidden nodes and mirror from both primary and secondary nodes.

You can configure targeted mirrored reads using the targetedMirroring field in the mirrorReads parameter.

Supported Operations

Mirrored reads support the following operations:

  • count
  • distinct
  • find
  • findAndModify (Specifically, the filter is sent as a mirrored read)
  • update (Specifically, the filter is sent as a mirrored read)

Enable/Disable Support for Mirrored Reads

Mirrored reads are enabled by default and use a default sampling rate of 0.01. To disable mirrored reads, set the mirrorReads parameter to { samplingRate: 0.0 }:

db.adminCommand( {
   setParameter: 1,
   mirrorReads: { samplingRate: 0.0 }
} )

With a sampling rate greater than 0.0, the primary mirrors supported reads to a subset of electable secondaries. With a sampling rate of 0.01, the primary mirrors one percent of the supported reads it receives to a selection of electable secondaries.

For example, consider a replica set that consists of one primary and two electable secondaries. If the primary receives 1000 operations that can be mirrored and the sampling rate is 0.01, the primary mirrors about 10 supported reads to electable secondaries. Each electable secondary receives only a fraction of the 10 reads. The primary sends each mirrored read to a randomly chosen, non-empty selection of electable secondaries.

Change the Sampling Rate for Mirrored Reads

To change the sampling rate for mirrored reads, set the mirrorReads parameter to a number between 0.0 and 1.0:

  • A sampling rate of 0.0 disables mirrored reads.
  • A sampling rate between 0.0 and 1.0 results in the primary forwarding a random sample of the supported reads at the specified sample rate to electable secondaries.
  • A sampling rate of 1.0 results in the primary forwarding all supported reads to electable secondaries.
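For example, the following command raises the sampling rate so that the primary mirrors ten percent of supported reads (the 0.10 value is illustrative):

```javascript
// Mirror 10% of supported reads to electable secondaries.
db.adminCommand( { setParameter: 1, mirrorReads: { samplingRate: 0.10 } } )
```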

For details, see mirrorReads.

Mirrored Reads Metrics

The serverStatus command and the db.serverStatus() shell method return mirroredReads metrics if you specify the field in the operation:

db.serverStatus( { mirroredReads: 1 } )

Transactions

Multi-document transactions are available for replica sets.

Distributed transactions that contain read operations must use read preference primary. All operations in a given transaction must route to the same member.
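A minimal transaction sketch from mongosh, assuming a running replica set; the database and collection names are hypothetical:

```javascript
// Start a session and open a transaction whose reads are routed
// to the primary and whose writes require majority acknowledgment.
const session = db.getMongo().startSession()
const orders = session.getDatabase("test").orders

session.startTransaction( {
   readConcern: { level: "snapshot" },
   writeConcern: { w: "majority" },
   readPreference: "primary"
} )

try {
   orders.insertOne({ item: "widget", qty: 1 })
   session.commitTransaction()   // all changes become visible together
} catch (error) {
   session.abortTransaction()    // no partial changes are kept
   throw error
}
```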

Until a transaction commits, the data changes made in the transaction are not visible outside the transaction.

However, when a transaction writes to multiple shards, not all outside read operations need to wait for the result of the committed transaction to be visible across the shards. For example, if a transaction is committed and write 1 is visible on shard A but write 2 is not yet visible on shard B, an outside read at read concern "local" can read the results of write 1 without seeing write 2.

Change Streams

Change streams are available for replica sets and sharded clusters. Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a collection or collections.
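For example, an application can open a change stream with the watch() method and iterate the resulting cursor (the orders collection is hypothetical):

```javascript
// Subscribe to all data changes on the orders collection and
// print each change event as it arrives.
const cursor = db.orders.watch()
while (cursor.hasNext()) {
   printjson(cursor.next())
}
```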

Additional Features

Replica sets provide a number of options to support application needs. For example, you may deploy a replica set with members in multiple data centers, or control the outcome of elections by adjusting the members[n].priority of some members. Replica sets also support dedicated members for reporting, disaster recovery, or backup functions.
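Member priorities are adjusted through a replica set reconfiguration. A sketch from mongosh, assuming the third member (index 2) is the one you want to keep out of elections:

```javascript
// Make the member at index 2 a priority 0 member: it still
// replicates data and can vote, but cannot become primary.
cfg = rs.conf()
cfg.members[2].priority = 0   // hypothetical member index
rs.reconfig(cfg)
```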

See Priority 0 Replica Set Members, Hidden Replica Set Members and Delayed Replica Set Members for more information.

[1](1, 2) In some circumstances, two nodes in a replica set may transiently believe that they are the primary, but at most, one of them will be able to complete writes with { w: "majority" } write concern. The node that can complete { w: "majority" } writes is the current primary, and the other node is a former primary that has not yet recognized its demotion, typically due to a network partition. When this occurs, clients that connect to the former primary may observe stale data despite having requested read preference primary, and new writes to the former primary will eventually roll back.