Replica Set Oplog

The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases.

Unlike other capped collections, the oplog can grow past its configured size limit to avoid deleting the majority commit point.

New in version 4.4: MongoDB 4.4 supports specifying a minimum oplog retention period in hours, where MongoDB only removes an oplog entry if:

  • The oplog has reached the maximum configured size, and
  • The oplog entry is older than the configured number of hours.

MongoDB applies database operations on the primary and then records the operations on the primary's oplog. The secondary members then copy and apply these operations in an asynchronous process. All replica set members contain a copy of the oplog, in the local.oplog.rs collection, which allows them to maintain the current state of the database.

To facilitate replication, all replica set members send heartbeats (pings) to all other members. Any secondary member can import oplog entries from any other member.

Each operation in the oplog is idempotent. That is, oplog operations produce the same results whether applied once or multiple times to the target dataset.
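
To illustrate why idempotency matters when an operation may be applied more than once, the following minimal sketch contrasts replaying an increment with replaying an assignment of the resulting value. It is plain JavaScript and does not reflect the real oplog entry format; the helpers applyIncrement and applySet are hypothetical names used only for this illustration.

```javascript
// Illustrative only: these helpers are hypothetical and do not model
// the actual format of entries in local.oplog.rs.

// Non-idempotent: every replay changes the result again.
function applyIncrement(doc, field, amount) {
  return { ...doc, [field]: (doc[field] ?? 0) + amount };
}

// Idempotent: replaying the same assignment leaves the document unchanged.
function applySet(doc, field, value) {
  return { ...doc, [field]: value };
}

const original = { _id: 1, qty: 5 };

// Replaying "add 2" twice drifts away from the intended value of 7.
const incTwice = applyIncrement(applyIncrement(original, "qty", 2), "qty", 2);

// Replaying "set qty to 7" any number of times yields the same document.
const setOnce = applySet(original, "qty", 7);
const setTwice = applySet(setOnce, "qty", 7);

console.log(incTwice.qty, setOnce.qty, setTwice.qty);
```

Recording the resulting value rather than the delta is what makes a replayed operation safe in this sketch.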

Oplog Size

When you start a replica set member for the first time, MongoDB creates an oplog of a default size if you do not specify the oplog size. [1]

For Unix and Windows systems

The default oplog size depends on the storage engine:

Storage Engine | Default Oplog Size | Lower Bound | Upper Bound
In-Memory Storage Engine | 5% of physical memory | 50 MB | 50 GB
WiredTiger Storage Engine | 5% of free disk space | 990 MB | 50 GB

For 64-bit macOS systems

The default oplog size is 192 MB of either physical memory or free disk space, depending on the storage engine:

Storage Engine | Default Oplog Size
In-Memory Storage Engine | 192 MB of physical memory
WiredTiger Storage Engine | 192 MB of free disk space
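
The default sizing rules in the tables above can be sketched as a small calculation: take 5% of the relevant resource and clamp it to the lower and upper bounds, or use the fixed 192 MB value on 64-bit macOS. This is only an illustration of the arithmetic, not how mongod computes the value internally. The function name defaultOplogSizeMB and the engine keys are hypothetical, and the sketch assumes 1 GB = 1024 MB.

```javascript
// Assumption: 1 GB = 1024 MB. The tables give the bounds in MB and GB only.
const MB_PER_GB = 1024;

// Lower and upper bounds from the Unix/Windows table, in megabytes.
const BOUNDS_MB = {
  inMemory: { lower: 50, upper: 50 * MB_PER_GB },
  wiredTiger: { lower: 990, upper: 50 * MB_PER_GB },
};

// availableMB is physical memory for the in-memory engine,
// or free disk space for WiredTiger.
function defaultOplogSizeMB(engine, availableMB, isMacOS = false) {
  if (isMacOS) return 192; // fixed default on 64-bit macOS
  const { lower, upper } = BOUNDS_MB[engine];
  const fivePercent = availableMB / 20; // 5% = 1/20
  return Math.min(upper, Math.max(lower, fivePercent));
}

console.log(defaultOplogSizeMB("wiredTiger", 100000)); // 5% of 100000 MB free disk
console.log(defaultOplogSizeMB("wiredTiger", 10000));  // clamped up to the lower bound
console.log(defaultOplogSizeMB("inMemory", 16000, true)); // macOS default
```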

In most cases, the default oplog size is sufficient. For example, if an oplog is 5% of free disk space and fills up in 24 hours of operations, then secondaries can stop copying entries from the oplog for up to 24 hours without becoming too stale to continue replicating. However, most replica sets have much lower operation volumes, and their oplogs can hold much higher numbers of operations.

Before mongod creates an oplog, you can specify its size with the oplogSizeMB option. Once you have started a replica set member for the first time, use the replSetResizeOplog administrative command to change the oplog size. replSetResizeOplog enables you to resize the oplog dynamically without restarting the mongod process.
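
As a minimal sketch of a dynamic resize, the helper below builds the command document for replSetResizeOplog, with the size given in megabytes. The helper name buildResizeOplogCommand is hypothetical, and the 16000 MB value is an arbitrary example. The 990 MB minimum is taken from the command's documented lower limit; treat the validation as illustrative.

```javascript
// Hypothetical helper: builds the replSetResizeOplog command document.
// sizeMB is the new oplog size in megabytes.
function buildResizeOplogCommand(sizeMB) {
  if (!Number.isFinite(sizeMB) || sizeMB < 990) {
    throw new RangeError("oplog size must be at least 990 MB");
  }
  return { replSetResizeOplog: 1, size: sizeMB };
}

const resizeCommand = buildResizeOplogCommand(16000);
console.log(JSON.stringify(resizeCommand));

// In the MongoDB shell, connected to the member whose oplog you want to
// resize, the document would be passed to the admin database:
//   db.adminCommand(resizeCommand)
// Depending on the shell version, the size may need to be wrapped as a
// double, for example Double(16000).
```

The command applies to the member it is run against, so each member of the replica set is resized separately.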

New in version 4.4: Starting in MongoDB 4.4, you can specify the minimum number of hours to preserve an oplog entry. The mongod only truncates an oplog entry if:

  • The oplog has reached the maximum configured size, and
  • The oplog entry is older than the configured number of hours based on the host system clock.

By default MongoDB does not set a minimum oplog retention period and automatically truncates the oplog starting with the oldest entries to maintain the configured maximum oplog size.

See Minimum Oplog Retention Period for more information.

[1] The oplog can grow past its configured size limit to avoid deleting the majority commit point.

Minimum Oplog Retention Period

New in version 4.4: Starting in MongoDB 4.4, you can specify the minimum number of hours to preserve an oplog entry. The mongod only removes an oplog entry if:

  • The oplog has reached the maximum configured size, and
  • The oplog entry is older than the configured number of hours based on the host system clock.

By default MongoDB does not set a minimum oplog retention period and automatically truncates the oplog starting with the oldest entries to maintain the configured maximum oplog size.
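
The removal rule above can be sketched as a predicate over the two conditions. This is a simplified model for reasoning about the rule, not the server's implementation. The function canRemoveOplogEntry and its parameter names are hypothetical, and a minRetentionHours of null stands in for the default of no minimum retention period.

```javascript
const MS_PER_HOUR = 60 * 60 * 1000;

// Hypothetical model of the rule: an entry may be removed only when the
// oplog has reached its maximum size AND the entry is older than the
// configured minimum retention period (if one is set).
function canRemoveOplogEntry({
  oplogSizeMB,
  maxSizeMB,
  entryTime,
  now,
  minRetentionHours = null, // null models "no minimum retention period"
}) {
  const sizeReached = oplogSizeMB >= maxSizeMB;
  const ageHours = (now.getTime() - entryTime.getTime()) / MS_PER_HOUR;
  const oldEnough = minRetentionHours === null || ageHours > minRetentionHours;
  return sizeReached && oldEnough;
}

// Example values (hypothetical): a full oplog and an entry that is 24 hours old.
const exampleNow = new Date("2024-01-02T00:00:00Z");
const exampleEntry = new Date("2024-01-01T00:00:00Z");
const fullOplog = { oplogSizeMB: 990, maxSizeMB: 990, entryTime: exampleEntry, now: exampleNow };

console.log(canRemoveOplogEntry({ ...fullOplog, minRetentionHours: 48 })); // retained
console.log(canRemoveOplogEntry({ ...fullOplog, minRetentionHours: 12 })); // removable
```

With a retention period configured, a full oplog keeps growing until its oldest entries age past the configured number of hours.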

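To configure the minimum oplog retention period when starting the mongod, either:

  • Add the storage.oplogMinRetentionHours setting to the mongod configuration file, or
  • Add the --oplogMinRetentionHours command line option.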

To configure the minimum oplog retention period on a running mongod, use replSetResizeOplog. Setting the minimum oplog retention period while the mongod is running overrides any values set on startup. You must update the value of the corresponding configuration file setting or command line option to persist those changes through a server restart.

Oplog Window

Oplog entries are time-stamped. The oplog window is the time difference between the newest and the oldest timestamps in the oplog. If a secondary node loses connection with the primary, it can only use replication to sync up again if the connection is restored within the oplog window.
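
The definition above amounts to a simple subtraction. The following sketch computes the window from the oldest and newest entry times and checks whether a disconnected secondary is still inside it. Both function names are hypothetical, the timestamps are example values, and the check is a simplification that treats the window as constant during the outage.

```javascript
// Hypothetical helper: oplog window in hours, given the times of the
// oldest and newest oplog entries as Date objects.
function oplogWindowHours(oldest, newest) {
  return (newest.getTime() - oldest.getTime()) / (60 * 60 * 1000);
}

// Simplified check: a secondary can catch up through replication only if
// it was disconnected for less time than the oplog window covers.
function canResumeReplication(downtimeHours, windowHours) {
  return downtimeHours < windowHours;
}

// Example timestamps (hypothetical).
const oldestEntry = new Date("2024-01-01T00:00:00Z");
const newestEntry = new Date("2024-01-02T06:00:00Z");
const windowHours = oplogWindowHours(oldestEntry, newestEntry);

console.log(windowHours);
console.log(canResumeReplication(12, windowHours));
console.log(canResumeReplication(36, windowHours));
```

Because the window depends on write volume, a burst of writes can shrink it even when the configured oplog size stays the same.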

Workloads that Might Require a Larger Oplog Size

If you can predict your replica set's workload to resemble one of the following patterns, then you might want to create an oplog that is larger than the default. Conversely, if your application predominantly performs reads with a minimal amount of write operations, a smaller oplog may be sufficient.

The following workloads might require a larger oplog size.

Updates to Multiple Documents at Once

The oplog must translate multi-updates into individual operations in order to maintain idempotency. This can use a great deal of oplog space without a corresponding increase in data size or disk use.
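
To make the space cost concrete, the sketch below models one multi-document update being expanded into one entry per matched document. The entry shape and the function expandMultiUpdate are hypothetical and are not the real oplog format; the point is only that the number of entries grows with the number of matched documents, not with the number of statements issued.

```javascript
// Illustrative model only: the entry shape here is hypothetical and does
// not match the real format of entries in local.oplog.rs.
function expandMultiUpdate(docs, filter, setFields) {
  return docs
    .filter(filter)
    .map((doc) => ({ op: "u", target: { _id: doc._id }, set: setFields }));
}

// Example collection contents (hypothetical).
const sampleDocs = [
  { _id: 1, status: "new" },
  { _id: 2, status: "new" },
  { _id: 3, status: "done" },
];

// One logical update that matches two documents...
const expandedEntries = expandMultiUpdate(
  sampleDocs,
  (d) => d.status === "new",
  { status: "seen" }
);

// ...produces two per-document entries in this model.
console.log(expandedEntries.length);
console.log(expandedEntries.map((e) => e.target._id));
```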

Deletions Equal the Same Amount of Data as Inserts

If you delete roughly the same amount of data as you insert, the database will not grow significantly in disk use, but the size of the operation log can be quite large.

Significant Number of In-Place Updates

If a significant portion of the workload is updates that do not increase the size of the documents, the database records a large number of operations but does not change the quantity of data on disk.

Oplog Status

To view oplog status, including the size and the time range of operations, call the rs.printReplicationInfo() method. For more information on oplog status, see Check the Size of the Oplog.

Replication Lag and Flow Control

Under various exceptional situations, updates to a secondary's oplog might lag behind the desired performance time. Use db.getReplicationInfo() from a secondary member and the replication status output to assess the current state of replication and determine if there is any unintended replication delay.
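
One way to read the replication status output is to compare each secondary's last applied operation time with the primary's. The sketch below does this for an object shaped like the output of rs.status(), using only the members[].name, members[].stateStr and members[].optimeDate fields. The function replicationLagSeconds is hypothetical and the sample data is invented for the example, not real server output.

```javascript
// Computes per-secondary lag, in seconds, from an rs.status()-like object.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no primary in status output");
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      lagSeconds: (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000,
    }));
}

// Hypothetical sample data; in the shell this would come from rs.status().
const sampleStatus = {
  members: [
    { name: "a.example.net:27017", stateStr: "PRIMARY", optimeDate: new Date("2024-01-01T12:00:10Z") },
    { name: "b.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T12:00:05Z") },
    { name: "c.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T12:00:10Z") },
  ],
};

const lagReport = replicationLagSeconds(sampleStatus);
console.log(lagReport);
```

A lag that keeps growing toward the oplog window is the case to watch, since a secondary that falls outside the window can no longer catch up through replication.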

Starting in MongoDB 4.2, administrators can limit the rate at which the primary applies its writes, with the goal of keeping the majority committed lag under a configurable maximum value, flowControlTargetLagSeconds.

By default, flow control is enabled.

Note

For flow control to engage, the replica set/sharded cluster must have a featureCompatibilityVersion (fCV) of 4.2 and read concern majority enabled. That is, enabled flow control has no effect if fCV is not 4.2 or if read concern majority is disabled.

See Replication Lag for more information.

Slow Oplog Application

Secondary members of a replica set log oplog entries that take longer than the slow operation threshold to apply. These messages are logged for the secondaries under the REPL component with the text applied op: <oplog entry> took <num>ms.

2018-11-16T12:31:35.886-05:00 I REPL   [repl writer worker 13] applied op: command { ... }, took 112ms
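
When scanning logs for these messages, the duration can be pulled out with a small pattern match. The sketch below assumes the plain-text message format shown in the sample line above; the function parseAppliedOpMillis is hypothetical, and other log formats would need a different pattern.

```javascript
// Extracts the duration in milliseconds from a slow oplog application
// message. Returns null when the line does not contain the expected text.
function parseAppliedOpMillis(line) {
  const match = /applied op: .*took (\d+)ms\s*$/.exec(line);
  return match ? Number(match[1]) : null;
}

const sampleLine =
  "2018-11-16T12:31:35.886-05:00 I REPL   [repl writer worker 13] applied op: command { ... }, took 112ms";

console.log(parseAppliedOpMillis(sampleLine));
console.log(parseAppliedOpMillis("unrelated log line"));
```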

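Slow oplog application logging on secondaries is:

  • Not affected by slowOpSampleRate; all slow oplog entries are logged by the secondary.
  • Not affected by logLevel/systemLog.verbosity (or systemLog.component.replication.verbosity); the secondary logs only the slow oplog entries, and increasing the verbosity level does not log all oplog entries.
  • Not captured by the profiler and not affected by the profiling level.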

For more information on setting the slow operation threshold, see:

  • mongod --slowms
  • slowOpThresholdMs
  • The profile command or the db.setProfilingLevel() shell helper method.

Oplog Collection Behavior

You cannot drop the local.oplog.rs collection from any replica set member if your MongoDB deployment uses the WiredTiger Storage Engine. You cannot drop the local.oplog.rs collection from a standalone MongoDB instance. mongod requires the oplog for both Replication and recovery of a node if the node goes down.

Starting in MongoDB 5.0, it is no longer possible to perform manual write operations to the oplog on a cluster running as a replica set. Performing write operations to the oplog when running as a standalone instance should only be done with guidance from MongoDB Support.