Troubleshoot Replica Sets

This section describes common strategies for troubleshooting replica set deployments.

Check Replica Set Status

To display the current state of the replica set and the current state of each member, run the rs.status() method in a mongosh session that is connected to the replica set's primary. For descriptions of the information displayed by rs.status(), see replSetGetStatus.
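
For a quick summary of each member's state, you can iterate over the members array that rs.status() returns. This is a minimal sketch; the name, stateStr, and health fields are part of the replSetGetStatus output:

// Print a one-line summary per member.
rs.status().members.forEach( ( m ) =>
  print( `${m.name}: ${m.stateStr} (health: ${m.health})` )
)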

Note

The rs.status() method is a wrapper that runs the replSetGetStatus database command.

Check the Replication Lag

Replication lag is a delay between an operation on the primary and the application of that operation from the oplog to the secondary. Replication lag can be a significant issue and can seriously affect MongoDB replica set deployments. Excessive replication lag makes "lagged" members ineligible to quickly become primary and increases the possibility that distributed read operations will be inconsistent.

To check the current length of replication lag:

  • In a mongosh session that is connected to the primary, call the rs.printSecondaryReplicationInfo() method (a scripted variant appears after this list).

    Returns the syncedTo value for each member, which shows the time when the last oplog entry was written to the secondary, as shown in the following example:

    source: m1.example.net:27017
        syncedTo: Thu Apr 10 2014 10:27:47 GMT-0400 (EDT)
        0 secs (0 hrs) behind the primary
    source: m2.example.net:27017
        syncedTo: Thu Apr 10 2014 10:27:47 GMT-0400 (EDT)
        0 secs (0 hrs) behind the primary

    A delayed member may show as 0 seconds behind the primary when the inactivity period on the primary is greater than the members[n].secondaryDelaySecs value.

  • Monitor the rate of replication by checking for non-zero or increasing oplog time values in the Replication Lag graph available in Cloud Manager and in Ops Manager.
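
As a scripted alternative, you can compute each member's lag directly from the optimes that rs.status() reports. This is a minimal sketch, assuming the replica set currently has a reachable primary; the optimeDate field is part of the replSetGetStatus output:

const status = rs.status();
const primary = status.members.find( ( m ) => m.stateStr === 'PRIMARY' );
status.members
  .filter( ( m ) => m.stateStr === 'SECONDARY' )
  .forEach( ( m ) => {
    // Lag is the gap between the primary's and this member's last applied optime.
    const lagSecs = ( primary.optimeDate - m.optimeDate ) / 1000;
    print( `${m.name}: ${lagSecs} secs behind the primary` );
  } );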

Replication Lag Causes

Possible causes of replication lag include:

  • Network Latency

    Check the network routes between the members of your set to ensure that there is no packet loss or network routing issue.

    Use tools such as ping to test latency between set members and traceroute to expose the routing of packets between network endpoints.

  • Disk Throughput

    If the file system and disk device on the secondary are unable to flush data to disk as quickly as the primary, then the secondary will have difficulty keeping state. Disk-related issues are incredibly prevalent on multi-tenant systems, including virtualized instances, and can be transient if the system accesses disk devices over an IP network (as is the case with Amazon's EBS system).

    Use system-level tools such as iostat or vmstat to assess disk status.

  • Concurrency

    In some cases, long-running operations on the primary can block replication on secondaries. For best results, configure write concern to require confirmation of replication to secondaries. This prevents write operations from returning if replication cannot keep up with the write load.

    You can also use the database profiler to see if there are slow queries or long-running operations that correspond to the incidences of lag.

  • Appropriate Write Concern

    If you are performing a large data ingestion or bulk load operation that requires a large number of writes to the primary, particularly with unacknowledged write concern, the secondaries will not be able to read the oplog fast enough to keep up with changes.

    To prevent this, request write acknowledgement after every 100, 1,000, or other interval of writes to provide an opportunity for secondaries to catch up with the primary (see the sketch after this list).

    For more information see:
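
For example, a bulk load can request acknowledgement from a majority of members after each batch. This is a minimal sketch; the events collection and the batch size are hypothetical, but insertMany and the writeConcern option behave as shown:

// Hypothetical bulk load: insert in batches, waiting for majority
// acknowledgement after each batch so secondaries can keep up.
const docs = Array.from( { length: 10000 }, ( _, n ) => ( { _id: n, value: n } ) );
const batchSize = 1000;
for ( let i = 0; i < docs.length; i += batchSize ) {
  db.events.insertMany(
    docs.slice( i, i + batchSize ),
    { writeConcern: { w: 'majority' } }
  );
}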

Flow Control

Starting in MongoDB 4.2, administrators can limit the rate at which the primary applies its writes with the goal of keeping the majority committed lag under a configurable maximum value, flowControlTargetLagSeconds.

By default, flow control is enabled.
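
You can confirm whether flow control is enabled and adjust the target lag with server parameters. This is a minimal sketch; enableFlowControl and flowControlTargetLagSeconds are both settable at runtime:

// Check whether flow control is enabled.
db.adminCommand( { getParameter: 1, enableFlowControl: 1 } )

// Lower the target majority-committed lag to 5 seconds (the default is 10).
db.adminCommand( { setParameter: 1, flowControlTargetLagSeconds: 5 } )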

Note

For flow control to engage, the replica set/sharded cluster must have a featureCompatibilityVersion (FCV) of 4.2 and read concern majority enabled. That is, enabled flow control has no effect if FCV is not 4.2 or if read concern majority is disabled.

With flow control enabled, as the lag grows close to the flowControlTargetLagSeconds, writes on the primary must obtain tickets before taking locks to apply writes. By limiting the number of tickets issued per second, the flow control mechanism attempts to keep the lag under the target.

Replication lag can occur without the replica set receiving sufficient load to engage flow control, such as in the case of an unresponsive secondary.

To view the status of flow control, run the following commands on the primary:

  1. Run the rs.printSecondaryReplicationInfo() method to determine if any nodes are lagging:

    rs.printSecondaryReplicationInfo()

    Example output:

    source: 192.0.2.2:27017
    {
      syncedTo: 'Mon Jan 31 2022 18:58:50 GMT+0000 (Coordinated Universal Time)',
      replLag: '0 secs (0 hrs) behind the primary '
    }
    ---
    source: 192.0.2.3:27017
    {
      syncedTo: 'Mon Jan 31 2022 18:58:05 GMT+0000 (Coordinated Universal Time)',
      replLag: '45 secs (0 hrs) behind the primary '
    }
  2. Run the serverStatus command and use the flowControl.isLagged value to determine whether the replica set has engaged flow control:

    db.runCommand( { serverStatus: 1 } ).flowControl.isLagged

    Example output:

    false

    If flow control has not engaged, investigate the secondary to determine the cause of the replication lag, such as limitations in the hardware, network, or application.
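
Beyond isLagged, the flowControl section of serverStatus reports additional metrics. A minimal sketch that prints the whole section:

// Print the full flowControl document, which includes fields such as
// isLagged, targetRateLimit, and timeAcquiringMicros.
db.serverStatus().flowControl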

For information on flow control statistics, see:

Slow Application of Oplog Entries

Starting in version 4.2 (also available starting in 4.0.6), secondary members of a replica set now log oplog entries that take longer than the slow operation threshold to apply. These slow oplog messages:

  • Are logged for the secondaries in the diagnostic log.
  • Are logged under the REPL component with the text applied op: <oplog entry> took <num>ms.
  • Do not depend on the log levels (either at the system or component level).
  • Do not depend on the profiling level.
  • May be affected by slowOpSampleRate, depending on your MongoDB version:

    • In MongoDB 4.2 and earlier, these slow oplog entries are not affected by the slowOpSampleRate. MongoDB logs all slow oplog entries regardless of the sample rate.
    • In MongoDB 4.4 and later, these slow oplog entries are affected by the slowOpSampleRate.

The profiler does not capture slow oplog entries.
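
In MongoDB 4.4 and later you can therefore tune which slow oplog entries get logged by adjusting the slow operation threshold and sample rate. This is a minimal sketch using the slowOpThresholdMs and slowOpSampleRate server parameters; the values shown are illustrative:

// Raise the slow operation threshold to 200 ms on this member.
db.adminCommand( { setParameter: 1, slowOpThresholdMs: 200 } )

// Log roughly half of slow operations (in MongoDB 4.4+ this sample
// rate also applies to slow oplog entries).
db.adminCommand( { setParameter: 1, slowOpSampleRate: 0.5 } )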

Test Connections Between all Members

All members of a replica set must be able to connect to every other member of the set to support replication. Always verify connections in both "directions." Networking topologies and firewall configurations can prevent normal and required connectivity, which can block replication.

Changed in version 3.6.

Warning
Before binding to a non-localhost (e.g. publicly accessible) IP address, ensure you have secured your cluster from unauthorized access. For a complete list of security recommendations, see Security Checklist. At minimum, consider enabling authentication and hardening network infrastructure.
MongoDB binaries, mongod and mongos, bind to localhost by default. If the net.ipv6 configuration file setting or the --ipv6 command line option is set for the binary, the binary additionally binds to the localhost IPv6 address. By default, mongod and mongos that are bound to localhost only accept connections from clients that are running on the same computer. This binding behavior includes mongosh and other members of your replica set or sharded cluster. Remote clients cannot connect to binaries that are bound only to localhost. To override the default binding and bind to other IP addresses, use the net.bindIp configuration file setting or the --bind_ip command-line option to specify a list of hostnames or IP addresses.
Warning
Starting in MongoDB 5.0, split horizon DNS nodes that are only configured with an IP address fail startup validation and report an error. See disableSplitHorizonIPCheck.
For example, the following mongod instance binds to both the localhost and the hostname My-Example-Associated-Hostname, which is associated with the IP address 198.51.100.1:
mongod --bind_ip localhost,My-Example-Associated-Hostname
In order to connect to this instance, remote clients must specify the hostname or its associated IP address 198.51.100.1:
mongosh --host My-Example-Associated-Hostname
mongosh --host 198.51.100.1

Consider the following example of a bidirectional test of networking:

Example

Given a replica set with three members running on three separate hosts:

  • m1.example.net
  • m2.example.net
  • m3.example.net

All three use the default port 27017.

  1. Test the connection from m1.example.net to the other hosts with the following operation set from m1.example.net:

    mongosh --host m2.example.net --port 27017
    mongosh --host m3.example.net --port 27017
  2. Test the connection from m2.example.net to the other two hosts with the following operation set from m2.example.net, as in:

    mongosh --host m1.example.net --port 27017
    mongosh --host m3.example.net --port 27017

    You have now tested the connection between m2.example.net and m1.example.net in both directions.

  3. Test the connection from m3.example.net to the other two hosts with the following operation set from the m3.example.net host, as in:

    mongosh --host m1.example.net --port 27017
    mongosh --host m2.example.net --port 27017

If any connection, in any direction, fails, check your networking and firewall configuration and reconfigure your environment to allow these connections.

Socket Exceptions when Rebooting More than One Secondary

When you reboot members of a replica set, ensure that the set is able to elect a primary during the maintenance. This means ensuring that a majority of the set's members[n].votes are available.
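
Before maintenance, you can count the voting members to confirm that a majority will remain available. A minimal sketch using the replica set configuration:

// List each member's vote count; a majority of the total votes must
// remain available for the set to elect or keep a primary.
rs.conf().members.forEach( ( m ) => print( `${m.host}: votes=${m.votes}` ) )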

When a set's active members can no longer form a majority, the set's primary steps down and becomes a secondary. Starting in MongoDB 4.2, when the primary steps down, it no longer closes all client connections. In MongoDB 4.0 and earlier, when the primary steps down, it closes all client connections.

Clients cannot write to the replica set until the members elect a new primary.

Example

Given a three-member replica set where every member has one vote, the set can elect a primary if at least two members can connect to each other. If you reboot the two secondaries at once, the primary steps down and becomes a secondary. Until at least one of the rebooted secondaries becomes available again, the set has no primary and cannot elect a new primary.

For more information on votes, see Replica Set Elections. For related information on connection errors, see Does TCP keepalive time affect MongoDB Deployments?.

Check the Size of the Oplog

A larger oplog can give a replica set a greater tolerance for lag, and make the set more resilient.

To check the size of the oplog for a given replica set member, connect to the member in mongosh and run the rs.printReplicationInfo() method.

The output displays the size of the oplog and the date ranges of the operations contained in the oplog. In the following example, the oplog is about 10 MB and is able to fit about 26 hours (94400 seconds) of operations:

configured oplog size:   10.10546875MB
log length start to end: 94400 (26.22hrs)
oplog first event time:  Mon Mar 19 2012 13:50:38 GMT-0400 (EDT)
oplog last event time:   Wed Oct 03 2012 14:59:10 GMT-0400 (EDT)
now:                     Wed Oct 03 2012 15:00:21 GMT-0400 (EDT)

The oplog should be long enough to hold all transactions for the longest downtime you expect on a secondary. [1] At a minimum, an oplog should be able to hold 24 hours of operations; however, many users prefer to have 72 hours or even a week's worth of operations.

For more information on how oplog size affects operations, see:

Note

You normally want the oplog to be the same size on all members. If you resize the oplog, resize it on all members.

To change oplog size, see the Change the Size of the Oplog tutorial.
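
The tutorial uses the replSetResizeOplog administrative command. A minimal sketch, assuming a target size of 16000 MB (run against each member in turn, since the command only affects the member you are connected to):

// Resize this member's oplog to 16000 MB.
db.adminCommand( { replSetResizeOplog: 1, size: 16000 } )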

[1] Starting in MongoDB 4.0, the oplog can grow past its configured size limit to avoid deleting the majority commit point.