Troubleshoot Replica Sets
This section describes common strategies for troubleshooting replica set deployments.
Check Replica Set Status
To display the current state of the replica set and current state of each member, run the rs.status() method in a mongosh session that is connected to the replica set's primary. For descriptions of the information displayed by rs.status(), see replSetGetStatus.

The rs.status() method is a wrapper that runs the replSetGetStatus database command.
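As an illustration of reading the command's output, the sketch below filters a replSetGetStatus-style document down to its healthy members. The document shape (a members array with name, stateStr, and health fields) matches typical rs.status() output, but the sample data here is fabricated.

```python
# Sketch: summarize member health from a replSetGetStatus-style document.
# The dictionary shape mirrors common rs.status() fields ("members",
# "name", "stateStr", "health"); it is an assumption for illustration,
# not the full command response.

def summarize_members(status):
    """Return (name, stateStr) pairs for healthy members only."""
    return [(m["name"], m["stateStr"])
            for m in status.get("members", [])
            if m.get("health", 0) == 1]

status = {
    "set": "rs0",
    "members": [
        {"name": "m1.example.net:27017", "stateStr": "PRIMARY", "health": 1},
        {"name": "m2.example.net:27017", "stateStr": "SECONDARY", "health": 1},
        {"name": "m3.example.net:27017", "stateStr": "(not reachable/healthy)", "health": 0},
    ],
}

print(summarize_members(status))
```

In a live deployment you would obtain the same document with db.adminCommand( { replSetGetStatus: 1 } ) and inspect it directly in mongosh.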
Check the Replication Lag
Replication lag is a delay between an operation on the primary and the application of that operation from the oplog to the secondary. Replication lag can be a significant issue and can seriously affect MongoDB replica set deployments. Excessive replication lag makes "lagged" members ineligible to quickly become primary and increases the possibility that distributed read operations will be inconsistent.
To check the current length of replication lag:
In a mongosh session that is connected to the primary, call the rs.printSecondaryReplicationInfo() method. It returns the syncedTo value for each member, which shows the time when the last oplog entry was written to the secondary, as shown in the following example:

source: m1.example.net:27017
    syncedTo: Thu Apr 10 2014 10:27:47 GMT-0400 (EDT)
    0 secs (0 hrs) behind the primary
source: m2.example.net:27017
    syncedTo: Thu Apr 10 2014 10:27:47 GMT-0400 (EDT)
    0 secs (0 hrs) behind the primary

A delayed member may show as 0 seconds behind the primary when the inactivity period on the primary is greater than the members[n].secondaryDelaySecs value.
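The lag figures in this output are simply the difference between the primary's latest oplog time and each member's syncedTo time. A minimal sketch of that calculation, using illustrative timestamps rather than live data:

```python
# Sketch: compute per-member replication lag the way
# rs.printSecondaryReplicationInfo() reports it -- the primary's latest
# oplog time minus each secondary's syncedTo time. Timestamps here are
# illustrative, not live data.
from datetime import datetime

def lag_seconds(primary_optime, synced_to):
    """Seconds a member's syncedTo trails the primary's last oplog entry."""
    return int((primary_optime - synced_to).total_seconds())

primary_optime = datetime(2014, 4, 10, 10, 27, 47)
members = {
    "m1.example.net:27017": datetime(2014, 4, 10, 10, 27, 47),
    "m2.example.net:27017": datetime(2014, 4, 10, 10, 27, 2),
}

for name, synced in members.items():
    print(f"{name}: {lag_seconds(primary_optime, synced)} secs behind the primary")
```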
Starting in MongoDB 7.0, the totalOplogSlotDurationMicros in the slow query log message shows the time between a write operation getting a commit timestamp to commit the storage engine writes and actually committing. mongod supports parallel writes. However, it commits write operations with commit timestamps in any order.

Example: Consider the following writes with commit timestamps:

- writeA with Timestamp1
- writeB with Timestamp2
- writeC with Timestamp3

Suppose writeB commits first at Timestamp2. Replication is paused until writeA commits, because writeA's oplog entry with Timestamp1 is required for replication to copy the oplog to secondary replica set members.

Monitor the rate of replication by checking for non-zero or increasing oplog time values in the Replication Lag graph available in Cloud Manager and in Ops Manager.
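The waiting rule in this example can be modeled directly: replication can only advance to the point just before the oldest write that holds a commit timestamp but has not yet committed. The toy model below illustrates that rule; it is not mongod's implementation.

```python
# Sketch: why replication waits on the oldest uncommitted commit
# timestamp. Replication can copy the oplog only up to the point before
# the earliest assigned-but-uncommitted write. Illustrative model only.

def replicable_up_to(assigned, committed):
    """Highest timestamp safe to replicate: just below the oldest
    assigned-but-uncommitted timestamp, or the newest committed
    timestamp if nothing is pending."""
    pending = sorted(set(assigned) - set(committed))
    if pending:
        return pending[0] - 1
    return max(committed, default=0)

# writeA=1, writeB=2, writeC=3; only writeB (Timestamp2) has committed:
print(replicable_up_to([1, 2, 3], [2]))     # 0 -- stuck behind writeA

# Once writeA commits, replication can advance through Timestamp2:
print(replicable_up_to([1, 2, 3], [1, 2]))  # 2
```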
Replication Lag Causes
Possible causes of replication lag include:
Network Latency
Check the network routes between the members of your set to ensure that there is no packet loss or network routing issue. Use tools including ping to test latency between set members and traceroute to expose the routing of packets between network endpoints.

Disk Throughput
If the file system and disk device on the secondary is unable to flush data to disk as quickly as the primary, then the secondary will have difficulty keeping state. Disk-related issues are incredibly prevalent on multi-tenant systems, including virtualized instances, and can be transient if the system accesses disk devices over an IP network (as is the case with Amazon's EBS system). Use system-level tools to assess disk status, including iostat or vmstat.

Concurrency
In some cases, long-running operations on the primary can block replication on secondaries. For best results, configure write concern to require confirmation of replication to secondaries. This prevents write operations from returning if replication cannot keep up with the write load. You can also use the database profiler to see if there are slow queries or long-running operations that correspond to the incidences of lag.

Appropriate Write Concern
If you are performing a large data ingestion or bulk load operation that requires a large number of writes to the primary, particularly with unacknowledged write concern, the secondaries will not be able to read the oplog fast enough to keep up with changes. To prevent this, request write acknowledgement write concern after every 100, 1,000, or another interval to provide an opportunity for secondaries to catch up with the primary. For more information see:
Flow Control
Starting in MongoDB 4.2, administrators can limit the rate at which the primary applies its writes with the goal of keeping the majority committed lag under a configurable maximum value, flowControlTargetLagSeconds.

By default, flow control is enabled.

For flow control to engage, the replica set/sharded cluster must have: featureCompatibilityVersion (fCV) of 4.2 and read concern majority enabled. That is, enabled flow control has no effect if fCV is not 4.2 or if read concern majority is disabled.

With flow control enabled, as the lag grows close to the flowControlTargetLagSeconds, writes on the primary must obtain tickets before taking locks to apply writes. By limiting the number of tickets issued per second, the flow control mechanism attempts to keep the lag under the target.
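As a rough model of the ticketing idea (not mongod's actual flow control algorithm), the sketch below issues fewer write tickets per interval as the measured lag approaches the configured target:

```python
# Sketch of the flow-control idea: cap the write "tickets" issued per
# second so the primary cannot outpace the secondaries as lag approaches
# the target. Simplified illustration, not mongod's algorithm.

def tickets_for_interval(lag_seconds, target_lag_seconds, max_tickets):
    """Issue fewer tickets as lag approaches the configured target."""
    if lag_seconds >= target_lag_seconds:
        return 0  # lag at or over target: throttle writes entirely
    headroom = 1 - (lag_seconds / target_lag_seconds)
    return int(max_tickets * headroom)

# With a 10-second target lag and a 1000-ticket/sec ceiling:
print(tickets_for_interval(0, 10, 1000))   # 1000 -- no lag, full rate
print(tickets_for_interval(5, 10, 1000))   # 500  -- halfway to target
print(tickets_for_interval(10, 10, 1000))  # 0    -- at target
```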
Replication lag can occur without the replica set receiving sufficient load to engage flow control, such as in the case of an unresponsive secondary.
To view the status of flow control, run the following commands on the primary:

Run the rs.printSecondaryReplicationInfo() method to determine if any nodes are lagging:

rs.printSecondaryReplicationInfo()

Example output:

source: 192.0.2.2:27017
{
  syncedTo: 'Mon Jan 31 2022 18:58:50 GMT+0000 (Coordinated Universal Time)',
  replLag: '0 secs (0 hrs) behind the primary '
}
---
source: 192.0.2.3:27017
{
  syncedTo: 'Mon Jan 31 2022 18:58:05 GMT+0000 (Coordinated Universal Time)',
  replLag: '45 secs (0 hrs) behind the primary '
}

Run the serverStatus command and use the flowControl.isLagged value to determine whether the replica set has engaged flow control:

db.runCommand( { serverStatus: 1 } ).flowControl.isLagged

Example output:

false
If flow control has not engaged, investigate the secondary to determine the cause of the replication lag, such as limitations in the hardware, network, or application.
For information on flow control statistics, see:
Slow Application of Oplog Entries
Starting in version 4.2, secondary members of a replica set now log oplog entries that take longer than the slow operation threshold to apply. These slow oplog messages:
- Are logged for the secondaries in the diagnostic log.
- Are logged under the REPL component with the text applied op: <oplog entry> took <num>ms.
- Do not depend on the log levels (either at the system or component level).
- Do not depend on the profiling level.
- May be affected by slowOpSampleRate, depending on your MongoDB version:
  - In MongoDB 4.2, these slow oplog entries are not affected by the slowOpSampleRate. MongoDB logs all slow oplog entries regardless of the sample rate.
  - In MongoDB 4.4 and later, these slow oplog entries are affected by the slowOpSampleRate.
The profiler does not capture slow oplog entries.
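Because these messages follow a fixed text pattern, they are easy to pull out of the diagnostic log. The sketch below extracts the apply duration with a regular expression; the sample log line is fabricated for illustration.

```python
# Sketch: pull the apply duration out of a slow-oplog diagnostic log
# line. The pattern follows the "applied op: <oplog entry> took <num>ms"
# text noted above; the sample line itself is fabricated.
import re

SLOW_OPLOG_RE = re.compile(r"applied op: (?P<entry>.+) took (?P<ms>\d+)ms")

def slow_oplog_ms(line):
    """Return the apply duration in ms, or None if the line doesn't match."""
    m = SLOW_OPLOG_RE.search(line)
    return int(m.group("ms")) if m else None

line = 'REPL applied op: { op: "i", ns: "test.c" } took 1523ms'
print(slow_oplog_ms(line))  # 1523
```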
Test Connections Between all Members
All members of a replica set must be able to connect to every other member of the set to support replication. Always verify connections in both "directions." Networking topologies and firewall configurations can prevent normal and required connectivity, which can block replication.
Before you bind your instance to a publicly-accessible IP address, you must secure your cluster from unauthorized access. For a complete list of security recommendations, see Security Checklist. At minimum, consider enabling authentication and hardening network infrastructure.
MongoDB binaries, mongod and mongos, bind to localhost by default. If the net.ipv6 configuration file setting or the --ipv6 command line option is set for the binary, the binary additionally binds to the localhost IPv6 address.

By default mongod and mongos that are bound to localhost only accept connections from clients that are running on the same computer. This binding behavior includes mongosh and other members of your replica set or sharded cluster. Remote clients cannot connect to binaries that are bound only to localhost.

To override the default binding and bind to other IP addresses, use the net.bindIp configuration file setting or the --bind_ip command-line option to specify a list of hostnames or IP addresses.
Starting in MongoDB 5.0, split horizon DNS nodes that are only configured with an IP address fail startup validation and report an error. See disableSplitHorizonIPCheck.
For example, the following mongod instance binds to both the localhost and the hostname My-Example-Associated-Hostname, which is associated with the IP address 198.51.100.1:

mongod --bind_ip localhost,My-Example-Associated-Hostname

In order to connect to this instance, remote clients must specify the hostname or its associated IP address 198.51.100.1:

mongosh --host My-Example-Associated-Hostname
mongosh --host 198.51.100.1
Consider the following example of a bidirectional test of networking:
Given a replica set with three members running on three separate hosts:
m1.example.net
m2.example.net
m3.example.net
All three use the default port 27017.
Test the connection from m1.example.net to the other hosts with the following operation set from m1.example.net:

mongosh --host m2.example.net --port 27017
mongosh --host m3.example.net --port 27017

Test the connection from m2.example.net to the other two hosts with the following operation set from m2.example.net, as in:

mongosh --host m1.example.net --port 27017
mongosh --host m3.example.net --port 27017

You have now tested the connection between m2.example.net and m1.example.net in both directions.

Test the connection from m3.example.net to the other two hosts with the following operation set from the m3.example.net host, as in:

mongosh --host m1.example.net --port 27017
mongosh --host m2.example.net --port 27017
If any connection, in any direction, fails, check your networking and firewall configuration and reconfigure your environment to allow these connections.
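The bidirectional checks above can also be scripted. The sketch below probes TCP reachability from one member to the others; it assumes the example hostnames and default port from this section, and it only tests that a socket connects, not that authentication or replication succeeds.

```python
# Sketch: scripted version of the bidirectional connectivity test.
# Assumes the example hosts and default port from this section; a real
# check would run this (or a variant) from every member so that both
# directions are covered.
import socket

MEMBERS = ["m1.example.net", "m2.example.net", "m3.example.net"]
PORT = 27017

def can_connect(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_peers(local_host):
    """Map each *other* member to whether it is reachable from here."""
    return {peer: can_connect(peer, PORT) for peer in MEMBERS if peer != local_host}

# Run on m1.example.net:
# print(check_peers("m1.example.net"))
```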
Socket Exceptions when Rebooting More than One Secondary
When you reboot members of a replica set, ensure that the set is able to elect a primary during the maintenance. This means ensuring that a majority of the set's members[n].votes are available.
When a set's active members can no longer form a majority, the set's primary steps down and becomes a secondary. The primary does not close client connections when it steps down.
Clients cannot write to the replica set until the members elect a new primary.
Given a three-member replica set where every member has one vote, the set can elect a primary if at least two members can connect to each other. If you reboot the two secondaries at once, the primary steps down and becomes a secondary. Until at least one other secondary becomes available, that is, until at least one of the rebooted secondaries is back online, the set has no primary and cannot elect a new primary.
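The vote arithmetic in this example can be made explicit. The helper below is an illustrative check (not a MongoDB API) of whether the reachable voting members still form the strict majority needed to elect a primary, assuming one vote per member as in the example.

```python
# Sketch: the majority rule described above. A replica set can elect a
# primary only while reachable members hold a strict majority of the
# set's votes. One vote per member is assumed, matching the example.

def can_elect_primary(total_votes, reachable_votes):
    """True if the reachable members form a strict majority of votes."""
    return reachable_votes > total_votes // 2

# Three-member set, one vote each:
print(can_elect_primary(3, 3))  # True  -- all members up
print(can_elect_primary(3, 2))  # True  -- one secondary down
print(can_elect_primary(3, 1))  # False -- both secondaries rebooting
```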
For more information on votes, see Replica Set Elections. For related information on connection errors, see Does TCP keepalive time affect MongoDB Deployments?.
Check the Size of the Oplog
A larger oplog can give a replica set a greater tolerance for lag, and make the set more resilient.
To check the size of the oplog for a given replica set member, connect to the member in mongosh and run the rs.printReplicationInfo() method.
The output displays the size of the oplog and the date ranges of the operations contained in the oplog. In the following example, the oplog is about 10 MB and is able to fit about 26 hours (94400 seconds) of operations:
configured oplog size: 10.10546875MB
log length start to end: 94400 (26.22hrs)
oplog first event time: Mon Mar 19 2012 13:50:38 GMT-0400 (EDT)
oplog last event time: Wed Oct 03 2012 14:59:10 GMT-0400 (EDT)
now: Wed Oct 03 2012 15:00:21 GMT-0400 (EDT)
The oplog should be long enough to hold all transactions for the longest downtime you expect on a secondary. [1] At a minimum, an oplog should be able to hold 24 hours of operations; however, many users prefer to have 72 hours or even a week's worth of operations.
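The replication window reported above is just the span between the first and last oplog event times. The sketch below reproduces that arithmetic; the timestamps are illustrative values chosen to match the 94400-second (26.22-hour) window in the example output.

```python
# Sketch: derive the replication window the way rs.printReplicationInfo()
# reports it -- the span between the first and last oplog event times.
# Timestamps are illustrative, chosen to reproduce the 94400-second
# window shown in the example.
from datetime import datetime

def oplog_window_hours(first_event, last_event):
    """Length of the oplog window in hours."""
    return (last_event - first_event).total_seconds() / 3600

first = datetime(2012, 10, 2, 12, 45, 50)  # oplog first event time
last = datetime(2012, 10, 3, 14, 59, 10)   # oplog last event time
print(round(oplog_window_hours(first, last), 2))  # 26.22
```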
For more information on how oplog size affects operations, see:
You normally want the oplog to be the same size on all members. If you resize the oplog, resize it on all members.
To change oplog size, see the Change the Size of the Oplog tutorial.
[1] The oplog can grow past its configured size limit to avoid deleting the majority commit point.