Replica Set Elections

Replica sets use elections to determine which set member will become primary. Replica sets can trigger an election in response to a variety of events, such as:

- adding a new node to the replica set,
- initiating a replica set,
- performing replica set maintenance using methods such as rs.stepDown() or rs.reconfig(), and
- the secondary members losing connectivity to the primary for more than the configured timeout (10 seconds by default).

In the following diagram, the primary node is unavailable for longer than the configured timeout, which triggers the automatic failover process. One of the remaining secondaries calls for an election to select a new primary, and the replica set automatically resumes normal operations.

Diagram of an election of a new primary. In a three member replica set with two secondaries, the primary becomes unreachable. The loss of a primary triggers an election where one of the secondaries becomes the new primary

The replica set cannot process write operations until the election completes successfully. The replica set can continue to serve read queries if such queries are configured to run on secondaries.

The median time before a cluster elects a new primary should not typically exceed 12 seconds, assuming default replica configuration settings. This includes time required to mark the primary as unavailable and call and complete an election. You can tune this time period by modifying the settings.electionTimeoutMillis replication configuration option. Factors such as network latency may extend the time required for replica set elections to complete, which in turn affects the amount of time your cluster may operate without a primary. These factors are dependent on your particular cluster architecture.
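As a minimal sketch of tuning this setting, the helper below returns a copy of a replica set configuration document with a different settings.electionTimeoutMillis value. The function name withElectionTimeout and the sample configuration are hypothetical and exist only for illustration; the code runs without a server and does not change a live deployment.

```javascript
// Hypothetical helper (not part of MongoDB): returns a copy of a replica set
// configuration document with settings.electionTimeoutMillis replaced.
function withElectionTimeout(cfg, millis) {
  if (!Number.isInteger(millis) || millis <= 0) {
    throw new RangeError("electionTimeoutMillis must be a positive integer");
  }
  return {
    ...cfg,
    settings: { ...(cfg.settings || {}), electionTimeoutMillis: millis },
  };
}

// Illustrative configuration fragment; a real document has more fields.
const sampleCfg = { _id: "rs0", settings: { electionTimeoutMillis: 10000 } };
const tunedCfg = withElectionTimeout(sampleCfg, 5000);
console.log(tunedCfg.settings.electionTimeoutMillis);

// In mongosh, the equivalent live change would look roughly like:
//   cfg = rs.conf();
//   cfg.settings.electionTimeoutMillis = 5000;
//   rs.reconfig(cfg);
```

A lower value shortens the time needed to notice an unavailable primary, but it can also make the set more sensitive to brief network slowness, so the right value depends on the cluster.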

Your application connection logic should include tolerance for automatic failovers and the subsequent elections. MongoDB drivers can detect the loss of the primary and automatically retry certain write operations a single time, providing additional built-in handling of automatic failovers and elections. Compatible drivers enable retryable writes by default.

Factors and Conditions that Affect Elections

Replication Election Protocol

Replication protocolVersion: 1 reduces replica set failover time and accelerates the detection of multiple simultaneous primaries.

You can use catchUpTimeoutMillis to prioritize between faster failovers and preservation of w:1 writes.

For more information on pv1, see Self-Managed Replica Set Protocol Version.

Heartbeats

Replica set members send heartbeats (pings) to each other every two seconds. If a heartbeat does not return within 10 seconds, the other members mark the delinquent member as inaccessible.
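The timeout rule can be illustrated with a small sketch. It is not how the server implements heartbeats; the function name unreachableMembers, the host names, and the shape of the lastHeartbeatAt map are assumptions made for this example. Only the 10-second timeout comes from the text above.

```javascript
// Timeout taken from the description above.
const HEARTBEAT_TIMEOUT_MS = 10000;

// Hypothetical helper: given the time (in ms) each member last answered a
// heartbeat, return the hosts that have been silent longer than the timeout.
function unreachableMembers(lastHeartbeatAt, nowMs, timeoutMs = HEARTBEAT_TIMEOUT_MS) {
  return Object.entries(lastHeartbeatAt)
    .filter(([, seenAt]) => nowMs - seenAt > timeoutMs)
    .map(([host]) => host);
}

// Illustrative data: hostA last answered 10.5 s ago, hostB 1.5 s ago.
const lastSeen = { "hostA:27017": 0, "hostB:27017": 9000 };
console.log(unreachableMembers(lastSeen, 10500));
```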

Member Priority

After a replica set has a stable primary, the election algorithm will make a "best-effort" attempt to have the secondary with the highest priority available call an election. Member priority affects both the timing and the outcome of elections; secondaries with higher priority call elections relatively sooner than secondaries with lower priority, and are also more likely to win. However, a lower priority instance can be elected as primary for brief periods, even if a higher priority secondary is available. Replica set members continue to call elections until the highest priority member available becomes primary.

Members with a priority value of 0 cannot become primary and do not seek election. For details, see Priority 0 Replica Set Members.
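The sketch below shows only the two static rules stated above: members with priority 0 never stand for election, and higher-priority members are preferred. It does not model the real election algorithm, which also depends on timing and replication state. The function name electionCandidates and the sample members are hypothetical; a missing priority is treated as 1, the default for non-arbiter members.

```javascript
// Hypothetical helper: list hosts that may seek election, highest priority
// first. Members with priority 0 are excluded.
function electionCandidates(members) {
  return members
    .filter((m) => (m.priority ?? 1) > 0)
    .sort((a, b) => (b.priority ?? 1) - (a.priority ?? 1))
    .map((m) => m.host);
}

// Illustrative members array, shaped like the members field of rs.conf().
const sampleMembers = [
  { host: "a.example.net:27017", priority: 1 },
  { host: "b.example.net:27017", priority: 0 },
  { host: "c.example.net:27017", priority: 2 },
];
console.log(electionCandidates(sampleMembers));
```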

Mirrored Reads

MongoDB provides mirrored reads to pre-warm electable secondary members' cache with the most recently accessed data. With mirrored reads, the primary can mirror a subset of operations that it receives and send them to a subset of electable secondaries. Pre-warming the cache of a secondary can help restore performance more quickly after an election.

For details, see Mirrored Reads.

Loss of a Data Center

With a distributed replica set, the loss of a data center may affect the ability of the remaining members in the other data center or data centers to elect a primary.

If possible, distribute the replica set members across data centers to maximize the likelihood that even with a loss of a data center, one of the remaining replica set members can become the new primary.
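To make the trade-off concrete, the following sketch checks which single data center loss would leave the surviving voting members without a majority. The function name and the data center labels are hypothetical; the only rule used is that a primary can be elected only by a majority of voting members.

```javascript
// Hypothetical helper: given the number of voting members per data center,
// return the data centers whose loss leaves the rest without a majority.
function dataCentersWhoseLossBlocksElection(votesByDataCenter) {
  const total = Object.values(votesByDataCenter).reduce((sum, n) => sum + n, 0);
  return Object.entries(votesByDataCenter)
    // Survivors need strictly more than half of all votes.
    .filter(([, votes]) => (total - votes) * 2 <= total)
    .map(([dc]) => dc);
}

// Two data centers: losing the one that holds two of three votes blocks elections.
console.log(dataCentersWhoseLossBlocksElection({ east: 2, west: 1 }));

// Three data centers: any single loss still leaves a majority.
console.log(dataCentersWhoseLossBlocksElection({ east: 2, west: 2, central: 1 }));
```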

Tip: See Replica Sets Distributed Across Two or More Data Centers.

Network Partition

A network partition may segregate a primary into a partition with a minority of nodes. When the primary detects that it can only see a minority of voting nodes in the replica set, the primary steps down and becomes a secondary. Independently, a member in the partition that can communicate with a majority of the voting nodes (including itself) holds an election to become the new primary.
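The majority rule behind this behavior can be written as a one-line check. The function name hasVotingMajority is an assumption for this illustration, not a MongoDB API.

```javascript
// Hypothetical helper: true when the reachable voting members (including the
// member doing the counting) form a strict majority of all voting members.
function hasVotingMajority(totalVoting, reachableVoting) {
  return reachableVoting > Math.floor(totalVoting / 2);
}

// Five voting members split 2 / 3 by a partition.
console.log(hasVotingMajority(5, 2)); // minority side: a primary here steps down
console.log(hasVotingMajority(5, 3)); // majority side: can elect a new primary
```

With an even number of voting members, an even split leaves neither side with a majority, so neither side can elect a primary until connectivity returns.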

Voting Members

The replica set member configuration setting members[n].votes and member state determine whether a member votes in an election.

A replica set member votes in elections if its members[n].votes setting is 1. To exclude a member from voting in an election, change the value of the member's members[n].votes configuration to 0.

- Non-voting (i.e. votes is 0) members must have priority of 0.
- Members with priority greater than 0 cannot have 0 votes.

Only voting members in the following states are eligible to vote:

- PRIMARY
- SECONDARY
- STARTUP2 (unless the member was newly added to the replica set)
- RECOVERING
- ARBITER
- ROLLBACK
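
The two conditions above (votes setting and member state) can be combined in a short sketch. The member object shape used here is hypothetical: it merges the votes value from the configuration with a state string and a newlyAdded flag purely for illustration.

```javascript
// States listed above in which a voting member is eligible to vote.
const VOTING_STATES = new Set([
  "PRIMARY", "SECONDARY", "STARTUP2", "RECOVERING", "ARBITER", "ROLLBACK",
]);

// Hypothetical helper: member is { votes, state, newlyAdded }.
function isEligibleToVote(member) {
  if (member.votes !== 1) return false;               // non-voting member
  if (!VOTING_STATES.has(member.state)) return false; // state not in the list
  // STARTUP2 counts only if the member was not newly added to the set.
  if (member.state === "STARTUP2" && member.newlyAdded) return false;
  return true;
}

console.log(isEligibleToVote({ votes: 1, state: "SECONDARY" }));
console.log(isEligibleToVote({ votes: 1, state: "STARTUP2", newlyAdded: true }));
console.log(isEligibleToVote({ votes: 0, state: "SECONDARY" }));
```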

Tip

Non-Voting Members

Although non-voting members do not vote in elections, these members hold copies of the replica set's data and can accept read operations from client applications.

Because a replica set can have up to 50 members, but only 7 voting members, non-voting members allow a replica set to have more than seven members.

Non-voting (i.e. votes is 0) members must have priority of 0.

For instance, the following nine-member replica set has seven voting members and two non-voting members.

Diagram of a 9 member replica set with the maximum of 7 voting members.

A non-voting member has both votes and priority equal to 0:

{
   "_id" : <num>,
   "host" : <hostname:port>,
   "arbiterOnly" : false,
   "buildIndexes" : true,
   "hidden" : false,
   "priority" : 0,
   "tags" : {

   },
   "secondaryDelaySecs" : Long(0),
   "votes" : 0
}
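
The limits described in this section (at most 50 members, at most 7 voting members, and priority 0 for every non-voting member) can be checked before a configuration is applied. The sketch below is an illustration under those stated rules; the function name checkVotingRules and the generated host names are hypothetical, and the server performs its own validation.

```javascript
// Hypothetical helper: returns a list of rule violations for a members array
// shaped like the members field of a replica set configuration.
// Missing votes or priority values are treated as the default of 1.
function checkVotingRules(members) {
  const problems = [];
  if (members.length > 50) problems.push("more than 50 members");
  const voting = members.filter((m) => (m.votes ?? 1) !== 0);
  if (voting.length > 7) problems.push("more than 7 voting members");
  for (const m of members) {
    if ((m.votes ?? 1) === 0 && (m.priority ?? 1) !== 0) {
      problems.push(`${m.host}: non-voting member must have priority 0`);
    }
  }
  return problems;
}

// Nine members: seven voting, two non-voting with priority 0.
const nineMembers = Array.from({ length: 9 }, (_, i) => ({
  host: `node${i}.example.net:27017`,
  votes: i < 7 ? 1 : 0,
  priority: i < 7 ? 1 : 0,
}));
console.log(checkVotingRules(nineMembers));
```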

Important

Do not alter the number of votes to control which members will become primary. Instead, modify the members[n].priority option. Only alter the number of votes in exceptional cases, for example to permit more than seven members.

To configure a non-voting member, see Configure a Non-Voting Self-Managed Replica Set Member.
