This page describes common strategies for troubleshooting sharded cluster deployments.
mongos Instances Become Unavailable
If each application server has its own mongos instance, other application servers can continue to access the database. Furthermore, mongos instances do not maintain persistent state, and they can restart and become unavailable without losing any state or data. When a mongos instance starts, it retrieves a copy of the config database and can begin routing queries.
Replica sets provide high availability for shards. If the unavailable mongod is a primary, then the replica set will elect a new primary. If the unavailable mongod is a secondary and it disconnects, the primary and the other secondaries continue to hold all data. In a three-member replica set, even if a single member of the set experiences catastrophic failure, the other two members have full copies of the data. [1]
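To make the majority arithmetic concrete, here is a minimal Python sketch that checks whether the reachable members of a replica set can still elect a primary. It assumes every member carries exactly one vote and ignores arbiters, priorities, and network partitions; the function and member names are hypothetical.

```python
from __future__ import annotations


def can_elect_primary(voting_members: set[str], available: set[str]) -> bool:
    """Return True if a strict majority of the voting members is reachable.

    Simplified model (an assumption, not MongoDB's full election logic):
    every member has one vote; arbiters, priorities and partitions are ignored.
    """
    reachable = len(voting_members & available)
    # A primary needs votes from more than half of the voting members.
    return reachable > len(voting_members) // 2


members = {"rs0-a", "rs0-b", "rs0-c"}  # hypothetical member names
print(can_elect_primary(members, {"rs0-b", "rs0-c"}))  # True: one member lost
print(can_elect_primary(members, {"rs0-c"}))           # False: two members lost
```

The same arithmetic explains why a three-member set tolerates the loss of one member but not two: two of three votes form a majority, one of three does not.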
Always investigate availability interruptions and failures. If a system is unrecoverable, replace it and create a new member of the replica set as soon as possible to replace the lost redundancy.
[1] In a sharded cluster, mongod and mongos instances monitor the replica sets in the sharded cluster (e.g. shard replica sets, config server replica set).
If all members of a replica set shard are unavailable, all data held in that shard is unavailable. However, the data on all other shards will remain available, and it is possible to read and write data to the other shards. Your application must be able to deal with partial results, and you should investigate the cause of the interruption and attempt to recover the shard as soon as possible.
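What "dealing with partial results" means for a caller can be sketched with a toy scatter-gather in Python. This is a model, not how a real application talks to a cluster (it would go through mongos rather than query shards directly); the shard names, the per-shard query callables, and the use of ConnectionError to signal an unreachable shard are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PartialResult:
    documents: list[dict] = field(default_factory=list)
    unavailable_shards: list[str] = field(default_factory=list)

    @property
    def is_complete(self) -> bool:
        # The result covers all data only if every shard answered.
        return not self.unavailable_shards


def gather(shard_queries: dict[str, Callable[[], list[dict]]]) -> PartialResult:
    """Run one query per (hypothetical) shard and keep what reachable shards return."""
    result = PartialResult()
    for shard, run_query in shard_queries.items():
        try:
            result.documents.extend(run_query())
        except ConnectionError:
            # Shard unreachable: record it instead of failing the whole read.
            result.unavailable_shards.append(shard)
    return result
```

A caller would check `is_complete` and, when it is False, decide whether to show the partial data with a warning, retry later, or fail the request.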
Replica sets provide high availability for the config servers. [2] If an unavailable config server is a primary, then the replica set will elect a new primary.
If the config server replica set loses its primary and cannot elect a primary, the cluster's metadata becomes read only. You can still read and write data from the shards, but no chunk migrations or chunk splits will occur until a primary is available.
Distributing replica set members across two data centers provides benefit over a single data center. In a two data center distribution,
If possible, distribute members across at least three data centers. For config server replica sets (CSRS), the best practice is to distribute members across three (or more, depending on the number of members) data centers. If the cost of the third data center is prohibitive, one distribution possibility is to evenly distribute the data-bearing members across the two data centers and store the remaining member in the cloud if your company policy allows.
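The benefit of a third site follows from the same majority rule. The Python sketch below reports, for each data center, whether the surviving members could still elect a primary if that whole site went down. It assumes one vote per member with no arbiters or priorities; the member and data center names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def survives_single_dc_loss(member_dc: dict[str, str]) -> dict[str, bool]:
    """For each data center, report whether the members left after losing it
    still form a strict majority (and so can elect a primary).

    Simplified model: one vote per member, no arbiters or priorities.
    """
    total = len(member_dc)
    lost_per_dc = Counter(member_dc.values())
    return {dc: (total - lost) > total // 2 for dc, lost in lost_per_dc.items()}


# Hypothetical three-member config server replica set layouts.
two_dcs = {"cfg-a": "dc1", "cfg-b": "dc1", "cfg-c": "dc2"}
two_dcs_plus_cloud = {"cfg-a": "dc1", "cfg-b": "dc2", "cfg-c": "cloud"}
print(survives_single_dc_loss(two_dcs))             # {'dc1': False, 'dc2': True}
print(survives_single_dc_loss(two_dcs_plus_cloud))  # {'dc1': True, 'dc2': True, 'cloud': True}
```

With two data centers, losing the site that holds the majority leaves the set without a primary; for a config server replica set that makes the cluster metadata read only. Placing the third member in a separate location lets the set survive the loss of any single site.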
All config servers must be running and available when you first initiate a sharded cluster.
[2] Starting in 3.4, the use of the deprecated mirrored mongod instances (SCCC) as config servers is no longer supported.
A query returns the following warning when one or more of the mongos instances has not yet updated its cache of the cluster's metadata from the config database:
could not initialize cursor across all shards because : stale config detected
This warning should not propagate back to your application. The warning will repeat until all the mongos instances refresh their caches. To force an instance to refresh its cache, run the flushRouterConfig command.
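Because the warning persists until every mongos has refreshed, you may want to flush all routers rather than one. A minimal Python sketch, assuming each element is a PyMongo MongoClient connected to a single mongos; it sends the flushRouterConfig admin command through PyMongo's Database.command. The helper name flush_all_routers is hypothetical.

```python
from typing import Any, Iterable


def flush_all_routers(routers: Iterable[Any]) -> int:
    """Run flushRouterConfig on every mongos and return how many were flushed.

    Each element is assumed to be a pymongo MongoClient connected to exactly
    one mongos, so that the command reaches that specific router.
    """
    flushed = 0
    for client in routers:
        # Admin command: clears this mongos's cached cluster metadata so it is
        # reloaded from the config servers.
        client.admin.command("flushRouterConfig")
        flushed += 1
    return flushed
```

Create one client per mongos host (for example `MongoClient("mongodb://<mongos-host>:<port>")`, with your own host names). A single MongoClient given several mongos hosts load-balances across them, so a command sent through it reaches only one router.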
To troubleshoot a shard key, see Troubleshoot Shard Keys.
To ensure cluster availability:
1. Each shard should be a replica set: if a specific mongod instance fails, the replica set members will elect another to be primary and continue operation.
2. The shard key should allow the mongos to isolate most operations to a single shard.
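Why it helps to let mongos isolate operations to a single shard can be illustrated with a toy model of range-based routing in Python. This is not mongos internals: the chunk bounds, shard names, and the ToyRouter class are hypothetical, and a query is simplified to "has a shard key value" or "does not".

```python
from __future__ import annotations

import bisect
from typing import Optional


class ToyRouter:
    """Toy model of shard-key routing over range-based chunks (not mongos internals).

    chunk_lows[i] is the inclusive lower bound of chunk i, which covers
    [chunk_lows[i], chunk_lows[i + 1]); chunk_owner[i] is the shard holding it.
    """

    def __init__(self, chunk_lows: list[float], chunk_owner: list[str]):
        if not chunk_lows or len(chunk_lows) != len(chunk_owner) or chunk_lows[0] != float("-inf"):
            raise ValueError("need one owner per chunk and a first bound of -inf")
        self.chunk_lows = chunk_lows
        self.chunk_owner = chunk_owner

    def target(self, shard_key: Optional[float]) -> set[str]:
        if shard_key is None:
            # Query without the shard key: broadcast to every shard.
            return set(self.chunk_owner)
        # Find the last chunk whose lower bound is <= shard_key.
        i = bisect.bisect_right(self.chunk_lows, shard_key) - 1
        return {self.chunk_owner[i]}

    def can_serve(self, shard_key: Optional[float], down: set[str]) -> bool:
        # The query succeeds only if none of its target shards is down.
        return not (self.target(shard_key) & down)


router = ToyRouter([float("-inf"), 100, 200], ["shardA", "shardB", "shardC"])
print(router.can_serve(42, down={"shardB"}))    # True: targeted at shardA only
print(router.can_serve(None, down={"shardB"}))  # False: the broadcast reaches shardB
```

With targeted operations, an outage of one shard affects only the queries whose keys fall in that shard's chunks; operations that must reach every shard fail as soon as any one shard is down.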
Changed in version 3.2.
Starting in MongoDB 3.2, config servers can be deployed as replica sets. The mongos instances for the sharded cluster must specify the same config server replica set name but can specify the hostname and port of different members of the replica set.
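A config server replica set is given to mongos as a configDB value of the form `<replSetName>/<host:port>,<host:port>,...`. The Python sketch below checks the rule stated here: every mongos must name the same replica set, while the listed hosts may differ. The function names, the mapping of hypothetical mongos names to their configDB values, and the example hosts are assumptions for illustration.

```python
from __future__ import annotations


def parse_configdb(value: str) -> tuple[str, list[str]]:
    """Split a CSRS configDB value of the form
    "<replSetName>/<host:port>,<host:port>,..." into (set name, hosts)."""
    set_name, sep, hosts = value.partition("/")
    if not sep or not set_name or not hosts:
        raise ValueError(f"expected '<replSetName>/<host:port>,...', got {value!r}")
    return set_name, hosts.split(",")


def check_mongos_configdb(configs: dict[str, str]) -> str:
    """Check that every mongos names the same config server replica set.

    Host lists may differ between mongos instances; only the set name must
    match. Returns the shared replica set name.
    """
    names = {mongos: parse_configdb(value)[0] for mongos, value in configs.items()}
    distinct = set(names.values())
    if len(distinct) != 1:
        raise ValueError(f"config server replica set names differ: {names}")
    return distinct.pop()
```

For example, `"configRS/cfg1.example.net:27019"` on one mongos and `"configRS/cfg2.example.net:27019"` on another pass the check, because both name `configRS`.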
Starting in 3.4, the use of the deprecated mirrored mongod instances as config servers (SCCC) is no longer supported. Before you can upgrade your sharded clusters to 3.4, you must convert your config servers from SCCC to CSRS.
To convert your config servers from SCCC to CSRS, see Upgrade Config Servers to Replica Set in the MongoDB 3.4 manual.
With earlier versions of MongoDB, sharded clusters that use the topology of three mirrored mongod instances for config servers require all mongos instances in the sharded cluster to specify an identical configDB string.
Use CNAMEs to identify your config servers to the cluster so that you can rename and renumber your config servers without downtime.
moveChunk commit failed
At the end of a chunk migration, the shard must connect to the config database to update the chunk's record in the cluster metadata. If the shard fails to connect to the config database, MongoDB reports the following error:
"ERROR: moveChunk commit failed: version is at <n>|<nn> instead of <N>|<NN>" and "ERROR: TERMINATING"
When this happens, the primary member of the shard's replica set terminates to protect data consistency. If a secondary member can access the config database, data on the shard becomes accessible again after an election.
You will need to resolve the chunk migration failure independently. If you encounter this issue, contact the MongoDB Community or MongoDB Support for help.
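When you report this error, it helps to quote both versions exactly. The sketch below is a hypothetical Python helper that extracts them from a log line. It assumes the two numbers in each `<n>|<nn>` pair are the major and minor parts of a chunk version, and it reads the message as "found X instead of expected Y"; adjust the pattern if your server's log format differs.

```python
import re
from typing import NamedTuple, Optional


class ChunkVersion(NamedTuple):
    major: int  # assumed meaning of <n> / <N>
    minor: int  # assumed meaning of <nn> / <NN>


class CommitFailure(NamedTuple):
    found: ChunkVersion     # the "<n>|<nn>" part of the message
    expected: ChunkVersion  # the "<N>|<NN>" part of the message


_COMMIT_FAILED = re.compile(
    r"moveChunk commit failed: version is at (\d+)\|(\d+) instead of (\d+)\|(\d+)"
)


def parse_commit_failure(line: str) -> Optional[CommitFailure]:
    """Extract both versions from a "moveChunk commit failed" log line, or None."""
    match = _COMMIT_FAILED.search(line)
    if match is None:
        return None
    n, nn, big_n, big_nn = (int(g) for g in match.groups())
    return CommitFailure(ChunkVersion(n, nn), ChunkVersion(big_n, big_nn))
```

Lines that do not contain the error return None, so the helper can be run over a whole log file to collect every occurrence.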