MongoDB Tag Aware Sharding allows administrators to control data distribution in a sharded cluster by defining ranges of the shard key and tagging them to one or more shards.
This tutorial uses Zones along with a multi-datacenter sharded cluster deployment and application-side logic to support distributed local writes, as well as high write availability in the event of a replica set election or datacenter failure.
Changed in version 4.0.3.
The concepts discussed in this tutorial require a specific deployment architecture, as well as application-level logic.
These concepts require familiarity with MongoDB sharded clusters, replica sets, and the general behavior of zones.
This tutorial assumes an insert-only or insert-intensive workload. The concepts and strategies discussed in this tutorial are not well suited for use cases that require fast reads or updates.
Consider an insert-intensive application, where reads are infrequent and low priority compared to writes. The application writes documents to a sharded collection, and requires near-constant uptime from the database to support its SLAs or SLOs.
The following represents a partial view of the format of documents the application writes to the database:
{ "_id" : ObjectId("56f08c447fe58b2e96f595fa"), "message_id" : 329620, "datacenter" : "alfa", "userid" : 123, ... }
{ "_id" : ObjectId("56f08c447fe58b2e96f595fb"), "message_id" : 578494, "datacenter" : "bravo", "userid" : 456, ... }
{ "_id" : ObjectId("56f08c447fe58b2e96f595fc"), "message_id" : 689979, "datacenter" : "bravo", "userid" : 789, ... }
The collection uses the { datacenter : 1, userid : 1 } compound index as the shard key.
The datacenter field in each document allows for creating a tag range on each distinct datacenter value. Without the datacenter field, it would not be possible to associate a document with a specific datacenter.
The userid field provides a high cardinality and low frequency component to the shard key relative to datacenter.
See Choosing a Shard Key for more general instructions on selecting a shard key.
The deployment consists of two datacenters, alfa and bravo. There are two shards, shard0000 and shard0001. Each shard is a replica set with three members. shard0000 has two members on alfa and one priority 0 member on bravo. shard0001 has two members on bravo and one priority 0 member on alfa.
This application requires one tag per datacenter. Each shard has one tag assigned to it based on the datacenter containing the majority of its replica set members. There are two tag ranges, one for each datacenter.
alfa
Tag shards with a majority of members on this datacenter as alfa.
Create a tag range with:
- a lower bound of { "datacenter" : "alfa", "userid" : MinKey },
- an upper bound of { "datacenter" : "alfa", "userid" : MaxKey }, and
- the tag alfa
bravo
Tag shards with a majority of members on this datacenter as bravo.
Create a tag range with:
- a lower bound of { "datacenter" : "bravo", "userid" : MinKey },
- an upper bound of { "datacenter" : "bravo", "userid" : MaxKey }, and
- the tag bravo
Based on the configured tags and tag ranges, mongos routes documents with datacenter : alfa to the alfa datacenter, and documents with datacenter : bravo to the bravo datacenter.
If an inserted or updated document matches a configured tag range, it can only be written to a shard with the related tag.
MongoDB can write documents that do not match a configured tag range to any shard in the cluster.
The balancer migrates the tagged chunks to the appropriate shard. Until the migration completes, shards may contain chunks that violate configured tag ranges and tags. Once balancing completes, shards should only contain chunks whose ranges do not violate their assigned tags and tag ranges.
Adding or removing tags or tag ranges can result in chunk migrations. Depending on the size of your data set and the number of chunks a tag range affects, these migrations may impact cluster performance. Consider running your balancer during specific scheduled windows. See Schedule the Balancing Window for a tutorial on how to set a scheduling window.
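For reference, a balancing window is stored in the config database's settings collection. The following mongosh sketch follows the pattern from the linked tutorial; the start and stop times are placeholders you would adapt to your own off-peak hours:

```javascript
// Run against a mongos. Restricts balancer activity to an off-peak window.
db.getSiblingDB("config").settings.updateOne(
   { _id: "balancer" },
   { $set: { activeWindow: { start: "23:00", stop: "06:00" } } },
   { upsert: true }
)
```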
By default, the application writes to the nearest datacenter. If the local datacenter is down, or if writes to that datacenter are not acknowledged within a set time period, the application switches to the other available datacenter by changing the value of the datacenter field before attempting to write the document to the database.
The application supports write timeouts. The application uses Write Concern to set a timeout for each write operation.
If the application encounters a write or timeout error, it modifies the datacenter field in each document and performs the write. This routes the document to the other datacenter. If both datacenters are down, then writes cannot succeed. See Resolve Write Failure.
The application periodically checks connectivity to any datacenters marked as "down". If connectivity is restored, the application can continue performing normal write operations.
Given the switching logic, as well as any load balancers or similar mechanisms in place to handle client traffic between datacenters, the application cannot predict which of the two datacenters a given document was written to. To ensure that no documents are missed as a part of read operations, the application must perform broadcast queries by not including the datacenter field as a part of any query.
The application performs reads using a read preference of nearest to reduce latency.
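In the shell, such a read could look like the following; cursor.readPref() sets the read preference on a per-query basis, and the query shape here is illustrative:

```javascript
// Route the broadcast query to the lowest-latency available member.
db.collection.find( { "userid" : 123 } ).readPref("nearest")
```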
It is possible for a write operation to succeed despite a reported timeout error. The application responds to the error by attempting to re-write the document to the other datacenter; this can result in a document being duplicated across both datacenters. The application resolves duplicates as a part of the read logic.
The application has logic to switch datacenters if one or more writes fail, or if writes are not acknowledged within a set time period. The application modifies the datacenter field based on the target datacenter's tag to direct the document towards that datacenter.
For example, an application attempting to write to the alfa datacenter might follow this general procedure:
1. Attempt to write the document with datacenter : alfa.
2. On a write failure or timeout, log alfa as momentarily down.
3. Attempt to write the document with datacenter : bravo.
4. On a write failure or timeout, log bravo as momentarily down.
5. If both alfa and bravo are down, log and report errors.

You must be connected to a mongos associated with the target sharded cluster in order to proceed. You cannot create tags by connecting directly to a shard replica set member.
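The failover procedure above can be sketched as follows. This is application-side pseudologic, not a driver API: tryWrite is a hypothetical stand-in for an insert with a write concern timeout that returns true on an acknowledged write and false on an error or timeout.

```javascript
// Sketch only: datacenter failover for an insert-only workload.
// tryWrite is a stand-in for a driver insert with a write concern timeout.
function writeWithFailover(doc, datacenters, tryWrite) {
  const down = [];
  for (const dc of datacenters) {
    // Rewriting the shard-key field routes the document to the other zone.
    const candidate = { ...doc, datacenter: dc };
    if (tryWrite(candidate)) {
      return { ok: true, doc: candidate, down };
    }
    down.push(dc); // log this datacenter as momentarily down
  }
  return { ok: false, doc: null, down }; // both datacenters unavailable
}

// Simulate alfa being down: only writes routed to bravo succeed.
const result = writeWithFailover(
  { message_id: 329620, userid: 123 },
  ["alfa", "bravo"],
  (d) => d.datacenter === "bravo"
);
```

When both attempts fail, the caller is left with the list of downed datacenters to log and report, matching step 5 above.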
Tag each shard in the alfa datacenter with the alfa tag.
sh.addShardTag("shard0000", "alfa")
Tag each shard in the bravo datacenter with the bravo tag.
sh.addShardTag("shard0001", "bravo")
You can review the tags assigned to any given shard by running sh.status().
Define the range for the alfa datacenter and associate it to the alfa tag using the sh.addTagRange() method. This method requires:
- the full namespace of the target collection,
- the inclusive lower bound of the range,
- the exclusive upper bound of the range, and
- the name of the tag.
sh.addTagRange(
  "<database>.<collection>",
  { "datacenter" : "alfa", "userid" : MinKey },
  { "datacenter" : "alfa", "userid" : MaxKey },
  "alfa"
)
Define the range for the bravo datacenter and associate it to the bravo tag using the sh.addTagRange() method. This method requires:
- the full namespace of the target collection,
- the inclusive lower bound of the range,
- the exclusive upper bound of the range, and
- the name of the tag.
sh.addTagRange(
  "<database>.<collection>",
  { "datacenter" : "bravo", "userid" : MinKey },
  { "datacenter" : "bravo", "userid" : MaxKey },
  "bravo"
)
The MinKey and MaxKey values are reserved special values for comparisons. MinKey always compares as less than every other possible value, while MaxKey always compares as greater than every other possible value. The configured ranges capture every userid for each datacenter.
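As an illustration only (this is a toy comparator, not the BSON implementation), the following sketch shows why bounds built from these sentinels capture every possible userid for a datacenter:

```javascript
// Toy sentinels standing in for BSON MinKey/MaxKey.
const MIN_KEY = Symbol("MinKey");
const MAX_KEY = Symbol("MaxKey");

// Compare two userid values: MIN_KEY sorts before everything,
// MAX_KEY sorts after everything.
function compareUserid(a, b) {
  if (a === b) return 0;
  if (a === MIN_KEY || b === MAX_KEY) return -1;
  if (a === MAX_KEY || b === MIN_KEY) return 1;
  return a < b ? -1 : 1;
}

// Any concrete userid falls inside the half-open range [MIN_KEY, MAX_KEY).
function inRange(userid) {
  return compareUserid(MIN_KEY, userid) <= 0 && compareUserid(userid, MAX_KEY) < 0;
}
```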
The next time the balancer runs, it splits and migrates chunks across the shards respecting the tag ranges and tags.
Once balancing finishes, the shards tagged as alfa should only contain documents with datacenter : alfa, while shards tagged as bravo should only contain documents with datacenter : bravo.
You can review the chunk distribution by running sh.status().
When the application's default datacenter is down or inaccessible, the application changes the datacenter field to the other datacenter.
For example, the application attempts to write the following document to the alfa datacenter by default:
{ "_id" : ObjectId("56f08c447fe58b2e96f595fa"), "message_id" : 329620, "datacenter" : "alfa", "userid" : 123, ... }
If the application receives an error on the attempted write, or if the write acknowledgement takes too long, the application logs the datacenter as unavailable and alters the datacenter field to point to the bravo datacenter.
{ "_id" : ObjectId("56f08c457fe58b2e96f595fb"), "message_id" : 329620, "datacenter" : "bravo", "userid" : 123, ... }
The application periodically checks the alfa datacenter for connectivity. If the datacenter is reachable again, the application can resume normal writes.
It is possible that the original write to datacenter : alfa succeeded, especially if the error was related to a timeout. If so, the document with message_id : 329620 may now be duplicated across both datacenters. Applications must resolve duplicates as a part of read operations.
The application's switching logic allows for potential document duplication. When performing reads, the application resolves any duplicate documents on the application layer.
The following query searches for documents where the userid is 123. Note that while userid is part of the shard key, the query does not include the datacenter field, and therefore does not perform a targeted read operation.
db.collection.find( { "userid" : 123 } )
The results show that the document with a message_id of 329620 has been inserted into MongoDB twice, probably as a result of a delayed write acknowledgement.
{ "_id" : ObjectId("56f08c447fe58b2e96f595fa"), "message_id" : 329620, "datacenter" : "alfa", "userid" : 123, data : {...} }
{ "_id" : ObjectId("56f08c457fe58b2e96f595fb"), "message_id" : 329620, "datacenter" : "bravo", "userid" : 123, ... }
The application can either ignore the duplicates, taking one of the two documents, or it can attempt to trim the duplicates until only a single document remains.
One method for trimming duplicates is to use the ObjectId.getTimestamp() method to extract the timestamp from the _id field. The application can then keep either the first document inserted, or the last document inserted. This assumes the _id field uses the MongoDB ObjectId().
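A minimal sketch of this trimming pass, assuming each _id is a standard 24-character hex ObjectId string (the first 8 hex digits encode seconds since the Unix epoch); the helper names are illustrative, not a driver API:

```javascript
// Extract the creation time from an ObjectId hex string.
function objectIdTimestamp(idHex) {
  return new Date(parseInt(idHex.slice(0, 8), 16) * 1000);
}

// Keep only the earliest-inserted document per message_id.
function trimDuplicates(docs) {
  const earliest = new Map();
  for (const doc of docs) {
    const seen = earliest.get(doc.message_id);
    if (!seen || objectIdTimestamp(doc._id) < objectIdTimestamp(seen._id)) {
      earliest.set(doc.message_id, doc);
    }
  }
  return [...earliest.values()];
}

// The duplicated documents from the query above.
const docs = [
  { _id: "56f08c447fe58b2e96f595fa", message_id: 329620, datacenter: "alfa", userid: 123 },
  { _id: "56f08c457fe58b2e96f595fb", message_id: 329620, datacenter: "bravo", userid: 123 },
];
const unique = trimDuplicates(docs);
```

Keeping the last insert instead is a one-character change: flip the comparison in trimDuplicates.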
For example, using getTimestamp() on the document with ObjectId("56f08c447fe58b2e96f595fa") returns:
ISODate("2016-03-22T00:05:24Z")
Using getTimestamp() on the document with ObjectId("56f08c457fe58b2e96f595fb") returns:
ISODate("2016-03-22T00:05:25Z")
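Outside the shell, the same timestamps can be recovered directly from the first four bytes of each ObjectId, which encode seconds since the Unix epoch; a quick check:

```javascript
// The leading 8 hex digits of an ObjectId are seconds since the Unix epoch.
function objectIdTimestamp(idHex) {
  return new Date(parseInt(idHex.slice(0, 8), 16) * 1000);
}

const first = objectIdTimestamp("56f08c447fe58b2e96f595fa");
const second = objectIdTimestamp("56f08c457fe58b2e96f595fb");
// first  → 2016-03-22T00:05:24Z
// second → 2016-03-22T00:05:25Z (one second later)
```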