
Distributed Local Writes for Insert Only Workloads


MongoDB Tag Aware Sharding allows administrators to control data distribution in a sharded cluster by defining ranges of the shard key and tagging them to one or more shards.

This tutorial uses Zones along with a multi-datacenter sharded cluster deployment and application-side logic to support distributed local writes, as well as high write availability in the event of a replica set election or datacenter failure.

By defining the zones and the zone ranges before sharding an empty or a non-existing collection, the shard collection operation creates chunks for the defined zone ranges, as well as any additional chunks needed to cover the entire range of the shard key values, and performs an initial chunk distribution based on the zone ranges. This initial creation and distribution of chunks allows for faster setup of zoned sharding. After the initial distribution, the balancer manages the chunk distribution going forward.

See Pre-Define Zones and Zone Ranges for an Empty or Non-Existing Collection for an example.

Important

The concepts discussed in this tutorial require a specific deployment architecture, as well as application-level logic.

These concepts require familiarity with MongoDB sharded clusters, replica sets, and the general behavior of zones.

This tutorial assumes an insert-only or insert-intensive workload. The concepts and strategies discussed in this tutorial are not well suited for use cases that require fast reads or updates.

Scenario

Consider an insert-intensive application, where reads are infrequent and low priority compared to writes. The application writes documents to a sharded collection, and requires near-constant uptime from the database to support its SLAs or SLOs.

The following represents a partial view of the format of documents the application writes to the database:

{
  "_id" : ObjectId("56f08c447fe58b2e96f595fa"),
  "message_id" : 329620,
  "datacenter" : "alfa",
  "userid" : 123,
  ...
}
{
  "_id" : ObjectId("56f08c447fe58b2e96f595fb"),
  "message_id" : 578494,
  "datacenter" : "bravo",
  "userid" : 456,
  ...
}
{
  "_id" : ObjectId("56f08c447fe58b2e96f595fc"),
  "message_id" : 689979,
  "datacenter" : "bravo",
  "userid" : 789,
  ...
}

Shard Key

The collection uses the { datacenter : 1, userid : 1 } compound index as the shard key.

The datacenter field in each document allows for creating a tag range on each distinct datacenter value. Without the datacenter field, it would not be possible to associate a document with a specific datacenter.

The userid field provides a high cardinality and low frequency component to the shard key relative to datacenter.

See Choosing a Shard Key for more general instructions on selecting a shard key.

Architecture

The deployment consists of two datacenters, alfa and bravo. There are two shards, shard0000 and shard0001. Each shard is a replica set with three members. shard0000 has two members on alfa and one priority 0 member on bravo. shard0001 has two members on bravo and one priority 0 member on alfa.

Diagram of sharded cluster architecture for high availability

Tags

This application requires one tag per datacenter. Each shard has one tag assigned to it based on the datacenter containing the majority of its replica set members. There are two tag ranges, one for each datacenter.

alfa Datacenter

Tag shards with a majority of members on this datacenter as alfa.

Create a tag range with:

  • a lower bound of { "datacenter" : "alfa", "userid" : MinKey },
  • an upper bound of { "datacenter" : "alfa", "userid" : MaxKey }, and
  • the tag alfa
bravo Datacenter

Tag shards with a majority of members on this datacenter as bravo.

Create a tag range with:

  • a lower bound of { "datacenter" : "bravo", "userid" : MinKey },
  • an upper bound of { "datacenter" : "bravo", "userid" : MaxKey }, and
  • the tag bravo
Note

The MinKey and MaxKey values are reserved special values for comparisons.

Based on the configured tags and tag ranges, mongos routes documents with datacenter : alfa to the alfa datacenter, and documents with datacenter : bravo to the bravo datacenter.

Write Operations

If an inserted or updated document matches a configured tag range, it can only be written to a shard with the related tag.

MongoDB can write documents that do not match a configured tag range to any shard in the cluster.

Note

The behavior described above requires the cluster to be in a steady state with no chunks violating a configured tag range. See the following section on the balancer for more information.

Balancer

The balancer migrates the tagged chunks to the appropriate shard. Until the migration, shards may contain chunks that violate configured tag ranges and tags. Once balancing completes, shards should only contain chunks whose ranges do not violate their assigned tags and tag ranges.

Adding or removing tags or tag ranges can result in chunk migrations. Depending on the size of your data set and the number of chunks a tag range affects, these migrations may impact cluster performance. Consider running your balancer during specific scheduled windows. See Schedule the Balancing Window for a tutorial on how to set a scheduling window.

Application Behavior

By default, the application writes to the nearest datacenter. If the local datacenter is down, or if writes to that datacenter are not acknowledged within a set time period, the application switches to the other available datacenter by changing the value of the datacenter field before attempting to write the document to the database.

The application supports write timeouts. The application uses Write Concern to set a timeout for each write operation.
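As a sketch of that timeout, an insert in mongosh could specify a write concern with a wtimeout. The namespace, the w: "majority" level, and the 2000 millisecond value are illustrative assumptions, not recommendations:

```javascript
// Illustrative only: a write that is not acknowledged within 2000 ms
// returns a write concern error the application can treat as a timeout.
db.collection.insertOne(
  { "message_id" : 329620, "datacenter" : "alfa", "userid" : 123 },
  { writeConcern: { w: "majority", wtimeout: 2000 } }
)
```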

If the application encounters a write or timeout error, it modifies the datacenter field in each document and performs the write. This routes the document to the other datacenter. If both datacenters are down, then writes cannot succeed. See Resolve Write Failure.

The application periodically checks connectivity to any data centers marked as "down". If connectivity is restored, the application can continue performing normal write operations.

Given the switching logic, as well as any load balancers or similar mechanisms in place to handle client traffic between datacenters, the application cannot predict which of the two datacenters a given document was written to. To ensure that no documents are missed as a part of read operations, the application must perform broadcast queries by not including the datacenter field as a part of any query.

The application performs reads using a read preference of nearest to reduce latency.
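For instance, a broadcast read with that preference might look like the following in mongosh; the namespace is a placeholder:

```javascript
// Broadcast query (no datacenter field), reading from the nearest member.
db.collection.find( { "userid" : 123 } ).readPref("nearest")
```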

It is possible for a write operation to succeed despite a reported timeout error. The application responds to the error by attempting to re-write the document to the other datacenter; this can result in a document being duplicated across both datacenters. The application resolves duplicates as a part of the read logic.

Switching Logic

The application has logic to switch datacenters if one or more writes fail, or if writes are not acknowledged within a set time period. The application modifies the datacenter field based on the target datacenter's tag to direct the document towards that datacenter.

For example, an application attempting to write to the alfa datacenter might follow this general procedure:

  1. Attempt to write the document, specifying datacenter : alfa.
  2. On write timeout or error, log alfa as momentarily down.
  3. Attempt to write the same document, modifying datacenter : bravo.
  4. On write timeout or error, log bravo as momentarily down.
  5. If both alfa and bravo are down, log and report errors.

See Resolve Write Failure.
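The steps above can be sketched in application code. This is a minimal illustration, not MongoDB API: writeWithFailover and tryWrite are hypothetical names, and a real application would call the driver's insert method with a write concern timeout inside tryWrite.

```javascript
// Hypothetical sketch of the datacenter failover procedure above.
// tryWrite is assumed to throw on a write error or timeout.
function writeWithFailover(doc, tryWrite, datacenters = ["alfa", "bravo"]) {
  for (const dc of datacenters) {
    try {
      // Route the document by setting its datacenter field before writing.
      tryWrite({ ...doc, datacenter: dc });
      return dc; // write acknowledged by this datacenter
    } catch (err) {
      // Log the datacenter as momentarily down and fall through to the next.
      console.log(`datacenter ${dc} momentarily down: ${err.message}`);
    }
  }
  // Both datacenters failed; surface the error to the caller.
  throw new Error("all datacenters unavailable");
}
```

Because the datacenter field is rewritten before the retry, mongos routes the retried document to the shard tagged for the fallback datacenter.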

Procedure

Configure Shard Tags

You must be connected to a mongos associated with the target sharded cluster in order to proceed. You cannot create tags by connecting directly to a shard replica set member.

1

Tag each shard.

Tag each shard in the alfa datacenter with the alfa tag.

sh.addShardTag("shard0000", "alfa")

Tag each shard in the bravo datacenter with the bravo tag.

sh.addShardTag("shard0001", "bravo")

You can review the tags assigned to any given shard by running sh.status().

2

Define ranges for each tag.

Define the range for the alfa datacenter and associate it with the alfa tag using the sh.addTagRange() method. This method requires:

  • The full namespace of the target collection.
  • The inclusive lower bound of the range.
  • The exclusive upper bound of the range.
  • The name of the tag.
sh.addTagRange(
  "<database>.<collection>",
  { "datacenter" : "alfa", "userid" : MinKey },
  { "datacenter" : "alfa", "userid" : MaxKey },
  "alfa"
)

Define the range for the bravo datacenter and associate it with the bravo tag using the sh.addTagRange() method. This method requires:

  • The full namespace of the target collection.
  • The inclusive lower bound of the range.
  • The exclusive upper bound of the range.
  • The name of the tag.
sh.addTagRange(
  "<database>.<collection>",
  { "datacenter" : "bravo", "userid" : MinKey },
  { "datacenter" : "bravo", "userid" : MaxKey },
  "bravo"
)

The MinKey and MaxKey values are reserved special values for comparisons. MinKey always compares as less than every other possible value, while MaxKey always compares as greater than every other possible value. The configured ranges capture every user for each datacenter.

3

Review the changes.

The next time the balancer runs, it migrates data across the shards respecting the configured zones.

Once balancing finishes, the shards tagged as alfa should only contain documents with datacenter : alfa, while shards tagged as bravo should only contain documents with datacenter : bravo.

You can review the chunk distribution by running sh.status().

Resolve Write Failure

When the application's default datacenter is down or inaccessible, the application changes the datacenter field to the other datacenter.

For example, the application attempts to write the following document to the alfa datacenter by default:

{
  "_id" : ObjectId("56f08c447fe58b2e96f595fa"),
  "message_id" : 329620,
  "datacenter" : "alfa",
  "userid" : 123,
  ...
}

If the application receives an error on the attempted write, or if the write acknowledgement takes too long, the application logs the datacenter as unavailable and alters the datacenter field to point to the bravo datacenter.

{
  "_id" : ObjectId("56f08c457fe58b2e96f595fb"),
  "message_id" : 329620,
  "datacenter" : "bravo",
  "userid" : 123,
  ...
}

The application periodically checks the alfa datacenter for connectivity. If the datacenter is reachable again, the application can resume normal writes.

Note

It is possible that the original write to datacenter : alfa succeeded, especially if the error was related to a timeout. If so, the document with message_id : 329620 may now be duplicated across both datacenters. Applications must resolve duplicates as a part of read operations.

Resolve Duplicate Documents on Reads

The application's switching logic allows for potential document duplication. When performing reads, the application resolves any duplicate documents on the application layer.

The following query searches for documents where the userid is 123. Note that while userid is part of the shard key, the query does not include the datacenter field, and therefore does not perform a targeted read operation.

db.collection.find( { "userid" : 123 } )

The results show that the document with a message_id of 329620 has been inserted into MongoDB twice, probably as a result of a delayed write acknowledgement.

{
  "_id" : ObjectId("56f08c447fe58b2e96f595fa"),
  "message_id" : 329620,
  "datacenter" : "alfa",
  "userid" : 123,
  data : {...}
}
{
  "_id" : ObjectId("56f08c457fe58b2e96f595fb"),
  "message_id" : 329620,
  "datacenter" : "bravo",
  "userid" : 123,
  ...
}

The application can either ignore the duplicates, taking one of the two documents, or it can attempt to trim the duplicates until only a single document remains.

One method for trimming duplicates is to use the ObjectId.getTimestamp() method to extract the timestamp from the _id field. The application can then keep either the first document inserted, or the last document inserted. This assumes the _id field uses the MongoDB ObjectId().
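As a sketch of this trimming step in application code, the insertion timestamp can be recovered from the first four bytes of the ObjectId, which encode seconds since the Unix epoch. The helper names objectIdTimestamp and dedupeByMessageId are hypothetical, the sketch assumes each _id is available as its 24-character hex string, and it keeps the first-inserted document.

```javascript
// Hypothetical sketch: recover the insertion time from an ObjectId hex
// string. The first 8 hex characters are a 4-byte big-endian count of
// seconds since the Unix epoch.
function objectIdTimestamp(hexId) {
  return new Date(parseInt(hexId.substring(0, 8), 16) * 1000);
}

// Keep only the earliest-inserted document for each message_id.
function dedupeByMessageId(docs) {
  const byMessage = new Map();
  for (const doc of docs) {
    const existing = byMessage.get(doc.message_id);
    if (!existing ||
        objectIdTimestamp(doc._id) < objectIdTimestamp(existing._id)) {
      byMessage.set(doc.message_id, doc);
    }
  }
  return [...byMessage.values()];
}
```

Running dedupeByMessageId over the two documents above keeps the alfa copy, whose ObjectId timestamp is one second earlier.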

For example, using getTimestamp() on the document with ObjectId("56f08c447fe58b2e96f595fa") returns:

ISODate("2016-03-22T00:05:24Z")

Using getTimestamp() on the document with ObjectId("56f08c457fe58b2e96f595fb") returns:

ISODate("2016-03-22T00:05:25Z")