Segmenting Data by Location
In sharded clusters, you can create zones of sharded data based on the shard key. You can associate each zone with one or more shards in the cluster, and a shard can associate with any number of zones. In a balanced cluster, MongoDB migrates chunks covered by a zone only to those shards associated with the zone.
If you define the zones and the zone ranges before sharding an empty or non-existing collection, the shard collection operation creates chunks for the defined zone ranges, plus any additional chunks needed to cover the entire range of shard key values, and performs an initial chunk distribution based on the zone ranges. This initial creation and distribution of chunks allows for faster setup of zoned sharding. After the initial distribution, the balancer manages the chunk distribution going forward.
See Pre-Define Zones and Zone Ranges for an Empty or Non-Existing Collection for an example.
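The "additional chunks" idea can be illustrated with a simplified model. This is plain JavaScript for Node.js, not mongosh and not server code. It treats the { country, userid } shard key used later in this tutorial as a tuple, and it uses two hypothetical symbols, MIN and MAX, in place of MinKey and MaxKey. The model produces one chunk per zone range, plus the unzoned gap chunks needed to cover the whole key space. The actual chunk boundaries and counts the server creates may differ.

```javascript
// Simplified model of zone-aware initial chunk creation (not server code).
// MIN and MAX are hypothetical stand-ins for MongoDB's MinKey and MaxKey.
const MIN = Symbol("MinKey");
const MAX = Symbol("MaxKey");

// Order two values, with MIN lowest and MAX highest. Assumes any two
// non-sentinel values being compared have the same type.
function compareValue(a, b) {
  if (a === b) return 0;
  if (a === MIN || b === MAX) return -1;
  if (a === MAX || b === MIN) return 1;
  return a < b ? -1 : 1;
}

// Compare { country, userid } shard key values field by field.
function compareKey(a, b) {
  return compareValue(a.country, b.country) || compareValue(a.userid, b.userid);
}

// Given non-overlapping zone ranges { min, max, zone }, return chunks that
// cover the whole key space: one per zone range, plus unzoned gap chunks.
function initialChunks(zoneRanges) {
  const sorted = [...zoneRanges].sort((x, y) => compareKey(x.min, y.min));
  const chunks = [];
  let cursor = { country: MIN, userid: MIN };
  for (const r of sorted) {
    if (compareKey(cursor, r.min) < 0) {
      chunks.push({ min: cursor, max: r.min, zone: null }); // gap before range
    }
    chunks.push({ min: r.min, max: r.max, zone: r.zone });
    cursor = r.max;
  }
  const globalMax = { country: MAX, userid: MAX };
  if (compareKey(cursor, globalMax) < 0) {
    chunks.push({ min: cursor, max: globalMax, zone: null }); // trailing gap
  }
  return chunks;
}

// One range per country, covering every userid for that country.
const range = (country, zone) => ({
  min: { country, userid: MIN },
  max: { country, userid: MAX },
  zone,
});

const layout = initialChunks([range("US", "NA"), range("UK", "EU"), range("DE", "EU")]);
// Zones in key order: null, EU, null, EU, null, NA, null
console.log(layout.map((c) => c.zone));
```

The gap chunks carry no zone, so documents whose country has no zone range, such as "FR", land in them and may live on any shard.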
This tutorial uses zones to segment data based on geographic area.
The following are some example use cases for segmenting data by geographic area:
- An application that has to segment user data by country
- A database that has to allocate resources by country
The following diagram illustrates a sharded cluster that uses geographic zones to manage and satisfy data segmentation requirements.
Scenario
A financial chat application logs messages, tracking the country of the originating user. The application stores the logs in the chat database under the messages collection. The chats contain information that must be segmented by country, so that servers local to each country serve read and write requests for that country's users. A group of countries can be assigned to the same zone in order to share resources.
The application currently has users in the US, UK, and Germany. The country field represents the user's country as an ISO 3166-1 Alpha-2 two-character country code.
The following documents represent a partial view of three chat messages:
{
"_id" : ObjectId("56f08c447fe58b2e96f595fa"),
"country" : "US",
"userid" : 123,
"message" : "Hello there",
...,
}
{
"_id" : ObjectId("56f08c447fe58b2e96f595fb"),
"country" : "UK",
"userid" : 456,
"message" : "Good Morning",
...,
}
{
"_id" : ObjectId("56f08c447fe58b2e96f595fc"),
"country" : "DE",
"userid" : 789,
"message" : "Guten Tag",
...,
}
Shard Key
The messages collection uses the { country : 1, userid : 1 } compound index as the shard key.
The country field in each document allows for creating a zone for each distinct country value.
The userid field gives the shard key a high-cardinality, low-frequency component relative to country.
See Choosing a Shard Key for more general instructions on selecting a shard key.
Architecture
The sharded cluster has shards in two data centers: one in Europe and one in North America.
Zones
This application requires one zone per data center.
EU - European data center
Shards deployed in this data center are assigned to the EU zone. For each country using the EU data center for local reads and writes, create a zone range for the EU zone with:
- a lower bound of { "country" : <country>, "userid" : MinKey }
- an upper bound of { "country" : <country>, "userid" : MaxKey }
NA - North American data center
Shards deployed in this data center are assigned to the NA zone. For each country using the NA data center for local reads and writes, create a zone range for the NA zone with:
- a lower bound of { "country" : <country>, "userid" : MinKey }
- an upper bound of { "country" : <country>, "userid" : MaxKey }
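The bounds above determine which zone, if any, a document belongs to. The following sketch models that matching rule in plain JavaScript, not mongosh. The lower bound is inclusive and the upper bound is exclusive. MIN and MAX are hypothetical stand-ins for MinKey and MaxKey, and countryRange and zoneFor are illustrative names.

```javascript
// Model of zone-range matching for the { country, userid } shard key.
// MIN and MAX are hypothetical stand-ins for MinKey and MaxKey.
const MIN = Symbol("MinKey");
const MAX = Symbol("MaxKey");

// Order two values, with MIN lowest and MAX highest (same-type values assumed).
function compareValue(a, b) {
  if (a === b) return 0;
  if (a === MIN || b === MAX) return -1;
  if (a === MAX || b === MIN) return 1;
  return a < b ? -1 : 1;
}

// Compare shard key values: country first, then userid.
function compareKey(a, b) {
  return compareValue(a.country, b.country) || compareValue(a.userid, b.userid);
}

// Zone range covering every userid for one country.
function countryRange(country, zone) {
  return { min: { country, userid: MIN }, max: { country, userid: MAX }, zone };
}

// Zone whose range contains the document's shard key, or null if none does.
// Lower bounds are inclusive; upper bounds are exclusive.
function zoneFor(doc, zoneRanges) {
  const key = { country: doc.country, userid: doc.userid };
  const hit = zoneRanges.find(
    (r) => compareKey(r.min, key) <= 0 && compareKey(key, r.max) < 0
  );
  return hit ? hit.zone : null;
}

const zoneRanges = [countryRange("US", "NA"), countryRange("UK", "EU"), countryRange("DE", "EU")];
console.log(zoneFor({ country: "DE", userid: 789 }, zoneRanges)); // EU
console.log(zoneFor({ country: "FR", userid: 1 }, zoneRanges)); // null
```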
Write Operations
With zones, if an inserted or updated document matches a configured zone, it can only be written to a shard inside that zone.
MongoDB can write documents that do not match a configured zone to any shard in the cluster.
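The placement rule for writes can be sketched as a small function in plain JavaScript. It is a model of the rule described above, not of how mongos or the balancer are implemented. The shard names shardNA1, shardNA2, and shardEU1 are hypothetical.

```javascript
// Hypothetical shard-to-zone assignments mirroring this tutorial's layout.
const shardZones = {
  shardNA1: ["NA"],
  shardNA2: ["NA"],
  shardEU1: ["EU"],
};

// Shards allowed to hold a document: those tagged with the document's zone,
// or every shard when the document matches no zone (zone === null).
function eligibleShards(zone, shardZones) {
  const all = Object.keys(shardZones);
  return zone === null ? all : all.filter((s) => shardZones[s].includes(zone));
}

console.log(eligibleShards("EU", shardZones)); // [ 'shardEU1' ]
console.log(eligibleShards(null, shardZones)); // all three shards
```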
Read Operations
MongoDB can route queries to a specific shard if the query includes at least the country field.
For example, MongoDB can attempt a targeted read operation on the following query:
chatDB = db.getSiblingDB("chat")
chatDB.messages.find( { "country" : "UK", "userid" : 123 } )
Queries without the country field perform broadcast operations.
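Targeting can be modeled as finding the chunks whose key ranges overlap the range a query selects. The following plain JavaScript sketch uses a hypothetical three-chunk routing table, hypothetical shard names, and MIN and MAX symbols in place of MinKey and MaxKey. It only understands equality on country, optionally combined with equality on userid. A real router handles many more operators.

```javascript
// Simplified model of query targeting on the { country, userid } shard key.
// MIN/MAX stand in for MinKey/MaxKey; the chunk table and shard names are hypothetical.
const MIN = Symbol("MinKey");
const MAX = Symbol("MaxKey");

function compareValue(a, b) {
  if (a === b) return 0;
  if (a === MIN || b === MAX) return -1;
  if (a === MAX || b === MIN) return 1;
  return a < b ? -1 : 1;
}

function compareKey(a, b) {
  return compareValue(a.country, b.country) || compareValue(a.userid, b.userid);
}

// Contiguous chunks in shard key order, each owned by one shard.
const chunks = [
  { min: { country: MIN, userid: MIN }, max: { country: "US", userid: MIN }, shard: "shardEU1" },
  { min: { country: "US", userid: MIN }, max: { country: "US", userid: 500 }, shard: "shardNA1" },
  { min: { country: "US", userid: 500 }, max: { country: MAX, userid: MAX }, shard: "shardNA2" },
];

// Shards a find() must contact. Equality on country narrows the key range to
// that country's users (or to one key if userid is also given). Without
// country, the range is the whole key space, so every shard is contacted.
function shardsForQuery(filter) {
  let lo = { country: MIN, userid: MIN };
  let hi = { country: MAX, userid: MAX };
  if (typeof filter.country === "string") {
    const hasUser = typeof filter.userid === "number";
    lo = { country: filter.country, userid: hasUser ? filter.userid : MIN };
    hi = { country: filter.country, userid: hasUser ? filter.userid : MAX };
  }
  // A chunk [min, max) overlaps the closed range [lo, hi] when min <= hi and lo < max.
  const owners = chunks
    .filter((c) => compareKey(c.min, hi) <= 0 && compareKey(lo, c.max) < 0)
    .map((c) => c.shard);
  return [...new Set(owners)];
}

console.log(shardsForQuery({ country: "UK", userid: 123 })); // [ 'shardEU1' ]
console.log(shardsForQuery({ userid: 123 })); // all three shards (broadcast)
```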
Balancer
The balancer migrates chunks to the appropriate shard, respecting any configured zones. Until the migration, shards may contain chunks that violate configured zones. Once balancing completes, shards should only contain chunks whose ranges do not violate their assigned zones.
Adding or removing zones or zone ranges can result in chunk migrations. Depending on the size of your data set and the number of chunks a zone or zone range affects, these migrations may impact cluster performance. Consider running your balancer during specific scheduled windows. See Schedule the Balancing Window for a tutorial on how to set a scheduling window.
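A balancing window is stored as an activeWindow with "HH:MM" start and stop times on the balancer document in the config database's settings collection. The mongosh update is shown as a comment below. The runnable part is a hypothetical helper, inBalancingWindow, that checks whether a time falls inside such a window. It assumes a window like 23:00 to 06:00 wraps past midnight and treats the stop time as exclusive. Confirm both points against your server's documentation.

```javascript
// Parse "HH:MM" into minutes after midnight.
function toMinutes(hhmm) {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

// True if `now` ("HH:MM") falls inside [start, stop). When start > stop,
// the window is assumed to wrap past midnight (e.g. 23:00 to 06:00).
function inBalancingWindow(now, { start, stop }) {
  const t = toMinutes(now);
  const s = toMinutes(start);
  const e = toMinutes(stop);
  return s <= e ? t >= s && t < e : t >= s || t < e;
}

// Setting the window itself is done in mongosh, for example:
// db.getSiblingDB("config").settings.updateOne(
//   { _id: "balancer" },
//   { $set: { activeWindow: { start: "23:00", stop: "06:00" } } },
//   { upsert: true }
// );
const nightly = { start: "23:00", stop: "06:00" };
console.log(inBalancingWindow("02:30", nightly)); // true
console.log(inBalancingWindow("12:00", nightly)); // false
```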
Security
For sharded clusters running with Role-Based Access Control, authenticate as a user with at least the clusterManager role on the admin database.
Procedure
You must be connected to a mongos to create zones and zone ranges. You cannot create zones or zone ranges by connecting directly to a shard.
Disable the Balancer (Optional)
To reduce performance impacts, you can disable the balancer on the collection so that no migrations take place while you configure the new zones.
Use sh.disableBalancing(), specifying the namespace of the collection, to stop the balancer.
sh.disableBalancing("chat.messages")
Use sh.isBalancerRunning() to check if the balancer process is currently running. Wait until any current balancing rounds have completed before proceeding.
Add each shard to the appropriate zone
Add each shard in the North American data center to the NA zone.
sh.addShardTag("<shard name>", "NA")
Add each shard in the European data center to the EU zone.
sh.addShardTag("<shard name>", "EU")
You can review the zones assigned to any given shard by running sh.status().
Define ranges for each zone
For shard key values where country : US, define a shard key range and associate it with the NA zone using the sh.addTagRange() method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the zone.
sh.addTagRange(
"chat.messages",
{ "country" : "US", "userid" : MinKey },
{ "country" : "US", "userid" : MaxKey },
"NA"
)
For shard key values where country : UK, define a shard key range and associate it with the EU zone using the sh.addTagRange() method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the zone.
sh.addTagRange(
"chat.messages",
{ "country" : "UK", "userid" : MinKey },
{ "country" : "UK", "userid" : MaxKey },
"EU"
)
For shard key values where country : DE, define a shard key range and associate it with the EU zone using the sh.addTagRange() method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the zone.
sh.addTagRange(
"chat.messages",
{ "country" : "DE", "userid" : MinKey },
{ "country" : "DE", "userid" : MaxKey },
"EU"
)
The MinKey and MaxKey values are reserved special values for comparisons. MinKey always compares as lower than every other possible value, while MaxKey always compares as higher than every other possible value. The configured ranges therefore capture every user for each country.
Both country : UK and country : DE are assigned to the EU zone. This associates any document with either UK or DE as the value for country with the EU data center.
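The three sh.addTagRange() calls above follow one pattern, so they can be generated from a country-to-zone map. The following plain JavaScript sketch builds the argument sets. zoneRangesFor is a hypothetical helper. The caller supplies the minKey and maxKey values: in mongosh these would be the shell's MinKey and MaxKey, as used in the calls above, and this offline example uses string placeholders.

```javascript
// Build the arguments for one sh.addTagRange() call per country.
// minKey/maxKey are supplied by the caller: in mongosh, the shell's MinKey
// and MaxKey; in this offline example, plain string placeholders.
function zoneRangesFor(namespace, countryZones, minKey, maxKey) {
  return Object.entries(countryZones).map(([country, zone]) => ({
    namespace,
    min: { country, userid: minKey },
    max: { country, userid: maxKey },
    zone,
  }));
}

const plan = zoneRangesFor(
  "chat.messages",
  { US: "NA", UK: "EU", DE: "EU" },
  "<MinKey>",
  "<MaxKey>"
);
console.log(plan.map((r) => `${r.min.country} -> ${r.zone}`));

// In mongosh, with real MinKey/MaxKey values, each entry could be applied as:
// plan.forEach((r) => sh.addTagRange(r.namespace, r.min, r.max, r.zone));
```

Keeping the mapping in one object makes the later zone update a matter of editing that object rather than hand-writing each call.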
Enable the Balancer (Optional)
If the balancer was disabled in previous steps, re-enable it at the end of this procedure to rebalance the cluster.
Use sh.enableBalancing(), specifying the namespace of the collection, to start the balancer.
sh.enableBalancing("chat.messages")
Use sh.isBalancerRunning() to check if the balancer process is currently running.
Review the Changes
The next time the balancer runs, it splits chunks where necessary and migrates chunks across the shards, respecting the configured zones.
Once balancing finishes:
- shards in the NA zone should only contain documents with country : US, and
- shards in the EU zone should only contain documents with country : UK or country : DE.
A document with a value for country other than US, UK, or DE can reside on any shard in the cluster.
To confirm the chunk distribution, run sh.status().
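The "should only contain" conditions above can be expressed as a check. The following plain JavaScript sketch is a hypothetical post-processing step. It assumes you have already gathered, per shard, the countries whose data that shard holds (for example, by reading off the chunk ranges that sh.status() reports). It then flags any shard that holds a zoned country outside its zones. The shard names and input data are illustrative.

```javascript
// Flag shards holding documents for a country whose zone they are not in.
// countryZones: country -> zone; shardZones: shard -> zones;
// shardCountries: shard -> countries observed on that shard (hypothetical input).
// Countries without a zone may live on any shard, so they are never flagged.
function findViolations(countryZones, shardZones, shardCountries) {
  const violations = [];
  for (const [shard, countries] of Object.entries(shardCountries)) {
    for (const country of countries) {
      const zone = countryZones[country];
      if (zone !== undefined && !(shardZones[shard] || []).includes(zone)) {
        violations.push({ shard, country, expectedZone: zone });
      }
    }
  }
  return violations;
}

const countryZones = { US: "NA", UK: "EU", DE: "EU" };
const shardZones = { shardNA1: ["NA"], shardEU1: ["EU"] };
// Mid-migration snapshot: UK data still sits on a North American shard.
const found = findViolations(countryZones, shardZones, {
  shardNA1: ["US", "UK", "FR"],
  shardEU1: ["UK", "DE"],
});
console.log(found.length); // 1 (UK data on shardNA1)
```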
Updating Zones
The application requires the following updates:
- Documents with country : UK must now be associated with the new UK data center. Any UK data in the EU data center must be migrated to the UK data center.
- The chat application now supports users in Mexico. Documents with country : MX must be routed to the NA data center.
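These requirements amount to a change in the country-to-zone mapping. The following plain JavaScript sketch diffs the old and new mappings to list which zone ranges to remove and which to add. zoneRangeChanges is a hypothetical helper. For this tutorial's change, its output corresponds to removing the UK range from EU, then adding a UK range to the UK zone and an MX range to NA.

```javascript
// Compare two country -> zone maps and list the zone-range changes needed:
// ranges to remove (country moved or dropped) and ranges to add (new or moved).
function zoneRangeChanges(oldMap, newMap) {
  const remove = [];
  const add = [];
  for (const [country, zone] of Object.entries(oldMap)) {
    if (newMap[country] !== zone) remove.push({ country, zone });
  }
  for (const [country, zone] of Object.entries(newMap)) {
    if (oldMap[country] !== zone) add.push({ country, zone });
  }
  return { remove, add };
}

const before = { US: "NA", UK: "EU", DE: "EU" };
const after = { US: "NA", UK: "UK", DE: "EU", MX: "NA" };
const changes = zoneRangeChanges(before, after);
console.log(changes.remove); // the UK range in EU
console.log(changes.add); // UK in UK, MX in NA
```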
Perform the following procedures to update the zone ranges.
Disable the Balancer (Optional)
To reduce performance impacts, you can disable the balancer on the collection so that no migrations take place while you configure the new zones or remove the old ones.
Use sh.disableBalancing(), specifying the namespace of the collection, to stop the balancer.
sh.disableBalancing("chat.messages")
Use sh.isBalancerRunning() to check if the balancer process is currently running. Wait until any current balancing rounds have completed before proceeding.
Add the new UK zone
Add each shard in the UK data center to the UK zone.
sh.addShardTag("<shard name>", "UK")
You can review the zones assigned to any given shard by running sh.status().
Remove the old zone range
Remove the old zone range associated with country : UK using the sh.removeTagRange() method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the zone.
sh.removeTagRange(
"chat.messages",
{ "country" : "UK", "userid" : MinKey },
{ "country" : "UK", "userid" : MaxKey },
"EU"
)
Add new zone ranges
For shard key values where country : UK, define a shard key range and associate it with the UK zone using the sh.addTagRange() method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the zone.
sh.addTagRange(
"chat.messages",
{ "country" : "UK", "userid" : MinKey },
{ "country" : "UK", "userid" : MaxKey },
"UK"
)
For shard key values where country : MX, define a shard key range and associate it with the NA zone using the sh.addTagRange() method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the zone.
sh.addTagRange(
"chat.messages",
{ "country" : "MX", "userid" : MinKey },
{ "country" : "MX", "userid" : MaxKey },
"NA"
)
The MinKey and MaxKey values are reserved special values for comparisons. MinKey always compares as lower than every other possible value, while MaxKey always compares as higher than every other possible value. This ensures the two ranges capture the entire possible value space of userid.
Enable the Balancer (Optional)
If the balancer was disabled in previous steps, re-enable it at the end of this procedure to rebalance the cluster.
Use sh.enableBalancing(), specifying the namespace of the collection, to start the balancer.
sh.enableBalancing("chat.messages")
Use sh.isBalancerRunning() to check if the balancer process is currently running.
Review the changes
The next time the balancer runs, it splits chunks where necessary and migrates chunks across the shards, respecting the configured zones.
Before balancing:
- shards in the EU zone only contain documents where country : DE or country : UK, and
- documents where country : MX could be stored on any shard in the sharded cluster.
After balancing:
- shards in the EU zone only contain documents where country : DE,
- shards in the UK zone only contain documents where country : UK, and
- shards in the NA zone only contain documents where country : US or country : MX.
A document with a value for country other than US, MX, UK, or DE can reside on any shard in the cluster.
To confirm the chunk distribution, run sh.status().