Segmenting Data by Application or Customer
In sharded clusters, you can create zones of sharded data based on the shard key. You can associate each zone with one or more shards in the cluster. A shard can associate with any number of zones. In a balanced cluster, MongoDB migrates chunks covered by a zone only to those shards associated with the zone.
If you define the zones and zone ranges before sharding an empty or non-existing collection, the shard collection operation creates chunks for the defined zone ranges, plus any additional chunks needed to cover the entire range of shard key values, and performs an initial chunk distribution based on the zone ranges. This initial creation and distribution of chunks allows for faster setup of zoned sharding. After the initial distribution, the balancer manages the chunk distribution going forward.
See Pre-Define Zones and Zone Ranges for an Empty or Non-Existing Collection for an example.
This tutorial shows you how to segment data using zones.
Consider the following scenarios where segmenting data by application or customer may be necessary:
A database serving multiple applications
A database serving multiple customers
A database that requires isolating ranges or subsets of application or customer data
A database that requires resource allocation for ranges or subsets of application or customer data
This diagram illustrates a sharded cluster using zones to segment data based on application or customer. This allows for data to be isolated to specific shards. Additionally, each shard can have specific hardware allocated to fit the performance requirement of the data stored on that shard.
Scenario
An application tracks the score of a user along with a client field, storing scores in the gamify database under the users collection. Each possible value of client requires its own zone to allow for data segmentation. It also allows the administrator to optimize the hardware for each shard associated with a client for performance and cost.
The following documents represent a partial view of two users:
{
"_id" : ObjectId("56f08c447fe58b2e96f595fa"),
"client" : "robot",
"userid" : 123,
"high_score" : 181,
...,
}
{
"_id" : ObjectId("56f08c447fe58b2e96f595fb"),
"client" : "fruitos",
"userid" : 456,
"high_score" : 210,
...,
}
Shard Key
The users collection uses the { client : 1, userid : 1 } compound index as the shard key.
The client field in each document allows creating a zone for each distinct client value.
The userid field provides a high cardinality and low frequency component to the shard key relative to client.
See Choosing a Shard Key for more general instructions on selecting a shard key.
Architecture
The application requires adding each shard to a zone associated with a specific client.
The sharded cluster deployment currently consists of four shards.
Zones
For this application, there are two client zones.
Robot client ("robot"): This zone represents all documents where client : robot.
FruitOS client ("fruitos"): This zone represents all documents where client : fruitos.
Write Operations
With zones, if an inserted or updated document matches a configured zone, it can only be written to a shard inside that zone.
MongoDB can write documents that do not match a configured zone to any shard in the cluster.
Read Operations
MongoDB can route queries to a specific shard if the query includes at least the client field.
For example, MongoDB can attempt a targeted read operation on the following query:
gamifyDB = db.getSiblingDB("gamify")
gamifyDB.users.find( { "client" : "robot" , "userid" : 123 } )
Queries without the client field perform broadcast operations.
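As a rough illustration of the rule above, the following plain JavaScript sketch checks whether a query filter contains the leading shard key field. The function name canTargetShards is hypothetical and the check is deliberately simplified; the real routing decision is made by mongos and considers more than the presence of a field.

```javascript
// Shard key fields in index order, as used by gamify.users in this tutorial.
const SHARD_KEY_FIELDS = ["client", "userid"];

// Hypothetical helper: returns true when the filter includes the leading
// shard key field, which is the minimum needed to narrow a query to a
// subset of shards. This is a simplification, not the mongos logic.
function canTargetShards(filter) {
  return Object.prototype.hasOwnProperty.call(filter, SHARD_KEY_FIELDS[0]);
}

console.log(canTargetShards({ client: "robot", userid: 123 })); // true
console.log(canTargetShards({ client: "robot" }));              // true
console.log(canTargetShards({ userid: 123 }));                  // false
```

A filter on userid alone skips the leading field of the compound key, so the sketch classifies it as a broadcast query.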
Balancer
The balancer migrates chunks to the appropriate shard respecting any configured zones. Until the migration, shards may contain chunks that violate configured zones. Once balancing completes, shards should only contain chunks whose ranges do not violate their assigned zones.
Adding or removing zones or zone ranges can result in chunk migrations. Depending on the size of your data set and the number of chunks a zone or zone range affects, these migrations may impact cluster performance. Consider running your balancer during specific scheduled windows. See Schedule the Balancing Window for a tutorial on how to set a scheduling window.
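A minimal sketch of setting such a window, assuming the window is stored as an activeWindow document with start and stop times on the balancer entry of the settings collection in the config database. The helper name setBalancingWindow and the 23:00 to 06:00 window are illustrative choices only. The function takes the config database handle as a parameter, so it does nothing until you call it from a shell connected to a mongos.

```javascript
// Hypothetical helper: stores a balancing window on the balancer settings
// document. `configDb` is expected to be a handle to the config database,
// for example db.getSiblingDB("config") in a shell connected to a mongos.
function setBalancingWindow(configDb, start, stop) {
  // start and stop are "HH:MM" strings.
  return configDb.settings.updateOne(
    { _id: "balancer" },
    { $set: { activeWindow: { start: start, stop: stop } } },
    { upsert: true }
  );
}

// Example call (illustrative times), run against a mongos:
// setBalancingWindow(db.getSiblingDB("config"), "23:00", "06:00")
```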
Security
For sharded clusters running with Role-Based Access Control, authenticate as a user with at least the clusterManager role on the admin database.
Procedure
You must be connected to a mongos associated with the target sharded cluster to proceed. You cannot create zones or zone ranges by connecting directly to a shard.
Disable the Balancer
The balancer must be disabled on the collection to ensure no migrations take place while configuring the new zones.
Use sh.disableBalancing(), specifying the namespace of the collection, to stop the balancer.
sh.disableBalancing("gamify.users")
Use sh.isBalancerRunning() to check if the balancer process is currently running. Wait until any current balancing rounds have completed before proceeding.
Add each shard to the appropriate zone
Add shard0000 to the robot zone.
sh.addShardTag("shard0000", "robot")
Add shard0001 to the robot zone.
sh.addShardTag("shard0001", "robot")
Add shard0002 to the fruitos zone.
sh.addShardTag("shard0002", "fruitos")
Add shard0003 to the fruitos zone.
sh.addShardTag("shard0003", "fruitos")
Run sh.status() to review the zones configured for the sharded cluster.
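The four assignments above can also be driven from a single mapping, which keeps the shard-to-zone layout in one place. This is only a sketch: the helper name assignZones and the SHARD_ZONES table are hypothetical, and the function receives the shell's sh helper as an argument so that nothing runs until it is called from a shell connected to a mongos.

```javascript
// Shard-to-zone layout used in this tutorial.
const SHARD_ZONES = {
  shard0000: "robot",
  shard0001: "robot",
  shard0002: "fruitos",
  shard0003: "fruitos",
};

// Hypothetical helper: calls addShardTag once per shard in the mapping.
// `shellHelper` is the sh object available in a shell connected to a mongos.
function assignZones(shellHelper, shardZones) {
  for (const [shard, zone] of Object.entries(shardZones)) {
    shellHelper.addShardTag(shard, zone);
  }
}

// Example call, run against a mongos:
// assignZones(sh, SHARD_ZONES)
```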
Define ranges for each zone
Define the range for the robot client and associate it with the robot zone using the sh.addTagRange() method.
This method requires:
The full namespace of the target collection
The inclusive lower bound of the range
The exclusive upper bound of the range
The name of the zone
sh.addTagRange(
"gamify.users",
{ "client" : "robot", "userid" : MinKey },
{ "client" : "robot", "userid" : MaxKey },
"robot"
)
Define the range for the fruitos client and associate it with the fruitos zone using the sh.addTagRange() method. The method takes the same four arguments: the full namespace of the target collection, the inclusive lower bound of the range, the exclusive upper bound of the range, and the name of the zone.
sh.addTagRange(
"gamify.users",
{ "client" : "fruitos", "userid" : MinKey },
{ "client" : "fruitos", "userid" : MaxKey },
"fruitos"
)
The MinKey and MaxKey values are reserved special values for comparisons. MinKey always compares as lower than every other possible value, while MaxKey always compares as higher than every other possible value. The configured ranges capture every user for each client.
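To make the range semantics concrete, the sketch below models the two zone ranges in plain JavaScript and looks up the zone for a document. It is an illustration only: MIN_KEY and MAX_KEY are stand-in sentinels rather than the shell's MinKey and MaxKey objects, compareValues handles only numbers and strings of the same type, and zoneFor is a hypothetical helper, not a MongoDB API.

```javascript
// Stand-in sentinels for this sketch (not the shell's MinKey / MaxKey).
const MIN_KEY = Symbol("MinKey");
const MAX_KEY = Symbol("MaxKey");

// Compare two field values; the sentinels sort below / above everything.
function compareValues(a, b) {
  if (a === b) return 0;
  if (a === MIN_KEY || b === MAX_KEY) return -1;
  if (a === MAX_KEY || b === MIN_KEY) return 1;
  return a < b ? -1 : 1;
}

// Compare compound shard keys field by field: client first, then userid.
function compareKeys(x, y) {
  const byClient = compareValues(x.client, y.client);
  return byClient !== 0 ? byClient : compareValues(x.userid, y.userid);
}

// The two ranges defined in this tutorial: lower bound inclusive,
// upper bound exclusive.
const ZONE_RANGES = [
  { zone: "robot",
    min: { client: "robot", userid: MIN_KEY },
    max: { client: "robot", userid: MAX_KEY } },
  { zone: "fruitos",
    min: { client: "fruitos", userid: MIN_KEY },
    max: { client: "fruitos", userid: MAX_KEY } },
];

// Hypothetical helper: returns the zone whose range covers the document's
// shard key, or null when no configured range matches.
function zoneFor(doc) {
  const match = ZONE_RANGES.find(
    (r) => compareKeys(r.min, doc) <= 0 && compareKeys(doc, r.max) < 0
  );
  return match ? match.zone : null;
}

console.log(zoneFor({ client: "robot", userid: 123 }));   // robot
console.log(zoneFor({ client: "fruitos", userid: 456 })); // fruitos
console.log(zoneFor({ client: "other", userid: 789 }));   // null
```

A document whose client value has no configured range matches no zone in this model, which corresponds to the case where MongoDB can write the document to any shard in the cluster.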
Enable the Balancer
Re-enable the balancer to rebalance the cluster.
Use sh.enableBalancing(), specifying the namespace of the collection, to start the balancer.
sh.enableBalancing("gamify.users")
Use sh.isBalancerRunning() to check if the balancer process is currently running.
Review the changes
The next time the balancer runs, it migrates data across the shards respecting the configured zones.
Once balancing finishes, the shards in the robot zone only contain documents with client : robot, while shards in the fruitos zone only contain documents with client : fruitos.
You can confirm the chunk distribution by running sh.status().