Distribute Collections Using Zones

In sharded clusters, you can create zones of sharded data based on the shard key. You can associate each zone with one or more shards in the cluster. A shard can associate with any number of zones. In a balanced cluster, MongoDB migrates chunks covered by a zone only to those shards associated with the zone.

You can use zone sharding to distribute collections across a sharded cluster and designate which shards store data for each collection. You can distribute collections based on shard properties, such as physical resources and available memory, to ensure that each collection is stored on the optimal shard for that data.

Prerequisites

To complete this tutorial, you must:

  • Deploy a sharded cluster. This tutorial uses a sharded cluster with three shards.
  • Connect to a mongos. You cannot create zones or zone ranges by connecting directly to a shard.
  • Authenticate as a user with at least the clusterManager role on the admin database. To view user permissions, use the db.getUser() method.
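
The role check in the last prerequisite can be sketched as follows. This is a minimal illustration, not part of the original tutorial: the helper hasRole and the user name "clusterOperator" are hypothetical, and the helper inspects only directly granted roles, so a user who gets the same privileges through a broader role would not be detected. The typeof guard only lets the snippet load outside mongosh.

```javascript
// Returns true when the user document lists the given role on the given database.
// Only directly granted roles are inspected; roles inherited through other
// roles are not expanded here.
function hasRole(userDoc, role, dbName) {
  if (!userDoc || !Array.isArray(userDoc.roles)) {
    return false;
  }
  return userDoc.roles.some((r) => r.role === role && r.db === dbName);
}

// Inside mongosh, `db` is defined. "clusterOperator" is a placeholder user name.
if (typeof db !== "undefined") {
  const userDoc = db.getSiblingDB("admin").getUser("clusterOperator");
  print(hasRole(userDoc, "clusterManager", "admin"));
}
```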

Scenario

You have a database called shardDistributionDB that contains two sharded collections:

  • bigData, which contains a large amount of data.
  • manyIndexes, which contains many large indexes.

You want to limit each collection to a subset of shards so that each collection can use the shards' different physical resources.

Architecture

The sharded cluster has three shards. Each shard has unique physical resources:

Shard Name | Physical Resources
---------- | ------------------
shard0     | High memory capacity
shard1     | Fast flash storage
shard2     | High memory capacity and fast flash storage

Zones

To distribute collections based on physical resources, use shard zones. A shard zone associates collections with a specific subset of shards, which restricts the shards that store the collection's data. In this example, you need two shard zones:

Zone Name | Description | Collections in this Zone
--------- | ----------- | ------------------------
HI_RAM    | Servers with high memory capacity. | Collections requiring more memory, such as collections with large indexes, should be on the HI_RAM shards.
FLASH     | Servers with flash drives for fast storage speeds. | Large collections requiring fast data retrieval should be on the FLASH shards.

Shard Key

In this tutorial, the shard key you will use to shard each collection is { _id: "hashed" }. You will configure shard zones before you shard the collections. As a result, each collection's data only ever exists on the shards in the corresponding zone.

With hashed sharding, if you shard collections before you configure zones, MongoDB assigns chunks evenly across all shards when sharding is enabled. This means that chunks may be temporarily assigned to a shard poorly suited to handle that chunk's data.

Balancer

The balancer migrates chunks to the appropriate shard, respecting any configured zones. When balancing is complete, each shard contains only chunks whose ranges match its assigned zones.

Important

Performance

Adding, removing, or changing zones or zone ranges can result in chunk migrations. Depending on the size of your dataset and the number of chunks a zone or zone range affects, these migrations may impact cluster performance. Consider running the balancer during specific scheduled windows. To learn how to set a scheduling window, see Schedule the Balancing Window.
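
As a rough sketch of what a scheduled window looks like, the snippet below stores an activeWindow on the balancer document in the config database. The helper balancerWindowUpdate and the 23:00 to 06:00 window are illustrative assumptions, not values from this tutorial; choose a window that suits your workload and check the Schedule the Balancing Window page before applying it. The typeof guard only lets the snippet load outside mongosh.

```javascript
// Builds the update document that sets the balancer's active window.
// Times are "HH:MM" strings in 24-hour format.
function balancerWindowUpdate(start, stop) {
  const hhmm = /^([01]\d|2[0-3]):[0-5]\d$/;
  if (!hhmm.test(start) || !hhmm.test(stop)) {
    throw new Error("start and stop must be HH:MM in 24-hour format");
  }
  return { $set: { activeWindow: { start: start, stop: stop } } };
}

// Inside mongosh, connected to a mongos, apply the example window.
if (typeof db !== "undefined") {
  db.getSiblingDB("config").settings.updateOne(
    { _id: "balancer" },
    balancerWindowUpdate("23:00", "06:00"), // example window only
    { upsert: true }
  );
}
```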

Steps

Use the following procedure to configure shard zones and distribute collections based on shard physical resources.

1. Add each shard to the appropriate zone.

To configure the shards in each zone, use the sh.addShardToZone() method, which wraps the addShardToZone command.

Add shard0 and shard2 to the HI_RAM zone:

sh.addShardToZone("shard0", "HI_RAM")

sh.addShardToZone("shard2", "HI_RAM")

Add shard1 and shard2 to the FLASH zone:

sh.addShardToZone("shard1", "FLASH")

sh.addShardToZone("shard2", "FLASH")
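
The four calls above can also be driven from a single mapping, which makes the intended zone membership easier to review. This is only a sketch: the shardZones object and the shardsInZone helper are illustrative names, not part of the tutorial, and the typeof guard only lets the snippet load outside mongosh.

```javascript
// Zone membership used in this tutorial: shard name -> zones.
const shardZones = {
  shard0: ["HI_RAM"],
  shard1: ["FLASH"],
  shard2: ["HI_RAM", "FLASH"],
};

// Returns the shards associated with a zone, in declaration order.
function shardsInZone(zone) {
  return Object.keys(shardZones).filter((shard) =>
    shardZones[shard].includes(zone)
  );
}

// Inside mongosh, connected to a mongos, apply the whole mapping.
if (typeof sh !== "undefined") {
  for (const [shard, zones] of Object.entries(shardZones)) {
    for (const zone of zones) {
      sh.addShardToZone(shard, zone);
    }
  }
}
```

With this mapping, shardsInZone("FLASH") lists shard1 and shard2, and shardsInZone("HI_RAM") lists shard0 and shard2.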

2. Add zone ranges for the relevant collections.

To associate a range of shard keys to a zone, use sh.updateZoneKeyRange().

In this scenario, you want to associate all documents in a collection to the appropriate zone. To associate all collection documents to a zone, specify the following zone range:

  • a lower bound of { "_id" : MinKey }
  • an upper bound of { "_id" : MaxKey }

For the bigData collection, set:

  • The namespace to shardDistributionDB.bigData
  • The lower bound to MinKey
  • The upper bound to MaxKey
  • The zone to FLASH

sh.updateZoneKeyRange(
  "shardDistributionDB.bigData",
  { "_id" : MinKey },
  { "_id" : MaxKey },
  "FLASH"
)

For the manyIndexes collection, set:

  • The namespace to shardDistributionDB.manyIndexes
  • The lower bound to MinKey
  • The upper bound to MaxKey
  • The zone to HI_RAM

sh.updateZoneKeyRange(
  "shardDistributionDB.manyIndexes",
  { "_id" : MinKey },
  { "_id" : MaxKey },
  "HI_RAM"
)
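
To confirm that both ranges were recorded, one option is to read the zone range documents from the tags collection in the config database and group them by namespace. This is a sketch under that assumption: the zonesByNamespace helper is an illustrative name, and the typeof guard only lets the snippet load outside mongosh.

```javascript
// Maps each namespace to the distinct zones that have a range defined on it.
// Each input document is expected to carry `ns` and `tag` fields.
function zonesByNamespace(tagDocs) {
  const result = {};
  for (const doc of tagDocs) {
    if (!result[doc.ns]) {
      result[doc.ns] = [];
    }
    if (!result[doc.ns].includes(doc.tag)) {
      result[doc.ns].push(doc.tag);
    }
  }
  return result;
}

// Inside mongosh, connected to a mongos, summarize the stored zone ranges.
if (typeof db !== "undefined") {
  const tagDocs = db.getSiblingDB("config").tags.find().toArray();
  printjson(zonesByNamespace(tagDocs));
}
```

After this step, the summary should show FLASH for shardDistributionDB.bigData and HI_RAM for shardDistributionDB.manyIndexes.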

3. Shard the collections.

To shard both collections (bigData and manyIndexes), specify a shard key of { _id: "hashed" }.

Run the following commands:

sh.shardCollection(
"shardDistributionDB.bigData", { _id: "hashed" }
)

sh.shardCollection(
"shardDistributionDB.manyIndexes", { _id: "hashed" }
)

4. Review the changes.

To view chunk distribution and shard zones, use the sh.status() method:

sh.status()

The next time the balancer runs, it splits chunks where necessary and migrates chunks across the shards, respecting the configured zones. The amount of time the balancer takes to complete depends on several factors, including the number of shards, available memory, and IOPS.

When balancing finishes:

  • Chunks for documents in the manyIndexes collection reside on shard0 and shard2.
  • Chunks for documents in the bigData collection reside on shard1 and shard2.
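
The expected end state can be written down as a small check to run against the shard names you read from the sh.status() output. This is only a sketch: expectedShards and shardsOutsideZone are illustrative names, and the list of shards holding chunks must be supplied by you.

```javascript
// Expected placement once balancing finishes, per the list above.
const expectedShards = {
  manyIndexes: ["shard0", "shard2"],
  bigData: ["shard1", "shard2"],
};

// Returns the shards that hold chunks for a collection but are not in its zone.
// `shardsWithChunks` is the list of shard names observed in sh.status().
function shardsOutsideZone(collection, shardsWithChunks) {
  const allowed = expectedShards[collection] || [];
  return shardsWithChunks.filter((shard) => !allowed.includes(shard));
}
```

An empty result means the observed placement matches the zone. A non-empty result may only mean that the balancer has not finished yet.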

Learn More

To learn more about sharding and balancing, see the following pages: