Tiered Hardware for Varying SLA or SLO

In sharded clusters, you can create zones of sharded data based on the shard key. You can associate each zone with one or more shards in the cluster. A shard can associate with any number of zones. In a balanced cluster, MongoDB migrates chunks covered by a zone only to those shards associated with the zone.

Tip

Changed in version 4.0.3.

If you define the zones and the zone ranges before sharding an empty or non-existing collection, the shard collection operation creates chunks for the defined zone ranges, plus any additional chunks needed to cover the entire range of shard key values, and performs an initial chunk distribution based on the zone ranges. This initial creation and distribution of chunks allows for faster setup of zoned sharding. After the initial distribution, the balancer manages the chunk distribution going forward. See Pre-Define Zones and Zone Ranges for an Empty or Non-Existing Collection for an example.

This tutorial uses zones to route documents based on creation date either to shards zoned for supporting recent documents, or to shards zoned for supporting archived documents.

The following are some example use cases for segmenting data based on Service Level Agreement (SLA) or Service Level Objective (SLO):

The following diagram illustrates a sharded cluster that uses hardware-based zones to satisfy data access SLAs or SLOs.

Diagram of sharded cluster architecture for tiered SLA

Scenario

A photo sharing application requires fast access to photos uploaded within the last 6 months. The application stores the location of each photo along with its metadata in the data collection of the photoshare database.

The following documents represent photos uploaded by a single user:

{
  "_id" : 10003010,
  "creation_date" : ISODate("2012-12-19T06:01:17.171Z"),
  "userid" : 123,
  "photo_location" : "example.net/storage/usr/photo_1.jpg"
}
{
  "_id" : 10003011,
  "creation_date" : ISODate("2013-12-19T06:01:17.171Z"),
  "userid" : 123,
  "photo_location" : "example.net/storage/usr/photo_2.jpg"
}
{
  "_id" : 10003012,
  "creation_date" : ISODate("2016-01-19T06:01:17.171Z"),
  "userid" : 123,
  "photo_location" : "example.net/storage/usr/photo_3.jpg"
}

Note that only the document with _id : 10003012 was uploaded within the past year (as of June 2016).

Shard Key

The photoshare.data collection uses the { creation_date : 1 } index as the shard key.

The creation_date field in each document allows for creating zones based on the creation date.

Architecture

The sharded cluster deployment currently consists of three shards.

Diagram of sharded cluster architecture for tiered SLA

Zones

The application requires adding each shard to a zone based on its hardware tier. Each hardware tier represents a specific hardware configuration designed to satisfy a given SLA or SLO.

Diagram of sharded cluster architecture for tiered SLA

Fast Tier ("recent")

These are the fastest performing machines, with large amounts of RAM, fast SSD disks, and powerful CPUs.

The zone requires a range with:

  • a lower bound of { creation_date : ISODate(YYYY-mm-dd) }, where the year, month, and day specified by YYYY-mm-dd are within the last 6 months.
  • an upper bound of { creation_date : MaxKey }.

Archival Tier ("archive")

These machines use less RAM, slower disks, and more basic CPUs. However, they have a greater amount of storage per server.

The zone requires a range with:

  • a lower bound of { creation_date : MinKey }.
  • an upper bound of { creation_date : ISODate(YYYY-mm-dd) }, where the year, month, and day match the values used for the recent tier's lower bound.

Note

The MinKey and MaxKey values are reserved special values for comparisons.
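
The boundary shared by the two tiers can be derived from the current date. The following is a minimal sketch in plain JavaScript, which also runs in the shell. The zoneCutoff helper is hypothetical, and snapping the boundary to the first day of a month is an assumption made here for illustration, not something the tutorial prescribes.

```javascript
// Hypothetical helper: returns the first day (UTC) of the month that lies
// `months` months before `now`. Snapping to a month start is an assumption.
function zoneCutoff(now, months) {
  // Date.UTC normalizes a negative month index into the previous year.
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() - months, 1));
}

// Example: six months before mid-July 2016.
const cutoff = zoneCutoff(new Date("2016-07-15T00:00:00Z"), 6);
console.log(cutoff.toISOString()); // 2016-01-01T00:00:00.000Z
```

The resulting date serves as both the lower bound of the recent zone and the upper bound of the archive zone.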

As performance needs increase, adding shards and associating them with the appropriate zone based on their hardware tier allows the cluster to scale horizontally.

When defining zone ranges based on time spans, weigh the benefits of infrequent updates to the zone ranges against the amount of data that must be migrated on an update. For example, setting a limit of 1 year for data to be considered 'recent' likely covers more data than setting a limit of 1 month. While rotating on a 1 month scale requires more migrations, the number of documents that must be migrated on each rotation is lower than when rotating on a 1 year scale.
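
The trade-off can be made concrete with some rough arithmetic. The sketch below assumes a constant upload rate, so that each rotation moves one rotation interval's worth of documents from the recent zone to the archive zone. The rotationCost helper and the figure of 100000 uploads per month are hypothetical.

```javascript
// Assumption: uploads arrive at a constant rate, so every rotation moves
// exactly one interval's worth of documents out of the recent zone.
function rotationCost(docsPerMonth, intervalMonths) {
  return {
    rotationsPerYear: 12 / intervalMonths,
    docsMovedPerRotation: docsPerMonth * intervalMonths,
  };
}

// Monthly rotation: 12 rotations a year, each moving 100000 documents.
console.log(rotationCost(100000, 1));
// Yearly rotation: 1 rotation a year, moving 1200000 documents at once.
console.log(rotationCost(100000, 12));
```

Under this assumption the yearly total is the same either way; what changes is how large each burst of migrations is.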

Write Operations

With zones, if an inserted or updated document matches a configured zone, it can only be written to a shard inside that zone.

MongoDB can write documents that do not match a configured zone to any shard in the cluster.

Note

The behavior described above requires the cluster to be in a steady state with no chunks violating a configured zone. See the following section on the balancer for more information.
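
To illustrate how a document's shard key value maps to a zone, here is a minimal sketch in plain JavaScript. It uses the zone ranges configured in the procedure of this tutorial and the sample documents shown earlier. The zoneForDate helper is hypothetical, and -Infinity and Infinity stand in for MinKey and MaxKey so that the sketch runs outside the shell.

```javascript
// Zone ranges from this tutorial. -Infinity and Infinity stand in for MinKey
// and MaxKey (an assumption of this example).
const zoneRanges = [
  { zone: "archive", min: -Infinity, max: Date.parse("2016-01-01T00:00:00Z") },
  { zone: "recent", min: Date.parse("2016-01-01T00:00:00Z"), max: Infinity },
];

// Returns the zone whose range contains the date, or null if none matches.
// Lower bounds are inclusive and upper bounds are exclusive.
function zoneForDate(creationDate, ranges) {
  const t = creationDate.getTime();
  const hit = ranges.find((r) => t >= r.min && t < r.max);
  return hit ? hit.zone : null;
}

console.log(zoneForDate(new Date("2012-12-19T06:01:17.171Z"), zoneRanges)); // archive
console.log(zoneForDate(new Date("2016-01-19T06:01:17.171Z"), zoneRanges)); // recent
```

A null result corresponds to a document that matches no configured zone, which MongoDB can write to any shard.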

Read Operations

MongoDB can route queries to a specific shard if the query includes the shard key.

For example, MongoDB can attempt a targeted read operation on the following query because it includes creation_date in the query document:

photoDB = db.getSiblingDB("photoshare")
photoDB.data.find( { "creation_date" : ISODate("2015-01-01") } )

If the requested document falls within the recent zone range, MongoDB routes this query to the shards inside that zone, ensuring a faster read compared to a cluster-wide broadcast read operation.
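
As a rough model of which tier serves a query, the sketch below checks which zone ranges an inclusive range of creation_date values overlaps. It is a simplification: mongos targets shards by chunk ranges, not by zone ranges directly. The zonesForQuery helper and zoneTable are hypothetical names, the boundary is the one configured in the procedure of this tutorial, and -Infinity and Infinity stand in for MinKey and MaxKey.

```javascript
// Zone ranges with inclusive lower and exclusive upper bounds.
const zoneTable = [
  { zone: "archive", min: -Infinity, max: Date.parse("2016-01-01T00:00:00Z") },
  { zone: "recent", min: Date.parse("2016-01-01T00:00:00Z"), max: Infinity },
];

// Returns the zones whose range overlaps the inclusive interval [from, to].
// Pass the same date twice to model an equality match.
function zonesForQuery(from, to, table) {
  const lo = from.getTime();
  const hi = to.getTime();
  return table.filter((r) => lo < r.max && hi >= r.min).map((r) => r.zone);
}

const point = new Date("2015-01-01T00:00:00Z");
console.log(zonesForQuery(point, point, zoneTable)); // [ 'archive' ]
```

With a boundary of 2016-01-01, the example query for 2015-01-01 lands in the archive zone, while a query spanning the boundary involves both zones.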

Balancer

The balancer migrates chunks to the appropriate shard respecting any configured zones. Until the migration, shards may contain chunks that violate configured zones. Once balancing completes, shards should only contain chunks whose ranges do not violate their assigned zones.

Adding or removing zones or zone ranges can result in chunk migrations. Depending on the size of your data set and the number of chunks a zone or zone range affects, these migrations may impact cluster performance. Consider running your balancer during specific scheduled windows. See Schedule the Balancing Window for a tutorial on how to set a scheduling window.
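
A balancing window is stored as an activeWindow setting on the balancer document in the settings collection of the config database. The following sketch wraps that update in a hypothetical setBalancingWindow helper. The database handle is passed in as a parameter, and the time-format check is an addition made for this example.

```javascript
// Hypothetical helper. `configDB` is assumed to be a handle to the `config`
// database, obtained from a shell connected to a mongos.
function setBalancingWindow(configDB, start, stop) {
  // Times are HH:MM in 24-hour format.
  const hhmm = /^([01]\d|2[0-3]):[0-5]\d$/;
  if (!hhmm.test(start) || !hhmm.test(stop)) {
    throw new Error("start and stop must be HH:MM in 24-hour time");
  }
  return configDB.settings.updateOne(
    { _id: "balancer" },
    { $set: { activeWindow: { start: start, stop: stop } } },
    { upsert: true }
  );
}
```

In the shell, a call such as setBalancingWindow(db.getSiblingDB("config"), "23:00", "06:00") would restrict balancing to an overnight window; the times shown are placeholders.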

Security

For sharded clusters running with Role-Based Access Control, authenticate as a user with at least the clusterManager role on the admin database.

Procedure

You must be connected to a mongos to create zones or zone ranges. You cannot create zones or zone ranges by connecting directly to a shard.

1

Disable the Balancer

The balancer must be disabled on the collection to ensure no migrations take place while configuring the new zones.

Use sh.disableBalancing(), specifying the namespace of the collection, to stop the balancer.

sh.disableBalancing("photoshare.data")

Use sh.isBalancerRunning() to check if the balancer process is currently running. Wait until any current balancing rounds have completed before proceeding.
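
The waiting step can be scripted as a polling loop. The sketch below is one possible approach, not part of the official procedure. The waitForBalancerIdle helper is hypothetical, and it assumes that sh.isBalancerRunning() returns either a boolean (older shells) or a status document with an inBalancerRound field (mongosh).

```javascript
// Hypothetical helper. `sh` is the shell's sharding helper object and
// `sleepMs` is a function that pauses for the given number of milliseconds.
// Both are passed in so that the sketch relies on no hidden globals.
function waitForBalancerIdle(sh, sleepMs, maxChecks) {
  for (let i = 0; i < maxChecks; i++) {
    const status = sh.isBalancerRunning();
    // Assumption: a boolean in older shells, a status document in mongosh.
    const running =
      typeof status === "boolean" ? status : Boolean(status && status.inBalancerRound);
    if (!running) return true;
    sleepMs(1000);
  }
  return false; // still running after maxChecks polls
}
```

In the shell, waitForBalancerIdle(sh, sleep, 60) would poll for up to about a minute and return false if a balancing round is still in progress.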

2

Add each shard to the appropriate zone

Add shard0000 to the recent zone.

sh.addShardTag("shard0000", "recent")

Add shard0001 to the recent zone.

sh.addShardTag("shard0001", "recent")

Add shard0002 to the archive zone.

sh.addShardTag("shard0002", "archive")

You can review the zone assigned to any given shard by running sh.status().
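
When a cluster has many shards, the assignments can be driven from a single mapping. The sketch below is a hypothetical convenience wrapper around the same sh.addShardTag() call used above; the assignZones name and the mapping shape are assumptions made for this example.

```javascript
// Hypothetical helper: `zoneByShard` maps a shard name to its zone name.
// `sh` is passed in so that the function can be exercised outside the shell.
function assignZones(sh, zoneByShard) {
  for (const [shard, zone] of Object.entries(zoneByShard)) {
    sh.addShardTag(shard, zone);
  }
}
```

For the three shards in this tutorial, the call would be assignZones(sh, { shard0000: "recent", shard0001: "recent", shard0002: "archive" }).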

3

Define ranges for each zone

Define the range for recent photos and associate it with the recent zone using the sh.addTagRange() method. This method requires:

  • the full namespace of the target collection.
  • the inclusive lower bound of the range.
  • the exclusive upper bound of the range.
  • the zone.

sh.addTagRange(
  "photoshare.data",
  { "creation_date" : ISODate("2016-01-01") },
  { "creation_date" : MaxKey }, 
  "recent"
)

Define the range for older photos and associate it with the archive zone using the sh.addTagRange() method. This method requires:

  • the full namespace of the target collection.
  • the inclusive lower bound of the range.
  • the exclusive upper bound of the range.
  • the zone.

sh.addTagRange(
  "photoshare.data",
  { "creation_date" : MinKey },
  { "creation_date" : ISODate("2016-01-01") }, 
  "archive"
)

MinKey and MaxKey are reserved special values for comparisons.
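
Because both ranges share one boundary, they can be defined together from a single cutoff value. The following sketch shows one way to do that. The defineTierRanges helper is hypothetical, and the shell object and the MinKey and MaxKey values are passed in as parameters so that the function has no hidden dependencies.

```javascript
// Hypothetical helper: defines the archive and recent ranges around `cutoff`.
function defineTierRanges(sh, ns, cutoff, minKey, maxKey) {
  // The archive range ends exactly where the recent range begins. Because the
  // lower bound is inclusive and the upper bound is exclusive, the two ranges
  // neither overlap nor leave a gap.
  sh.addTagRange(ns, { creation_date: minKey }, { creation_date: cutoff }, "archive");
  sh.addTagRange(ns, { creation_date: cutoff }, { creation_date: maxKey }, "recent");
}
```

In the shell, defineTierRanges(sh, "photoshare.data", ISODate("2016-01-01"), MinKey, MaxKey) would issue the same two calls shown above.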

4

Enable the Balancer

Re-enable the balancer to rebalance the cluster.

Use sh.enableBalancing(), specifying the namespace of the collection, to start the balancer.

sh.enableBalancing("photoshare.data")

Use sh.isBalancerRunning() to check if the balancer process is currently running.

5

Review the changes

The next time the balancer runs, it splits and migrates chunks across the shards respecting configured zones.

Once balancing finishes, the shards in the recent zone should only contain documents with creation_date greater than or equal to ISODate("2016-01-01"), while shards in the archive zone should only contain documents with creation_date less than ISODate("2016-01-01").

You can confirm the chunk distribution by running sh.status().

Updating Zone Ranges

To update the zone ranges, perform the following operations as part of a cron job or other scheduled procedure:

1

Disable the Balancer

The balancer must be disabled on the collection to ensure no migrations take place while configuring the new zones.

Use sh.disableBalancing(), specifying the namespace of the collection, to stop the balancer.

sh.disableBalancing("photoshare.data")

Use sh.isBalancerRunning() to check if the balancer process is currently running. Wait until any current balancing rounds have completed before proceeding.

2

Remove the old shard zone ranges

Remove the old recent zone range using the sh.removeTagRange() method. This method requires:

  • the full namespace of the target collection.
  • the inclusive lower bound of the range.
  • the exclusive upper bound of the range.
  • the zone.

sh.removeTagRange(
  "photoshare.data",
  { "creation_date" : ISODate("2016-01-01") },
  { "creation_date" : MaxKey }, 
  "recent"
)

Remove the old archive zone range using the sh.removeTagRange() method. This method requires:

  • the full namespace of the target collection.
  • the inclusive lower bound of the range.
  • the exclusive upper bound of the range.
  • the zone.

sh.removeTagRange(
  "photoshare.data",
  { "creation_date" : MinKey },
  { "creation_date" : ISODate("2016-01-01") }, 
  "archive"
)

MinKey and MaxKey are reserved special values for comparisons.

3

Add the new zone range for each zone

Define the range for recent photos and associate it with the recent zone using the sh.addTagRange() method. This method requires:

  • the full namespace of the target collection.
  • the inclusive lower bound of the range.
  • the exclusive upper bound of the range.
  • the zone.

sh.addTagRange(
  "photoshare.data",
  { "creation_date" : ISODate("2016-06-01") },
  { "creation_date" : MaxKey }, 
  "recent"
)

Define the range for older photos and associate it with the archive zone using the sh.addTagRange() method. This method requires:

  • the full namespace of the target collection.
  • the inclusive lower bound of the range.
  • the exclusive upper bound of the range.
  • the zone.

sh.addTagRange(
  "photoshare.data",
  { "creation_date" : MinKey },
  { "creation_date" : ISODate("2016-06-01") }, 
  "archive"
)

MinKey and MaxKey are reserved special values for comparisons.

4

Enable the Balancer

Re-enable the balancer to rebalance the cluster.

Use sh.enableBalancing(), specifying the namespace of the collection, to start the balancer.

sh.enableBalancing("photoshare.data")

Use sh.isBalancerRunning() to check if the balancer process is currently running.

5

Review the changes

The next time the balancer runs, it splits chunks where necessary and migrates chunks across the shards respecting the configured zones.

Before balancing, the shards in the recent zone only contained documents with creation_date greater than or equal to ISODate("2016-01-01"), while shards in the archive zone only contained documents with creation_date less than ISODate("2016-01-01").

Once balancing finishes, the shards in the recent zone should only contain documents with creation_date greater than or equal to ISODate("2016-06-01"), while shards in the archive zone should only contain documents with creation_date less than ISODate("2016-06-01").

You can confirm the chunk distribution by running sh.status().
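
For a scheduled job, the rotation steps above can be collected into one function. The following is a sketch under stated assumptions, not a definitive implementation. The rotateZoneRanges helper is hypothetical, the shell object and the MinKey and MaxKey values are passed in as parameters, and the wait for in-progress balancing rounds is omitted for brevity.

```javascript
// Hypothetical helper: moves the recent/archive boundary from `oldCutoff`
// to `newCutoff` for the collection named by `ns`.
function rotateZoneRanges(sh, ns, oldCutoff, newCutoff, minKey, maxKey) {
  // Step 1: stop migrations on the collection while ranges change.
  sh.disableBalancing(ns);
  // Step 2: remove the old ranges.
  sh.removeTagRange(ns, { creation_date: oldCutoff }, { creation_date: maxKey }, "recent");
  sh.removeTagRange(ns, { creation_date: minKey }, { creation_date: oldCutoff }, "archive");
  // Step 3: add the new ranges around the new boundary.
  sh.addTagRange(ns, { creation_date: newCutoff }, { creation_date: maxKey }, "recent");
  sh.addTagRange(ns, { creation_date: minKey }, { creation_date: newCutoff }, "archive");
  // Step 4: let the balancer migrate chunks to match the new ranges.
  sh.enableBalancing(ns);
}
```

If any call throws, the function stops before re-enabling balancing, which leaves the collection in a state an operator can inspect before migrations resume. That is a design choice of this sketch; a production job may prefer different error handling.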
