Hashed Sharding

Hashed sharding uses either a single field hashed index or a compound hashed index (new in 4.4) as the shard key to partition data across your sharded cluster.

Sharding on a Single Field Hashed Index

Hashed sharding provides a more even data distribution across the sharded cluster at the cost of fewer Targeted Operations and more Broadcast Operations. After hashing, documents with "close" shard key values are unlikely to be on the same chunk or shard, so the mongos is more likely to perform Broadcast Operations to fulfill a given ranged query. mongos can still target queries with equality matches to a single shard.

Diagram of the hashed based segmentation.

Hashed indexes compute the hash value of a single field as the index value; this value is used as your shard key. [1]

Sharding on a Compound Hashed Index

MongoDB 4.4 adds support for creating compound indexes with a single hashed field. To create a compound hashed index, specify hashed as the value of any single index key when creating the index.

Compound hashed indexes compute the hash value of a single field in the compound index; this value is used along with the other fields in the index as your shard key.

Compound hashed sharding supports features like zone sharding, where the prefix (i.e. first) non-hashed field or fields support zone ranges while the hashed field supports more even distribution of the sharded data. Compound hashed sharding also supports shard keys with a hashed prefix for resolving data distribution issues related to monotonically increasing fields.
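As a rough illustration of the two layouts described above, the sketch below creates one compound hashed index with a non-hashed prefix and one with a hashed prefix. It is meant for mongosh and assumes MongoDB 4.4 or later. The helper name and the field names (region, customerId, createdAt) are hypothetical; only createIndex() and the "hashed" key value come from MongoDB itself.

```javascript
// Sketch for mongosh. `coll` is a collection handle, for example db.getCollection("orders").
// The field names below are hypothetical placeholders, not part of any real schema.
function createCompoundHashedIndexes(coll) {
  // Non-hashed prefix: `region` can carry zone ranges,
  // while the hashed `customerId` spreads documents more evenly.
  coll.createIndex({ region: 1, customerId: "hashed" });

  // Hashed prefix: hashing the monotonically increasing `_id` first
  // avoids routing all new inserts to the same chunk.
  coll.createIndex({ _id: "hashed", createdAt: 1 });
}
```

In mongosh this could be invoked as createCompoundHashedIndexes(db.getCollection("orders")). Only one key in each index specification may be "hashed".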

Tip

MongoDB automatically computes the hashes when resolving queries using hashed indexes. Applications do not need to compute hashes.

Warning

MongoDB hashed indexes truncate floating point numbers to 64-bit integers before hashing. For example, a hashed index would store the same value for a field that held a value of 2.3, 2.2, or 2.9. To prevent collisions, do not use a hashed index for floating point numbers that cannot be reliably converted to 64-bit integers (and then back to floating point). MongoDB hashed indexes do not support floating point values larger than 2^53.
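The collision described in the warning can be reproduced in plain JavaScript. The sketch below models only the truncation step, using Math.trunc as a stand-in, and does not compute MongoDB's actual hash. The helper names are hypothetical.

```javascript
// Assumption: Math.trunc models the float-to-integer truncation step only.
// MongoDB's actual hash function is not reproduced here.
function truncateForHashing(value) {
  return Math.trunc(value);
}

// True when the value is an integer whose magnitude does not exceed 2^53,
// so truncation leaves it unchanged.
function isSafeForHashedIndex(value) {
  return Number.isFinite(value) &&
    Math.trunc(value) === value &&
    Math.abs(value) <= 2 ** 53;
}

const samples = [2.3, 2.2, 2.9];
const truncated = samples.map(truncateForHashing);

console.log(truncated);                     // [ 2, 2, 2 ]
console.log(new Set(truncated).size);       // 1, so all three values would hash alike
console.log(isSafeForHashedIndex(2.9));     // false
console.log(isSafeForHashedIndex(42));      // true
console.log(isSafeForHashedIndex(2 ** 60)); // false
```

A check like isSafeForHashedIndex could run in application code before inserting numeric shard key values, if the field is expected to hold whole numbers only.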

To see what the hashed value would be for a key, see convertShardKeyToHashed().

[1] Starting in version 4.0, mongosh provides the method convertShardKeyToHashed(). This method uses the same hashing function as the hashed index and can be used to see what the hashed value would be for a key.

Hashed Sharding Shard Key

The field you choose as your hashed shard key should have good cardinality, that is, a large number of different values. Hashed keys are ideal for shard keys with fields that change monotonically, such as ObjectId values or timestamps. A good example of this is the default _id field, assuming it only contains ObjectId values.

To shard a collection using a hashed shard key, see Shard a Collection.

Hashed vs Ranged Sharding

Given a collection using a monotonically increasing value X as the shard key, using ranged sharding results in a distribution of incoming inserts similar to the following:

Diagram of poor shard key distribution due to monotonically increasing or decreasing shard key

Since the value of X is always increasing, the chunk with an upper bound of maxKey receives the majority of incoming writes. This restricts insert operations to the single shard containing this chunk, which reduces or removes the advantage of distributed writes in a sharded cluster.

By using a hashed index on X, the distribution of inserts is similar to the following:

Diagram of hashed shard key distribution

Since the data is now distributed more evenly, inserts are efficiently distributed throughout the cluster.
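The contrast between the two distributions can be simulated in plain Node.js. The sketch below is an illustration under stated assumptions, not a model of MongoDB internals: hash64 uses the first 8 bytes of an MD5 digest as a stand-in hash and is not guaranteed to match the values a hashed index produces, and the chunk boundaries, shard count, and key values are made up.

```javascript
const crypto = require("node:crypto");

// Stand-in 64-bit hash: the first 8 bytes of an MD5 digest, read as a signed integer.
// Assumption: this only illustrates the spreading effect of hashing.
function hash64(value) {
  const digest = crypto.createHash("md5").update(String(value)).digest();
  return digest.readBigInt64LE(0);
}

// Ranged sharding: return the index of the first chunk whose upper bound exceeds x.
// Anything above the last boundary falls into the final chunk (upper bound maxKey).
function rangedShard(x, boundaries) {
  const i = boundaries.findIndex((upper) => x < upper);
  return i === -1 ? boundaries.length : i;
}

// Hashed sharding: split the signed 64-bit hash space into equal ranges.
function hashedShard(x, shardCount) {
  const span = (1n << 64n) / BigInt(shardCount);
  const offset = hash64(x) + (1n << 63n); // shift into [0, 2^64)
  return Math.min(Number(offset / span), shardCount - 1);
}

// Count how many keys each shard receives under a given assignment function.
function countInserts(assign, keys, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (const key of keys) counts[assign(key)] += 1;
  return counts;
}

const shardCount = 4;
const boundaries = [250, 500, 750]; // hypothetical split points chosen while X was small
const newKeys = Array.from({ length: 10000 }, (_, i) => 1000 + i); // X keeps increasing

const rangedCounts = countInserts((x) => rangedShard(x, boundaries), newKeys, shardCount);
const hashedCounts = countInserts((x) => hashedShard(x, shardCount), newKeys, shardCount);

console.log("ranged:", rangedCounts); // every insert lands in the last chunk
console.log("hashed:", hashedCounts); // inserts spread across all four ranges
```

With ranged assignment, all 10,000 new keys fall above the last boundary, so a single shard absorbs every insert. With hashed assignment, each of the four ranges receives a comparable share.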

Shard the Collection

Use the sh.shardCollection() method, specifying the full namespace of the collection and the target hashed index to use as the shard key.

sh.shardCollection( "database.collection", { <field> : "hashed" } )

To shard a collection on a compound hashed index, specify the full namespace of the collection and the target compound hashed index to use as the shard key:

sh.shardCollection(
  "database.collection",
  { "fieldA" : 1, "fieldB" : 1, "fieldC" : "hashed" }
)

Important
  • Starting in MongoDB 5.0, you can reshard a collection by changing a collection's shard key.
  • Starting in MongoDB 4.4, you can refine a shard key by adding a suffix field or fields to the existing shard key.
  • In MongoDB 4.2 and earlier, the choice of shard key cannot be changed after sharding.

Shard a Populated Collection

If you shard a populated collection using a hashed shard key:

  • The sharding operation creates the initial chunk(s) to cover the entire range of the shard key values. The number of chunks created depends on the configured chunk size.
  • After the initial chunk creation, the balancer migrates these initial chunks across the shards as appropriate and manages the chunk distribution going forward.

Shard an Empty Collection

Starting in MongoDB 4.0.3, the shard collection operation can perform an initial chunk creation and distribution for empty or non-existing collections if zones and zone ranges have been defined for the collection. Initial creation and distribution of chunks allows for faster setup of zoned sharding. After the initial distribution, the balancer manages the chunk distribution going forward as usual.

Sharding Empty Collection on Single Field Hashed Shard Key
  • With no zones and zone ranges specified for the empty or non-existing collection:

    • The sharding operation creates empty chunks to cover the entire range of the shard key values and performs an initial chunk distribution. By default, the operation creates 2 chunks per shard and migrates them across the cluster. You can use the numInitialChunks option to specify a different number of initial chunks. This initial creation and distribution of chunks allows for faster setup of sharding.
    • After the initial distribution, the balancer manages the chunk distribution going forward.
  • With zones and zone ranges specified for the empty or non-existing collection (available starting in MongoDB 4.0.3):

    • The sharding operation creates empty chunks for the defined zone ranges as well as any additional chunks to cover the entire range of the shard key values and performs an initial chunk distribution based on the zone ranges. This initial creation and distribution of chunks allows for faster setup of zoned sharding.
    • After the initial distribution, the balancer manages the chunk distribution going forward.
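
The numInitialChunks option mentioned above is passed in the options document of sh.shardCollection(), after the unique flag. The sketch below wraps that call in a small helper for mongosh. The helper name and its parameters are hypothetical, and it assumes the collection is empty or does not yet exist.

```javascript
// Sketch for mongosh, connected to a mongos. `sh` is the mongosh sharding helper.
// The function name and parameters are hypothetical.
function shardEmptyCollectionHashed(sh, namespace, field, numInitialChunks) {
  // Leave the options empty to accept the default number of initial chunks.
  const options = numInitialChunks === undefined ? {} : { numInitialChunks };

  // The third argument is the `unique` flag; hashed shard keys cannot be unique.
  return sh.shardCollection(namespace, { [field]: "hashed" }, false, options);
}
```

In mongosh this could be invoked as shardEmptyCollectionHashed(sh, "database.collection", "_id", 8) to request 8 initial chunks instead of the default.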

Sharding Empty Collection on Compound Hashed Shard Key with Hashed Field Prefix

If the compound hashed shard key has the hashed field as the prefix (i.e. the hashed field is the first field in the shard key):

  • With no zones and zone ranges specified for the empty or non-existing collection:

    • The sharding operation creates empty chunks to cover the entire range of the shard key values and performs an initial chunk distribution. The value of all non-hashed fields is MinKey at each split point. By default, the operation creates 2 chunks per shard and migrates them across the cluster. You can use the numInitialChunks option to specify a different number of initial chunks. This initial creation and distribution of chunks allows for faster setup of sharding.
    • After the initial distribution, the balancer manages the chunk distribution going forward.
  • With a single zone with a range from MinKey to MaxKey specified for the empty or non-existing collection and the presplitHashedZones option specified to sh.shardCollection():

    • The sharding operation creates empty chunks for the defined zone range as well as any additional chunks to cover the entire range of the shard key values and performs an initial chunk distribution based on the zone ranges. This initial creation and distribution of chunks allows for faster setup of zoned sharding.
    • After the initial distribution, the balancer manages the chunk distribution going forward.

Sharding Empty Collection on Compound Hashed Shard Key with Non-Hashed Prefix

If the compound hashed shard key has one or more non-hashed fields as the prefix (i.e. the hashed field is not the first field in the shard key):

Tip
See also:

To learn how to deploy a sharded cluster and implement hashed sharding, see Deploy a Sharded Cluster.
