Hashed sharding uses either a single field hashed index or a compound hashed index as the shard key to partition data across your sharded cluster.
Sharding on a Single Field Hashed Index
Hashed sharding provides a more even data distribution across the sharded cluster at the cost of reducing the share of Targeted Operations relative to Broadcast Operations. Post-hash, documents with "close" shard key values are unlikely to be on the same chunk or shard, so mongos is more likely to perform Broadcast Operations to fulfill a given ranged query. mongos can still target queries with equality matches to a single shard.
Hashed indexes compute the hash value of a single field as the index value; this value is used as your shard key. [1]
Sharding on a Compound Hashed Index
MongoDB includes support for creating compound indexes with a single hashed field. To create a compound hashed index, specify hashed as the value of any single index key when creating the index.
A compound hashed index computes the hash value of a single field in the compound index; this value is used along with the other fields in the index as your shard key.
Compound hashed sharding supports features like zone sharding, where the prefix (i.e. first) non-hashed field or fields support zone ranges while the hashed field supports more even distribution of the sharded data. Compound hashed sharding also supports shard keys with a hashed prefix for resolving data distribution issues related to monotonically increasing fields.
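As a rough illustration of the rule that a compound hashed index contains a single hashed field, the sketch below checks whether a key specification has that shape. The helper name isCompoundHashedKey is hypothetical and not part of any MongoDB API, and the sketch assumes that non-hashed fields are written with the value 1, as in the sh.shardCollection() examples in this document.

```javascript
// Hypothetical helper (not a MongoDB API): checks the shape of a key spec.
// Assumption: non-hashed fields use the value 1.
function isCompoundHashedKey(spec) {
  const values = Object.values(spec);
  const hashedCount = values.filter((v) => v === "hashed").length;
  const othersValid = values.every((v) => v === "hashed" || v === 1);
  // Compound: more than one field. Hashed: exactly one "hashed" value.
  return values.length > 1 && hashedCount === 1 && othersValid;
}

console.log(isCompoundHashedKey({ fieldA: 1, fieldB: 1, fieldC: "hashed" })); // true
console.log(isCompoundHashedKey({ fieldA: "hashed", fieldB: "hashed" }));     // false
console.log(isCompoundHashedKey({ fieldA: "hashed" }));                       // false (single field)
```

The check covers only the shape of the specification; the server performs its own validation when the index or shard key is created.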
Tip
MongoDB automatically computes the hashes when resolving queries using hashed indexes. Applications do not need to compute hashes.
Warning
MongoDB hashed indexes truncate floating point numbers to 64-bit integers before hashing. For example, a hashed index would store the same value for a field that held a value of 2.3, 2.2, and 2.9. To prevent collisions, do not use a hashed index for floating point numbers that cannot be reliably converted to 64-bit integers (and then back to floating point). MongoDB hashed indexes do not support floating point values larger than 2^53.
To see what the hashed value would be for a key, see convertShardKeyToHashed().
[1] mongosh provides the convertShardKeyToHashed() method.
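The truncation described in the warning can be illustrated with a small sketch. It models only the truncation step, under the assumption that truncation drops the fractional part the way Math.trunc does; it does not reproduce MongoDB's hash function, and truncatedKey is a hypothetical name.

```javascript
// Illustration of the truncation step only; this is NOT MongoDB's hash function.
// Assumption: truncation drops the fractional part, as Math.trunc does.
function truncatedKey(value) {
  return Math.trunc(value);
}

// 2.3, 2.2 and 2.9 all truncate to 2, so they would be hashed as the same value.
const keys = [2.3, 2.2, 2.9].map(truncatedKey);
console.log(keys); // [ 2, 2, 2 ]

// Above 2^53, doubles can no longer represent every integer exactly.
const limit = 2 ** 53;
console.log(limit + 1 === limit); // true: 2^53 + 1 rounds back to 2^53
```

The second check shows why values beyond 2^53 cannot be converted to an integer and back without loss: distinct integers already collapse onto the same double.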
Hashed Sharding Shard Key
The field you choose as your hashed shard key should have good cardinality, that is, a large number of distinct values. Hashed keys are ideal for shard keys with fields that change monotonically, like ObjectId values or timestamps. A good example of this is the default _id field, assuming it only contains ObjectId values.
To shard a collection using a hashed shard key, see Shard a Collection.
Hashed vs Ranged Sharding
Given a collection using a monotonically increasing value X as the shard key, ranged sharding concentrates incoming inserts at the upper end of the key range.
Since the value of X is always increasing, the chunk with an upper bound of MaxKey receives the majority of incoming writes. This restricts insert operations to the single shard containing this chunk, which reduces or removes the advantage of distributed writes in a sharded cluster.
By using a hashed index on X, consecutive values of X map to hash values that fall in different chunks.
Since the data is now distributed more evenly, inserts are efficiently distributed throughout the cluster.
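The contrast between the two strategies can be sketched with a small simulation. Everything in it is an assumption made for illustration: the hash is a 32-bit Fibonacci (multiplicative) hash standing in for MongoDB's own hash function, the split points 250, 500 and 750 are invented, and each of the four chunks is treated as living on its own shard.

```javascript
// Stand-in hash: Fibonacci (multiplicative) hashing on 32 bits.
// Assumption for illustration only; MongoDB uses its own hash function.
function standInHash(x) {
  return Math.imul(x, 0x9e3779b1) >>> 0; // unsigned 32-bit result
}

const NUM_SHARDS = 4;

// Ranged: chunks split at fixed points chosen from earlier data; the last
// chunk is unbounded above (its upper bound plays the role of MaxKey).
const splitPoints = [250, 500, 750];
function rangedShard(x) {
  const idx = splitPoints.findIndex((p) => x < p);
  return idx === -1 ? splitPoints.length : idx;
}

// Hashed: the 32-bit hash space is cut into NUM_SHARDS equal ranges.
function hashedShard(x) {
  return Math.floor(standInHash(x) / (2 ** 32 / NUM_SHARDS));
}

function countInserts(assign, from, to) {
  const counts = new Array(NUM_SHARDS).fill(0);
  for (let x = from; x < to; x++) counts[assign(x)]++;
  return counts;
}

// 1000 new inserts with monotonically increasing X, all above the last split.
const ranged = countInserts(rangedShard, 1000, 2000);
const hashed = countInserts(hashedShard, 1000, 2000);
console.log("ranged:", ranged); // every insert lands in the last chunk
console.log("hashed:", hashed); // inserts are spread across all four ranges
```

With ranged assignment all 1000 inserts hit the chunk that ends at MaxKey, while the hashed assignment gives every range a share of the writes.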
Shard the Collection
Use the sh.shardCollection() method, specifying the full namespace of the collection and the target hashed index to use as the shard key.
sh.shardCollection( "database.collection", { <field> : "hashed" } )
To shard a collection on a compound hashed index, specify the full namespace of the collection and the target compound hashed index to use as the shard key:
sh.shardCollection(
"database.collection",
{ "fieldA" : 1, "fieldB" : 1, "fieldC" : "hashed" }
)
Important
Starting in MongoDB 5.0, you can reshard a collection by changing a collection's shard key. You can refine a shard key by adding a suffix field or fields to the existing shard key.
Shard a Populated Collection
If you shard a populated collection using a hashed shard key:
The sharding operation creates an initial chunk to cover all of the shard key values. After the initial chunk creation, the balancer moves ranges of the initial chunk when it needs to balance data.
Shard an Empty Collection
The shard collection operation can perform an initial chunk creation and distribution for empty or non-existing collections if zones and zone ranges have been defined for the collection. Initial creation and distribution of chunks allows for faster setup of zoned sharding. After the initial distribution, the balancer manages the chunk distribution going forward as usual.
Sharding Empty Collection on Single Field Hashed Shard Key
With no zones and zone ranges specified for the empty or non-existing collection:
The sharding operation creates an empty chunk to cover the entire range of the shard key values. Starting in version 8.0, the operation creates 1 chunk per shard by default and migrates across the cluster. You can use the numInitialChunks option to specify a different number of initial chunks and cause an initial chunk distribution. This initial creation and distribution of chunks allows for faster setup of sharding. After the initial distribution, the balancer manages the chunk distribution going forward.
With zones and zone ranges specified for the empty or non-existing collection:
The sharding operation creates empty chunks for the defined zone ranges as well as any additional chunks to cover the entire range of the shard key values and performs an initial chunk distribution based on the zone ranges. This initial creation and distribution of chunks allows for faster setup of zoned sharding. After the initial distribution, the balancer manages the chunk distribution going forward.
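To make the idea of several initial chunks covering the whole range of hashed shard key values more concrete, here is a minimal sketch that computes boundaries for a given number of chunks. It rests on two assumptions that the text above does not state: that hashed values are signed 64-bit integers, and that the initial chunks are cut at evenly spaced points. The helper initialSplitPoints is hypothetical and not a MongoDB API.

```javascript
// Assumptions (for illustration): hashed shard key values are signed 64-bit
// integers, and initial chunks are cut at evenly spaced points.
const MIN_HASH = -(2n ** 63n);
const HASH_SPACE = 2n ** 64n;

// Hypothetical helper: split points that divide the hash range into
// `numChunks` contiguous chunks of (nearly) equal width.
function initialSplitPoints(numChunks) {
  const points = [];
  for (let i = 1n; i < BigInt(numChunks); i++) {
    points.push(MIN_HASH + (HASH_SPACE * i) / BigInt(numChunks));
  }
  return points;
}

console.log(initialSplitPoints(4));
// [ -4611686018427387904n, 0n, 4611686018427387904n ]
```

Three split points yield four chunks; under the one-chunk-per-shard default described above, a four-shard cluster would start with one such chunk on each shard.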
Sharding Empty Collection on Compound Hashed Shard Key with Hashed Field Prefix
If the compound hashed shard key has the hashed field as the prefix (the hashed field is the first field in the shard key):
With no zones and zone ranges specified for the empty or non-existing collection:
The sharding operation creates empty chunks to cover the entire range of the shard key values and performs an initial chunk distribution. The value of all non-hashed fields is MinKey at each split point. Starting in version 8.0, the operation creates 1 chunk per shard by default and migrates across the cluster. You can use the numInitialChunks option to specify a different number of initial chunks and cause an initial chunk distribution. This initial creation and distribution of chunks allows for faster setup of sharding. After the initial distribution, the balancer manages the chunk distribution going forward.
With a single zone with a range from MinKey to MaxKey specified for the empty or non-existing collection and the presplitHashedZones option specified to sh.shardCollection():
The sharding operation creates empty chunks for the defined zone range as well as any additional chunks to cover the entire range of the shard key values and performs an initial chunk distribution based on the zone ranges. This initial creation and distribution of chunks allows for faster setup of zoned sharding. After the initial distribution, the balancer manages the chunk distribution going forward.
Sharding Empty Collection on Compound Hashed Shard Key with Non-Hashed Prefix
If the compound hashed shard key has one or more non-hashed fields as the prefix (i.e. the hashed field is not the first field in the shard key):
With no zones and zone ranges specified for the empty or non-existing collection and presplitHashedZones false or omitted:
MongoDB does not perform any initial chunk creation or distribution when sharding the collection.
With no zones and zone ranges specified for the empty or non-existing collection and presplitHashedZones set to true:
sh.shardCollection() / shardCollection returns an error.
With zones and zone ranges specified for the empty or non-existing collection and the presplitHashedZones option specified to sh.shardCollection():
The sharding operation creates empty chunks for the defined zone ranges as well as any additional chunks to cover the entire range of the shard key values. The sharding operation further subdivides the initial chunk for each range, such that each shard in the zone is allocated an equal number of chunks. This initial creation and distribution of chunks allows for faster setup of zoned sharding. After the initial distribution, the balancer manages the chunk distribution going forward.
The defined ranges for each zone must meet certain requirements. For a description of the requirements and a complete example, see Pre-Define Zones and Zone Ranges for an Empty or Non-Existing Collection.
Drop a Hashed Shard Key Index
Starting in MongoDB 7.0.3 (and 6.0.12 and 5.0.22), you can drop the index for a hashed shard key.
This can speed up data insertion for collections sharded with a hashed shard key.
For details, see Drop a Hashed Shard Key Index.
Tip
To learn how to deploy a sharded cluster and implement hashed sharding, see Deploy a Self-Managed Sharded Cluster.