Choose a Shard Key
The choice of shard key affects the creation and distribution of chunks across the available shards. The distribution of data affects the efficiency and performance of operations within the sharded cluster.
The ideal shard key allows MongoDB to distribute documents evenly throughout the cluster while also facilitating common query patterns.
When you choose your shard key, consider:
- the cardinality of the shard key
- the frequency with which shard key values occur
- whether a potential shard key grows monotonically
- Sharding Query Patterns
- Shard Key Limitations
Starting in MongoDB 5.0, you can change your shard key and redistribute your data using the reshardCollection command.
Starting in MongoDB 4.4, you can use the refineCollectionShardKey command to refine a collection's shard key. The refineCollectionShardKey command adds a suffix field or fields to the existing key to create the new shard key.
In MongoDB 4.2 and earlier, once you shard a collection, the selection of the shard key is immutable.
Starting in MongoDB 4.2, you can update a document's shard key value unless the shard key field is the immutable _id field.
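As a rough illustration of the two commands above, the sketch below builds their command documents as plain Python dicts. The namespace test.orders and the field names customer_id and order_id are hypothetical, and the helper functions are not part of any driver. Running a command would mean sending the document to the admin database of a live sharded cluster, which is not shown here.

```python
def refine_command(namespace, current_key, suffix_fields):
    """Build a refineCollectionShardKey command document.

    The refined key keeps the current key as its prefix and only
    appends new suffix fields.
    """
    overlap = set(current_key) & set(suffix_fields)
    if overlap:
        raise ValueError(f"suffix repeats existing fields: {sorted(overlap)}")
    # Dicts preserve insertion order, so the existing fields stay first.
    new_key = {**current_key, **suffix_fields}
    return {"refineCollectionShardKey": namespace, "key": new_key}


def reshard_command(namespace, new_key):
    """Build a reshardCollection command document (MongoDB 5.0 and later)."""
    return {"reshardCollection": namespace, "key": dict(new_key)}


# Hypothetical collection sharded on customer_id, refined with order_id.
refine = refine_command("test.orders", {"customer_id": 1}, {"order_id": 1})
# Hypothetical switch to an entirely different shard key.
reshard = reshard_command("test.orders", {"order_id": "hashed"})

print(refine)
print(reshard)
```

The prefix check mirrors the rule stated above: refining only adds suffix fields, whereas resharding may replace the key entirely.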
Shard Key Cardinality
The cardinality of a shard key determines the maximum number of chunks the balancer can create. Where possible, choose a shard key with high cardinality. A shard key with low cardinality reduces the effectiveness of horizontal scaling in the cluster.
Each unique shard key value can exist on no more than a single chunk at any given time. Consider a dataset that contains user data with a continent field. If you chose to shard on continent, the shard key would have a cardinality of 7. A cardinality of 7 means there can be no more than 7 chunks within the sharded cluster, each storing one unique shard key value. This also limits the number of effective shards in the cluster to 7: adding more than seven shards would not provide any benefit.
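The bound in the continent example can be checked with a small calculation. This is a minimal sketch over made-up user documents, assuming one chunk per distinct shard key value as the upper limit. The function names are illustrative only.

```python
CONTINENTS = [
    "Africa", "Antarctica", "Asia", "Europe",
    "North America", "Oceania", "South America",
]

# Made-up user documents; every continent appears at least once.
users = [{"user_id": i, "continent": CONTINENTS[i % 7]} for i in range(1000)]


def max_chunks(shard_key_values):
    """Upper bound on the chunk count: one chunk per distinct key value."""
    return len(set(shard_key_values))


def effective_shards(shard_count, shard_key_values):
    """Number of shards that can hold at least one chunk."""
    return min(shard_count, max_chunks(shard_key_values))


keys = [u["continent"] for u in users]
print(max_chunks(keys))            # 7
print(effective_shards(10, keys))  # 7: three of the ten shards stay empty
print(effective_shards(3, keys))   # 3
```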
The following image illustrates a sharded cluster using the field X as the shard key. If X has low cardinality, the distribution of inserts may look similar to the following:
If your data model requires sharding on a key that has low cardinality, consider using an indexed compound of fields (a compound shard key) to increase cardinality.
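To see how a compound of fields raises cardinality, the sketch below counts distinct key values for a single field and for a two-field combination. The documents and the user_id field are invented for illustration.

```python
CONTINENTS = [
    "Africa", "Antarctica", "Asia", "Europe",
    "North America", "Oceania", "South America",
]

# Made-up user documents with a unique user_id.
users = [{"user_id": i, "continent": CONTINENTS[i % 7]} for i in range(1000)]


def cardinality(docs, fields):
    """Count the distinct values of the (possibly compound) key."""
    return len({tuple(doc[f] for f in fields) for doc in docs})


print(cardinality(users, ["continent"]))             # 7
print(cardinality(users, ["continent", "user_id"]))  # 1000
```

The compound key has as many distinct values as there are documents here because user_id is unique in this made-up dataset. A less selective second field would raise cardinality by a smaller amount.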
A shard key with high cardinality does not, on its own, guarantee even distribution of data across the sharded cluster. The frequency of the shard key and the potential for monotonically changing shard key values also contribute to the distribution of the data.
Shard Key Frequency
The frequency of the shard key represents how often a given shard key value occurs in the data. If the majority of documents contain only a subset of the possible shard key values, then the chunks storing the documents with those values can become a bottleneck within the cluster. Furthermore, as those chunks grow, they may become indivisible chunks as they cannot be split any further. This reduces the effectiveness of horizontal scaling within the cluster.
The following image illustrates a sharded cluster using the field X as the shard key. If a subset of values for X occurs with high frequency, the distribution of inserts may look similar to the following:
If your data model requires sharding on a key that has high-frequency values, consider using a compound index that includes a unique or low-frequency value.
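The effect of a high-frequency value can be estimated by counting how many documents share a single shard key value, since those documents cannot be split across chunks. The sketch below uses invented order documents in which 90% carry the same status. The field names are hypothetical.

```python
from collections import Counter

# Made-up orders: 900 are "shipped", 100 are "pending".
orders = [
    {"order_id": i, "status": "shipped" if i % 10 else "pending"}
    for i in range(1000)
]


def largest_indivisible_group(docs, fields):
    """Size of the biggest set of documents sharing one shard key value.

    Documents with the same key value live in one chunk, so this is a
    lower bound on the size of the largest chunk.
    """
    counts = Counter(tuple(doc[f] for f in fields) for doc in docs)
    return max(counts.values())


print(largest_indivisible_group(orders, ["status"]))              # 900
print(largest_indivisible_group(orders, ["status", "order_id"]))  # 1
```

Adding the unique order_id to the key means no two documents share a key value, so a chunk holding the frequent status can still be split.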
A shard key with low frequency does not, on its own, guarantee even distribution of data across the sharded cluster. The cardinality of the shard key and the potential for monotonically changing shard key values also contribute to the distribution of the data.
Monotonically Changing Shard Keys
A shard key on a value that increases or decreases monotonically is more likely to distribute inserts to a single chunk within the cluster.
This occurs because every cluster has a chunk that captures a range with an upper bound of maxKey. maxKey always compares as higher than all other values. Similarly, there is a chunk that captures a range with a lower bound of minKey. minKey always compares as lower than all other values.
If the shard key value is always increasing, all new inserts are routed to the chunk with maxKey as the upper bound. If the shard key value is always decreasing, all new inserts are routed to the chunk with minKey as the lower bound. The shard containing that chunk becomes the bottleneck for write operations.
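The routing behavior described above can be imitated with a sorted list of split points. This is a simplified sketch, not MongoDB's implementation: the chunk names and split points are hypothetical, chunk ranges are treated as including the lower bound and excluding the upper bound, and chunk splits and migrations are ignored.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical split points. Chunk A covers [minKey, 100), chunk B covers
# [100, 200), and chunk C covers [200, maxKey).
SPLIT_POINTS = [100, 200]
CHUNK_NAMES = ["A", "B", "C"]


def route(key):
    """Return the name of the chunk whose range contains key."""
    return CHUNK_NAMES[bisect_right(SPLIT_POINTS, key)]


# Keys that only ever grow all land in the chunk bounded by maxKey.
increasing = Counter(route(k) for k in range(201, 301))
# Keys that only ever shrink all land in the chunk bounded by minKey.
decreasing = Counter(route(k) for k in range(99, -1, -1))

print(increasing)  # Counter({'C': 100})
print(decreasing)  # Counter({'A': 100})
```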
To optimize data distribution, the chunks that contain the global maxKey (or minKey) do not stay on the same shard. When a chunk is split, the new chunk that contains maxKey (or minKey) is located on a different shard.
The following image illustrates a sharded cluster using the field X as the shard key. If the values for X are monotonically increasing, the distribution of inserts may look similar to the following:
If the shard key value were monotonically decreasing, then all inserts would route to Chunk A instead.
If your data model requires sharding on a key that changes monotonically, consider using Hashed Sharding.
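The idea behind hashing a monotonically changing key can be sketched as follows. The code uses MD5 from the Python standard library purely as a stand-in hash and divides the 64-bit hash space into four equal, hypothetical chunks. It makes no claim about the hash function or chunk layout that MongoDB uses internally.

```python
import hashlib
from collections import Counter

CHUNK_COUNT = 4       # hypothetical number of equal-width chunks
HASH_SPACE = 2 ** 64  # size of the hash space used in this sketch


def hashed_chunk(key):
    """Map a key to a chunk index via a stand-in 64-bit hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    hashed = int.from_bytes(digest[:8], "big")
    # Scale the hash into the range 0 .. CHUNK_COUNT - 1.
    return hashed * CHUNK_COUNT // HASH_SPACE


# Monotonically increasing keys, as produced by a counter or timestamp.
inserts = Counter(hashed_chunk(k) for k in range(1000))
for chunk in sorted(inserts):
    print(chunk, inserts[chunk])
```

Consecutive keys hash to unrelated positions, so the inserts spread over all four chunks instead of piling onto the one bounded by maxKey. The trade-off is that range queries on the original key can no longer target a contiguous set of chunks.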
A shard key that does not change monotonically does not, on its own, guarantee even distribution of data across the sharded cluster. The cardinality and frequency of the shard key also contribute to the distribution of the data.
Sharding Query Patterns
The ideal shard key distributes data evenly across the sharded cluster while also facilitating common query patterns. When you choose a shard key, consider your most common query patterns and whether a given shard key covers them.
In a sharded cluster, mongos routes a query to only the shards that contain the relevant data if the query contains the shard key. When a query does not contain the shard key, it is broadcast to all shards for evaluation. These types of queries are called scatter-gather queries. Queries that involve multiple shards for each request are less efficient and do not scale linearly when more shards are added to the cluster.
This does not apply to aggregation queries that operate on a large amount of data. In these cases, scatter-gather can be a useful approach that allows the query to run in parallel on all shards.
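The difference between a targeted query and a scatter-gather query can be shown with a toy router. This sketch is not how mongos is implemented: the shard names, the customer_id shard key, and the modulo placement rule are all hypothetical, and only equality matches on the shard key are handled.

```python
SHARD_KEY = "customer_id"               # hypothetical shard key
SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def owning_shard(value):
    """Toy placement rule standing in for the real chunk metadata."""
    return SHARDS[value % len(SHARDS)]


def target_shards(query):
    """Return the shards a query must be sent to."""
    if SHARD_KEY in query:
        # The shard key pins the query to a single shard.
        return [owning_shard(query[SHARD_KEY])]
    # Without the shard key, every shard must evaluate the query.
    return list(SHARDS)


print(target_shards({"customer_id": 42, "status": "shipped"}))  # ['shard0']
print(target_shards({"status": "shipped"}))  # ['shard0', 'shard1', 'shard2']
```

In this model the cost of the second query grows with the number of shards, while the first always touches one shard.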