The choice of shard key affects the creation and distribution of chunks across the available shards. The distribution of data affects the efficiency and performance of operations within the sharded cluster.
The ideal shard key allows MongoDB to distribute documents evenly throughout the cluster while also facilitating common query patterns.
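When you choose your shard key, consider its cardinality, the frequency with which its values occur, whether its values change monotonically, and your common query patterns.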
You can change your shard key and redistribute your data with the reshardCollection command. You can use the refineCollectionShardKey command to refine a collection's shard key. The refineCollectionShardKey command adds a suffix field or fields to the existing key to create the new shard key. A document's shard key value can be updated unless the shard key field is the immutable _id field.
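The suffix rule for refining a shard key can be sketched in a few lines of Python. This is a minimal illustration, not driver code: the helper `build_refine_command`, the namespace `test.orders`, and the field names are hypothetical. The helper only builds the command document (`refineCollectionShardKey` plus a `key` field) and checks that the new key keeps the existing key as its prefix.

```python
def build_refine_command(namespace, current_key, new_key):
    """Build a refineCollectionShardKey command document.

    Raises ValueError unless new_key keeps current_key as its prefix
    and adds at least one suffix field.
    """
    current_fields = list(current_key.items())
    new_fields = list(new_key.items())
    if len(new_fields) <= len(current_fields):
        raise ValueError("new key must add at least one suffix field")
    if new_fields[: len(current_fields)] != current_fields:
        raise ValueError("new key must keep the existing key as its prefix")
    return {"refineCollectionShardKey": namespace, "key": dict(new_key)}


# Hypothetical collection sharded on customer_id, refined with order_id.
command = build_refine_command(
    "test.orders",
    {"customer_id": 1},
    {"customer_id": 1, "order_id": 1},
)
print(command)
```

A document like this would normally be run as an administrative command through a driver or the shell; that step is left out here so the sketch runs without a cluster.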
The cardinality of a shard key determines the maximum number of chunks the balancer can create. Where possible, choose a shard key with high cardinality. A shard key with low cardinality reduces the effectiveness of horizontal scaling in the cluster.
Each unique shard key value can exist on no more than a single chunk at any given time. Consider a dataset that contains user data with a continent field. If you chose to shard on continent, the shard key would have a cardinality of 7. A cardinality of 7 means there can be no more than 7 chunks within the sharded cluster, each storing one unique shard key value. This constrains the number of effective shards in the cluster to 7 as well; adding more than seven shards would not provide any benefit.
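The arithmetic behind the continent example can be checked with a small Python sketch. The sample documents and the helper names `max_chunks` and `effective_shards` are made up for illustration; the sketch only counts distinct shard key values and caps the useful shard count at that number.

```python
CONTINENTS = [
    "Africa", "Antarctica", "Asia", "Europe",
    "North America", "Oceania", "South America",
]

# Hypothetical user documents, each tagged with one of the 7 continents.
users = [
    {"user_id": i, "continent": CONTINENTS[i % len(CONTINENTS)]}
    for i in range(1000)
]


def max_chunks(docs, shard_key_field):
    """Each distinct shard key value lives in at most one chunk."""
    return len({doc[shard_key_field] for doc in docs})


def effective_shards(docs, shard_key_field, shard_count):
    """Shards beyond the number of possible chunks hold no data."""
    return min(max_chunks(docs, shard_key_field), shard_count)


print(max_chunks(users, "continent"))            # 7
print(effective_shards(users, "continent", 10))  # 7
print(effective_shards(users, "continent", 3))   # 3
```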
The following image illustrates a sharded cluster using the field X as the shard key. If X has low cardinality, the distribution of inserts may look similar to the following:
If your data model requires sharding on a key that has low cardinality, consider using a compound shard key (an indexed combination of fields) to increase cardinality.
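To see how a compound of fields raises cardinality, the sketch below counts distinct key tuples in a small sample dataset. The documents, the `user_id` field, and the helper `cardinality` are assumptions made for the example.

```python
CONTINENTS = [
    "Africa", "Antarctica", "Asia", "Europe",
    "North America", "Oceania", "South America",
]

# Hypothetical documents: a low-cardinality field plus a unique one.
users = [
    {"user_id": i, "continent": CONTINENTS[i % len(CONTINENTS)]}
    for i in range(1000)
]


def cardinality(docs, fields):
    """Count distinct combinations of the given fields."""
    return len({tuple(doc[field] for field in fields) for doc in docs})


print(cardinality(users, ["continent"]))             # 7
print(cardinality(users, ["continent", "user_id"]))  # 1000
```

Adding a second field multiplies the number of possible key values, so the balancer has far more split points to work with than the 7 available on `continent` alone.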
A shard key with high cardinality does not, on its own, guarantee even distribution of data across the sharded cluster. The frequency of the shard key and the potential for monotonically changing shard key values also contribute to the distribution of the data.
The frequency of the shard key represents how often a given shard key value occurs in the data. If the majority of documents contain only a subset of the possible shard key values, then the chunks storing the documents with those values can become a bottleneck within the cluster. Furthermore, as those chunks grow, they may become indivisible chunks as they cannot be split any further. This reduces the effectiveness of horizontal scaling within the cluster.
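One way to inspect frequency before committing to a key is to measure the share of documents per value. The sketch below does this for an invented `status` field; the data and the helpers `value_shares` and `is_splittable` are hypothetical. It also shows why a chunk that holds a single key value cannot be split.

```python
from collections import Counter


def value_shares(docs, field):
    """Return each value's share of the documents, most frequent first."""
    counts = Counter(doc[field] for doc in docs)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.most_common()}


def is_splittable(chunk_docs, field):
    """A chunk needs at least two distinct key values to be split."""
    return len({doc[field] for doc in chunk_docs}) > 1


# Hypothetical skewed dataset: most documents share one value.
docs = (
    [{"status": "active"}] * 80
    + [{"status": "inactive"}] * 15
    + [{"status": "banned"}] * 5
)

shares = value_shares(docs, "status")
print(shares)

hot_chunk = [doc for doc in docs if doc["status"] == "active"]
print(is_splittable(hot_chunk, "status"))  # False
```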
The following image illustrates a sharded cluster using the field X as the shard key. If a subset of values for X occur with high frequency, the distribution of inserts may look similar to the following:
If your data model requires sharding on a key that has high-frequency values, consider using a compound index that includes a unique or low-frequency value.
A shard key with low frequency does not, on its own, guarantee even distribution of data across the sharded cluster. The cardinality of the shard key and the potential for monotonically changing shard key values also contribute to the distribution of the data.
A shard key on a value that increases or decreases monotonically is more likely to distribute inserts to a single chunk within the cluster.
This occurs because every cluster has a chunk that captures a range with an upper bound of maxKey. maxKey always compares as higher than all other values. Similarly, there is a chunk that captures a range with a lower bound of minKey. minKey always compares as lower than all other values.
If the shard key value is always increasing, all new inserts are routed to the chunk with maxKey as the upper bound. If the shard key value is always decreasing, all new inserts are routed to the chunk with minKey as the lower bound. The shard containing that chunk becomes the bottleneck for write operations.
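The routing behaviour described above can be simulated with a sorted list of split points. This is a simplified sketch, not how the cluster is implemented: the split points 100 and 200 and the three chunk labels are assumptions, and chunk splits and migrations are ignored.

```python
import bisect
from collections import Counter

# Hypothetical chunk ranges on the shard key:
#   Chunk A: [minKey, 100)   Chunk B: [100, 200)   Chunk C: [200, maxKey)
SPLIT_POINTS = [100, 200]
CHUNKS = ["Chunk A", "Chunk B", "Chunk C"]


def route(shard_key_value):
    """Return the chunk whose range contains the shard key value."""
    return CHUNKS[bisect.bisect_right(SPLIT_POINTS, shard_key_value)]


# Monotonically increasing keys all fall past the last split point.
increasing = Counter(route(value) for value in range(1000, 1100))
# Monotonically decreasing keys all fall before the first split point.
decreasing = Counter(route(value) for value in range(-1, -101, -1))

print(dict(increasing))  # {'Chunk C': 100}
print(dict(decreasing))  # {'Chunk A': 100}
```

In both runs a single chunk absorbs every insert, which is the write bottleneck the paragraph describes.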
To optimize data distribution, the chunks that contain the global maxKey (or minKey) do not stay on the same shard. When a chunk is split, the new chunk that contains maxKey (or minKey) is located on a different shard.
The following image illustrates a sharded cluster using the field X as the shard key. If the values for X are monotonically increasing, the distribution of inserts may look similar to the following:
If the shard key value were monotonically decreasing, then all inserts would route to Chunk A instead.
If your data model requires sharding on a key that changes monotonically, consider using Hashed Sharding.
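The effect of hashing a monotonically increasing key can be illustrated with a short simulation. The sketch below is an assumption-laden stand-in: it uses `hashlib.md5` purely as a convenient hash function, four equal-width hash ranges as chunks, and a hypothetical helper `hashed_chunk`. It does not reproduce the hash or chunk layout a real cluster uses.

```python
import hashlib
from collections import Counter

NUM_CHUNKS = 4  # hypothetical number of equal-width hash ranges


def hashed_chunk(shard_key_value):
    """Map a key to a chunk index via the top 64 bits of its hash."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).digest()
    hashed = int.from_bytes(digest[:8], "big")  # value in [0, 2**64)
    return (hashed * NUM_CHUNKS) >> 64          # index in [0, NUM_CHUNKS)


# Monotonically increasing keys, as an auto-incrementing counter would produce.
counts = Counter(hashed_chunk(value) for value in range(10000))
for chunk_index in sorted(counts):
    print(chunk_index, counts[chunk_index])
```

Consecutive key values land in unrelated hash ranges, so inserts spread across all chunks rather than piling onto the one bounded by maxKey. The trade-off is that range queries on the original key can no longer be targeted to a few adjacent chunks.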
A shard key that does not change monotonically does not, on its own, guarantee even distribution of data across the sharded cluster. The cardinality and frequency of the shard key also contribute to the distribution of the data.
The ideal shard key distributes data evenly across the sharded cluster while also facilitating common query patterns. When you choose a shard key, consider your most common query patterns and whether a given shard key covers them.
In a sharded cluster, the mongos routes queries to only the shards that contain the relevant data if the queries contain the shard key. When the queries do not contain the shard key, the queries are broadcast to all shards for evaluation. These types of queries are called scatter-gather queries. Queries that involve multiple shards for each request are less efficient and do not scale linearly when more shards are added to the cluster.
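The difference between a targeted query and a scatter-gather query can be sketched as a small routing function. This is a toy model under stated assumptions: the shard key `x`, the split points, the shard names, and the helper `target_shards` are invented, and only equality matches on the shard key are handled.

```python
import bisect

SHARD_KEY = "x"                                     # hypothetical shard key field
SPLIT_POINTS = [100, 200]                           # hypothetical chunk boundaries
CHUNK_TO_SHARD = ["shard0", "shard1", "shard2"]     # one chunk per shard


def target_shards(query):
    """Return the shards a router would contact for an equality query."""
    if SHARD_KEY in query:
        # The shard key pins the query to the single chunk holding that value.
        index = bisect.bisect_right(SPLIT_POINTS, query[SHARD_KEY])
        return [CHUNK_TO_SHARD[index]]
    # Without the shard key, every shard must evaluate the query.
    return sorted(set(CHUNK_TO_SHARD))


print(target_shards({"x": 150}))        # ['shard1']
print(target_shards({"name": "ana"}))   # ['shard0', 'shard1', 'shard2']
```

In this model the targeted query touches one shard no matter how many shards exist, while the broadcast query's fan-out grows with every shard added.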
This does not apply to aggregation queries that operate on a large amount of data. In these cases, scatter-gather can be a useful approach that allows the query to run in parallel on all shards.