analyzeShardKey (database command)

Definition

analyzeShardKey

New in version 7.0.

Calculates metrics for evaluating a shard key for an unsharded or sharded collection. Metrics are based on sampled queries. You can use configureQueryAnalyzer to configure query sampling on a collection.

Compatibility

This command is available in deployments hosted in the following environments:

  • MongoDB Atlas: The fully managed service for MongoDB deployments in the cloud

Note

This command is supported in all MongoDB Atlas clusters. For information on Atlas support for all commands, see Unsupported Commands.

  • MongoDB Enterprise: The subscription-based, self-managed version of MongoDB
  • MongoDB Community: The source-available, free-to-use, and self-managed version of MongoDB

Syntax

analyzeShardKey has this syntax:

db.collection.analyzeShardKey(
  <shardKey>,
  {
    keyCharacteristics: <bool>,
    readWriteDistribution: <bool>,
    sampleRate: <double>,
    sampleSize: <int>
  }
)

Command Fields

Field | Type | Necessity | Description
shardKey | document | Required

Shard key to analyze. This can be a candidate shard key for an unsharded or sharded collection, or the current shard key for a sharded collection.

There is no default value.

keyCharacteristics | boolean | Optional

Whether or not the metrics about the characteristics of the shard key are calculated. For details, see keyCharacteristics.

Defaults to true.

readWriteDistribution | boolean | Optional

Whether or not the metrics about the read and write distribution are calculated. For details, see readWriteDistribution.

Defaults to true.

To return read and write distribution metrics for a collection using analyzeShardKey, you must configure the query analyzer to sample the queries run on the collection. Otherwise, analyzeShardKey returns the read and write distribution metrics as 0 values. To configure the query analyzer, see configureQueryAnalyzer (database command).

sampleRate | double | Optional

The proportion of the documents in the collection to sample when calculating the metrics about the characteristics of the shard key. If you set sampleRate, you cannot set sampleSize.

Must be greater than 0, up to and including 1.

There is no default value.

sampleSize | integer | Optional

The number of documents to sample when calculating the metrics about the characteristics of the shard key. If you set sampleSize, you cannot set sampleRate.

If neither sampleSize nor sampleRate is specified, the sample size defaults to the value set by analyzeShardKeyCharacteristicsDefaultSampleSize.
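The field constraints above can be checked on the client before the call is made. The following is a minimal sketch, not part of mongosh: buildAnalyzeOptions is a hypothetical helper name, and the requirement that sampleSize be a positive integer is an assumption (the table only says it is an integer).

```javascript
// Hypothetical helper: validates the optional fields described above and
// returns an options document for db.collection.analyzeShardKey().
function buildAnalyzeOptions({
  keyCharacteristics = true,      // documented default
  readWriteDistribution = true,   // documented default
  sampleRate,
  sampleSize,
} = {}) {
  // sampleRate and sampleSize are mutually exclusive.
  if (sampleRate !== undefined && sampleSize !== undefined) {
    throw new Error("Specify sampleRate or sampleSize, not both");
  }
  // sampleRate must be greater than 0, up to and including 1.
  if (sampleRate !== undefined && !(sampleRate > 0 && sampleRate <= 1)) {
    throw new Error("sampleRate must be greater than 0 and at most 1");
  }
  // Assumption: a usable sample size is a positive integer.
  if (sampleSize !== undefined && !(Number.isInteger(sampleSize) && sampleSize > 0)) {
    throw new Error("sampleSize must be a positive integer");
  }
  const options = { keyCharacteristics, readWriteDistribution };
  if (sampleRate !== undefined) options.sampleRate = sampleRate;
  if (sampleSize !== undefined) options.sampleSize = sampleSize;
  return options;
}
```

In mongosh, the result could be passed as the second argument, for example db.post.analyzeShardKey({ userId: 1 }, buildAnalyzeOptions({ sampleRate: 0.5 })).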

Behavior

analyzeShardKey returns different metrics depending on the keyCharacteristics and readWriteDistribution values you specify when you run the method.

Metrics About Shard Key Characteristics

keyCharacteristics consists of the metrics about the cardinality, frequency, and monotonicity of the shard key. These metrics are only returned when keyCharacteristics is true.

The metrics are calculated when analyzeShardKey is run, based on documents sampled from the collection. The calculation requires the shard key to have a supporting index. If there is no supporting index, no metrics are returned.

You can configure sampling with the sampleRate and sampleSize fields. Both are optional, but only one can be specified. When both sampleRate and sampleSize are unspecified, MongoDB uses the value of the analyzeShardKeyCharacteristicsDefaultSampleSize parameter, which has a default value of 10 million.

To calculate metrics based on all documents in the collection, set the sampleRate to 1.

Metrics About the Read and Write Distribution

readWriteDistribution contains metrics about the query routing patterns and the hotness of shard key ranges. These metrics are based on sampled queries.

To configure query sampling for a collection, use the configureQueryAnalyzer command. The read and write distribution metrics are only returned if readWriteDistribution is true. The metrics are calculated when analyzeShardKey is run and the metrics use the sampled read and write queries. If there are no sampled queries, read and write distribution metrics aren't returned.

  • If there are no sampled read queries, the command returns writeDistribution but omits readDistribution.
  • If there are no sampled write queries, the command returns readDistribution but omits writeDistribution.

keyCharacteristics Value | readWriteDistribution Value | Results Returned
true | false | analyzeShardKey returns keyCharacteristics metrics and omits readWriteDistribution metrics. If the shard key doesn't have a supporting index, no keyCharacteristics metrics are returned.
false | true | analyzeShardKey returns readWriteDistribution metrics and omits keyCharacteristics metrics.
true | true | analyzeShardKey returns both readWriteDistribution metrics and keyCharacteristics metrics. If the shard key doesn't have a supporting index, analyzeShardKey returns readWriteDistribution metrics and omits keyCharacteristics metrics.
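Because sections can be omitted, client code may want to check which ones are present before reading them. A minimal sketch, assuming the result document uses the top-level field names shown in the Output section (keyCharacteristics, readDistribution, writeDistribution); presentSections is a hypothetical helper name.

```javascript
// Reports which metric sections are present in an analyzeShardKey result.
function presentSections(result) {
  return ["keyCharacteristics", "readDistribution", "writeDistribution"]
    .filter((name) => result[name] !== undefined);
}

// Example with a hand-written result that has no sampled read queries.
console.log(presentSections({ keyCharacteristics: {}, writeDistribution: {} }));
```

In mongosh, the same function could be applied to the document returned by db.post.analyzeShardKey(...).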

Non-Blocking Behavior

analyzeShardKey does not block reads or writes to the collection.

Query Sampling

The quality of the metrics about the read and write distribution is determined by how representative the workload is when query sampling occurs. For some applications, returning representative metrics may require leaving query sampling on for several days.

Supporting Indexes

The supporting index required by analyzeShardKey is different from the supporting index required by the shardCollection command.

This table shows the supporting indexes for the same shard key for both analyzeShardKey and shardCollection:

Command | Shard Key | Supporting Indexes
analyzeShardKey | { a.x: 1, b: "hashed" } |
  • { a.x: 1, b: 1, ... }
  • { a.x: "hashed", b: 1, ... }
  • { a.x: 1, b: "hashed", ... }
  • { a.x: "hashed", b: "hashed", ... }
shardCollection | { a.x: 1, b: "hashed" } | { a.x: 1, b: "hashed", ... }

This allows you to analyze a shard key that may not yet have the supporting index required for sharding it.

Both analyzeShardKey and shardCollection have the following index requirements:

To create supporting indexes, use the db.collection.createIndex() method.
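The difference between the two rows of the table can be sketched as a prefix check on key patterns. This is an illustration inferred from the table only, not the server's actual logic: isSupportingIndex is a hypothetical helper, it only looks at the key pattern (not at any other index property), and it assumes index fields are either 1 or "hashed" because the table shows no other types.

```javascript
// shardKey and indexKey are key-pattern documents such as { "a.x": 1, b: "hashed" }.
// strictTypes = false follows the analyzeShardKey row: the index must start with
//   the shard key fields in order, and each field may be 1 or "hashed".
// strictTypes = true follows the shardCollection row: field types must also match.
function isSupportingIndex(shardKey, indexKey, strictTypes = false) {
  const keyFields = Object.entries(shardKey);
  const indexFields = Object.entries(indexKey);
  if (indexFields.length < keyFields.length) return false;
  return keyFields.every(([field, type], i) => {
    const [indexField, indexType] = indexFields[i];
    if (indexField !== field) return false;
    // Assumption: only 1 and "hashed" are considered in the relaxed mode.
    return strictTypes ? indexType === type : [1, "hashed"].includes(indexType);
  });
}

const key = { "a.x": 1, b: "hashed" };
console.log(isSupportingIndex(key, { "a.x": "hashed", b: 1, c: 1 }));       // true
console.log(isSupportingIndex(key, { "a.x": "hashed", b: 1, c: 1 }, true)); // false
```

The second call shows why an index that is good enough for analysis may still need to be replaced before the collection is sharded.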

Read Preference

To minimize the performance impact, run analyzeShardKey with the secondary or secondaryPreferred read preference. On a sharded cluster, mongos automatically sets the read preference to secondaryPreferred if not specified.

Limitations

  • You cannot run analyzeShardKey on Atlas flex clusters.
  • You cannot run analyzeShardKey on standalone deployments.
  • You cannot run analyzeShardKey directly against a --shardsvr replica set. When running on a sharded cluster, analyzeShardKey must run against a mongos.
  • You cannot run analyzeShardKey against time series collections.
  • You cannot run analyzeShardKey against collections with Queryable Encryption.

Access Control

analyzeShardKey requires one of the following:

  • enableSharding privilege action against the collection being analyzed.
  • clusterManager role against the cluster.

Output

analyzeShardKey returns information regarding keyCharacteristics and readWriteDistribution.

  • keyCharacteristics provides metrics about the cardinality, frequency, and monotonicity of the shard key.
  • readWriteDistribution provides metrics about query routing patterns and the hotness of shard key ranges.

keyCharacteristics

This is the structure of the keyCharacteristics document that is returned when keyCharacteristics is set to true:

{
  keyCharacteristics: {
    numDocsTotal: <integer>,
    numOrphanDocs: <integer>,
    avgDocSizeBytes: <integer>,
    numDocsSampled: <integer>,
    isUnique: <bool>,
    numDistinctValues: <integer>,
    mostCommonValues: [
      { value: <shardkeyValue>, frequency: <integer> },
      ...
    ],
    monotonicity: {
      recordIdCorrelationCoefficient: <double>,
      type: "monotonic"|"not monotonic"|"unknown",
    }
  }
}

Field | Type | Description | Usage
numDocsTotal | integer | The number of documents in the collection. |
numOrphanDocs | integer | The number of orphan documents. | Orphan documents are not excluded from metrics calculation for performance reasons. If numOrphanDocs is large relative to numDocsTotal, consider waiting to run the command until the number of orphan documents is very small compared to the total number of documents in the collection.
avgDocSizeBytes | integer | The average size of documents in the collection, in bytes. | If numDocsTotal is comparable to numDocsSampled, you can estimate the size of the largest chunks by multiplying the frequency of each mostCommonValues entry by avgDocSizeBytes.
numDocsSampled | integer | The number of sampled documents. |
numDistinctValues | integer | The number of distinct shard key values. | Choose a shard key with a large numDistinctValues since the number of distinct shard key values is the maximum number of chunks that the balancer can create.
isUnique | boolean | Indicates whether the shard key is unique. This is only set to true if there is a unique index for the shard key. | If the shard key is unique, then the number of distinct values is equal to the number of documents.
mostCommonValues | array of documents | An array of value and frequency (number of documents) of the most common shard key values. |

The frequency of a shard key value is the minimum number of documents in the chunk containing that value. If the frequency is large, then the chunk can become a bottleneck for storage, reads, and writes. Choose a shard key where the frequency for each most common value is low relative to numDocsSampled.

The number of most common shard key values can be configured by setting analyzeShardKeyNumMostCommonValues, which defaults to 5. To avoid exceeding the 16MB BSON size limit for the response, each value is set to "truncated" if its size exceeds 15MB / analyzeShardKeyNumMostCommonValues.

mostCommonValues[n].value | document | The shard key. |
mostCommonValues[n].frequency | integer | The number of documents for a given shard key. | Choose a shard key where the frequency for each most common value is low relative to numDocsSampled.

monotonicity.recordIdCorrelationCoefficient | double | Only set if the monotonicity is known. |

The monotonicity type is set to "unknown" when one of the following is true:

  • The shard key does not have a supporting index as defined by shardCollection.
  • The collection is clustered.
  • The shard key is a hashed compound shard key where the hashed field is not the first field.

The monotonicity check can return an incorrect result if the collection has gone through chunk migrations. Chunk migration deletes documents from the donor shard and re-inserts them on the recipient shard. There is no guarantee that the insertion order from the client is preserved.

You can configure the threshold for the correlation coefficient with analyzeShardKeyMonotonicityCorrelationCoefficientThreshold.

monotonicity.type | string | Can be one of: "monotonic", "not monotonic", "unknown" |

Avoid a shard key with type "monotonic" unless you do not expect to insert new documents often.

If a collection is sharded on a shard key that is monotonically increasing or decreasing, new documents will be inserted onto the shard that owns the MaxKey or MinKey chunk. That shard can become the bottleneck for inserts and the data will likely be unbalanced most of the time since the balancer will need to compete with the inserts that come in.
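The avgDocSizeBytes usage note describes a simple multiplication, which can be sketched as follows. This is an illustrative helper, not part of mongosh: estimateLargestChunks is a hypothetical name, and the estimate is only meaningful when numDocsTotal is comparable to numDocsSampled, as the table states.

```javascript
// For each most common shard key value, estimate the size of the chunk that
// must hold it (frequency x average document size) and its share of the sample.
function estimateLargestChunks(keyCharacteristics) {
  const { mostCommonValues, avgDocSizeBytes, numDocsSampled } = keyCharacteristics;
  return mostCommonValues.map(({ value, frequency }) => ({
    value,
    estimatedBytes: frequency * avgDocSizeBytes,
    shareOfSample: frequency / numDocsSampled,
  }));
}

// Hand-written input for illustration only.
console.log(estimateLargestChunks({
  avgDocSizeBytes: 100,
  numDocsSampled: 1000,
  mostCommonValues: [{ value: { k: "a" }, frequency: 250 }],
}));
// [ { value: { k: 'a' }, estimatedBytes: 25000, shareOfSample: 0.25 } ]
```

A value that accounts for a quarter of the sampled documents, as in this made-up input, would be a sign that the candidate key has a frequency problem.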

readWriteDistribution

This is the structure of the document that is returned when readWriteDistribution is set to true:

{
  readDistribution: {
    sampleSize: {
      total: <integer>,
      find: <integer>,
      aggregate: <integer>,
      count: <integer>,
      distinct: <integer>
    },
    percentageOfSingleShardReads: <double>,
    percentageOfMultiShardReads: <double>,
    percentageOfScatterGatherReads: <double>,
    numReadsByRange: [
      <integer>,
      ...
    ]
  },
  writeDistribution: {
    sampleSize: {
      total: <integer>,
      update: <integer>,
      delete: <integer>,
      findAndModify: <integer>
    },
    percentageOfSingleShardWrites: <double>,
    percentageOfMultiShardWrites: <double>,
    percentageOfScatterGatherWrites: <double>,
    numWritesByRange: [
      <integer>,
      ...
    ],
    percentageOfShardKeyUpdates: <double>,
    percentageOfSingleWritesWithoutShardKey: <double>,
    percentageOfMultiWritesWithoutShardKey: <double>
  }
}

readDistribution Fields

Field | Type | Description | Usage
sampleSize.total | integer | Total number of sampled read queries. |
sampleSize.find | integer | Total number of sampled find queries. |
sampleSize.aggregate | integer | Total number of sampled aggregate queries. |
sampleSize.count | integer | Total number of sampled count queries. |
sampleSize.distinct | integer | Total number of sampled distinct queries. |
percentageOfSingleShardReads | double | Percentage of reads that target a single shard, regardless of how the data is distributed. |
percentageOfMultiShardReads | double | Percentage of reads that target multiple shards. |

This category includes the reads that may target only a single shard if the data is distributed such that the values targeted by the read fall under a single shard.

If the queries operate on a large amount of data, then targeting multiple shards instead of one may result in a decrease in latency due to the parallel query execution.

percentageOfScatterGatherReads | double | Percentage of reads that are scatter-gather, regardless of how the data is distributed. |

Avoid a shard key with a high value for this metric. While scatter-gather queries are low-impact on the shards that do not have the target data, they still have some performance impact.

On a cluster with a large number of shards, scatter-gather queries perform significantly worse than queries that target a single shard.

numReadsByRange | array of integers | Array of numbers representing the number of times that each range sorted from MinKey to MaxKey is targeted. |

Avoid a shard key where the distribution of numReadsByRange is very skewed since that implies that there is likely to be one or more hot shards for reads.

Choose a shard key where the sum of numReadsByRange is similar to sampleSize.total.

The number of ranges can be configured using the analyzeShardKeyNumRanges parameter, which defaults to 100. The value is 100 because the goal is to find a shard key that scales up to 100 shards.
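The two numReadsByRange guidelines (avoid skew, keep the sum close to sampleSize.total) can be turned into a quick check. The following is a rough sketch under stated assumptions: summarizeRanges is a hypothetical helper, and the max-to-mean ratio is just one simple way to express skew, not a metric that analyzeShardKey itself reports. The same function works for numWritesByRange.

```javascript
// counts: a numReadsByRange (or numWritesByRange) array, assumed non-empty.
// sampleTotal: the matching sampleSize.total value.
function summarizeRanges(counts, sampleTotal) {
  const sum = counts.reduce((a, b) => a + b, 0);
  const mean = sum / counts.length;
  const max = Math.max(...counts);
  return {
    sum,
    // 1 means every range is targeted equally often; larger means more skew.
    maxToMeanRatio: max / mean,
    // Close to 1 when most sampled queries target a single range.
    targetsPerQuery: sum / sampleTotal,
  };
}

// Hand-written counts for illustration: the last range is clearly hot.
console.log(summarizeRanges([10, 10, 10, 70], 100));
// { sum: 100, maxToMeanRatio: 2.8, targetsPerQuery: 1 }
```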

writeDistribution Fields

Field | Type | Description | Usage
sampleSize.total | integer | Total number of sampled write queries. |
sampleSize.update | integer | Total number of sampled update queries. |
sampleSize.delete | integer | Total number of sampled delete queries. |
sampleSize.findAndModify | integer | Total number of sampled findAndModify queries. |
percentageOfSingleShardWrites | double | Percentage of writes that target a single shard, regardless of how the data is distributed. |
percentageOfMultiShardWrites | double | Percentage of writes that target multiple shards. | This category includes the writes that may target only a single shard if the data is distributed such that the values targeted by the write fall under a single shard.
percentageOfScatterGatherWrites | double | Percentage of writes that are scatter-gather, regardless of how the data is distributed. | Avoid a shard key with a high value for this metric because it is generally more performant for a write to target a single shard.
numWritesByRange | array of integers | Array of numbers representing the number of times that each range sorted from MinKey to MaxKey is targeted. |

Avoid a shard key where the distribution of numWritesByRange is very skewed since that implies that there is likely to be one or more hot shards for writes.

Choose a shard key where the sum of numWritesByRange is similar to sampleSize.total.

The number of ranges can be configured using the analyzeShardKeyNumRanges parameter, which defaults to 100. The value is 100 because the goal is to find a shard key that scales up to 100 shards.

percentageOfShardKeyUpdates | double | Percentage of write queries that update a document's shard key value. |

Avoid a shard key with a high percentageOfShardKeyUpdates. Updates to a document's shard key value may cause the document to move to a different shard, which requires executing an internal transaction on the shard that the query targets. For details on changing a document's shard key value, see Change a Shard Key.

Updates are currently only supported as retryable writes or in a transaction, and have a batch size limit of 1.

percentageOfSingleWritesWithoutShardKey | double | The percentage of write queries that are multi=false and not targetable to a single shard. |

Avoid a shard key with a high value for this metric.

Performing this type of write is expensive because it can involve running internal transactions.

percentageOfMultiWritesWithoutShardKey | double | The percentage of write queries that are multi=true and not targetable to a single shard. | Avoid a shard key with a high value for this metric.
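Several rows above give the same advice: avoid a shard key with a high value for the metric. A small sketch of how that advice might be applied to a result follows. flagWriteMetrics is a hypothetical helper, and the 10 percent default threshold is an arbitrary illustration; the documentation does not define what counts as "high".

```javascript
// Returns the names of write metrics whose value exceeds thresholdPercent.
// Only the metrics the table says to keep low are checked.
function flagWriteMetrics(writeDistribution, thresholdPercent = 10) {
  const watched = [
    "percentageOfScatterGatherWrites",
    "percentageOfShardKeyUpdates",
    "percentageOfSingleWritesWithoutShardKey",
    "percentageOfMultiWritesWithoutShardKey",
  ];
  // A missing metric is treated as 0.
  return watched.filter((name) => (writeDistribution[name] ?? 0) > thresholdPercent);
}

// Hand-written values for illustration only.
console.log(flagWriteMetrics({
  percentageOfScatterGatherWrites: 25,
  percentageOfShardKeyUpdates: 0,
  percentageOfSingleWritesWithoutShardKey: 12,
  percentageOfMultiWritesWithoutShardKey: 3,
}));
// [ 'percentageOfScatterGatherWrites', 'percentageOfSingleWritesWithoutShardKey' ]
```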

Examples

Consider a simplified version of a social media app. The collection we are trying to shard is the post collection.

Documents in the post collection have the following schema:

{
  userId: <uuid>,
  firstName: <string>,
  lastName: <string>,
  body: <string>, // the field that can be modified.
  date: <date>, // the field that can be modified.
}

Background Information

  • The app has 1500 users.
  • There are 30 last names and 45 first names, some more common than others.
  • There are three celebrity users.
  • Each user follows exactly five other users and has a very high probability of following at least one celebrity user.

Sample Workload

  • Each user posts about two posts a day at random times. They edit each post once, right after it is posted.
  • Each user logs in every six hours to read their own profile and posts by the users they follow from the past 24 hours. They also reply under a random post from the past three hours.
  • For every user, the app removes posts that are more than three days old at midnight.
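The background and workload figures imply a rough steady-state size for the post collection, which is a useful sanity check against the numDocsTotal value of 9039 that appears in the example outputs later on this page. This is a back-of-the-envelope estimate only: "about two posts a day" is approximate, and it assumes roughly three days of posts are retained at any time.

```javascript
// Figures taken from the lists above.
const users = 1500;
const postsPerUserPerDay = 2; // "about two posts a day"
const retentionDays = 3;      // posts older than three days are removed

// Approximate number of posts in the collection at any time.
const expectedPosts = users * postsPerUserPerDay * retentionDays;
console.log(expectedPosts); // 9000
```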

Workload Query Patterns

This workload has the following query patterns:

  • find command with filter { userId: , firstName: , lastName: }
  • find command with filter { $or: [{ userId: , firstName: , lastName: , date: { $gte: } }] }
  • findAndModify command with filter { userId: , firstName: , lastName: , date: } to update the body and date fields.
  • update command with multi: false and filter { userId: , firstName: , lastName: , date: { $gte: , $lt: } } to update the body and date fields.
  • delete command with multi: true and filter { userId: , firstName: , lastName: , date: { $lt: } }

Below are example metrics returned by the analyzeShardKey command for some candidate shard keys, with sampled queries collected from seven days of workload.

Note

Before you run analyzeShardKey commands, read the Supporting Indexes section earlier on this page. If you require supporting indexes for the shard key you are analyzing, use the db.collection.createIndex() method to create the indexes.

{ _id: 1 } keyCharacteristics

This example uses the analyzeShardKey command to provide metrics on the { _id: 1 } shard key on the social.post collection.

The following code block uses db.collection.configureQueryAnalyzer() to turn on query sampling:

use social
db.post.configureQueryAnalyzer(
  {
    mode: "full",
    samplesPerSecond: 5
  }
)

After db.collection.configureQueryAnalyzer() collects query samples, the following code block uses the analyzeShardKey command to sample 10,000 documents and calculate results:

use social
db.post.analyzeShardKey(
  { _id: 1 },
  {
    keyCharacteristics: true,
    readWriteDistribution: false,
    sampleSize: 10000
  }
)

{ lastName: 1 } keyCharacteristics

This analyzeShardKey command provides metrics on the { lastName: 1 } shard key on the social.post collection:

use social
db.post.analyzeShardKey(
  { lastName: 1 },
  {
    keyCharacteristics: true,
    readWriteDistribution: false
  }
)

The output for this example resembles the following:

{
  "keyCharacteristics": {
    "numDocsTotal" : 9039,
    "avgDocSizeBytes" : 153,
    "numDocsSampled" : 9039,
    "isUnique" : false,
    "numDistinctValues" : 30,
    "mostCommonValues" : [
      { "value" : { "lastName" : "Smith" }, "frequency" : 1013 },
      { "value" : { "lastName" : "Johnson" }, "frequency" : 984 },
      { "value" : { "lastName" : "Jones" }, "frequency" : 962 },
      { "value" : { "lastName" : "Brown" }, "frequency" : 925 },
      { "value" : { "lastName" : "Davies" }, "frequency" : 852 }
    ],
    "monotonicity" : {
      "recordIdCorrelationCoefficient" : 0.0771959161,
      "type" : "not monotonic"
    }
  }
}

{ userId: 1 } keyCharacteristics

This analyzeShardKey command provides metrics on the { userId: 1 } shard key on the social.post collection:

use social
db.post.analyzeShardKey(
  { userId: 1 },
  {
    keyCharacteristics: true,
    readWriteDistribution: false
  }
)

The output for this example resembles the following:

{
  "keyCharacteristics": {
    "numDocsTotal" : 9039,
    "avgDocSizeBytes" : 162,
    "numDocsSampled" : 9039,
    "isUnique" : false,
    "numDistinctValues" : 1495,
    "mostCommonValues" : [
      { "value" : { "userId" : UUID("aadc3943-9402-4072-aae6-ad551359c596") }, "frequency" : 15 },
      { "value" : { "userId" : UUID("681abd2b-7a27-490c-b712-e544346f8d07") }, "frequency" : 14 },
      { "value" : { "userId" : UUID("714cb722-aa27-420a-8d63-0d5db962390d") }, "frequency" : 14 },
      { "value" : { "userId" : UUID("019a4118-b0d3-41d5-9c0a-764338b7e9d1") }, "frequency" : 14 },
      { "value" : { "userId" : UUID("b9c9fbea-3c12-41aa-bc69-eb316047a790") }, "frequency" : 14 }
    ],
    "monotonicity" : {
      "recordIdCorrelationCoefficient" : -0.0032039729,
      "type" : "not monotonic"
    }
  }
}
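One way to compare the { lastName: 1 } and { userId: 1 } candidates is to look at how much of the sample is covered by the five most common values. The sketch below applies the "frequency low relative to numDocsSampled" guidance to the frequencies shown in the two example outputs; topValuesShare is a hypothetical helper, and the share is an illustrative measure rather than a value returned by analyzeShardKey.

```javascript
// Fraction of sampled documents that belong to the listed most common values.
function topValuesShare(frequencies, numDocsSampled) {
  const total = frequencies.reduce((a, b) => a + b, 0);
  return total / numDocsSampled;
}

// Frequencies copied from the { lastName: 1 } and { userId: 1 } example outputs.
const lastNameShare = topValuesShare([1013, 984, 962, 925, 852], 9039);
const userIdShare = topValuesShare([15, 14, 14, 14, 14], 9039);

console.log((lastNameShare * 100).toFixed(1) + "%"); // 52.4%
console.log((userIdShare * 100).toFixed(1) + "%");   // 0.8%
```

With { lastName: 1 }, five values account for just over half of the documents and there are only 30 distinct values, while { userId: 1 } spreads the same documents over 1495 values with no dominant one.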

{ userId: 1 } readWriteDistribution

This analyzeShardKey command provides metrics on the { userId: 1 } shard key on the social.post collection:

use social
db.post.analyzeShardKey(
  { userId: 1 },
  {
    keyCharacteristics: false,
    readWriteDistribution: true
  }
)

The output for this example resembles the following:

{
  "readDistribution" : {
    "sampleSize" : {
      "total" : 61363,
      "find" : 61363,
      "aggregate" : 0,
      "count" : 0,
      "distinct" : 0
    },
    "percentageOfSingleShardReads" : 50.0008148233,
    "percentageOfMultiShardReads" : 49.9991851768,
    "percentageOfScatterGatherReads" : 0,
    "numReadsByRange" : [
      688, 775, 737, 776, 652, 671, 1332,
      1407, 535, 428, 985, 573, 1496,
      ...
    ]
  },
  "writeDistribution" : {
    "sampleSize" : {
      "total" : 49638,
      "update" : 30680,
      "delete" : 7500,
      "findAndModify" : 11458
    },
    "percentageOfSingleShardWrites" : 100,
    "percentageOfMultiShardWrites" : 0,
    "percentageOfScatterGatherWrites" : 0,
    "numWritesByRange" : [
      389, 601, 430, 454, 462, 421,
      668, 833, 493, 300, 683, 460,
      ...
    ],
    "percentageOfShardKeyUpdates" : 0,
    "percentageOfSingleWritesWithoutShardKey" : 0,
    "percentageOfMultiWritesWithoutShardKey" : 0
  }
}
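A quick consistency check on this sample output can help confirm that it is being read correctly. The sketch below verifies that the per-command sample counts add up to sampleSize.total and that the three read routing percentages add up to roughly 100. Both properties hold for this example; treating them as general invariants is an assumption, and sampleSizeIsConsistent is a hypothetical helper.

```javascript
// True when the per-command counts in a sampleSize document sum to its total.
function sampleSizeIsConsistent(sampleSize) {
  const { total, ...byCommand } = sampleSize;
  const sum = Object.values(byCommand).reduce((a, b) => a + b, 0);
  return sum === total;
}

// Values copied from the example output above.
const reads = { total: 61363, find: 61363, aggregate: 0, count: 0, distinct: 0 };
const writes = { total: 49638, update: 30680, delete: 7500, findAndModify: 11458 };
console.log(sampleSizeIsConsistent(reads), sampleSizeIsConsistent(writes)); // true true

// Single-shard + multi-shard + scatter-gather read percentages.
const readPercentTotal = 50.0008148233 + 49.9991851768 + 0;
console.log(Math.abs(readPercentTotal - 100) < 1e-6); // true
```

For this workload, every sampled write targets a single shard and no read is scatter-gather, which is consistent with all of the query patterns filtering on userId.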
