Definition
analyzeShardKey
New in version 7.0.
Calculates metrics for evaluating a shard key for an unsharded or sharded collection. Metrics are based on sampled queries. You can use configureQueryAnalyzer to configure query sampling on a collection.
Compatibility
This command is available in deployments hosted in the following environments:
- MongoDB Atlas: The fully managed service for MongoDB deployments in the cloud
Note
This command is supported in all MongoDB Atlas clusters. For information on Atlas support for all commands, see Unsupported Commands.
- MongoDB Enterprise: The subscription-based, self-managed version of MongoDB
- MongoDB Community: The source-available, free-to-use, and self-managed version of MongoDB
Syntax
analyzeShardKey has this syntax:
db.collection.analyzeShardKey(
  <shardKey>,
  {
    keyCharacteristics: <bool>,
    readWriteDistribution: <bool>,
    sampleRate: <double>,
    sampleSize: <int>
  }
)
Command Fields
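| Field | Type | Description |
|---|---|---|
| shardKey | document | The shard key to analyze. |
| keyCharacteristics | bool | Whether to calculate the metrics about the shard key characteristics. |
| readWriteDistribution | bool | Whether to calculate the metrics about the read and write distribution. |
| sampleRate | double | Proportion of the documents in the collection to sample when calculating the shard key characteristics metrics. Optional. Cannot be combined with sampleSize. |
| sampleSize | int | Number of documents to sample when calculating the shard key characteristics metrics. Optional. Cannot be combined with sampleRate. |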
Behavior
analyzeShardKey returns different metrics depending on the keyCharacteristics and readWriteDistribution values you specify when you run the method.
Metrics About Shard Key Characteristics
keyCharacteristics consists of the metrics about the cardinality, frequency, and monotonicity of the shard key. These metrics are only returned when keyCharacteristics is true.
The metrics are calculated when analyzeShardKey is run, based on documents sampled from the collection. The calculation requires the shard key to have a supporting index. If there is no supporting index, no metrics are returned.
You can configure sampling with the sampleRate and sampleSize fields. Both are optional, but only one can be specified. When neither sampleRate nor sampleSize is specified, MongoDB uses the value of the analyzeShardKeyCharacteristicsDefaultSampleSize parameter, which has a default value of 10 million.
To calculate metrics based on all documents in the collection, set sampleRate to 1.
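The sampling rule above (at most one of sampleRate and sampleSize) can be checked before the command is sent. The following is a minimal sketch, not part of mongosh: the helper name samplingOptions and the variable allDocsOptions are assumptions made for illustration.

```javascript
// Hypothetical helper (not part of mongosh): returns the sampling portion of
// the analyzeShardKey options and rejects the case where both fields are set.
function samplingOptions({ sampleRate, sampleSize } = {}) {
  if (sampleRate !== undefined && sampleSize !== undefined) {
    throw new Error("Specify sampleRate or sampleSize, not both");
  }
  if (sampleRate !== undefined) return { sampleRate };
  if (sampleSize !== undefined) return { sampleSize };
  // Neither given: the server falls back to the
  // analyzeShardKeyCharacteristicsDefaultSampleSize parameter.
  return {};
}

// Options that analyze every document in the collection (sampleRate: 1).
const allDocsOptions = {
  keyCharacteristics: true,
  readWriteDistribution: false,
  ...samplingOptions({ sampleRate: 1 })
};
console.log(allDocsOptions);
```

In mongosh, a document built this way could be passed as the second argument, for example db.post.analyzeShardKey({ lastName: 1 }, allDocsOptions).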
Metrics About the Read and Write Distribution
readWriteDistribution contains metrics about the query routing patterns and the hotness of shard key ranges. These metrics are based on sampled queries.
To configure query sampling for a collection, use the configureQueryAnalyzer command. The read and write distribution metrics are only returned if readWriteDistribution is true. The metrics are calculated when analyzeShardKey is run, and they use the sampled read and write queries. If there are no sampled queries, read and write distribution metrics aren't returned.
- If there are no sampled read queries, the command returns writeDistribution but omits readDistribution.
- If there are no sampled write queries, the command returns readDistribution but omits writeDistribution.
To return read and write distribution metrics for a collection using analyzeShardKey, you must configure the query analyzer to sample the queries run on the collection. Otherwise, analyzeShardKey returns the read and write distribution metrics as 0 values. To configure the query analyzer, see configureQueryAnalyzer (database command).
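| keyCharacteristics | readWriteDistribution | Results Returned |
|---|---|---|
| true | false | analyzeShardKey returns keyCharacteristics metrics and omits readWriteDistribution metrics. |
| false | true | analyzeShardKey returns readWriteDistribution metrics and omits keyCharacteristics metrics. |
| true | true | analyzeShardKey returns both keyCharacteristics and readWriteDistribution metrics. |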
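Because sections can be omitted depending on the two flags and on which queries were sampled, code that consumes the result may want to check which sections are present before reading them. A minimal sketch follows; the helper name returnedSections is hypothetical, the top-level field names are taken from the Output section on this page, and the sample numbers are made up.

```javascript
// Hypothetical helper: lists which metric sections an analyzeShardKey result
// actually contains, so callers do not read fields that were omitted.
function returnedSections(result) {
  const known = ["keyCharacteristics", "readDistribution", "writeDistribution"];
  return known.filter((name) => result[name] !== undefined);
}

// A result shaped like one produced with keyCharacteristics: false and
// readWriteDistribution: true when no read queries were sampled.
const writesOnly = {
  writeDistribution: {
    sampleSize: { total: 12, update: 12, delete: 0, findAndModify: 0 }
  }
};
console.log(returnedSections(writesOnly)); // [ 'writeDistribution' ]
```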
Non-Blocking Behavior
analyzeShardKey does not block reads or writes to the collection.
Query Sampling
The quality of the metrics about the read and write distribution is determined by how representative the workload is when query sampling occurs. For some applications, returning representative metrics may require leaving query sampling on for several days.
Supporting Indexes
The supporting index required by analyzeShardKey is different from the supporting index required by the shardCollection command.
This table shows the supporting indexes for the same shard key for both analyzeShardKey and shardCollection:
| Command | Shard Key | Supporting Indexes |
|---|---|---|
| analyzeShardKey | { a.x: 1, b: "hashed" } | |
| shardCollection | { a.x: 1, b: "hashed" } | { a.x: 1, b: "hashed", ... } |
This allows you to analyze a shard key that may not yet have the supporting index required for sharding it.
Both analyzeShardKey and shardCollection have the following index requirements:
- Index has a simple collation
- Index is not multi-key
- Index is not sparse
- Index is not partial
To create supporting indexes, use the db.collection.createIndex() method.
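Some of the index requirements above can be screened from index documents such as those returned by db.collection.getIndexes(). The sketch below is a hypothetical helper, not a MongoDB API. It assumes an index document carries sparse, partialFilterExpression, and collation fields when those options were set at creation time, and it does not check the multi-key requirement, which is assumed not to be visible in that output. The two index documents are made up.

```javascript
// Hypothetical helper: given one index document, lists which of the index
// requirements it appears to violate. Multi-key status is not checked.
function indexRequirementViolations(index) {
  const violations = [];
  if (index.collation && index.collation.locale !== "simple") {
    violations.push("non-simple collation");
  }
  if (index.sparse === true) violations.push("sparse");
  if (index.partialFilterExpression !== undefined) violations.push("partial");
  return violations;
}

// Made-up index documents for illustration.
const plainIndex = { v: 2, key: { lastName: 1 }, name: "lastName_1" };
const sparseFrenchIndex = {
  v: 2,
  key: { lastName: 1 },
  name: "lastName_fr",
  sparse: true,
  collation: { locale: "fr" }
};
console.log(indexRequirementViolations(plainIndex));        // []
console.log(indexRequirementViolations(sparseFrenchIndex)); // [ 'non-simple collation', 'sparse' ]
```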
Read Preference
To minimize the performance impact, run analyzeShardKey with the secondary or secondaryPreferred read preference. On a sharded cluster, mongos automatically sets the read preference to secondaryPreferred if not specified.
Limitations
- You cannot run analyzeShardKey on Atlas flex clusters.
- You cannot run analyzeShardKey on standalone deployments.
- You cannot run analyzeShardKey directly against a --shardsvr replica set. When running on a sharded cluster, analyzeShardKey must run against a mongos.
- You cannot run analyzeShardKey against time series collections.
- You cannot run analyzeShardKey against collections with Queryable Encryption.
Access Control
analyzeShardKey requires one of the following:
- enableSharding privilege action against the collection being analyzed.
- clusterManager role against the cluster.
Output
analyzeShardKey returns information regarding keyCharacteristics and readWriteDistribution.
- keyCharacteristics provides metrics about the cardinality, frequency, and monotonicity of the shard key.
- readWriteDistribution provides metrics about query routing patterns and the hotness of shard key ranges.
keyCharacteristics
This is the structure of the keyCharacteristics document that is returned when keyCharacteristics is set to true:
{
  keyCharacteristics: {
    numDocsTotal: <integer>,
    numOrphanDocs: <integer>,
    avgDocSizeBytes: <integer>,
    numDocsSampled: <integer>,
    isUnique: <bool>,
    numDistinctValues: <integer>,
    mostCommonValues: [
      { value: <shardkeyValue>, frequency: <integer> },
      ...
    ],
    monotonicity: {
      recordIdCorrelationCoefficient: <double>,
      type: "monotonic"|"not monotonic"|"unknown",
    }
  }
}
| Field | Type | Description |
|---|---|---|
| numDocsTotal | integer | Total number of documents in the collection. |
| numOrphanDocs | integer | Number of orphan documents. If numOrphanDocs is large relative to numDocsTotal, consider waiting until the number of orphan documents is very small compared to the total number of documents in the collection before you run the command. |
| avgDocSizeBytes | integer | Average document size, in bytes. If numDocsTotal is comparable to numDocsSampled, you can estimate the size of the largest chunks by multiplying the frequency of each mostCommonValues entry by avgDocSizeBytes. |
| numDocsSampled | integer | Number of sampled documents. |
| numDistinctValues | integer | Number of distinct shard key values. Choose a shard key with a large numDistinctValues, since the number of distinct shard key values is the maximum number of chunks that the balancer can create. |
| isUnique | bool | true if there is a unique index for the shard key. |
| mostCommonValues | array of documents | Array of the value and frequency (number of documents) of the top most common shard key values. |
| mostCommonValues[n].value | document | The shard key value. |
| mostCommonValues[n].frequency | integer | Number of documents with that shard key value. Choose a shard key where the frequency of each most common value is low relative to numDocsSampled. |
| monotonicity.recordIdCorrelationCoefficient | double | |
| monotonicity.type | string | One of "monotonic", "not monotonic", or "unknown". |
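The chunk-size estimate described for avgDocSizeBytes can be worked through in a few lines. The sketch below multiplies each mostCommonValues frequency by avgDocSizeBytes. The helper name estimateLargestChunkBytes is hypothetical, and the sample figures are copied from the { lastName: 1 } example output on this page.

```javascript
// Hypothetical helper: estimates the size in bytes of the chunk holding each
// of the most common shard key values. Only meaningful when numDocsTotal is
// comparable to numDocsSampled.
function estimateLargestChunkBytes(keyCharacteristics) {
  const { avgDocSizeBytes, mostCommonValues } = keyCharacteristics;
  return mostCommonValues.map(({ value, frequency }) => ({
    value,
    estimatedBytes: frequency * avgDocSizeBytes
  }));
}

const lastNameSample = {
  numDocsTotal: 9039,
  numDocsSampled: 9039,
  avgDocSizeBytes: 153,
  mostCommonValues: [
    { value: { lastName: "Smith" }, frequency: 1013 },
    { value: { lastName: "Johnson" }, frequency: 984 }
  ]
};
console.log(estimateLargestChunkBytes(lastNameSample).map((e) => e.estimatedBytes));
// [ 154989, 150552 ]
```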
readWriteDistribution
This is the structure of the document that is returned when readWriteDistribution is set to true:
{
  readDistribution: {
    sampleSize: {
      total: <integer>,
      find: <integer>,
      aggregate: <integer>,
      count: <integer>,
      distinct: <integer>
    },
    percentageOfSingleShardReads: <double>,
    percentageOfMultiShardReads: <double>,
    percentageOfScatterGatherReads: <double>,
    numReadsByRange: [
      <integer>,
      ...
    ]
  },
  writeDistribution: {
    sampleSize: {
      total: <integer>,
      update: <integer>,
      delete: <integer>,
      findAndModify: <integer>
    },
    percentageOfSingleShardWrites: <double>,
    percentageOfMultiShardWrites: <double>,
    percentageOfScatterGatherWrites: <double>,
    numWritesByRange: [
      <integer>,
      ...
    ],
    percentageOfShardKeyUpdates: <double>,
    percentageOfSingleWritesWithoutShardKey: <double>,
    percentageOfMultiWritesWithoutShardKey: <double>
  }
}
To return read and write distribution metrics for a collection using analyzeShardKey, you must configure the query analyzer to sample the queries run on the collection. Otherwise, analyzeShardKey returns the read and write distribution metrics as 0 values. To configure the query analyzer, see configureQueryAnalyzer (database command).
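readDistribution Fields
| Field | Type | Description |
|---|---|---|
| sampleSize.total | integer | Total number of sampled read queries. |
| sampleSize.find | integer | Total number of sampled find queries. |
| sampleSize.aggregate | integer | Total number of sampled aggregate queries. |
| sampleSize.count | integer | Total number of sampled count queries. |
| sampleSize.distinct | integer | Total number of sampled distinct queries. |
| percentageOfSingleShardReads | double | Percentage of reads that target a single shard. |
| percentageOfMultiShardReads | double | Percentage of reads that target multiple shards. |
| percentageOfScatterGatherReads | double | Percentage of reads that are scatter-gather. |
| numReadsByRange | array of integers | Number of times that each range, sorted from MinKey to MaxKey, is targeted. |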
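An array such as numReadsByRange can be reduced to a single skew figure when comparing candidate shard keys. The sketch below reports the busiest range and how many times hotter it is than the average range. This ratio is an illustrative measure chosen here, not a metric that analyzeShardKey returns; the helper name rangeHotness is hypothetical and the input array is made up.

```javascript
// Hypothetical helper: finds the most-targeted range and compares it with the
// mean across all ranges. A ratio near 1 suggests evenly spread traffic.
function rangeHotness(countsByRange) {
  if (countsByRange.length === 0) return null;
  const total = countsByRange.reduce((sum, n) => sum + n, 0);
  const mean = total / countsByRange.length;
  const max = Math.max(...countsByRange);
  return {
    hottestRangeIndex: countsByRange.indexOf(max),
    maxToMeanRatio: mean === 0 ? 0 : max / mean
  };
}

console.log(rangeHotness([10, 10, 40, 20]));
// { hottestRangeIndex: 2, maxToMeanRatio: 2 }
```

The same function can be applied to numWritesByRange, since both arrays have the same shape.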
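writeDistribution Fields
| Field | Type | Description |
|---|---|---|
| sampleSize.total | integer | Total number of sampled write queries. |
| sampleSize.update | integer | Total number of sampled update queries. |
| sampleSize.delete | integer | Total number of sampled delete queries. |
| sampleSize.findAndModify | integer | Total number of sampled findAndModify queries. |
| percentageOfSingleShardWrites | double | Percentage of writes that target a single shard. |
| percentageOfMultiShardWrites | double | Percentage of writes that target multiple shards. |
| percentageOfScatterGatherWrites | double | Percentage of writes that are scatter-gather. |
| numWritesByRange | array of integers | Number of times that each range, sorted from MinKey to MaxKey, is targeted. |
| percentageOfShardKeyUpdates | double | Percentage of write queries that update a document's shard key value. |
| percentageOfSingleWritesWithoutShardKey | double | Percentage of write queries that are multi=false and not targetable to a single shard. |
| percentageOfMultiWritesWithoutShardKey | double | Percentage of write queries that are multi=true and not targetable to a single shard. |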
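The last three write metrics can be screened together. The sketch below returns the names of those fields whose value exceeds a threshold, on the assumption that high values are worth a closer look. The helper name flaggedWriteMetrics and the 5 percent default are assumptions chosen for illustration; the input values are copied from the { userId: 1 } readWriteDistribution example output on this page.

```javascript
// Hypothetical helper: returns the names of write metrics whose value exceeds
// a caller-chosen threshold (in percent). The default threshold is arbitrary.
function flaggedWriteMetrics(writeDistribution, thresholdPercent = 5) {
  const watched = [
    "percentageOfShardKeyUpdates",
    "percentageOfSingleWritesWithoutShardKey",
    "percentageOfMultiWritesWithoutShardKey"
  ];
  return watched.filter((name) => writeDistribution[name] > thresholdPercent);
}

const userIdWrites = {
  percentageOfSingleShardWrites: 100,
  percentageOfMultiShardWrites: 0,
  percentageOfScatterGatherWrites: 0,
  percentageOfShardKeyUpdates: 0,
  percentageOfSingleWritesWithoutShardKey: 0,
  percentageOfMultiWritesWithoutShardKey: 0
};
console.log(flaggedWriteMetrics(userIdWrites)); // []
```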
Examples
Consider a simplified version of a social media app. The collection we are trying to shard is the post collection.
Documents in the post collection have the following schema:
{
  userId: <uuid>,
  firstName: <string>,
  lastName: <string>,
  body: <string>, // the field that can be modified.
  date: <date>, // the field that can be modified.
}
Background Information
- The app has 1500 users.
- There are 30 last names and 45 first names, some more common than others.
- There are three celebrity users.
- Each user follows exactly five other users and has a very high probability of following at least one celebrity user.
Sample Workload
- Each user makes about two posts a day at random times. They edit each post once, right after it is posted.
- Each user logs in every six hours to read their own profile and posts by the users they follow from the past 24 hours. They also reply under a random post from the past three hours.
- For every user, the app removes posts that are more than three days old at midnight.
Workload Query Patterns
This workload has the following query patterns:
- find command with filter { userId: , firstName: , lastName: }
- find command with filter { $or: [{ userId: , firstName: , lastName:, date: { $gte: }, ] }
- findAndModify command with filter { userId: , firstName: , lastName: , date: } to update the body and date field.
- update command with multi: false and filter { userId: , firstName: , lastName: , date: { $gte: , $lt: } } to update the body and date field.
- delete command with multi: true and filter { userId: , firstName: , lastName: , date: { $lt: } }
Below are example metrics returned by the analyzeShardKey command for some candidate shard keys, with sampled queries collected from seven days of workload.
Note
Before you run analyzeShardKey commands, read the Supporting Indexes section earlier on this page. If you require supporting indexes for the shard key you are analyzing, use the db.collection.createIndex() method to create the indexes.
{ _id: 1 } keyCharacteristics
This example uses the analyzeShardKey command to provide metrics on the { _id: 1 } shard key on the social.post collection.
The following code block uses db.collection.configureQueryAnalyzer() to turn on query sampling:
use social
db.post.configureQueryAnalyzer(
  {
    mode: "full",
    samplesPerSecond: 5
  }
)
After db.collection.configureQueryAnalyzer() collects query samples, the following code block uses the analyzeShardKey command to sample 10,000 documents and calculate results:
use social
db.post.analyzeShardKey(
  { _id: 1 },
  {
    keyCharacteristics: true,
    readWriteDistribution: false,
    sampleSize: 10000
  }
)
{ lastName: 1 } keyCharacteristics
This analyzeShardKey command provides metrics on the { lastName: 1 } shard key on the social.post collection:
use social
db.post.analyzeShardKey(
  { lastName: 1 },
  {
    keyCharacteristics: true,
    readWriteDistribution: false
  }
)
The output for this example resembles the following:
{
  "keyCharacteristics": {
    "numDocsTotal" : 9039,
    "avgDocSizeBytes" : 153,
    "numDocsSampled" : 9039,
    "isUnique" : false,
    "numDistinctValues" : 30,
    "mostCommonValues" : [
      {
        "value" : {
          "lastName" : "Smith"
        },
        "frequency" : 1013
      },
      {
        "value" : {
          "lastName" : "Johnson"
        },
        "frequency" : 984
      },
      {
        "value" : {
          "lastName" : "Jones"
        },
        "frequency" : 962
      },
      {
        "value" : {
          "lastName" : "Brown"
        },
        "frequency" : 925
      },
      {
        "value" : {
          "lastName" : "Davies"
        },
        "frequency" : 852
      }
    ],
    "monotonicity" : {
      "recordIdCorrelationCoefficient" : 0.0771959161,
      "type" : "not monotonic"
    },
  }
}
{ userId: 1 } keyCharacteristics
This analyzeShardKey command provides metrics on the { userId: 1 } shard key on the social.post collection:
use social
db.post.analyzeShardKey(
  { userId: 1 },
  {
    keyCharacteristics: true,
    readWriteDistribution: false
  }
)
The output for this example resembles the following:
{
  "keyCharacteristics": {
    "numDocsTotal" : 9039,
    "avgDocSizeBytes" : 162,
    "numDocsSampled" : 9039,
    "isUnique" : false,
    "numDistinctValues" : 1495,
    "mostCommonValues" : [
      {
        "value" : {
          "userId" : UUID("aadc3943-9402-4072-aae6-ad551359c596")
        },
        "frequency" : 15
      },
      {
        "value" : {
          "userId" : UUID("681abd2b-7a27-490c-b712-e544346f8d07")
        },
        "frequency" : 14
      },
      {
        "value" : {
          "userId" : UUID("714cb722-aa27-420a-8d63-0d5db962390d")
        },
        "frequency" : 14
      },
      {
        "value" : {
          "userId" : UUID("019a4118-b0d3-41d5-9c0a-764338b7e9d1")
        },
        "frequency" : 14
      },
      {
        "value" : {
          "userId" : UUID("b9c9fbea-3c12-41aa-bc69-eb316047a790")
        },
        "frequency" : 14
      }
    ],
    "monotonicity" : {
      "recordIdCorrelationCoefficient" : -0.0032039729,
      "type" : "not monotonic"
    },
  }
}
{ userId: 1 } readWriteDistribution
This analyzeShardKey command provides metrics on the { userId: 1 } shard key on the social.post collection:
use social
db.post.analyzeShardKey(
  { userId: 1 },
  {
    keyCharacteristics: false,
    readWriteDistribution: true
  }
)
The output for this example resembles the following:
{
  "readDistribution" : {
    "sampleSize" : {
      "total" : 61363,
      "find" : 61363,
      "aggregate" : 0,
      "count" : 0,
      "distinct" : 0
    },
    "percentageOfSingleShardReads" : 50.0008148233,
    "percentageOfMultiShardReads" : 49.9991851768,
    "percentageOfScatterGatherReads" : 0,
    "numReadsByRange" : [
      688,
      775,
      737,
      776,
      652,
      671,
      1332,
      1407,
      535,
      428,
      985,
      573,
      1496,
      ...
    ],
  },
  "writeDistribution" : {
    "sampleSize" : {
      "total" : 49638,
      "update" : 30680,
      "delete" : 7500,
      "findAndModify" : 11458
    },
    "percentageOfSingleShardWrites" : 100,
    "percentageOfMultiShardWrites" : 0,
    "percentageOfScatterGatherWrites" : 0,
    "numWritesByRange" : [
      389,
      601,
      430,
      454,
      462,
      421,
      668,
      833,
      493,
      300,
      683,
      460,
      ...
    ],
    "percentageOfShardKeyUpdates" : 0,
    "percentageOfSingleWritesWithoutShardKey" : 0,
    "percentageOfMultiWritesWithoutShardKey" : 0
  }
}