
shardCollection (database command)

Definition

shardCollection

Shards a collection to distribute its documents across shards. The shardCollection command must be run against the admin database.

Note

Changed in version 6.0.

Starting in MongoDB 6.0, sharding a collection does not require you to first run the enableSharding command to configure the database.

Tip

In mongosh, this command can also be run through the sh.shardCollection() helper method.

Helper methods are convenient for mongosh users, but they may not return the same level of information as database commands. In cases where the convenience is not needed or the additional return fields are required, use the database command.
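As a rough illustration of the two invocation styles, the sketch below builds one command document and shows where the database command and the helper would each be called. The namespace records.people and the zipcode key are borrowed from the example at the end of this page. The typeof guard is only there so the snippet also loads outside mongosh, where db is not defined.

```javascript
// Namespace and shard key reused from this page's example.
const helperDemoNs = "records.people";
const helperDemoKey = { zipcode: 1 };

// Command document for the database command form.
const helperDemoCmd = { shardCollection: helperDemoNs, key: helperDemoKey };

// Only attempt the call inside mongosh connected to a mongos.
if (typeof db !== "undefined") {
  // Database command form: returns the full command response.
  printjson(db.adminCommand(helperDemoCmd));

  // Helper form, equivalent in effect (use one form or the other):
  // sh.shardCollection(helperDemoNs, helperDemoKey);
}
```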

Compatibility

This command is available in deployments hosted in the following environments:

  • MongoDB Atlas: The fully managed service for MongoDB deployments in the cloud
  • MongoDB Enterprise: The subscription-based, self-managed version of MongoDB
  • MongoDB Community: The source-available, free-to-use, and self-managed version of MongoDB

Syntax

To run shardCollection, use the db.adminCommand( { <command> } ) method, which runs the command against the admin database.

The command has the following form:

db.adminCommand(
  {
    shardCollection: "<database>.<collection>",
    key: { <field1>: <1|"hashed">, ... },
    unique: <boolean>,
    presplitHashedZones: <boolean>,
    collation: { locale: "simple" },
    timeseries: <object>
  }
)

Command Fields

The command takes the following fields:

Field | Type | Description
shardCollection | string | The namespace of the collection to shard, in the form <database>.<collection>.
key | document

The document that specifies the field or fields to use as the shard key.

{ <field1>: <1|"hashed">, ... }

Set the field values to either:

  • 1 for range-based sharding
  • "hashed" to specify a hashed shard key

The shard key must be supported by an index. Unless the collection is empty, the index must exist prior to the shardCollection command. If the collection is empty and no index that can support the shard key exists, MongoDB creates the index before sharding the collection.

See also Shard Key Indexes

unique | boolean

Specify true to ensure that the underlying index enforces a unique constraint. Defaults to false.

You cannot specify true when using hashed shard keys.

collation | document | Optional. If the collection specified to shardCollection has a default collation, you must include a collation document with { locale : "simple" }, or the shardCollection command fails. At least one of the indexes whose fields support the shard key pattern must have the simple collation.
presplitHashedZones | boolean

Optional. Specify true to perform initial chunk creation and distribution for an empty or non-existing collection based on the defined zones and zone ranges for the collection. For hashed sharding only.

shardCollection with presplitHashedZones: true returns an error if any of the following are true:

timeseries | object

Optional. Specify this option to create a new sharded time series collection.

To shard an existing time series collection, omit this parameter.

When the collection specified to shardCollection is a time series collection and the timeseries option is not specified, MongoDB uses the values that define the existing time series collection to populate the timeseries field.

For detailed syntax, see Time Series Options.

New in version 5.1.

Time Series Options

New in version 5.1.

To create a new time series collection that is sharded, specify the timeseries option to shardCollection.

The timeseries option takes the following fields:

Field | Type | Description
timeField | string

Required. The name of the field which contains the date in each time series document. Documents in a time series collection must have a valid BSON date as the value for the timeField.

Warning

Starting in MongoDB 8.0, use of the timeField as a shard key in a time series collection is deprecated.

metaField | string

Optional. The name of the field which contains metadata in each time series document. The metadata in the specified field should be data that is used to label a unique series of documents. The metadata should rarely, if ever, change. The name of the specified field may not be _id or the same as the timeseries.timeField. The field can be of any data type.

Although the metaField field is optional, using metadata can improve query optimization. For example, MongoDB automatically creates a compound index on the metaField and timeField fields for new collections. If you do not provide a value for this field, the data is bucketed solely based on time.

granularity | string

Optional. Possible values are:

  • "seconds"
  • "minutes"
  • "hours"

By default, MongoDB sets the granularity to "seconds" for high-frequency ingestion.

Manually set the granularity parameter to improve performance by optimizing how data in the time series collection is stored internally. To select a value for granularity, choose the closest match to the time span between consecutive incoming measurements.

If you specify the timeseries.metaField, consider the time span between consecutive incoming measurements that have the same unique value for the metaField field. Measurements often have the same unique value for the metaField field if they come from the same source.

If you do not specify timeseries.metaField, consider the time span between all measurements that are inserted in the collection.

If you set the granularity parameter, you can't set the bucketMaxSpanSeconds and bucketRoundingSeconds parameters.
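The time series fields described above can be combined as in the following sketch, which creates a new sharded time series collection. The namespace test.weather and the field names timestamp, metadata, and metadata.sensorId are hypothetical and chosen only for illustration. The shard key uses a sub-field of the metaField rather than the timeField, in line with the deprecation warning above.

```javascript
// Hypothetical namespace and field names, for illustration only.
const timeseriesShardCmd = {
  shardCollection: "test.weather",
  // Shard on a sub-field of the metaField (ranged).
  key: { "metadata.sensorId": 1 },
  timeseries: {
    timeField: "timestamp",   // required: must hold a BSON date in each document
    metaField: "metadata",    // optional: labels a unique series of documents
    granularity: "hours"      // optional: "seconds", "minutes", or "hours"
  }
};

// Only attempt the call inside mongosh connected to a mongos.
if (typeof db !== "undefined") {
  printjson(db.adminCommand(timeseriesShardCmd));
}
```

Because granularity is set here, bucketMaxSpanSeconds and bucketRoundingSeconds are left out of the timeseries document.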

Considerations

Shard Keys

While you can change your shard key later, it is important to carefully consider your shard key choice to avoid scalability and performance issues.

Shard Keys on Time Series Collections

When sharding time series collections, you can only specify the following fields in the shard key:

  • The metaField
  • Sub-fields of metaField
  • The timeField

You may specify combinations of these fields in the shard key. No other fields, including _id, are allowed in the shard key pattern.

When you specify the shard key:

Tip

Avoid specifying only the timeField as the shard key. Since the timeField increases monotonically, it may result in all writes appearing on a single chunk within the cluster. Ideally, data is evenly distributed across chunks.

To learn how to best choose a shard key, see:

Warning

Starting in MongoDB 8.0, use of the timeField as a shard key in a time series collection is deprecated.

Hashed Shard Keys

Hashed shard keys use a hashed index or a compound hashed index as the shard key.

Use the form field: "hashed" to specify a hashed shard key field.

Note

If chunk migrations are in progress while creating a hashed shard key collection, the initial chunk distribution may be uneven until the balancer automatically balances the collection.
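A minimal sketch of both hashed forms follows: a single-field hashed shard key and a compound hashed shard key. The records.orders namespace and the region and customerId fields are hypothetical names used only for illustration.

```javascript
// Single-field hashed shard key on _id.
const hashedShardCmd = {
  shardCollection: "records.people",
  key: { _id: "hashed" }
};

// Compound hashed shard key: one ranged field plus one hashed field.
// Namespace and field names are hypothetical.
const compoundHashedShardCmd = {
  shardCollection: "records.orders",
  key: { region: 1, customerId: "hashed" }
};

// Only attempt the calls inside mongosh connected to a mongos.
if (typeof db !== "undefined") {
  printjson(db.adminCommand(hashedShardCmd));
  printjson(db.adminCommand(compoundHashedShardCmd));
}
```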

Reshard to Balance

When you run the shardCollection command, the balancer begins distributing the collection data to other shards in the cluster. A single shard can only participate in one chunk migration at a time. When MongoDB succeeds in copying a range of data from one shard to another, the range on the donor shard is marked for removal by the range deleter. This process is slow and resource intensive.

Starting in MongoDB 8.0, if your deployment meets the resource requirements, it's recommended you use the reshardCollection command to perform this initial balancing of data by resharding to the same key. This causes MongoDB to rebalance data across the shards without waiting on the balancer.

To use the reshardCollection command to perform the initial balancing:

  1. Use the shardCollection command to configure the collection as a sharded collection.
  2. Use the reshardCollection command to reshard to the same shard key by setting the forceRedistribution option to true. MongoDB then balances the data across the shards.

For more information, see Reshard to the Same Shard Key.
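The two steps above can be sketched as follows, assuming a MongoDB 8.0 or later deployment that meets the resharding resource requirements. The namespace and shard key are reused from this page's example; the second command runs only if the first reports success.

```javascript
// Step 1: configure the collection as a sharded collection.
const initialShardCmd = {
  shardCollection: "records.people",
  key: { zipcode: 1 }
};

// Step 2: reshard to the same shard key and force redistribution.
const rebalanceCmd = {
  reshardCollection: "records.people",
  key: { zipcode: 1 },
  forceRedistribution: true
};

// Only attempt the calls inside mongosh connected to a mongos.
if (typeof db !== "undefined") {
  const shardResult = db.adminCommand(initialShardCmd);
  printjson(shardResult);
  if (shardResult.ok === 1) {
    printjson(db.adminCommand(rebalanceCmd));
  }
}
```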

Zone Sharding and Initial Chunk Distribution

The shard collection operation (i.e. shardCollection command and the sh.shardCollection() helper) can perform initial chunk creation and distribution for an empty or a non-existing collection if zones and zone ranges have been defined for the collection. Initial chunk distribution allows for a faster setup of zoned sharding. After the initial distribution, the balancer manages the chunk distribution as usual.

To shard a collection using a compound hashed index, see Zone Sharding and Compound Hashed Indexes.

Zone Sharding and Compound Hashed Indexes

MongoDB supports sharding collections on compound hashed indexes. When sharding an empty or non-existing collection using a compound hashed shard key, additional requirements apply in order for MongoDB to perform initial chunk creation and distribution.

See Pre-Define Zones and Zone Ranges for an Empty or Non-Existing Collection for an example.
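As a hedged sketch of the command shape only, the following document requests initial chunk creation for a compound hashed shard key. It assumes that zones and zone ranges satisfying the additional requirements have already been defined for the collection; without them the command returns an error. The namespace examples.metrics and the fields datacenter and deviceId are hypothetical.

```javascript
// Hypothetical namespace and fields. Assumes zones and zone ranges
// were defined for this collection beforehand.
const presplitCmd = {
  shardCollection: "examples.metrics",
  key: { datacenter: 1, deviceId: "hashed" },
  presplitHashedZones: true   // hashed sharding only
};

// Only attempt the call inside mongosh connected to a mongos.
if (typeof db !== "undefined") {
  printjson(db.adminCommand(presplitCmd));
}
```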

Uniqueness

If specifying unique: true:

  • If the collection is empty, shardCollection creates the unique index on the shard key if such an index does not already exist.
  • If the collection is not empty, you must create the index before using shardCollection.

Although you can have a unique compound index where the shard key is a prefix, if you use the unique parameter, the collection must have a unique index that is on the shard key.

See also Sharded Collection and Unique Indexes
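For a non-empty collection, the ordering described above might look like the following sketch: create the unique index on the shard key first, then shard with unique: true. The records.accounts namespace and the accountId field are hypothetical.

```javascript
// Hypothetical namespace and field name.
const uniqueKey = { accountId: 1 };

const uniqueShardCmd = {
  shardCollection: "records.accounts",
  key: uniqueKey,
  unique: true   // not allowed with hashed shard keys
};

// Only attempt the calls inside mongosh connected to a mongos.
if (typeof db !== "undefined") {
  // The unique index on the shard key must exist before sharding
  // a non-empty collection.
  db.getSiblingDB("records").accounts.createIndex(uniqueKey, { unique: true });
  printjson(db.adminCommand(uniqueShardCmd));
}
```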

Collation

If the collection has a default collation, the shardCollection command must include a collation parameter with the value { locale: "simple" }. For non-empty collections with a default collation, you must have at least one index with the simple collation whose fields support the shard key pattern.

You do not need to specify the collation option for collections without a collation. If you do specify the collation option for a collection with no collation, it will have no effect.
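The sketch below illustrates the collation requirement for a collection that has a default collation. It assumes a hypothetical records.customers collection with a lastName field, and creates a supporting index with the simple collation before sharding.

```javascript
// The only collation value accepted by shardCollection.
const simpleCollation = { locale: "simple" };

// Hypothetical namespace and field name.
const collationShardCmd = {
  shardCollection: "records.customers",
  key: { lastName: 1 },
  collation: simpleCollation
};

// Only attempt the calls inside mongosh connected to a mongos.
if (typeof db !== "undefined") {
  // A non-empty collection with a default collation needs at least one
  // index with the simple collation that supports the shard key.
  db.getSiblingDB("records").customers.createIndex(
    { lastName: 1 },
    { collation: simpleCollation }
  );
  printjson(db.adminCommand(collationShardCmd));
}
```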

Write Concern

mongos uses "majority" for the write concern of the shardCollection command, its helper sh.shardCollection(), and the sh.shardAndDistributeCollection() method.

Example

The following operation enables sharding for the people collection in the records database and uses the zipcode field as the shard key:

db.adminCommand( { shardCollection: "records.people", key: { zipcode: 1 } } )

Learn More