sh.shardCollection()

Definition

sh.shardCollection(namespace, key, unique, options)

Shards a collection using the key as the shard key. The shard key determines how MongoDB distributes the collection's documents among the shards.

Note

Changed in version 6.0.

Starting in MongoDB 6.0, sharding a collection does not require you to first run the sh.enableSharding() method to configure the database.

Important

mongosh Method

This page documents a mongosh method. This is not the documentation for database commands or language-specific drivers, such as Node.js.

For the database command, see the shardCollection command.

For MongoDB API drivers, refer to the language-specific MongoDB driver documentation.

For the legacy mongo shell documentation, refer to the documentation for the corresponding MongoDB Server release:

sh.shardCollection() takes the following arguments:

Parameter	Type	Description
`namespace`	string	The namespace of the collection to shard in the form `"<database>.<collection>"`.
`key`	document	The document that specifies the field or fields to use as the shard key, in the form `{ <field1>: <1\|"hashed">, ... }`. Set each field value to either `1` for range-based sharding or `"hashed"` to specify a hashed shard key. The shard key must be supported by an index. Unless the collection is empty, the index must exist before you run the `shardCollection` command. If the collection is empty and no index that can support the shard key exists, MongoDB creates the index before sharding the collection. See also Shard Key Indexes.
`unique`	boolean	Optional. Specify `true` to ensure that the underlying index enforces a unique constraint. Defaults to `false`. You cannot specify `true` when using hashed shard keys. If specifying the `options` document, you must explicitly specify the value for `unique`.
`options`	document	Optional. A document containing optional fields, including `numInitialChunks` and `collation`.

The options argument supports the following options:

Parameter	Type	Description
`numInitialChunks`	integer	Optional. Specifies the minimum number of chunks to create initially when sharding an empty collection with a hashed shard key. MongoDB then creates and balances chunks across the cluster. The `numInitialChunks` value must be less than `8192` chunks per shard. Defaults to `2` chunks per shard. If the collection is not empty or the shard key does not contain a hashed field, the operation returns an error. If sharding with `presplitHashedZones: true`, MongoDB attempts to evenly distribute the specified number of chunks across the zones in the cluster. If sharding with `presplitHashedZones: false` or omitted and no zones and zone ranges are defined for the empty collection, MongoDB attempts to evenly distribute the specified number of chunks across the shards in the cluster. If sharding with `presplitHashedZones: false` or omitted and zones and zone ranges have been defined for the empty collection, `numInitialChunks` has no effect. Changed in version 4.4.
`collation`	document	Optional. If the collection specified to `shardCollection` has a default collation, you must include a collation document with `{ locale: "simple" }`, or the `shardCollection` command fails. At least one of the indexes whose fields support the shard key pattern must have the simple collation.
`presplitHashedZones`	boolean	Optional. Specify `true` to perform initial chunk creation and distribution for an empty or non-existing collection based on the defined zones and zone ranges for the collection. For hashed sharding only. `shardCollection()` with `presplitHashedZones: true` returns an error if any of the following are true: the shard key does not contain a hashed field (that is, it is not a single-field hashed index or a compound hashed index); the collection has no defined zones or zone ranges; the defined zone ranges do not meet the requirements. New in version 4.4.
`timeseries`	document	Optional. Specify this option to create a new sharded time series collection. To shard an existing time series collection, omit this parameter. When the collection specified to `shardCollection` is a time series collection and the `timeseries` option is not specified, MongoDB uses the values that define the existing time series collection to populate the `timeseries` field. For detailed syntax, see Time Series Options.
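
Several of the constraints in the two tables above can be checked on the client before calling sh.shardCollection(). The following is a minimal sketch under stated assumptions: checkShardArgs is a hypothetical helper, not part of mongosh, and it covers only rules stated on this page. It cannot see server-side conditions such as whether the collection is empty.

```javascript
// Hypothetical pre-flight check; NOT a mongosh API.
// Returns a list of problems found in the arguments, based only on the
// rules described in the parameter tables above.
function checkShardArgs(key, unique, options) {
  const problems = [];
  const hasHashedField = Object.values(key).includes("hashed");
  const opts = options || {};

  if (unique === true && hasHashedField) {
    problems.push("unique: true cannot be combined with a hashed shard key");
  }
  if (opts.numInitialChunks !== undefined && !hasHashedField) {
    problems.push("numInitialChunks requires a hashed field in the shard key");
  }
  if (opts.presplitHashedZones === true && !hasHashedField) {
    problems.push("presplitHashedZones: true requires a hashed field in the shard key");
  }
  if (options !== undefined && typeof unique !== "boolean") {
    problems.push("unique must be given explicitly when options is specified");
  }
  return problems;
}

// A hashed key with numInitialChunks passes the checks.
const okArgs = checkShardArgs({ last_name: "hashed" }, false, { numInitialChunks: 5 });

// A ranged key with numInitialChunks is flagged.
const badArgs = checkShardArgs({ zipcode: 1 }, false, { numInitialChunks: 5 });
```

An empty result means only that none of these particular rules is violated; the server still performs its own validation.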

Time Series Options

New in version 5.1.

To create a new time series collection that is sharded, specify the timeseries option to sh.shardCollection().

The timeseries option takes the following fields:

Field	Type	Description
`timeField`	string	Required. The name of the field which contains the date in each time series document. Documents in a time series collection must have a valid BSON date as the value for the `timeField`.
`metaField`	string	Optional. The name of the field which contains metadata in each time series document. The metadata in the specified field should be data that is used to label a unique series of documents. The metadata should rarely, if ever, change. The name of the specified field may not be `_id` or the same as the `timeseries.timeField`. The field can be of any type.
`granularity`	string	Optional. Possible values are `"seconds"`, `"minutes"`, and `"hours"`. By default, MongoDB sets the `granularity` to `"seconds"` for high-frequency ingestion. Manually set the `granularity` parameter to improve performance by optimizing how data in the time series collection is stored internally. To select a value for `granularity`, choose the closest match to the time span between consecutive incoming measurements. If you specify the `timeseries.metaField`, consider the time span between consecutive incoming measurements that have the same unique value for the `metaField` field. Measurements often have the same unique value for the `metaField` field if they come from the same source. If you do not specify `timeseries.metaField`, consider the time span between all measurements that are inserted in the collection. If you set the `granularity` parameter, you can't set the `bucketMaxSpanSeconds` and `bucketRoundingSeconds` parameters.
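
As an illustration of the fields above, the following sketch creates a new sharded time series collection. The namespace weather.readings and the field names timestamp, metadata, and sensorId are assumptions chosen for the example. The typeof guard lets the snippet load outside mongosh without error; in mongosh connected to a mongos, the guarded call runs.

```javascript
// All names below (database, collection, fields) are illustrative assumptions.
const tsNamespace = "weather.readings";

// Ranged key on a metaField sub-field, with the timeField last.
const tsShardKey = { "metadata.sensorId": 1, timestamp: 1 };

const tsOptions = {
  timeseries: {
    timeField: "timestamp",   // required: field holding the BSON date
    metaField: "metadata",    // optional: field labeling each series
    granularity: "hours"      // optional: closest match to the measurement interval
  }
};

// `sh` exists only inside mongosh connected to a sharded cluster.
if (typeof sh !== "undefined") {
  // `unique` is passed explicitly because an options document is given.
  sh.shardCollection(tsNamespace, tsShardKey, false, tsOptions);
}
```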

Considerations

Once a collection has been sharded, MongoDB provides no method to unshard a sharded collection.

Shard Keys

While you can change your shard key later, it is important to carefully consider your shard key choice to avoid scalability and performance issues.

Shard Keys on Time Series Collections

When sharding time series collections, you can only specify the following fields in the shard key:

The metaField
Sub-fields of metaField
The timeField

You may specify combinations of these fields in the shard key. No other fields, including _id, are allowed in the shard key pattern.

When you specify the shard key:

metaField can be either a:
- Hashed shard key
- Ranged shard key
timeField must be:
- A ranged shard key
- At the end of the shard key pattern

Tip

Avoid specifying only the timeField as the shard key. Since the timeField increases monotonically, it may result in all writes appearing on a single chunk within the cluster. Ideally, data is evenly distributed across chunks.
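
The shard key rules for time series collections listed above can be expressed as a small client-side check. This is a sketch only: checkTimeSeriesShardKey is a hypothetical helper, not a mongosh method, and it encodes just the rules stated in this section.

```javascript
// Hypothetical client-side check; NOT a mongosh API.
// Returns a list of rule violations for a proposed shard key pattern.
function checkTimeSeriesShardKey(key, timeField, metaField) {
  const problems = [];
  const fields = Object.keys(key);

  fields.forEach((field, position) => {
    const isTime = field === timeField;
    const isMeta =
      metaField !== undefined &&
      (field === metaField || field.startsWith(metaField + "."));

    if (!isTime && !isMeta) {
      problems.push(`${field}: only the metaField, its sub-fields, and the timeField are allowed`);
    }
    if (isTime && key[field] === "hashed") {
      problems.push(`${field}: the timeField must be a ranged shard key`);
    }
    if (isTime && position !== fields.length - 1) {
      problems.push(`${field}: the timeField must be at the end of the shard key pattern`);
    }
  });
  return problems;
}

// Hashed metaField sub-field followed by a ranged timeField: no violations.
const goodKey = checkTimeSeriesShardKey(
  { "metadata.sensorId": "hashed", timestamp: 1 }, "timestamp", "metadata");

// timeField is not last, and _id is not an allowed field: two violations.
const badKey = checkTimeSeriesShardKey(
  { timestamp: 1, _id: 1 }, "timestamp", "metadata");
```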

To learn how to best choose a shard key, see:

Tip

Hashed Shard Keys

Hashed shard keys use a hashed index or a compound hashed index as the shard key.

Use the form field: "hashed" to specify a hashed shard key field.

Note

If chunk migrations are in progress while creating a hashed shard key collection, the initial chunk distribution may be uneven until the balancer automatically balances the collection.

Zone Sharding and Initial Chunk Distribution

The shard collection operation (that is, the shardCollection command and the sh.shardCollection() helper) can perform initial chunk creation and distribution for an empty or a non-existing collection if zones and zone ranges have been defined for the collection. Initial chunk distribution allows for a faster setup of zoned sharding. After the initial distribution, the balancer manages the chunk distribution as usual.

See Pre-Define Zones and Zone Ranges for an Empty or Non-Existing Collection for an example. If sharding a collection using a ranged or single-field hashed shard key, the numInitialChunks option has no effect if zones and zone ranges have been defined for the empty collection.
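
A minimal sketch of that workflow with a ranged shard key follows. The shard names, zone names, namespace, and range boundaries are all assumptions for illustration; substitute the values from your own cluster. On versions before 6.0 you may also need to run sh.enableSharding() for the database first.

```javascript
// Shard names, zone names, namespace, and boundaries are illustrative assumptions.
const zonedNamespace = "exampledb.orders";
const zonedShardKey = { customerId: 1 };

const zoneRanges = [
  { shard: "shardA", zone: "low",  min: { customerId: 0 },    max: { customerId: 5000 } },
  { shard: "shardB", zone: "high", min: { customerId: 5000 }, max: { customerId: 10000 } }
];

// `sh` exists only inside mongosh connected to a sharded cluster.
if (typeof sh !== "undefined") {
  for (const { shard, zone, min, max } of zoneRanges) {
    sh.addShardToZone(shard, zone);                       // associate the shard with the zone
    sh.updateZoneKeyRange(zonedNamespace, min, max, zone); // lower bound inclusive, upper bound exclusive
  }
  // With zones and ranges in place, sharding the empty collection can
  // create and distribute the initial chunks according to those ranges.
  sh.shardCollection(zonedNamespace, zonedShardKey);
}
```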

To shard a collection using a compound hashed index, see Initial Chunk Distribution with Compound Hashed Indexes.

Initial Chunk Distribution with Compound Hashed Indexes

Starting in version 4.4, MongoDB supports sharding collections on compound hashed indexes. When sharding an empty or non-existing collection using a compound hashed shard key, additional requirements apply in order for MongoDB to perform initial chunk creation and distribution.

The numInitialChunks option has no effect if zones and zone ranges have been defined for the empty collection and presplitHashedZones is false.

See Pre-Define Zones and Zone Ranges for an Empty or Non-Existing Collection for an example.

Uniqueness

If specifying unique: true:

If the collection is empty, sh.shardCollection() creates the unique index on the shard key if such an index does not already exist.
If the collection is not empty, you must create the index first before using sh.shardCollection().

Although you can have a unique compound index where the shard key is a prefix, if you use the unique parameter, the collection must have a unique index on the shard key.
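
For a collection that already contains documents, the sequence described above might look like the following sketch. The records.people namespace and the email field are assumptions for the example.

```javascript
// Namespace and field name are illustrative assumptions.
const uniqueNamespace = "records.people";
const uniqueShardKey = { email: 1 }; // ranged key; unique: true is not allowed with hashed keys

// `sh` and `db` exist only inside mongosh connected to a sharded cluster.
if (typeof sh !== "undefined") {
  // Non-empty collection: create the unique index on the shard key first.
  db.getSiblingDB("records")
    .getCollection("people")
    .createIndex(uniqueShardKey, { unique: true });

  // Then shard with unique set to true.
  sh.shardCollection(uniqueNamespace, uniqueShardKey, true);
}
```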

Collation

If the collection has a default collation, the sh.shardCollection() command must include a collation parameter with the value { locale: "simple" }. For non-empty collections with a default collation, you must have at least one index with the simple collation whose fields support the shard key pattern.

You do not need to specify the collation option for collections without a collation. If you do specify the collation option for a collection with no collation, it will have no effect.
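
The following sketch shows one way to satisfy these requirements for a collection that has a default collation. The records.names namespace, the last_name field, and the fr locale are assumptions chosen for illustration.

```javascript
// Namespace, field name, and the default locale are illustrative assumptions.
const collNamespace = "records.names";
const collShardKey = { last_name: 1 };
const simpleCollation = { locale: "simple" };

// `sh` and `db` exist only inside mongosh connected to a sharded cluster.
if (typeof sh !== "undefined") {
  const records = db.getSiblingDB("records");

  // A collection created with a non-simple default collation.
  records.createCollection("names", { collation: { locale: "fr" } });

  // An index on the shard key fields that uses the simple collation.
  records.getCollection("names").createIndex(collShardKey, { collation: simpleCollation });

  // The collation option must be { locale: "simple" }.
  sh.shardCollection(collNamespace, collShardKey, false, { collation: simpleCollation });
}
```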

Write Concern

mongos uses "majority" for the write concern of the shardCollection command and its helper sh.shardCollection().

Examples

Simple Usage

Given a collection named people in a database named records, the following command shards the collection by the zipcode field:

sh.shardCollection("records.people", { zipcode: 1 })

Usage with Options

The phonebook database has a collection contacts with no default collation. The following example uses sh.shardCollection() to shard the phonebook.contacts collection with:

a hashed shard key on the last_name field,
5 initial chunks, and
a collation of simple.

sh.shardCollection(
  "phonebook.contacts",
  { last_name: "hashed" },
  false,
  {
    numInitialChunks: 5,
    collation: { locale: "simple" }
  }
)
