Docs HomeMongoDB Manual

Sharding分片

Sharding is a method for distributing data across multiple machines. 分片是一种在多台机器之间分配数据的方法。MongoDB uses sharding to support deployments with very large data sets and high throughput operations.MongoDB使用分片来支持具有非常大的数据集和高吞吐量操作的部署。

Database systems with large data sets or high throughput applications can challenge the capacity of a single server. 具有大数据集或高吞吐量应用程序的数据库系统可能会挑战单个服务器的容量。For example, high query rates can exhaust the CPU capacity of the server. 例如,高查询率可能会耗尽服务器的CPU容量。Working set sizes larger than the system's RAM stress the I/O capacity of disk drives.大于系统RAM的工作集大小会增加磁盘驱动器的I/O容量。

There are two methods for addressing system growth: vertical and horizontal scaling.有两种方法可以解决系统增长问题:垂直缩放和水平缩放。

Vertical Scaling垂直缩放 involves increasing the capacity of a single server, such as using a more powerful CPU, adding more RAM, or increasing the amount of storage space. 涉及增加单个服务器的容量,例如使用更强大的CPU、添加更多的RAM或增加存储空间。Limitations in available technology may restrict a single machine from being sufficiently powerful for a given workload. 可用技术的限制可能会限制单个机器对给定工作负载的足够强大。Additionally, Cloud-based providers have hard ceilings based on available hardware configurations. As a result, there is a practical maximum for vertical scaling.此外,基于云的提供商有基于可用硬件配置的硬上限。因此,存在垂直缩放的实际最大值。

Horizontal Scaling involves dividing the system dataset and load over multiple servers, adding additional servers to increase capacity as required. While the overall speed or capacity of a single machine may not be high, each machine handles a subset of the overall workload, potentially providing better efficiency than a single high-speed high-capacity server. 横向扩展包括将系统数据集和负载划分到多个服务器上,并根据需要添加额外的服务器以增加容量。虽然单个机器的总体速度或容量可能不高,但每台机器都处理总体工作负载的一个子集,这可能比单个高速高容量服务器提供更好的效率。Expanding the capacity of the deployment only requires adding additional servers as needed, which can be a lower overall cost than high-end hardware for a single machine. 扩展部署的容量只需要根据需要添加额外的服务器,这可能比单机的高端硬件的总体成本更低。The trade off is increased complexity in infrastructure and maintenance for the deployment.权衡是基础设施和部署维护的复杂性增加。

MongoDB supports horizontal scaling through sharding.MongoDB支持通过分片进行水平扩展

Sharded Cluster分片集群

A MongoDB sharded cluster consists of the following components:MongoDB分片集群由以下组件组成:

  • shard分片: Each shard contains a subset of the sharded data. Each shard can be deployed as a replica set.:每个分片都包含分片数据的一个子集。每个分片都可以作为一个复制集进行部署。
  • mongos: The mongos acts as a query router, providing an interface between client applications and the sharded cluster. mongos充当查询路由器,提供客户端应用程序和分片集群之间的接口。Starting in MongoDB 4.4, mongos can support hedged reads to minimize latencies.从MongoDB 4.4开始,mongos可以支持对冲读取,以最大限度地减少延迟。
  • config servers配置服务器: Config servers store metadata and configuration settings for the cluster.:配置服务器存储群集的元数据和配置设置。

The following graphic describes the interaction of components within a sharded cluster:下图描述了分片集群中组件的交互:

Diagram of a sample sharded cluster for production purposes.  Contains exactly 3 config servers, 2 or more ``mongos`` query routers, and at least 2 shards. The shards are replica sets.

MongoDB shards data at the collection level, distributing the collection data across the shards in the cluster.MongoDB在集合级别对数据进行分片化,将集合数据分布在集群中的分片上。

Shard Keys分片键

MongoDB uses the shard key to distribute the collection's documents across shards. The shard key consists of a field or multiple fields in the documents.MongoDB使用分片键在分片之间分发集合的文档。分片键由文档中的一个或多个字段组成。

  • Starting in version 4.4, documents in sharded collections can be missing the shard key fields. 从4.4版本开始,分片集合中的文档可能会缺少分片键字段。Missing shard key fields are treated as having null values when distributing the documents across shards but not when routing queries. 在跨分片分发文档时,丢失的分片键字段被视为具有null值,而在路由查询时则不被视为。For more information, see Missing Shard Key Fields.有关详细信息,请参阅缺少分片键字段
  • In version 4.2 and earlier, shard key fields must exist in every document for a sharded collection.在4.2及更早版本中,分片集合的每个文档中都必须存在分片键字段。

You select the shard key when sharding a collection.对集合进行分片时,可以选择分片键。

  • Starting in MongoDB 5.0, you can reshard a collection by changing a collection's shard key.从MongoDB 5.0开始,您可以通过更改集合的分片键来重新分片集合
  • Starting in MongoDB 4.4, you can refine a shard key by adding a suffix field or fields to the existing shard key.从MongoDB 4.4开始,您可以通过在现有的分片键中添加一个或多个后缀字段来细化分片键
  • In MongoDB 4.2 and earlier, the choice of shard key cannot be changed after sharding.在MongoDB 4.2及更早版本中,分片后不能更改对分片键的选择。

A document's shard key value determines its distribution across the shards.文档的分片键值决定了它在分片之间的分布。

  • Starting in MongoDB 4.2, you can update a document's shard key value unless your shard key field is the immutable _id field. 从MongoDB 4.2开始,您可以更新文档的分片键值,除非您的分片键字段是不可变的_id字段。See Change a Document's Shard Key Value for more information.有关详细信息,请参阅更改文档的分片键值
  • In MongoDB 4.0 and earlier, a document's shard key field value is immutable.在MongoDB 4.0及更早版本中,文档的分片键字段值是不可变的。

Shard Key Index分片键索引

To shard a populated collection, the collection must have an index that starts with the shard key. 要对已填充的集合进行分片,该集合必须具有以分片键开头的索引When sharding an empty collection, MongoDB creates the supporting index if the collection does not already have an appropriate index for the specified shard key. 当对一个空集合进行分片时,如果该集合尚未为指定的分片键创建适当的索引,MongoDB将创建支持索引。See Shard Key Indexes.请参阅分片键索引

Shard Key Strategy分片关键策略

The choice of shard key affects the performance, efficiency, and scalability of a sharded cluster. 分片键的选择会影响分片集群的性能、效率和可扩展性。A cluster with the best possible hardware and infrastructure can be bottlenecked by the choice of shard key. 拥有尽可能好的硬件和基础设施的集群可能会因分片键的选择而受到瓶颈。The choice of shard key and its backing index can also affect the sharding strategy that your cluster can use.分片键及其支持索引的选择也会影响集群可以使用的分片策略

Tip

See also: 另请参阅:

Choose a Shard Key选择分片键

Chunks区块

MongoDB partitions sharded data into chunks. Each chunk has an inclusive lower and exclusive upper range based on the shard key.MongoDB将分片数据划分为区块。每个区块都有一个基于分片键的包含下限和独占上限范围。

Balancer and Even Chunk Distribution平衡器和均匀区块分布

In an attempt to achieve an even distribution of chunks across all shards in the cluster, a balancer runs in the background to migrate chunks across the shards .为了在集群中的所有分片上实现区块的均匀分布,平衡器在后台运行,以在分片之间迁移区块

Advantages of Sharding分片的优点

Reads / Writes读取/写入

MongoDB distributes the read and write workload across the shards in the sharded cluster, allowing each shard to process a subset of cluster operations. MongoDB将读写工作负载分布在分片集群中的分片上,允许每个分片处理集群操作的子集。Both read and write workloads can be scaled horizontally across the cluster by adding more shards.通过添加更多的分片,可以在集群中横向扩展读写工作负载。

For queries that include the shard key or the prefix of a compound shard key, mongos can target the query at a specific shard or set of shards. 对于包含分片键或复合分片键前缀的查询,mongos可以将查询的目标锁定在特定的分片或一组分片上。These targeted operations are generally more efficient than broadcasting to every shard in the cluster.这些有针对性的操作通常比向集群中的每个分片广播更高效。

Starting in MongoDB 4.4, mongos can support hedged reads to minimize latencies.从MongoDB 4.4开始,mongos可以支持对冲读取,以最大限度地减少延迟。

Storage Capacity存储容量

Sharding distributes data across the shards in the cluster, allowing each shard to contain a subset of the total cluster data. 分片将数据分布在集群中的分片上,允许每个分片包含总集群数据的一个子集。As the data set grows, additional shards increase the storage capacity of the cluster.随着数据集的增长,额外的分片会增加集群的存储容量。

High Availability高可用性

The deployment of config servers and shards as replica sets provide increased availability.配置服务器和分片作为副本集的部署提供了更高的可用性。

Even if one or more shard replica sets become completely unavailable, the sharded cluster can continue to perform partial reads and writes. 即使一个或多个分片副本集完全不可用,分片集群也可以继续执行部分读取和写入。That is, while data on the unavailable shard(s) cannot be accessed, reads or writes directed at the available shards can still succeed.也就是说,虽然无法访问不可用分片上的数据,但针对可用分片的读取或写入仍然可以成功。

Considerations Before Sharding拆分前的注意事项

Sharded cluster infrastructure requirements and complexity require careful planning, execution, and maintenance.分散的集群基础架构需求和复杂性需要仔细规划、执行和维护。

Once a collection has been sharded, MongoDB provides no method to unshard a sharded collection.一旦对集合进行了分片,MongoDB就不提供任何方法来取消对分片集合的分片。

While you can reshard your collection later, it is important to carefully consider your shard key choice to avoid scalability and perfomance issues.虽然您可以稍后重新分片集合,但重要的是要仔细考虑您的分片键选择,以避免可扩展性和性能问题。

Tip

See also: 另请参阅:

Choose a Shard Key选择分片键

To understand the operational requirements and restrictions for sharding your collection, see Operational Restrictions in Sharded Clusters.要了解对集合进行分片的操作要求和限制,请参阅分片集群中的操作限制

If queries do not include the shard key or the prefix of a compound shard key, mongos performs a broadcast operation, querying all shards in the sharded cluster. 如果查询不包括分片键或复合分片键的前缀,mongos会执行广播操作,查询分片集群中的所有分片。These scatter/gather queries can be long running operations.这些分散/聚集查询可能是长时间运行的操作。

Starting in MongoDB 5.1, when starting, restarting or adding a shard server with sh.addShard() the Cluster Wide Write Concern (CWWC) must be set.从MongoDB 5.1开始,当使用sh.addShard()启动、重新启动或添加分片服务器时,必须设置集群范围写问题(CWWC)

If the CWWC is not set and the shard is configured such that the default write concern is { w : 1 } the shard server will fail to start or be added and returns an error.如果未设置CWWC,并且对分片进行了配置,使得默认的写入关注{ w : 1 },则分片服务器将无法启动或添加,并返回错误。

See default write concern calculations for details on how the default write concern is calculated.有关如何计算默认写入问题的详细信息,请参阅默认写入关注计算

Note

If you have an active support contract with MongoDB, consider contacting your account representative for assistance with sharded cluster planning and deployment.如果您与MongoDB签订了有效的支持合同,请考虑联系您的客户代表,以获得分片集群规划和部署方面的帮助。

Sharded and Non-Sharded Collections分片和非分片集合

A database can have a mixture of sharded and unsharded collections. 数据库可以包含分片集合和非分片集合。Sharded collections are partitioned and distributed across the shards in the cluster. 分片集合被分区并分布在集群中的分片中。Unsharded collections are stored on a primary shard. Each database has its own primary shard.未绑定的集合存储在主分片上。每个数据库都有自己的主分片

Diagram of a primary shard. A primary shard contains non-sharded collections as well as chunks of documents from sharded collections. Shard A is the primary shard.

Connecting to a Sharded Cluster连接到分片群集

You must connect to a mongos router to interact with any collection in the sharded cluster. 您必须连接到mongos路由器才能与分片集群中的任何集合进行交互。This includes sharded and unsharded collections. Clients should never connect to a single shard in order to perform read or write operations.这包括已分片和未分片的集合。客户端不应该为了执行读或写操作而连接到单个分片。

Diagram of applications/drivers issuing queries to mongos for unsharded collection as well as sharded collection. Config servers not shown.

You can connect to a mongos the same way you connect to a mongod using the mongosh or a MongoDB driver.您可以像使用mongosh或MongoDB驱动程序连接mongod一样连接mongos

Sharding Strategy分片策略

MongoDB supports two sharding strategies for distributing data across sharded clusters.MongoDB支持两种分片策略,用于跨分片集群分发数据。

Hashed Sharding哈希分片

Hashed Sharding involves computing a hash of the shard key field's value. 哈希分片涉及计算分片键字段值的散列。Each chunk is then assigned a range based on the hashed shard key values.然后,根据散列的分片键值为每个区块分配一个范围。

Tip

MongoDB automatically computes the hashes when resolving queries using hashed indexes. MongoDB在使用散列索引解析查询时自动计算散列。 Applications do not need to compute hashes.应用程序不需要计算散列。

Diagram of the hashed based segmentation.

While a range of shard keys may be "close", their hashed values are unlikely to be on the same chunk. 虽然一系列分片键可能“接近”,但它们的哈希值不太可能在同一区块上。Data distribution based on hashed values facilitates more even data distribution, especially in data sets where the shard key changes monotonically.基于散列值的数据分布有助于更均匀的数据分布,尤其是在分片键单调变化的数据集中。

However, hashed distribution means that range-based queries on the shard key are less likely to target a single shard, resulting in more cluster wide broadcast operations然而,散列分布意味着对分片键的基于范围的查询不太可能针对单个分片,从而导致更多集群范围的广播操作

See Hashed Sharding for more information.有关详细信息,请参阅哈希分片

Ranged Sharding范围分片

Ranged sharding involves dividing data into ranges based on the shard key values. 范围分片涉及根据分片键值将数据划分为多个范围。Each chunk is then assigned a range based on the shard key values.然后根据分片键值为每个区块分配一个范围。

Diagram of the shard key value space segmented into smaller ranges or chunks.

A range of shard keys whose values are "close" are more likely to reside on the same chunk. 值为“close”的一系列分片键更有可能位于同一区块上。This allows for targeted operations as a mongos can route the operations to only the shards that contain the required data.这允许有针对性的操作,因为mongos可以将操作路由到只包含所需数据的分片。

The efficiency of ranged sharding depends on the shard key chosen. 远程分片的效率取决于所选的分片键。Poorly considered shard keys can result in uneven distribution of data, which can negate some benefits of sharding or can cause performance bottlenecks. 考虑不周的分片键可能会导致数据分布不均,这可能会抵消分片的一些好处,或者导致性能瓶颈。See shard key selection for range-based sharding.有关基于范围的分片,请参阅分片关键帧选择

See Ranged Sharding for more information.有关详细信息,请参阅范围分片

Zones in Sharded Clusters分片集群中的区域

Zones can help improve the locality of data for sharded clusters that span multiple data centers.分区可以帮助提高跨多个数据中心的分片集群的数据位置。

In sharded clusters, you can create zones of sharded data based on the shard key. 在分片集群中,可以根据分片键创建分片数据区域You can associate each zone with one or more shards in the cluster. A shard can associate with any number of zones. 您可以将每个区域与集群中的一个或多个分片相关联。分片可以与任意数量的区域关联。In a balanced cluster, MongoDB migrates chunks covered by a zone only to those shards associated with the zone.在一个平衡的集群中,MongoDB只将一个区域覆盖的区块迁移到与该区域关联的分片中。

Each zone covers one or more ranges of shard key values. 每个区域覆盖一个或多个分片键值范围。Each range a zone covers is always inclusive of its lower boundary and exclusive of its upper boundary.一个区域所覆盖的每个范围总是包括其下边界,不包括其上边界。

Diagram of data distribution based on zones in a sharded cluster

You must use fields contained in the shard key when defining a new range for a zone to cover. 在为要覆盖的区域定义新范围时,必须使用分片键中包含的字段。If using a compound shard key, the range must include the prefix of the shard key. See shard keys in zones for more information.如果使用复合分片键,则范围必须包括分片键的前缀。有关详细信息,请参阅区域中的分片键

The possible use of zones in the future should be taken into consideration when choosing a shard key.在选择分片键时,应考虑将来可能使用的区域。

Tip

Starting in MongoDB 4.0.3, setting up zones and zone ranges before you shard an empty or a non-existing collection allows for a faster setup of zoned sharding.从MongoDB 4.0.3开始,在分割空的或不存在的集合之前设置区域和区域范围,可以更快地设置分区分割。

See zones for more information.有关详细信息,请参阅分区

Collations in Sharding分片中的排序规则

Use the shardCollection command with the collation : { locale : "simple" } option to shard a collection which has a default collation. Successful sharding requires that:使用带有collation : { locale : "simple" }选项的shardCollection命令可以对具有默认排序规则的集合进行分片。成功的分片需要:

  • The collection must have an index whose prefix is the shard key集合必须具有前缀为分片键的索引
  • The index must have the collation { locale: "simple" }索引必须具有排序规则{ locale: "simple" }

When creating new collections with a collation, ensure these conditions are met prior to sharding the collection.使用排序规则创建新集合时,请确保在对集合进行分片之前满足这些条件。

Note

Queries on the sharded collection continue to use the default collation configured for the collection. 对分片集合的查询继续使用为该集合配置的默认排序规则。To use the shard key index's simple collation, specify {locale : "simple"} in the query's collation document.要使用分片键索引的simple排序规则,请在查询的排序规则文档中指定{locale : "simple"}

See shardCollection for more information about sharding and collation.有关分片和排序规则的更多信息,请参阅shardCollection

Change Streams更改流

Starting in MongoDB 3.6, change streams are available for replica sets and sharded clusters. 从MongoDB 3.6开始,变更流可用于副本集和分片集群。Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. 更改流允许应用程序访问实时数据更改,而不会带来跟踪操作日志的复杂性和风险。Applications can use change streams to subscribe to all data changes on a collection or collections.应用程序可以使用更改流订阅集合上的所有数据更改。

Transactions事务

Starting in MongoDB 4.2, with the introduction of distributed transactions, multi-document transactions are available on sharded clusters.从MongoDB 4.2开始,随着分布式事务的引入,多文档事务可以在分片集群上使用。

Until a transaction commits, the data changes made in the transaction are not visible outside the transaction.在事务提交之前,在事务中所做的数据更改在事务外部是不可见的。

However, when a transaction writes to multiple shards, not all outside read operations need to wait for the result of the committed transaction to be visible across the shards. 然而,当一个事务写入多个分片时,并不是所有的外部读取操作都需要等待提交的事务的结果在分片中可见。For example, if a transaction is committed and write 1 is visible on shard A but write 2 is not yet visible on shard B, an outside read at read concern "local" can read the results of write 1 without seeing write 2.例如,如果事务已提交,并且写1在分片a上可见,但写2在分片B上还不可见,则外部读取时关注点"local"可以读取写1的结果,而不会看到写2。

Learn More了解更多信息

Practical MongoDB Aggregations E-Book实用的MongoDB聚合电子书

For more information on how sharding works with aggregations, read the sharding chapter in the Practical MongoDB Aggregations e-book.有关分片如何使用聚合的更多信息,请阅读实用MongoDB聚合电子书中的分片章节。

Additional Information附加信息