Sharding分片

On this page本页内容

Sharding分片 is a method for distributing data across multiple machines. 是一种跨多台机器分发数据的方法。MongoDB uses sharding to support deployments with very large data sets and high throughput operations.MongoDB使用分片来支持具有超大数据集和高吞吐量操作的部署。

Database systems with large data sets or high throughput applications can challenge the capacity of a single server. 具有大型数据集或高吞吐量应用程序的数据库系统可能会挑战单个服务器的容量。For example, high query rates can exhaust the CPU capacity of the server. 例如,高查询率可能会耗尽服务器的CPU容量。Working set sizes larger than the system's RAM stress the I/O capacity of disk drives.大于系统RAM的工作集大小会增加磁盘驱动器的I/O容量。

There are two methods for addressing system growth: vertical and horizontal scaling.有两种方法可以解决系统增长问题:垂直扩展和水平扩展。

Vertical Scaling垂直缩放 involves increasing the capacity of a single server, such as using a more powerful CPU, adding more RAM, or increasing the amount of storage space. 包括增加单个服务器的容量,例如使用更强大的CPU、添加更多RAM或增加存储空间。Limitations in available technology may restrict a single machine from being sufficiently powerful for a given workload. 现有技术的局限性可能会限制一台机器对给定工作负载的足够能力。Additionally, Cloud-based providers have hard ceilings based on available hardware configurations. 此外,基于云的提供商根据可用的硬件配置有硬上限。As a result, there is a practical maximum for vertical scaling.因此,垂直缩放有一个实际的最大值。

Horizontal Scaling involves dividing the system dataset and load over multiple servers, adding additional servers to increase capacity as required. 横向扩展涉及将系统数据集和负载划分到多个服务器上,根据需要添加额外的服务器以增加容量。While the overall speed or capacity of a single machine may not be high, each machine handles a subset of the overall workload, potentially providing better efficiency than a single high-speed high-capacity server. 虽然一台机器的总体速度或容量可能不高,但每台机器都处理总体工作负载的一个子集,可能比一台高速高容量服务器提供更好的效率。Expanding the capacity of the deployment only requires adding additional servers as needed, which can be a lower overall cost than high-end hardware for a single machine. 扩展部署容量只需要根据需要添加额外的服务器,这可能比单台机器的高端硬件的总体成本更低。The trade off is increased complexity in infrastructure and maintenance for the deployment.取舍是部署的基础设施和维护变得更加复杂。

MongoDB supports horizontal scaling through sharding.MongoDB支持通过分片进行水平缩放。

Sharded Cluster分片群集

A MongoDB sharded cluster consists of the following components:MongoDB分片集群由以下组件组成:

  • shard: Each shard contains a subset of the sharded data. :每个分片包含分片数据的子集。Each shard can be deployed as a replica set.每个分片都可以作为副本集部署。
  • mongos: The mongos acts as a query router, providing an interface between client applications and the sharded cluster. mongos充当查询路由器,在客户端应用程序和分片集群之间提供接口。Starting in MongoDB 4.4, mongos can support hedged reads to minimize latencies.从MongoDB 4.4开始,mongos可以支持模糊读取以最小化延迟。
  • config servers配置服务器: Config servers store metadata and configuration settings for the cluster.:Config servers存储集群的元数据和配置设置。

The following graphic describes the interaction of components within a sharded cluster:下图描述了分片集群内组件的交互:

Diagram of a sample sharded cluster for production purposes.  Contains exactly 3 config servers, 2 or more ``mongos`` query routers, and at least 2 shards. The shards are replica sets.

MongoDB shards data at the collection level, distributing the collection data across the shards in the cluster.MongoDB在集合级别对数据进行分片,将集合数据分布在集群中的分片上。

Shard Keys分片键

MongoDB uses the shard key to distribute the collection's documents across shards. MongoDB使用分片键在分片之间分发集合的文档。The shard key consists of a field or multiple fields in the documents.分片键由文档中的一个或多个字段组成。

  • Starting in version 4.4, documents in sharded collections can be missing the shard key fields. 从4.4版开始,分片集合中的文档可能会缺少分片键字段。Missing shard key fields are treated as having null values when distributing the documents across shards but not when routing queries. 在分片之间分发文档时,缺少的分片键字段被视为具有空值,而在路由查询时则不被视为空值。For more information, see Missing Shard Key Fields.有关更多信息,请参阅缺少分片键字段
  • In version 4.2 and earlier, shard key fields must exist in every document for a sharded collection.在4.2版及更早版本中,分片集合的每个文档中都必须存在分片键字段。

You select the shard key when sharding a collection.分片集合时选择分片键

  • Starting in MongoDB 5.0, you can reshard a collection by changing a collection's shard key.从MongoDB 5.0开始,您可以通过更改集合的分片键来重新硬拷贝集合
  • Starting in MongoDB 4.4, you can refine a shard key by adding a suffix field or fields to the existing shard key.从MongoDB 4.4开始,可以通过向现有分片键添加一个或多个后缀字段来优化分片键
  • In MongoDB 4.2 and earlier, the choice of shard key cannot be changed after sharding.在MongoDB 4.2及更早版本中,分片后不能更改分片键的选择。

A document's shard key value determines its distribution across the shards.文档的分片键值决定了它在分片中的分布。

  • Starting in MongoDB 4.2, you can update a document's shard key value unless your shard key field is the immutable _id field. 从MongoDB 4.2开始,您可以更新文档的分片键值,除非您的分片键字段是不可变的_id字段。See Change a Document's Shard Key Value for more information.有关更多信息,请参阅更改文档的分片键值
  • In MongoDB 4.0 and earlier, a document's shard key field value is immutable.在MongoDB 4.0及更早版本中,文档的shard key字段值是不可变的。

Shard Key Index分片键索引

To shard a populated collection, the collection must have an index that starts with the shard key. 要对已填充的集合进行分片,该集合必须具有以分片键开头的索引When sharding an empty collection, MongoDB creates the supporting index if the collection does not already have an appropriate index for the specified shard key. 在对空集合进行分片时,如果集合尚未为指定的分片键创建适当的索引,MongoDB将创建支持索引。See Shard Key Indexes.请参阅分片键索引

Shard Key Strategy分片键策略

The choice of shard key affects the performance, efficiency, and scalability of a sharded cluster. 分片键的选择会影响分片集群的性能、效率和可伸缩性。A cluster with the best possible hardware and infrastructure can be bottlenecked by the choice of shard key. 拥有尽可能最好的硬件和基础设施的集群可能会因选择分片键而受到限制。The choice of shard key and its backing index can also affect the sharding strategy that your cluster can use.分片键及其支持索引的选择也会影响集群可以使用的分片策略

Tip提示

Chunks大块

MongoDB partitions sharded data into chunks. MongoDB将分片的数据分割成Each chunk has an inclusive lower and exclusive upper range based on the shard key.每个区块都有一个基于分片键的包含性下限和独占性上限。

Balancer and Even Chunk Distribution均衡器和均匀块分布

In an attempt to achieve an even distribution of chunks across all shards in the cluster, a balancer runs in the background to migrate chunks across the shards .为了在集群中的所有分片上实现块的均匀分布,一个均衡器在后台运行,在分片上迁移

Tip提示
See also: 参阅:

Advantages of Sharding分片的优点

Reads / Writes读/写

MongoDB distributes the read and write workload across the shards in the sharded cluster, allowing each shard to process a subset of cluster operations. MongoDB将读写工作负载分布在分片集群中的分片上,允许每个分片处理集群操作的子集。Both read and write workloads can be scaled horizontally across the cluster by adding more shards.通过添加更多分片,读写工作负载都可以在集群中水平扩展。

For queries that include the shard key or the prefix of a compound shard key, mongos can target the query at a specific shard or set of shards. 对于包含分片键或复合分片键前缀的查询,mongos可以将查询的目标指向特定的分片或分片集。These targeted operations are generally more efficient than broadcasting to every shard in the cluster.这些有针对性的操作通常比向集群中的每个分片广播更高效。

Starting in MongoDB 4.4, mongos can support hedged reads to minimize latencies.从MongoDB 4.4开始,mongos可以支持模糊读取以最小化延迟。

Storage Capacity存储容量

Sharding distributes data across the shards in the cluster, allowing each shard to contain a subset of the total cluster data. 分片将数据分布在集群中的各个分片上,允许每个分片包含整个集群数据的一个子集。As the data set grows, additional shards increase the storage capacity of the cluster.随着群集存储容量的增加,数据集的容量也随之增加。

High Availability高可用性

The deployment of config servers and shards as replica sets provide increased availability.配置服务器和分片作为副本集的部署提供了更高的可用性。

Even if one or more shard replica sets become completely unavailable, the sharded cluster can continue to perform partial reads and writes. 即使一个或多个分片副本集完全不可用,分片集群也可以继续执行部分读写。That is, while data on the unavailable shard(s) cannot be accessed, reads or writes directed at the available shards can still succeed.也就是说,虽然无法访问不可用分片上的数据,但针对可用分片的读取或写入仍然可以成功。

Considerations Before Sharding分片前的考虑

Sharded cluster infrastructure requirements and complexity require careful planning, execution, and maintenance.分片集群基础架构的需求和复杂性需要仔细规划、执行和维护。

Once a collection has been sharded, MongoDB provides no method to unshard a sharded collection.一旦一个集合被分片,MongoDB就不提供任何方法来取消分片集合的分片。

While you can reshard your collection later, it is important to carefully consider your shard key choice to avoid scalability and perfomance issues.虽然您可以稍后对集合进行重新硬存储,但重要的是要仔细考虑分片键的选择,以避免可伸缩性和性能问题。

Tip提示
See also: 参阅:

To understand the operational requirements and restrictions for sharding your collection, see Operational Restrictions in Sharded Clusters.要了解对集合进行分片的操作要求和限制,请参阅分片集群中的操作限制

If queries do not include the shard key or the prefix of a compound shard key, mongos performs a broadcast operation, querying all shards in the sharded cluster. 如果查询不包括分片键或复合分片键的前缀,mongos将执行广播操作,查询分片集群中的所有分片。These scatter/gather queries can be long running operations.这些分散/聚集查询可以是长时间运行的操作。

Starting in MongoDB 5.1, when starting, restarting or adding a shard server with sh.addShard() the Cluster Wide Write Concern (CWWC) must be set.从MongoDB 5.1开始,在使用shard server with sh.addShard()启动、重新启动或添加shard服务器时,必须设置集群范围的写问题(CWWC)

If the CWWC is not set and the shard is configured such that the default write concern is { w : 1 } the shard server will fail to start or be added and returns an error.如果未设置CWWC,并且已将分片配置为默认写入关注点{w:1},则分片服务器将无法启动或添加,并返回错误。

See default write concern calculations for details on how the default write concern is calculated.有关如何计算默认写入关注点的详细信息,请参阅默认写入关注点计算

Note注意

If you have an active support contract with MongoDB, consider contacting your account representative for assistance with sharded cluster planning and deployment.如果您与MongoDB签订了有效的支持合同,请考虑联系您的客户代表,以获得分片集群规划和部署方面的帮助。

Sharded and Non-Sharded Collections分片和非分片集合

A database can have a mixture of sharded and unsharded collections. 数据库可以混合使用分片集合和非分片集合。Sharded collections are partitioned and distributed across the shards in the cluster. 分片集合被分区并分布在集群中的分片上。Unsharded collections are stored on a primary shard. 未分片的集合存储在主分片上。Each database has its own primary shard.每个数据库都有自己的主分片。

Diagram of a primary shard. A primary shard contains non-sharded collections as well as chunks of documents from sharded collections. Shard A is the primary shard.

Connecting to a Sharded Cluster连接到分片群集

You must connect to a mongos router to interact with any collection in the sharded cluster. 您必须连接到mongos路由器才能与分片集群中的任何集合进行交互。This includes sharded and unsharded collections. 这包括分片集合和非分片集合。Clients should never connect to a single shard in order to perform read or write operations.客户端绝不应该连接到一个分片来执行读或写操作。

Diagram of applications/drivers issuing queries to mongos for unsharded collection as well as sharded collection. Config servers not shown.

You can connect to a mongos the same way you connect to a mongod using the mongosh or a MongoDB driver.使用mongosh或MongoDB驱动程序连接mongod的方式与连接mongos的方式相同。

Sharding Strategy分片策略

MongoDB supports two sharding strategies for distributing data across sharded clusters.MongoDB支持两种分片策略,用于在分片集群之间分发数据。

Hashed Sharding散列分片

Hashed Sharding involves computing a hash of the shard key field's value. 哈希分片涉及计算分片键字段值的哈希。Each chunk is then assigned a range based on the hashed shard key values.然后根据散列的分片键值为每个块分配一个范围。

Tip提示

MongoDB automatically computes the hashes when resolving queries using hashed indexes. 当使用散列索引解析查询时,MongoDB会自动计算散列。Applications do not need to compute hashes.应用程序需要计算哈希。

Diagram of the hashed based segmentation.

While a range of shard keys may be "close", their hashed values are unlikely to be on the same chunk. 虽然一系列分片键可能“接近”,但它们的散列值不太可能在同一上。Data distribution based on hashed values facilitates more even data distribution, especially in data sets where the shard key changes monotonically.基于散列值的数据分布有助于更均匀的数据分布,尤其是在分片键单调变化的数据集中。

However, hashed distribution means that range-based queries on the shard key are less likely to target a single shard, resulting in more cluster wide broadcast operations然而,散列分布意味着基于分片键的范围查询不太可能以单个分片为目标,从而导致更多集群范围的广播操作

See Hashed Sharding for more information.有关更多信息,请参阅哈希分片

Ranged Sharding远程分片

Ranged sharding involves dividing data into ranges based on the shard key values. 范围分片涉及根据分片键值将数据划分为范围。Each chunk is then assigned a range based on the shard key values.然后根据分片键值为每个区块分配一个范围。

Diagram of the shard key value space segmented into smaller ranges or chunks.

A range of shard keys whose values are "close" are more likely to reside on the same chunk. 值为“close”的一系列分片键更有可能位于同一上。This allows for targeted operations as a mongos can route the operations to only the shards that contain the required data.这允许有针对性的操作,因为mongos只能将操作路由到包含所需数据的分片。

The efficiency of ranged sharding depends on the shard key chosen. 远程分片的效率取决于选择的分片键。Poorly considered shard keys can result in uneven distribution of data, which can negate some benefits of sharding or can cause performance bottlenecks. 考虑不周的分片键可能会导致数据分布不均匀,这可能会抵消分片的某些好处,或导致性能瓶颈。See shard key selection for range-based sharding.有关基于范围的分片,请参阅分片关键点选择

See Ranged Sharding for more information.有关更多信息,请参阅远程分片

Zones in Sharded Clusters分片群集中的区域

Zones can help improve the locality of data for sharded clusters that span multiple data centers.分区可以帮助改善跨多个数据中心的分片集群的数据位置。

In sharded clusters, you can create zones of sharded data based on the shard key. 在分片集群中,可以基于分片键创建分片数据区域You can associate each zone with one or more shards in the cluster. 可以将每个分区与群集中的一个或多个分片相关联。A shard can associate with any number of zones. 分片可以与任意数量的区域关联。In a balanced cluster, MongoDB migrates chunks covered by a zone only to those shards associated with the zone.在平衡集群中,MongoDB只将区域覆盖的迁移到与该区域相关联的分片。

Each zone covers one or more ranges of shard key values. 每个分区包含一个或多个分片键值范围。Each range a zone covers is always inclusive of its lower boundary and exclusive of its upper boundary.区域覆盖的每个范围始终包括其下边界,不包括其上边界。

Diagram of data distribution based on zones in a sharded cluster

You must use fields contained in the shard key when defining a new range for a zone to cover. 定义分区要覆盖的新范围时,必须使用分片键中包含的字段。If using a compound shard key, the range must include the prefix of the shard key. 如果使用复合分片键,则范围必须包括分片键的前缀。See shard keys in zones for more information.有关详细信息,请参阅区域中的分片键

The possible use of zones in the future should be taken into consideration when choosing a shard key.在选择分片键时,应考虑将来可能使用的分区。

Tip提示

Starting in MongoDB 4.0.3, setting up zones and zone ranges beforeyou shard an empty or a non-existing collection allows for a faster setup of zoned sharding.从MongoDB 4.0.3开始,在分片空的或不存在的集合之前设置区域和区域范围,可以更快地设置分区分片。

See zones for more information.有关更多信息,请参阅区域

Collations in Sharding分片整理

Use the shardCollection command with the collation : { locale : "simple" } option to shard a collection which has a default collation. shardCollection命令与collation : { locale : "simple" }选项一起使用,对具有默认排序规则的集合进行分片。Successful sharding requires that:成功的分片需要:

  • The collection must have an index whose prefix is the shard key集合必须有一个前缀为分片键的索引
  • The index must have the collation { locale: "simple" }必须有简单的区域索引{ locale: "simple" }

When creating new collections with a collation, ensure these conditions are met prior to sharding the collection.使用排序规则创建新集合时,请确保在分片集合之前满足这些条件。

Note注意

Queries on the sharded collection continue to use the default collation configured for the collection. 对分片集合的查询继续使用为集合配置的默认排序规则。To use the shard key index's simple collation, specify {locale : "simple"} in the query's collation document.要使用分片键索引的简单排序规则,请在查询的排序规则文档中指定{locale : "simple"}

See shardCollection for more information about sharding and collation.有关分片和排序的更多信息,请参阅shardCollection

Change Streams更改流

Starting in MongoDB 3.6, change streams are available for replica sets and sharded clusters. 从MongoDB 3.6开始,更改流可用于副本集和分片集群。Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. 更改流允许应用程序访问实时数据更改,而不必承担跟踪oplog的复杂性和风险。Applications can use change streams to subscribe to all data changes on a collection or collections.应用程序可以使用更改流订阅一个或多个集合上的所有数据更改。

Transactions事务

Starting in MongoDB 4.2, with the introduction of distributed transactions, multi-document transactions are available on sharded clusters.从MongoDB 4.2开始,随着分布式事务的引入,多文档事务在分片集群上可用。

Until a transaction commits, the data changes made in the transaction are not visible outside the transaction.在事务提交之前,事务中所做的数据更改在事务外部是不可见的。

However, when a transaction writes to multiple shards, not all outside read operations need to wait for the result of the committed transaction to be visible across the shards. 但是,当事务写入多个分片时,并非所有外部读取操作都需要等待提交的事务的结果在分片中可见。For example, if a transaction is committed and write 1 is visible on shard A but write 2 is not yet visible on shard B, an outside read at read concern "local" can read the results of write 1 without seeing write 2.例如,如果事务已提交,且写入1在分片a上可见,但写入2在分片B上尚不可见,则外部读取-读取关注点"local"可以读取写入1的结果,而不查看写入2。

Learn More了解更多

Practical MongoDB Aggregations E-Book实用MongoDB聚合电子书

For more information on how sharding works with aggregations, read the sharding chapter in the Practical MongoDB Aggregations e-book.有关分片如何与聚合一起工作的更多信息,请阅读实用MongoDB聚合电子书中的分片一章。

Additional Information其他信息

←  Replica Set Member StatesSharded Cluster Components →