On this page本页内容
Sharding分片 is a method for distributing data across multiple machines. 是一种跨多台机器分发数据的方法。MongoDB uses sharding to support deployments with very large data sets and high throughput operations.MongoDB使用分片来支持具有超大数据集和高吞吐量操作的部署。
Database systems with large data sets or high throughput applications can challenge the capacity of a single server. 具有大型数据集或高吞吐量应用程序的数据库系统可能会挑战单个服务器的容量。For example, high query rates can exhaust the CPU capacity of the server. 例如,高查询率可能会耗尽服务器的CPU容量。Working set sizes larger than the system's RAM stress the I/O capacity of disk drives.大于系统RAM的工作集大小会增加磁盘驱动器的I/O容量。
There are two methods for addressing system growth: vertical and horizontal scaling.有两种方法可以解决系统增长问题:垂直扩展和水平扩展。
Vertical Scaling垂直缩放 involves increasing the capacity of a single server, such as using a more powerful CPU, adding more RAM, or increasing the amount of storage space. 包括增加单个服务器的容量,例如使用更强大的CPU、添加更多RAM或增加存储空间。Limitations in available technology may restrict a single machine from being sufficiently powerful for a given workload. 现有技术的局限性可能会限制一台机器对给定工作负载的足够能力。Additionally, Cloud-based providers have hard ceilings based on available hardware configurations. 此外,基于云的提供商根据可用的硬件配置有硬上限。As a result, there is a practical maximum for vertical scaling.因此,垂直缩放有一个实际的最大值。
Horizontal Scaling involves dividing the system dataset and load over multiple servers, adding additional servers to increase capacity as required. 横向扩展涉及将系统数据集和负载划分到多个服务器上,根据需要添加额外的服务器以增加容量。While the overall speed or capacity of a single machine may not be high, each machine handles a subset of the overall workload, potentially providing better efficiency than a single high-speed high-capacity server. 虽然一台机器的总体速度或容量可能不高,但每台机器都处理总体工作负载的一个子集,可能比一台高速高容量服务器提供更好的效率。Expanding the capacity of the deployment only requires adding additional servers as needed, which can be a lower overall cost than high-end hardware for a single machine. 扩展部署容量只需要根据需要添加额外的服务器,这可能比单台机器的高端硬件的总体成本更低。The trade off is increased complexity in infrastructure and maintenance for the deployment.取舍是部署的基础设施和维护变得更加复杂。
MongoDB supports horizontal scaling through sharding.MongoDB支持通过分片进行水平缩放。
A MongoDB sharded cluster consists of the following components:MongoDB分片集群由以下组件组成:
mongos
acts as a query router, providing an interface between client applications and the sharded cluster. mongos
充当查询路由器,在客户端应用程序和分片集群之间提供接口。mongos
can support hedged reads to minimize latencies.mongos
可以支持模糊读取以最小化延迟。The following graphic describes the interaction of components within a sharded cluster:下图描述了分片集群内组件的交互:
MongoDB shards data at the collection level, distributing the collection data across the shards in the cluster.MongoDB在集合级别对数据进行分片,将集合数据分布在集群中的分片上。
MongoDB uses the shard key to distribute the collection's documents across shards. MongoDB使用分片键在分片之间分发集合的文档。The shard key consists of a field or multiple fields in the documents.分片键由文档中的一个或多个字段组成。
You select the shard key when sharding a collection.分片集合时选择分片键。
A document's shard key value determines its distribution across the shards.文档的分片键值决定了它在分片中的分布。
_id
field. _id
字段。To shard a populated collection, the collection must have an index that starts with the shard key. 要对已填充的集合进行分片,该集合必须具有以分片键开头的索引。When sharding an empty collection, MongoDB creates the supporting index if the collection does not already have an appropriate index for the specified shard key. 在对空集合进行分片时,如果集合尚未为指定的分片键创建适当的索引,MongoDB将创建支持索引。See Shard Key Indexes.请参阅分片键索引。
The choice of shard key affects the performance, efficiency, and scalability of a sharded cluster. 分片键的选择会影响分片集群的性能、效率和可伸缩性。A cluster with the best possible hardware and infrastructure can be bottlenecked by the choice of shard key. 拥有尽可能最好的硬件和基础设施的集群可能会因选择分片键而受到限制。The choice of shard key and its backing index can also affect the sharding strategy that your cluster can use.分片键及其支持索引的选择也会影响集群可以使用的分片策略。
MongoDB partitions sharded data into chunks. MongoDB将分片的数据分割成块。Each chunk has an inclusive lower and exclusive upper range based on the shard key.每个区块都有一个基于分片键的包含性下限和独占性上限。
In an attempt to achieve an even distribution of chunks across all shards in the cluster, a balancer runs in the background to migrate chunks across the shards .为了在集群中的所有分片上实现块的均匀分布,一个均衡器在后台运行,在分片上迁移块。
MongoDB distributes the read and write workload across the shards in the sharded cluster, allowing each shard to process a subset of cluster operations. MongoDB将读写工作负载分布在分片集群中的分片上,允许每个分片处理集群操作的子集。Both read and write workloads can be scaled horizontally across the cluster by adding more shards.通过添加更多分片,读写工作负载都可以在集群中水平扩展。
For queries that include the shard key or the prefix of a compound shard key, 对于包含分片键或复合分片键前缀的查询,mongos
can target the query at a specific shard or set of shards. mongos
可以将查询的目标指向特定的分片或分片集。These targeted operations are generally more efficient than broadcasting to every shard in the cluster.这些有针对性的操作通常比向集群中的每个分片广播更高效。
Starting in MongoDB 4.4, 从MongoDB 4.4开始,mongos
can support hedged reads to minimize latencies.mongos
可以支持模糊读取以最小化延迟。
Sharding distributes data across the shards in the cluster, allowing each shard to contain a subset of the total cluster data. 分片将数据分布在集群中的各个分片上,允许每个分片包含整个集群数据的一个子集。As the data set grows, additional shards increase the storage capacity of the cluster.随着群集存储容量的增加,数据集的容量也随之增加。
The deployment of config servers and shards as replica sets provide increased availability.配置服务器和分片作为副本集的部署提供了更高的可用性。
Even if one or more shard replica sets become completely unavailable, the sharded cluster can continue to perform partial reads and writes. 即使一个或多个分片副本集完全不可用,分片集群也可以继续执行部分读写。That is, while data on the unavailable shard(s) cannot be accessed, reads or writes directed at the available shards can still succeed.也就是说,虽然无法访问不可用分片上的数据,但针对可用分片的读取或写入仍然可以成功。
Sharded cluster infrastructure requirements and complexity require careful planning, execution, and maintenance.分片集群基础架构的需求和复杂性需要仔细规划、执行和维护。
Once a collection has been sharded, MongoDB provides no method to unshard a sharded collection.一旦一个集合被分片,MongoDB就不提供任何方法来取消分片集合的分片。
While you can reshard your collection later, it is important to carefully consider your shard key choice to avoid scalability and perfomance issues.虽然您可以稍后对集合进行重新硬存储,但重要的是要仔细考虑分片键的选择,以避免可伸缩性和性能问题。
To understand the operational requirements and restrictions for sharding your collection, see Operational Restrictions in Sharded Clusters.要了解对集合进行分片的操作要求和限制,请参阅分片集群中的操作限制。
If queries do not include the shard key or the prefix of a compound shard key, 如果查询不包括分片键或复合分片键的前缀,mongos
performs a broadcast operation, querying all shards in the sharded cluster. mongos
将执行广播操作,查询分片集群中的所有分片。These scatter/gather queries can be long running operations.这些分散/聚集查询可以是长时间运行的操作。
Starting in MongoDB 5.1, when starting, restarting or adding a shard server with 从MongoDB 5.1开始,在使用shard server with sh.addShard()
the Cluster Wide Write Concern (CWWC) must be set.sh.addShard()
启动、重新启动或添加shard服务器时,必须设置集群范围的写问题(CWWC)。
If the 如果未设置CWWC
is not set and the shard is configured such that the default write concern is { w : 1 }
the shard server will fail to start or be added and returns an error.CWWC
,并且已将分片配置为默认写入关注点为{w:1}
,则分片服务器将无法启动或添加,并返回错误。
See default write concern calculations for details on how the default write concern is calculated.有关如何计算默认写入关注点的详细信息,请参阅默认写入关注点计算。
If you have an active support contract with MongoDB, consider contacting your account representative for assistance with sharded cluster planning and deployment.如果您与MongoDB签订了有效的支持合同,请考虑联系您的客户代表,以获得分片集群规划和部署方面的帮助。
A database can have a mixture of sharded and unsharded collections. 数据库可以混合使用分片集合和非分片集合。Sharded collections are partitioned and distributed across the shards in the cluster. 分片集合被分区并分布在集群中的分片上。Unsharded collections are stored on a primary shard. 未分片的集合存储在主分片上。Each database has its own primary shard.每个数据库都有自己的主分片。
You must connect to a mongos router to interact with any collection in the sharded cluster. 您必须连接到mongos
路由器才能与分片集群中的任何集合进行交互。This includes sharded and unsharded collections. 这包括分片集合和非分片集合。Clients should never connect to a single shard in order to perform read or write operations.客户端绝不应该连接到一个分片来执行读或写操作。
You can connect to a 使用mongos
the same way you connect to a mongod
using the mongosh
or a MongoDB driver.mongosh
或MongoDB驱动程序连接mongod
的方式与连接mongos
的方式相同。
MongoDB supports two sharding strategies for distributing data across sharded clusters.MongoDB支持两种分片策略,用于在分片集群之间分发数据。
Hashed Sharding involves computing a hash of the shard key field's value. 哈希分片涉及计算分片键字段值的哈希。Each chunk is then assigned a range based on the hashed shard key values.然后根据散列的分片键值为每个区块分配一个范围。
MongoDB automatically computes the hashes when resolving queries using hashed indexes. 当使用散列索引解析查询时,MongoDB会自动计算散列。Applications do not need to compute hashes.应用程序不需要计算哈希。
While a range of shard keys may be "close", their hashed values are unlikely to be on the same chunk. 虽然一系列分片键可能“接近”,但它们的散列值不太可能在同一块上。Data distribution based on hashed values facilitates more even data distribution, especially in data sets where the shard key changes monotonically.基于散列值的数据分布有助于更均匀的数据分布,尤其是在分片键单调变化的数据集中。
However, hashed distribution means that range-based queries on the shard key are less likely to target a single shard, resulting in more cluster wide broadcast operations然而,散列分布意味着基于分片键的范围查询不太可能以单个分片为目标,从而导致更多集群范围的广播操作
See Hashed Sharding for more information.有关更多信息,请参阅哈希分片。
Ranged sharding involves dividing data into ranges based on the shard key values. 范围分片涉及根据分片键值将数据划分为范围。Each chunk is then assigned a range based on the shard key values.然后根据分片键值为每个区块分配一个范围。
A range of shard keys whose values are "close" are more likely to reside on the same chunk. 值为“close”的一系列分片键更有可能位于同一块上。This allows for targeted operations as a 这允许有针对性的操作,因为mongos
can route the operations to only the shards that contain the required data.mongos
只能将操作路由到包含所需数据的分片。
The efficiency of ranged sharding depends on the shard key chosen. 远程分片的效率取决于选择的分片键。Poorly considered shard keys can result in uneven distribution of data, which can negate some benefits of sharding or can cause performance bottlenecks. 考虑不周的分片键可能会导致数据分布不均匀,这可能会抵消分片的某些好处,或导致性能瓶颈。See shard key selection for range-based sharding.有关基于范围的分片,请参阅分片关键点选择。
See Ranged Sharding for more information.有关更多信息,请参阅远程分片。
Zones can help improve the locality of data for sharded clusters that span multiple data centers.分区可以帮助改善跨多个数据中心的分片集群的数据位置。
In sharded clusters, you can create zones of sharded data based on the shard key. 在分片集群中,可以基于分片键创建分片数据区域。You can associate each zone with one or more shards in the cluster. 可以将每个分区与群集中的一个或多个分片相关联。A shard can associate with any number of zones. 分片可以与任意数量的区域关联。In a balanced cluster, MongoDB migrates chunks covered by a zone only to those shards associated with the zone.在平衡集群中,MongoDB只将区域覆盖的块迁移到与该区域相关联的分片。
Each zone covers one or more ranges of shard key values. 每个分区包含一个或多个分片键值范围。Each range a zone covers is always inclusive of its lower boundary and exclusive of its upper boundary.区域覆盖的每个范围始终包括其下边界,不包括其上边界。
You must use fields contained in the shard key when defining a new range for a zone to cover. 定义分区要覆盖的新范围时,必须使用分片键中包含的字段。If using a compound shard key, the range must include the prefix of the shard key. 如果使用复合分片键,则范围必须包括分片键的前缀。See shard keys in zones for more information.有关详细信息,请参阅区域中的分片键。
The possible use of zones in the future should be taken into consideration when choosing a shard key.在选择分片键时,应考虑将来可能使用的分区。
Starting in MongoDB 4.0.3, setting up zones and zone ranges beforeyou shard an empty or a non-existing collection allows for a faster setup of zoned sharding.从MongoDB 4.0.3开始,在分片空的或不存在的集合之前设置区域和区域范围,可以更快地设置分区分片。
Use the 将shardCollection
command with the collation : { locale : "simple" }
option to shard a collection which has a default collation. shardCollection
命令与collation : { locale : "simple" }
选项一起使用,对具有默认排序规则的集合进行分片。Successful sharding requires that:成功的分片需要:
{ locale: "simple" }
{ locale: "simple" }
When creating new collections with a collation, ensure these conditions are met prior to sharding the collection.使用排序规则创建新集合时,请确保在分片集合之前满足这些条件。
Queries on the sharded collection continue to use the default collation configured for the collection. 对分片集合的查询继续使用为集合配置的默认排序规则。To use the shard key index's 要使用分片键索引的简单排序规则,请在查询的排序规则文档中指定simple
collation, specify {locale : "simple"}
in the query's collation document.{locale : "simple"}
。
See 有关分片和排序的更多信息,请参阅shardCollection
for more information about sharding and collation.shardCollection
。
Starting in MongoDB 3.6, change streams are available for replica sets and sharded clusters. 从MongoDB 3.6开始,更改流可用于副本集和分片集群。Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. 更改流允许应用程序访问实时数据更改,而不必承担跟踪oplog的复杂性和风险。Applications can use change streams to subscribe to all data changes on a collection or collections.应用程序可以使用更改流订阅一个或多个集合上的所有数据更改。
Starting in MongoDB 4.2, with the introduction of distributed transactions, multi-document transactions are available on sharded clusters.从MongoDB 4.2开始,随着分布式事务的引入,多文档事务在分片集群上可用。
Until a transaction commits, the data changes made in the transaction are not visible outside the transaction.在事务提交之前,事务中所做的数据更改在事务外部是不可见的。
However, when a transaction writes to multiple shards, not all outside read operations need to wait for the result of the committed transaction to be visible across the shards. 但是,当事务写入多个分片时,并非所有外部读取操作都需要等待提交的事务的结果在分片中可见。For example, if a transaction is committed and write 1 is visible on shard A but write 2 is not yet visible on shard B, an outside read at read concern 例如,如果事务已提交,且写入1在分片a上可见,但写入2在分片B上尚不可见,则外部读取-读取关注点"local"
can read the results of write 1 without seeing write 2."local"
可以读取写入1的结果,而不查看写入2。
For more information on how sharding works with aggregations, read the sharding chapter in the Practical MongoDB Aggregations e-book.有关分片如何与聚合一起工作的更多信息,请阅读实用MongoDB聚合电子书中的分片一章。