Database Manual / Sharding / Sharded Cluster Components

Routing with mongosmongos路由

MongoDB mongos instances route queries and write operations to shards in a sharded cluster. MongoDB mongos实例将查询和写入操作路由到分片集群中的分片。mongos provides the only interface to a sharded cluster from the perspective of applications. Applications never connect or communicate directly with the shards.从应用程序的角度来看,mongos提供了分片集群的唯一接口。应用程序从不直接与分片连接或通信。

The mongos tracks what data is on which shard by caching the metadata from the config servers. mongos通过缓存配置服务器的元数据来跟踪哪个分片上的数据。The mongos uses the metadata to route operations from applications and clients to the mongod instances. mongos使用元数据将操作从应用程序和客户端路由到mongod实例。A mongos has no persistent state and consumes minimal system resources.mongos没有持久状态,消耗的系统资源最少。

The most common practice is to run mongos instances on the same systems as your application servers, but you can maintain mongos instances on the shards or on other dedicated resources. 最常见的做法是在与应用程序服务器相同的系统上运行mongos实例,但您可以在分片或其他专用资源上维护mongos示例。See also Number of mongos and Distribution.另请参见mongos的数量和分布

Routing And Results Process路线和结果流程

A mongos instance routes a query to a cluster by:mongos实例通过以下方式将查询路由到集群

  1. Determining the list of shards that must receive the query.确定必须接收查询的分片列表。
  2. Establishing a cursor on all targeted shards.在所有目标分片上建立游标。

The mongos then merges the data from each of the targeted shards and returns the result document. Certain query modifiers, such as sorting, are performed on each shard before mongos retrieves the results.然后,mongos合并来自每个目标分片的数据,并返回结果文档。在mongos检索结果之前,对每个分片执行某些查询修饰符,如排序

Aggregation operations running on multiple shards may route results back to the mongos to merge results if they don't need to run on the database's primary shard.在多个分片上运行的聚合操作可能会将结果路由回mongos,以便在不需要在数据库的primary分片上执行的情况下合并结果。

There are two cases in which a pipeline is ineligible to run on mongos.有两种情况下,管道没有资格在mongos上运行。

The first case occurs when the merge part of the split pipeline contains a stage which must run on a specific shard. For instance, if $lookup requires access to an unsharded collection in the same database as the sharded collection on which the aggregation is running, the merge runs on the shard that hosts the unsharded collection.第一种情况发生在拆分管道的合并部分包含必须在特定分片上运行的阶段时。例如,如果$lookup需要访问与运行聚合的分片集合位于同一数据库中的未分片集合,则合并将在承载未分片集合的分片上运行。

The second case occurs when the merge part of the split pipeline contains a stage which may write temporary data to disk, such as $group, and the client has specified allowDiskUse:true. 第二种情况发生在拆分管道的合并部分包含一个可能将临时数据写入磁盘的阶段(如$group),并且客户端已指定allowDiskUse:true时。In this case, assuming that there are no other stages in the merge pipeline which require the primary shard, the merge runs on a randomly-selected shard in the set of shards targeted by the aggregation.在这种情况下,假设合并管道中没有其他需要主分片的阶段,合并将在聚合目标分片集中随机选择的一个分片上运行。

For more information on how the work of aggregation is split among components of a sharded cluster query, use explain:true as a parameter to the aggregate() call. The return includes three JSON objects:有关如何在分片集群查询的组件之间分割聚合工作的更多信息,请使用explain:true作为aggregate()调用的参数。返回包含三个JSON对象:

  • mergeType shows where the stage of the merge happens ("anyShard", "specificShard", or "router"). When mergeType is specificShard, the aggregate output includes a mergeShard property that contains the shard ID of the merging shard.
  • splitPipeline shows which operations in your pipeline have run on individual shards.
  • shards shows the work each shard has done.

In some cases, when the shard key or a prefix of the shard key is a part of the query, the mongos performs a targeted operation, routing queries to a subset of shards in the cluster.

mongos performs a broadcast operation for queries that do not include the shard key, routing queries to all shards in the cluster. Some queries that do include the shard key may still result in a broadcast operation depending on the distribution of data in the cluster and the selectivity of the query.

See Targeted Operations vs. Broadcast Operations for more on targeted and broadcast operations.

How mongos Handles Query Modifiersmongos如何处理查询修饰符

Sorting排序

If the result of the query is not sorted, the mongos instance opens a result cursor that "round robins" results from all cursors on the shards.

Limits限制

If the query limits the size of the result set using the limit() cursor method, the mongos instance passes that limit to the shards and then re-applies the limit to the result before returning the result to the client.

Skips跳过

If the query specifies a number of records to skip using the skip() cursor method, the mongos cannot pass the skip to the shards, but rather retrieves unskipped results from the shards and skips the appropriate number of documents when assembling the complete result.

When used in conjunction with a limit(), the mongos passes the limit plus the value of the skip() to the shards to improve the efficiency of these operations.

Read Preference and Shards

For sharded clusters, mongos applies the read preference when reading from the shards. The member selected is governed by both the read preference and replication.localPingThresholdMs settings, and is re-evaluated for each operation.

For details on read preference and sharded clusters, see Read Preference and Shards.

Confirm Connection to mongos Instances

To detect if the MongoDB instance that your client is connected to is mongos, use the hello command. When a client connects to a mongos, hello returns a document with a msg field that holds the string isdbgrid. For example:

{
"isWritablePrimary" : true,
"msg" : "isdbgrid",
"maxBsonObjectSize" : 16777216,
"ok" : 1,
...
}

If the application is instead connected to a mongod, the returned document does not include the isdbgrid string.

Targeted Operations vs. Broadcast Operations

Generally, the fastest queries in a sharded environment are those that mongos route to a single shard, using the shard key and the cluster meta data from the config server. These targeted operations use the shard key value to locate the shard or subset of shards that satisfy the query document.

For queries that don't include the shard key, mongos must query all shards, wait for their responses and then return the result to the application. These "scatter/gather" queries can be long running operations.

Broadcast Operations

mongos instances broadcast queries to all shards for the collection unless the mongos can determine which shard or subset of shards stores this data.

Read operations to a sharded cluster. Query criteria does not include the shard key. The query router ``mongos`` must broadcast query to all shards for the collection.

After the mongos receives responses from all shards, it merges the data and returns the result document. The performance of a broadcast operation depends on the overall load of the cluster, as well as variables like network latency, individual shard load, and number of documents returned per shard. Whenever possible, favor operations that result in targeted operation over those that result in a broadcast operation.

Multi-update operations are always broadcast operations.

The updateMany() and deleteMany() methods are broadcast operations, unless the query document specifies the shard key in full.

Targeted Operations

mongos can route queries that include the shard key or the prefix of a compound shard key a specific shard or set of shards. mongos uses the shard key value to locate the chunk whose range includes the shard key value and directs the query at the shard containing that chunk.

Read operations to a sharded cluster. Query criteria includes the shard key. The query router ``mongos`` can target the query to the appropriate shard or shards.

For example, if the shard key is:

{ a: 1, b: 1, c: 1 }

The mongos program can route queries that include the full shard key or either of the following shard key prefixes at a specific shard or set of shards:

{ a: 1 }
{ a: 1, b: 1 }

All insertOne() operations target to one shard. Each document in the insertMany() array targets to a single shard, but there is no guarantee all documents in the array insert into a single shard.

All updateOne(), replaceOne() and deleteOne() operations must include the shard key or _id in the query document. MongoDB returns an error if these methods are used without the shard key or _id.

Depending on the distribution of data in the cluster and the selectivity of the query, mongos may still perform a broadcast operation to fulfill these queries.

Index Use

When a shard receives a query, it uses the most efficient index available to fulfill that query. The index used may be either the shard key index or another eligible index present on the shard.

Sharded Cluster Security

Use Self-Managed Internal/Membership Authentication to enforce intra-cluster security and prevent unauthorized cluster components from accessing the cluster. You must start each mongod or mongos in the cluster with the appropriate security settings in order to enforce internal authentication.

Starting in MongoDB 5.3, SCRAM-SHA-1 cannot be used for intra-cluster authentication. Only SCRAM-SHA-256 is supported.

In previous MongoDB versions, SCRAM-SHA-1 and SCRAM-SHA-256 can both be used for intra-cluster authentication, even if SCRAM is not explicitly enabled.

See Deploy Self-Managed Sharded Cluster with Keyfile Authentication for a tutorial on deploying a secured sharded cluster.

Cluster Users

Sharded clusters support Role-Based Access Control in Self-Managed Deployments (RBAC) for restricting unauthorized access to cluster data and operations. You must start each mongod in the cluster, including the config servers, with the --auth option in order to enforce RBAC. Alternatively, enforcing Self-Managed Internal/Membership Authentication for inter-cluster security also enables user access controls via RBAC.

With RBAC enforced, clients must specify a --username, --password, and --authenticationDatabase when connecting to the mongos in order to access cluster resources.

Each cluster has its own cluster users. These users cannot be used to access individual shards.

See Enable Access Control on Self-Managed Deployments for a tutorial on enabling adding users to an RBAC-enabled MongoDB deployment.

Metadata Operations

mongos uses "majority" write concern for the following operations that affect the sharded cluster metadata:

CommandMethod
addShardsh.addShard()
createdb.createCollection()
dropdb.collection.drop()
dropDatabasedb.dropDatabase()
enableShardingsh.enableSharding()
movePrimary
renameCollectiondb.collection.renameCollection()
shardCollectionsh.shardCollection()
removeShard
setFeatureCompatibilityVersion

Additional Information附加信息

FCV Compatibility

The mongos binary cannot connect to mongod instances whose feature compatibility version (FCV) is greater than that of the mongos. For example, you cannot connect a MongoDB 4.0 version mongos to a 4.2 sharded cluster with FCV set to 4.2. You can, however, connect a MongoDB 4.0 version mongos to a 4.2 sharded cluster with FCV set to 4.0.

Full Time Diagnostic Data Capture Requirements

mongod includes a Full Time Diagnostic Data Capture mechanism to assist MongoDB engineers with troubleshooting deployments. If this thread fails, it terminates the originating process. To avoid the most common failures, confirm that the user running the process has permissions to create the FTDC diagnostic.data directory. For mongod the directory is within storage.dbPath. For mongos it is parallel to systemLog.path.

Connection Pools

Starting in MongoDB 4.2, MongoDB adds the parameter ShardingTaskExecutorPoolReplicaSetMatching. This parameter determines the minimum size of the mongod / mongos instance's connection pool to each member of the sharded cluster. This value can vary during runtime.

mongod and mongos maintain connection pools to each replica set secondary for every replica set in the sharded cluster. By default, these pools have a number of connections that is at least the number of connections to the primary.

To modify, see ShardingTaskExecutorPoolReplicaSetMatching.

Using Aggregation Pipelines with Clusters

For more information on how sharding works with aggregations, read the sharding chapter in the Practical MongoDB Aggregations e-book.