Map-Reduce and Sharded Collections

On this page

Sharded Collection as Input
Sharded Collection as Output

Note

Aggregation Pipeline as an Alternative to Map-Reduce

Starting in MongoDB 5.0, map-reduce is deprecated:

Instead of map-reduce, you should use an aggregation pipeline. Aggregation pipelines provide better performance and usability than map-reduce.
You can rewrite map-reduce operations using aggregation pipeline stages, such as $group, $merge, and others.
For map-reduce operations that require custom functionality, you can use the $accumulator and $function aggregation operators, available starting in version 4.4. You can use those operators to define custom aggregation expressions in JavaScript.
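As a rough illustration of such a rewrite, the sketch below builds a pipeline that sums a value per key with $group and writes the result with $merge. The collection names `orders` and `order_totals` and the fields `cust_id` and `amount` are hypothetical, chosen only for this example; the pipeline is built as a plain object so it can be inspected without a running deployment.

```javascript
// Sketch only: collection and field names are hypothetical.
// Roughly equivalent to a map-reduce job that emits (cust_id, amount)
// in the map function and sums the values in the reduce function.
const pipeline = [
  // Group by customer and sum the amounts (the "reduce" step).
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
  // Write the grouped results into an output collection.
  {
    $merge: {
      into: "order_totals",
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert",
    },
  },
];

// In mongosh this would be run as: db.orders.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```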

For examples of aggregation pipeline alternatives to map-reduce, see:

Map-reduce supports operations on sharded collections, both as an input and as an output. This section describes the behaviors of mapReduce specific to sharded collections.

However, starting in version 4.2, MongoDB deprecates the map-reduce option to create a new sharded collection as well as the use of the sharded option for map-reduce. To output to a sharded collection, create the sharded collection first. MongoDB 4.2 also deprecates the replacement of an existing sharded collection.
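A minimal sketch of creating the output collection before running the job, assuming a mongosh session where the `sh` helper is available. The database name `test` and the collection name `mr_out` are hypothetical, and the shard key `{ _id: 1 }` mirrors the key map-reduce would otherwise use. The helper takes `sh` as a parameter so that the recording stub at the end can show the calls it makes without a cluster.

```javascript
// Sketch: shard the output collection up front instead of relying on
// the deprecated `sharded` option. All names here are hypothetical.
function prepareShardedOutput(sh, dbName, collName) {
  const namespace = `${dbName}.${collName}`;
  sh.enableSharding(dbName);                 // enable sharding for the database
  sh.shardCollection(namespace, { _id: 1 }); // shard the collection on _id
  return namespace;
}

// Stand-in for the mongosh `sh` helper that only records calls,
// so the sketch can run outside a cluster.
const calls = [];
const fakeSh = {
  enableSharding: (db) => calls.push(["enableSharding", db]),
  shardCollection: (ns, key) => calls.push(["shardCollection", ns, key]),
};

const ns = prepareShardedOutput(fakeSh, "test", "mr_out");
console.log(ns, calls);
```

In mongosh, the real `sh` object would be passed instead of the stub. The map-reduce job can then name the existing collection in its out field with an action such as merge or reduce.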

Sharded Collection as Input

When using a sharded collection as the input for a map-reduce operation, mongos automatically dispatches the map-reduce job to each shard in parallel. No special option is required. mongos waits for the jobs on all shards to finish.
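The dispatch-and-wait behavior can be pictured with a toy model. The sketch below is not MongoDB's implementation: it uses plain JavaScript, hypothetical shard data, and Promise.all as a stand-in for mongos sending the same job to every shard and waiting for all of them to finish.

```javascript
// Toy model only; shard contents and function names are hypothetical.
const shards = {
  shardA: [{ k: "x", v: 1 }, { k: "y", v: 2 }],
  shardB: [{ k: "x", v: 3 }],
};

// Each shard runs map and reduce over its own documents only.
async function runJobOnShard(docs) {
  const partial = {};
  for (const { k, v } of docs) {
    partial[k] = (partial[k] || 0) + v; // map emits (k, v); reduce sums
  }
  return partial;
}

// Stand-in for mongos: start every shard's job, then wait for all of them.
async function dispatchToAllShards(allShards) {
  const names = Object.keys(allShards);
  const results = await Promise.all(
    names.map((name) => runJobOnShard(allShards[name]))
  );
  return Object.fromEntries(names.map((name, i) => [name, results[i]]));
}

dispatchToAllShards(shards).then((partials) => console.log(partials));
```

Note that the caller issues one request and never addresses an individual shard, which matches the statement that no special option is needed.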

Sharded Collection as Output

If the out field for mapReduce sets sharded to true, MongoDB shards the output collection using the _id field as the shard key.

Note

Starting in version 4.2, MongoDB deprecates the use of the sharded option for mapReduce/db.collection.mapReduce().

To output to a sharded collection:

If the output collection does not exist, create the sharded collection first.

Starting in version 4.2, MongoDB deprecates the map-reduce option to create a new sharded collection and the use of the sharded option for map-reduce. As such, to output to a sharded collection, create the sharded collection first.

If you did not create the sharded collection first, MongoDB creates and shards the collection on the _id field. However, it is recommended that you create the sharded collection first.
Starting in version 4.2, MongoDB deprecates the replacement of an existing sharded collection.
Starting in version 4.0, if the output collection already exists but is not sharded, map-reduce fails.
For a new or an empty sharded collection, MongoDB uses the results of the first stage of the map-reduce operation to create the initial chunks distributed among the shards.
mongos dispatches, in parallel, a map-reduce post-processing job to every shard that owns a chunk. During the post-processing, each shard pulls the results for its own chunks from the other shards, runs the final reduce/finalize, and writes locally to the output collection.
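The post-processing step can be sketched as a toy simulation. This is plain JavaScript, not MongoDB internals: the chunk ranges, shard names, and partial results are hypothetical, and the reduce function simply sums values.

```javascript
// Toy model only; all names and data are hypothetical.
// Partial results left on each shard by the first stage.
const partials = {
  shardA: { apple: [1, 2], pear: [5] },
  shardB: { apple: [4], zebra: [7] },
};

// Output-collection chunks: key range [min, max) and the owning shard.
const chunks = [
  { min: "", max: "m", owner: "shardA" },
  { min: "m", max: "\uffff", owner: "shardB" },
];

const reduce = (key, values) => values.reduce((a, b) => a + b, 0);

// Post-processing for one shard: pull the values for keys that fall in
// its own chunks from every shard, run the final reduce, and return
// what would be written locally.
function postProcess(shardName) {
  const owned = chunks.filter((c) => c.owner === shardName);
  const inRange = (key) => owned.some((c) => key >= c.min && key < c.max);
  const pulled = {};
  for (const partial of Object.values(partials)) {
    for (const [key, values] of Object.entries(partial)) {
      if (inRange(key)) (pulled[key] = pulled[key] || []).push(...values);
    }
  }
  const local = {};
  for (const [key, values] of Object.entries(pulled)) {
    local[key] = reduce(key, values);
  }
  return local;
}

// Only shards that own a chunk receive a post-processing job.
const owners = [...new Set(chunks.map((c) => c.owner))];
const output = Object.fromEntries(owners.map((s) => [s, postProcess(s)]));
console.log(output);
```

In this model the key `apple` has partial results on both shards, but only the shard that owns its chunk reduces and stores it.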

Note

During later map-reduce jobs, MongoDB splits chunks as needed.
Balancing of chunks for the output collection is automatically prevented during post-processing to avoid concurrency issues.
