Map-Reduce and Sharded Collections

Note

Aggregation Pipeline as an Alternative to Map-Reduce

Starting in MongoDB 5.0, map-reduce is deprecated:

Instead of map-reduce, you should use an aggregation pipeline. Aggregation pipelines provide better performance and usability than map-reduce.
You can rewrite map-reduce operations using aggregation pipeline stages, such as $group, $merge, and others.
For map-reduce operations that require custom functionality, you can use the $accumulator and $function aggregation operators. You can use those operators to define custom aggregation expressions in JavaScript.
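
As an illustration of the rewrite described above, the following sketch expresses a simple map-reduce (sum a value per key) as an aggregation pipeline. The collection names `orders` and `order_totals` and the fields `custId` and `amount` are hypothetical and do not come from this page. The second stage definition shows one possible way to move custom JavaScript logic into `$accumulator`.

```javascript
// Hypothetical map-reduce being replaced: total `amount` per `custId`.
//   map:    function () { emit(this.custId, this.amount); }
//   reduce: function (key, values) { return values.reduce((s, v) => s + v, 0); }

// Pipeline sketch using built-in stages ($group, then $merge).
const pipeline = [
  // Group by customer and sum the amounts, mirroring map + reduce.
  { $group: { _id: "$custId", value: { $sum: "$amount" } } },
  // Write the results into an output collection (hypothetical name).
  {
    $merge: {
      into: "order_totals",
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert"
    }
  }
];

// Variant of the $group stage with custom JavaScript via $accumulator.
const customGroup = {
  $group: {
    _id: "$custId",
    value: {
      $accumulator: {
        init: function () { return 0; },                    // initial state
        accumulate: function (state, amount) { return state + amount; },
        accumulateArgs: ["$amount"],                        // passed to accumulate
        merge: function (a, b) { return a + b; },           // combine partial states
        lang: "js"
      }
    }
  }
};

// In mongosh, assuming a collection named `orders` exists:
//   db.orders.aggregate(pipeline);
```

The built-in `$sum` form is usually preferable; the `$accumulator` variant is only worth considering when the logic cannot be expressed with existing operators.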

For examples of aggregation pipeline alternatives to map-reduce, see:

Map-reduce supports operations on sharded collections, both as an input and as an output. This section describes the behaviors of mapReduce specific to sharded collections.

Sharded Collection as Input

When using a sharded collection as the input for a map-reduce operation, mongos automatically dispatches the map-reduce job to each shard in parallel. No special option is required. mongos waits for the jobs on all shards to finish.
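
Because no special option is required, a mapReduce command against a sharded input looks the same as one against an unsharded collection. The sketch below builds such a command document; the collection name `orders` and the fields `custId` and `amount` are assumptions made for illustration.

```javascript
// Hypothetical map function: emit one (custId, amount) pair per document.
// `emit` is provided by the server when the function runs there.
const mapFunction = function () {
  emit(this.custId, this.amount);
};

// Hypothetical reduce function: sum all values emitted for a key.
const reduceFunction = function (key, values) {
  return values.reduce((sum, v) => sum + v, 0);
};

// Command document; nothing in it refers to sharding.
const inputCommand = {
  mapReduce: "orders",      // input collection (may be sharded)
  map: mapFunction,
  reduce: reduceFunction,
  out: { inline: 1 }        // return results inline instead of writing a collection
};

// In mongosh connected to a mongos:
//   db.runCommand(inputCommand);
```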

Sharded Collection as Output

If the out field for mapReduce has the sharded value, MongoDB shards the output collection using the _id field as the shard key.
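
A minimal sketch of a command that sets the sharded value in out follows. The database `test`, the collections `orders` and `order_totals`, and the document fields are hypothetical, and the sketch assumes the output collection was sharded on `_id` beforehand. Check the documentation for the server version you run, since support for this option varies between releases.

```javascript
// Assumed to have been run beforehand in mongosh (hypothetical names):
//   sh.enableSharding("test");
//   sh.shardCollection("test.order_totals", { _id: 1 });

const shardedOutCommand = {
  mapReduce: "orders",
  map: function () { emit(this.custId, this.amount); },
  reduce: function (key, values) {
    return values.reduce((sum, v) => sum + v, 0);
  },
  // Merge results into the sharded output collection, keyed on _id.
  out: { merge: "order_totals", sharded: true }
};

// In mongosh connected to a mongos:
//   db.runCommand(shardedOutCommand);
```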

To output to a sharded collection:

If the output collection does not exist, create the sharded collection first.
If the output collection already exists but is not sharded, map-reduce fails.
For a new or an empty sharded collection, MongoDB uses the results of the first stage of the map-reduce operation to create the initial chunks distributed among the shards.
mongos dispatches, in parallel, a map-reduce post-processing job to every shard that owns a chunk. During the post-processing, each shard will pull the results for its own chunks from the other shards, run the final reduce/finalize, and write locally to the output collection.
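
The post-processing step can be pictured with a small in-memory simulation. This is not MongoDB code: the shard names, chunk ranges, and partial results are invented, and the sketch only models the idea that each chunk-owning shard gathers the partial results for its own key ranges from every shard and runs a final reduce.

```javascript
// Reduce function used for the final pass: sum the partial values.
function reduceFn(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Hypothetical partial results held by each shard after the first stage.
const partials = {
  shardA: { apple: 3, kiwi: 1, pear: 2 },
  shardB: { apple: 4, pear: 5 }
};

// Hypothetical chunks of the output collection, ranged on _id as [min, max).
const chunks = [
  { owner: "shardA", min: "a", max: "m" },
  { owner: "shardB", min: "m", max: "z" }
];

// Simulate one shard's post-processing job.
function postProcess(shard) {
  const owned = chunks.filter((c) => c.owner === shard);
  const grouped = new Map();
  // "Pull" the results that fall in this shard's chunks from every shard.
  for (const part of Object.values(partials)) {
    for (const [key, value] of Object.entries(part)) {
      if (owned.some((c) => key >= c.min && key < c.max)) {
        if (!grouped.has(key)) grouped.set(key, []);
        grouped.get(key).push(value);
      }
    }
  }
  // Final reduce per key; the result is what the shard would write locally.
  return [...grouped].map(([key, values]) => ({
    _id: key,
    value: reduceFn(key, values)
  }));
}

// Run the job for every shard that owns a chunk.
const output = {};
for (const shard of new Set(chunks.map((c) => c.owner))) {
  output[shard] = postProcess(shard);
}

console.log(output);
```

In this toy data set, shardA ends up with the totals for `apple` and `kiwi`, and shardB with the total for `pear`, because those keys fall inside the chunk ranges each shard owns.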

Note

During later map-reduce jobs, MongoDB splits chunks as needed.
Balancing of chunks for the output collection is automatically prevented during post-processing to avoid concurrency issues.
