Note
Aggregation Pipeline as an Alternative to Map-Reduce
Starting in MongoDB 5.0, map-reduce is deprecated:
Instead of map-reduce, you should use an aggregation pipeline. Aggregation pipelines provide better performance and usability than map-reduce.
You can rewrite map-reduce operations using aggregation pipeline stages, such as $group, $merge, and others.
For map-reduce operations that require custom functionality, you can use the $accumulator and $function aggregation operators. You can use those operators to define custom aggregation expressions in JavaScript.
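As a rough illustration of the rewrite described above, the sketch below builds a pipeline that groups and sums values with $group and writes the result with $merge, plus a custom average written with $accumulator. The collection names (orders, order_totals) and field names (cust_id, amount) are assumptions made for the example and do not come from this page.

```javascript
// Hypothetical schema: documents in an "orders" collection look like
// { cust_id: "A123", amount: 250 }. All names here are illustrative.
const pipeline = [
  // Takes the place of map + reduce: group by customer and sum the amounts.
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
  // Takes the place of the map-reduce "out" option: write to a collection.
  {
    $merge: {
      into: "order_totals",
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert",
    },
  },
];

// Custom logic in JavaScript: an average computed with $accumulator.
// It could be used inside $group, e.g. { _id: "$cust_id", avg: customAvg }.
const customAvg = {
  $accumulator: {
    init: function () {
      return { count: 0, sum: 0 };
    },
    accumulate: function (state, amount) {
      return { count: state.count + 1, sum: state.sum + amount };
    },
    accumulateArgs: ["$amount"],
    // Combines two partial states into one.
    merge: function (a, b) {
      return { count: a.count + b.count, sum: a.sum + b.sum };
    },
    finalize: function (state) {
      return state.count === 0 ? null : state.sum / state.count;
    },
    lang: "js",
  },
};

// In mongosh, the pipeline would be run as:
//   db.orders.aggregate(pipeline)
```

The block only defines the pipeline documents, so it can be pasted into mongosh or inspected in plain JavaScript before being run against a deployment.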
For examples of aggregation pipeline alternatives to map-reduce, see:
Map-reduce supports operations on sharded collections, both as an input and as an output. This section describes the behaviors of mapReduce specific to sharded collections.
Sharded Collection as Input
When a sharded collection is the input for a map-reduce operation, mongos automatically dispatches the map-reduce job to each shard in parallel. No special option is required. mongos waits for the jobs on all shards to finish.
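To make the "no special option" point concrete, here is a minimal sketch of a mapReduce command document. The collection name (orders) and the fields (cust_id, amount) are hypothetical; the command itself carries nothing related to sharding, and the dispatch to shards is left to mongos.

```javascript
// Hypothetical map and reduce functions for an "orders" collection whose
// documents look like { cust_id: "A123", amount: 250 }.
const mapFn = function () {
  // emit() is provided by the server when the map function runs.
  emit(this.cust_id, this.amount);
};
const reduceFn = function (key, values) {
  return values.reduce((sum, v) => sum + v, 0);
};

// The same command document is used whether "orders" is sharded or not.
const command = {
  mapReduce: "orders",
  map: mapFn,
  reduce: reduceFn,
  out: { inline: 1 },
};

// In mongosh connected to a mongos, it would be run as:
//   db.runCommand(command)
```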
Sharded Collection as Output
If the out field for mapReduce has the sharded value, MongoDB shards the output collection using the _id field as the shard key.
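For illustration, an out document carrying the sharded value might look like the sketch below. The collection names and the map and reduce functions are placeholders invented for the example.

```javascript
// Hypothetical output collection; "sharded: true" is the value referred to above.
const out = { replace: "order_totals", sharded: true };

const command = {
  mapReduce: "orders", // hypothetical input collection
  map: function () {
    emit(this.cust_id, this.amount); // emit() is provided by the server
  },
  reduce: function (key, values) {
    return values.reduce((sum, v) => sum + v, 0);
  },
  out: out,
};

// In mongosh, it would be run as:
//   db.runCommand(command)
```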
To output to a sharded collection:
If the output collection does not exist, create the sharded collection first.
If the output collection already exists but is not sharded, map-reduce fails.
For a new or an empty sharded collection, MongoDB uses the results of the first stage of the map-reduce operation to create the initial chunks distributed among the shards.
mongos dispatches, in parallel, a map-reduce post-processing job to every shard that owns a chunk. During post-processing, each shard pulls the results for its own chunks from the other shards, runs the final reduce/finalize, and writes locally to the output collection.
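The two-stage flow described above can be sketched as a small in-memory simulation. This is not how MongoDB is implemented: plain arrays stand in for shards, a single comparison stands in for the chunk layout, the finalize step is omitted, and every name (shardA, shardB, ownerOf, the sample documents) is hypothetical. For simplicity each shard reads the partial results of all shards, including its own.

```javascript
// Simulation only: nothing here talks to MongoDB.
// Hypothetical data: each shard holds part of an "orders" collection.
const shardInputs = {
  shardA: [{ cust_id: "a", amount: 5 }, { cust_id: "m", amount: 7 }],
  shardB: [{ cust_id: "a", amount: 3 }, { cust_id: "z", amount: 1 }],
};

// Hypothetical chunk layout of the output collection, keyed on _id:
// shardA owns _id < "n", shardB owns _id >= "n".
const ownerOf = (id) => (id < "n" ? "shardA" : "shardB");

const reduceFn = (key, values) => values.reduce((sum, v) => sum + v, 0);

// First stage: a shard maps and reduces its own input documents.
function firstStage(docs) {
  const emitted = new Map();
  for (const doc of docs) {
    if (!emitted.has(doc.cust_id)) emitted.set(doc.cust_id, []);
    emitted.get(doc.cust_id).push(doc.amount);
  }
  const partial = new Map();
  for (const [key, values] of emitted) partial.set(key, reduceFn(key, values));
  return partial;
}

// Post-processing: an owning shard collects the partial results that fall
// in its own chunks, runs the final reduce, and keeps the output locally.
function postProcess(owner, partialsByShard) {
  const pulled = new Map();
  for (const partial of Object.values(partialsByShard)) {
    for (const [key, value] of partial) {
      if (ownerOf(key) !== owner) continue;
      if (!pulled.has(key)) pulled.set(key, []);
      pulled.get(key).push(value);
    }
  }
  return [...pulled].map(([key, values]) => ({
    _id: key,
    value: reduceFn(key, values),
  }));
}

const partialsByShard = Object.fromEntries(
  Object.entries(shardInputs).map(([name, docs]) => [name, firstStage(docs)])
);
const output = Object.fromEntries(
  Object.keys(shardInputs).map((name) => [name, postProcess(name, partialsByShard)])
);

console.log(JSON.stringify(output));
```

In this toy layout the key "a" appears on both shards in the first stage, and only the shard that owns its chunk combines the two partial sums.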
Note
During later map-reduce jobs, MongoDB splits chunks as needed. Balancing of chunks for the output collection is automatically prevented during post-processing to avoid concurrency issues.