How to Run MongoDB Map-Reduce Jobs
Posted on: 13/07/2018 (last updated: 04/08/2021) by Thomas Zahn
In this post, we’ll show you the simplest way of writing, debugging, and running MongoDB map-reduce jobs using Studio 3T’s Map-Reduce screen.
Don’t have Studio 3T on your machine? Download it here, available for Windows, Mac, and Linux.
MongoDB Map-Reduce vs Aggregation Pipeline
MongoDB’s Map-Reduce is the flexible cousin of the Aggregation Pipeline.
In general, it works by taking the data through two stages:
a map stage that processes each document and emits one or more objects for each input document
a reduce stage that combines emitted objects from the output of the map operation
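To make the two stages concrete, here is a minimal in-memory sketch in plain JavaScript. It is not how MongoDB implements Map-Reduce: the `runMapReduce` helper, the sample documents, and the tag-counting functions are all assumptions made for illustration. One simplification to note is that the sketch always calls the reduce function, whereas MongoDB skips it for keys that received only a single value.

```javascript
// Minimal in-memory sketch of the two stages (not MongoDB's implementation).
function runMapReduce(docs, map, reduce) {
  var groups = new Map(); // key -> array of emitted values

  // Map stage: call map() once per document; it may emit any number of pairs.
  docs.forEach(function (doc) {
    map(doc, function emit(key, value) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  });

  // Reduce stage: combine all values emitted under the same key.
  var results = [];
  groups.forEach(function (values, key) {
    results.push({ _id: key, value: reduce(key, values) });
  });
  return results;
}

// Hypothetical sample data: count how many documents carry each tag.
var sampleDocs = [
  { _id: 1, tags: ["cats", "travel"] },
  { _id: 2, tags: ["cats"] }
];

var tagCounts = runMapReduce(
  sampleDocs,
  function (doc, emit) { doc.tags.forEach(function (tag) { emit(tag, 1); }); },
  function (key, values) { return values.reduce(function (a, b) { return a + b; }, 0); }
);

console.log(JSON.stringify(tagCounts));
// [{"_id":"cats","value":2},{"_id":"travel","value":1}]
```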
The main advantage over the Aggregation Pipeline is that Map-Reduce can use arbitrary JavaScript for each stage, enabling operations that would otherwise be impossible, though at the expense of performance (potentially longer execution times). You can read more about it in MongoDB’s reference documentation.
MongoDB recommends the Aggregation Pipeline for most aggregation operations. As an alternative to Map-Reduce, please also check out Aggregation Editor, Studio 3T’s MongoDB aggregation query builder.
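For comparison, the tag-grouping job built later in this post could probably be expressed as a pipeline along the following lines. This is a sketch under assumptions rather than a verified equivalent: the output field name `images` is made up here, and the pipeline returns an array of image ids per tag instead of the comma-separated string the Map-Reduce job produces.

```javascript
// Assumed aggregation-pipeline counterpart of the tag-grouping job in this post.
var pipeline = [
  // Skip every image whose tags array contains "work".
  { $match: { tags: { $ne: "work" } } },
  // Produce one document per (image, tag) pair.
  { $unwind: "$tags" },
  // Collect the image ids under each tag ("images" is a hypothetical field name).
  { $group: { _id: "$tags", images: { $push: "$_id" } } }
];

// `db` only exists inside a MongoDB shell, so the call is skipped elsewhere.
if (typeof db !== "undefined") {
  db.images.aggregate(pipeline).forEach(function (doc) { printjson(doc); });
}
```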
A map-reduce example
In this example, our objective is to group images by tag, except for those which include the “work” tag. Each document in the images collection looks like this:
{ "_id" : 592341, "tags" : [ "cats", "kittens", "travel" ] }
To achieve this, we will need to write a Map-Reduce job that will:
Exclude all images which include the “work” tag.
Have the map() function emit each tag as the key, with the image id as the value.
Have the reduce() function combine the image ids for each tag.
Let us start by opening Studio 3T’s new Map-Reduce screen by selecting the Open Map-Reduce option from the context menu.

Filtering the input data
Clicking on the “Input data” tab and then the “Preview Input” toolbar button shows us a preview of the collection data. It is here that we can shape the data fed into the Map-Reduce job and omit any image tagged “work”. This is achieved by the following query:
{ "tags": { $ne: "work" } }
We can inspect the data that will be fed into the map function by clicking the “Preview Output” toolbar button.
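To see which documents survive the filter, the query’s effect can be imitated in plain JavaScript. This is only an in-memory stand-in: the second and third sample documents are invented for illustration, and the helper assumes every document has a tags array (the real `$ne` query would also match documents without the field).

```javascript
// Hypothetical sample documents in the shape used by this post.
var images = [
  { _id: 592341, tags: ["cats", "kittens", "travel"] },
  { _id: 592342, tags: ["work", "travel"] },
  { _id: 592343, tags: ["kittens"] }
];

// In-memory stand-in for { "tags": { $ne: "work" } } on an array field:
// keep a document only if none of its tags equals "work".
function notTaggedWork(doc) {
  return doc.tags.indexOf("work") === -1;
}

var mapInput = images.filter(notTaggedWork);
console.log(JSON.stringify(mapInput.map(function (doc) { return doc._id; })));
// [592341,592343]
```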

map() function
For the second step, we move to the “map()” tab.
In this tab we want to specify the function responsible for emitting one or more key-value pairs for each document. The following function gets the job done:
function () {
    for (var index in this.tags) {
        emit(this.tags[index], this._id);
    }
}
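Outside MongoDB, the function’s behaviour can be checked with a small harness. The sketch below assumes a hand-written `emit()` that collects pairs in an array (in MongoDB the server provides `emit()`), and uses `call()` to imitate the way the current document is bound to `this`.

```javascript
// Collect emitted pairs in an array; in MongoDB, emit() is provided by the server.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// The map() function from this post, unchanged in behaviour.
var map = function () {
  for (var index in this.tags) {
    emit(this.tags[index], this._id);
  }
};

// MongoDB binds `this` to the current document; call() imitates that here.
map.call({ _id: 592341, tags: ["cats", "kittens", "travel"] });

console.log(JSON.stringify(emitted));
// [{"key":"cats","value":592341},{"key":"kittens","value":592341},{"key":"travel","value":592341}]
```

One document with three tags therefore yields three key-value pairs, each carrying the same image id.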
We can sample the map() function’s output by clicking the preview button, verifying that this function was successful. The preview feature is extremely useful, in particular before submitting jobs that could take hours to run. The “map() sample output” tab gives us a detailed breakdown of how our map() function operates, showing the emitted key/value pairs as well as their original document _id.

reduce() function
Studio 3T’s default implementation of the reduce() function takes care of the rest:
function (key, values) {
    var reducedValue = "" + values;
    return reducedValue;
}
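The default implementation relies on JavaScript’s string coercion: concatenating an array to an empty string joins its elements with commas. The sketch below runs the function on made-up image ids, and also feeds an earlier result back in, since MongoDB may call reduce() more than once for the same key. Note that the result is a string while map() emits numeric ids; that mismatch appears harmless here because the coercion flattens both.

```javascript
// The default reduce() from this post, unchanged in behaviour.
var reduce = function (key, values) {
  var reducedValue = "" + values;
  return reducedValue;
};

// Hypothetical image ids emitted under the key "travel".
var firstPass = reduce("travel", [592341, 592350]);
console.log(firstPass); // 592341,592350

// MongoDB may call reduce() again with an earlier result mixed in with new values.
var secondPass = reduce("travel", [firstPass, 592377]);
console.log(secondPass); // 592341,592350,592377
```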
Again, the Preview Output toolbar button will let us verify that our function is successful. If we were writing a more complex reduce() function or trying to debug what was being fed in, we could sample the input by clicking on the Preview Input button. This gives us a few of the key-value pairs that are emitted and then reduced.

finalize() function
MongoDB allows a Map-Reduce job to end with an optional final stage, a finalize() function, which does some final processing on each reduced value. Let’s use it here simply to make the output easier to read:
function (key, reducedValue) {
    var finalValue = "tag '" + key + "' was found in images: " + reducedValue;
    return finalValue;
}
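As a quick check of the wording this produces, the function can be called directly. The key and the reduced value below are hypothetical inputs chosen for illustration.

```javascript
// The finalize() function from this post, unchanged in behaviour.
var finalize = function (key, reducedValue) {
  var finalValue = "tag '" + key + "' was found in images: " + reducedValue;
  return finalValue;
};

// Hypothetical reduced value for the key "kittens".
var sentence = finalize("kittens", "592341,592343");
console.log(sentence);
// tag 'kittens' was found in images: 592341,592343
```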
After a quick inspection of finalize()’s sample output, we are ready to submit a job that will process all of the data.

Running the map-reduce job
Now that we have set all the parameters of the job, and are sure that all our functions run as intended, we can submit the Map-Reduce job to run through the whole collection by clicking the “Execute” button on the toolbar.
This action will open a new tab which will contain the results of the job when it is finished.

Clicking on Show details will bring up a dialog showing execution statistics as well as a configuration summary for this job.

Epilogue
Now that the Map-Reduce job is finished, we can save all this work as a script. The saved file is plain JavaScript, so it can be run in IntelliShell or even the basic mongo shell and will produce identical results.
// *** 3T Software Labs, MongoChef: MapReduce Job ****

// Variable for db
var __3t_mongochef_db = "exam";

// Variable for map
var __3t_mongochef_map = function () {
    for (var index in this.tags) {
        emit(this.tags[index], this._id);
    }
};

// Variable for reduce
var __3t_mongochef_reduce = function (key, values) {
    var reducedValue = "" + values;
    return reducedValue;
};

// Variable for finalize
var __3t_mongochef_finalize = function (key, reducedValue) {
    var finalValue = "tag '" + key + "' was found in images: " + reducedValue;
    return finalValue;
};

db.runCommand({
    mapReduce: "images",
    map: __3t_mongochef_map,
    reduce: __3t_mongochef_reduce,
    finalize: __3t_mongochef_finalize,
    out: { "inline" : 1 },
    query: { "tags": { $ne: "work" } },
    sort: { },
    inputDB: "exam",
});
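Because the script asks for inline output, the command’s reply should carry the finished rows in a results array of { _id, value } documents. The sketch below shows one way such a reply might be turned into printable lines. The `summarize` helper and the sample reply are hypothetical, written only to illustrate the shape; they are not output from a real run.

```javascript
// Turn an inline mapReduce reply into printable lines (hypothetical helper).
function summarize(reply) {
  if (reply.ok !== 1) {
    throw new Error("mapReduce command did not succeed");
  }
  return reply.results.map(function (row) {
    return row._id + " -> " + row.value;
  });
}

// Hypothetical reply in the shape expected for out: { inline: 1 }.
var sampleReply = {
  results: [
    { _id: "cats", value: "tag 'cats' was found in images: 592341" },
    { _id: "kittens", value: "tag 'kittens' was found in images: 592341,592343" }
  ],
  ok: 1
};

summarize(sampleReply).forEach(function (line) { console.log(line); });
```

In a MongoDB shell, the return value of the db.runCommand() call in the saved script could be passed to a helper like this instead of the sample object.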
Do you have an existing script that you’ve been working with already? No problem: Studio 3T will load it into the Map-Reduce screen. Just click on the “Open Map-Reduce File” toolbar button, select the file, and there you have it!
Once you’re done running MongoDB map-reduce jobs, why not keep the momentum going by learning how to build MongoDB aggregation queries and discovering our MongoDB shell integration, IntelliShell?