Map-Reduce Examples
Aggregation Pipeline as Alternative to Map-Reduce
An aggregation pipeline provides better performance and usability than a map-reduce operation.
Map-reduce operations can be rewritten using aggregation pipeline operators, such as $group, $merge, and others.
For map-reduce operations that require custom functionality, MongoDB provides the $accumulator and $function aggregation operators starting in version 4.4. Use these operators to define custom aggregation expressions in JavaScript.
In mongosh, the db.collection.mapReduce() method is a wrapper around the mapReduce command. The following examples use the db.collection.mapReduce() method.
The examples in this section include aggregation pipeline alternatives without custom aggregation expressions. For alternatives that use custom expressions, see Map-Reduce to Aggregation Pipeline Translation Examples.
Create a sample collection orders with these documents:
db.orders.insertMany([
{ _id: 1, cust_id: "Ant O. Knee", ord_date: new Date("2020-03-01"), price: 25, items: [ { sku: "oranges", qty: 5, price: 2.5 }, { sku: "apples", qty: 5, price: 2.5 } ], status: "A" },
{ _id: 2, cust_id: "Ant O. Knee", ord_date: new Date("2020-03-08"), price: 70, items: [ { sku: "oranges", qty: 8, price: 2.5 }, { sku: "chocolates", qty: 5, price: 10 } ], status: "A" },
{ _id: 3, cust_id: "Busby Bee", ord_date: new Date("2020-03-08"), price: 50, items: [ { sku: "oranges", qty: 10, price: 2.5 }, { sku: "pears", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 4, cust_id: "Busby Bee", ord_date: new Date("2020-03-18"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 5, cust_id: "Busby Bee", ord_date: new Date("2020-03-19"), price: 50, items: [ { sku: "chocolates", qty: 5, price: 10 } ], status: "A"},
{ _id: 6, cust_id: "Cam Elot", ord_date: new Date("2020-03-19"), price: 35, items: [ { sku: "carrots", qty: 10, price: 1.0 }, { sku: "apples", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 7, cust_id: "Cam Elot", ord_date: new Date("2020-03-20"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 8, cust_id: "Don Quis", ord_date: new Date("2020-03-20"), price: 75, items: [ { sku: "chocolates", qty: 5, price: 10 }, { sku: "apples", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 9, cust_id: "Don Quis", ord_date: new Date("2020-03-20"), price: 55, items: [ { sku: "carrots", qty: 5, price: 1.0 }, { sku: "apples", qty: 10, price: 2.5 }, { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 10, cust_id: "Don Quis", ord_date: new Date("2020-03-23"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" }
])
Return the Total Price Per Customer
Perform the map-reduce operation on the orders collection to group by the cust_id, and calculate the sum of the price for each cust_id:
Define the map function to process each input document:
- In the function, this refers to the document that the map-reduce operation is processing.
- The function maps the price to the cust_id for each document and emits the cust_id and price.
var mapFunction1 = function() {
emit(this.cust_id, this.price);
};
Define the corresponding reduce function with two arguments, keyCustId and valuesPrices:
- valuesPrices is an array whose elements are the price values emitted by the map function and grouped by keyCustId.
- The function reduces the valuesPrices array to the sum of its elements.
var reduceFunction1 = function(keyCustId, valuesPrices) {
return Array.sum(valuesPrices);
};
Perform map-reduce on all documents in the orders collection using the mapFunction1 map function and the reduceFunction1 reduce function:
db.orders.mapReduce(
mapFunction1,
reduceFunction1,
{ out: "map_reduce_example" }
)
This operation outputs the results to a collection named map_reduce_example. If the map_reduce_example collection already exists, the operation replaces its contents with the results of this map-reduce operation.
Query the map_reduce_example collection to verify the results:
db.map_reduce_example.find().sort( { _id: 1 } )
The operation returns these documents:
{ "_id" : "Ant O. Knee", "value" : 95 }
{ "_id" : "Busby Bee", "value" : 125 }
{ "_id" : "Cam Elot", "value" : 60 }
{ "_id" : "Don Quis", "value" : 155 }
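The grouping behind this result can be traced without a server. The following plain JavaScript (Node.js) simulation is a sketch under stated assumptions, not how MongoDB executes map-reduce internally: runMapReduce and the emit shim are hypothetical helpers, only the cust_id and price fields of the sample orders are kept, and Array.sum (a helper available in the MongoDB JavaScript environment) is replaced by Array.prototype.reduce.

```javascript
// Sample orders reduced to the fields that mapFunction1 reads.
const orders = [
  { _id: 1, cust_id: "Ant O. Knee", price: 25 },
  { _id: 2, cust_id: "Ant O. Knee", price: 70 },
  { _id: 3, cust_id: "Busby Bee", price: 50 },
  { _id: 4, cust_id: "Busby Bee", price: 25 },
  { _id: 5, cust_id: "Busby Bee", price: 50 },
  { _id: 6, cust_id: "Cam Elot", price: 35 },
  { _id: 7, cust_id: "Cam Elot", price: 25 },
  { _id: 8, cust_id: "Don Quis", price: 75 },
  { _id: 9, cust_id: "Don Quis", price: 55 },
  { _id: 10, cust_id: "Don Quis", price: 25 }
];

// Hypothetical emit shim: collects emitted values per key.
let groups = new Map();
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

// Hypothetical driver: map every document, then reduce each key's values.
function runMapReduce(docs, mapFn, reduceFn) {
  groups = new Map();
  for (const doc of docs) {
    mapFn.call(doc); // inside the map function, `this` is the current document
  }
  const out = [];
  for (const [key, values] of groups) {
    // Like MongoDB, skip reduce when a key has a single emitted value.
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    out.push({ _id: key, value: value });
  }
  // Mirrors .sort( { _id: 1 } ) on the output collection.
  return out.sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

var mapFunction1 = function () {
  emit(this.cust_id, this.price);
};

// Array.sum exists in the MongoDB JavaScript environment; plain JS uses reduce.
var reduceFunction1 = function (keyCustId, valuesPrices) {
  return valuesPrices.reduce((sum, price) => sum + price, 0);
};

const result1 = runMapReduce(orders, mapFunction1, reduceFunction1);
console.log(result1);
```

As in MongoDB, the sketch never calls reduce for a key with only one emitted value; for a plain sum this makes no difference to the output.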
Aggregation Alternative
Using the available aggregation pipeline operators, you can rewrite the map-reduce operation without defining custom functions:
db.orders.aggregate([
{ $group: { _id: "$cust_id", value: { $sum: "$price" } } },
{ $out: "agg_alternative_1" }
])
The $group stage groups by the cust_id and calculates the value field (see also $sum). The value field contains the total price for each cust_id.
The stage outputs the following documents to the next stage:
{ "_id" : "Don Quis", "value" : 155 }
{ "_id" : "Ant O. Knee", "value" : 95 }
{ "_id" : "Cam Elot", "value" : 60 }
{ "_id" : "Busby Bee", "value" : 125 }
Then, the $out stage writes the output to the collection agg_alternative_1. Alternatively, you could use $merge instead of $out.
Query the agg_alternative_1 collection to verify the results:
db.agg_alternative_1.find().sort( { _id: 1 } )
The operation returns the following documents:
{ "_id" : "Ant O. Knee", "value" : 95 }
{ "_id" : "Busby Bee", "value" : 125 }
{ "_id" : "Cam Elot", "value" : 60 }
{ "_id" : "Don Quis", "value" : 155 }
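Unlike map-reduce, the $group stage accumulates each group in a single pass with no separate emit step. The sketch below mimics that behavior in plain JavaScript; groupSum is a hypothetical helper, not a MongoDB API, and it handles only a $sum over a top-level field.

```javascript
// Sample orders reduced to the fields the $group stage reads.
const orders = [
  { _id: 1, cust_id: "Ant O. Knee", price: 25 },
  { _id: 2, cust_id: "Ant O. Knee", price: 70 },
  { _id: 3, cust_id: "Busby Bee", price: 50 },
  { _id: 4, cust_id: "Busby Bee", price: 25 },
  { _id: 5, cust_id: "Busby Bee", price: 50 },
  { _id: 6, cust_id: "Cam Elot", price: 35 },
  { _id: 7, cust_id: "Cam Elot", price: 25 },
  { _id: 8, cust_id: "Don Quis", price: 75 },
  { _id: 9, cust_id: "Don Quis", price: 55 },
  { _id: 10, cust_id: "Don Quis", price: 25 }
];

// Hypothetical helper mimicking
// { $group: { _id: "$<keyField>", value: { $sum: "$<sumField>" } } }
function groupSum(docs, keyField, sumField) {
  const totals = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    const v = doc[sumField];
    // $sum ignores non-numeric values, so they add nothing here.
    const addend = typeof v === "number" ? v : 0;
    totals.set(key, (totals.has(key) ? totals.get(key) : 0) + addend);
  }
  return Array.from(totals, ([key, total]) => ({ _id: key, value: total }));
}

// $group does not guarantee output order, so sort by _id before comparing.
const grouped = groupSum(orders, "cust_id", "price")
  .sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
console.log(grouped);
```

This is why the verification query above sorts by _id: the unsorted $group output can arrive in any order.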
See also:
For an alternative that uses custom aggregation expressions, see Map-Reduce to Aggregation Pipeline Translation Examples.
Calculate Order and Total Quantity with Average Quantity Per Item
In the following example, you will see a map-reduce operation on the orders collection for all documents that have an ord_date value greater than or equal to 2020-03-01.
The operation in the example:
- Groups by the items.sku field, and calculates the number of orders and the total quantity ordered for each sku.
- Calculates the average quantity per order for each sku value and merges the results into the output collection.
When merging results, if an existing document has the same key as the new result, the operation overwrites the existing document. If there is no existing document with the same key, the operation inserts the document.
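The merge rule can be pictured as an upsert keyed on _id. In this hypothetical sketch, mergeInto stands in for the update of the output collection, and the prior contents (including the kiwis entry) are made-up data chosen to show all three cases: overwritten, inserted, and left untouched.

```javascript
// Hypothetical helper: applies the merge rule to an in-memory "collection".
function mergeInto(existingDocs, newResults) {
  const byId = new Map(existingDocs.map((doc) => [doc._id, doc]));
  for (const doc of newResults) {
    // Same _id: overwrite the existing document. New _id: insert it.
    byId.set(doc._id, doc);
  }
  // Documents without a matching new result are left as they were.
  return Array.from(byId.values());
}

// Made-up prior contents of the output collection.
const existing = [
  { _id: "apples", value: { count: 1, qty: 2, avg: 2 } },
  { _id: "kiwis", value: { count: 1, qty: 3, avg: 3 } }
];
// New results for two keys.
const incoming = [
  { _id: "apples", value: { count: 4, qty: 35, avg: 8.75 } },
  { _id: "pears", value: { count: 1, qty: 10, avg: 10 } }
];

const merged = mergeInto(existing, incoming);
console.log(merged);
```

By contrast, the plain out: "map_reduce_example" form used in the first example replaces the whole collection, which in this sketch would leave only the incoming documents.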
Example steps:
Define the map function to process each input document:
- In the function, this refers to the document that the map-reduce operation is processing.
- For each item, the function associates the sku with a new object value that contains a count of 1 and the item qty for the order, and emits the sku (stored in key) and the value.
var mapFunction2 = function() {
for (var idx = 0; idx < this.items.length; idx++) {
var key = this.items[idx].sku;
var value = { count: 1, qty: this.items[idx].qty };
emit(key, value);
}
};
Define the corresponding reduce function with two arguments, keySKU and countObjVals:
- countObjVals is an array whose elements are the objects mapped to the grouped keySKU values passed by the map function to the reducer function.
- The function reduces the countObjVals array to a single object reducedVal that contains the count and the qty fields.
- In reducedVal, the count field contains the sum of the count fields from the individual array elements, and the qty field contains the sum of the qty fields from the individual array elements.
var reduceFunction2 = function(keySKU, countObjVals) {
var reducedVal = { count: 0, qty: 0 }; // declared with var to avoid an implicit global
for (var idx = 0; idx < countObjVals.length; idx++) {
reducedVal.count += countObjVals[idx].count;
reducedVal.qty += countObjVals[idx].qty;
}
return reducedVal;
};
Define a finalize function with two arguments, key and reducedVal. The function modifies the reducedVal object to add a computed field named avg and returns the modified object:
var finalizeFunction2 = function (key, reducedVal) {
reducedVal.avg = reducedVal.qty/reducedVal.count;
return reducedVal;
};
Perform the map-reduce operation on the orders collection using the mapFunction2, reduceFunction2, and finalizeFunction2 functions:
db.orders.mapReduce(
mapFunction2,
reduceFunction2,
{
out: { merge: "map_reduce_example2" },
query: { ord_date: { $gte: new Date("2020-03-01") } },
finalize: finalizeFunction2
}
);
This operation uses the query field to select only those documents with ord_date greater than or equal to new Date("2020-03-01"). Then it outputs the results to a collection map_reduce_example2.
If the map_reduce_example2 collection already exists, the operation merges the existing contents with the results of this map-reduce operation. That is, if an existing document has the same key as the new result, the operation overwrites the existing document. If there is no existing document with the same key, the operation inserts the document.
Query the map_reduce_example2 collection to verify the results:
db.map_reduce_example2.find().sort( { _id: 1 } )
The operation returns these documents:
{ "_id" : "apples", "value" : { "count" : 4, "qty" : 35, "avg" : 8.75 } }
{ "_id" : "carrots", "value" : { "count" : 2, "qty" : 15, "avg" : 7.5 } }
{ "_id" : "chocolates", "value" : { "count" : 3, "qty" : 15, "avg" : 5 } }
{ "_id" : "oranges", "value" : { "count" : 7, "qty" : 63, "avg" : 9 } }
{ "_id" : "pears", "value" : { "count" : 1, "qty" : 10, "avg" : 10 } }
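The map, reduce, and finalize steps can be traced with an in-memory sketch. Assumptions: runMapReduce and the emit shim are hypothetical helpers, only the _id and items (sku, qty) fields of the sample orders are kept, and the query filter is omitted because every sample order has an ord_date on or after 2020-03-01. The last lines check a property map-reduce relies on: MongoDB may call reduce more than once for the same key, feeding earlier reduced results back in, so reduce must return an object with the same shape as the emitted values.

```javascript
// Sample orders reduced to the fields mapFunction2 reads.
// Every sample ord_date is on or after 2020-03-01, so the query filter is omitted.
const orders = [
  { _id: 1, items: [ { sku: "oranges", qty: 5 }, { sku: "apples", qty: 5 } ] },
  { _id: 2, items: [ { sku: "oranges", qty: 8 }, { sku: "chocolates", qty: 5 } ] },
  { _id: 3, items: [ { sku: "oranges", qty: 10 }, { sku: "pears", qty: 10 } ] },
  { _id: 4, items: [ { sku: "oranges", qty: 10 } ] },
  { _id: 5, items: [ { sku: "chocolates", qty: 5 } ] },
  { _id: 6, items: [ { sku: "carrots", qty: 10 }, { sku: "apples", qty: 10 } ] },
  { _id: 7, items: [ { sku: "oranges", qty: 10 } ] },
  { _id: 8, items: [ { sku: "chocolates", qty: 5 }, { sku: "apples", qty: 10 } ] },
  { _id: 9, items: [ { sku: "carrots", qty: 5 }, { sku: "apples", qty: 10 }, { sku: "oranges", qty: 10 } ] },
  { _id: 10, items: [ { sku: "oranges", qty: 10 } ] }
];

// Hypothetical emit shim: collects emitted values per key.
let groups = new Map();
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

// Hypothetical driver: map, reduce per key, then finalize once per key.
function runMapReduce(docs, mapFn, reduceFn, finalizeFn) {
  groups = new Map();
  for (const doc of docs) mapFn.call(doc); // `this` is the current document
  const out = [];
  for (const [key, values] of groups) {
    let value = values.length === 1 ? values[0] : reduceFn(key, values);
    if (finalizeFn) value = finalizeFn(key, value);
    out.push({ _id: key, value: value });
  }
  return out.sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

var mapFunction2 = function () {
  for (var idx = 0; idx < this.items.length; idx++) {
    var key = this.items[idx].sku;
    var value = { count: 1, qty: this.items[idx].qty };
    emit(key, value);
  }
};

var reduceFunction2 = function (keySKU, countObjVals) {
  var reducedVal = { count: 0, qty: 0 };
  for (var idx = 0; idx < countObjVals.length; idx++) {
    reducedVal.count += countObjVals[idx].count;
    reducedVal.qty += countObjVals[idx].qty;
  }
  return reducedVal;
};

var finalizeFunction2 = function (key, reducedVal) {
  reducedVal.avg = reducedVal.qty / reducedVal.count;
  return reducedVal;
};

const result2 = runMapReduce(orders, mapFunction2, reduceFunction2, finalizeFunction2);
console.log(JSON.stringify(result2));

// Re-reduce check: reducing partial results matches a single pass.
const appleVals = [
  { count: 1, qty: 5 }, { count: 1, qty: 10 }, { count: 1, qty: 10 }, { count: 1, qty: 10 }
];
const onePass = reduceFunction2("apples", appleVals);
const twoPass = reduceFunction2("apples", [
  reduceFunction2("apples", appleVals.slice(0, 2)),
  reduceFunction2("apples", appleVals.slice(2))
]);
console.log(onePass, twoPass);
```

Because finalize runs only after all reducing for a key is done, the avg field never has to survive a re-reduce, which is why it is computed there rather than in reduceFunction2.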
Aggregation Alternative
Using the available aggregation pipeline operators, you can rewrite the map-reduce operation without defining custom functions:
db.orders.aggregate( [
{ $match: { ord_date: { $gte: new Date("2020-03-01") } } },
{ $unwind: "$items" },
{ $group: { _id: "$items.sku", qty: { $sum: "$items.qty" }, orders_ids: { $addToSet: "$_id" } } },
{ $project: { value: { count: { $size: "$orders_ids" }, qty: "$qty", avg: { $divide: [ "$qty", { $size: "$orders_ids" } ] } } } },
{ $merge: { into: "agg_alternative_3", on: "_id", whenMatched: "replace", whenNotMatched: "insert" } }
] )
- The $match stage selects only those documents with ord_date greater than or equal to new Date("2020-03-01").
- The $unwind stage breaks down the document by the items array field to output a document for each array element. For example:
{ "_id" : 1, "cust_id" : "Ant O. Knee", "ord_date" : ISODate("2020-03-01T00:00:00Z"), "price" : 25, "items" : { "sku" : "oranges", "qty" : 5, "price" : 2.5 }, "status" : "A" }
{ "_id" : 1, "cust_id" : "Ant O. Knee", "ord_date" : ISODate("2020-03-01T00:00:00Z"), "price" : 25, "items" : { "sku" : "apples", "qty" : 5, "price" : 2.5 }, "status" : "A" }
{ "_id" : 2, "cust_id" : "Ant O. Knee", "ord_date" : ISODate("2020-03-08T00:00:00Z"), "price" : 70, "items" : { "sku" : "oranges", "qty" : 8, "price" : 2.5 }, "status" : "A" }
{ "_id" : 2, "cust_id" : "Ant O. Knee", "ord_date" : ISODate("2020-03-08T00:00:00Z"), "price" : 70, "items" : { "sku" : "chocolates", "qty" : 5, "price" : 10 }, "status" : "A" }
{ "_id" : 3, "cust_id" : "Busby Bee", "ord_date" : ISODate("2020-03-08T00:00:00Z"), "price" : 50, "items" : { "sku" : "oranges", "qty" : 10, "price" : 2.5 }, "status" : "A" }
{ "_id" : 3, "cust_id" : "Busby Bee", "ord_date" : ISODate("2020-03-08T00:00:00Z"), "price" : 50, "items" : { "sku" : "pears", "qty" : 10, "price" : 2.5 }, "status" : "A" }
{ "_id" : 4, "cust_id" : "Busby Bee", "ord_date" : ISODate("2020-03-18T00:00:00Z"), "price" : 25, "items" : { "sku" : "oranges", "qty" : 10, "price" : 2.5 }, "status" : "A" }
{ "_id" : 5, "cust_id" : "Busby Bee", "ord_date" : ISODate("2020-03-19T00:00:00Z"), "price" : 50, "items" : { "sku" : "chocolates", "qty" : 5, "price" : 10 }, "status" : "A" }
...
- The $group stage groups by items.sku, calculating for each sku:
  - The qty field, which contains the total qty ordered for each items.sku (see $sum).
  - The orders_ids array, which contains the distinct order _id values for each items.sku (see $addToSet).
The stage outputs the following documents to the next stage:
{ "_id" : "chocolates", "qty" : 15, "orders_ids" : [ 2, 5, 8 ] }
{ "_id" : "oranges", "qty" : 63, "orders_ids" : [ 4, 7, 3, 2, 9, 1, 10 ] }
{ "_id" : "carrots", "qty" : 15, "orders_ids" : [ 6, 9 ] }
{ "_id" : "apples", "qty" : 35, "orders_ids" : [ 9, 8, 1, 6 ] }
{ "_id" : "pears", "qty" : 10, "orders_ids" : [ 3 ] }
- The $project stage reshapes the output documents to mirror the map-reduce output, with two fields, _id and value. The $project stage sets:
  - value.count to the size of the orders_ids array (see $size).
  - value.qty to the qty field of the input document.
  - value.avg to the average qty per order (see $divide and $size).
The stage outputs the following documents to the next stage:
{ "_id" : "apples", "value" : { "count" : 4, "qty" : 35, "avg" : 8.75 } }
{ "_id" : "pears", "value" : { "count" : 1, "qty" : 10, "avg" : 10 } }
{ "_id" : "chocolates", "value" : { "count" : 3, "qty" : 15, "avg" : 5 } }
{ "_id" : "oranges", "value" : { "count" : 7, "qty" : 63, "avg" : 9 } }
{ "_id" : "carrots", "value" : { "count" : 2, "qty" : 15, "avg" : 7.5 } }
- Finally, the $merge stage writes the output to the collection agg_alternative_3. If an existing document has the same key _id as the new result, the operation overwrites the existing document. If there is no existing document with the same key, the operation inserts the document.
Query the agg_alternative_3 collection to verify the results:
db.agg_alternative_3.find().sort( { _id: 1 } )
The operation returns the following documents:
{ "_id" : "apples", "value" : { "count" : 4, "qty" : 35, "avg" : 8.75 } }
{ "_id" : "carrots", "value" : { "count" : 2, "qty" : 15, "avg" : 7.5 } }
{ "_id" : "chocolates", "value" : { "count" : 3, "qty" : 15, "avg" : 5 } }
{ "_id" : "oranges", "value" : { "count" : 7, "qty" : 63, "avg" : 9 } }
{ "_id" : "pears", "value" : { "count" : 1, "qty" : 10, "avg" : 10 } }
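The pipeline stages can likewise be traced in plain JavaScript to see where each number comes from. This is a sketch that mimics $match, $unwind, $group (with $sum and $addToSet), and $project using array methods and a Set; it is not the server's implementation, and only the fields the pipeline reads are kept.

```javascript
// Sample orders reduced to the fields the pipeline reads.
const orders = [
  { _id: 1, ord_date: new Date("2020-03-01"), items: [ { sku: "oranges", qty: 5 }, { sku: "apples", qty: 5 } ] },
  { _id: 2, ord_date: new Date("2020-03-08"), items: [ { sku: "oranges", qty: 8 }, { sku: "chocolates", qty: 5 } ] },
  { _id: 3, ord_date: new Date("2020-03-08"), items: [ { sku: "oranges", qty: 10 }, { sku: "pears", qty: 10 } ] },
  { _id: 4, ord_date: new Date("2020-03-18"), items: [ { sku: "oranges", qty: 10 } ] },
  { _id: 5, ord_date: new Date("2020-03-19"), items: [ { sku: "chocolates", qty: 5 } ] },
  { _id: 6, ord_date: new Date("2020-03-19"), items: [ { sku: "carrots", qty: 10 }, { sku: "apples", qty: 10 } ] },
  { _id: 7, ord_date: new Date("2020-03-20"), items: [ { sku: "oranges", qty: 10 } ] },
  { _id: 8, ord_date: new Date("2020-03-20"), items: [ { sku: "chocolates", qty: 5 }, { sku: "apples", qty: 10 } ] },
  { _id: 9, ord_date: new Date("2020-03-20"), items: [ { sku: "carrots", qty: 5 }, { sku: "apples", qty: 10 }, { sku: "oranges", qty: 10 } ] },
  { _id: 10, ord_date: new Date("2020-03-23"), items: [ { sku: "oranges", qty: 10 } ] }
];

// $match: keep orders with ord_date >= 2020-03-01.
const cutoff = new Date("2020-03-01");
const matched = orders.filter((order) => order.ord_date >= cutoff);

// $unwind: one document per items element, with items replaced by that element.
const unwound = matched.flatMap((order) =>
  order.items.map((item) => ({ ...order, items: item }))
);

// $group: sum qty ($sum) and collect distinct order _ids ($addToSet) per sku.
const bySku = new Map();
for (const doc of unwound) {
  const sku = doc.items.sku;
  if (!bySku.has(sku)) bySku.set(sku, { _id: sku, qty: 0, orders_ids: new Set() });
  const group = bySku.get(sku);
  group.qty += doc.items.qty;
  group.orders_ids.add(doc._id);
}

// $project: reshape into { _id, value: { count, qty, avg } }.
const projected = Array.from(bySku.values(), (group) => ({
  _id: group._id,
  value: {
    count: group.orders_ids.size,
    qty: group.qty,
    avg: group.qty / group.orders_ids.size
  }
}));

// Sort by _id, as the find().sort( { _id: 1 } ) query does.
projected.sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
console.log(JSON.stringify(projected));
```

Note that count here is the number of distinct orders per sku, while the map-reduce version counts emitted items; the two agree because no sample order lists the same sku twice.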
See also:
For an alternative that uses custom aggregation expressions, see Map-Reduce to Aggregation Pipeline Translation Examples.