Map-Reduce Examples
Aggregation Pipeline as Alternative to Map-Reduce
An aggregation pipeline provides better performance and usability than a map-reduce operation.聚合管道提供了比map-reduce操作更好的性能和可用性。
Map-reduce operations can be rewritten using aggregation pipeline operators, such as $group, $merge, and others.
For map-reduce operations that require custom functionality, MongoDB provides the $accumulator and $function aggregation operators starting in version 4.4. Use these operators to define custom aggregation expressions in JavaScript.
In mongosh, the db.collection.mapReduce() method is a wrapper around the mapReduce command. The following examples use the db.collection.mapReduce() method.
The examples in this section include aggregation pipeline alternatives without custom aggregation expressions. For alternatives that use custom expressions, see Map-Reduce to Aggregation Pipeline Translation Examples.
Create a sample collection orders with these documents:
db.orders.insertMany([
{ _id: 1, cust_id: "Ant O. Knee", ord_date: new Date("2020-03-01"), price: 25, items: [ { sku: "oranges", qty: 5, price: 2.5 }, { sku: "apples", qty: 5, price: 2.5 } ], status: "A" },
{ _id: 2, cust_id: "Ant O. Knee", ord_date: new Date("2020-03-08"), price: 70, items: [ { sku: "oranges", qty: 8, price: 2.5 }, { sku: "chocolates", qty: 5, price: 10 } ], status: "A" },
{ _id: 3, cust_id: "Busby Bee", ord_date: new Date("2020-03-08"), price: 50, items: [ { sku: "oranges", qty: 10, price: 2.5 }, { sku: "pears", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 4, cust_id: "Busby Bee", ord_date: new Date("2020-03-18"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 5, cust_id: "Busby Bee", ord_date: new Date("2020-03-19"), price: 50, items: [ { sku: "chocolates", qty: 5, price: 10 } ], status: "A"},
{ _id: 6, cust_id: "Cam Elot", ord_date: new Date("2020-03-19"), price: 35, items: [ { sku: "carrots", qty: 10, price: 1.0 }, { sku: "apples", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 7, cust_id: "Cam Elot", ord_date: new Date("2020-03-20"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 8, cust_id: "Don Quis", ord_date: new Date("2020-03-20"), price: 75, items: [ { sku: "chocolates", qty: 5, price: 10 }, { sku: "apples", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 9, cust_id: "Don Quis", ord_date: new Date("2020-03-20"), price: 55, items: [ { sku: "carrots", qty: 5, price: 1.0 }, { sku: "apples", qty: 10, price: 2.5 }, { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 10, cust_id: "Don Quis", ord_date: new Date("2020-03-23"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" }
])
Return the Total Price Per Customer
Perform a map-reduce operation on the orders collection to group by cust_id and calculate the sum of price for each cust_id:
Define the map function to process each input document:
- In the function, this refers to the document that the map-reduce operation is processing.
- The function maps the price to the cust_id for each document and emits the cust_id and price.
var mapFunction1 = function() {
emit(this.cust_id, this.price);
};
Define the corresponding reduce function with two arguments keyCustId and valuesPrices:
- valuesPrices is an array whose elements are the price values emitted by the map function and grouped by keyCustId.
- The function reduces the valuesPrices array to the sum of its elements.
var reduceFunction1 = function(keyCustId, valuesPrices) {
return Array.sum(valuesPrices);
};
Perform map-reduce on all documents in the orders collection using the mapFunction1 map function and the reduceFunction1 reduce function:
db.orders.mapReduce(
mapFunction1,
reduceFunction1,
{ out: "map_reduce_example" }
)
This operation outputs the results to a collection named map_reduce_example. If the map_reduce_example collection already exists, the operation replaces its contents with the results of this map-reduce operation.
Query the map_reduce_example collection to verify the results:
db.map_reduce_example.find().sort( { _id: 1 } )
The operation returns these documents:
{ "_id" : "Ant O. Knee", "value" : 95 }
{ "_id" : "Busby Bee", "value" : 125 }
{ "_id" : "Cam Elot", "value" : 60 }
{ "_id" : "Don Quis", "value" : 155 }
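To make the map and reduce phases concrete, the following plain JavaScript sketch replays mapFunction1 and reduceFunction1 in memory against the sample data. It is an illustration under stated assumptions, not MongoDB's implementation: runMapReduce is a hypothetical helper, emit is a script-level variable that the helper rebinds before each run, and Array.sum (a mongosh helper) is shimmed because plain JavaScript does not provide it. Like db.collection.mapReduce(), the helper does not call the reduce function for a key that received only one value.

```javascript
// Shim for Array.sum, a mongosh helper that plain JavaScript does not provide.
if (typeof Array.sum !== "function") {
  Array.sum = (values) => values.reduce((total, v) => total + v, 0);
}

// Global emit() that map functions call; runMapReduce() rebinds it on each run.
let emit;

// Hypothetical in-memory stand-in for db.collection.mapReduce(): run mapFn with
// `this` bound to each document, group emitted values by key, and call reduceFn
// only for keys that received more than one value.
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapFn.call(doc);
  const results = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    results.push({ _id: key, value });
  }
  // Sort by _id to mirror find().sort( { _id: 1 } ).
  return results.sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

// The sample orders, reduced to the two fields mapFunction1 reads.
const orders = [
  { _id: 1, cust_id: "Ant O. Knee", price: 25 },
  { _id: 2, cust_id: "Ant O. Knee", price: 70 },
  { _id: 3, cust_id: "Busby Bee", price: 50 },
  { _id: 4, cust_id: "Busby Bee", price: 25 },
  { _id: 5, cust_id: "Busby Bee", price: 50 },
  { _id: 6, cust_id: "Cam Elot", price: 35 },
  { _id: 7, cust_id: "Cam Elot", price: 25 },
  { _id: 8, cust_id: "Don Quis", price: 75 },
  { _id: 9, cust_id: "Don Quis", price: 55 },
  { _id: 10, cust_id: "Don Quis", price: 25 },
];

var mapFunction1 = function() {
  emit(this.cust_id, this.price);
};

var reduceFunction1 = function(keyCustId, valuesPrices) {
  return Array.sum(valuesPrices);
};

const totals = runMapReduce(orders, mapFunction1, reduceFunction1);
console.log(totals);
```

Run with Node.js, the sketch yields the same four { _id, value } pairs that the query on map_reduce_example returns.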
Aggregation Alternative
Using the available aggregation pipeline operators, you can rewrite the map-reduce operation without defining custom functions:
db.orders.aggregate([
{ $group: { _id: "$cust_id", value: { $sum: "$price" } } },
{ $out: "agg_alternative_1" }
])
The $group stage groups by cust_id and calculates the value field (see also $sum). The value field contains the total price for each cust_id. The stage outputs the following documents to the next stage (the order of $group output is not guaranteed):
{ "_id" : "Don Quis", "value" : 155 }
{ "_id" : "Ant O. Knee", "value" : 95 }
{ "_id" : "Cam Elot", "value" : 60 }
{ "_id" : "Busby Bee", "value" : 125 }
Then, the $out stage writes the output to the collection agg_alternative_1. Alternatively, you could use $merge instead of $out.
Query the agg_alternative_1 collection to verify the results:
db.agg_alternative_1.find().sort( { _id: 1 } )
The operation returns the following documents:
{ "_id" : "Ant O. Knee", "value" : 95 }
{ "_id" : "Busby Bee", "value" : 125 }
{ "_id" : "Cam Elot", "value" : 60 }
{ "_id" : "Don Quis", "value" : 155 }
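Unlike the map-reduce version, the $group stage needs no custom functions; conceptually it keeps one running total per group key. The following plain JavaScript sketch models that behavior on the cust_id and price fields of the sample documents. groupAndSum is a hypothetical helper, not a MongoDB API, and it covers only the single $sum accumulator used in this pipeline.

```javascript
// The sample orders, reduced to the fields the $group stage reads.
const orders = [
  { _id: 1, cust_id: "Ant O. Knee", price: 25 },
  { _id: 2, cust_id: "Ant O. Knee", price: 70 },
  { _id: 3, cust_id: "Busby Bee", price: 50 },
  { _id: 4, cust_id: "Busby Bee", price: 25 },
  { _id: 5, cust_id: "Busby Bee", price: 50 },
  { _id: 6, cust_id: "Cam Elot", price: 35 },
  { _id: 7, cust_id: "Cam Elot", price: 25 },
  { _id: 8, cust_id: "Don Quis", price: 75 },
  { _id: 9, cust_id: "Don Quis", price: 55 },
  { _id: 10, cust_id: "Don Quis", price: 25 },
];

// Hypothetical helper modeling { $group: { _id: "$<keyField>", value: { $sum: "$<sumField>" } } }.
// Like $sum, it skips values that are not numbers.
function groupAndSum(docs, keyField, sumField) {
  const totals = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    const amount = typeof doc[sumField] === "number" ? doc[sumField] : 0;
    totals.set(key, (totals.get(key) ?? 0) + amount);
  }
  // $group does not guarantee output order, so sort by _id as the query above does.
  return [...totals]
    .map(([key, value]) => ({ _id: key, value }))
    .sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

const aggAlternative1 = groupAndSum(orders, "cust_id", "price");
console.log(aggAlternative1);
```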
See also:
For an alternative that uses custom aggregation expressions, see Map-Reduce to Aggregation Pipeline Translation Examples.
Calculate Order and Total Quantity with Average Quantity Per Item
The following example performs a map-reduce operation on the orders collection for all documents that have an ord_date value greater than or equal to 2020-03-01.
The operation in the example:
- Groups by the items.sku field, and calculates the number of orders and the total quantity ordered for each sku.
- Calculates the average quantity per order for each sku value and merges the results into the output collection.
When merging results, if an existing document has the same key as the new result, the operation overwrites the existing document. If there is no existing document with the same key, the operation inserts the document.
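The merge behavior described above amounts to an upsert keyed on _id. The following plain JavaScript sketch uses a hypothetical mergeResults helper and made-up documents (the kiwis entry is not part of the sample orders data) to show the three cases: an existing document overwritten, a new document inserted, and an existing document with no new result left unchanged.

```javascript
// Hypothetical model of the "merge" output behavior: results are keyed by _id.
// A result whose _id already exists overwrites that document; otherwise it is inserted.
// Existing documents with no matching new result are left as they are.
function mergeResults(existing, newResults) {
  const byId = new Map(existing.map((doc) => [doc._id, doc]));
  for (const result of newResults) byId.set(result._id, result); // overwrite or insert
  return [...byId.values()];
}

// Illustrative data only; these are not outputs of the sample collection.
const existing = [
  { _id: "apples", value: { count: 1, qty: 1, avg: 1 } },
  { _id: "kiwis", value: { count: 2, qty: 6, avg: 3 } },
];
const incoming = [
  { _id: "apples", value: { count: 4, qty: 35, avg: 8.75 } },
  { _id: "pears", value: { count: 1, qty: 10, avg: 10 } },
];

const merged = mergeResults(existing, incoming);
console.log(merged);
```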
Example steps:
Define the map function to process each input document:
- In the function, this refers to the document that the map-reduce operation is processing.
- For each item, the function associates the sku with a new object value that contains a count of 1 and the item qty for the order, and emits the sku (stored in key) and the value.
var mapFunction2 = function() {
for (var idx = 0; idx < this.items.length; idx++) {
var key = this.items[idx].sku;
var value = { count: 1, qty: this.items[idx].qty };
emit(key, value);
}
};
Define the corresponding reduce function with two arguments keySKU and countObjVals:
- countObjVals is an array whose elements are the objects mapped to the grouped keySKU values passed by the map function to the reduce function.
- The function reduces the countObjVals array to a single object reducedVal that contains the count and the qty fields.
- In reducedVal, the count field contains the sum of the count fields from the individual array elements, and the qty field contains the sum of the qty fields from the individual array elements.
var reduceFunction2 = function(keySKU, countObjVals) {
var reducedVal = { count: 0, qty: 0 };
for (var idx = 0; idx < countObjVals.length; idx++) {
reducedVal.count += countObjVals[idx].count;
reducedVal.qty += countObjVals[idx].qty;
}
return reducedVal;
};
Define a finalize function with two arguments key and reducedVal. The function modifies the reducedVal object to add a computed field named avg and returns the modified object:
var finalizeFunction2 = function (key, reducedVal) {
reducedVal.avg = reducedVal.qty/reducedVal.count;
return reducedVal;
};
Perform the map-reduce operation on the orders collection using the mapFunction2, reduceFunction2, and finalizeFunction2 functions:
db.orders.mapReduce(
mapFunction2,
reduceFunction2,
{
out: { merge: "map_reduce_example2" },
query: { ord_date: { $gte: new Date("2020-03-01") } },
finalize: finalizeFunction2
}
);
This operation uses the query field to select only those documents with ord_date greater than or equal to new Date("2020-03-01"). Then it outputs the results to a collection named map_reduce_example2.
If the map_reduce_example2 collection already exists, the operation merges the existing contents with the results of this map-reduce operation. That is, if an existing document has the same key as the new result, the operation overwrites the existing document. If there is no existing document with the same key, the operation inserts the document.
Query the map_reduce_example2 collection to verify the results:
db.map_reduce_example2.find().sort( { _id: 1 } )
The operation returns these documents:
{ "_id" : "apples", "value" : { "count" : 4, "qty" : 35, "avg" : 8.75 } }
{ "_id" : "carrots", "value" : { "count" : 2, "qty" : 15, "avg" : 7.5 } }
{ "_id" : "chocolates", "value" : { "count" : 3, "qty" : 15, "avg" : 5 } }
{ "_id" : "oranges", "value" : { "count" : 7, "qty" : 63, "avg" : 9 } }
{ "_id" : "pears", "value" : { "count" : 1, "qty" : 10, "avg" : 10 } }
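The following plain JavaScript sketch replays mapFunction2, reduceFunction2, and finalizeFunction2 in memory, with the query filter written as a JavaScript predicate. runMapReduce and the script-level emit are hypothetical stand-ins for what mongosh provides, not MongoDB's implementation. The sketch also checks a property the reduce function must have: MongoDB may call reduce more than once for the same key, feeding an earlier reduce result back in, so reduceFunction2 returns objects with the same shape as the values the map function emits.

```javascript
// Global emit() called by the map function; runMapReduce() rebinds it for each run.
let emit;

// Hypothetical in-memory stand-in for db.collection.mapReduce() with query and finalize:
// reduce runs only for keys with more than one emitted value, finalize runs for every key.
function runMapReduce(docs, mapFn, reduceFn, { query = () => true, finalize } = {}) {
  const groups = new Map();
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs.filter(query)) mapFn.call(doc);
  const results = [];
  for (const [key, values] of groups) {
    let value = values.length === 1 ? values[0] : reduceFn(key, values);
    if (finalize) value = finalize(key, value);
    results.push({ _id: key, value });
  }
  return results.sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

// The sample orders, reduced to the fields this example reads.
const orders = [
  { _id: 1, ord_date: new Date("2020-03-01"), items: [{ sku: "oranges", qty: 5 }, { sku: "apples", qty: 5 }] },
  { _id: 2, ord_date: new Date("2020-03-08"), items: [{ sku: "oranges", qty: 8 }, { sku: "chocolates", qty: 5 }] },
  { _id: 3, ord_date: new Date("2020-03-08"), items: [{ sku: "oranges", qty: 10 }, { sku: "pears", qty: 10 }] },
  { _id: 4, ord_date: new Date("2020-03-18"), items: [{ sku: "oranges", qty: 10 }] },
  { _id: 5, ord_date: new Date("2020-03-19"), items: [{ sku: "chocolates", qty: 5 }] },
  { _id: 6, ord_date: new Date("2020-03-19"), items: [{ sku: "carrots", qty: 10 }, { sku: "apples", qty: 10 }] },
  { _id: 7, ord_date: new Date("2020-03-20"), items: [{ sku: "oranges", qty: 10 }] },
  { _id: 8, ord_date: new Date("2020-03-20"), items: [{ sku: "chocolates", qty: 5 }, { sku: "apples", qty: 10 }] },
  { _id: 9, ord_date: new Date("2020-03-20"), items: [{ sku: "carrots", qty: 5 }, { sku: "apples", qty: 10 }, { sku: "oranges", qty: 10 }] },
  { _id: 10, ord_date: new Date("2020-03-23"), items: [{ sku: "oranges", qty: 10 }] },
];

var mapFunction2 = function() {
  for (var idx = 0; idx < this.items.length; idx++) {
    var key = this.items[idx].sku;
    var value = { count: 1, qty: this.items[idx].qty };
    emit(key, value);
  }
};

var reduceFunction2 = function(keySKU, countObjVals) {
  var reducedVal = { count: 0, qty: 0 };
  for (var idx = 0; idx < countObjVals.length; idx++) {
    reducedVal.count += countObjVals[idx].count;
    reducedVal.qty += countObjVals[idx].qty;
  }
  return reducedVal;
};

var finalizeFunction2 = function(key, reducedVal) {
  reducedVal.avg = reducedVal.qty / reducedVal.count;
  return reducedVal;
};

const cutoff = new Date("2020-03-01");
const results = runMapReduce(orders, mapFunction2, reduceFunction2, {
  query: (doc) => doc.ord_date >= cutoff, // stands in for { ord_date: { $gte: ... } }
  finalize: finalizeFunction2,
});
console.log(JSON.stringify(results));

// Re-reduce check: reducing a partial result again must give the same totals
// as reducing all of the "apples" values in one call.
const appleVals = [{ count: 1, qty: 5 }, { count: 1, qty: 10 }, { count: 1, qty: 10 }, { count: 1, qty: 10 }];
const once = reduceFunction2("apples", appleVals);
const twice = reduceFunction2("apples", [reduceFunction2("apples", appleVals.slice(0, 2)), ...appleVals.slice(2)]);
console.log(once, twice);
```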
Aggregation Alternative
Using the available aggregation pipeline operators, you can rewrite the map-reduce operation without defining custom functions:
db.orders.aggregate( [
{ $match: { ord_date: { $gte: new Date("2020-03-01") } } },
{ $unwind: "$items" },
{ $group: { _id: "$items.sku", qty: { $sum: "$items.qty" }, orders_ids: { $addToSet: "$_id" } } },
{ $project: { value: { count: { $size: "$orders_ids" }, qty: "$qty", avg: { $divide: [ "$qty", { $size: "$orders_ids" } ] } } } },
{ $merge: { into: "agg_alternative_3", on: "_id", whenMatched: "replace", whenNotMatched: "insert" } }
] )
- The $match stage selects only those documents with ord_date greater than or equal to new Date("2020-03-01").
- The $unwind stage breaks down the document by the items array field to output a document for each array element. For example:
{ "_id" : 1, "cust_id" : "Ant O. Knee", "ord_date" : ISODate("2020-03-01T00:00:00Z"), "price" : 25, "items" : { "sku" : "oranges", "qty" : 5, "price" : 2.5 }, "status" : "A" }
{ "_id" : 1, "cust_id" : "Ant O. Knee", "ord_date" : ISODate("2020-03-01T00:00:00Z"), "price" : 25, "items" : { "sku" : "apples", "qty" : 5, "price" : 2.5 }, "status" : "A" }
{ "_id" : 2, "cust_id" : "Ant O. Knee", "ord_date" : ISODate("2020-03-08T00:00:00Z"), "price" : 70, "items" : { "sku" : "oranges", "qty" : 8, "price" : 2.5 }, "status" : "A" }
{ "_id" : 2, "cust_id" : "Ant O. Knee", "ord_date" : ISODate("2020-03-08T00:00:00Z"), "price" : 70, "items" : { "sku" : "chocolates", "qty" : 5, "price" : 10 }, "status" : "A" }
{ "_id" : 3, "cust_id" : "Busby Bee", "ord_date" : ISODate("2020-03-08T00:00:00Z"), "price" : 50, "items" : { "sku" : "oranges", "qty" : 10, "price" : 2.5 }, "status" : "A" }
{ "_id" : 3, "cust_id" : "Busby Bee", "ord_date" : ISODate("2020-03-08T00:00:00Z"), "price" : 50, "items" : { "sku" : "pears", "qty" : 10, "price" : 2.5 }, "status" : "A" }
{ "_id" : 4, "cust_id" : "Busby Bee", "ord_date" : ISODate("2020-03-18T00:00:00Z"), "price" : 25, "items" : { "sku" : "oranges", "qty" : 10, "price" : 2.5 }, "status" : "A" }
{ "_id" : 5, "cust_id" : "Busby Bee", "ord_date" : ISODate("2020-03-19T00:00:00Z"), "price" : 50, "items" : { "sku" : "chocolates", "qty" : 5, "price" : 10 }, "status" : "A" }
...
- The $group stage groups by items.sku, calculating for each sku:
  - The qty field, which contains the total qty ordered for each items.sku (using $sum).
  - The orders_ids array, which contains the distinct order _id values for each items.sku (using $addToSet).
The stage outputs the following documents to the next stage (the order of the documents, and of the elements in orders_ids, may vary):
{ "_id" : "chocolates", "qty" : 15, "orders_ids" : [ 2, 5, 8 ] }
{ "_id" : "oranges", "qty" : 63, "orders_ids" : [ 4, 7, 3, 2, 9, 1, 10 ] }
{ "_id" : "carrots", "qty" : 15, "orders_ids" : [ 6, 9 ] }
{ "_id" : "apples", "qty" : 35, "orders_ids" : [ 9, 8, 1, 6 ] }
{ "_id" : "pears", "qty" : 10, "orders_ids" : [ 3 ] }
- The $project stage reshapes the output document to mirror the map-reduce output, with two fields: _id and value. The $project sets:
  - value.count to the size of the orders_ids array using $size.
  - value.qty to the qty field of the input document.
  - value.avg to the average qty per order using $divide and $size.
The stage outputs the following documents:
{ "_id" : "apples", "value" : { "count" : 4, "qty" : 35, "avg" : 8.75 } }
{ "_id" : "pears", "value" : { "count" : 1, "qty" : 10, "avg" : 10 } }
{ "_id" : "chocolates", "value" : { "count" : 3, "qty" : 15, "avg" : 5 } }
{ "_id" : "oranges", "value" : { "count" : 7, "qty" : 63, "avg" : 9 } }
{ "_id" : "carrots", "value" : { "count" : 2, "qty" : 15, "avg" : 7.5 } }
- Finally, the $merge stage writes the output to the collection agg_alternative_3. If an existing document has the same key _id as the new result, the operation overwrites the existing document. If there is no existing document with the same key, the operation inserts the document.
Query the agg_alternative_3 collection to verify the results:
db.agg_alternative_3.find().sort( { _id: 1 } )
The operation returns the following documents:
{ "_id" : "apples", "value" : { "count" : 4, "qty" : 35, "avg" : 8.75 } }
{ "_id" : "carrots", "value" : { "count" : 2, "qty" : 15, "avg" : 7.5 } }
{ "_id" : "chocolates", "value" : { "count" : 3, "qty" : 15, "avg" : 5 } }
{ "_id" : "oranges", "value" : { "count" : 7, "qty" : 63, "avg" : 9 } }
{ "_id" : "pears", "value" : { "count" : 1, "qty" : 10, "avg" : 10 } }
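The pipeline stages can also be traced in plain JavaScript. In the following sketch, runPipeline is a hypothetical helper that models $match, $unwind, $group (with $sum and $addToSet), and $project in memory; it is not MongoDB's aggregation engine, and the final $merge stage is not modeled. Note one design difference from the map-reduce version: here count is the number of distinct order _id values per sku, whereas mapFunction2 emits a count of 1 per item entry. The two agree on this data because no order lists the same sku twice.

```javascript
// The sample orders, reduced to the fields the pipeline reads.
const orders = [
  { _id: 1, ord_date: new Date("2020-03-01"), items: [{ sku: "oranges", qty: 5 }, { sku: "apples", qty: 5 }] },
  { _id: 2, ord_date: new Date("2020-03-08"), items: [{ sku: "oranges", qty: 8 }, { sku: "chocolates", qty: 5 }] },
  { _id: 3, ord_date: new Date("2020-03-08"), items: [{ sku: "oranges", qty: 10 }, { sku: "pears", qty: 10 }] },
  { _id: 4, ord_date: new Date("2020-03-18"), items: [{ sku: "oranges", qty: 10 }] },
  { _id: 5, ord_date: new Date("2020-03-19"), items: [{ sku: "chocolates", qty: 5 }] },
  { _id: 6, ord_date: new Date("2020-03-19"), items: [{ sku: "carrots", qty: 10 }, { sku: "apples", qty: 10 }] },
  { _id: 7, ord_date: new Date("2020-03-20"), items: [{ sku: "oranges", qty: 10 }] },
  { _id: 8, ord_date: new Date("2020-03-20"), items: [{ sku: "chocolates", qty: 5 }, { sku: "apples", qty: 10 }] },
  { _id: 9, ord_date: new Date("2020-03-20"), items: [{ sku: "carrots", qty: 5 }, { sku: "apples", qty: 10 }, { sku: "oranges", qty: 10 }] },
  { _id: 10, ord_date: new Date("2020-03-23"), items: [{ sku: "oranges", qty: 10 }] },
];

// Hypothetical stage-by-stage model of the pipeline above (the final $merge is not modeled).
function runPipeline(docs, cutoff) {
  // $match: keep orders whose ord_date is on or after the cutoff.
  const matched = docs.filter((doc) => doc.ord_date >= cutoff);
  // $unwind: one copy of the order per element of its items array.
  const unwound = matched.flatMap((doc) => doc.items.map((item) => ({ ...doc, items: item })));
  // $group: sum qty per sku and collect distinct order _ids, like $sum and $addToSet.
  const groups = new Map();
  for (const doc of unwound) {
    const sku = doc.items.sku;
    if (!groups.has(sku)) groups.set(sku, { qty: 0, orders_ids: new Set() });
    const group = groups.get(sku);
    group.qty += doc.items.qty;
    group.orders_ids.add(doc._id);
  }
  // $project: reshape to { _id, value: { count, qty, avg } } using the set size.
  return [...groups].map(([sku, { qty, orders_ids }]) => ({
    _id: sku,
    value: { count: orders_ids.size, qty: qty, avg: qty / orders_ids.size },
  }));
}

// Sort by _id to mirror db.agg_alternative_3.find().sort( { _id: 1 } ).
const aggAlternative3 = runPipeline(orders, new Date("2020-03-01"))
  .sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
console.log(JSON.stringify(aggAlternative3));
```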
See also:
For an alternative that uses custom aggregation expressions, see Map-Reduce to Aggregation Pipeline Translation Examples.