Unpack Arrays & Group Differently展开数组并以不同方式分组
Minimum MongoDB Version: 4.2
Scenario情形
You want to generate a retail report to list the total value and quantity of expensive products sold (valued over 15 dollars). 您希望生成一份零售报告,列出销售的昂贵产品(价值超过15美元)的总价值和数量。The source data is a list of shop orders, where each order contains the set of products purchased as part of the order.源数据是一个商店订单列表,其中每个订单都包含作为订单一部分购买的一组产品。
Sample Data Population样本数据总体
Drop any old version of the database (if it exists) and then populate a new 删除数据库的任何旧版本(如果存在),然后填充一个新orders
collection where each document contains an array of products purchased:orders
集合,其中每个文档都包含一系列购买的产品:
db = db.getSiblingDB("book-unpack-array-group-differently");
db.dropDatabase();
// Insert 4 records into the orders collection each with 1+ product items在订单集合中插入4条记录,每条记录包含1个以上的产品项目
db.orders.insertMany([
{
"order_id": 6363763262239,
"products": [
{
"prod_id": "abc12345",
"name": "Asus Laptop",
"price": NumberDecimal("431.43"),
},
{
"prod_id": "def45678",
"name": "Karcher Hose Set",
"price": NumberDecimal("22.13"),
},
],
},
{
"order_id": 1197372932325,
"products": [
{
"prod_id": "abc12345",
"name": "Asus Laptop",
"price": NumberDecimal("429.99"),
},
],
},
{
"order_id": 9812343774839,
"products": [
{
"prod_id": "pqr88223",
"name": "Morphy Richardds Food Mixer",
"price": NumberDecimal("431.43"),
},
{
"prod_id": "def45678",
"name": "Karcher Hose Set",
"price": NumberDecimal("21.78"),
},
],
},
{
"order_id": 4433997244387,
"products": [
{
"prod_id": "def45678",
"name": "Karcher Hose Set",
"price": NumberDecimal("23.43"),
},
{
"prod_id": "jkl77336",
"name": "Picky Pencil Sharpener",
"price": NumberDecimal("0.67"),
},
{
"prod_id": "xyz11228",
"name": "Russell Hobbs Chrome Kettle",
"price": NumberDecimal("15.76"),
},
],
},
]);
Aggregation Pipeline聚合管道
Define a pipeline ready to perform the aggregation:定义准备执行聚合的管道:
var pipeline = [
// Unpack each product from the each order's product as a new separate record将每个订单产品中的每个产品拆开包装,作为一个新的单独记录
{"$unwind": {
"path": "$products",
}},
// Match only products valued greater than 15.00仅匹配价值大于15.00的产品
{"$match": {
"products.price": {
"$gt": NumberDecimal("15.00"),
},
}},
// Group by product type, capturing each product's total value + quantity按产品类型分组,获取每个产品的总价值+数量
{"$group": {
"_id": "$products.prod_id",
"product": {"$first": "$products.name"},
"total_value": {"$sum": "$products.price"},
"quantity": {"$sum": 1},
}},
// Set product id to be the value of the field that was grouped on将产品id设置为分组所在字段的值
{"$set": {
"product_id": "$_id",
}},
// Omit unwanted fields
{"$unset": [
"_id",
]},
];
Execution执行
Execute the aggregation using the defined pipeline and also view its explain plan:使用定义的管道执行聚合,并查看其解释计划:
db.orders.aggregate(pipeline);
db.orders.explain("executionStats").aggregate(pipeline);
Expected Results预期结果
Four documents should be returned, representing only the four expensive products that were referenced multiple times in the customer orders, each showing the product's total order value and amount sold as shown below:应返回四份文件,仅代表客户订单中多次引用的四种昂贵产品,每份文件均显示产品的订单总价值和销售金额,如下所示:
[
{
product_id: 'pqr88223',
product: 'Morphy Richardds Food Mixer',
total_value: NumberDecimal('431.43'),
quantity: 1
},
{
product_id: 'abc12345',
product: 'Asus Laptop',
total_value: NumberDecimal('861.42'),
quantity: 2
},
{
product_id: 'def45678',
product: 'Karcher Hose Set',
total_value: NumberDecimal('67.34'),
quantity: 3
},
{
product_id: 'xyz11228',
product: 'Russell Hobbs Chrome Kettle',
total_value: NumberDecimal('15.76'),
quantity: 1
}
]
Note, the order of fields shown for each document may vary.请注意,每个文档显示的字段顺序可能会有所不同。
Observations观察
-
Unwinding Arrays.展开数组。The$unwind
stage is a powerful concept, although often unfamiliar to many developers initially.$unwind
阶段是一个强大的概念,尽管许多开发人员最初并不熟悉。Distilled down, it does one simple thing: it generates a new record for each element in an array field of every input document.精简后,它做了一件简单的事情:它为每个输入文档的数组字段中的每个元素生成一个新记录。If a source collection has 3 documents and each document contains an array of 4 elements, then performing an如果一个源集合有3个文档,并且每个文档都包含一个由4个元素组成的数组,那么对每个记录的数组字段执行$unwind
on each record's array field produces 12 records (3 x 4).$unwind
将生成12条记录(3 x 4)。 -
Introducing A Partial Match.引入部分匹配。The current example pipeline scans all documents in the collection and then filters out unpacked products where当前示例管道扫描集合中的所有文档,然后筛选出price > 15.00
.price > 15.00
的未包装产品。If the pipeline executed this filter as the first stage, it would incorrectly produce some result product records with a value of 15 dollars or less.如果管道在第一阶段执行这个筛选器,它将错误地生成一些值为15美元或更低的结果产品记录。This would be the case for an order composed of both inexpensive and expensive products.对于既有便宜产品又有昂贵产品的订单来说,情况就是这样。However, you can still improve the pipeline by including an additional "partial match" filter at the start of the pipeline for products valued at over 15 dollars.然而,对于价值超过15美元的产品,您仍然可以通过在管道开始时添加一个额外的“部分匹配”筛选器来改进管道。The aggregation could leverage an index (on聚合可以利用索引(在products.price
), resulting in a partial rather than full collection scan.products.price
上),从而导致部分而非完整的集合扫描。This extra filter stage is beneficial if the input data set is large and many customer orders are for inexpensive items only.如果输入数据集很大,并且许多客户订单只针对廉价商品,那么这个额外的筛选阶段是有益的。This approach is described in the chapter Pipeline Performance Considerations.管道性能注意事项一章中介绍了这种方法。