$merge (aggregation)
On this page
Definition定义
This page describes the 此页面描述$merge stage, which outputs the aggregation pipeline results to a collection. $merge阶段,它将聚合管道结果输出到集合。For the 有关将文档合并为单个文档的$mergeObjects operator, which merges documents into a single document, see $mergeObjects.$mergeObjects运算符,请参阅$mergeObjects。
$merge-
Writes the results of the aggregation pipeline to a specified collection. The将聚合管道的结果写入指定的集合。$mergeoperator must be the last stage in the pipeline.$merge运算符必须是管道中的最后一个阶段。The$mergestage:$merge阶段:-
Can output to a collection in the same or different database.可以输出到相同或不同数据库中的集合。 -
Starting in MongoDB 4.4:从MongoDB 4.4开始:-
$mergecan output to the same collection that is being aggregated.可以输出到正在聚合的同一集合。For more information, see Output to the Same Collection that is Being Aggregated.有关详细信息,请参阅输出到正在聚合的同一集合。 -
Pipelines with the如果集群中的所有节点都将$mergestage can run on replica set secondary nodes if all the nodes in cluster have featureCompatibilityVersion set to4.4or higher and the Read Preference allows secondary reads.featureCompatibilityVersion设置为4.4或更高版本,并且读取首选项允许辅助读取,则具有$merge阶段的管道可以在副本集辅助节点上运行。-
Read operations of the$mergestatement are sent to secondary nodes, while the write operations occur only on the primary node.$merge语句的读取操作被发送到辅助节点,而写入操作仅发生在主节点上。 -
Not all driver versions support targeting of并非所有驱动程序版本都支持将$mergeoperations to replica set secondary nodes.$merge操作定向到副本集辅助节点。Check your driver documentation to see when your driver added support for查看驱动程序文档,了解驱动程序何时添加了对在辅助节点上运行的$mergeread operations running on secondary nodes.$merge-read操作的支持。
-
-
-
Creates a new collection if the output collection does not already exist.如果输出集合不存在,则创建新集合。 -
Can incorporate results (insert new documents, merge documents, replace documents, keep existing documents, fail the operation, process documents with a custom update pipeline) into an existing collection.可以将结果(插入新文档、合并文档、替换文档、保留现有文档、操作失败、使用自定义更新管道处理文档)合并到现有集合中。 -
Can output to a sharded collection. Input collection can also be sharded.可以输出到分片集合。输入集合也可以进行分片化。
For a comparison with the有关与$outstage which also outputs the aggregation results to a collection, see$mergeand$outComparison.$out阶段的比较,该阶段还将聚合结果输出到集合,请参阅$merge和$out的比较。 -
On-Demand Materialized Views按需物化视图
$merge can incorporate the pipeline results into an existing output collection rather than perform a full replacement of the collection. 可以将管道结果合并到现有的输出集合中,而不是执行集合的完全替换。This functionality allows users to create on-demand materialized views, where the content of the output collection is incrementally updated when the pipeline is run.此功能允许用户创建按需物化视图,在管道运行时,输出集合的内容将在其中进行增量更新。
For more information on this use case, see On-Demand Materialized Views as well as the examples on this page.有关此用例的更多信息,请参阅按需物化视图以及本页上的示例。
Materialized views are separate from read-only views. For information on creating read-only views, see read-only views.物化视图与只读视图是分开的。有关创建只读视图的信息,请参阅只读视图。
Syntax语法
$merge has the following syntax:具有以下语法:
{ $merge: {
into: <collection> -or- { db: <db>, coll: <collection> },
on: <identifier field> -or- [ <identifier field1>, ...], // Optional
let: <variables>, // Optional
whenMatched: <replace|keepExisting|merge|fail|pipeline>, // Optional
whenNotMatched: <insert|discard|fail> // Optional
} }
For example:例如:
{ $merge: { into: "myOutput", on: "_id", whenMatched: "replace", whenNotMatched: "insert" } }
If using all default options for 如果使用$merge, including writing to a collection in the same database, you can use the simplified form:$merge的所有默认选项,包括写入同一数据库中的集合,则可以使用简化形式:
{ $merge: <collection> } // Output collection is in the same database
The $merge takes a document with the following fields:$merge获取具有以下字段的文档:
into |
Note
| ||||||||||
on |
$mergeon标识符字段相对应的键。on fields as its keys. on字段作为其键。
on的默认值取决于输出集合:
| ||||||||||
whenMatched | $merge if a result document and an existing document in the collection have the same value for the specified on field(s).on字段具有相同的值,则$merge的行为。
| ||||||||||
let | whenMatched管道中使用的变量。{ <variable_name_1>: <expression_1>,
Note
| ||||||||||
whenNotMatched | $merge if a result document does not match an existing document in the out collection.out集合中的现有文档不匹配,$merge的行为。
|
Considerations注意事项
_id Field Generation字段生成
If the 如果聚合管道结果中的文档中不存在_id field is not present in a document from the aggregation pipeline results, the $merge stage generates it automatically._id字段,$merge阶段会自动生成该字段。
For example, in the following aggregation pipeline, 例如,在下面的聚合管道中,$project excludes the _id field from the documents passed into $merge. $project从传递到$merge的文档中排除了_id字段。When 当$merge writes these documents to the "newCollection", $merge generates a new _id field and value.$merge将这些文档写入"newCollection"时,$merge会生成一个新的_id字段和值。
db.sales.aggregate( [
{ $project: { _id: 0 } },
{ $merge : { into : "newCollection" } }
] )
Create a New Collection if Output Collection is Non-Existent如果输出集合不存在,则创建新集合
The 如果指定的输出集合不存在,$merge operation creates a new collection if the specified output collection does not exist.$merge操作将创建一个新集合。
-
The output collection is created when输出集合是在$mergewrites the first document into the collection and is immediately visible.$merge将第一个文档写入集合时创建的,并且立即可见。 -
If the aggregation fails, any writes completed by the如果聚合失败,$mergebefore the error will not be rolled back.$merge在出现错误之前完成的任何写入操作都不会回滚。
For a replica set or a standalone, if the output database does not exist, 对于副本集或独立数据库,如果输出数据库不存在,$merge also creates the database.$merge还会创建该数据库。
For a sharded cluster, the specified output database must already exist.对于分片集群,指定的输出数据库必须已经存在。
If the output collection does not exist, 如果输出集合不存在,$merge requires the on identifier to be the _id field. $merge要求on标识符为_id字段。To use a different 若要对不存在的集合使用不同的字段值,可以首先通过在所需字段上创建唯一索引来创建集合。on field value for a collection that does not exist, you can create the collection first by creating a unique index on the desired field(s) first. For example, if the output collection 例如,如果输出集合newDailySales201905 does not exist and you want to specify the salesDate field as the on identifier:newDailySales201905不存在,并且您希望将salesDate字段指定为on标识符:
db.newDailySales201905.createIndex( { salesDate: 1 }, { unique: true } )
db.sales.aggregate( [
{ $match: { date: { $gte: new Date("2019-05-01"), $lt: new Date("2019-06-01") } } },
{ $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$date" } }, totalqty: { $sum: "$quantity" } } },
{ $project: { _id: 0, salesDate: { $toDate: "$_id" }, totalqty: 1 } },
{ $merge : { into : "newDailySales201905", on: "salesDate" } }
] )
Output to a Sharded Collection输出到分片集合
The $merge stage can output to a sharded collection. $merge阶段可以输出到分片集合。When the output collection is sharded, 当输出集合被分片时,$merge uses the _id field and all the shard key fields as the default on identifier. $merge使用_id字段和所有分片键字段作为默认的on标识符。If you override the default, the on identifier must include all the shard key fields:如果覆盖默认值,on标识符必须包括所有shard密钥字段:
{ $merge: {
into: "<shardedColl>" or { db:"<sharding enabled db>", coll: "<shardedColl>" },
on: [ "<shardkeyfield1>", "<shardkeyfield2>",... ], // Shard key fields and any additional fields
let: <variables>, // Optional
whenMatched: <replace|keepExisting|merge|fail|pipeline>, // Optional
whenNotMatched: <insert|discard|fail> // Optional
} }
For example, use the 例如,使用sh.shardCollection() method to create a new sharded collection newrestaurants with the postcode field as the shard key.sh.shardCollection()方法创建一个新的分片集合newrestaurants,并将postcode字段作为分片键。
sh.shardCollection(
"exampledb.newrestaurants", // Namespace of the collection to shard
{ postcode: 1 }, // Shard key
);
The newrestaurants collection will contain documents with information on new restaurant openings by month (date field) and postcode (shard key); specifically, the on identifier is ["date", "postcode"] (the ordering of the fields does not matter). newrestaurants集合将包含按月份(date字段)和邮政编码(分片键)列出的新餐厅开业信息的文档;具体来说,on标识符是["date", "postcode"](字段的顺序无关紧要)。Because 由于$merge requires a unique index with keys that correspond to the on identifier fields, create the unique index (the ordering of the fields do not matter): $merge需要一个唯一索引,其中包含与on标识符字段相对应的键,因此请创建唯一索引(字段的顺序无关紧要):[1]
use exampledb
db.newrestaurants.createIndex( { postcode: 1, date: 1 }, { unique: true } )
With the sharded collection 有了分片的集合restaurants and the unique index created, you can use $merge to output the aggregation results to this collection, matching on [ "date", "postcode" ] as in this example:restaurants和创建的唯一索引,您可以使用$merge将聚合结果输出到此集合,在[ "date", "postcode" ]上匹配,如本例所示:
use exampledb
db.openings.aggregate([
{ $group: {
_id: { date: { $dateToString: { format: "%Y-%m", date: "$date" } }, postcode: "$postcode" },
restaurants: { $push: "$restaurantName" } } },
{ $project: { _id: 0, postcode: "$_id.postcode", date: "$_id.date", restaurants: 1 } },
{ $merge: { into: "newrestaurants", "on": [ "date", "postcode" ], whenMatched: "replace", whenNotMatched: "insert" } }
])
| [1] | sh.shardCollection() method can also create a unique index on the shard key when passed the { unique: true } option if: the shard key is range-based, the collection is empty, and a unique index on the shard key doesn't already exist.{ unique: true }选项时,sh.shardCollection()方法还可以在分片键上创建一个唯一索引,如果:分片键是基于范围的,集合是空的,并且分片键上还不存在唯一索引。on identifier is the shard key and another field, a separate operation to create the corresponding index is required.on标识符是分片键和另一个字段,因此需要单独的操作来创建相应的索引。 |
Replace Documents ($merge) vs Replace Collection ($out)替换文档($merge)与替换集合($out)
$merge) vs Replace Collection ($out)$merge can replace an existing document in the output collection if the aggregation results contain a document or documents that match based on the on specification. 如果聚合结果包含基于on规范匹配的一个或多个文档,则可以替换输出集合中的现有文档。As such, 因此,如果聚合结果包括集合中所有现有文档的匹配文档,并且您为$merge can replace all documents in the existing collection if the aggregation results include matching documents for all existing documents in the collection and you specify "replace" for whenMatched.whenMatched指定了"replace",$merge可以替换现有集合中的所有文档。
However, to replace an existing collection regardless of the aggregation results, use 但是,要替换现有集合而不管聚合结果如何,请改用$out instead.$out。
Existing Documents and _id and Shard Key Values现有文档和_id以及分片键值
_id and Shard Key ValuesThe 如果$merge errors if the $merge results in a change to an existing document's _id value.$merge导致现有文档的_id值发生更改,则$merge将出错。
To avoid this error, if the on field does not include the 为了避免此错误,如果_id field, remove the _id field in the aggregation results to avoid the error, such as with a preceding $unset stage, and so on.on字段不包括_id字段,请在聚合结果中删除_id字段以避免错误,例如使用前面的$unset阶段,依此类推。
Additionally, for a sharded collection, 此外,对于分片集合,如果$merge also generates an error if it results in a change to the shard key value of an exising document.$merge导致现有文档的分片键值发生更改,那么它也会生成错误。
Any writes completed by the $merge before the error will not be rolled back.$merge在出现错误之前完成的任何写入操作都不会回滚。
Unique Index Constraints唯一索引约束
If the unique index used by 如果$merge for on field(s) is dropped mid-aggregation, there is no guarantee that the aggregation will be killed. $merge用于on字段的唯一索引在聚合过程中被丢弃,则不能保证聚合会被终止。If the aggregation continues, there is no guarantee that documents do not have duplicate 如果继续聚合,则无法保证文档的on field values.on字段值不重复。
If the 如果$merge attempts to write a document that violates any unique index on the output collection, the operation generates an error. $merge试图写入违反输出集合上任何唯一索引的文档,则该操作将生成错误。For example:例如:
-
Insert a non-matching document插入不匹配的文档that violates a unique index other than the index on the on field(s).违反了除on字段上的索引之外的唯一索引。 -
Fail失败if there is a matching document in the collection. Specifically, the operation attempts to insert the matching document which violates the unique index on the on field(s).如果集合中有匹配的文档。具体而言,该操作试图插入匹配的文档,这违反了on字段的唯一索引。 -
Replace an existing document替换现有文档with a new document that violates a unique index other than the index on the on field(s).新文档违反了除on字段上的索引之外的唯一索引。 -
Merge the matching documents合并匹配的文档that results in a document that violates a unique index other than the index on the on field(s).这会导致文档违反on字段的索引之外的唯一索引。
whenMatched Pipeline Behavior管道行为
Starting in MongoDB 4.2.2, if all of the following are true for a 从MongoDB 4.2.2开始,如果$merge stage:$merge阶段以下所有条件都成立:
-
The value of whenMatched is an aggregation pipeline,whenMatched的值是聚合管道, -
The value of whenNotMatched isinsert, andwhenNotMatched的值为insert,并且 -
There is no match for a document in the output collection,在输出集合中没有文档的匹配项,
$merge inserts the document directly into the output collection.将文档直接插入到输出集合中。
Prior to MongoDB 4.2.2, when these conditions for a 在MongoDB 4.2.2之前,当$merge stage are met, the pipeline specified in the whenMatched field is executed with an empty input document. $merge阶段的这些条件得到满足时,在whenMatched字段中指定的管道将使用空输入文档执行。The resulting document from the pipeline is inserted into the output collection.来自管道的结果文档被插入到输出集合中。
See also: 另请参阅:
$merge and $out Comparison$merge和$out的比较
$merge and $out ComparisonWith the introduction of 随着$merge, MongoDB provides two stages, $merge and $out, for writing the results of the aggregation pipeline to a collection:$merge的引入,MongoDB提供了$merge和$out两个阶段,用于将聚合管道的结果写入集合:
$merge | $out |
|---|---|
$merge) vs Replace Collection ($out).$merge)与替换集合($out)。 | |
|
|
Output to the Same Collection that is Being Aggregated输出到正在聚合的同一集合
When 当$merge outputs to the same collection that is being aggregated, documents may get updated multiple times or the operation may result in an infinite loop. $merge输出到正在聚合的同一集合时,文档可能会被多次更新,或者该操作可能导致无限循环。This behavior occurs when the update performed by 当$merge changes the physical location of documents stored on disk.$merge执行的更新更改了存储在磁盘上的文档的物理位置时,就会发生这种行为。When the physical location of a document changes, 当文档的物理位置发生更改时,$merge may view it as an entirely new document, resulting in additional updates. $merge可能会将其视为一个全新的文档,从而产生额外的更新。For more information on this behavior, see Halloween Problem.有关此行为的详细信息,请参阅万圣节问题。
Starting in MongoDB 4.4, 从MongoDB 4.4开始,$merge can output to the same collection that is being aggregated. $merge可以输出到正在聚合的同一集合。You can also output to a collection which appears in other stages of the pipeline, such as 您还可以输出到出现在管道的其他阶段的集合,例如$lookup.$lookup。
Versions of MongoDB prior to 4.4 did not allow 4.4之前的MongoDB版本不允许$merge to output to the same collection as the collection being aggregated.$merge输出到与正在聚合的集合相同的集合。
Restrictions限制
$merge inside a transaction.$merge。 | |
$merge to output to a time series collection.$merge输出到时间序列集合。 | |
$merge stage. $merge阶段。$facet stage), this $merge stage restriction applies to the nested pipelines as well.$facet阶段),则此$merge阶段限制也适用于嵌套管道。 | |
$lookup | $lookup stage's nested pipeline cannot include the $merge stage.$lookup阶段的嵌套管道不能包括$merge阶段。 |
$facet | $facet stage's nested pipeline cannot include the $merge stage.$facet阶段的嵌套管道不能包括$merge阶段。 |
$unionWith | $unionWith stage's nested pipeline cannot include the $merge stage.$unionWith阶段的嵌套管道不能包括$merge阶段。 |
"linearizable" | $merge stage cannot be used in conjunction with read concern "linearizable". $merge阶段不能与读取关注点"linearizable"一起使用。"linearizable" read concern for db.collection.aggregate(), you cannot include the $merge stage in the pipeline. db.collection.aggregate()指定"linearizable"读取关注点,则不能在管道中包含$merge阶段。 |
Examples实例
On-Demand Materialized View: Initial Creation按需物化视图:初步创建
If the output collection does not exist, the 如果输出集合不存在,$merge creates the collection.$merge将创建该集合。
For example, a collection named 例如,salaries in the zoo database is populated with the employee salary and department history:zoo数据库中名为salaries的集合中填充了员工工资和部门历史记录:
db.getSiblingDB("zoo").salaries.insertMany([
{ "_id" : 1, employee: "Ant", dept: "A", salary: 100000, fiscal_year: 2017 },
{ "_id" : 2, employee: "Bee", dept: "A", salary: 120000, fiscal_year: 2017 },
{ "_id" : 3, employee: "Cat", dept: "Z", salary: 115000, fiscal_year: 2017 },
{ "_id" : 4, employee: "Ant", dept: "A", salary: 115000, fiscal_year: 2018 },
{ "_id" : 5, employee: "Bee", dept: "Z", salary: 145000, fiscal_year: 2018 },
{ "_id" : 6, employee: "Cat", dept: "Z", salary: 135000, fiscal_year: 2018 },
{ "_id" : 7, employee: "Gecko", dept: "A", salary: 100000, fiscal_year: 2018 },
{ "_id" : 8, employee: "Ant", dept: "A", salary: 125000, fiscal_year: 2019 },
{ "_id" : 9, employee: "Bee", dept: "Z", salary: 160000, fiscal_year: 2019 },
{ "_id" : 10, employee: "Cat", dept: "Z", salary: 150000, fiscal_year: 2019 }
])
You can use the 您可以使用$group and $merge stages to initially create a collection named budgets (in the reporting database) from the data currently in the salaries collection:$group和$merge阶段,从当前薪资集合中的数据开始创建一个名为预算的集合(在报告数据库中):
For a replica set or a standalone deployment, if the output database does not exist, 对于副本集或独立部署,如果输出数据库不存在,$merge also creates the database.$merge还会创建该数据库。
For a sharded cluster deployment, the specified output database must already exist.对于分片集群部署,指定的输出数据库必须已经存在。
db.getSiblingDB("zoo").salaries.aggregate( [
{ $group: { _id: { fiscal_year: "$fiscal_year", dept: "$dept" }, salaries: { $sum: "$salary" } } },
{ $merge : { into: { db: "reporting", coll: "budgets" }, on: "_id", whenMatched: "replace", whenNotMatched: "insert" } }
] )
-
$groupstage to group the salaries by thefiscal_yearanddept.$group阶段,按fiscal_year(财年)和dept(部门)对薪资进行分组。 -
$mergestage writes the output of the preceding阶段将前一个$groupstage to thebudgetscollection in thereportingdatabase.$group阶段的输出写入reporting数据库中的budgets集合。
To view the documents in the new 要查看新budgets collection:budgets集合中的文档,请执行以下操作:
db.getSiblingDB("reporting").budgets.find().sort( { _id: 1 } )
The budgets collection contains the following documents:budgets集合包含以下文档:
{ "_id" : { "fiscal_year" : 2017, "dept" : "A" }, "salaries" : 220000 }
{ "_id" : { "fiscal_year" : 2017, "dept" : "Z" }, "salaries" : 115000 }
{ "_id" : { "fiscal_year" : 2018, "dept" : "A" }, "salaries" : 215000 }
{ "_id" : { "fiscal_year" : 2018, "dept" : "Z" }, "salaries" : 280000 }
{ "_id" : { "fiscal_year" : 2019, "dept" : "A" }, "salaries" : 125000 }
{ "_id" : { "fiscal_year" : 2019, "dept" : "Z" }, "salaries" : 310000 }
See also: 另请参阅:
On-Demand Materialized View: Update/Replace Data按需物化视图:更新/替换数据
The following example uses the collections in the previous example.以下示例使用上一示例中的集合。
The example 示例salaries collection contains the employee salary and department history:salaries集合包含员工薪资和部门历史记录:
{ "_id" : 1, employee: "Ant", dept: "A", salary: 100000, fiscal_year: 2017 },
{ "_id" : 2, employee: "Bee", dept: "A", salary: 120000, fiscal_year: 2017 },
{ "_id" : 3, employee: "Cat", dept: "Z", salary: 115000, fiscal_year: 2017 },
{ "_id" : 4, employee: "Ant", dept: "A", salary: 115000, fiscal_year: 2018 },
{ "_id" : 5, employee: "Bee", dept: "Z", salary: 145000, fiscal_year: 2018 },
{ "_id" : 6, employee: "Cat", dept: "Z", salary: 135000, fiscal_year: 2018 },
{ "_id" : 7, employee: "Gecko", dept: "A", salary: 100000, fiscal_year: 2018 },
{ "_id" : 8, employee: "Ant", dept: "A", salary: 125000, fiscal_year: 2019 },
{ "_id" : 9, employee: "Bee", dept: "Z", salary: 160000, fiscal_year: 2019 },
{ "_id" : 10, employee: "Cat", dept: "Z", salary: 150000, fiscal_year: 2019 }
The example 示例budgets collection contains the cumulative yearly budgets:budgets集合包含累计年度预算:
{ "_id" : { "fiscal_year" : 2017, "dept" : "A" }, "salaries" : 220000 }
{ "_id" : { "fiscal_year" : 2017, "dept" : "Z" }, "salaries" : 115000 }
{ "_id" : { "fiscal_year" : 2018, "dept" : "A" }, "salaries" : 215000 }
{ "_id" : { "fiscal_year" : 2018, "dept" : "Z" }, "salaries" : 280000 }
{ "_id" : { "fiscal_year" : 2019, "dept" : "A" }, "salaries" : 125000 }
{ "_id" : { "fiscal_year" : 2019, "dept" : "Z" }, "salaries" : 310000 }
During the current fiscal year (在当前财政年度(本例中为2019 in this example), new employees are added to the salaries collection and new head counts are pre-allocated for the next year:2019年),新员工将被添加到salaries集合中,并为下一年度预先分配新的人数:
db.getSiblingDB("zoo").salaries.insertMany([
{ "_id" : 11, employee: "Wren", dept: "Z", salary: 100000, fiscal_year: 2019 },
{ "_id" : 12, employee: "Zebra", dept: "A", salary: 150000, fiscal_year: 2019 },
{ "_id" : 13, employee: "headcount1", dept: "Z", salary: 120000, fiscal_year: 2020 },
{ "_id" : 14, employee: "headcount2", dept: "Z", salary: 120000, fiscal_year: 2020 }
])
To update the 要更新budgets collection to reflect the new salary information, the following aggregation pipeline uses:budgets集合以反映新的薪资信息,请使用以下聚合管道:
-
$matchstage to find all documents with阶段查找fiscal_yeargreater than or equal to2019.fiscal_year大于或等于2019的所有文档。 -
$groupstage to group the salaries by thefiscal_yearanddept.$group阶段,按财政年度和部门对薪资进行分组。 -
$mergeto write the result set to thebudgetscollection, replacing documents with the same_idvalue (in this example, a document with the fiscal year and dept).$merge将结果集写入budgets集合,替换具有相同_id值的文档(在本例中,是具有fiscal_year和dept的文档)。For documents that do not have matches in the collection,对于集合中没有匹配项的文档,$mergeinserts the new documents.$merge将插入新文档。
db.getSiblingDB("zoo").salaries.aggregate( [
{ $match : { fiscal_year: { $gte : 2019 } } },
{ $group: { _id: { fiscal_year: "$fiscal_year", dept: "$dept" }, salaries: { $sum: "$salary" } } },
{ $merge : { into: { db: "reporting", coll: "budgets" }, on: "_id", whenMatched: "replace", whenNotMatched: "insert" } }
] )
After the aggregation is run, view the documents in the 运行聚合后,查看budgets collection:budgets集合中的文档:
db.getSiblingDB("reporting").budgets.find().sort( { _id: 1 } )
The budgets collection incorporates the new salary data for fiscal year 2019 and adds new documents for fiscal year 2020:budgets集合包含了2019财年的新薪资数据,并添加了2020财年的新文档:
{ "_id" : { "fiscal_year" : 2017, "dept" : "A" }, "salaries" : 220000 }
{ "_id" : { "fiscal_year" : 2017, "dept" : "Z" }, "salaries" : 115000 }
{ "_id" : { "fiscal_year" : 2018, "dept" : "A" }, "salaries" : 215000 }
{ "_id" : { "fiscal_year" : 2018, "dept" : "Z" }, "salaries" : 280000 }
{ "_id" : { "fiscal_year" : 2019, "dept" : "A" }, "salaries" : 275000 }
{ "_id" : { "fiscal_year" : 2019, "dept" : "Z" }, "salaries" : 410000 }
{ "_id" : { "fiscal_year" : 2020, "dept" : "Z" }, "salaries" : 240000 }
See also: 另请参阅:
Only Insert New Data仅插入新数据
To ensure that the 若要确保$merge does not overwrite existing data in the collection, set whenMatched to keepExisting or fail.$merge不会覆盖集合中的现有数据,请将whenMatched设置为keepExisting或fail。
The example salaries collection in the zoo database contains the employee salary and department history:zoo数据库中的示例salaries集合包含员工工资和部门历史记录:
{ "_id" : 1, employee: "Ant", dept: "A", salary: 100000, fiscal_year: 2017 },
{ "_id" : 2, employee: "Bee", dept: "A", salary: 120000, fiscal_year: 2017 },
{ "_id" : 3, employee: "Cat", dept: "Z", salary: 115000, fiscal_year: 2017 },
{ "_id" : 4, employee: "Ant", dept: "A", salary: 115000, fiscal_year: 2018 },
{ "_id" : 5, employee: "Bee", dept: "Z", salary: 145000, fiscal_year: 2018 },
{ "_id" : 6, employee: "Cat", dept: "Z", salary: 135000, fiscal_year: 2018 },
{ "_id" : 7, employee: "Gecko", dept: "A", salary: 100000, fiscal_year: 2018 },
{ "_id" : 8, employee: "Ant", dept: "A", salary: 125000, fiscal_year: 2019 },
{ "_id" : 9, employee: "Bee", dept: "Z", salary: 160000, fiscal_year: 2019 },
{ "_id" : 10, employee: "Cat", dept: "Z", salary: 150000, fiscal_year: 2019 }
A collection orgArchive in the reporting database contains historical departmental organization records for the past fiscal years. reporting数据库中的orgArchive集合中包含过去财政年度的部门组织历史记录。Archived records should not be modified.不应修改存档的记录。
{ "_id" : ObjectId("5cd8c68261baa09e9f3622be"), "employees" : [ "Ant", "Gecko" ], "dept" : "A", "fiscal_year" : 2018 }
{ "_id" : ObjectId("5cd8c68261baa09e9f3622bf"), "employees" : [ "Ant", "Bee" ], "dept" : "A", "fiscal_year" : 2017 }
{ "_id" : ObjectId("5cd8c68261baa09e9f3622c0"), "employees" : [ "Bee", "Cat" ], "dept" : "Z", "fiscal_year" : 2018 }
{ "_id" : ObjectId("5cd8c68261baa09e9f3622c1"), "employees" : [ "Cat" ], "dept" : "Z", "fiscal_year" : 2017 }
The orgArchive collection has a unique compound index on the fiscal_year and dept fields. orgArchive集合在fiscal_year和dept字段上有一个唯一的复合索引。Specifically, there should be at most one record for the same fiscal year and department combination:具体而言,同一会计年度和部门组合最多应存在一条记录:
db.getSiblingDB("reporting").orgArchive.createIndex ( { fiscal_year: 1, dept: 1 }, { unique: true } )
At the end of current fiscal year (在本财政年度结束时(本例中为2019 in this example), the salaries collection contain the following documents:2019年),salaries集合包含以下文档:
{ "_id" : 1, "employee" : "Ant", "dept" : "A", "salary" : 100000, "fiscal_year" : 2017 }
{ "_id" : 2, "employee" : "Bee", "dept" : "A", "salary" : 120000, "fiscal_year" : 2017 }
{ "_id" : 3, "employee" : "Cat", "dept" : "Z", "salary" : 115000, "fiscal_year" : 2017 }
{ "_id" : 4, "employee" : "Ant", "dept" : "A", "salary" : 115000, "fiscal_year" : 2018 }
{ "_id" : 5, "employee" : "Bee", "dept" : "Z", "salary" : 145000, "fiscal_year" : 2018 }
{ "_id" : 6, "employee" : "Cat", "dept" : "Z", "salary" : 135000, "fiscal_year" : 2018 }
{ "_id" : 7, "employee" : "Gecko", "dept" : "A", "salary" : 100000, "fiscal_year" : 2018 }
{ "_id" : 8, "employee" : "Ant", "dept" : "A", "salary" : 125000, "fiscal_year" : 2019 }
{ "_id" : 9, "employee" : "Bee", "dept" : "Z", "salary" : 160000, "fiscal_year" : 2019 }
{ "_id" : 10, "employee" : "Cat", "dept" : "Z", "salary" : 150000, "fiscal_year" : 2019 }
{ "_id" : 11, "employee" : "Wren", "dept" : "Z", "salary" : 100000, "fiscal_year" : 2019 }
{ "_id" : 12, "employee" : "Zebra", "dept" : "A", "salary" : 150000, "fiscal_year" : 2019 }
{ "_id" : 13, "employee" : "headcount1", "dept" : "Z", "salary" : 120000, "fiscal_year" : 2020 }
{ "_id" : 14, "employee" : "headcount2", "dept" : "Z", "salary" : 120000, "fiscal_year" : 2020 }
To update the 要更新orgArchive collection to include the fiscal year 2019 that has just ended, the following aggregation pipeline uses:orgArchive集合以包括刚刚结束的2019财年,请使用以下聚合管道:
-
$matchstage to find all documents with阶段查找fiscal_yearequal to2019.fiscal_ayear等于2019的所有文档。 -
$groupstage to group the employees by the按fiscal_yearanddept.fiscal_year和dept对员工进行分组的阶段。 -
$projectstage to suppress the阶段来取消_idfield and add separatedeptandfiscal_yearfield._id字段,并添加单独的dept和fiscal_year字段。When the documents are passed to当文档传递给$merge,$mergeautomatically generates a new_idfield for the documents.$merge时,$merge会自动为文档生成一个新的_id字段。 -
$mergeto write the result set toorgArchive.The$mergestage matches documents on thedeptandfiscal_yearfields andfailswhen matched.$merge阶段匹配dept和fiscal_year字段中的文档,匹配时失败。That is, if a document already exists for the same department and fiscal year, the也就是说,如果同一部门和会计年度的文档已经存在,$mergeerrors.$merge将出错。
db.getSiblingDB("zoo").salaries.aggregate( [
{ $match: { fiscal_year: 2019 }},
{ $group: { _id: { fiscal_year: "$fiscal_year", dept: "$dept" }, employees: { $push: "$employee" } } },
{ $project: { _id: 0, dept: "$_id.dept", fiscal_year: "$_id.fiscal_year", employees: 1 } },
{ $merge : { into : { db: "reporting", coll: "orgArchive" }, on: [ "dept", "fiscal_year" ], whenMatched: "fail" } }
] )
After the operation, the 操作后,orgArchive collection contains the following documents:orgArchive集合包含以下文档:
{ "_id" : ObjectId("5caccc6a66b22dd8a8cc419f"), "employees" : [ "Ahn", "Bess" ], "dept" : "A", "fiscal_year" : 2017 }
{ "_id" : ObjectId("5caccc6a66b22dd8a8cc419e"), "employees" : [ "Ahn", "Gee" ], "dept" : "A", "fiscal_year" : 2018 }
{ "_id" : ObjectId("5caccd0b66b22dd8a8cc438e"), "employees" : [ "Ahn", "Zeb" ], "dept" : "A", "fiscal_year" : 2019 }
{ "_id" : ObjectId("5caccc6a66b22dd8a8cc41a0"), "employees" : [ "Carl" ], "dept" : "Z", "fiscal_year" : 2017 }
{ "_id" : ObjectId("5caccc6a66b22dd8a8cc41a1"), "employees" : [ "Bess", "Carl" ], "dept" : "Z", "fiscal_year" : 2018 }
{ "_id" : ObjectId("5caccd0b66b22dd8a8cc438d"), "employees" : [ "Bess", "Carl", "Wen" ], "dept" : "Z", "fiscal_year" : 2019 }
If the 如果orgArchive collection already contained a document for 2019 for department "A" and/or "B", the aggregation fails because of the duplicate key error. However, any document inserted before the error will not be rolled back.orgArchive集合已经包含部门"A"和/或"B"的2019年文档,则由于重复键错误,聚合失败。但是,在出现错误之前插入的任何文档都不会回滚。
If you specify keepExisting for the matching document, the aggregation does not affect the matching document and does not error with duplicate key error. 如果为匹配的文档指定keepExisting,则聚合不会影响匹配的文档,也不会出现重复键错误。Similarly, if you specify replace, the operation would not fail; however, the operation would replace the existing document.同样,如果指定替换,则操作不会失败;不过,该行动将取代现有文件。
Merge Results from Multiple Collections合并多个集合的结果
By default, if a document in the aggregation results matches a document in the collection, the 默认情况下,如果聚合结果中的文档与集合中的文档匹配,$merge stage merges the documents.$merge阶段将合并这些文档。
An example collection 示例集合purchaseorders is populated with the purchase order information by quarter and regions:purchaseorders按季度和地区填充采购订单信息:
db.purchaseorders.insertMany( [
{ _id: 1, quarter: "2019Q1", region: "A", qty: 200, reportDate: new Date("2019-04-01") },
{ _id: 2, quarter: "2019Q1", region: "B", qty: 300, reportDate: new Date("2019-04-01") },
{ _id: 3, quarter: "2019Q1", region: "C", qty: 700, reportDate: new Date("2019-04-01") },
{ _id: 4, quarter: "2019Q2", region: "B", qty: 300, reportDate: new Date("2019-07-01") },
{ _id: 5, quarter: "2019Q2", region: "C", qty: 1000, reportDate: new Date("2019-07-01") },
{ _id: 6, quarter: "2019Q2", region: "A", qty: 400, reportDate: new Date("2019-07-01") },
] )
Another example collection 另一个示例集合reportedsales is populated with the reported sales information by quarter and regions:reportedsales按季度和地区填充了报告的销售信息:
db.reportedsales.insertMany( [
{ _id: 1, quarter: "2019Q1", region: "A", qty: 400, reportDate: new Date("2019-04-02") },
{ _id: 2, quarter: "2019Q1", region: "B", qty: 550, reportDate: new Date("2019-04-02") },
{ _id: 3, quarter: "2019Q1", region: "C", qty: 1000, reportDate: new Date("2019-04-05") },
{ _id: 4, quarter: "2019Q2", region: "B", qty: 500, reportDate: new Date("2019-07-02") },
] )
Assume that, for reporting purposes, you want to view the data by quarter in the following format:假设出于报告目的,您希望按季度以以下格式查看数据:
{ "_id" : "2019Q1", "sales" : 1950, "purchased" : 1200 }
{ "_id" : "2019Q2", "sales" : 500, "purchased" : 1700 }
You can use the 您可以使用$merge to merge in results from the purchaseorders collection and the reportedsales collection to create a new collection quarterlyreport.$merge合并purchaseorders集合和reportedsales集合的结果,以创建新的集合quarterlyreport。
To create the 要创建quarterlyreport collection, you can use the following pipeline:quarterlyreport集合,可以使用以下管道:
db.purchaseorders.aggregate( [
{ $group: { _id: "$quarter", purchased: { $sum: "$qty" } } }, // group purchase orders by quarter
{ $merge : { into: "quarterlyreport", on: "_id", whenMatched: "merge", whenNotMatched: "insert" } }
])
First stage:第一阶段:-
The$groupstage groups by the quarter and uses$sumto add theqtyfields into a newpurchasedfield.$group阶段按季度分组,并使用$sum将qty字段添加到新的purchased字段中。For example:例如:To create the要创建quarterlyreportcollection, you can use this pipeline:quarterlyreport集合,可以使用以下管道:{ "_id" : "2019Q2", "purchased" : 1700 }
{ "_id" : "2019Q1", "purchased" : 1200 } Second stage:第二阶段:The$mergestage writes the documents to thequarterlyreportcollection in the same database.$merge阶段将文档写入同一数据库中的quarterlyreport集合。If the stage finds an existing document in the collection that matches on the如果阶段在集合中找到与_idfield, the stage merges the matching documents._id字段匹配的现有文档,则阶段将合并匹配的文档。Otherwise, the stage inserts the document. For the initial creation, no documents should match.否则,阶段将插入文档。对于初始创建,不应匹配任何文档。
To view the documents in the collection, run the following operation:若要查看集合中的文档,请运行以下操作:
db.quarterlyreport.find().sort( { _id: 1 } )
The collection contains the following documents:该集合包含以下文档:
{ "_id" : "2019Q1", "sales" : 1200, "purchased" : 1200 }
{ "_id" : "2019Q2", "sales" : 1700, "purchased" : 1700 }
Similarly, run the following aggregation pipeline against the 同样,对reportedsales collection to merge the sales results into the quarterlyreport collection.reportedsales集合运行以下聚合管道,将销售结果合并到quarterlyreport集合中。
db.reportedsales.aggregate( [
{ $group: { _id: "$quarter", sales: { $sum: "$qty" } } }, // group sales by quarter
{ $merge : { into: "quarterlyreport", on: "_id", whenMatched: "merge", whenNotMatched: "insert" } }
])
First stage:第一阶段:-
The$groupstage groups by the quarter and uses$sumto add theqtyfields into a newsalesfield.$group阶段按季度分组,并使用$sum将qty字段添加到新的sales字段中。For example:例如:{ "_id" : "2019Q2", "sales" : 500 }
{ "_id" : "2019Q1", "sales" : 1950 } Second stage:第二阶段:The$mergestage writes the documents to thequarterlyreportcollection in the same database.$merge阶段将文档写入同一数据库中的quarterlyreport集合。If the stage finds an existing document in the collection that matches on the如果阶段在集合中找到与_idfield (the quarter), the stage merges the matching documents._id字段(季度)匹配的现有文档,则阶段将合并匹配的文档。Otherwise, the stage inserts the document.否则,阶段将插入文档。
To view the documents in the 要在合并数据后查看quarterlyreport collection after the data has been merged, run the following operation:quarterlyreport集合中的文档,请运行以下操作:
db.quarterlyreport.find().sort( { _id: 1 } )
The collection contains the following documents:该集合包含以下文档:
{ "_id" : "2019Q1", "sales" : 1950, "purchased" : 1200 }
{ "_id" : "2019Q2", "sales" : 500, "purchased" : 1700 }
Use the Pipeline to Customize the Merge使用管道自定义合并
The 当文档匹配时,$merge can use a custom update pipeline when documents match. The whenMatched pipeline can have the following stages:$merge可以使用自定义更新管道。whenMatched管道可以具有以下阶段:
-
$addFieldsand its alias及其别名$set -
$replaceRootand its alias及其别名$replaceWith
An example collection 一个示例集合votes is populated with the daily vote tally. votes是用每日计票结果填充的。Create the collection with the following documents:使用以下文档创建集合:
db.votes.insertMany( [
{ date: new Date("2019-05-01"), "thumbsup" : 1, "thumbsdown" : 1 },
{ date: new Date("2019-05-02"), "thumbsup" : 3, "thumbsdown" : 1 },
{ date: new Date("2019-05-03"), "thumbsup" : 1, "thumbsdown" : 1 },
{ date: new Date("2019-05-04"), "thumbsup" : 2, "thumbsdown" : 2 },
{ date: new Date("2019-05-05"), "thumbsup" : 6, "thumbsdown" : 10 },
{ date: new Date("2019-05-06"), "thumbsup" : 13, "thumbsdown" : 16 }
] )
Another example collection 另一个示例集合monthlytotals has the up-to-date monthly vote totals. monthlytotals具有最新的月度投票总数。Create the collection with the following document:使用以下文档创建集合:
db.monthlytotals.insertOne(
{ "_id" : "2019-05", "thumbsup" : 26, "thumbsdown" : 31 }
)
At the end of each day, that day's votes is inserted into the 每天结束时,当天的选票会被插入到votes collection:votes集合中:
db.votes.insertOne(
{ date: new Date("2019-05-07"), "thumbsup" : 14, "thumbsdown" : 10 }
)
You can use 您可以将$merge with an custom pipeline to update the existing document in the collection monthlytotals:$merge与自定义管道一起使用,在monthlytotals集合中更新现有文档:
db.votes.aggregate([
{ $match: { date: { $gte: new Date("2019-05-07"), $lt: new Date("2019-05-08") } } },
{ $project: { _id: { $dateToString: { format: "%Y-%m", date: "$date" } }, thumbsup: 1, thumbsdown: 1 } },
{ $merge: {
into: "monthlytotals",
on: "_id",
whenMatched: [
{ $addFields: {
thumbsup: { $add:[ "$thumbsup", "$$new.thumbsup" ] },
thumbsdown: { $add: [ "$thumbsdown", "$$new.thumbsdown" ] }
} } ],
whenNotMatched: "insert"
} }
])
First stage:第一阶段-
The$matchstage finds the specific day's votes.$match阶段查找特定日期的投票。For example:例如:{ "_id" : ObjectId("5ce6097c436eb7e1203064a6"), "date" : ISODate("2019-05-07T00:00:00Z"), "thumbsup" : 14, "thumbsdown" : 10 } Second stage:第二阶段-
The$projectstage sets the_idfield to a year-month string.$project阶段将_id字段设置为年-月字符串。For example:例如:{ "thumbsup" : 14, "thumbsdown" : 10, "_id" : "2019-05" } Third stage:第三阶段-
The$mergestage writes the documents to themonthlytotalscollection in the same database.$merge阶段将文档写入同一数据库中的monthlytotals集合。If the stage finds an existing document in the collection that matches on the如果阶段在集合中找到与_idfield, the stage uses a pipeline to add thethumbsupvotes and thethumbsdownvotes._id字段匹配的现有文档,则阶段将使用管道添加thumbsup投票和thumbsdown投票。-
This pipeline cannot directly accesses the fields from the results document.此管道无法直接访问结果文档中的字段。To access the为了访问结果文档中的thumbsupfield and thethumbsdownfield in the results document, the pipeline uses the$$newvariable; i.e.$$new.thumbsupand$new.thumbsdown.thumbsup字段和thumbsdown字段,管道使用$$new变量;即$$new.thumbsup和$new.thumbsdown。 -
This pipeline can directly accesses the该管道可以直接访问集合中现有文档中的thumbsupfield and thethumbsdownfield in the existing document in the collection; i.e.$thumbsupand$thumbsdown.thumbsup字段和thumbsdown字段;即$thumbsup和$thumbsdown。
The resulting document replaces the existing document.生成的文档将替换现有文档。 -
To view documents in the 若要在合并操作后查看monthlytotals collection after the merge operation, run the following operation:monthlytotals集合中的文档,请运行以下操作:
db.monthlytotals.find()
The collection contains the following document:该集合包含以下文档:
{ "_id" : "2019-05", "thumbsup" : 40, "thumbsdown" : 41 }
Use Variables to Customize the Merge使用变量自定义合并
You can use variables in the 您可以在$merge stage whenMatched field. $merge阶段的whenMatched字段中使用变量。Variables must be defined before they can be used.必须先定义变量,然后才能使用它们。
Define variables in one or both of the following:在以下一项或两项中定义变量:
To use variables in whenMatched:要在whenMatched中使用变量:
Specify the double dollar sign ($$) prefix together with the variable name in the form 指定双美元符号($$<variable_name>. $$)前缀以及形式为$$<variable_name>的变量名。For example, $$year. If the variable is set to a document, you can also include a document field in the form 如果变量设置为文档,则还可以包含形式为$$<variable_name>.<field>. $$<variable_name>.<field>的文档字段。For example, 例如,$$year.month.$$year.month。
The tabs below demonstrate behavior when variables are defined in the merge stage, the aggregate command, or both.下面的选项卡演示在合并阶段、聚合命令或两者中定义变量时的行为。
Use Variables Defined in the Merge Stage使用合并阶段中定义的变量
You can define variables in the 您可以在$merge stage let and use the variables in the whenMatched field.$merge阶段let中定义变量,并在whenMatched字段中使用变量。
Example:示例:
db.cakeSales.insertOne( [
{ _id: 1, flavor: "chocolate", salesTotal: 1580,
salesTrend: "up" }
] )
db.runCommand( {
aggregate: db.cakeSales.getName(),
pipeline: [ {
$merge: {
into: db.cakeSales.getName(),
let : { year: "2020" },
whenMatched: [ {
$addFields: { "salesYear": "$$year" }
} ]
}
} ],
cursor: {}
} )
db.cakeSales.find()
The example:该示例:
-
creates a collection named创建一个名为cakeSalescakeSales的集合 -
runs an运行一个aggregatecommand that defines ayearvariable in the$mergelet and adds the year tocakeSalesusing whenMatchedaggregate命令,该命令在$mergelet中定义一个year变量,并使用whenMatched将年份添加到cakeSales -
retrieves the检索cakeSalesdocumentcakeSales文档
Output:输出:
{ "_id" : 1, "flavor" : "chocolate", "salesTotal" : 1580,
"salesTrend" : "up", "salesYear" : "2020" }
Use Variables Defined in the Aggregate Command使用聚合命令中定义的变量
New in version 5.0.
You can define variables in the 您可以在aggregate command let and use the variables in the $merge stage whenMatched field.aggregate命令let中定义变量,并在$merge阶段whenMatched字段中使用变量。
Example:示例:
db.cakeSales.insertOne(
{ _id: 1, flavor: "chocolate", salesTotal: 1580,
salesTrend: "up" }
)
db.runCommand( {
aggregate: db.cakeSales.getName(),
pipeline: [ {
$merge: {
into: db.cakeSales.getName(),
whenMatched: [ {
$addFields: { "salesYear": "$$year" } }
] }
}
],
cursor: {},
let : { year: "2020" }
} )
db.cakeSales.find()
The example:该示例:
-
creates a collection named创建一个名为cakeSalescakeSales的集合 -
runs an运行一个aggregatecommand that defines ayearvariable in theaggregatecommand let and adds the year tocakeSalesusing whenMatchedaggregate命令,该命令在year命令let中定义一个year变量,并使用whenMatched将年份添加到cakeSales -
retrieves the检索cakeSalesdocumentcakeSales文档
Output:输出:
{ "_id" : 1, "flavor" : "chocolate", "salesTotal" : 1580,
"salesTrend" : "up", "salesYear" : "2020" }
Use Variables Defined in the Merge Stage and Aggregate Command使用合并阶段和聚合命令中定义的变量
You can define variables in the 您可以在$merge stage and, starting in MongoDB 5.0, the aggregate command.$merge阶段定义变量,并从MongoDB 5.0开始定义aggregate命令。
If two variables with the same name are defined in the 如果$merge stage and the aggregate command, the $merge stage variable is used.$merge 阶段和aggregate命令中定义了两个名称相同的变量,则使用$merge阶段变量。
In this example, the 在本例中,使用year: "2020" $merge stage variable is used instead of the year: "2019" aggregate command variable:$merge阶段变量year: "2020",而不是aggregate命令变量year: "2019":
db.cakeSales.insertOne(
{ _id: 1, flavor: "chocolate", salesTotal: 1580,
salesTrend: "up" }
)
db.runCommand( {
aggregate: db.cakeSales.getName(),
pipeline: [ {
$merge: {
into: db.cakeSales.getName(),
let : { year: "2020" },
whenMatched: [ {
$addFields: { "salesYear": "$$year" }
} ]
}
} ],
cursor: {},
let : { year: "2019" }
} )
db.cakeSales.find()
Output:输出:
{
_id: 1,
flavor: 'chocolate',
salesTotal: 1580,
salesTrend: 'up',
salesYear: '2020'
}