$fill (aggregation)
On this page本页内容
Definition定义Syntax语法Behavior and Restrictions行为和限制partitionByFieldsRestrictions限制linearBehavior行为locfBehavior行为Comparison of$filland Aggregation Operators$fill和聚合运算符的比较Examples实例Fill Missing Field Values with a Constant Value用常数值填充缺少的字段值Fill Missing Field Values with Linear Interpolation使用线性插值填充缺失字段值Fill Missing Field Values Based on the Last Observed Value根据上次观测值填充缺失的字段值Fill Data for Distinct Partitions填充不同分区的数据Indicate if a Field was Populated Using指示字段是否使用$fill$fill填充
Definition定义
$fill-
New in version 5.3.5.3版新增。Populates填充文档中的空字段值和缺少的字段值。nulland missing field values within documents.You can use您可以使用$fillto populate missing data points:$fill来填充缺失的数据点:In a sequence based on surrounding values.在基于周围值的序列中。With a fixed value.具有固定值。
Syntax语法
The $fill stage has this syntax:$fill阶段具有以下语法:
{
$fill: {
partitionBy: <expression>,
partitionByFields: [ <field 1>, <field 2>, ... , <field n> ],
sortBy: {
<sort field 1>: <sort order>,
<sort field 2>: <sort order>,
...,
<sort field n>: <sort order>
},
output: {
<field 1>: { value: <expression> },
<field 2>: { method: <string> },
...
}
}
}
The $fill stage takes a document with these fields:$fill阶段接受具有以下字段的文档:
partitionBy | $fill stage, a group of documents is known as a partition.$fill阶段,一组文档被称为分区。$fill uses one partition for the entire collection.partitionBy和partitionByFields,$fill将为整个集合使用一个分区。partitionBy和partitionByFields,$fill是互斥的。 | |
partitionByFields | $fill stage, each group of documents is known as a partition.$fill阶段,每组文档被称为一个分区。$fill uses one partition for the entire collection.partitionBy和partitionByFields,$fill将为整个集合使用一个分区。partitionBy和partitionByFields是互斥的。partitionByFields Restrictions. partitionByFields限制。 | |
sortBy | output.<field>中指定了method,则为必要的。 | $sort stage. $sort阶段相同的语法。 |
output | ||
output.<field> | value or method. value或method。
|
Behavior and Restrictions行为和限制
partitionByFields Restrictions限制
如果$fill returns an error if any field name in the partitionByFields array:partitionByFields数组中有任何字段名存在以下情况,则$fill返回错误:
Evaluates to a non-string value.计算为非字符串值。Begins with以$.$开头。
linear Behavior行为
The linear fill method fills null and missing fields using linear interpolation based on the surrounding non-
null values in the sequence.linear填充方法使用基于序列中周围非null值的线性插值来填充空字段和缺失字段。
For each document where the field is对于字段为nullor missing,linearFillfills those fields in proportion to the missing value range between surrounding non-nullvalues according to the sortBy order.null或缺失的每个文档,linearFill根据sortBy顺序,按照周围非null值之间的缺失值范围的比例填充这些字段。To determine the values for missing fields,要确定缺失字段的值,linearFilluses:linearFill使用:The difference of surrounding non-周围非nullvalues.null值的差异。The number of要在周围值之间填充的空字段数。nullfields to fill between the surrounding values.
Thelinearmethod can fill multiple consecutivenullvalues if those values are preceded and followed by non-nullvalues according to the sortBy order.linear方法可以填充多个连续的null值,如果这些值前面和后面都是根据sortBy顺序的非null值。ExampleIf a collection contains these documents:如果集合包含以下文档:{ index: 0, value: 0 },
{ index: 1, value: null },
{ index: 2, value: null },
{ index: 3, value: null },
{ index: 4, value: 10 }After using the使用linearfill method to fill thenullvalues, the documents become:linear填充方法填充null值后,文档变为:{ index: 0, value: 0 },
{ index: 1, value: 2.5 },
{ index: 2, value: 5 },
{ index: 3, value: 7.5 },
{ index: 4, value: 10 }前面和后面没有非nullvalues that are not preceded and followed by non-nullvalues remainnull.null值的null值保持为null。To use the要使用linearfill method, you must also use the sortBy field to sort your data.linear填充方法,还必须使用sortBy字段对数据进行排序。
For a complete example using the 有关使用linear fill method, see Fill Missing Field Values with Linear Interpolation.linear填充方法的完整示例,请参阅使用线性插值填充缺少的字段值。
locf Behavior行为
locf stands for last observation carried forward.代表最后一次观测。
If a field being filled contains both如果要填充的字段同时包含nulland non-null values,locfsets thenulland missing values to the field's last known non-null value according to the sortBy order.null值和非null值,则locf会根据sortBy顺序将null值和缺失值设置为字段最后一个已知的非null值。To use the要使用locffill method, you must also use the sortBy field to sort your data.locf填充方法,还必须使用sortBy字段对数据进行排序。
For a complete example using the 有关使用locf fill method, see Fill Missing Field Values Based on the Last Observed Value.locf填充方法的完整示例,请参阅根据上次观测值填充缺少的字段值。
Comparison of $fill and Aggregation Operators$fill和聚合运算符的比较
$fill and Aggregation OperatorsTo fill 要填充文档中的空字段值和缺失字段值,可以使用:null and missing field values within a document you can use:
The$fillstage.$fill阶段。When you use the使用$fillstage, the field you specify in the output is the same field used as the source data.$fill阶段时,在输出中指定的字段与源数据使用的字段相同。The$linearFilland$locfaggregation operators.$linearFill和$locf聚合运算符。When you使用$linearFillor$locf, you can set values for a different field than the field used as the source data.$linearFill或$locf时,可以为不同于用作源数据的字段的字段设置值。
Examples实例
The examples in this section show how to use 本节中的示例显示了如何使用$fill to fill missing values:$fill来填充缺失的值:
With a constant value具有常量值With linear interpolation使用线性插值Based on the last observed value基于上次观测值For distinct partitions对于不同的分区
Fill Missing Field Values with a Constant Value用常数值填充缺少的字段值
A shoe store maintains a 鞋店有一个dailySales collection that contains a document summarizing each day's sales. dailySales集合,其中包含一份总结每天销售的文档。The shoe store sells these types of shoes:鞋店出售这些类型的鞋:
bootssandalssneakers
Create the following 创建以下dailySales collection:dailySales集合:
db.dailySales.insertMany( [
{
"date": ISODate("2022-02-02"),
"bootsSold": 10,
"sandalsSold": 20,
"sneakersSold": 12
},
{
"date": ISODate("2022-02-03"),
"bootsSold": 7,
"sneakersSold": 18
},
{
"date": ISODate("2022-02-04"),
"sneakersSold": 5
}
] )
Not all of the documents in the 并非dailySales collection contain each shoe type. dailySales集合中的所有文档都包含每种鞋型。If a shoe type is missing, it means there were no shoes of that type sold on the corresponding date.如果缺少一种鞋型,则意味着在相应日期没有销售该类型的鞋。
The following example uses 以下示例使用$fill to set the quantities sold to 0 for the missing shoe types for each day's sales:$fill将每天销售中缺少的鞋款的销售数量设置为0:
db.dailySales.aggregate( [
{
$fill:
{
output:
{
"bootsSold": { value: 0 },
"sandalsSold": { value: 0 },
"sneakersSold": { value: 0 }
}
}
}
] )
In the preceding pipeline:在前面的管道中:
$fillfills in values for missing fields.填写缺失字段的值。outputspecifies:指定:The names of the fields to fill in.要填写的字段的名称。The value to set the filled in fields to. In this example, the output specifies a constant value of将填充字段设置为的值。在本例中,输出指定一个常量值0.0。
Example output:示例输出:
[
{
_id: ObjectId("6202df9f394d47411658b51e"),
date: ISODate("2022-02-02T00:00:00.000Z"),
bootsSold: 10,
sandalsSold: 20,
sneakersSold: 12
},
{
_id: ObjectId("6202df9f394d47411658b51f"),
date: ISODate("2022-02-03T00:00:00.000Z"),
bootsSold: 7,
sneakersSold: 18,
sandalsSold: 0
},
{
_id: ObjectId("6202df9f394d47411658b520"),
date: ISODate("2022-02-04T00:00:00.000Z"),
sneakersSold: 5,
bootsSold: 0,
sandalsSold: 0
}
]
Fill Missing Field Values with Linear Interpolation使用线性插值填充缺失字段值
Create a 创建一个stock collection that contains tracks a single company's stock price at hourly intervals:stock集合,其中包含按小时间隔跟踪单个公司股价的信息:
db.stock.insertMany( [
{
time: ISODate("2021-03-08T09:00:00.000Z"),
price: 500
},
{
time: ISODate("2021-03-08T10:00:00.000Z"),
},
{
time: ISODate("2021-03-08T11:00:00.000Z"),
price: 515
},
{
time: ISODate("2021-03-08T12:00:00.000Z")
},
{
time: ISODate("2021-03-08T13:00:00.000Z")
},
{
time: ISODate("2021-03-08T14:00:00.000Z"),
price: 485
}
] )
The 集合中的某些文档缺少price field is missing for some of the documents in the collection.price字段。
To populate the missing 要使用线性插值price values using linear interpolation, use
$fill with the linear fill method:填充缺失的
price值,请使用$fill和linear填充方法:
db.stock.aggregate( [
{
$fill:
{
sortBy: { time: 1 },
output:
{
"price": { method: "linear" }
}
}
}
] )
In the preceding pipeline:在前面的管道中:
$fillfills in values for missing fields.填写缺失字段的值。sortBy: { time: 1 }sorts the documents by the按照timefield in ascending order, from earliest to latest.time字段从最早到最晚的升序对文档进行排序。outputspecifies:指定:priceas the field for which to fill in missing values.作为要填充缺失值的字段。{ method: "linear" }as the fill method.作为填充方法。Thelinearfill method fills missingpricevalues using linear interpolationbased on the surrounding
pricevalues in the sequence.linear填充方法使用基于序列中周围price值的线性插值来填充缺失的
price值。
Example output:示例输出:
[
{
_id: ObjectId("620ad41c394d47411658b5e9"),
time: ISODate("2021-03-08T09:00:00.000Z"),
price: 500
},
{
_id: ObjectId("620ad41c394d47411658b5ea"),
time: ISODate("2021-03-08T10:00:00.000Z"),
price: 507.5
},
{
_id: ObjectId("620ad41c394d47411658b5eb"),
time: ISODate("2021-03-08T11:00:00.000Z"),
price: 515
},
{
_id: ObjectId("620ad41c394d47411658b5ec"),
time: ISODate("2021-03-08T12:00:00.000Z"),
price: 505
},
{
_id: ObjectId("620ad41c394d47411658b5ed"),
time: ISODate("2021-03-08T13:00:00.000Z"),
price: 495
},
{
_id: ObjectId("620ad41c394d47411658b5ee"),
time: ISODate("2021-03-08T14:00:00.000Z"),
price: 485
}
]
Fill Missing Field Values Based on the Last Observed Value根据上次观测值填充缺失的字段值
Create a 创建一个restaurantReviews collection that contains review scores for a single restaurant over time:restaurantReviews集合,该集合包含单个餐厅在一段时间内的评分:
db.restaurantReviews.insertMany( [
{
date: ISODate("2021-03-08"),
score: 90
},
{
date: ISODate("2021-03-09"),
score: 92
},
{
date: ISODate("2021-03-10")
},
{
date: ISODate("2021-03-11")
},
{
date: ISODate("2021-03-12"),
score: 85
},
{
date: ISODate("2021-03-13")
}
] )
The 集合中的某些文档缺少score field is missing for some of the documents in the collection.score字段。
To populate the missing 要填充缺失的score fields and ensure that there are no gaps in the data, use $fill. score字段并确保数据中没有空白,请使用$fill。In the following example, 在以下示例中,$fill uses the locf fill method to fill the missing score values with the previous score in the sequence:$fill使用locf填充方法用序列中的前一个score填充缺失的分值:
db.restaurantReviews.aggregate( [
{
$fill:
{
sortBy: { date: 1 },
output:
{
"score": { method: "locf" }
}
}
}
] )
In the preceding pipeline:在前面的管道中:
$fillfills in missing填写缺失的scorevalues.score值。sortBy: { date: 1 }sorts the documents by the按照datefield in ascending order, from earliest to latest.date字段从最早到最晚的升序对文档进行排序。outputspecifies:指定:scoreas the field for which to fill in missing values.作为要填充缺失值的字段。{ method: "locf" }as the fill method.作为填充方法。Thelocffill method fills missingscorevalues with the last observedscorein the sequence.locf填充方法使用序列中最后一次观察到的score来填充缺失的分数值。
Example output:示例输出:
[
{
_id: ObjectId("62040bc9394d47411658b553"),
date: ISODate("2021-03-08T00:00:00.000Z"),
score: 90
},
{
_id: ObjectId("62040bc9394d47411658b554"),
date: ISODate("2021-03-09T00:00:00.000Z"),
score: 92
},
{
_id: ObjectId("62040bc9394d47411658b555"),
date: ISODate("2021-03-10T00:00:00.000Z"),
score: 92
},
{
_id: ObjectId("62040bc9394d47411658b556"),
date: ISODate("2021-03-11T00:00:00.000Z"),
score: 92
},
{
_id: ObjectId("62040bc9394d47411658b557"),
date: ISODate("2021-03-12T00:00:00.000Z"),
score: 85
},
{
_id: ObjectId("62040bc9394d47411658b558"),
date: ISODate("2021-03-13T00:00:00.000Z"),
score: 85
}
]
Fill Data for Distinct Partitions填充不同分区的数据
Consider the previous example with restaurant reviews but instead of tracking a single restaurant, the collection now contains reviews for multiple restaurants.考虑一下前面的餐厅评论示例,但该集合现在包含多家餐厅的评论,而不是跟踪单个餐厅。
Create a collection named 创建一个名为restaurantReviewsMultiple and populate the collection with these documents:restaurantReviewsMultiple的集合使用以下文档对集合进行填充:
db.restaurantReviewsMultiple.insertMany( [
{
date: ISODate("2021-03-08"),
restaurant: "Joe's Pizza",
score: 90
},
{
date: ISODate("2021-03-08"),
restaurant: "Sally's Deli",
score: 75
},
{
date: ISODate("2021-03-09"),
restaurant: "Joe's Pizza",
score: 92
},
{
date: ISODate("2021-03-09"),
restaurant: "Sally's Deli"
},
{
date: ISODate("2021-03-10"),
restaurant: "Joe's Pizza"
},
{
date: ISODate("2021-03-10"),
restaurant: "Sally's Deli",
score: 68
},
{
date: ISODate("2021-03-11"),
restaurant: "Joe's Pizza",
score: 93
},
{
date: ISODate("2021-03-11"),
restaurant: "Sally's Deli"
}
] )
The 集合中的某些文档缺少score field is missing for some of the documents in the collection.score字段。
To populate the missing 要填充缺失的score fields and ensure that there are no gaps in the data, use $fill. score字段并确保数据中没有空白,请使用$fill。In the following example, 在以下示例中,$fill uses the locf fill method to fill the missing score values with the previous score in the sequence:$fill使用locf填充方法用序列中的前一个score填充缺失的score值:
db.restaurantReviewsMultiple.aggregate( [
{
$fill:
{
sortBy: { date: 1 },
partitionBy: { "restaurant": "$restaurant" },
output:
{
"score": { method: "locf" }
}
}
}
] )
In the preceding pipeline:在前面的管道中:
$fillfills in missing填写缺失的scorevalues.score值。sortBy: { date: 1 }sorts the documents by the按照datefield in ascending order, from earliest to latest.date字段从最早到最晚的升序对文档进行排序。partitionBy: { "restaurant": "$restaurant" }partitions the data by按restaurant. There are two restaurants:Joe's PizzaandSally's Deli.restaurant划分数据。有两家餐馆:Joe's Pizza和Sally's Deli。outputspecifies:指定:scoreas the field for which to fill in missing values.作为要填充缺失值的字段。{ method: "locf" }as the fill method.作为填充方法。Thelocffill method fills missingscorevalues with the last observedscorein the sequence.locf填充方法使用序列中最后一次观察到的score来填充缺失的分数值。
Example output:示例输出:
[
{
_id: ObjectId("620559f4394d47411658b58f"),
date: ISODate("2021-03-08T00:00:00.000Z"),
restaurant: "Joe's Pizza",
score: 90
},
{
_id: ObjectId("620559f4394d47411658b591"),
date: ISODate("2021-03-09T00:00:00.000Z"),
restaurant: "Joe's Pizza",
score: 92
},
{
_id: ObjectId("620559f4394d47411658b593"),
date: ISODate("2021-03-10T00:00:00.000Z"),
restaurant: "Joe's Pizza",
score: 92
},
{
_id: ObjectId("620559f4394d47411658b595"),
date: ISODate("2021-03-11T00:00:00.000Z"),
restaurant: "Joe's Pizza",
score: 93
},
{
_id: ObjectId("620559f4394d47411658b590"),
date: ISODate("2021-03-08T00:00:00.000Z"),
restaurant: "Sally's Deli",
score: 75
},
{
_id: ObjectId("620559f4394d47411658b592"),
date: ISODate("2021-03-09T00:00:00.000Z"),
restaurant: "Sally's Deli",
score: 75
},
{
_id: ObjectId("620559f4394d47411658b594"),
date: ISODate("2021-03-10T00:00:00.000Z"),
restaurant: "Sally's Deli",
score: 68
},
{
_id: ObjectId("620559f4394d47411658b596"),
date: ISODate("2021-03-11T00:00:00.000Z"),
restaurant: "Sally's Deli",
score: 68
}
]
Indicate if a Field was Populated Using $fill指示字段是否使用$fill填充
$fillWhen you populate missing values, the output does not indicate if a value was populated with the 填充缺少的值时,输出不会指示值是用$fill operator or if the value existed in the document originally. $fill运算符填充的,还是该值最初存在于文档中。To distinguish between filled and preexisting values, you can use a 要区分填充值和预先存在的值,可以在$set stage before $fill and set a new field based on whether the value exists.$fill之前使用$set阶段,并根据值是否存在设置新字段。
For example, create a 例如,创建一个restaurantReviews collection that contains review scores for a restaurant over time:restaurantReviews集合,该集合包含一段时间内某家餐厅的评分:
db.restaurantReviews.insertMany( [
{
date: ISODate("2021-03-08"),
score: 90
},
{
date: ISODate("2021-03-09"),
score: 92
},
{
date: ISODate("2021-03-10")
},
{
date: ISODate("2021-03-11")
},
{
date: ISODate("2021-03-12"),
score: 85
},
{
date: ISODate("2021-03-13")
}
] )
The 集合中的某些文档缺少score field is missing for some of the documents in the collection. score字段。You can populate missing 您可以使用score values with the $fill operator.$fill运算符填充缺失的score值。
Create a pipeline to perform the following actions:创建一个管道以执行以下操作:
Add a new field to each document (using向每个文档添加一个新字段(使用$set) indicating if the document'sscorefield exists prior to the$filloperator populating values.$set),指示在$fill运算符填充值之前文档的score字段是否存在。This new field is called这个新字段称为valueExisted.valueExisted。Populate missing用序列中最后一次观察到的scorevalues with the last observedscorein the sequence. The fill methodlocfstands for "last observation carried forward".score填充缺失的score值。填充法locf代表“最后一次观测结转”。
The pipeline looks like this:管道如下所示:
db.restaurantReviews.aggregate( [
{
$set: {
"valueExisted": {
"$ifNull": [
{ "$toBool": { "$toString": "$score" } },
false
]
}
}
},
{
$fill: {
sortBy: { date: 1 },
output:
{
"score": { method: "locf" }
}
}
}
] )
Handling Values of Zero处理零值
In the 在$ifNull expression, the score values are converted to strings, then to booleans. $ifNull表达式中,score值会转换为字符串,然后转换为布尔值。The $toBool expression always converts strings to true. $toBool表达式总是将字符串转换为true。If the 如果未将score values are not converted to strings, score values of 0 will have valueExisted set to false.score值转换为字符串,则0的score值的valueExisted将设置为false。
Output:输出:
[
{
_id: ObjectId("63595116b1fac2ee2e957f15"),
date: ISODate("2021-03-08T00:00:00.000Z"),
score: 90,
valueExisted: true
},
{
_id: ObjectId("63595116b1fac2ee2e957f16"),
date: ISODate("2021-03-09T00:00:00.000Z"),
score: 92,
valueExisted: true
},
{
_id: ObjectId("63595116b1fac2ee2e957f17"),
date: ISODate("2021-03-10T00:00:00.000Z"),
valueExisted: false,
score: 92
},
{
_id: ObjectId("63595116b1fac2ee2e957f18"),
date: ISODate("2021-03-11T00:00:00.000Z"),
valueExisted: false,
score: 92
},
{
_id: ObjectId("63595116b1fac2ee2e957f19"),
date: ISODate("2021-03-12T00:00:00.000Z"),
score: 85,
valueExisted: true
},
{
_id: ObjectId("63595116b1fac2ee2e957f1a"),
date: ISODate("2021-03-13T00:00:00.000Z"),
valueExisted: false,
score: 85
}
]