$densify (aggregation stage)

Definition

$densify

New in version 5.1.

Creates new documents in a sequence of documents where certain values in a field are missing.

You can use $densify to:

  • Fill gaps in time series data.
  • Add missing values between groups of data.
  • Populate your data with a specified range of values.

Syntax

The $densify stage has this syntax:

{
  $densify: {
    field: <fieldName>,
    partitionByFields: [ <field 1>, <field 2> ... <field n> ],
    range: {
      step: <number>,
      unit: <time unit>,
      bounds: < "full" || "partition" > || [ < lower bound >, < upper bound > ]
    }
  }
}

The $densify stage takes a document with these fields:

Field | Necessity | Description

field | Required

The field to densify. The values of the specified field must be either all numeric values or all dates.

Documents that do not contain the specified field continue through the pipeline unmodified.

To specify a <field> in an embedded document or in an array, use dot notation.

For restrictions, see field Restrictions.

partitionByFields | Optional

The set of fields to act as the compound key to group the documents. In the $densify stage, each group of documents is known as a partition.

If you omit this field, $densify uses one partition for the entire collection.

For an example, see Densification with Partitions.

For restrictions, see partitionByFields Restrictions.

range | Required

An object that specifies how the data is densified.

range.bounds | Required

You can specify range.bounds as either:

  • An array: [ < lower bound >, < upper bound > ].
  • A string: either "full" or "partition".

If bounds is an array:

  • $densify adds documents spanning the range of values within the specified bounds.
  • The data type for the bounds must correspond to the data type in the field being densified.
  • For behavior details, see range.bounds Behavior.

If bounds is "full":

  • $densify adds documents spanning the full range of values of the field being densified.

If bounds is "partition":

  • $densify adds documents to each partition, as if you had run a full range densification on each partition individually.

range.step | Required

The amount by which to increment the field value in each document. $densify creates a new document for each step between the existing documents.

If range.unit is specified, step must be an integer. Otherwise, step can be any numeric value.

range.unit | Required if field is a date.

The unit to apply to the step field when incrementing date values in field.

You can specify one of the following values for unit as a string:

  • millisecond
  • second
  • minute
  • hour
  • day
  • week
  • month
  • quarter
  • year

For an example, see Densify Time Series Data.
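The field rules above can be collected into a small helper. This is only a sketch: buildDensifyStage is a hypothetical function, not part of mongosh or any driver, and it checks just one of the documented constraints (step must be an integer when unit is set). It illustrates how the optional fields are left out of the stage document when they are not needed.

```javascript
// Hypothetical helper (not a MongoDB API): builds a $densify stage document
// and enforces the documented rule that step is an integer when unit is set.
function buildDensifyStage({ field, partitionByFields, step, unit, bounds }) {
  if (unit !== undefined && !Number.isInteger(step)) {
    throw new Error("step must be an integer when unit is specified");
  }
  const range = { step, bounds };
  if (unit !== undefined) range.unit = unit;

  const spec = { field, range };
  if (partitionByFields !== undefined) spec.partitionByFields = partitionByFields;
  return { $densify: spec };
}

// Numeric field: no unit, so step may be any numeric value.
const numericStage = buildDensifyStage({
  field: "altitude",
  partitionByFields: ["variety"],
  step: 0.5,
  bounds: "partition"
});

// Date field: unit is required, so step must be an integer.
const dateStage = buildDensifyStage({
  field: "timestamp",
  step: 1,
  unit: "hour",
  bounds: "full"
});
```

Either returned object can be placed directly in the array passed to aggregate().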

Behavior and Restrictions

field Restrictions

For documents that contain the specified field, $densify errors if:

  • Any document in the collection has a field value of type date and the unit field is not specified.
  • Any document in the collection has a field value of type numeric and the unit field is specified.
  • The field name begins with $. You must rename the field if you want to densify it. To rename fields, use $project.
  • (New in version 8.1.) The field shares its prefix with any field in the partitionByFields array. For example, the following combinations of field and partitionByFields result in an error:

    • field: "timestamp", partitionByFields: ["timestamp"]
    • field: "timestamp", partitionByFields: ["timestamp.hours"]
    • field: "timestamp.hours", partitionByFields: ["timestamp"]
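The three combinations above suggest that two paths conflict when they are equal or when one is a dotted-path prefix of the other. The following sketch checks for that before a pipeline is sent. Both sharesPrefix and findPrefixConflicts are hypothetical helpers, and the component-by-component comparison is an assumption about what "shares its prefix" means, not the server's actual validation code.

```javascript
// Hypothetical check: two dotted paths conflict when they are equal or when
// one is a path prefix of the other (compared one path component at a time).
function sharesPrefix(field, partitionField) {
  const a = field.split(".");
  const b = partitionField.split(".");
  const shorter = Math.min(a.length, b.length);
  for (let i = 0; i < shorter; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Returns the partitionByFields entries that would conflict with field.
function findPrefixConflicts(field, partitionByFields) {
  return partitionByFields.filter((p) => sharesPrefix(field, p));
}

findPrefixConflicts("timestamp", ["metadata.sensorId", "timestamp.hours"]);
// [ "timestamp.hours" ]
```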

partitionByFields Restrictions

$densify errors if any field name in the partitionByFields array:

  • Evaluates to a non-string value.
  • Begins with $.

range.bounds Behavior

If range.bounds is an array:

  • The lower bound indicates the start value for the added documents, irrespective of documents already in the collection.
  • The lower bound is inclusive.
  • The upper bound is exclusive.
  • $densify does not filter out documents with field values outside of the specified bounds.

Note

Starting in MongoDB 8.0, $densify treats bounds with an equal lower and upper bound as an empty set and does not generate a document with the bound as the field value.

In prior versions, $densify treats bounds with an equal lower and upper bound as a closed interval and generates a document with the bound value as a field value if the collection does not already contain a document with the bound value.

For example, a range.bounds of [10, 10] generates an extra document with field value 10 in versions prior to 8.0, but does not generate such a document in 8.0 and later.
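The inclusive lower bound and exclusive upper bound can be modeled as a half-open interval. The sketch below lists the candidate values for a numeric field under the 8.0 and later behavior. valuesWithinBounds is a hypothetical helper written in plain JavaScript; it does not account for documents that already exist at those values, which $densify would not duplicate.

```javascript
// Candidate values for array bounds [lower, upper): lower is included,
// upper is excluded. Models the MongoDB 8.0+ treatment of equal bounds.
function valuesWithinBounds(lower, upper, step) {
  if (!(step > 0)) throw new RangeError("step must be positive");
  const values = [];
  for (let i = 0; lower + i * step < upper; i++) {
    values.push(lower + i * step);
  }
  return values;
}

valuesWithinBounds(0, 10, 2.5); // [ 0, 2.5, 5, 7.5 ]
valuesWithinBounds(10, 10, 1);  // [] (equal bounds are an empty set)
```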

Order of Output

$densify does not guarantee the sort order of the documents it outputs.

To guarantee sort order, use $sort on the field you want to sort by.
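As a minimal sketch, a $sort stage can follow $densify in the same pipeline. The collection name weather and the field name timestamp are assumptions taken from the examples on this page, and the guard around db only keeps the snippet from failing outside mongosh.

```javascript
// Densify first, then sort explicitly on the densified field.
const sortedDensifyPipeline = [
  {
    $densify: {
      field: "timestamp",
      range: { step: 1, unit: "hour", bounds: "full" }
    }
  },
  // Ascending order by timestamp; use -1 for descending.
  { $sort: { timestamp: 1 } }
];

// In mongosh, where db is defined, run the pipeline against the collection.
if (typeof db !== "undefined") {
  printjson(db.weather.aggregate(sortedDensifyPipeline).toArray());
}
```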

Examples

MongoDB Shell

Densify Time Series Data

Create a weather collection that contains temperature readings at four-hour intervals.

db.weather.insertMany( [
{
"metadata": { "sensorId": 5578, "type": "temperature" },
"timestamp": ISODate("2021-05-18T00:00:00.000Z"),
"temp": 12
},
{
"metadata": { "sensorId": 5578, "type": "temperature" },
"timestamp": ISODate("2021-05-18T04:00:00.000Z"),
"temp": 11
},
{
"metadata": { "sensorId": 5578, "type": "temperature" },
"timestamp": ISODate("2021-05-18T08:00:00.000Z"),
"temp": 11
},
{
"metadata": { "sensorId": 5578, "type": "temperature" },
"timestamp": ISODate("2021-05-18T12:00:00.000Z"),
"temp": 12
}
] )

This example uses the $densify stage to fill in the gaps between the four-hour intervals to achieve hourly granularity for the data points:

db.weather.aggregate( [
  {
    $densify: {
      field: "timestamp",
      range: {
        step: 1,
        unit: "hour",
        bounds: [ ISODate("2021-05-18T00:00:00.000Z"), ISODate("2021-05-18T08:00:00.000Z") ]
      }
    }
  }
] )

In the example:

  • The $densify stage fills in the gaps of time between the recorded temperatures.

    • field: "timestamp" densifies the timestamp field.
  • range:

    • step: 1 increments the timestamp field by 1 unit.
    • unit: "hour" densifies the timestamp field by the hour.
    • bounds: [ ISODate("2021-05-18T00:00:00.000Z"), ISODate("2021-05-18T08:00:00.000Z") ] sets the range of time that is densified.

In the following output, the $densify stage fills in the gaps of time between the hours of 00:00:00 and 08:00:00.

[
{
_id: ObjectId("618c207c63056cfad0ca4309"),
metadata: { sensorId: 5578, type: 'temperature' },
timestamp: ISODate("2021-05-18T00:00:00.000Z"),
temp: 12
},
{ timestamp: ISODate("2021-05-18T01:00:00.000Z") },
{ timestamp: ISODate("2021-05-18T02:00:00.000Z") },
{ timestamp: ISODate("2021-05-18T03:00:00.000Z") },
{
_id: ObjectId("618c207c63056cfad0ca430a"),
metadata: { sensorId: 5578, type: 'temperature' },
timestamp: ISODate("2021-05-18T04:00:00.000Z"),
temp: 11
},
{ timestamp: ISODate("2021-05-18T05:00:00.000Z") },
{ timestamp: ISODate("2021-05-18T06:00:00.000Z") },
{ timestamp: ISODate("2021-05-18T07:00:00.000Z") },
{
_id: ObjectId("618c207c63056cfad0ca430b"),
metadata: { sensorId: 5578, type: 'temperature' },
timestamp: ISODate("2021-05-18T08:00:00.000Z"),
temp: 11
},
{
_id: ObjectId("618c207c63056cfad0ca430c"),
metadata: { sensorId: 5578, type: 'temperature' },
timestamp: ISODate("2021-05-18T12:00:00.000Z"),
temp: 12
}
]
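The timestamps of the added documents can be reproduced with ordinary date arithmetic. The following is a plain JavaScript sketch, not the server's implementation: missingHourlyTimestamps is a hypothetical helper that walks from the inclusive lower bound to the exclusive upper bound in one-hour steps and skips timestamps that a document already has.

```javascript
// Timestamps already present in the weather collection.
const existing = [
  "2021-05-18T00:00:00.000Z",
  "2021-05-18T04:00:00.000Z",
  "2021-05-18T08:00:00.000Z",
  "2021-05-18T12:00:00.000Z"
];

const HOUR_MS = 60 * 60 * 1000;

// Collects the hourly timestamps in [lower, upper) that are not in present.
function missingHourlyTimestamps(lower, upper, present) {
  const seen = new Set(present);
  const missing = [];
  for (let t = lower.getTime(); t < upper.getTime(); t += HOUR_MS) {
    const iso = new Date(t).toISOString();
    if (!seen.has(iso)) missing.push(iso);
  }
  return missing;
}

const added = missingHourlyTimestamps(
  new Date("2021-05-18T00:00:00.000Z"),
  new Date("2021-05-18T08:00:00.000Z"),
  existing
);
// added holds the 01:00, 02:00, 03:00, 05:00, 06:00, and 07:00 timestamps.
```

Because the upper bound is exclusive, the walk stops before 08:00:00, and the 12:00:00 document passes through without any new documents leading up to it.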

Densification with Partitions

Create a coffee collection that contains data for two varieties of coffee beans:

db.coffee.insertMany( [
{
"altitude": 600,
"variety": "Arabica Typica",
"score": 68.3
},
{
"altitude": 750,
"variety": "Arabica Typica",
"score": 69.5
},
{
"altitude": 950,
"variety": "Arabica Typica",
"score": 70.5
},
{
"altitude": 1250,
"variety": "Gesha",
"score": 88.15
},
{
"altitude": 1700,
"variety": "Gesha",
"score": 95.5,
"price": 1029
}
] )

Densify the Full Range of Values

This example uses $densify to densify the altitude field for each coffee variety:

db.coffee.aggregate( [
  {
    $densify: {
      field: "altitude",
      partitionByFields: [ "variety" ],
      range: {
        bounds: "full",
        step: 200
      }
    }
  }
] )

The example aggregation:

  • Partitions the documents by variety to create one grouping for Arabica Typica and one for Gesha coffee.
  • Specifies a full range, meaning that each partition is densified across the full range of altitude values in the collection (600 to 1700), not just the range of that partition.
  • Specifies a step of 200, meaning new documents are created at altitude intervals of 200.

The aggregation outputs the following documents:

[
{
_id: ObjectId("618c031814fbe03334480475"),
altitude: 600,
variety: 'Arabica Typica',
score: 68.3
},
{
_id: ObjectId("618c031814fbe03334480476"),
altitude: 750,
variety: 'Arabica Typica',
score: 69.5
},
{ variety: 'Arabica Typica', altitude: 800 },
{
_id: ObjectId("618c031814fbe03334480477"),
altitude: 950,
variety: 'Arabica Typica',
score: 70.5
},
{ variety: 'Gesha', altitude: 600 },
{ variety: 'Gesha', altitude: 800 },
{ variety: 'Gesha', altitude: 1000 },
{ variety: 'Gesha', altitude: 1200 },
{
_id: ObjectId("618c031814fbe03334480478"),
altitude: 1250,
variety: 'Gesha',
score: 88.15
},
{ variety: 'Gesha', altitude: 1400 },
{ variety: 'Gesha', altitude: 1600 },
{
_id: ObjectId("618c031814fbe03334480479"),
altitude: 1700,
variety: 'Gesha',
score: 95.5,
price: 1029
},
{ variety: 'Arabica Typica', altitude: 1000 },
{ variety: 'Arabica Typica', altitude: 1200 },
{ variety: 'Arabica Typica', altitude: 1400 },
{ variety: 'Arabica Typica', altitude: 1600 }
]

This image visualizes the documents created with $densify:

Image: State of the coffee collection after full-range densification
  • The darker squares represent the original documents in the collection.
  • The lighter squares represent the documents created with $densify.

Densify Values within Each Partition

This example uses $densify to densify only the gaps in the altitude field within each variety:

db.coffee.aggregate( [
  {
    $densify: {
      field: "altitude",
      partitionByFields: [ "variety" ],
      range: {
        bounds: "partition",
        step: 200
      }
    }
  }
] )

The example aggregation:

  • Partitions the documents by variety to create one grouping for Arabica Typica and one for Gesha coffee.
  • Specifies a partition range, meaning that the data is densified within each partition.

    • For the Arabica Typica partition, the range is 600-950.
    • For the Gesha partition, the range is 1250-1700.
  • Specifies a step of 200, meaning new documents are created at altitude intervals of 200.

The aggregation outputs the following documents:

[
{
_id: ObjectId("618c031814fbe03334480475"),
altitude: 600,
variety: 'Arabica Typica',
score: 68.3
},
{
_id: ObjectId("618c031814fbe03334480476"),
altitude: 750,
variety: 'Arabica Typica',
score: 69.5
},
{ variety: 'Arabica Typica', altitude: 800 },
{
_id: ObjectId("618c031814fbe03334480477"),
altitude: 950,
variety: 'Arabica Typica',
score: 70.5
},
{
_id: ObjectId("618c031814fbe03334480478"),
altitude: 1250,
variety: 'Gesha',
score: 88.15
},
{ variety: 'Gesha', altitude: 1450 },
{ variety: 'Gesha', altitude: 1650 },
{
_id: ObjectId("618c031814fbe03334480479"),
altitude: 1700,
variety: 'Gesha',
score: 95.5,
price: 1029
}
]

This image visualizes the documents created with $densify:

Image: State of the coffee collection after partition range densification
  • The darker squares represent the original documents in the collection.
  • The lighter squares represent the documents created with $densify.
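The difference between the "full" and "partition" bounds can be reproduced with a small model of the coffee data. This is a plain JavaScript sketch, not how the server implements the stage: addedAltitudes is a hypothetical function, and it assumes that stepping starts at the minimum of the chosen range and that values already present in a partition are skipped, which agrees with both outputs shown above.

```javascript
const coffee = [
  { altitude: 600, variety: "Arabica Typica" },
  { altitude: 750, variety: "Arabica Typica" },
  { altitude: 950, variety: "Arabica Typica" },
  { altitude: 1250, variety: "Gesha" },
  { altitude: 1700, variety: "Gesha" }
];

// Returns, per variety, the altitude values that densification would add.
// mode "full" uses the range of the whole collection for every partition;
// mode "partition" uses the range of each partition on its own.
function addedAltitudes(docs, step, mode) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.variety)) groups.set(doc.variety, []);
    groups.get(doc.variety).push(doc.altitude);
  }
  const all = docs.map((d) => d.altitude);
  const result = {};
  for (const [variety, altitudes] of groups) {
    const source = mode === "full" ? all : altitudes;
    const min = Math.min(...source);
    const max = Math.max(...source);
    const added = [];
    for (let v = min; v <= max; v += step) {
      if (!altitudes.includes(v)) added.push(v);
    }
    result[variety] = added;
  }
  return result;
}

const fullResult = addedAltitudes(coffee, 200, "full");
// { "Arabica Typica": [800, 1000, 1200, 1400, 1600],
//   "Gesha": [600, 800, 1000, 1200, 1400, 1600] }

const partitionResult = addedAltitudes(coffee, 200, "partition");
// { "Arabica Typica": [800], "Gesha": [1450, 1650] }
```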
C#

The C# examples on this page use the sample_weatherdata.data collection from the Atlas sample datasets. To learn how to create a free MongoDB Atlas cluster and load the sample datasets, see Get Started in the MongoDB .NET/C# Driver documentation.

The following Weather and Point classes model the documents in the sample_weatherdata.data collection:

public class Weather
{
    public Guid Id { get; set; }

    public Point Position { get; set; }

    [BsonElement("ts")]
    public DateTime Timestamp { get; set; }
}

public class Point
{
    public float[] Coordinates { get; set; }
}

The sample_weatherdata.data collection contains the following documents, which contain measurements for the same position field, one hour apart:

Document{{ _id=5553a..., position=Document{{type=Point, coordinates=[-47.9, 47.6]}}, ts=Mon Mar 05 08:00:00 EST 1984, ... }}
Document{{ _id=5553b..., position=Document{{type=Point, coordinates=[-47.9, 47.6]}}, ts=Mon Mar 05 09:00:00 EST 1984, ... }}

To use the MongoDB .NET/C# driver to add a $densify stage to an aggregation pipeline, call the Densify() method on a PipelineDefinition object.

The following example creates a pipeline stage that adds a document at every 15-minute interval between the previous two documents. The stage partitions these documents by the values of their Position.Coordinates field.

var densifyTimeRange = new DensifyDateTimeRange(
    new DensifyLowerUpperDateTimeBounds(
        lowerBound: new DateTime(1984, 3, 5, 8, 0, 0),
        upperBound: new DateTime(1984, 3, 5, 9, 0, 0)
    ),
    step: 15,
    unit: DensifyDateTimeUnit.Minutes
);

var pipeline = new EmptyPipelineDefinition<Weather>()
    .Densify(
        field: w => w.Timestamp,
        range: densifyTimeRange,
        partitionByFields: [w => w.Position.Coordinates]);

The previous aggregation stage generates the three documents without an _id field in the following output:

Document{{ _id=5553a..., position=Document{{type=Point, coordinates=[-47.9, 47.6]}}, ts=Mon Mar 05 08:00:00 EST 1984, ... }}
Document{{ position=Document{{coordinates=[-47.9, 47.6]}}, ts=Mon Mar 05 08:15:00 EST 1984 }}
Document{{ position=Document{{coordinates=[-47.9, 47.6]}}, ts=Mon Mar 05 08:30:00 EST 1984 }}
Document{{ position=Document{{coordinates=[-47.9, 47.6]}}, ts=Mon Mar 05 08:45:00 EST 1984 }}
Document{{ _id=5553b..., position=Document{{type=Point, coordinates=[-47.9, 47.6]}}, ts=Mon Mar 05 09:00:00 EST 1984, ... }}
Node.js

The Node.js examples on this page use the sample_weatherdata.data collection from the Atlas sample datasets. To learn how to create a free MongoDB Atlas cluster and load the sample datasets, see Get Started in the MongoDB Node.js driver documentation.

The sample_weatherdata.data collection contains the following documents, which contain measurements for the same position field, one hour apart:

{_id: new ObjectId(...), ts: 1984-03-05T13:00:00.000Z, position: {type: 'Point', coordinates: [-47.9, 47.6]}, ... },
{_id: new ObjectId(...), ts: 1984-03-05T14:00:00.000Z, position: {type: 'Point', coordinates: [-47.9, 47.6]}, ... }

To use the MongoDB Node.js driver to add a $densify stage to an aggregation pipeline, use the $densify operator in a pipeline object.

The following example creates a pipeline stage that adds a document at every 15-minute interval between the previous two documents. The stage partitions these documents by the values of their position.coordinates field. The example then runs the aggregation pipeline:

const pipeline = [
  {
    $densify: {
      field: "ts",
      partitionByFields: ["position.coordinates"],
      range: {
        step: 15,
        unit: "minute",
        // UTC bounds that match the ts values of the two documents above.
        bounds: [
          new Date("1984-03-05T13:00:00.000Z"),
          new Date("1984-03-05T14:00:00.000Z")
        ]
      }
    }
  }
];

const cursor = collection.aggregate(pipeline);
return cursor;

The previous aggregation stage generates the three documents without an _id field in the following output:

{ _id: new ObjectId(...), ts: 1984-03-05T13:00:00.000Z, position: {type: 'Point', coordinates: [-47.9, 47.6]}, ... },
{ position: { coordinates: [-47.9, 47.6] }, ts: 1984-03-05T13:15:00.000Z },
{ position: { coordinates: [-47.9, 47.6] }, ts: 1984-03-05T13:30:00.000Z },
{ position: { coordinates: [-47.9, 47.6] }, ts: 1984-03-05T13:45:00.000Z },
{ _id: new ObjectId(...), ts: 1984-03-05T14:00:00.000Z, position: {type: 'Point', coordinates: [-47.9, 47.6]}, ... }
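When you build date bounds in JavaScript, keep in mind that the multi-argument Date constructor takes a zero-based month index and interprets its arguments in the local time zone, so it can produce bounds that miss the intended UTC range. The short sketch below contrasts it with two UTC-based alternatives; the variable names are illustrative only.

```javascript
// Multi-argument constructor: month index 3 is April, and the time is local.
const localApril = new Date(1984, 3, 5, 8, 0, 0);
console.log(localApril.getMonth()); // 3, which is April

// Unambiguous UTC alternatives for 13:00 UTC on March 5, 1984.
const fromIsoString = new Date("1984-03-05T13:00:00.000Z");
const fromUtcParts = new Date(Date.UTC(1984, 2, 5, 13, 0, 0));

console.log(fromIsoString.toISOString()); // 1984-03-05T13:00:00.000Z
console.log(fromIsoString.getTime() === fromUtcParts.getTime()); // true
```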