$densify (aggregation)

On this page本页内容

Definition定义

$densify

New in version 5.1.在版本5.1中新增

Creates new documents in a sequence of documents where certain values in a field are missing.在缺少字段中某些值的文档序列中创建新文档。

You can use $densify to:您可以使用$densify来:

  • Fill gaps in time series data.填补时间序列数据中的空白。
  • Add missing values between groups of data.在数据组之间添加缺少的值。
  • Populate your data with a specified range of values.使用指定的值范围填充数据。

Syntax语法

The $densify stage has this syntax:$densify阶段具有以下语法:

{
   $densify: {
      field: <fieldName>,
      partitionByFields: [ <field 1>, <field 2> ... <field n> ],
      range: {
         step: <number>,
         unit: <time unit>,
         bounds: < "full" || "partition" > || [ < lower bound >, < upper bound > ]
      }
   }
}

The $densify stage takes a document with these fields:$densify阶段获取包含以下字段的文档:

Field字段NecessityDescription描述
fieldRequired必需

The field to densify. 要加密的字段。The values of the specified field must either be all numeric values or all dates.指定field的值必须是所有数值或所有日期。

Documents that do not contain the specified field continue through the pipeline unmodified.不包含指定field的文档未经修改地继续通过管道。

To specify a <field> in an embedded document or in an array, use dot notation.要在嵌入文档或数组中指定<field>,请使用点表示法

For restrictions, see field Restrictions.有关限制,请参阅field限制

partitionByFieldsOptional可选

The set of fields to act as the compound key to group the documents. 用作对文档进行分组的复合键的字段集。In the $densify stage, each group of documents is known as a partition.$densify阶段,每组文档称为分区

If you omit this field, $densify uses one partition for the entire collection.如果省略此字段,$densify对整个集合使用一个分区。

For an example, see Densifiction with Partitions.例如,请参阅带分区的密度小说

For restrictions, see partitionByFields Restrictions.有关限制,请参阅partitionByFields限制

rangeRequired必需

An object that specifies how the data is densified.指定数据加密方式的对象。

range.boundsRequired必需

You can specify range.bounds as either:您可以将range.bounds指定为:

  • An array: 数组:[ < lower bound >, < upper bound > ],
  • A string: either "full" or "partition".一个字符串:"full""partition"

If bounds is an array:如果bounds是一个数组:

  • $densify adds documents spanning the range of values within the specified bounds.添加跨越指定范围内的值范围的文档。
  • The data type for the bounds must correspond to the data type in the field being densified.边界的数据类型必须与加密字段中的数据类型相对应。
  • For behavior details, see range.bounds Behavior.有关行为详细信息,请参阅range.bounds行为

If bounds is "full":如果bounds"full"

  • $densify adds documents spanning the full range of values of the field being densified.添加跨越要加密的field的整个值范围的文档。

If bounds is "partition":如果bounds"partition"

  • $densify adds documents to each partition, similar to if you had run a full range densification on each partition individually.将文档添加到每个分区,类似于在每个分区上单独运行full范围加密。
range.stepRequired必需

The amount to increment the field value in each document. 每个文档中字段值的增量。$densify creates a new document for each step between the existing documents.为现有文档之间的每个步骤创建一个新文档。

If range.unit is specified, step must be an integer. 如果指定了range.unit,则step必须是整数。Otherwise, step can be any numeric value.否则,step可以是任何数值。

range.unitRequired if field is a date.如果字段是日期,则为必填项。

The unit to apply to the step field when incrementing date values in field.递增字段中的日期值时应用于步骤字段的单位。

You can specify one of the following values for unit as a string:您可以将unit的以下值之一指定为字符串:

  • millisecond
  • second
  • minute
  • hour
  • week
  • month
  • quarter
  • year

For an example, see Densify Time Series Data.有关示例,请参阅加密时间序列数据

Behavior and Restrictions行为和限制

field Restrictions限制

For documents that contain the specified field, $densify errors if:对于包含指定字段的文档,如果出现以下情况,$densify将出错:

  • Any document in the collection has a field value of type date and the unit field is not specified.集合中的任何文档都具有日期类型的field值,并且未指定单位字段。
  • Any document in the collection has a field value of type numeric and the unit field is specified.集合中的任何文档都具有数值类型的field值,并且指定了单位字段。
  • The field name begins with $. field名以$开头。You must rename the field if you want to densify it. 如果要加密字段,则必须重命名该字段。To rename fields, use $project.要重命名字段,请使用$project

partitionByFields Restrictions限制

$densify errors if any field name in the partitionByFields array:如果partitionByFields数组中有任何字段名出现以下情况,$densify将出错:

  • Evaluates to a non-string value.计算为非字符串值。
  • Begins with $.$开始。

range.bounds Behavior行为

If range.bounds is an array:如果range.bounds是一个数组:

  • The lower bound indicates the start value for the added documents, irrespective of documents already in the collection.下限表示添加文档的起始值,与集合中已存在的文档无关。
  • The lower bound is inclusive.下限包括在内。
  • The upper bound is exclusive.上界是排他性的。
  • $densify does not filter out documents with field values outside of the specified bounds.不筛选字段值超出指定边界的文档。

Order of Output输出顺序

$densify does not guarantee sort order of the documents it outputs.不保证输出文档的排序顺序。

To guarantee sort order, use $sort on the field you want to sort by.要保证排序顺序,请在要排序的字段上使用$sort

Examples示例

Densify Time Series Data加密时间序列数据

Create a weather collection that contains temperature readings over four hour intervals.创建一个weather集合,包含四小时间隔的温度读数。

db.weather.insertMany( [
   {
       "metadata": { "sensorId": 5578, "type": "temperature" },
       "timestamp": ISODate("2021-05-18T00:00:00.000Z"),
       "temp": 12
   },
   {
       "metadata": { "sensorId": 5578, "type": "temperature" },
       "timestamp": ISODate("2021-05-18T04:00:00.000Z"),
       "temp": 11
   },
   {
       "metadata": { "sensorId": 5578, "type": "temperature" },
       "timestamp": ISODate("2021-05-18T08:00:00.000Z"),
       "temp": 11
   },
   {
       "metadata": { "sensorId": 5578, "type": "temperature" },
       "timestamp": ISODate("2021-05-18T12:00:00.000Z"),
       "temp": 12
   }
] )

This example uses the $densify stage to fill in the gaps between the four-hour intervals to achieve hourly granularity for the data points:本示例使用$densify阶段填充四个小时间隔之间的间隙,以实现数据点的每小时粒度:

db.weather.aggregate( [
   {
      $densify: {
         field: "timestamp",
         range: {
            step: 1,
            unit: "hour",
            bounds:[ ISODate("2021-05-18T00:00:00.000Z"), ISODate("2021-05-18T08:00:00.000Z") ]
         }
      }
   }
] )

In the example:在该示例中:

  • The $densify stage fills in the gaps of time in between the recorded temperatures.$density阶段填补了记录温度之间的时间间隙。

    • field: "timestamp" densifies the timestamp field.加密timestamp字段。
    • range:

      • step: 1 increments the timestamp field by 1 unit.timestamp字段递增1个单位。
      • unit: hour densifies the timestamp field by the hour.按小时加密timestamp字段。
      • bounds: [ ISODate("2021-05-18T00:00:00.000Z"), ISODate("2021-05-18T08:00:00.000Z") ] sets the range of time that is densified.设置加密的时间范围。

In the following output, the $densify stage fills in the gaps of time between the hours of 00:00:00 and 08:00:00.在以下输出中,$densify阶段将填充00:00:0008:00:00之间的时间间隔。

[
  {
    _id: ObjectId("618c207c63056cfad0ca4309"),
    metadata: { sensorId: 5578, type: 'temperature' },
    timestamp: ISODate("2021-05-18T00:00:00.000Z"),
    temp: 12
  },
  { timestamp: ISODate("2021-05-18T01:00:00.000Z") },
  { timestamp: ISODate("2021-05-18T02:00:00.000Z") },
  { timestamp: ISODate("2021-05-18T03:00:00.000Z") },
  {
    _id: ObjectId("618c207c63056cfad0ca430a"),
    metadata: { sensorId: 5578, type: 'temperature' },
    timestamp: ISODate("2021-05-18T04:00:00.000Z"),
    temp: 11
  },
  { timestamp: ISODate("2021-05-18T05:00:00.000Z") },
  { timestamp: ISODate("2021-05-18T06:00:00.000Z") },
  { timestamp: ISODate("2021-05-18T07:00:00.000Z") },
  {
    _id: ObjectId("618c207c63056cfad0ca430b"),
    metadata: { sensorId: 5578, type: 'temperature' },
    timestamp: ISODate("2021-05-18T08:00:00.000Z"),
    temp: 11
  }
  {
    _id: ObjectId("618c207c63056cfad0ca430c"),
    metadata: { sensorId: 5578, type: 'temperature' },
    timestamp: ISODate("2021-05-18T12:00:00.000Z"),
    temp: 12
  }
]

Densifiction with Partitions带隔板的密度小说

Create a coffee collection that contains data for two varieties of coffee beans:创建包含两种咖啡豆数据的coffee集合:

db.coffee.insertMany( [
   {
      "altitude": 600,
      "variety": "Arabica Typica",
      "score": 68.3
   },
   {
      "altitude": 750,
      "variety": "Arabica Typica",
      "score": 69.5
   },
   {
      "altitude": 950,
      "variety": "Arabica Typica",
      "score": 70.5
   },
   {
      "altitude": 1250,
      "variety": "Gesha",
      "score": 88.15
   },
   {
     "altitude": 1700,
     "variety": "Gesha",
     "score": 95.5,
     "price": 1029
   }
] )

Densify the Full Range of Values加密整个值范围

This example uses $densify to densify the altitude field for each coffee variety:本示例使用$densify为每个咖啡variety(品种)加密altitude(海拔)字段:

db.coffee.aggregate( [
   {
      $densify: {
         field: "altitude",
         partitionByFields: [ "variety" ],
         range: {
            bounds: "full",
            step: 200
         }
      }
   }
] )

The example aggregation:示例聚合:

  • Partitions the documents by variety to create one grouping for Arabica Typica and one for Gesha coffee.variety划分文档,为Arabica TypicaGesha咖啡创建一个分组。
  • Specifies a full range, meaning that the data is densified across the full range of existing documents for each partition.指定full范围,这意味着数据在每个分区的现有文档的整个范围内进行加密。
  • Specifies a step of 200, meaning new documents are created at altitude intervals of 200.指定step200,表示以200altitude(海拔)间隔创建新文档。

The aggregation outputs the following documents:聚合输出以下文件:

[
   {
     _id: ObjectId("618c031814fbe03334480475"),
     altitude: 600,
     variety: 'Arabica Typica',
     score: 68.3
   },
   {
     _id: ObjectId("618c031814fbe03334480476"),
     altitude: 750,
     variety: 'Arabica Typica',
     score: 69.5
   },
   { variety: 'Arabica Typica', altitude: 800 },
   {
     _id: ObjectId("618c031814fbe03334480477"),
     altitude: 950,
     variety: 'Arabica Typica',
     score: 70.5
   },
   { variety: 'Gesha', altitude: 600 },
   { variety: 'Gesha', altitude: 800 },
   { variety: 'Gesha', altitude: 1000 },
   { variety: 'Gesha', altitude: 1200 },
   {
     _id: ObjectId("618c031814fbe03334480478"),
     altitude: 1250,
     variety: 'Gesha',
     score: 88.15
   },
   { variety: 'Gesha', altitude: 1400 },
   { variety: 'Gesha', altitude: 1600 },
   {
     _id: ObjectId("618c031814fbe03334480479"),
     altitude: 1700,
     variety: 'Gesha',
     score: 95.5,
     price: 1029
   },
   { variety: 'Arabica Typica', altitude: 1000 },
   { variety: 'Arabica Typica', altitude: 1200 },
   { variety: 'Arabica Typica', altitude: 1400 },
   { variety: 'Arabica Typica', altitude: 1600 }
 ]

This image visualizes the documents created with $densify:此图像显示了使用$densify创建的文档:

State of the coffee collection after full-range densifiction
  • The darker squares represent the original documents in the collection.较深的方块表示集合中的原始文档。
  • The lighter squares represent the documents created with $densify.较浅的方块表示使用$densify创建的文档。

Densify Values within Each Partition加密每个分区内的值

This example uses $densify to only densify gaps in the altitude field within each variety:本示例使用$densify仅加密每个varietyaltitude字段中的间隙:

db.coffee.aggregate( [
   {
      $densify: {
         field: "altitude",
         partitionByFields: [ "variety" ],
         range: {
            bounds: "partition",
            step: 200
         }
      }
   }
] )

The example aggregation:示例聚合:

  • Partitions the documents by variety to create one grouping for Arabica Typica and one for Gesha coffee.variety划分文档,为Arabica TypicaGesha咖啡创建一个分组。
  • Specifies a partition range, meaning that the data is densified within each partition.指定partition范围,这意味着数据在每个分区内加密。

    • For the Arabica Typica partition, the range is 600-900.对于Arabica Typica分区,范围为600-900
    • For the Gesha partition, the range is 1250-1700.对于Gesha分区,范围为1250-1700
  • Specifies a step of 200, meaning new documents are created at altitude intervals of 200.指定step200,表示以200altitude间隔创建新文档。

The aggregation outputs the following documents:聚合输出以下文件:

[
   {
     _id: ObjectId("618c031814fbe03334480475"),
     altitude: 600,
     variety: 'Arabica Typica',
     score: 68.3
   },
   {
     _id: ObjectId("618c031814fbe03334480476"),
     altitude: 750,
     variety: 'Arabica Typica',
     score: 69.5
   },
   { variety: 'Arabica Typica', altitude: 800 },
   {
     _id: ObjectId("618c031814fbe03334480477"),
     altitude: 950,
     variety: 'Arabica Typica',
     score: 70.5
   },
   {
     _id: ObjectId("618c031814fbe03334480478"),
     altitude: 1250,
     variety: 'Gesha',
     score: 88.15
   },
   { variety: 'Gesha', altitude: 1450 },
   { variety: 'Gesha', altitude: 1650 },
   {
     _id: ObjectId("618c031814fbe03334480479"),
     altitude: 1700,
     variety: 'Gesha',
     score: 95.5,
     price: 1029
   }
 ]

This image visualizes the documents created with $densify:此图像显示了使用$densify创建的文档:

State of the coffee collection after partition range densification
  • The darker squares represent the original documents in the collection.较深的方块表示集合中的原始文档。
  • The lighter squares represent the documents created with $densify.较浅的方块表示使用$densify创建的文档。
←  $currentOp (aggregation)$documents (aggregation) →