This page describes best practices to improve performance and data usage for time series collections.
Compression Best Practices
To optimize data compression for time series collections, perform the following actions:
Omit Fields Containing Empty Objects and Arrays from Documents
If your data contains empty objects, arrays, or strings, omit the empty fields from your documents to optimize compression.
For example, consider the following documents:
{
timestamp: ISODate("2020-01-23T00:00:00.441Z"),
coordinates: [1.0, 2.0]
},
{
timestamp: ISODate("2020-01-23T00:00:10.441Z"),
coordinates: []
},
{
timestamp: ISODate("2020-01-23T00:00:20.441Z"),
coordinates: [3.0, 5.0]
}
Alternating between coordinates fields with populated values and coordinates fields with an empty array results in a schema change for the compressor. The schema change causes the second and third documents in the sequence to remain uncompressed.
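One way to apply this practice in the client is to drop empty objects, arrays, and strings before inserting. The following is a minimal sketch; omitEmptyFields is a hypothetical helper name, not a driver API:

```javascript
// Remove fields whose value is an empty object, array, or string,
// so alternating empty/populated values don't force a schema change.
function omitEmptyFields(doc) {
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    const isEmptyArray = Array.isArray(value) && value.length === 0;
    const isEmptyString = value === "";
    const isEmptyObject =
      value !== null &&
      typeof value === "object" &&
      !Array.isArray(value) &&
      !(value instanceof Date) &&
      Object.keys(value).length === 0;
    if (!isEmptyArray && !isEmptyString && !isEmptyObject) {
      out[key] = value;
    }
  }
  return out;
}

const cleaned = omitEmptyFields({
  timestamp: new Date("2020-01-23T00:00:10.441Z"),
  coordinates: [],
});
// cleaned keeps only the timestamp field
```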
Optimize compression by omitting the fields with empty values, as shown in the following documents:
{
timestamp: ISODate("2020-01-23T00:00:00.441Z"),
coordinates: [1.0, 2.0]
},
{
timestamp: ISODate("2020-01-23T00:00:10.441Z")
},
{
timestamp: ISODate("2020-01-23T00:00:20.441Z"),
coordinates: [3.0, 5.0]
}
Round Numeric Data to Few Decimal Places
Round numeric data to the precision that your application requires. Rounding numeric data to fewer decimal places improves the compression ratio.
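For instance, if two decimal places are enough for your application, round before inserting. A small sketch; roundTo is a hypothetical helper name:

```javascript
// Round a reading to a fixed number of decimal places before insert;
// fewer distinct trailing digits compress better.
function roundTo(value, places) {
  const factor = 10 ** places;
  return Math.round(value * factor) / factor;
}

const measurement = {
  timestamp: new Date("2021-05-18T00:00:00.000Z"),
  temperature: roundTo(21.476953, 2),
};
// measurement.temperature is 21.48
```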
Inserts Best Practices
To optimize insert performance for time series collections, perform the following actions:
Batch Document Writes
When inserting multiple documents:
- To avoid network roundtrips, use a single insertMany() statement as opposed to multiple insertOne() statements.
- If possible, insert data that contains identical metaField values in the same batches.
- Set the ordered parameter to false.
For example, if you have two sensors that correspond to two metaField values, sensor A and sensor B, a batch that contains multiple measurements from a single sensor incurs the cost of one insert, rather than one insert per measurement.
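In the client, you can sort a mixed batch by its metaField value before calling insertMany so that measurements from the same sensor are adjacent. A sketch; sortByMeta is a hypothetical helper, not a driver API:

```javascript
// Order a batch so documents with the same metaField value are contiguous,
// letting the server apply each sensor's measurements together.
function sortByMeta(batch) {
  return [...batch].sort((a, b) =>
    a.metaField.sensor.localeCompare(b.metaField.sensor)
  );
}

const batch = [
  { metaField: { sensor: "sensorB" }, temperature: 20 },
  { metaField: { sensor: "sensorA" }, temperature: 10 },
  { metaField: { sensor: "sensorB" }, temperature: 25 },
  { metaField: { sensor: "sensorA" }, temperature: 12 },
];
const ordered = sortByMeta(batch);
// ordered: sensorA, sensorA, sensorB, sensorB
// then: db.temperatures.insertMany(ordered, { ordered: false })
```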
The following operation inserts six documents, but only incurs the cost of two inserts (one per metaField value), because the documents are ordered by sensor. The ordered parameter is set to false to improve performance:
db.temperatures.insertMany(
[
{
metaField: {
sensor: "sensorA"
},
timestamp: ISODate("2021-05-18T00:00:00.000Z"),
temperature: 10
},
{
metaField: {
sensor: "sensorA"
},
timestamp: ISODate("2021-05-19T00:00:00.000Z"),
temperature: 12
},
{
metaField: {
sensor: "sensorA"
},
timestamp: ISODate("2021-05-20T00:00:00.000Z"),
temperature: 13
},
{
metaField: {
sensor: "sensorB"
},
timestamp: ISODate("2021-05-18T00:00:00.000Z"),
temperature: 20
},
{
metaField: {
sensor: "sensorB"
},
timestamp: ISODate("2021-05-19T00:00:00.000Z"),
temperature: 25
},
{
metaField: {
sensor: "sensorB"
},
timestamp: ISODate("2021-05-20T00:00:00.000Z"),
temperature: 26
}
],
{ "ordered": false }
)
Use Consistent Field Order in Documents
Using a consistent field order in your documents improves insert performance.
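One way to guarantee a stable field order is to build every document through a single factory function. A sketch; makeReading is a hypothetical name:

```javascript
// Always emit fields in the same order so every inserted document
// presents the same schema to the server.
function makeReading(id, timestamp, name, range) {
  return { _id: id, timestamp, name, range };
}

const first = makeReading(
  "6250a0ef02a1877734a9df57",
  new Date("2020-01-23T00:00:00.441Z"),
  "sensor1",
  1
);
const second = makeReading(
  "6560a0ef02a1877734a9df66",
  new Date("2020-01-23T01:00:00.441Z"),
  "sensor1",
  5
);
// Both documents list fields as: _id, timestamp, name, range
```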
For example, inserting the following documents, all of which have the same field order, results in optimal insert performance.
{
_id: ObjectId("6250a0ef02a1877734a9df57"),
timestamp: ISODate("2020-01-23T00:00:00.441Z"),
name: "sensor1",
range: 1
},
{
_id: ObjectId("6560a0ef02a1877734a9df66"),
timestamp: ISODate("2020-01-23T01:00:00.441Z"),
name: "sensor1",
range: 5
}
In contrast, the following documents do not achieve optimal insert performance, because their field orders differ:
{
range: 1,
_id: ObjectId("6250a0ef02a1877734a9df57"),
name: "sensor1",
timestamp: ISODate("2020-01-23T00:00:00.441Z")
},
{
_id: ObjectId("6560a0ef02a1877734a9df66"),
name: "sensor1",
timestamp: ISODate("2020-01-23T01:00:00.441Z"),
range: 5
}
Increase the Number of Clients
Increasing the number of clients that write data to your collections can improve performance.
Sharding Best Practices
To optimize sharding on your time series collection, perform the following action:
Use the metaField as your Shard Key
Using the metaField to shard your collection provides sufficient cardinality as a shard key for time series collections.
Note
Starting in MongoDB 8.0, the use of the timeField as a shard key in time series collections is deprecated.
Query Best Practices
To optimize queries on your time series collection, perform the following actions:
Set a Strategic metaField When Creating the Collection
Your choice of metaField has the biggest impact on optimizing queries in your application.
- Select fields that rarely or never change as part of your metaField.
- If possible, select identifiers or other stable values that are common in filter expressions as part of your metaField.
- Avoid selecting fields that are not used for filtering as part of your metaField. Instead, use those fields as measurements.
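For example, a weather sensor might keep its stable identifiers in the metaField and its changing readings as top-level measurements. A hypothetical document shape (field names are illustrative):

```javascript
// Stable identifiers go in metaField; values that change per reading
// stay outside it as measurements.
const reading = {
  metaField: { sensorId: 5578, region: "north" }, // rarely changes, common in filters
  timestamp: new Date("2021-05-18T00:00:00.000Z"),
  temperature: 12, // measurement: changes every reading
  humidity: 0.4,   // measurement: not used for filtering
};
```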
For more information, see metaField Considerations.
Set Appropriate Bucket Granularity
When you create a time series collection, MongoDB groups incoming time series data into buckets. By accurately setting granularity, you control how frequently data is bucketed based on the ingestion rate of your data.
Starting in MongoDB 6.3, you can use the custom bucketing parameters bucketMaxSpanSeconds and bucketRoundingSeconds to specify bucket boundaries and more precisely control how time series data is bucketed.
You can improve performance by setting the granularity or custom bucketing parameters to the best match for the time span between incoming measurements from the same data source. For example, if you are recording weather data from thousands of sensors but only record data from each sensor once per 5 minutes, you can either set granularity to "minutes" or set the custom bucketing parameters to 300 (seconds).
In this case, setting the granularity to hours groups up to a month's worth of data ingest events into a single bucket, resulting in longer traversal times and slower queries. Setting it to seconds leads to multiple buckets per polling interval, many of which might contain only a single document.
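Conceptually, with bucketRoundingSeconds set to 300, the start time of a new bucket is rounded down to the nearest 300-second boundary. The arithmetic looks like this (bucketStart is an illustrative helper, not a server API):

```javascript
// Round a timestamp down to the nearest bucketRoundingSeconds boundary;
// this is the start time a newly opened bucket would use.
function bucketStart(date, bucketRoundingSeconds) {
  const seconds = Math.floor(date.getTime() / 1000);
  const rounded = seconds - (seconds % bucketRoundingSeconds);
  return new Date(rounded * 1000);
}

const start = bucketStart(new Date("2021-05-18T00:07:42.000Z"), 300);
// start is 2021-05-18T00:05:00.000Z
```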
The following table shows the maximum time interval included in one bucket of data when using a given granularity value:
granularity | Maximum time interval per bucket |
|---|---|
seconds | 1 hour |
minutes | 24 hours |
hours | 30 days |
Create Secondary Indexes
To improve query performance, create one or more secondary indexes on your timeField and metaField to support common query patterns. In versions 6.3 and higher, MongoDB creates a secondary index on the timeField and metaField automatically.
Additional Index Best Practices
- Use the metaField index for filtering and equality.
- Use the timeField and other indexed fields for range queries.
- General indexing strategies also apply to time series collections. For more information, see Indexing Strategies.
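A typical query that matches these guidelines pairs an equality predicate on a metaField sub-field with a range predicate on the timeField. The filter document (field names are illustrative) looks like:

```javascript
// Equality on a metaField sub-field plus a range on the timeField;
// a compound index such as { "metaField.sensorId": 1, timestamp: 1 }
// can serve this shape.
const filter = {
  "metaField.sensorId": 5578,
  timestamp: {
    $gte: new Date("2021-05-18T00:00:00.000Z"),
    $lt: new Date("2021-05-19T00:00:00.000Z"),
  },
};
// e.g. db.weather.find(filter)
```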
Query the metaField on Sub-Fields
MongoDB reorders the metaField of time series collections, which may cause servers to store data in a different field order than applications. If a metaField is an object, queries on the metaField may produce inconsistent results because metaField order may vary between servers and applications. To optimize queries on a time series metaField, query scalar sub-fields of the metaField rather than the entire metaField.
The following example inserts documents into a time series collection:
db.weather.insertMany( [
{
metaField: { sensorId: 5578, type: "temperature" },
timestamp: ISODate( "2021-05-18T00:00:00.000Z" ),
temp: 12
},
{
metaField: { sensorId: 5578, type: "temperature" },
timestamp: ISODate( "2021-05-18T04:00:00.000Z" ),
temp: 11
}
] )
The following query on the sensorId and type scalar sub-fields returns the first document that matches the query criteria:
db.weather.findOne( {
"metaField.sensorId": 5578,
"metaField.type": "temperature"
} )
Example output:
{
_id: ObjectId("6572371964eb5ad43054d572"),
metaField: { sensorId: 5578, type: 'temperature' },
timestamp: ISODate( "2021-05-18T00:00:00.000Z" ),
temp: 12
}
Use $group Instead of Distinct()
Due to the unique data structure of time series collections, MongoDB can't efficiently index them for distinct values. Avoid using the distinct command or db.collection.distinct() helper method on time series collections. Instead, use a $group aggregation to group documents by distinct values.
For example, to query for distinct meta.type values on documents where meta.project = 10, instead of:
db.foo.distinct("meta.type", {"meta.project": 10})
Use:
db.foo.createIndex({"meta.project":1, "meta.type":1})
db.foo.aggregate([{$match: {"meta.project": 10}},
{$group: {_id: "$meta.type"}}])
This works as follows:
- Creating a compound index on meta.project and meta.type supports the aggregation.
- The $match stage filters for documents where meta.project = 10.
- The $group stage uses meta.type as the group key to output one document per unique value.