
Best Practices for Time Series Collections

This page describes best practices to improve performance and data usage for time series collections.

Optimize Inserts

To optimize insert performance for time series collections, perform the following actions.

Batch Document Writes

When inserting multiple documents:

  • To avoid network roundtrips, use a single insertMany() statement as opposed to multiple insertOne() statements.
  • If possible, construct batches to contain multiple measurements per series (as defined by metadata).
  • To improve performance, set the ordered parameter to false.

For example, if you have two sensors, sensor A and sensor B, a batch containing multiple measurements from a single sensor incurs the cost of one insert, rather than one insert per measurement.

The following operation inserts six documents, but only incurs the cost of two inserts (one per batch), because the documents are ordered by sensor. The ordered parameter is set to false to improve performance:

db.temperatures.insertMany( [
   {
      "metadata": {
         "sensor": "sensorA"
      },
      "timestamp": ISODate("2021-05-18T00:00:00.000Z"),
      "temperature": 10
   },
   {
      "metadata": {
         "sensor": "sensorA"
      },
      "timestamp": ISODate("2021-05-19T00:00:00.000Z"),
      "temperature": 12
   },
   {
      "metadata": {
         "sensor": "sensorA"
      },
      "timestamp": ISODate("2021-05-20T00:00:00.000Z"),
      "temperature": 13
   },
   {
      "metadata": {
         "sensor": "sensorB"
      },
      "timestamp": ISODate("2021-05-18T00:00:00.000Z"),
      "temperature": 20
   },
   {
      "metadata": {
         "sensor": "sensorB"
      },
      "timestamp": ISODate("2021-05-19T00:00:00.000Z"),
      "temperature": 25
   },
   {
      "metadata": {
         "sensor": "sensorB"
      },
      "timestamp": ISODate("2021-05-20T00:00:00.000Z"),
      "temperature": 26
   }
], {
   "ordered": false
} )

Use Consistent Field Order in Documents

Using a consistent field order in your documents improves insert performance.

For example, inserting these documents achieves optimal insert performance:

{
   "_id": ObjectId("6250a0ef02a1877734a9df57"),
   "timestamp": ISODate("2020-01-23T00:00:00.441Z"),
   "name": "sensor1",
   "range": 1
},
{
   "_id": ObjectId("6560a0ef02a1877734a9df66"),
   "timestamp": ISODate("2020-01-23T01:00:00.441Z"),
   "name": "sensor1",
   "range": 5
}

In contrast, these documents do not achieve optimal insert performance, because their field orders differ:

{
   "range": 1,
   "_id": ObjectId("6250a0ef02a1877734a9df57"),
   "name": "sensor1",
   "timestamp": ISODate("2020-01-23T00:00:00.441Z")
},
{
   "_id": ObjectId("6560a0ef02a1877734a9df66"),
   "name": "sensor1",
   "timestamp": ISODate("2020-01-23T01:00:00.441Z"),
   "range": 5
}

Increase the Number of Clients

Increasing the number of clients writing data to your collections can improve performance.

Important

Disable Retryable Writes

To write data with multiple clients, you must disable retryable writes. Retryable writes for time series collections do not combine writes from multiple clients.

To learn more about retryable writes and how to disable them, see retryable writes.
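For example, one way to disable retryable writes is to set the retryWrites option to false in the connection string. The host and port below are placeholders:

mongosh "mongodb://mongodb0.example.com:27017/?retryWrites=false"

Drivers that accept connection string options recognize retryWrites=false in the same way.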

Optimize Compression

To optimize data compression for time series collections, perform the following actions.

Omit Fields Containing Empty Objects and Arrays from Documents

To optimize compression, if your data contains empty objects or arrays, omit the empty fields from your documents.

For example, consider the following documents:

{
   "timestamp": ISODate("2020-01-23T00:00:00.441Z"),
   "coordinates": [1.0, 2.0]
},
{
   "timestamp": ISODate("2020-01-23T00:00:10.441Z"),
   "coordinates": []
},
{
   "timestamp": ISODate("2020-01-23T00:00:20.441Z"),
   "coordinates": [3.0, 5.0]
}

The alternation between coordinates fields with populated values and an empty array results in a schema change for the compressor. The schema change causes the second and third documents in the sequence to remain uncompressed.

In contrast, the following documents, which omit the empty array, receive the benefit of optimal compression:

{
   "timestamp": ISODate("2020-01-23T00:00:00.441Z"),
   "coordinates": [1.0, 2.0]
},
{
   "timestamp": ISODate("2020-01-23T00:00:10.441Z")
},
{
   "timestamp": ISODate("2020-01-23T00:00:20.441Z"),
   "coordinates": [3.0, 5.0]
}
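If your application sometimes produces empty arrays, one approach is to strip those fields in application code before inserting. The following is a minimal mongosh sketch; the points collection name and the reading document are assumptions for illustration:

// Hypothetical measurement that arrived with an empty coordinates array
const reading = { "timestamp": new Date(), "coordinates": [] };

// Omit the empty array instead of inserting it
if ( Array.isArray( reading.coordinates ) && reading.coordinates.length === 0 ) {
   delete reading.coordinates;
}

// Insert into the (assumed) points time series collection
db.points.insertOne( reading )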

Round Numeric Data to Few Decimal Places

Round numeric data to the precision required for your application. Rounding numeric data to fewer decimal places improves the compression ratio.
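For example, the following is a minimal sketch that rounds a hypothetical reading to two decimal places before inserting it into the temperatures collection from the earlier example:

// Hypothetical raw sensor reading
const raw = 10.847219;

// Keep two decimal places: 10.85
const rounded = Math.round( raw * 100 ) / 100;

db.temperatures.insertOne( {
   "metadata": { "sensor": "sensorA" },
   "timestamp": new Date(),
   "temperature": rounded
} )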

Optimize Query Performance

Set Appropriate Bucket Granularity

When you create a time series collection, MongoDB groups incoming time series data into buckets. By accurately setting granularity, you control how frequently data is bucketed based on the ingestion rate of your data.

Starting in MongoDB 6.3, you can use the custom bucketing parameters bucketMaxSpanSeconds and bucketRoundingSeconds to specify bucket boundaries and more precisely control how time series data is bucketed.

You can improve performance by setting the granularity or custom bucketing parameters to the best match for the time span between incoming measurements from the same data source. For example, if you are recording weather data from thousands of sensors but only record data from each sensor once per 5 minutes, you can either set granularity to "minutes" or set the custom bucketing parameters to 300 (seconds).

In this case, setting the granularity to hours groups up to a month's worth of data ingest events into a single bucket, resulting in longer traversal times and slower queries. Setting it to seconds leads to multiple buckets per polling interval, many of which might contain only a single document.
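For example, the following commands sketch both approaches for the 5-minute sensor scenario above. The weather collection name is illustrative, and you would choose one option or the other when creating the collection:

// Option 1: set granularity to the closest built-in value
db.createCollection( "weather", {
   timeseries: {
      timeField: "timestamp",
      metaField: "metadata",
      granularity: "minutes"
   }
} )

// Option 2 (MongoDB 6.3+): set custom bucketing parameters, in seconds
db.createCollection( "weather", {
   timeseries: {
      timeField: "timestamp",
      metaField: "metadata",
      bucketMaxSpanSeconds: 300,
      bucketRoundingSeconds: 300
   }
} )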

The following table shows the maximum time interval included in one bucket of data when using a given granularity value:

granularity   bucket limit
seconds       1 hour
minutes       24 hours
hours         30 days

Create Secondary Indexes

To improve query performance, create one or more secondary indexes on your timeField and metaField to support common query patterns. In versions 6.3 and higher, MongoDB creates a secondary index on the timeField and metaField automatically.
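For example, if your documents use a metadata.sensor sub-field as in the batch insert example above, a compound secondary index on that sub-field and the timeField supports queries that filter by sensor and range over time:

db.temperatures.createIndex( { "metadata.sensor": 1, "timestamp": 1 } )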

Query metaFields on Sub-Fields

MongoDB reorders the metaFields of time series collections, which may cause servers to store data in a different field order than applications. If metaFields are objects, queries on entire metaFields may produce inconsistent results because metaField order may vary between servers and applications. To optimize queries on time series metaFields, query on scalar sub-fields rather than entire metaFields.

The following example inserts documents into a weather time series collection:

db.weather.insertMany( [
   {
      "metaField": { "sensorId": 5578, "type": "temperature" },
      "timestamp": ISODate( "2021-05-18T00:00:00.000Z" ),
      "temp": 12
   },
   {
      "metaField": { "sensorId": 5578, "type": "temperature" },
      "timestamp": ISODate( "2021-05-18T04:00:00.000Z" ),
      "temp": 11
   }
] )

The following query on the sensorId and type scalar sub-fields returns the first document that matches the query criteria:

db.weather.findOne( {
   "metaField.sensorId": 5578,
   "metaField.type": "temperature"
} )

Example output:

{
   _id: ObjectId("6572371964eb5ad43054d572"),
   metaField: { sensorId: 5578, type: 'temperature' },
   timestamp: ISODate( "2021-05-18T00:00:00.000Z" ),
   temp: 12
}