Best Practices for Time Series Collections
This page describes best practices to improve performance and data usage for time series collections.
Optimize Inserts
To optimize insert performance for time series collections, perform the following actions.
Batch Document Writes
When inserting multiple documents:

- To avoid network roundtrips, use a single insertMany() statement as opposed to multiple insertOne() statements.
- If possible, construct batches to contain multiple measurements per series (as defined by metadata).
- To improve performance, set the ordered parameter to false.
For example, if you have two sensors, sensor A and sensor B, a batch containing multiple measurements from a single sensor incurs the cost of one insert, rather than one insert per measurement.

The following operation inserts six documents, but only incurs the cost of two inserts (one per batch), because the documents are ordered by sensor. The ordered parameter is set to false to improve performance:
db.temperatures.insertMany( [
   {
      "metadata": {
         "sensor": "sensorA"
      },
      "timestamp": ISODate("2021-05-18T00:00:00.000Z"),
      "temperature": 10
   },
   {
      "metadata": {
         "sensor": "sensorA"
      },
      "timestamp": ISODate("2021-05-19T00:00:00.000Z"),
      "temperature": 12
   },
   {
      "metadata": {
         "sensor": "sensorA"
      },
      "timestamp": ISODate("2021-05-20T00:00:00.000Z"),
      "temperature": 13
   },
   {
      "metadata": {
         "sensor": "sensorB"
      },
      "timestamp": ISODate("2021-05-18T00:00:00.000Z"),
      "temperature": 20
   },
   {
      "metadata": {
         "sensor": "sensorB"
      },
      "timestamp": ISODate("2021-05-19T00:00:00.000Z"),
      "temperature": 25
   },
   {
      "metadata": {
         "sensor": "sensorB"
      },
      "timestamp": ISODate("2021-05-20T00:00:00.000Z"),
      "temperature": 26
   }
], {
   "ordered": false
})
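One way to construct such batches is to group incoming measurements by their series metadata before calling insertMany(). The following plain-JavaScript sketch is illustrative (the groupBySensor helper and sample measurements are not part of the MongoDB API):

```javascript
// Group measurements by sensor so that each series' documents are
// adjacent in the batch, reducing the effective number of inserts.
function groupBySensor(measurements) {
  const bySensor = new Map();
  for (const m of measurements) {
    const key = m.metadata.sensor;
    if (!bySensor.has(key)) bySensor.set(key, []);
    bySensor.get(key).push(m);
  }
  // Flatten back into one array, ordered by sensor.
  return [...bySensor.values()].flat();
}

const measurements = [
  { metadata: { sensor: "sensorA" }, temperature: 10 },
  { metadata: { sensor: "sensorB" }, temperature: 20 },
  { metadata: { sensor: "sensorA" }, temperature: 12 },
];

// The batch now lists both sensorA documents before the sensorB
// document; pass it to insertMany() with { ordered: false }.
const batch = groupBySensor(measurements);
```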
Use Consistent Field Order in Documents

Using a consistent field order in your documents improves insert performance.

For example, inserting these documents achieves optimal insert performance:
{
   "_id": ObjectId("6250a0ef02a1877734a9df57"),
   "timestamp": ISODate("2020-01-23T00:00:00.441Z"),
   "name": "sensor1",
   "range": 1
},
{
   "_id": ObjectId("6560a0ef02a1877734a9df66"),
   "timestamp": ISODate("2020-01-23T01:00:00.441Z"),
   "name": "sensor1",
   "range": 5
}
In contrast, these documents do not achieve optimal insert performance, because their field orders differ:
{
   "range": 1,
   "_id": ObjectId("6250a0ef02a1877734a9df57"),
   "name": "sensor1",
   "timestamp": ISODate("2020-01-23T00:00:00.441Z")
},
{
   "_id": ObjectId("6560a0ef02a1877734a9df66"),
   "name": "sensor1",
   "timestamp": ISODate("2020-01-23T01:00:00.441Z"),
   "range": 5
}
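If documents arrive from heterogeneous sources, you can normalize field order before insertion. A minimal sketch, assuming a fixed field list for this page's example documents (the toCanonicalOrder helper is illustrative, not a MongoDB API):

```javascript
// Rebuild each document with fields in one fixed order, so that every
// inserted document presents an identical field sequence.
const FIELD_ORDER = ["_id", "timestamp", "name", "range"];

function toCanonicalOrder(doc) {
  const ordered = {};
  for (const field of FIELD_ORDER) {
    if (field in doc) ordered[field] = doc[field];
  }
  return ordered;
}

const doc = { range: 1, name: "sensor1", timestamp: "2020-01-23T00:00:00.441Z" };
// normalized lists timestamp, then name, then range.
const normalized = toCanonicalOrder(doc);
```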
Increase the Number of Clients

Increasing the number of clients writing data to your collections can improve performance.
Disable Retryable Writes

To write data with multiple clients, you must disable retryable writes. Retryable writes for time series collections do not combine writes from multiple clients.

To learn more about retryable writes and how to disable them, see retryable writes.
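For example, retryable writes can be disabled for a connection by setting the retryWrites option to false in the connection string (the host and port here are placeholders):

```shell
# Connect with retryable writes disabled (mongosh):
mongosh "mongodb://localhost:27017/?retryWrites=false"
```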
Optimize Compression

To optimize data compression for time series collections, perform the following actions.

Omit Fields Containing Empty Objects and Arrays from Documents

To optimize compression, if your data contains empty objects or arrays, omit the empty fields from your documents.

For example, consider the following documents:
{
   "timestamp": ISODate("2020-01-23T00:00:00.441Z"),
   "coordinates": [1.0, 2.0]
},
{
   "timestamp": ISODate("2020-01-23T00:00:10.441Z"),
   "coordinates": []
},
{
   "timestamp": ISODate("2020-01-23T00:00:20.441Z"),
   "coordinates": [3.0, 5.0]
}
The alternation between coordinates fields with populated values and an empty array results in a schema change for the compressor. The schema change causes the second and third documents in the sequence to remain uncompressed.
In contrast, the following documents, where the empty array is omitted, receive the benefit of optimal compression:
{
   "timestamp": ISODate("2020-01-23T00:00:00.441Z"),
   "coordinates": [1.0, 2.0]
},
{
   "timestamp": ISODate("2020-01-23T00:00:10.441Z")
},
{
   "timestamp": ISODate("2020-01-23T00:00:20.441Z"),
   "coordinates": [3.0, 5.0]
}
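A pre-insert cleanup step can drop such fields before documents reach the collection. A minimal sketch (the omitEmptyFields helper is an illustrative assumption, not a MongoDB API; it only inspects top-level fields):

```javascript
// Remove top-level fields whose value is an empty array or an empty
// plain object, so the compressor sees a stable schema across documents.
function omitEmptyFields(doc) {
  const cleaned = {};
  for (const [key, value] of Object.entries(doc)) {
    const isEmptyArray = Array.isArray(value) && value.length === 0;
    // Restrict the check to plain objects so values such as Date
    // instances are never stripped.
    const isEmptyObject =
      value !== null &&
      typeof value === "object" &&
      value.constructor === Object &&
      Object.keys(value).length === 0;
    if (!isEmptyArray && !isEmptyObject) cleaned[key] = value;
  }
  return cleaned;
}

const doc = { timestamp: "2020-01-23T00:00:10.441Z", coordinates: [] };
// cleaned keeps only the timestamp field.
const cleaned = omitEmptyFields(doc);
```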
Round Numeric Data to Few Decimal Places

Round numeric data to the precision required for your application. Rounding numeric data to fewer decimal places improves the compression ratio.
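For example, a reading can be rounded before insertion (the choice of two decimal places here is an arbitrary illustration; use whatever precision your application requires):

```javascript
// Round a numeric reading to a fixed number of decimal places.
function roundTo(value, decimals) {
  const factor = 10 ** decimals;
  return Math.round(value * factor) / factor;
}

const reading = 21.4786239;
const rounded = roundTo(reading, 2);
// rounded → 21.48
```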
Optimize Query Performance

Set Appropriate Bucket Granularity

When you create a time series collection, MongoDB groups incoming time series data into buckets. By accurately setting granularity, you control how frequently data is bucketed based on the ingestion rate of your data.
Starting in MongoDB 6.3, you can use the custom bucketing parameters bucketMaxSpanSeconds and bucketRoundingSeconds to specify bucket boundaries and more precisely control how time series data is bucketed.
You can improve performance by setting the granularity or custom bucketing parameters to the best match for the time span between incoming measurements from the same data source. For example, if you are recording weather data from thousands of sensors but only record data from each sensor once per 5 minutes, you can either set granularity to "minutes" or set the custom bucketing parameters to 300 (seconds).
In this case, setting the granularity to hours groups up to a month's worth of data ingest events into a single bucket, resulting in longer traversal times and slower queries. Setting it to seconds leads to multiple buckets per polling interval, many of which might contain only a single document.
The following table shows the maximum time interval included in one bucket of data when using a given granularity value:

granularity | Maximum time span per bucket
---|---
seconds | 1 hour
minutes | 24 hours
hours | 30 days
Create Secondary Indexes
To improve query performance, create one or more secondary indexes on your timeField and metaField to support common query patterns. In versions 6.3 and higher, MongoDB automatically creates a secondary index on the timeField and metaField.
Query metaFields on Sub-Fields

MongoDB reorders the metaFields of time series collections, which may cause servers to store data in a different field order than applications. If metaFields are objects, queries on entire metaFields may produce inconsistent results, because metaField order may vary between servers and applications. To optimize queries on time series metaFields, query on scalar sub-fields rather than entire metaFields.

The following example inserts documents into a time series collection:
db.weather.insertMany( [
   {
      "metaField": { "sensorId": 5578, "type": "temperature" },
      "timestamp": ISODate( "2021-05-18T00:00:00.000Z" ),
      "temp": 12
   },
   {
      "metaField": { "sensorId": 5578, "type": "temperature" },
      "timestamp": ISODate( "2021-05-18T04:00:00.000Z" ),
      "temp": 11
   }
] )
The following query on the sensorId and type scalar sub-fields returns the first document that matches the query criteria:
db.weather.findOne( {
   "metaField.sensorId": 5578,
   "metaField.type": "temperature"
} )
Example output:
{
   _id: ObjectId("6572371964eb5ad43054d572"),
   metaField: { sensorId: 5578, type: 'temperature' },
   timestamp: ISODate( "2021-05-18T00:00:00.000Z" ),
   temp: 12
}