This page describes best practices to improve performance and data usage for time series collections.
Compression Best Practices
To optimize data compression for time series collections, perform the following actions:
Omit Fields Containing Empty Objects and Arrays from Documents
If your data contains empty objects, arrays, or strings, omit the empty fields from your documents to optimize compression.
For example, consider the following documents:
{
timestamp: ISODate("2020-01-23T00:00:00.441Z"),
coordinates: [1.0, 2.0]
},
{
timestamp: ISODate("2020-01-23T00:00:10.441Z"),
coordinates: []
},
{
timestamp: ISODate("2020-01-23T00:00:20.441Z"),
coordinates: [3.0, 5.0]
}
Alternating between coordinates fields with populated values and coordinates fields with an empty array results in a schema change for the compressor. The schema change causes the second and third documents in the sequence to remain uncompressed.
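One way to apply this practice in the client is to drop empty objects, arrays, and strings before inserting. The following is a minimal sketch; omitEmptyFields is a hypothetical helper name, not a driver API:

```javascript
// Remove fields whose value is an empty object, array, or string,
// so alternating empty/populated values don't force a schema change.
function omitEmptyFields(doc) {
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    const isEmptyArray = Array.isArray(value) && value.length === 0;
    const isEmptyString = value === "";
    const isEmptyObject =
      value !== null &&
      typeof value === "object" &&
      !Array.isArray(value) &&
      !(value instanceof Date) &&
      Object.keys(value).length === 0;
    if (!isEmptyArray && !isEmptyString && !isEmptyObject) {
      out[key] = value;
    }
  }
  return out;
}

const cleaned = omitEmptyFields({
  timestamp: new Date("2020-01-23T00:00:10.441Z"),
  coordinates: [],
});
// cleaned keeps only the timestamp field
```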
Optimize compression by omitting the fields with empty values, as shown in the following documents:
{
timestamp: ISODate("2020-01-23T00:00:00.441Z"),
coordinates: [1.0, 2.0]
},
{
timestamp: ISODate("2020-01-23T00:00:10.441Z")
},
{
timestamp: ISODate("2020-01-23T00:00:20.441Z"),
coordinates: [3.0, 5.0]
}
Round Numeric Data to Few Decimal Places
Round numeric data to the precision that your application requires. Rounding numeric data to fewer decimal places improves the compression ratio.
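For instance, if two decimal places are enough for your application, round before inserting. A small sketch; roundTo is a hypothetical helper name:

```javascript
// Round a reading to a fixed number of decimal places before insert;
// fewer distinct trailing digits compress better.
function roundTo(value, places) {
  const factor = 10 ** places;
  return Math.round(value * factor) / factor;
}

const measurement = {
  timestamp: new Date("2021-05-18T00:00:00.000Z"),
  temperature: roundTo(21.476953, 2),
};
// measurement.temperature is 21.48
```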
Inserts Best Practices
To optimize insert performance for time series collections, perform the following actions:
Batch Document Writes
When inserting multiple documents:
- To avoid network roundtrips, use a single insertMany() statement as opposed to multiple insertOne() statements.
- If possible, insert data that contains identical metaField values in the same batches.
- Set the ordered parameter to false.
For example, if you have two sensors that correspond to two metaField values, sensor A and sensor B, a batch that contains multiple measurements from a single sensor incurs the cost of one insert, rather than one insert per measurement.
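In the client, you can sort a mixed batch by its metaField value before calling insertMany so that measurements from the same sensor are adjacent. A sketch; sortByMeta is a hypothetical helper, not a driver API:

```javascript
// Order a batch so documents with the same metaField value are contiguous,
// letting the server apply each sensor's measurements together.
function sortByMeta(batch) {
  return [...batch].sort((a, b) =>
    a.metaField.sensor.localeCompare(b.metaField.sensor)
  );
}

const batch = [
  { metaField: { sensor: "sensorB" }, temperature: 20 },
  { metaField: { sensor: "sensorA" }, temperature: 10 },
  { metaField: { sensor: "sensorB" }, temperature: 25 },
  { metaField: { sensor: "sensorA" }, temperature: 12 },
];
const ordered = sortByMeta(batch);
// ordered: sensorA, sensorA, sensorB, sensorB
// then: db.temperatures.insertMany(ordered, { ordered: false })
```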
The following operation inserts six documents, but only incurs the cost of two inserts (one per metaField value), because the documents are ordered by sensor. The ordered parameter is set to false to improve performance:
db.temperatures.insertMany(
[
{
metaField: {
sensor: "sensorA"
},
timestamp: ISODate("2021-05-18T00:00:00.000Z"),
temperature: 10
},
{
metaField: {
sensor: "sensorA"
},
timestamp: ISODate("2021-05-19T00:00:00.000Z"),
temperature: 12
},
{
metaField: {
sensor: "sensorA"
},
timestamp: ISODate("2021-05-20T00:00:00.000Z"),
temperature: 13
},
{
metaField: {
sensor: "sensorB"
},
timestamp: ISODate("2021-05-18T00:00:00.000Z"),
temperature: 20
},
{
metaField: {
sensor: "sensorB"
},
timestamp: ISODate("2021-05-19T00:00:00.000Z"),
temperature: 25
},
{
metaField: {
sensor: "sensorB"
},
timestamp: ISODate("2021-05-20T00:00:00.000Z"),
temperature: 26
}
],
{ "ordered": false }
)
Use Consistent Field Order in Documents
Using a consistent field order in your documents improves insert performance.
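One way to guarantee a stable field order is to build every document through a single factory function. A sketch; makeReading is a hypothetical name:

```javascript
// Always emit fields in the same order so every inserted document
// presents the same schema to the server.
function makeReading(id, timestamp, name, range) {
  return { _id: id, timestamp, name, range };
}

const first = makeReading(
  "6250a0ef02a1877734a9df57",
  new Date("2020-01-23T00:00:00.441Z"),
  "sensor1",
  1
);
const second = makeReading(
  "6560a0ef02a1877734a9df66",
  new Date("2020-01-23T01:00:00.441Z"),
  "sensor1",
  5
);
// Both documents list fields as: _id, timestamp, name, range
```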
For example, inserting the following documents, all of which have the same field order, results in optimal insert performance.
{
_id: ObjectId("6250a0ef02a1877734a9df57"),
timestamp: ISODate("2020-01-23T00:00:00.441Z"),
name: "sensor1",
range: 1
},
{
_id: ObjectId("6560a0ef02a1877734a9df66"),
timestamp: ISODate("2020-01-23T01:00:00.441Z"),
name: "sensor1",
range: 5
}
In contrast, the following documents do not achieve optimal insert performance, because their field orders differ:
{
range: 1,
_id: ObjectId("6250a0ef02a1877734a9df57"),
name: "sensor1",
timestamp: ISODate("2020-01-23T00:00:00.441Z")
},
{
_id: ObjectId("6560a0ef02a1877734a9df66"),
name: "sensor1",
timestamp: ISODate("2020-01-23T01:00:00.441Z"),
range: 5
}
Increase the Number of Clients
Increasing the number of clients that write data to your collections can improve performance.
Sharding Best Practices
To optimize sharding on your time series collection, perform the following action:
Use the metaField as your Shard Key
Using the metaField to shard your collection provides sufficient cardinality as a shard key for time series collections.
Note
Starting in MongoDB 8.0, the use of the timeField as a shard key in time series collections is deprecated.
Query Best Practices
To optimize queries on your time series collection, perform the following actions:
Set a Strategic metaField When Creating the Collection
Your choice of metaField has the biggest impact on optimizing queries in your application.
- Select fields that rarely or never change as part of your metaField.
- If possible, select identifiers or other stable values that are common in filter expressions as part of your metaField.
- Avoid selecting fields that are not used for filtering as part of your metaField. Instead, use those fields as measurements.
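For example, a weather sensor might keep its stable identifiers in the metaField and its changing readings as top-level measurements. A hypothetical document shape (field names are illustrative):

```javascript
// Stable identifiers go in metaField; values that change per reading
// stay outside it as measurements.
const reading = {
  metaField: { sensorId: 5578, region: "north" }, // rarely changes, common in filters
  timestamp: new Date("2021-05-18T00:00:00.000Z"),
  temperature: 12, // measurement: changes every reading
  humidity: 0.4,   // measurement: not used for filtering
};
```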
For more information, see metaField Considerations.
Set Appropriate Bucket Granularity
When you create a time series collection, MongoDB groups incoming time series data into buckets. By accurately setting granularity, you control how frequently data is bucketed based on the ingestion rate of your data.
Starting in MongoDB 6.3, you can use the custom bucketing parameters bucketMaxSpanSeconds and bucketRoundingSeconds to specify bucket boundaries and more precisely control how time series data is bucketed.
You can improve performance by setting the granularity or custom bucketing parameters to the best match for the time span between incoming measurements from the same data source. For example, if you are recording weather data from thousands of sensors but only record data from each sensor once per 5 minutes, you can either set granularity to "minutes" or set the custom bucketing parameters to 300 (seconds).
In this case, setting the granularity to hours groups up to a month's worth of data ingest events into a single bucket, resulting in longer traversal times and slower queries. Setting it to seconds leads to multiple buckets per polling interval, many of which might contain only a single document.
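Conceptually, with bucketRoundingSeconds set to 300, the start time of a new bucket is rounded down to the nearest 300-second boundary. The arithmetic looks like this (bucketStart is an illustrative helper, not a server API):

```javascript
// Round a timestamp down to the nearest bucketRoundingSeconds boundary;
// this is the start time a newly opened bucket would use.
function bucketStart(date, bucketRoundingSeconds) {
  const seconds = Math.floor(date.getTime() / 1000);
  const rounded = seconds - (seconds % bucketRoundingSeconds);
  return new Date(rounded * 1000);
}

const start = bucketStart(new Date("2021-05-18T00:07:42.000Z"), 300);
// start is 2021-05-18T00:05:00.000Z
```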
The following table shows the maximum time interval included in one bucket of data when using a given granularity value:
granularity | Maximum time interval per bucket |
|---|---|
seconds | 1 hour |
minutes | 24 hours |
hours | 30 days |
Create Secondary Indexes
To improve query performance, create one or more secondary indexes on your timeField and metaField to support common query patterns. In versions 6.3 and higher, MongoDB creates a secondary index on the timeField and metaField automatically.
Additional Index Best Practices
- Use the metaField index for filtering and equality.
- Use the timeField and other indexed fields for range queries.
- General indexing strategies also apply to time series collections. For more information, see Indexing Strategies.
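A typical query that matches these guidelines pairs an equality predicate on a metaField sub-field with a range predicate on the timeField. The filter document (field names are illustrative) looks like:

```javascript
// Equality on a metaField sub-field plus a range on the timeField;
// a compound index such as { "metaField.sensorId": 1, timestamp: 1 }
// can serve this shape.
const filter = {
  "metaField.sensorId": 5578,
  timestamp: {
    $gte: new Date("2021-05-18T00:00:00.000Z"),
    $lt: new Date("2021-05-19T00:00:00.000Z"),
  },
};
// e.g. db.weather.find(filter)
```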
Query the metaField on Sub-Fields
MongoDB reorders the metaField of time series collections, which may cause servers to store data in a different field order than applications. If a metaField is an object, queries on the metaField may produce inconsistent results because metaField order may vary between servers and applications. To optimize queries on a time series metaField, query scalar sub-fields of the metaField rather than the entire metaField.
The following example inserts documents into a time series collection:
db.weather.insertMany( [
{
metaField: { sensorId: 5578, type: "temperature" },
timestamp: ISODate( "2021-05-18T00:00:00.000Z" ),
temp: 12
},
{
metaField: { sensorId: 5578, type: "temperature" },
timestamp: ISODate( "2021-05-18T04:00:00.000Z" ),
temp: 11
}
] )
The following query on the sensorId and type scalar sub-fields returns the first document that matches the query criteria:
db.weather.findOne( {
"metaField.sensorId": 5578,
"metaField.type": "temperature"
} )
Example output:
{
_id: ObjectId("6572371964eb5ad43054d572"),
metaField: { sensorId: 5578, type: 'temperature' },
timestamp: ISODate( "2021-05-18T00:00:00.000Z" ),
temp: 12
}
Use $group Instead of Distinct()
Due to the unique data structure of time series collections, MongoDB can't efficiently index them for distinct values. Avoid using the distinct command or db.collection.distinct() helper method on time series collections. Instead, use a $group aggregation to group documents by distinct values.
For example, to query for distinct meta.type values on documents where meta.project = 10, instead of:
db.foo.distinct("meta.type", {"meta.project": 10})
Use:
db.foo.createIndex({"meta.project":1, "meta.type":1})
db.foo.aggregate([{$match: {"meta.project": 10}},
{$group: {_id: "$meta.type"}}])
This works as follows:
- Creating a compound index on meta.project and meta.type supports the aggregation.
- The $match stage filters for documents where meta.project = 10.
- The $group stage uses meta.type as the group key to output one document per unique value.