Time series collections generally behave like normal collections, with some exceptions. For information on time series collection behavior and structure, see Time Series Collections.
metaField Considerations
A metaField should rarely change and can be any data type. A metaField can be an object and can contain subfields. Once you define a field as the metaField, you can change the value of the metaField but you cannot redefine the metaField as another field. For example, if you create time series documents with the metaField defined as field A, you cannot later convert a field B to be the metaField. However, if the value of A is an object, you can add new subfields to A.
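As a minimal sketch of the rule above, the following defines a time series collection whose metaField is an object. The collection name `weather`, the field names `timestamp`, `sensor`, and `temp`, and the helper `createWeatherCollection` are illustrative assumptions, not names from this page. The `db` argument is assumed to be a database handle that exposes `createCollection`, such as `db` in mongosh.

```javascript
// Hypothetical helper: creates a time series collection with an object metaField.
// "weather", "timestamp", and "sensor" are illustrative names.
function createWeatherCollection(db) {
  return db.createCollection("weather", {
    timeseries: {
      timeField: "timestamp", // field that holds each document's date
      metaField: "sensor",    // fixed at creation; cannot later be redefined as another field
    },
  });
}

// Example documents. The second adds a subfield to the metaField object,
// which is allowed because the metaField itself is still "sensor".
const exampleDocs = [
  { timestamp: new Date("2024-01-01T00:00:00Z"), sensor: { id: 5578 }, temp: 12 },
  { timestamp: new Date("2024-01-01T00:05:00Z"), sensor: { id: 5578, type: "temperature" }, temp: 11 },
];
```

In mongosh, `createWeatherCollection(db)` followed by `db.weather.insertMany(exampleDocs)` would exercise both the definition and the added subfield.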
Note
Using an array as a metaField may cause unexpected collection behavior because array equality depends on specific order.
MongoDB uses the metaField to partition data for efficient organization and retrieval. When you create a time series collection, MongoDB groups documents into buckets. Documents within a bucket share an identical metaField value and have timeField values that are close together.
The number of buckets in a time series collection depends on the number of unique metaField values. Collections with fine-grained or dynamic metaField values may generate more sparsely packed, short-lived buckets than collections with simple metaField values that rarely or never change. Fine-grained and dynamic metaField values typically decrease storage and query efficiency.
metaField Best Practices
Select fields that rarely or never change as part of your metaField. If possible, select identifiers or other stable values that are common in filter expressions as part of your metaField. Avoid selecting fields that are not used for filtering as part of your metaField. Instead, use those fields as measurements.
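The following sketch shows one way a document might be shaped under these practices. All field names (`metadata`, `sensorId`, `region`, `temperature`, `batteryLevel`) are assumptions chosen for illustration. Stable identifiers used in filters sit inside the metaField, while values that change per reading stay at the top level as measurements.

```javascript
// Illustrative document shape; every field name here is hypothetical.
const reading = {
  timestamp: new Date("2024-01-01T00:00:00Z"),
  metadata: { sensorId: 5578, region: "eu-west" }, // stable, commonly filtered on
  temperature: 12.4,  // measurement: changes with every reading
  batteryLevel: 0.93, // measurement: not used for filtering, so kept out of the metaField
};

// A typical filter that benefits from this layout: it selects on a metaField
// subfield and a time range.
const filter = {
  "metadata.sensorId": 5578,
  timestamp: {
    $gte: new Date("2024-01-01T00:00:00Z"),
    $lt: new Date("2024-01-02T00:00:00Z"),
  },
};
```

A filter like this could be passed to `find()` on the collection, assuming the collection was created with `metaField: "metadata"` and `timeField: "timestamp"`.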
Storage and Cardinality
When you insert data into a time series collection, the internal collection automatically organizes the data into an optimized storage format using buckets. If a suitable bucket exists, MongoDB inserts new data into that bucket. If a suitable bucket does not exist, MongoDB creates a new bucket. To optimize storage, choose a metaField that rarely changes to create time series collections with fewer, more densely packed buckets.
Collections with fine-grained or changing metaField values generate many sparsely packed, short-lived buckets, increasing the cardinality of your collection. Increasing cardinality leads to decreased storage and query efficiency.
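To make the insert-or-create behavior concrete, here is a deliberately simplified model. It is not MongoDB's actual bucketing implementation. It assumes a bucket is "suitable" when its metaField value matches and the new timestamp falls within a maximum span of the bucket's first timestamp, and it ignores other limits such as document count and bucket size. The function `countBuckets` and all field names are hypothetical.

```javascript
// Simplified model of bucket assignment (an assumption, not MongoDB internals).
function countBuckets(docs, maxSpanMs) {
  const openBuckets = new Map(); // serialized meta value -> { start, count }
  let created = 0;
  for (const doc of docs) {
    const key = JSON.stringify(doc.meta); // order-sensitive for arrays
    const bucket = openBuckets.get(key);
    if (bucket && doc.time - bucket.start < maxSpanMs) {
      bucket.count += 1; // a suitable bucket exists: insert into it
    } else {
      openBuckets.set(key, { start: doc.time, count: 1 }); // otherwise create a new bucket
      created += 1;
    }
  }
  return created;
}

const HOUR_MS = 60 * 60 * 1000;
// Twelve readings, one every 5 minutes (0 to 55 minutes).
const times = Array.from({ length: 12 }, (_, i) => i * 5 * 60 * 1000);

// Stable metaField: the same value on every reading.
const stableDocs = times.map((time) => ({ time, meta: { sensorId: 1 } }));
// Fine-grained metaField: a subfield that changes on every reading.
const dynamicDocs = times.map((time, i) => ({ time, meta: { sensorId: 1, requestId: i } }));
// Array metaField: the same elements in a different order count as different values.
const arrayDocs = [
  { time: 0, meta: ["a", "b"] },
  { time: 1000, meta: ["b", "a"] },
];

const stableBuckets = countBuckets(stableDocs, HOUR_MS);   // 1
const dynamicBuckets = countBuckets(dynamicDocs, HOUR_MS); // 12
const arrayBuckets = countBuckets(arrayDocs, HOUR_MS);     // 2
```

Under this model, the stable metaField packs all twelve readings into one bucket, while the changing metaField produces twelve single-document buckets, which is the sparse, high-cardinality pattern described above.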
Granularity
You can use the granularity parameter to specify how frequently MongoDB buckets your time series data based on the data ingestion rate. The following table shows the maximum time interval included in one bucket of data when using a given granularity value:
| granularity | Maximum bucket time span |
|---|---|
| seconds | 1 hour |
| minutes | 24 hours |
| hours | 30 days |
By default, granularity is set to seconds. You can improve performance by setting the granularity value to the closest match to the time span between incoming measurements from the same data source. For example, if you are recording weather data from thousands of sensors but only record data from each sensor once per 5 minutes, set granularity to "minutes". The less frequently you append new documents, the greater the storage and performance benefits of coarser granularity.
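One possible reading of "closest match" is sketched below. The helper `closestGranularity` and its log-scale distance are assumptions made for illustration; the text does not define how closeness is measured. The field names in the options object are also hypothetical.

```javascript
// Base unit of each granularity value, in seconds.
const UNIT_SECONDS = { seconds: 1, minutes: 60, hours: 3600 };

// Hypothetical heuristic: choose the granularity whose unit is nearest, on a
// log scale, to the interval between measurements from one data source.
function closestGranularity(intervalSeconds) {
  let best = "seconds"; // matches the documented default
  let bestDistance = Infinity;
  for (const [name, unit] of Object.entries(UNIT_SECONDS)) {
    const distance = Math.abs(Math.log(intervalSeconds / unit));
    if (distance < bestDistance) {
      best = name;
      bestDistance = distance;
    }
  }
  return best;
}

// Weather sensors that each report once per 5 minutes.
const weatherOptions = {
  timeseries: {
    timeField: "timestamp", // illustrative field name
    metaField: "sensor",    // illustrative field name
    granularity: closestGranularity(5 * 60), // "minutes"
  },
};
```

An options object like `weatherOptions` could then be passed as the second argument to `db.createCollection`.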
Setting the granularity to hours groups up to a month's worth of data ingest events into a single bucket, resulting in longer traversal times and slower queries. Setting it to seconds leads to multiple buckets per polling interval, many of which might contain only a single document.
You should also consider typical queries when choosing the granularity value. For example, if you expect your queries to fetch 1 day of data at a time, use "minutes". A finer granularity, like "seconds", creates buckets that cover one hour. This requires more buckets to represent the same data, which negatively affects storage and query performance. A coarser granularity, like "hours" (which has a 30-day bucket span), requires queries to fetch 30 days of data at a time and then filter out most of it.
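The arithmetic behind this trade-off can be sketched using the maximum bucket spans from the table above. The helper `bucketsPerWindow` is hypothetical, and it assumes a single metaField value with buckets that are filled to their maximum span and aligned with the query window, which real workloads will not match exactly.

```javascript
// Maximum bucket span per granularity, in seconds (from the table above).
const MAX_BUCKET_SPAN_SECONDS = {
  seconds: 60 * 60,            // 1 hour
  minutes: 24 * 60 * 60,       // 24 hours
  hours: 30 * 24 * 60 * 60,    // 30 days
};

// Hypothetical helper: rough lower bound on buckets read for a query window,
// and the fraction of one bucket's span that the window actually needs.
function bucketsPerWindow(windowSeconds, granularity) {
  const span = MAX_BUCKET_SPAN_SECONDS[granularity];
  return {
    minBuckets: Math.ceil(windowSeconds / span),
    usefulFraction: Math.min(1, windowSeconds / span),
  };
}

const ONE_DAY = 24 * 60 * 60;
const withSeconds = bucketsPerWindow(ONE_DAY, "seconds"); // 24 buckets, each fully used
const withMinutes = bucketsPerWindow(ONE_DAY, "minutes"); // 1 bucket, fully used
const withHours = bucketsPerWindow(ONE_DAY, "hours");     // 1 bucket, about 1/30 of its span needed
```

For a one-day query, "minutes" reads a single bucket that matches the window, "seconds" needs at least 24 buckets, and "hours" reads a bucket spanning up to 30 days only to discard most of it.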
For examples, see Set Granularity for Time Series Data.
Compression and Hardware
All time series collections use a compressed bucket format when you append data into opened or reopened buckets. Compressing time series data in the cache supports high cardinality workloads while preserving efficient query performance.
Zone Sharding
Zone sharding does not support time series collections. The balancer always distributes data in sharded time series collections evenly across all shards in the cluster.