IoT Power Consumption物联网功耗
Minimum MongoDB Version: 5.0 (due to use of time series collections, $setWindowFields stage & $integral operator)
Scenario情形
You are monitoring various air-conditioning units running in two buildings on an industrial campus. 你正在监测一个工业园区的两栋大楼里运行的各种空调机组。Every 30 minutes, a device in each unit sends the unit's current power consumption reading back to base, which a central database persists. 每30分钟,每个单元中的一个设备都会将单元的当前功耗读数发送回基站,由中央数据库保存。You want to analyse this data to see how much energy in kilowatt-hours (kWh) each air-conditioning unit has consumed over the last hour for each reading received. 您需要分析这些数据,以了解每个空调机组在过去一小时内收到的每个读数消耗了多少能量(千瓦时)。Furthermore, you want to compute the total energy consumed by all the air-conditioning units combined in each building for every hour.此外,还需要计算每小时每栋建筑中所有空调机组的总能耗。
Sample Data Population样本数据总体
Drop any old version of the database (if it exists) and then populate a new 删除数据库的任何旧版本(如果存在),然后用两栋不同建筑中空调机组每天3小时的设备读数填充新的device_readings collection with device readings spanning 3 hours of a day for air-conditioning units in two different buildings.device_readings集合。
db = db.getSiblingDB("book-iot-power-consumption");
db.dropDatabase();
// Use a time-series collection for optimal processing使用时间序列集合进行优化处理
// NOTE: This command can be commented out and the full example will still work这个命令可以被注释掉,完整的示例仍然有效
db.createCollection("device_readings", {
"timeseries": {
"timeField": "timestamp",
"metaField": "deviceID",
"granularity": "minutes"
}
});
// Create compound index to aid performance for partitionBy & sortBy of setWindowFields创建复合索引以帮助setWindowFields的partitionBy和sortBy的性能
db.device_readings.createIndex({"deviceID": 1, "timestamp": 1});
// Insert 18 records into the device readings collection将18条记录插入设备读数集合
db.device_readings.insertMany([
// 11:29am device readings
{
"buildingID": "Building-ABC",
"deviceID": "UltraAirCon-111",
"timestamp": ISODate("2021-07-03T11:29:59Z"),
"powerKilowatts": 8,
},
{
"buildingID": "Building-ABC",
"deviceID": "UltraAirCon-222",
"timestamp": ISODate("2021-07-03T11:29:59Z"),
"powerKilowatts": 7,
},
{
"buildingID": "Building-XYZ",
"deviceID": "UltraAirCon-666",
"timestamp": ISODate("2021-07-03T11:29:59Z"),
"powerKilowatts": 10,
},
// 11:59am device readings
{
"buildingID": "Building-ABC",
"deviceID": "UltraAirCon-222",
"timestamp": ISODate("2021-07-03T11:59:59Z"),
"powerKilowatts": 9,
},
{
"buildingID": "Building-ABC",
"deviceID": "UltraAirCon-111",
"timestamp": ISODate("2021-07-03T11:59:59Z"),
"powerKilowatts": 8,
},
{
"buildingID": "Building-XYZ",
"deviceID": "UltraAirCon-666",
"timestamp": ISODate("2021-07-03T11:59:59Z"),
"powerKilowatts": 11,
},
// 12:29pm device readings
{
"buildingID": "Building-ABC",
"deviceID": "UltraAirCon-222",
"timestamp": ISODate("2021-07-03T12:29:59Z"),
"powerKilowatts": 9,
},
{
"buildingID": "Building-ABC",
"deviceID": "UltraAirCon-111",
"timestamp": ISODate("2021-07-03T12:29:59Z"),
"powerKilowatts": 9,
},
{
"buildingID": "Building-XYZ",
"deviceID": "UltraAirCon-666",
"timestamp": ISODate("2021-07-03T12:29:59Z"),
"powerKilowatts": 10,
},
// 12:59pm device readings
{
"buildingID": "Building-ABC",
"deviceID": "UltraAirCon-222",
"timestamp": ISODate("2021-07-03T12:59:59Z"),
"powerKilowatts": 8,
},
{
"buildingID": "Building-ABC",
"deviceID": "UltraAirCon-111",
"timestamp": ISODate("2021-07-03T12:59:59Z"),
"powerKilowatts": 8,
},
{
"buildingID": "Building-XYZ",
"deviceID": "UltraAirCon-666",
"timestamp": ISODate("2021-07-03T12:59:59Z"),
"powerKilowatts": 11,
},
// 13:29pm device readings
{
"buildingID": "Building-ABC",
"deviceID": "UltraAirCon-222",
"timestamp": ISODate("2021-07-03T13:29:59Z"),
"powerKilowatts": 9,
},
{
"buildingID": "Building-ABC",
"deviceID": "UltraAirCon-111",
"timestamp": ISODate("2021-07-03T13:29:59Z"),
"powerKilowatts": 9,
},
{
"buildingID": "Building-XYZ",
"deviceID": "UltraAirCon-666",
"timestamp": ISODate("2021-07-03T13:29:59Z"),
"powerKilowatts": 10,
},
// 13:59pm device readings
{
"buildingID": "Building-ABC",
"deviceID": "UltraAirCon-222",
"timestamp": ISODate("2021-07-03T13:59:59Z"),
"powerKilowatts": 8,
},
{
"buildingID": "Building-ABC",
"deviceID": "UltraAirCon-111",
"timestamp": ISODate("2021-07-03T13:59:59Z"),
"powerKilowatts": 8,
},
{
"buildingID": "Building-XYZ",
"deviceID": "UltraAirCon-666",
"timestamp": ISODate("2021-07-03T13:59:59Z"),
"powerKilowatts": 11,
},
]);
Aggregation Pipeline聚合管道
Define a pipeline ready to perform an aggregation to calculate the energy an air-conditioning unit has consumed over the last hour for each reading received:定义一个准备执行聚合的管道,以计算空调机组在过去一小时内收到的每个读数所消耗的能量:
var pipelineRawReadings = [
// Calculate each unit's energy consumed over the last hour for each reading计算每个读数在最后一小时内消耗的每个单元的能量
{"$setWindowFields": {
"partitionBy": "$deviceID",
"sortBy": {"timestamp": 1},
"output": {
"consumedKilowattHours": {
"$integral": {
"input": "$powerKilowatts",
"unit": "hour",
},
"window": {
"range": [-1, "current"],
"unit": "hour",
},
},
},
}},
];
Define a pipeline ready to compute the total energy consumed by all the air-conditioning units combined in each building for every hour:定义一个管道,准备计算每小时每栋建筑中所有空调机组的总能耗:
var pipelineBuildingsSummary = [
// Calculate each unit's energy consumed over the last hour for each reading计算每个读数在最后一小时内消耗的每个单元的能量
{"$setWindowFields": {
"partitionBy": "$deviceID",
"sortBy": {"timestamp": 1},
"output": {
"consumedKilowattHours": {
"$integral": {
"input": "$powerKilowatts",
"unit": "hour",
},
"window": {
"range": [-1, "current"],
"unit": "hour",
},
},
},
}},
// Sort each reading by unit/device and then by timestamp按单位/设备排序,然后按时间戳排序
{"$sort": {
"deviceID": 1,
"timestamp": 1,
}},
// Group readings together for each hour for each device using the last calculated energy consumption field for each hour使用每小时最后计算的能耗字段,将每个设备每小时的读数分组在一起
{"$group": {
"_id": {
"deviceID": "$deviceID",
"date": {
"$dateTrunc": {
"date": "$timestamp",
"unit": "hour",
}
},
},
"buildingID": {"$last": "$buildingID"},
"consumedKilowattHours": {"$last": "$consumedKilowattHours"},
}},
// Sum together the energy consumption for the whole building for each hour across all the units in the building将整栋建筑中所有单元每小时的能耗相加
{"$group": {
"_id": {
"buildingID": "$buildingID",
"dayHour": {"$dateToString": {"format": "%Y-%m-%d %H", "date": "$_id.date"}},
},
"consumedKilowattHours": {"$sum": "$consumedKilowattHours"},
}},
// Sort the results by each building and then by each hourly summary按每栋建筑,然后按每小时摘要对结果进行排序
{"$sort": {
"_id.buildingID": 1,
"_id.dayHour": 1,
}},
// Make the results more presentable with meaningful field names使用有意义的字段名称使结果更加直观
{"$set": {
"buildingID": "$_id.buildingID",
"dayHour": "$_id.dayHour",
"_id": "$$REMOVE",
}},
];
Execution执行
Execute an aggregation using the pipeline to calculate the energy an air-conditioning unit has consumed over the last hour for each reading received and also view its explain plan:使用管道执行汇总,以计算空调机组在过去一小时内收到的每个读数所消耗的能量,并查看其解释计划:
db.device_readings.aggregate(pipelineRawReadings);
db.device_readings.explain("executionStats").aggregate(pipelineRawReadings);
Execute an aggregation using the pipeline to compute the total energy consumed by all the air-conditioning units combined in each building for every hour and also view its explain plan:使用管道执行聚合,以计算每小时每栋建筑中所有空调机组的总能耗,并查看其解释计划:
db.device_readings.aggregate(pipelineBuildingsSummary);
db.device_readings.explain("executionStats").aggregate(pipelineBuildingsSummary);
Expected Results预期结果
For the pipeline to calculate the energy an air-conditioning unit has consumed over the last hour for each reading received, results like the following should be returned (redacted for brevity - only showing the first few records):对于计算空调机组在最后一个小时内收到的每个读数所消耗的能量的管道,应返回以下结果(为简洁起见,进行了编辑,仅显示前几条记录):
[
{
_id: ObjectId("60ed5e679ea1f9f74814ca2b"),
buildingID: 'Building-ABC',
deviceID: 'UltraAirCon-111',
timestamp: ISODate("2021-07-03T11:29:59.000Z"),
powerKilowatts: 8,
consumedKilowattHours: 0
},
{
_id: ObjectId("60ed5e679ea1f9f74814ca2f"),
buildingID: 'Building-ABC',
deviceID: 'UltraAirCon-111',
timestamp: ISODate("2021-07-03T11:59:59.000Z"),
powerKilowatts: 8,
consumedKilowattHours: 4
},
{
_id: ObjectId("60ed5e679ea1f9f74814ca32"),
buildingID: 'Building-ABC',
deviceID: 'UltraAirCon-111',
timestamp: ISODate("2021-07-03T12:29:59.000Z"),
powerKilowatts: 9,
consumedKilowattHours: 8.25
},
{
_id: ObjectId("60ed5e679ea1f9f74814ca35"),
buildingID: 'Building-ABC',
deviceID: 'UltraAirCon-111',
timestamp: ISODate("2021-07-03T12:59:59.000Z"),
powerKilowatts: 8,
consumedKilowattHours: 8.5
},
{
_id: ObjectId("60ed5e679ea1f9f74814ca38"),
buildingID: 'Building-ABC',
deviceID: 'UltraAirCon-111',
timestamp: ISODate("2021-07-03T13:29:59.000Z"),
powerKilowatts: 9,
consumedKilowattHours: 8.5
},
...
...
]
For the pipeline to compute the total energy consumed by all the air-conditioning units combined in each building for every hour, the following results should be returned:对于计算每栋建筑中所有空调机组每小时消耗的总能量的管道,应返回以下结果:
[
{
buildingID: 'Building-ABC',
dayHour: '2021-07-03 11',
consumedKilowattHours: 8
},
{
buildingID: 'Building-ABC',
dayHour: '2021-07-03 12',
consumedKilowattHours: 17.25
},
{
buildingID: 'Building-ABC',
dayHour: '2021-07-03 13',
consumedKilowattHours: 17
},
{
buildingID: 'Building-XYZ',
dayHour: '2021-07-03 11',
consumedKilowattHours: 5.25
},
{
buildingID: 'Building-XYZ',
dayHour: '2021-07-03 12',
consumedKilowattHours: 10.5
},
{
buildingID: 'Building-XYZ',
dayHour: '2021-07-03 13',
consumedKilowattHours: 10.5
}
]
Observations观察
-
Integral Trapezoidal Rule.积分梯形规则。As documented in the MongoDB Manual,如MongoDB手册中所述,$integral"returns an approximation for the mathematical integral value, which is calculated using the trapezoidal rule".$integral“返回数学积分值的近似值,该值是使用梯形规则计算的”。For non-mathematicians, this explanation may be hard to understand.对于非数学家来说,这种解释可能很难理解。You may find it easier to comprehend the behaviour of the通过研究下面的插图和下面的解释,您可能会发现更容易理解$integraloperator by studying the illustration below and the explanation that follows:$integral运算符的行为:
Essentially the trapezoidal rule determines the area of a region between two points under a graph by matching the region with a trapezoid shape that approximately fits this region and then calculating the area of this trapezoid.本质上,梯形规则通过将区域与近似拟合该区域的梯形形状匹配,然后计算该梯形的面积,来确定图下两点之间的区域的面积。You can see a set of points on the illustrated graph with the matched trapezoid shape underneath each pair of points.你可以在图示的图表上看到一组点,每对点下面都有匹配的梯形。For this IoT Power Consumption example, the points on the graph represent an air-conditioning unit's power readings captured every 30 minutes.对于这个物联网功耗示例,图上的点表示每30分钟采集一次的空调机组功率读数。The Y-axis is the power rate in Kilowatts, and the X-axis is time to indicate when the device captured each reading.Y轴是以千瓦为单位的功率,X轴是指示设备何时捕获每个读数的时间。Consequently, for this example, the energy consumed by the air-conditioning unit for a given hour's span is the area of the hour's specific section under the graph.因此,在这个例子中,空调单元在给定小时跨度内消耗的能量是图表下该小时特定部分的面积。This section's area is approximately the area of the two trapezoids shown.该部分的面积近似于所示的两个梯形的面积。Using the对于您在$integraloperator for the window of time you define in the$setWindowFieldsstage, you are asking for this approximate area to be calculated, which is the Kilowatt-hours consumed by the air-conditioning unit in one hour.$setWindowFields阶段中定义的时间窗口,使用$integral运算符,您要求计算该近似面积,即空调机组在一小时内消耗的千瓦时。 -
Window Range Definition.窗口范围定义。For every captured document representing a device reading, this example's pipeline identifies a window of 1-hour of previous documents relative to this current document.对于表示设备读取的每个捕获文档,此示例的管道标识了一个窗口,该窗口包含相对于当前文档的1小时以前的文档。The pipeline uses this set of documents as the input for the管道使用这组文档作为$integraloperator.$integral运算符的输入。It defines this window range in the setting它在设置range: [-1, "current"], unit: "hour".range: [-1, "current"], unit: "hour"中定义了这个窗口范围。The pipeline assigns the output of the管道将$integralcalculation to a new field calledconsumedKilowattHours.$integral计算的输出分配给一个名为consumedKilowattHours的新字段。 -
One Hour Range Vs Hours Output.一小时范围与小时输出。The fact that the管道中的$setWindowFieldsstage in the pipeline definesunit: "hour"in two places may appear redundant at face value.$setWindowFields阶段在两个位置定义了unit: "hour",这一事实可能在表面上显得多余。However, this is not the case, and each serves a different purpose.然而,事实并非如此,每种方法都有不同的用途。As described in the previous observation,如前一次观察所述,unit: "hour"for the"window"option helps dictate the size of the window of the previous number of documents to analyse."window"选项的unit: "hour"有助于确定要分析的前一批文档的窗口大小。However,然而,unit: "hour"for the$integraloperator defines that the output should be in hours ("Kilowatt-hours" in this example), yielding the resultconsumedKilowattHours: 8.5for one of the processed device readings.$integral运算符的unit: "hour"定义了输出应以小时为单位(本例中为“千瓦时”),从而产生一个处理过的设备读数消耗的结果consumedKilowattHours: 8.5。However, if the pipeline defined this但是,如果管道将$integralparameter to be"unit": "minute"instead, which is perfectly valid, the output value would be510Kilowatt-minutes (i.e. 8.5 x 60 minutes).$integral参数定义为"unit": "minute"(完全有效),则输出值将为510千瓦分(即8.5 x 60分钟)。 -
Optional Time Series Collection.可选时间序列集合。This example uses a time series collection to store sequences of device measurements over time efficiently.此示例使用时间序列集合来有效地存储随时间变化的设备测量序列。Employing a time series collection is optional, as shown in the使用时间序列集合是可选的,如示例代码中的NOTEJavascript comment in the example code.NOTEJavascript注释所示。The aggregation pipeline does not need to be changed and achieves the same output if you use a regular collection instead.聚合管道不需要更改,如果使用常规集合,则可以获得相同的输出。However, when dealing with large data sets, the aggregation will complete quicker by employing a time series collection.然而,当处理大型数据集时,通过使用时间序列集合,聚合将更快地完成。 -
Index For Partition By & Sort By.分区依据和排序依据的索引。In this example, you define the index在本例中,您定义索引{deviceID: 1, timestamp: 1}to optimise the use of the combination of thepartitionByandsortByparameters in the$setWindowFieldsstage.{deviceID: 1, timestamp: 1}以优化$setWindowFields阶段中partitionBy和sortBy参数组合的使用。This means that the aggregation runtime does not have to perform a slow in-memory sort based on these two fields, and it also avoids the pipeline stage memory limit of 100 MB.这意味着聚合运行时不必基于这两个字段执行缓慢的内存排序,而且它还避免了100MB的流水线阶段内存限制。It is beneficial to use this index regardless of whether you employ a regular collection or adopt a time series collection.无论您是采用常规集合还是采用时间序列集合,使用此索引都是有益的。