Group Data with the Bucket Pattern

The bucket pattern separates long series of data into distinct objects. Separating large data series into smaller groups can improve query access patterns and simplify application logic. Bucketing is useful when you have similar objects that relate to a central entity, such as stock trades made by a single user.

You can use the bucket pattern for pagination by grouping your data based on the elements that your application shows per page. This approach uses MongoDB's flexible data model to store data according to what your application needs.

Tip

Time series collections apply the bucket pattern automatically, and are suitable for most applications that involve bucketing time series data.

About this Task

Consider the following schema that tracks stock trades. The initial schema does not use the bucket pattern, and stores each trade in an individual document.

db.trades.insertMany(
  [
    {
      "ticker" : "MDB",
      "customerId": 123,
      "type" : "buy",
      "quantity" : 419,
      "date" : ISODate("2023-10-26T15:47:03.434Z")
    },
    {
      "ticker" : "MDB",
      "customerId": 123,
      "type" : "sell",
      "quantity" : 29,
      "date" : ISODate("2023-10-30T09:32:57.765Z")
    },
    {
      "ticker" : "GOOG",
      "customerId": 456,
      "type" : "buy",
      "quantity" : 50,
      "date" : ISODate("2023-10-31T11:16:02.120Z")
    }
  ]
)

The application shows stock trades made by a single customer at a time, and shows 10 trades per page. To simplify the application logic, use the bucket pattern to group the trades by customerId in groups of 10.

Steps

1

Group the data by customerId.

Reorganize the schema to have a single document for each customerId:

{
  "customerId": 123,
  "history": [
    {
      "type": "buy",
      "ticker": "MDB",
      "qty": 419,
      "date": ISODate("2023-10-26T15:47:03.434Z")
    },
    {
      "type": "sell",
      "ticker": "MDB",
      "qty": 29,
      "date": ISODate("2023-10-30T09:32:57.765Z")
    }
  ]
},
{
  "customerId": 456,
  "history": [
    {
      "type" : "buy",
      "ticker" : "GOOG",
      "quantity" : 50,
      "date" : ISODate("2023-10-31T11:16:02.120Z")
    }
  ]
}

With the bucket pattern:

  • Documents that share a customerId value are condensed into a single document, with the customerId being a top-level field.
  • Trades for that customer are grouped into an embedded array field, called history.

2

Add an identifier and count for each bucket.

db.trades.drop()

db.trades.insertMany(
  [
    {
      "_id": "123_1698349623",
      "customerId": 123,
      "count": 2,
      "history": [
        {
          "type": "buy",
          "ticker": "MDB",
          "qty": 419,
          "date": ISODate("2023-10-26T15:47:03.434Z")
        },
        {
          "type": "sell",
          "ticker": "MDB",
          "qty": 29,
          "date": ISODate("2023-10-30T09:32:57.765Z")
        }
      ]
    },
    {
      "_id": "456_1698765362",
      "customerId": 456,
      "count": 1,
      "history": [
        {
          "type" : "buy",
          "ticker" : "GOOG",
          "quantity" : 50,
          "date" : ISODate("2023-10-31T11:16:02.120Z")
        }
      ]
    }
  ]
)

The _id field value is a concatenation of the customerId and the time of the first trade in the history field, expressed in seconds since the Unix epoch.

The count field indicates how many elements are in that document's history array. The count field is used to implement pagination logic.
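
To illustrate how the _id and count fields relate to the original flat documents, here is a minimal sketch in plain JavaScript that groups individual trade documents into buckets. The helper names bucketId and toBuckets are hypothetical and not part of MongoDB. The sketch assumes numeric customerId values, JavaScript Date objects in the date field, and UTC-based epoch seconds, so the _id values it produces can differ from the sample values shown above depending on how your application converts the trade time.

```javascript
// Hypothetical helper: builds a bucket _id from a customerId and a trade date.
function bucketId(customerId, date) {
  const seconds = Math.floor(date.getTime() / 1000); // seconds since the Unix epoch (UTC)
  return `${customerId}_${seconds}`;
}

// Hypothetical helper: groups flat trade documents into buckets that hold
// at most pageSize trades for a single customer.
function toBuckets(trades, pageSize = 10) {
  // Sort by customer, then by date, so each bucket holds consecutive trades.
  const sorted = [...trades].sort(
    (a, b) => a.customerId - b.customerId || a.date - b.date
  );
  const buckets = [];
  for (const { customerId, ...trade } of sorted) {
    let bucket = buckets[buckets.length - 1];
    // Start a new bucket for a new customer or when the current bucket is full.
    if (!bucket || bucket.customerId !== customerId || bucket.count >= pageSize) {
      bucket = {
        _id: bucketId(customerId, trade.date), // keyed by the first trade in the bucket
        customerId,
        count: 0,
        history: [],
      };
      buckets.push(bucket);
    }
    bucket.history.push(trade);
    bucket.count += 1;
  }
  return buckets;
}
```

The returned array has the same shape as the documents passed to insertMany in this step, so each element corresponds to one page of trades for one customer.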

Next Steps

After you update your schema to use the bucket pattern, update your application logic for reading and writing data. See the following sections:

Query for Data with the Bucket Pattern

In the updated schema, each document contains data for a single page in the application. You can use the _id and count fields to determine how to return and update data.

To query for data on the appropriate page, use a regex query to return data for a specified customerId, and use skip to advance to the data for the correct page. The regex query on _id uses the default _id index, which results in performant queries without the need for an additional index.

The following query returns data for the first page of trades for customer 123:

db.trades.find( { "_id": /^123_/ } ).sort( { _id: 1 } ).limit(1)

To return data for later pages, specify a skip value that is one less than the number of the page you want to display. For example, to show data for page 10, run the following query:

db.trades.find( { "_id": /^123_/ } ).sort( { _id: 1 } ).skip(9).limit(1)

Note

The preceding query returns no results because the sample data only contains documents for the first page.
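
The page-to-skip arithmetic above can be wrapped in a small helper. The following is a minimal sketch, not an official API: pageQuery is a hypothetical name, and it assumes numeric customerId values so that no regex escaping is needed.

```javascript
// Hypothetical helper: translates a 1-based page number into the arguments
// used by the find() queries shown above.
function pageQuery(customerId, page) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be a positive integer");
  }
  return {
    filter: { _id: new RegExp(`^${customerId}_`) }, // anchored prefix match on _id
    sort: { _id: 1 },
    skip: page - 1, // page 1 skips nothing, page 10 skips 9 buckets
    limit: 1,       // each bucket document is one page
  };
}

// In mongosh, the result could be applied as:
//   const q = pageQuery(123, 10);
//   db.trades.find(q.filter).sort(q.sort).skip(q.skip).limit(q.limit)
```

Because _id is a string, the sort is lexicographic. That matches chronological order as long as the epoch-seconds suffix has the same number of digits in every bucket for a customer.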

Insert Data with the Bucket Pattern

Now that the schema uses the bucket pattern, update your application logic to insert new trades into the correct bucket. Use an update command to insert the trade into the appropriate bucket for the customerId.

The following command inserts a new trade for customerId: 123:

db.trades.updateOne( { "_id": /^123_/, "count": { $lt: 10 } },
  {
    "$push": {
      "history": {
        "type": "buy",
        "ticker": "MSFT",
        "qty": 42,
        "date": ISODate("2023-11-02T11:43:10")
      }
    },
    "$inc": { "count": 1 },
    "$setOnInsert": { "_id": "123_1698939791", "customerId": 123 }
  },
  { upsert: true }
)

The application displays 10 trades per page. The update filter searches for a document for customerId: 123 where the count is less than 10, meaning that the bucket does not yet contain a full page of data.

  • If there is a document that matches "_id": /^123_/ and its count is less than 10, the update command pushes the new trade into the matched document's history array.
  • If there is not a matching document, the update command inserts a new document with the new trade (because upsert is true). The _id field of the new document is a concatenation of the customerId and the trade time in seconds since the Unix epoch.

The logic for update commands avoids unbounded arrays by ensuring that no history array contains more than 10 elements.
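
To make the two cases concrete, the following minimal sketch mirrors the upsert logic in plain JavaScript against an in-memory array. It is an illustration only: addTrade is a hypothetical name, the buckets array stands in for the trades collection, and the real update runs atomically on the server rather than in application code.

```javascript
// In-memory sketch of the upsert logic; `buckets` stands in for the trades collection.
function addTrade(buckets, customerId, trade, pageSize = 10) {
  const prefix = `${customerId}_`;
  // Mirror the update filter: same customer and a bucket that is not yet full.
  let bucket = buckets.find(
    (b) => b._id.startsWith(prefix) && b.count < pageSize
  );
  if (!bucket) {
    // Mirror upsert with $setOnInsert: a new bucket keyed by the trade time
    // in seconds since the Unix epoch.
    const seconds = Math.floor(trade.date.getTime() / 1000);
    bucket = { _id: `${prefix}${seconds}`, customerId, count: 0, history: [] };
    buckets.push(bucket);
  }
  bucket.history.push(trade); // mirrors $push
  bucket.count += 1;          // mirrors $inc
  return bucket;
}
```

Calling addTrade eleven times for the same customer on an empty array produces one bucket with 10 trades and a second bucket with 1 trade, which is the behavior that keeps every history array bounded.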

After you run the update operation, the trades collection has the following documents:

[
  {
    _id: '123_1698349623',
    customerId: 123,
    count: 3,
    history: [
      {
        type: 'buy',
        ticker: 'MDB',
        qty: 419,
        date: ISODate("2023-10-26T15:47:03.434Z")
      },
      {
        type: 'sell',
        ticker: 'MDB',
        qty: 29,
        date: ISODate("2023-10-30T09:32:57.765Z")
      },
      {
        type: 'buy',
        ticker: 'MSFT',
        qty: 42,
        date: ISODate("2023-11-02T11:43:10.000Z")
      }
    ]
  },
  {
    _id: '456_1698765362',
    customerId: 456,
    count: 1,
    history: [
      {
        type: 'buy',
        ticker: 'GOOG',
        quantity: 50,
        date: ISODate("2023-10-31T11:16:02.120Z")
      }
    ]
  }
]

Results

After you implement the bucket pattern, you don't need to incorporate pagination logic to return results in your application. The way the data is stored matches the way it is used in the application.

Learn More