
Group Data with the Outlier Pattern

If your collection stores documents of generally the same size and shape, a drastically different document (an outlier) can cause performance issues for common queries.

Consider a collection that stores an array field. If a document contains many more array elements than other documents in the collection, you may need to handle that document differently in your schema.

Use the outlier pattern to isolate documents that don't match the expected shape from the rest of your collection. Your schema still maintains all of the same data, but common queries are not affected by a single large document.

Before You Begin

Before you modify your schema to handle outliers, consider the pros and cons of the outlier pattern:

Pros

The outlier pattern improves performance for commonly run queries. Queries that return typical documents do not also need to return large outlier documents.

The outlier pattern also handles edge cases in the application. For example, if your application typically displays 50 results from an array, no document holds 2,000 results that would disrupt the user experience.

Cons

The outlier pattern requires more complex logic to handle updates. If you frequently need to update your data, you may want to consider other schema design patterns. For more information, see Updates for Outliers.

About this Task

Consider a schema that tracks book sales. Typical documents in the collection look like this:

db.sales.insertOne(
  {
    "_id": 1,
    "title": "Invisible Cities",
    "year": 1972,
    "author": "Italo Calvino",
    "customers_purchased": [ "user00", "user01", "user02" ]
  }
)

The customers_purchased array is unbounded, meaning that as more customers purchase a book, the array grows larger. For most documents, this is not a problem because the store does not expect more than a few sales for a particular book.

Suppose that a new, popular book attracts a large number of purchases. Under the current schema design, that book's document becomes bloated, which negatively impacts performance. To address this issue, implement the outlier pattern for documents that have an atypically high number of sales.

Steps

1

Identify a threshold for outliers.

Given your schema's typical document structure, identify when a document becomes an outlier. The threshold can be based on what your application's UI requires or on the queries you run on your documents.

In this example, a book with more than 50 sales is an outlier.
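The threshold check can be expressed as a small predicate. This is a minimal plain-JavaScript sketch, assuming documents shaped like the sales examples in this section; OUTLIER_THRESHOLD, isOutlier, and makeCustomers are hypothetical names for illustration, not part of any MongoDB API.

```javascript
// Threshold from the example: a book with more than 50 sales is an outlier.
const OUTLIER_THRESHOLD = 50;

// Returns true when a sales document holds more purchases than the threshold.
// A missing customers_purchased field is treated as an empty array.
function isOutlier(doc, threshold = OUTLIER_THRESHOLD) {
  const customers = doc.customers_purchased ?? [];
  return customers.length > threshold;
}

// Hypothetical helper that builds "user00", "user01", ... for illustration only.
function makeCustomers(count, start = 0) {
  return Array.from({ length: count }, (_, i) => `user${String(start + i).padStart(2, "0")}`);
}

console.log(isOutlier({ _id: 1, customers_purchased: makeCustomers(3) }));  // false
console.log(isOutlier({ _id: 2, customers_purchased: makeCustomers(50) })); // false (exactly at the threshold)
console.log(isOutlier({ _id: 3, customers_purchased: makeCustomers(51) })); // true
```

Note that "more than 50" means a book with exactly 50 sales is still a typical document.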

2

Decide how to handle outliers.

When addressing large arrays, a common way to handle outliers is to store values beyond the threshold in a separate collection. For books that have more than 50 sales, store the extra customers_purchased values in a separate collection.
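Splitting a purchase list at the threshold might look like the following minimal sketch, written in plain JavaScript with no database connection. splitAtThreshold is a hypothetical helper: it keeps the first 50 entries for the sales document and returns the remainder for the separate collection. The generated user names only mimic the example data.

```javascript
// Splits a complete purchase list at the threshold: the first `threshold`
// entries stay in the sales document, the rest go to a separate collection.
function splitAtThreshold(customers, threshold = 50) {
  return {
    inline: customers.slice(0, threshold),
    overflow: customers.slice(threshold),
  };
}

// Illustrative data: user00 through user999 (1,000 purchases).
const customers = Array.from({ length: 1000 }, (_, i) => `user${String(i).padStart(2, "0")}`);
const { inline, overflow } = splitAtThreshold(customers);
console.log(inline.length, overflow.length); // 50 950
console.log(overflow[0]); // user50
```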

3

Add an indicator for outlier documents.

For books that have more than 50 sales, add a new document field called has_extras and set its value to true. This field indicates that additional sales are stored in a separate collection.

db.sales.insertOne(
  {
    "_id": 2,
    "title": "The Wooden Amulet",
    "year": 2023,
    "author": "Lesley Moreno",
    // Abbreviated: the array holds the first 50 purchases
    "customers_purchased": [ "user00", "user01", "user02", ... "user49" ],
    "has_extras": true
  }
)

4

Store additional sales in a separate collection.

Create a collection called extra_sales to store sales beyond the initial 50. Link documents in the extra_sales collection to the sales collection with a reference; here, the book_id field holds the _id of the corresponding sales document:

db.extra_sales.insertOne(
  {
    "book_id": 2,
    // Abbreviated: the array holds every purchase beyond the first 50
    "customers_purchased_extra": [ "user50", "user51", "user52", ... "user999" ]
  }
)
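Steps 3 and 4 together can be sketched as one function that turns a book and its full purchase list into the documents shown above. This is an illustrative sketch assuming the field names from this example; toOutlierDocuments is a hypothetical helper, and it only builds plain objects rather than talking to a database.

```javascript
const THRESHOLD = 50; // from the example: more than 50 sales makes an outlier

// Hypothetical helper: given a book's metadata and its full purchase list,
// build the sales document and, for outliers only, the extra_sales document.
function toOutlierDocuments(book, customers) {
  const salesDoc = { ...book, customers_purchased: customers.slice(0, THRESHOLD) };
  if (customers.length <= THRESHOLD) {
    // Typical book: no has_extras flag and no extra_sales document.
    return { salesDoc, extraSalesDoc: null };
  }
  // Outlier: flag the document and move the overflow to extra_sales,
  // referencing the book through book_id (the sales document's _id).
  salesDoc.has_extras = true;
  const extraSalesDoc = {
    book_id: book._id,
    customers_purchased_extra: customers.slice(THRESHOLD),
  };
  return { salesDoc, extraSalesDoc };
}

const book = { _id: 2, title: "The Wooden Amulet", year: 2023, author: "Lesley Moreno" };
const all = Array.from({ length: 1000 }, (_, i) => `user${String(i).padStart(2, "0")}`);
const { salesDoc, extraSalesDoc } = toOutlierDocuments(book, all);
console.log(salesDoc.customers_purchased.length, salesDoc.has_extras); // 50 true
console.log(extraSalesDoc.book_id, extraSalesDoc.customers_purchased_extra.length); // 2 950
```

In mongosh, you could pass salesDoc to db.sales.insertOne() and, when it is not null, extraSalesDoc to db.extra_sales.insertOne().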

Results

The outlier pattern prevents atypical documents from impacting query performance. The resulting schema avoids large documents in the collection while maintaining a full list of sales.

Consider an application page that shows information about a book and all users who bought that book. After implementing the outlier pattern, the page displays information for most books (typical documents) quickly.

For popular books (outliers), the application runs an additional query on the extra_sales collection that matches on book_id. To improve performance for this query, you can create an index on the book_id field.
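The read path might look like the following sketch, which uses in-memory arrays as hypothetical stand-ins for the sales and extra_sales collections; allCustomers and users are illustrative names. Against a real database, the second lookup would be a query such as db.extra_sales.find({ book_id: 2 }), which an index created with db.extra_sales.createIndex({ book_id: 1 }) can support.

```javascript
// Illustrative helper: "user00", "user01", ... (naming mimics the example data).
const users = (count, start = 0) =>
  Array.from({ length: count }, (_, i) => `user${String(start + i).padStart(2, "0")}`);

// In-memory stand-ins for the sales and extra_sales collections.
const sales = [
  { _id: 1, title: "Invisible Cities", customers_purchased: users(3) },
  { _id: 2, title: "The Wooden Amulet", customers_purchased: users(50), has_extras: true },
];
const extraSales = [{ book_id: 2, customers_purchased_extra: users(950, 50) }];

// Returns every customer who bought the book, or null if the book is unknown.
function allCustomers(bookId) {
  const book = sales.find((doc) => doc._id === bookId);
  if (!book) return null;
  const customers = [...book.customers_purchased];
  // Only outliers pay for the second lookup; typical books need a single read.
  if (book.has_extras === true) {
    // Stand-in for a query on extra_sales matching book_id; handles one or more documents.
    for (const extra of extraSales.filter((doc) => doc.book_id === bookId)) {
      customers.push(...extra.customers_purchased_extra);
    }
  }
  return customers;
}

console.log(allCustomers(1).length); // 3
console.log(allCustomers(2).length); // 1000
```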

Updates for Outliers

You need to handle updates for outlier documents differently from typical documents. The logic you use to perform updates depends on your schema design.

To perform updates that account for outliers in the preceding schema, implement the following application logic:

  • Check whether the document being updated has has_extras set to true.

    • If has_extras is missing or false, add the new purchases to the customers_purchased array in the sales collection.

      • If the resulting customers_purchased array contains more than 50 elements, set has_extras to true.
    • If has_extras is true, add the new purchases to the extra_sales collection for the corresponding book_id.
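The update logic above can be sketched as follows. This minimal sketch assumes in-memory Map objects as stand-ins for the two collections and a hypothetical recordPurchases function; it mirrors the listed steps rather than any official implementation.

```javascript
const THRESHOLD = 50; // from the example

// In-memory stand-ins for the two collections (hypothetical).
const sales = new Map();      // _id -> sales document
const extraSales = new Map(); // book_id -> extra_sales document

// Applies a batch of new purchases following the listed steps.
function recordPurchases(bookId, newCustomers) {
  const book = sales.get(bookId);
  if (!book) throw new Error(`No sales document with _id ${bookId}`);

  if (book.has_extras !== true) {
    // has_extras is missing or false: append to the sales document.
    book.customers_purchased.push(...newCustomers);
    // If the array now exceeds the threshold, flag the document as an outlier.
    if (book.customers_purchased.length > THRESHOLD) book.has_extras = true;
    return;
  }

  // has_extras is true: append to the extra_sales document for this book_id,
  // creating it on first use.
  if (!extraSales.has(bookId)) {
    extraSales.set(bookId, { book_id: bookId, customers_purchased_extra: [] });
  }
  extraSales.get(bookId).customers_purchased_extra.push(...newCustomers);
}

sales.set(1, { _id: 1, title: "Invisible Cities", customers_purchased: ["user00"] });
recordPurchases(1, ["user01", "user02"]);
console.log(sales.get(1).customers_purchased.length, sales.get(1).has_extras); // 3 undefined
```

Because the listed steps append the batch that crosses the threshold to the sales document, customers_purchased can end up somewhat longer than 50 elements; only later batches go to extra_sales. Against a real database, the two branches would roughly correspond to db.sales.updateOne() with $push and $each, and db.extra_sales.updateOne() with $push, $each, and the upsert: true option.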

Learn More