Database Manual / Data Modeling / Schema Design Anti-Patterns

Avoid Unbounded Arrays避免无边界数组

Storing arrays as field values lets you embed data and ensure that data that is accessed together is stored together. However, if you do not limit the number of elements in an array, your documents might exceed the 16MB BSON document size limit. 将数组存储为字段值可以嵌入数据,并确保一起访问的数据存储在一起。但是,如果不限制数组中的元素数量,文档可能会超过16MB BSON文档大小限制An unbounded array can strain application resources and decrease index performance.无界数组会使应用程序资源紧张并降低索引性能。

Instead of embedding entire datasets, use subsetting and referencing to bound arrays, which can improve performance and maintain manageable document sizes. When you subset data, you select only the necessary parts of your data to work with, which reduces memory usage and processing time by focusing only on relevant data. 与其嵌入整个数据集,不如使用子集和引用绑定数组,这可以提高性能并保持可管理的文档大小。当你对数据进行子集时,你只选择数据中必要的部分来处理,这通过只关注相关数据来减少内存使用和处理时间。When you reference data, you link to external data sources rather than embedding them directly in your documents. This approach enhances performance and reduces document size. By using subsetting and referencing, you can bound arrays and manage your date more efficiently.引用数据时,您链接到外部数据源,而不是将它们直接嵌入到文档中。这种方法提高了性能并减小了文档大小。通过使用子集和引用,您可以绑定数组并更有效地管理日期。

Example示例

Consider the following schema that tracks book reviews for a bookstore application. The initial schema uses an array for the reviews field.考虑以下模式,该模式跟踪书店应用程序的书评。初始模式使用数组作为reviews字段。

{
title: "Harry Potter",
author: "J.K. Rowling",
publisher: "Scholastic",
reviews: [
{
user: "Alice",
review: "Great book!",
rating: 5
},
{
user: "Bob",
review: "Didn't like it!",
rating: 1
},
{
user: "Charlie",
review: "Not bad, but could be better.",
rating: 3
}
]
}

In this schema, the reviews field is an unbounded array. Every time a new review is created for this book, the application adds a new sub-document to the reviews array. As more reviews are added, the array can grow too large and strain application resources.在这个模式中,reviews字段是一个无界数组。每当为这本书创建新的评论时,应用程序都会向评论数组中添加一个子文档。随着评论数量的增加,数组可能会变得太大,给应用程序资源带来压力。

In this example, the bookstore application only needs to show three book reviews per book. To avoid unbounded arrays, you can use the subset design pattern or document references, depending on your use case.在这个例子中,书店应用程序只需要为每本书显示三条书评。为了避免无界数组,您可以使用子集设计模式或文档引用,具体取决于用例。

Subset Pattern子集模式

Subsetting data is best for when you need quick access to data that is not frequently updated. Using the subset pattern, you can embed three of the reviews in the book document to return all required information in a single operation. 子集数据最适合您需要快速访问不经常更新的数据。使用子集模式,您可以在图书文档中嵌入三条评论,在一次操作中返回所有必需的信息。The other reviews are stored in a separate reviews collection. This schema design pattern provides the following benefits:其他评论存储在单独的reviews集合中。此模式设计模式具有以下优点:

  • Eliminate the unbounded array消除无界数组
  • Control the document size控制文档大小
  • Avoid use of multiple queries避免使用多个查询

The books collection:books集合:

db.books.insertOne( [
{
title: "Harry Potter",
author: "J.K. Rowling",
publisher: "Scholastic",
reviews: [
{
reviewer: "Alice",
review: "Great book!",
rating: 5
},
{
reviewer: "Charlie",
review: "Didn't like it.",
rating: 1
},
{
reviewer: "Bob",
review: "Not bad, but could be better.",
rating: 3
}
],
}
] )

The reviews collection:reviews(评论)集合:

db.reviews.insertMany( [
{
reviewer: "Jason",
review: "Did not enjoy!",
rating: 1
},
{
reviewer: "Pam",
review: "Favorite book!",
rating: 5
},
{
reviewer: "Bob",
review: "Not bad, but could be better.",
rating: 3
}
] )

This approach duplicates data which causes updates to be expensive. This approach is best if reviews are not frequently updated.这种方法复制数据,导致更新成本高昂。如果评论不经常更新,这种方法是最好的。

Reference Data参考数据

Referencing data is best for when you need to manage large or frequently updated datasets without inflating document sizes.当您需要管理大型或频繁更新的数据集而不扩大文档大小时,引用数据是最好的选择。

To reference data, store reviews in a separate collection and add a review_id field to the documents in the reviews collection. Use the review_id field to reference the reviews in the books collection.要引用数据,请将评论存储在单独的集合中,并将review_id字段添加到评论集合中的文档中。使用review_id字段引用books集合中的评论。

This approach solves the problem of the unbounded array, but it introduces latency because you need to query the reviews collection to retrieve review information for the books collection. Depending on your use case, this additional latency may be an acceptable trade-off to avoid the issues caused by unbounded arrays.这种方法解决了无界数组的问题,但它引入了延迟,因为您需要查询reviews集合来检索books集合的评论信息。根据使用情况,这种额外的延迟可能是可以接受的权衡,以避免无界数组造成的问题。

The books collection:books集合:

db.books.insertMany( [
{
title: "Harry Potter",
author: "J.K. Rowling",
publisher: "Scholastic",
reviews: ["review1", "review2", "review3"]
},
{
title: "Pride and Prejudice",
author: "Jane Austen",
publisher: "Penguin",
reviews: ["review4", "review5"]
}
] )

The reviews collection:reviews集合:

db.reviews.insertMany( [
{
review_id: "review1",
reviewer: "Jason",
review: "Did not enjoy!",
rating: 1
},
{
review_id: "review2",
reviewer: "Pam",
review: "Favorite book!",
rating: 5
},
{
review_id: "review3",
reviewer: "Bob",
review: "Not bad, but could be better.",
rating: 3
},
{
review_id: "review4",
reviewer: "Tina",
review: "Amazing!",
rating: 5
},
{
review_id: "review5",
reviewer: "Jacob",
review: "A little overrated",
rating: 4,
}
] )

Use $lookup to Join on an Array Field使用$lookup连接数组字段

If your books and reviews information is stored in separate collections, the application needs to perform a $lookup operation to join the data.如果booksreviews信息存储在单独的集合中,则应用程序需要执行$lookup操作来连接数据。

The following aggregation operation joins the books and reviews collection from the previous example.以下聚合操作将加入前面示例中的booksreviews集合。

db.books.aggregate( [
{
$lookup: {
from: "reviews",
localField: "reviews",
foreignField: "review_id",
as: "reviewDetails"
}
}
] )

The operation returns the following:该操作返回以下内容:

[
{
_id: ObjectId('665de81eeda086b5e22dbcc9'),
title: 'Harry Potter',
author: 'J.K. Rowling',
publisher: 'Scholastic',
reviews: [ 'review1', 'review2', 'review3' ],
reviewDetails: [
{
_id: ObjectId('665de82beda086b5e22dbccb'),
review_id: 'review1',
reviewer: 'Jason',
review: 'Did not enjoy!',
rating: 1
},
{
_id: ObjectId('665de82beda086b5e22dbccc'),
review_id: 'review2',
reviewer: 'Pam',
review: 'Favorite book!',
rating: 5
},
{
_id: ObjectId('665de82beda086b5e22dbccd'),
review_id: 'review3',
reviewer: 'Bob',
review: 'Not bad, but could be better.',
rating: 3
} ]
},
{
_id: ObjectId('665de81eeda086b5e22dbcca'),
title: 'Pride and Prejudice',
author: 'Jane Austen',
publisher: 'Penguin',
reviews: [ 'review4', 'review5' ],
reviewDetails: [
{
_id: ObjectId('665de82beda086b5e22dbcce'),
review_id: 'review4',
reviewer: 'Tina',
review: 'Amazing!',
rating: 5
},
{
_id: ObjectId('665de82beda086b5e22dbccf'),
review_id: 'review5',
reviewer: 'Jacob',
review: 'A little overrated',
rating: 4
} ]
}
]

In this example, the $lookup operation joins the books collection with the reviews collection using the reviews array in the book document and the review_id field in the reviews documents. 在这个例子中,$lookup操作使用book文档中的reviews数组和reviews文档中的review_id字段将books集合与reviews集合连接起来。The reviewDetails document stores the combined data.reviewDetails文档存储组合数据。

Learn More了解更多