Database Manual

Data Modeling

Data modeling refers to the organization of data within a database and the links between related entities. Data in MongoDB has a flexible schema model, which means that documents within a single collection are not required to have the same set of fields, and a field's data type can differ between documents in a collection.

Generally, documents in a collection share a similar structure. To ensure consistency in your data model, you can create schema validation rules.
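As a rough illustration, a validation rule can be written as a `$jsonSchema` document. The sketch below is plain Python with no database connection: the `validator` dictionary mirrors the general shape of such a rule, while `check` and `PY_TYPES` are hypothetical stand-ins that handle only `required` and two `bsonType` values. It is not MongoDB's validation engine, and the field names are assumptions made for the example.

```python
# Hypothetical validation rule for an "employees" collection.
validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "department"],
        "properties": {
            "name": {"bsonType": "string"},
            "age": {"bsonType": "int"},
        },
    }
}

# Simplified mapping from BSON type names to Python types (illustration only).
PY_TYPES = {"string": str, "int": int}


def check(doc, rule):
    """Return a list of problems; an empty list means the document passes."""
    schema = rule["$jsonSchema"]
    problems = [f"missing field: {f}" for f in schema["required"] if f not in doc]
    for field, spec in schema["properties"].items():
        if field in doc and not isinstance(doc[field], PY_TYPES[spec["bsonType"]]):
            problems.append(f"wrong type for field: {field}")
    return problems


ok = check({"name": "Ana", "department": "Sales", "age": 41}, validator)
bad = check({"name": "Ben", "age": "forty"}, validator)
```

Here `ok` is empty because the first document satisfies the rule, while `bad` reports the missing `department` field and the mistyped `age` field.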

Use Cases

The flexible data model lets you organize your data to match your application's needs. MongoDB is a document database, meaning you can embed related data in object and array fields.

A flexible schema is useful in the following scenarios:

  • Your company tracks which department each employee works in. You can embed department information inside of the employee collection to return relevant information in a single query.

  • Your e-commerce application shows the five most recent reviews when displaying a product. You can store the recent reviews in the same collection as the product data, and store older reviews in a separate collection because the older reviews are not accessed as frequently.

  • Your clothing store needs to create a single-page application for a product catalog. Different products have different attributes, and therefore use different document fields. However, you can store all of the products in the same collection.
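The product catalog scenario can be sketched in plain Python, with a list of dictionaries standing in for a collection. The document contents and the `find` helper are hypothetical and only mimic an equality query; no database is involved.

```python
# Hypothetical catalog: one collection, documents with different fields.
products = [
    {"_id": 1, "type": "shirt", "name": "Oxford shirt", "size": "M", "sleeve": "long"},
    {"_id": 2, "type": "shoe", "name": "Trail runner", "shoe_size": 42},
    {"_id": 3, "type": "hat", "name": "Wool beanie"},
]


def find(collection, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


# Query one product type even though each type has its own set of fields.
shoes = find(products, type="shoe")

# Field names used by each product type, to show that they differ.
field_sets = {doc["type"]: sorted(doc.keys()) for doc in products}
```

Each product type carries only the attributes that apply to it, yet all three documents live in the same list and are reachable through the same query helper.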

Schema Design: Differences between Relational and Document Databases

When you design a schema for a document database like MongoDB, there are a couple of important differences from relational databases to consider.

Relational Database Behavior | Document Database Behavior
You must determine a table's schema before you insert data. | Your schema can change over time as the needs of your application change.
You often need to join data from several different tables to return the data needed by your application. | The flexible data model lets you store data to match the way your application returns data, and avoid joins. Avoiding joins across multiple collections improves performance and reduces your deployment's workload.

Plan Your Schema

To ensure that your data model has a logical structure and achieves optimal performance, plan your schema prior to using your database at a production scale. To determine your data model, use the following schema design process:

  1. Identify your application's workload.
  2. Map relationships between objects in your collections.
  3. Apply design patterns.

Link Related Data

When you design your data model in MongoDB, consider the structure of your documents and the ways your application uses data from related entities.

To link related data, you can either:

  • Embed related data within a single document.
  • Store related data in a separate collection and access it with a reference.

Embedded Data

Embedded documents store related data in a single document structure. A document can contain arrays and sub-documents with related data. These denormalized data models allow applications to retrieve related data in a single database operation.

Data model with embedded fields that contain all related information.
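A minimal sketch of an embedded model, using a plain Python dictionary in place of a collection. The user document, its field values, and the `get_user` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical user document with embedded contact and access sub-documents.
users = {
    "u1": {
        "_id": "u1",
        "username": "123xyz",
        "contact": {"phone": "123-456-7890", "email": "xyz@example.com"},
        "access": {"level": 5, "group": "dev"},
    }
}


def get_user(user_id):
    """One lookup returns the user together with all embedded data."""
    return users[user_id]


user = get_user("u1")
email = user["contact"]["email"]
level = user["access"]["level"]
```

Because the contact and access details travel with the user document, a single read is enough to reach all of them.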

For many use cases in MongoDB, the denormalized data model is optimal.

To learn about the strengths and weaknesses of embedding documents, see Embedded Data Models.

References

References store relationships between data by including links, called references, from one document to another. For example, a customerId field in an orders collection indicates a reference to a document in a customers collection.

Applications can resolve these references to access the related data. Broadly, these are normalized data models.

Data model using references to link documents. Both the ``contact`` document and the ``access`` document contain a reference to the ``user`` document.
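The customerId example can be sketched in plain Python, with dictionaries and lists standing in for collections. The sample documents and the `resolve_customer` helper are hypothetical; the point is only that following a reference costs a second lookup.

```python
# Hypothetical collections: customers keyed by _id, orders as a list.
customers = {"c1": {"_id": "c1", "name": "Ana"}}
orders = [
    {"_id": "o1", "customerId": "c1", "total": 30},
    {"_id": "o2", "customerId": "c1", "total": 12},
]


def resolve_customer(order):
    """Follow the customerId reference with a second lookup."""
    return customers[order["customerId"]]


# Both orders point at the same customer document, which is stored once.
names = [resolve_customer(order)["name"] for order in orders]
```

The customer's name is stored in one place, so changing it requires one write, at the cost of an extra lookup whenever an order needs it.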

To learn about the strengths and weaknesses of using references, see References.

Additional Data Modeling Considerations

The following factors can impact how you plan your data model.

Data Duplication and Consistency

When you embed related data in a single document, you may duplicate data between two collections. Duplicating data lets your application query related information about multiple entities in a single query while logically separating entities in your model.

For example, a products collection stores the five most recent reviews in a product document. Those reviews are also stored in a reviews collection, which contains all product reviews. When a new review is written, the following writes occur:

  • The review is inserted into the reviews collection.
  • The array of recent reviews in the products collection is updated with $pop and $push.
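The two writes above can be imitated in plain Python. This is a sketch of the effect only: list operations stand in for $pop and $push, and the names `MAX_RECENT`, `add_review`, `recent_reviews`, and `product_id` are assumptions made for the example.

```python
MAX_RECENT = 5  # number of reviews kept on the product document

reviews = []  # stands in for the reviews collection (all reviews)
product = {"_id": "p1", "name": "Trail runner", "recent_reviews": []}


def add_review(product, text):
    """Record a review in both places, keeping only the newest few embedded."""
    review = {"product_id": product["_id"], "text": text}
    reviews.append(review)            # write 1: insert into the reviews collection
    recent = product["recent_reviews"]
    if len(recent) >= MAX_RECENT:
        recent.pop(0)                 # like $pop with -1: drop the first (oldest) entry
    recent.append(dict(review))       # like $push: append a duplicate of the new review


for i in range(7):
    add_review(product, f"review {i}")

recent_texts = [r["text"] for r in product["recent_reviews"]]
```

After seven reviews, the reviews list holds all seven while the product document holds copies of only the last five.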

If the duplicated data is not updated often, then there is minimal additional work required to keep the two collections consistent. However, if the duplicated data is updated often, using a reference to link related data may be a better approach.

Before you duplicate data, consider the following factors:

  • How often the duplicated data needs to be updated.
  • The performance benefit for reads when data is duplicated.

To learn more, see Handle Duplicate Data.

Indexing

To improve performance for queries that your application runs frequently, create indexes on commonly queried fields. As your application grows, monitor your deployment's index use to ensure that your indexes are still supporting relevant queries.

Hardware Constraints

When you design your schema, consider your deployment's hardware, especially the amount of available RAM. Larger documents use more RAM, which may cause your application to read from disk and degrade performance. When possible, design your schema so only relevant fields are returned by queries. This practice ensures that your application's working set does not grow unnecessarily large.
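Returning only relevant fields can be pictured with a small plain-Python sketch. The `project` helper is hypothetical and merely mimics the effect of a query projection on one document; the product fields are assumptions.

```python
product = {
    "_id": "p1",
    "name": "Trail runner",
    "price": 89.0,
    "description": "x" * 10_000,  # large field that a listing page never shows
}


def project(doc, fields):
    """Return a copy of doc containing only the requested fields."""
    return {key: doc[key] for key in fields if key in doc}


# The listing page needs two small fields, not the large description.
listing = project(product, ["name", "price"])
```

The projected result omits the large `description` field, so far less data has to be handed to the application for each product shown.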

Single Document Atomicity

In MongoDB, a write operation is atomic on the level of a single document, even if the operation modifies multiple embedded documents within a single document. This means that if an update operation affects several sub-documents, either all of those sub-documents are updated, or the operation fails entirely and no updates occur.

A denormalized data model with embedded data combines all related data in a single document instead of normalizing across multiple documents and collections. This data model allows atomic operations, in contrast to a normalized model where operations affect multiple documents.
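The all-or-nothing behavior can be modeled in plain Python. This sketch shows only the outcome an application observes, not how MongoDB implements it; `atomic_update`, `bad_change`, `good_change`, and the order document are hypothetical.

```python
import copy

order = {"_id": "o1", "items": [{"sku": "a", "qty": 1}, {"sku": "b", "qty": 2}]}


def atomic_update(doc, mutate):
    """Apply mutate to a copy; replace doc's contents only if it succeeds."""
    draft = copy.deepcopy(doc)
    mutate(draft)  # may raise; the original is untouched if it does
    doc.clear()
    doc.update(draft)


def bad_change(draft):
    draft["items"][0]["qty"] = 10  # first sub-document is modified...
    raise ValueError("second sub-document update failed")  # ...then the update fails


def good_change(draft):
    for item in draft["items"]:
        item["qty"] += 1


try:
    atomic_update(order, bad_change)
except ValueError:
    pass
qty_after_failure = order["items"][0]["qty"]  # unchanged by the failed update

atomic_update(order, good_change)
qtys = [item["qty"] for item in order["items"]]
```

The failed update leaves no partial change behind, while the successful one changes every embedded item together.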

For more information, see Atomicity.

Learn More

Learn how to structure documents and define your schema in MongoDB University's Data Modeling course.