Model One-to-One Relationships with Embedded Documents

Overview

This page describes a data model that uses embedded documents to describe a one-to-one relationship between connected data. Embedding connected data in a single document can reduce the number of read operations required to obtain data. In general, you should structure your schema so that your application receives all of its required information in a single read operation.

Embedded Document Pattern

Consider the following example that maps patron and address relationships. The example illustrates the advantage of embedding over referencing if you need to view one data entity in the context of the other. In this one-to-one relationship between patron and address data, the address belongs to the patron.

In the normalized data model, the address document contains a reference to the patron document.

// patron document
{
   _id: "joe",
   name: "Joe Bookreader"
}
// address document
{
   patron_id: "joe", // reference to patron document
   street: "123 Fake Street",
   city: "Faketon",
   state: "MA",
   zip: "12345"
}

If the address data is frequently retrieved together with the name information, then with referencing, your application needs to issue multiple queries to resolve the reference. A better data model is to embed the address data in the patron data, as in the following document:

{
   _id: "joe",
   name: "Joe Bookreader",
   address: {
      street: "123 Fake Street",
      city: "Faketon",
      state: "MA",
      zip: "12345"
   }
}

With the embedded data model, your application can retrieve the complete patron information with one query.
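
The difference in read operations between the two models can be sketched in plain JavaScript. This is a minimal illustration rather than MongoDB driver code: in-memory arrays stand in for collections, and the `findOne` helper and `reads` counter are hypothetical names introduced only to count lookups.

```javascript
// In-memory stand-ins for collections (assumption: not a real database).
const patrons = [{ _id: "joe", name: "Joe Bookreader" }];
const addresses = [
  { patron_id: "joe", street: "123 Fake Street", city: "Faketon", state: "MA", zip: "12345" }
];
const patronsEmbedded = [
  {
    _id: "joe",
    name: "Joe Bookreader",
    address: { street: "123 Fake Street", city: "Faketon", state: "MA", zip: "12345" }
  }
];

// Hypothetical helper: returns the first matching document and counts one read.
let reads = 0;
function findOne(collection, predicate) {
  reads += 1;
  return collection.find(predicate) || null;
}

// Normalized model: one read for the patron, a second to resolve the reference.
reads = 0;
const patron = findOne(patrons, (doc) => doc._id === "joe");
const address = findOne(addresses, (doc) => doc.patron_id === patron._id);
const referencedReads = reads;

// Embedded model: a single read returns the patron together with the address.
reads = 0;
const embedded = findOne(patronsEmbedded, (doc) => doc._id === "joe");
const embeddedReads = reads;

console.log(referencedReads, embeddedReads); // prints "2 1"
```

Both approaches end up with the same name and address data; the embedded model simply gets there in one lookup instead of two.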

Subset Pattern

A potential problem with the embedded document pattern is that it can lead to large documents that contain fields the application does not need. This unnecessary data can cause extra load on your server and slow down read operations. Instead, you can use the subset pattern to retrieve the most frequently accessed subset of the data in a single database call.

Consider an application that shows information on movies. The database contains a movie collection with the following schema:

{
  "_id": 1,
  "title": "The Arrival of a Train",
  "year": 1896,
  "runtime": 1,
  "released": ISODate("1896-01-25"),
  "poster": "http://ia.media-imdb.com/images/M/MV5BMjEyNDk5MDYzOV5BMl5BanBnXkFtZTgwNjIxMTEwMzE@._V1_SX300.jpg",
  "plot": "A group of people are standing in a straight line along the platform of a railway station, waiting for a train, which is seen coming at some distance. When the train stops at the platform, ...",
  "fullplot": "A group of people are standing in a straight line along the platform of a railway station, waiting for a train, which is seen coming at some distance. When the train stops at the platform, the line dissolves. The doors of the railway-cars open, and people on the platform help passengers to get off.",
  "lastupdated": ISODate("2015-08-15T10:06:53"),
  "type": "movie",
  "directors": [ "Auguste Lumière", "Louis Lumière" ],
  "imdb": {
    "rating": 7.3,
    "votes": 5043,
    "id": 12
  },
  "countries": [ "France" ],
  "genres": [ "Documentary", "Short" ],
  "tomatoes": {
    "viewer": {
      "rating": 3.7,
      "numReviews": 59
    },
    "lastUpdated": ISODate("2020-01-29T00:02:53")
  }
}

Currently, the movie collection contains several fields that the application does not need in order to show a simple overview of a movie, such as fullplot and rating information. Instead of storing all of the movie data in a single collection, you can split the collection into two collections:

  • The movie collection contains basic information on a movie. This is the data that the application loads by default:

    // movie collection
    {
      "_id": 1,
      "title": "The Arrival of a Train",
      "year": 1896,
      "runtime": 1,
      "released": ISODate("1896-01-25"),
      "type": "movie",
      "directors": [ "Auguste Lumière", "Louis Lumière" ],
      "countries": [ "France" ],
      "genres": [ "Documentary", "Short" ]
    }
  • The movie_details collection contains additional, less frequently accessed data for each movie:

    // movie_details collection
    {
      "_id": 156,
      "movie_id": 1, // reference to the movie collection
      "poster": "http://ia.media-imdb.com/images/M/MV5BMjEyNDk5MDYzOV5BMl5BanBnXkFtZTgwNjIxMTEwMzE@._V1_SX300.jpg",
      "plot": "A group of people are standing in a straight line along the platform of a railway station, waiting for a train, which is seen coming at some distance. When the train stops at the platform, ...",
      "fullplot": "A group of people are standing in a straight line along the platform of a railway station, waiting for a train, which is seen coming at some distance. When the train stops at the platform, the line dissolves. The doors of the railway-cars open, and people on the platform help passengers to get off.",
      "lastupdated": ISODate("2015-08-15T10:06:53"),
      "imdb": {
        "rating": 7.3,
        "votes": 5043,
        "id": 12
      },
      "tomatoes": {
        "viewer": {
          "rating": 3.7,
          "numReviews": 59
        },
        "lastUpdated": ISODate("2020-01-29T00:02:53")
      }
    }

This method improves read performance because the application reads less data to fulfill its most common request. The application can make an additional database call to fetch the less frequently accessed data if needed.
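
As a rough sketch of the split described above, the following plain JavaScript divides one full document into a movie document and a movie_details document. The `splitMovie` helper and the `OVERVIEW_FIELDS` list are hypothetical names chosen for illustration, and the sample document is an abbreviated version of the one shown earlier.

```javascript
// Fields the overview needs, taken from the movie collection example above.
const OVERVIEW_FIELDS = [
  "_id", "title", "year", "runtime", "released",
  "type", "directors", "countries", "genres"
];

// Hypothetical helper: split one full document into a movie document and a
// movie_details document that points back to it through movie_id.
function splitMovie(full, detailsId) {
  const movie = {};
  const details = { _id: detailsId, movie_id: full._id };
  for (const [key, value] of Object.entries(full)) {
    if (OVERVIEW_FIELDS.includes(key)) {
      movie[key] = value;      // frequently accessed, loaded by default
    } else {
      details[key] = value;    // less frequently accessed, loaded on demand
    }
  }
  return { movie, details };
}

// Abbreviated sample document (assumption: only a few fields are included).
const full = {
  _id: 1,
  title: "The Arrival of a Train",
  year: 1896,
  runtime: 1,
  type: "movie",
  imdb: { rating: 7.3, votes: 5043, id: 12 }
};

const { movie, details } = splitMovie(full, 156);
console.log(Object.keys(movie));   // [ '_id', 'title', 'year', 'runtime', 'type' ]
console.log(Object.keys(details)); // [ '_id', 'movie_id', 'imdb' ]
```

The overview request then reads only the smaller movie document, and the details document is fetched by `movie_id` when a user asks for more.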

Tip

When considering where to split your data, put the most frequently accessed portion of the data in the collection that the application loads first.

Tip
See also:

To learn how to use the subset pattern to model one-to-many relationships between collections, see Model One-to-Many Relationships with Embedded Documents.

Trade-Offs of the Subset Pattern

Using smaller documents that contain the more frequently accessed data reduces the overall size of the working set. These smaller documents result in improved read performance and make more memory available for the application.

However, it is important to understand your application and the way it loads data. If you split your data into multiple collections improperly, your application will often need to make multiple trips to the database and rely on JOIN operations to retrieve all of the data that it needs.

In addition, splitting your data into many small collections may increase required database maintenance, as it may become difficult to track what data is stored in which collection.
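
In MongoDB, the JOIN operations mentioned above are typically expressed with the `$lookup` aggregation stage. As a sketch, assuming the movie and movie_details collections from the subset pattern example, a pipeline that reassembles a full movie might look like the following. The variable name `pipeline` and the output field name `details` are illustrative choices, not part of the original example.

```javascript
// Aggregation pipeline that joins one movie with its details document.
const pipeline = [
  { $match: { _id: 1 } },         // select the movie to display
  {
    $lookup: {
      from: "movie_details",      // collection to join with
      localField: "_id",          // field in the movie collection
      foreignField: "movie_id",   // matching field in movie_details
      as: "details"               // output array field (name is an assumption)
    }
  }
];

// In mongosh this would be run as: db.movie.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

If most requests end up needing a pipeline like this, the data has probably been split in the wrong place for that application.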
