If you need to store historical data dating back a number of years, storing your oldest data in the same database as your more recent data can negatively impact performance, especially if the old data does not need to be accessed frequently. Instead, you can design your schema to archive old data and move that data to a separate storage location.如果您需要存储多年前的历史数据,将最旧的数据与最新的数据存储在同一个数据库中可能会对性能产生负面影响,特别是在不需要频繁访问旧数据的情况下。相反,您可以设计模式来存档旧数据并将该数据移动到单独的存储位置。
About this Task关于此任务
There are multiple options for how to store your archived data. For example, you can:如何存储存档数据有多种选择。例如,您可以:
Move data to external file storage such as Amazon S3.将数据移动到外部文件存储,如Amazon S3。Move data to a separate, cheaper cluster.将数据移动到单独的、更便宜的集群。Move data to a separate collection on the same cluster.将数据移动到同一集群上的单独集合。
In most cases, moving data to external file storage is the best option in terms of cost and performance. If external file storage is not possible for your use case, consider moving data to a separate cluster or collection.在大多数情况下,将数据移动到外部文件存储是成本和性能方面的最佳选择。如果用例无法使用外部文件存储,请考虑将数据移动到单独的集群或集合。
Tips for Data Archival数据归档技巧
Before you implement a data archival design pattern, review these best practices:在实施数据归档设计模式之前,请查看以下最佳实践:
Your archived data should use an embedded data model, rather than using references to other collections. When you query archived data, all relevant components of the data must be from the same time.存档数据应该使用嵌入式数据模型,而不是使用对其他集合的引用。查询存档数据时,数据的所有相关组件必须来自同一时间。Embedding data ensures that queries return the related data together.嵌入数据可确保查询同时返回相关数据。The document's age should be contained in a single field.文档的年龄应包含在一个字段中。If you have documents that should never expire or move to the archive, set the document age to如果您有永远不会过期或移动到存档的文档,请将文档年龄设置为keep forever, or a similar string to indicate that the document must stay in the active collection.keep forever(永久保存),或设置类似的字符串以指示文档必须保留在活动集合中。MongoDB Atlas offers Online Archive, which moves infrequently-accessed data from your Atlas cluster to a MongoDB-managed read-only Federated Database Instance on a cloud object storage.MongoDB Atlas提供在线存档,它将不常访问的数据从Atlas集群移动到云对象存储上的MongoDB管理的只读联邦数据库实例。
Scenario场景
In this example, an e-commerce store wants to archive data for sales that occurred more than five years ago. The initial dataset contains all sales, and documents for older sales will be moved to a separate collection.在这个例子中,一家电子商务商店想要存档五年多前发生的销售数据。初始数据集包含所有销售,旧销售的文档将被移动到单独的集合中。
Steps步骤
Populate sample data填充样本数据
db.sales.insertMany( [
{
customer_name: "Hiroshi Tanaka",
products: [
{
product_id: "P1001",
name: "Wireless Headphones",
quantity: 1,
price: 59.99
},
{
product_id: "P1002",
name: "Phone Charger",
quantity: 2,
price: 14.99
}
],
total_amount: 89.97,
date: ISODate("2025-01-30T10:15:00Z")
},
{
customer_name: "Aisha Khan",
products: [
{
product_id: "P1003",
name: "Laptop",
quantity: 1,
price: 899.99
}
],
total_amount: 899.99,
date: ISODate("2018-11-20T15:45:00Z") // Over 5 years ago
},
{
customer_name: "Fatima Al-Farsi",
products: [
{
product_id: "P1006",
name: "Gaming Mouse",
quantity: 1,
price: 49.99
},
{
product_id: "P1007",
name: "Mechanical Keyboard",
quantity: 1,
price: 129.99
}
],
total_amount: 179.98,
date: ISODate("2017-06-15T12:00:00Z") // Over 5 years ago
},
{
customer_name: "Nguyen Minh",
products: [
{
product_id: "P1008",
name: "Bluetooth Speaker",
quantity: 2,
price: 39.99
}
],
total_amount: 79.98,
date: ISODate("2025-01-26T09:20:00Z")
}
] )Write a script to archive old documents编写一个脚本来存档旧文档
Note
The following script uses MongoDB Shell syntax. To see aggregation and query syntax for your driver, see your driver documentation.以下脚本使用MongoDB Shell语法。要查看驱动程序的聚合和查询语法,请参阅驱动程序文档。
// Set a variable to five years before the time that the script runs将变量设置为脚本运行时间前五年
const fiveYearsAgo = new Date();
fiveYearsAgo.setFullYear(fiveYearsAgo.getFullYear() - 5);
// Write old sales to the 'archived_sales' collection
const writeOldSalesPipeline = [
{
$match: {
date: { $lt: fiveYearsAgo }
}
},
{
$merge: {
into: {
db: "test",
coll: "archived_sales",
},
on: "_id"
}
}
]
db.sales.aggregate(writeOldSalesPipeline)
// Delete old sales from the active 'sales' collection从活动“销售”集合中删除旧销售
try {
db.sales.deleteMany(
{ date : { $lt: fiveYearsAgo } }
);
} catch (e) {
print (e);
}Results结果
After you run the script, the 运行脚本后,sales collection no longer contains sales that occurred more than five years ago.sales集合不再包含五年多前发生的销售。
db.sales.find()
Output:输出:
[
{
_id: ObjectId('679ced18fa29d32ca7d1abab'),
customer_name: 'Hiroshi Tanaka',
products: [
{
product_id: 'P1001',
name: 'Wireless Headphones',
quantity: 1,
price: 59.99
},
{
product_id: 'P1002',
name: 'Phone Charger',
quantity: 2,
price: 14.99
}
],
total_amount: 89.97,
date: ISODate('2025-01-30T10:15:00.000Z')
},
{
_id: ObjectId('679ced18fa29d32ca7d1abae'),
customer_name: 'Nguyen Minh',
products: [
{
product_id: 'P1008',
name: 'Bluetooth Speaker',
quantity: 2,
price: 39.99
}
],
total_amount: 79.98,
date: ISODate('2025-01-26T09:20:00.000Z')
}
]
Old sales now exist in the 旧销售现在存在于archived_sales collection.archived_sales集合中。
db.archived_sales.find()
Output:输出:
[
{
_id: ObjectId('679ced18fa29d32ca7d1abac'),
customer_name: 'Aisha Khan',
products: [
{
product_id: 'P1003',
name: 'Laptop',
quantity: 1,
price: 899.99
}
],
total_amount: 899.99,
date: ISODate('2018-11-20T15:45:00.000Z')
},
{
_id: ObjectId('679ced18fa29d32ca7d1abad'),
customer_name: 'Fatima Al-Farsi',
products: [
{
product_id: 'P1006',
name: 'Gaming Mouse',
quantity: 1,
price: 49.99
},
{
product_id: 'P1007',
name: 'Mechanical Keyboard',
quantity: 1,
price: 129.99
}
],
total_amount: 179.98,
date: ISODate('2017-06-15T12:00:00.000Z')
}
]