Performance Tuning

MongoDB deployments can support large-scale databases with high transaction volumes, making performance tuning essential. Regular tuning helps identify issues within the cluster early, allowing you to address them before they impact system responsiveness or stability.

This document describes common methods for optimizing deployment performance through tuning and helpful metrics. These methods apply to both MongoDB Atlas clusters and self-managed deployments. However, the tuning process is significantly easier with MongoDB Atlas, which automates many tasks and streamlines them for efficiency. For more information on performance, see MongoDB Performance.

Run Your Queries at Top Speed

To ensure optimal query performance, you can use metrics that reveal query performance problems and point to a fix when you find slow queries.

MongoDB log files record the execution time and method for each query, allowing you to search for slow queries. The database profiler logs queries exceeding a specified threshold.

If a query is slow, first examine its query plan. For more information on finding query plan data, see Explain Results.

  • Ensure that your query performed an index scan, rather than a collection scan.

    An index scan limits the number of documents that MongoDB inspects, while a collection scan requires that MongoDB reads all documents in a collection. To learn more about how to interpret plan results, see Interpret Explain Plan Results.

  • If you see a lot of collection scans in your explain plan results, consider adding an index.

    Note

    Indexes can slow down writes and updates, so having too many underutilized indexes may hinder document modifications or insertions, depending on your workload.
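
As a minimal sketch of the index-scan check above, the following Python walks a query plan and reports any collection scan stages. It assumes the classic explain layout, in which each stage is a document with a `stage` name and optional `inputStage` or `inputStages` children; the helper names and the two sample plans are hypothetical, and plans produced by other query engines or server versions may be nested differently.

```python
def find_collection_scans(plan):
    """Return every stage document in the plan tree whose stage is COLLSCAN."""
    found = []
    stack = [plan]
    while stack:
        stage = stack.pop()
        if stage.get("stage") == "COLLSCAN":
            found.append(stage)
        # A stage has either a single child or a list of children.
        if "inputStage" in stage:
            stack.append(stage["inputStage"])
        stack.extend(stage.get("inputStages", []))
    return found


def uses_collection_scan(plan):
    """True if any stage in the plan reads the whole collection."""
    return bool(find_collection_scans(plan))


# Hypothetical winning plans, trimmed to the fields this sketch reads.
indexed_plan = {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "status_1"},
}
unindexed_plan = {
    "stage": "SORT",
    "inputStage": {"stage": "COLLSCAN", "direction": "forward"},
}

print(uses_collection_scan(indexed_plan))
print(uses_collection_scan(unindexed_plan))
```

In practice you would pass in the `queryPlanner.winningPlan` document from your own explain output rather than a hand-written sample.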

Query Metrics

You can also use the following query metrics to ensure your query is running at top speed:

  • metrics.queryExecutor.scanned tells you how many documents were scanned to return your query results.

    • Ideally, the ratio of scanned documents to returned documents is 1:1, which means MongoDB returns every document it scans. Typically, the ratio is greater than 1, indicating MongoDB does not return some scanned documents.
    • The ratio can be less than 1 or even 0, indicating a covered query where the index contains all necessary data.
    • If MongoDB is scanning large numbers of documents to respond to your query, you may be missing indexes or need to optimize your query.
  • metrics.operation.scanAndOrder indicates the server's effort to sort query results.

    • A high Scan and Order number, such as 20 or more, indicates that the server must sort results, which increases query response time and server memory load.
    • To fix a high Scan and Order number, order your indexes according to query requirements, or add any missing indexes. For a compound B-tree index, generally sort in ascending order starting from the leading field in the index.
  • The WiredTiger Ticket Number metric reflects the performance of the WiredTiger storage engine.

    • WiredTiger read and write tickets are the WiredTiger storage engine's concurrency control mechanism to manage the number of concurrent transactions. Starting in version 7.0, MongoDB uses a dynamic algorithm to adjust the maximum number of concurrent storage engine transactions, optimizing database throughput during cluster overload.
    • The read and write tickets control the maximum number of concurrent transactions. The WiredTiger ticket number should always be at 128. Sustained values below 128 indicate a server delay and potential consequent issues.
    • You can use the serverStatus command to check the current number of read and write tickets and their usage. Look at the queues.execution section to understand the current load and ticket availability.
    • To remedy a low WiredTiger ticket number:

      • Ensure that the Dynamic Adjustment feature is enabled to manage ticket allocation automatically.
      • Ensure that your cluster has sufficient resources, such as CPU and memory, to handle the workload.
      • If you are using MongoDB 3.2 or earlier, upgrade to a later version that uses WiredTiger.
      • If you need to manually adjust the maximum number of concurrent transactions, you can modify the storageEngineConcurrentReadTransactions and storageEngineConcurrentWriteTransactions parameters.

Note

Take caution when modifying storageEngineConcurrentReadTransactions and storageEngineConcurrentWriteTransactions, as changing these settings can lead to performance issues or errors. We recommend you consult with MongoDB Support before changing these parameters.
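
The scanned-to-returned ratio described under Query Metrics can be sketched as a small calculation over a serverStatus document. This is an illustration under stated assumptions, not a definitive recipe: it reads `metrics.queryExecutor.scannedObjects` as the count of documents examined and `metrics.document.returned` as the count of documents returned, and the sample values are made up. Both counters are cumulative since server start, so the result is a lifetime average.

```python
def scanned_to_returned_ratio(server_status):
    """Return documents scanned per document returned, or None if nothing was returned."""
    metrics = server_status["metrics"]
    scanned = metrics["queryExecutor"]["scannedObjects"]
    returned = metrics["document"]["returned"]
    if returned == 0:
        return None  # the ratio is undefined until at least one document is returned
    return scanned / returned


# Hypothetical, trimmed serverStatus output used only for illustration.
sample_status = {
    "metrics": {
        "queryExecutor": {"scanned": 1200, "scannedObjects": 5000},
        "document": {"returned": 1000},
        "operation": {"scanAndOrder": 3},
    }
}

ratio = scanned_to_returned_ratio(sample_status)
print(f"documents scanned per document returned: {ratio}")
```

With the PyMongo driver, a document of this shape can be obtained from `client.admin.command("serverStatus")`. To judge a recent window rather than the lifetime average, take two samples and divide the differences of the counters.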

Document Structure Antipatterns

The query plan does not contain any metrics to reveal document structure antipatterns, but you can look for antipatterns when debugging slow queries. Watch for the following common antipatterns, which hurt performance:

  • Unbounded arrays: Arrays in a document that can grow without a size limit cause performance problems, because each time you update the array, MongoDB must rewrite the array into the document. For more information, see Avoid Unbounded Arrays.
  • Embedded documents without bounds: MongoDB supports inserting documents within documents, with up to 128 levels of nesting. Each MongoDB document, including embedded documents, has a size limit of 16MB. An excessive number of embedded documents can result in performance problems.

    To mitigate excessive embedded documents, move embedded documents to separate collections and reference them from the original document. For more information, see Bloated Documents.

Ensure a Top Speed Database

MongoDB has thousands of metrics that track all aspects of database performance, including reading, writing, and querying the database, as well as making sure background maintenance tasks like backups don't hinder performance. The following metrics help indicate problems with your database so you can ensure its optimal performance.

Replication Lag

Replication lag occurs when a secondary member of a replica set falls behind the primary. To understand the cause of your replication lag, you can examine the oplog-related metrics. However, the following problems are the most common causes of replication lag:

  • A networking issue between the primary and secondary, making nodes unreachable
  • A secondary node applying data slower than the primary node
  • Insufficient write capacity, in which case you should add more shards
  • Slow operations on the primary node, blocking replication
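
One way to put a number on replication lag is to compare the last applied operation time of each secondary with that of the primary. The sketch below assumes a document shaped like the output of the replSetGetStatus command, where each entry in `members` carries `name`, `stateStr`, and `optimeDate`; the host names and timestamps are hypothetical.

```python
from datetime import datetime


def replication_lag_seconds(repl_status):
    """Map each secondary's name to how many seconds it trails the primary."""
    members = repl_status["members"]
    # Raises StopIteration if the set currently has no primary.
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical, trimmed replSetGetStatus output used only for illustration.
sample_repl_status = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2024, 1, 1, 12, 0, 30)},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2024, 1, 1, 12, 0, 30)},
        {"name": "db3:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2024, 1, 1, 12, 0, 18)},
    ]
}

for name, lag in replication_lag_seconds(sample_repl_status).items():
    print(f"{name} is {lag:.0f}s behind the primary")
```

A lag that stays near zero on one secondary but grows on another points toward a problem local to that member, such as its network path or its apply speed, rather than toward the primary.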

Locking Performance Problems

MongoDB's internal locking system is used to support simultaneous queries while avoiding write conflicts and inconsistent reads. Locking-related performance problems occur when the remaining number of available read or write tickets reaches zero, meaning any new read or write requests will be queued until a new read or write ticket is available.

Locking performance problems can indicate suboptimal indexes and poor schema design patterns, which can both lead to locks being held longer than necessary.
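
The zero-ticket condition described above can be checked with a few lines of Python. This sketch assumes a ticket sub-document with `read` and `write` entries, each holding `out`, `available`, and `totalTickets` counts; where that sub-document lives in serverStatus output differs between server versions, so treat the layout and the sample numbers as assumptions to verify against your own output.

```python
def exhausted_ticket_pools(ticket_stats):
    """Return the ticket kinds ('read', 'write') that have no tickets available."""
    return [
        kind
        for kind in ("read", "write")
        if ticket_stats[kind]["available"] == 0
    ]


# Hypothetical ticket counts used only for illustration.
healthy = {
    "read": {"out": 3, "available": 125, "totalTickets": 128},
    "write": {"out": 1, "available": 127, "totalTickets": 128},
}
saturated = {
    "read": {"out": 10, "available": 118, "totalTickets": 128},
    "write": {"out": 128, "available": 0, "totalTickets": 128},
}

print(exhausted_ticket_pools(healthy))
print(exhausted_ticket_pools(saturated))
```

A single sample at zero is not necessarily a problem; it is sustained exhaustion across several samples that suggests requests are queuing.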

Open Cursors

If the number of open cursors is rising without a corresponding growth of traffic, this might be the result of poorly indexed queries, or long-running queries due to large result sets.

Overloaded Clusters

When performance tuning, it is important to recognize when your total traffic, or the throughput of transactions through the system, is rising beyond the planned capacity of your cluster. By keeping track of growth in throughput, you can expand your cluster's capacity efficiently.

The following metrics can help you track your cluster's throughput. To find these metrics, run the serverStatus command and examine the fields specified below.

Read and Write Operations

The Read and Write Operations metrics indicate how much work the cluster does. You can find read operations through the opcounters.query field and write operations through opcounters.insert, opcounters.update, and opcounters.delete, which count the total number of insert, update, and delete operations, respectively.

The ratio of reads to writes depends on the nature of the workloads running on the cluster.

  • Monitoring read and write operations over time allows normal ranges and thresholds to be established.
  • As trends in read and write operations show growth in throughput, you can gradually increase capacity.
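
Because the opcounters fields are running totals, establishing a normal range means turning two samples into a rate. The following is a minimal sketch of that calculation; the function name, the sample values, and the 60-second interval are illustrative assumptions.

```python
def opcounter_rates(earlier, later, seconds):
    """Return operations per second for each opcounter between two serverStatus samples."""
    ops = ("query", "insert", "update", "delete")
    return {
        op: (later["opcounters"][op] - earlier["opcounters"][op]) / seconds
        for op in ops
    }


# Hypothetical samples taken 60 seconds apart, trimmed to the opcounters section.
first_sample = {"opcounters": {"query": 1000, "insert": 200, "update": 50, "delete": 10}}
second_sample = {"opcounters": {"query": 1600, "insert": 260, "update": 80, "delete": 10}}

rates = opcounter_rates(first_sample, second_sample, seconds=60)
for op, per_second in rates.items():
    print(f"{op}: {per_second} ops/s")
```

Recording these rates at regular intervals gives the baseline against which later growth in throughput can be judged. Note that the counters reset when the server restarts, so a negative rate signals a restart between samples rather than negative work.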

Document Metrics and Query Executor

Document Metrics and Query Executor indicate if the cluster is too busy. As with the Read and Write Operations metrics, there is no right or wrong number for these metrics, but having a good idea of what's normal helps you discern whether poor performance is coming from large workload size or attributable to other reasons.

To retrieve Document Metrics, access the metrics.keysExamined and metrics.totalExecMicros fields. To retrieve Query Executor metrics, examine the metrics.fromPlanCache field. You can find all of these fields using the $queryStats aggregation stage.

  • MongoDB updates document metrics any time you find or insert a document. The more documents that you find, insert, update, or delete, the busier your cluster is.

    • Poor performance in a cluster that has plenty of capacity usually indicates query problems.
  • The query executor tells you how many queries are being processed by using two data points:

    • Scanned: The average rate per second over the selected sample period of index items scanned during queries and query-plan evaluation.
    • Scanned objects: The average rate per second over the selected sample period of documents scanned during queries and query-plan evaluation.

Hardware and Network Metrics

Hardware and Network metrics can indicate that throughput is rising and will exceed the capacity of computing infrastructure. These metrics are gathered from the operating system and networking infrastructure. To make these metrics useful for diagnostic purposes, you must have a sense of what is normal.

  • If you are running MongoDB on-premises, you may be able to view hardware and network metrics using Ops Manager, depending on your operating system.
  • While there are many metrics to track, some important metrics to have a baseline range for are:

    • Disk latency
    • Disk IOPS
    • Number of Connections

Cluster and Key Resources

A MongoDB cluster uses a variety of resources that the underlying computing and networking infrastructure provides.

Number of Client Connections

The Current Number of Client Connections metric, located in the connections.current field in the serverStatus document, can indicate total load on a system. Keeping track of normal ranges at various times of the day or week can help you quickly identify spikes in traffic.

A related metric, percentage of connections used, can indicate when MongoDB is getting close to running out of available connections.
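
The percentage of connections used can be derived from the connections section of a serverStatus document. This sketch assumes that `connections.current` counts open connections and `connections.available` counts unused ones, so that their sum is the connection limit; the sample numbers are hypothetical.

```python
def connections_used_percent(server_status):
    """Return the share of the connection limit currently in use, as a percentage."""
    conn = server_status["connections"]
    limit = conn["current"] + conn["available"]
    if limit == 0:
        return 0.0
    return 100.0 * conn["current"] / limit


# Hypothetical, trimmed serverStatus output used only for illustration.
sample_status = {"connections": {"current": 150, "available": 850}}

print(f"{connections_used_percent(sample_status):.1f}% of connections in use")
```

Tracking this percentage alongside the raw connection count makes it easier to tell an ordinary traffic spike from a deployment that is approaching its connection limit.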

Storage Metrics

Storage metrics track how MongoDB uses persistent storage. In the WiredTiger storage engine, each collection and each index are individual files. When you update a document in a collection, MongoDB re-writes the entire document.

  • If storage space metrics such as dbStats.dataSize, dbStats.indexSize, dbStats.storageSize, or the number of documents in the database show a significant unexpected change while the database traffic stays within ordinary ranges, it can indicate problems such as data deletion or corruption, unexpected data growth, or index changes.
  • A sudden drop in dbStats.dataSize may indicate a large amount of data deletion. If this drop is unexpected, you should quickly investigate.

Memory Metrics

Memory metrics show how MongoDB uses the virtual memory of the computing infrastructure that is hosting the cluster. You can find memory metrics in the mem document in the results of serverStatus.

  • An increasing number of page faults or a growing amount of data changed but not yet written to disk can indicate problems related to the amount of memory available to the cluster.
  • Cache metrics can help determine if the working set is outgrowing the available cache.
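
As a rough illustration of the cache check above, the following computes how full the WiredTiger cache is and how much of it holds changed data not yet written to disk. It assumes the `wiredTiger.cache` section of serverStatus exposes the keys "maximum bytes configured", "bytes currently in the cache", and "tracked dirty bytes in the cache"; confirm the key names against your own output, since the sample values here are made up.

```python
def cache_usage_percent(server_status):
    """Return (fill percent, dirty percent) of the configured WiredTiger cache."""
    cache = server_status["wiredTiger"]["cache"]
    maximum = cache["maximum bytes configured"]
    used = cache["bytes currently in the cache"]
    dirty = cache["tracked dirty bytes in the cache"]
    return 100.0 * used / maximum, 100.0 * dirty / maximum


# Hypothetical, trimmed serverStatus output used only for illustration.
sample_status = {
    "wiredTiger": {
        "cache": {
            "maximum bytes configured": 1000,
            "bytes currently in the cache": 800,
            "tracked dirty bytes in the cache": 50,
        }
    }
}

fill, dirty = cache_usage_percent(sample_status)
print(f"cache fill: {fill:.1f}%, dirty: {dirty:.1f}%")
```

A fill percentage that stays high while page faults climb is one sign that the working set may be outgrowing the cache.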

Critical Errors

MongoDB creates asserts mostly through errors that MongoDB captures as part of its logging process.

Monitoring the number of asserts created at various levels of severity can provide a first level indication of unexpected problems. Asserts can be message asserts, the most serious kind, or warning asserts, regular asserts, and user asserts.
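
Assert counts are cumulative, so monitoring them usually means comparing two samples. The sketch below assumes the `asserts` section of serverStatus exposes `regular`, `warning`, `msg`, and `user` counters; the function name and sample values are hypothetical, and the sketch ignores counter rollovers.

```python
def new_asserts(earlier, later):
    """Return the assert kinds whose counters grew between two serverStatus samples."""
    kinds = ("regular", "warning", "msg", "user")
    growth = {}
    for kind in kinds:
        delta = later["asserts"][kind] - earlier["asserts"][kind]
        if delta > 0:
            growth[kind] = delta
    return growth


# Hypothetical samples, trimmed to the asserts section.
first_sample = {"asserts": {"regular": 0, "warning": 2, "msg": 0, "user": 40}}
second_sample = {"asserts": {"regular": 0, "warning": 2, "msg": 1, "user": 55}}

print(new_asserts(first_sample, second_sample))
```

Growth in user asserts often reflects client-side errors such as invalid commands, while any growth in message asserts deserves a closer look at the logs.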