MongoDB Manual

Operations Checklist

The following checklist, along with the Development Checklist, provides recommendations to help you avoid issues in your production MongoDB deployment.

Filesystem

  • Align your disk partitions with your RAID configuration.
  • Avoid using NFS drives for your dbPath. Using NFS drives can result in degraded and unstable performance. See: Remote Filesystems (NFS) for more information.

    • VMware users should use VMware virtual drives instead of NFS.
  • Linux/Unix: format your drives into XFS or EXT4. If possible, use XFS, as it generally performs better with MongoDB.

    • With the WiredTiger storage engine, use of XFS is strongly recommended to avoid performance issues found when using EXT4 with WiredTiger.
    • If using RAID, you may need to configure XFS with your RAID geometry.
  • Windows: use the NTFS file system. Do not use any FAT file system (i.e. FAT16, FAT32, or exFAT).
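On Linux, you can confirm which filesystem backs the dbPath by looking it up in /proc/mounts. The following is a minimal Python sketch, not a MongoDB tool. The function names (parse_mounts, mount_for_path, classify_dbpath) and the preferred/acceptable/avoid grouping are illustrative assumptions drawn from the list above. It does not cover the Windows NTFS check.

```python
import os

# Grouping assumed from the checklist above (Linux filesystem type names).
PREFERRED = {"xfs"}
ACCEPTABLE = {"ext4"}
AVOID = {"nfs", "nfs4", "vfat", "exfat"}


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fs_type, options) tuples."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # /proc/mounts escapes spaces in paths as \040
        mount_point = fields[1].replace("\\040", " ")
        mounts.append((mount_point, fields[2], fields[3].split(",")))
    return mounts


def mount_for_path(path, mounts):
    """Return the entry whose mount point is the longest prefix of path."""
    path = os.path.normpath(path)
    best = None
    for mount_point, fs_type, options in mounts:
        mp = os.path.normpath(mount_point)
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            if best is None or len(mp) > len(best[0]):
                best = (mp, fs_type, options)
    return best


def classify_dbpath(path, mounts_text):
    """Classify the filesystem holding path as preferred/acceptable/avoid/unknown."""
    entry = mount_for_path(path, parse_mounts(mounts_text))
    if entry is None:
        return "unknown"
    fs_type = entry[1]
    if fs_type in PREFERRED:
        return "preferred"
    if fs_type in ACCEPTABLE:
        return "acceptable"
    if fs_type in AVOID:
        return "avoid"
    return "unknown"
```

In practice you would pass the contents of /proc/mounts as mounts_text. The options list returned by mount_for_path also shows whether noatime is set.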

Replication

  • Verify that all non-hidden replica set members are identically provisioned in terms of their RAM, CPU, disk, network setup, etc.
  • Configure the oplog size to suit your use case:

    • The replication oplog window should cover normal maintenance and downtime windows to avoid the need for a full resync.
    • The replication oplog window should cover the time needed to restore a replica set member from the last backup.

      Changed in version 3.4: The replication oplog window no longer needs to cover the time needed to restore a replica set member via initial sync, as the oplog records are pulled during the data copy. However, the member being restored must have enough disk space in the local database to temporarily store these oplog records for the duration of this data copy stage.

      With earlier versions of MongoDB, the replication oplog window should cover the time needed to restore a replica set member by initial sync.

  • Ensure that your replica set includes at least three data-bearing voting members that run with journaling, and that you issue writes with w: "majority" write concern for availability and durability.
  • Use hostnames rather than IP addresses when configuring replica set members.
  • Ensure full bidirectional network connectivity between all mongod instances.
  • Ensure that each host can resolve itself.
  • Ensure that your replica set contains an odd number of voting members.
  • Ensure that mongod instances have 0 or 1 votes.
  • For high availability, deploy your replica set into a minimum of three data centers.
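The oplog window is the time span between the oldest and newest entries in the oplog. In mongosh, rs.printReplicationInfo() reports the first and last oplog event times. The sketch below assumes you already have those two timestamps and compares the window against the maintenance and restore durations described above. The function names and the safety_factor headroom parameter are hypothetical.

```python
from datetime import datetime, timedelta


def oplog_window(first_event, last_event):
    """Time span covered by the oplog: newest entry minus oldest entry."""
    if last_event < first_event:
        raise ValueError("last oplog event precedes the first")
    return last_event - first_event


def uncovered_requirements(window, maintenance, restore, safety_factor=1.0):
    """Names of checklist requirements that the oplog window does not cover.

    safety_factor is a hypothetical headroom multiplier (1.0 means none).
    """
    requirements = {
        "maintenance/downtime window": maintenance,
        "restore from last backup": restore,
    }
    return [name for name, needed in requirements.items()
            if window < needed * safety_factor]


if __name__ == "__main__":
    window = oplog_window(datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 2, 12, 0))
    print(window)  # 1 day, 12:00:00
    print(uncovered_requirements(window, timedelta(hours=8), timedelta(hours=30),
                                 safety_factor=1.5))
    # ['restore from last backup']
```

With 50% headroom, a 36-hour window covers an 8-hour maintenance window but not a 30-hour restore.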
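Several of the replica set items above can be checked mechanically against a configuration document. This sketch assumes a dict shaped like the output of rs.conf(). It uses only these fields from each entry in the members array: host, votes (default 1), and arbiterOnly (default false). The check_replica_set and is_ip_address helpers are hypothetical. The sketch cannot verify provisioning, journaling, network connectivity, or data center placement.

```python
import ipaddress


def is_ip_address(host):
    """True if the host part of a 'host:port' string is a literal IP address."""
    name = host
    if name.startswith("["):  # bracketed IPv6, e.g. "[::1]:27017"
        name = name[1:].split("]", 1)[0]
    elif name.count(":") == 1:  # "db0.example.net:27017" or "10.0.0.5:27017"
        name = name.split(":", 1)[0]
    try:
        ipaddress.ip_address(name)
    except ValueError:
        return False
    return True


def check_replica_set(config):
    """Return checklist violations for a dict shaped like rs.conf() output."""
    problems = []
    members = config.get("members", [])
    for m in members:
        votes = m.get("votes", 1)  # a member has 1 vote unless configured otherwise
        if votes not in (0, 1):
            problems.append(f"{m['host']}: votes must be 0 or 1, found {votes}")
        if is_ip_address(m["host"]):
            problems.append(f"{m['host']}: use a hostname instead of an IP address")
    voting = [m for m in members if m.get("votes", 1) > 0]
    if len(voting) % 2 == 0:
        problems.append(f"{len(voting)} voting members; use an odd number")
    # Arbiters vote but hold no data.
    data_bearing = [m for m in voting if not m.get("arbiterOnly", False)]
    if len(data_bearing) < 3:
        problems.append(
            f"only {len(data_bearing)} data-bearing voting members; use at least 3")
    return problems
```

An empty list means none of these particular checks failed. It does not mean the deployment meets the whole checklist.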

Sharding

  • Place your config servers on dedicated hardware for optimal performance in large clusters. Ensure that the hardware has enough RAM to hold the data files entirely in memory and that it has dedicated storage.
  • Deploy mongos routers in accordance with the Production Configuration guidelines.
  • Use NTP to synchronize the clocks on all components of your sharded cluster.
  • Ensure full bidirectional network connectivity between mongod, mongos, and config servers.
  • Use CNAMEs to identify your config servers to the cluster so that you can rename and renumber your config servers without downtime.

Journaling: WiredTiger Storage Engine

  • Ensure that all instances use journaling.
  • Place the journal on its own low-latency disk for write-intensive workloads. Note that this will affect snapshot-style backups, as the files constituting the state of the database will reside on separate volumes.

Hardware

  • Use RAID10 and SSD drives for optimal performance.
  • SAN and Virtualization:

    • Ensure that each mongod has provisioned IOPS for its dbPath, or has its own physical drive or LUN.
    • Avoid dynamic memory features, such as memory ballooning, when running in virtual environments.
    • Avoid placing all replica set members on the same SAN, as the SAN can be a single point of failure.

Deployments to Cloud Hardware

  • Windows Azure: Adjust the TCP keepalive (tcp_keepalive_time) to 100-120 seconds. The TCP idle timeout on the Azure load balancer is too short for MongoDB's connection pooling behavior. See: Azure Production Notes for more information.
  • Use MongoDB version 2.6.4 or later on systems with high-latency storage, such as Windows Azure, as these versions include performance improvements for those systems.

Operating System Configuration

Linux

  • Turn off transparent hugepages. See Transparent Huge Pages Settings for more information.
  • Adjust the readahead settings on the devices storing your database files.

    • For the WiredTiger storage engine, set readahead between 8 and 32 regardless of storage media type (spinning disk, SSD, etc.), unless testing shows a measurable, repeatable, and reliable benefit in a higher readahead value.

      MongoDB commercial support can provide advice and guidance on alternate readahead configurations.

  • If using tuned on RHEL / CentOS, you must customize your tuned profile. Many of the tuned profiles that ship with RHEL / CentOS can negatively impact performance with their default settings. Customize your chosen tuned profile to:

    • Disable transparent hugepages. See Using tuned and ktune for instructions.
    • Set readahead between 8 and 32 regardless of storage media type. See Readahead settings for more information.
  • Use the noop or deadline disk schedulers for SSD drives.
  • Use the noop disk scheduler for virtualized drives in guest VMs.
  • Disable NUMA, or set vm.zone_reclaim_mode to 0 and run mongod instances with node interleaving. See: MongoDB and NUMA Hardware for more information.
  • Adjust the ulimit values on your system to suit your use case. If multiple mongod or mongos instances are running under the same user, scale the ulimit values accordingly. See: UNIX ulimit Settings for more information.
  • Use the noatime mount option for the dbPath mount point.
  • Configure sufficient file handles (fs.file-max), kernel pid limit (kernel.pid_max), maximum number of threads (kernel.threads-max), and maximum number of memory map areas per process (vm.max_map_count) for your deployment. For large systems, the following values provide a good starting point:

    • fs.file-max value of 98000,
    • kernel.pid_max value of 64000,
    • kernel.threads-max value of 64000, and
    • vm.max_map_count value of 128000
  • Ensure that your system has swap space configured. Refer to your operating system's documentation for details on appropriate sizing.
  • Ensure that the system default TCP keepalive is set appropriately. A value of 300 seconds often provides better performance for replica sets and sharded clusters. See: Does TCP keepalive time affect MongoDB Deployments? in the Frequently Asked Questions for more information.
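You can audit the kernel-level Linux items by reading procfs and sysfs. In this sketch, read_sysctl maps a dotted sysctl name to its /proc/sys path (for example, vm.max_map_count becomes /proc/sys/vm/max_map_count). The audit functions take values that have already been read, so they can be tested without access to a live system. The thresholds are the starting points listed above. Two things here are assumptions: treating 300 seconds as an upper bound for tcp_keepalive_time (or 120 on Azure), and all of the function names.

```python
from pathlib import Path

# Starting points for large systems, from the checklist above (treated as minimums).
MINIMUMS = {
    "fs.file-max": 98000,
    "kernel.pid_max": 64000,
    "kernel.threads-max": 64000,
    "vm.max_map_count": 128000,
}
# Settings the checklist asks for exactly (relevant when NUMA is not disabled).
EXACT = {"vm.zone_reclaim_mode": 0}
# Assumption: the recommended keepalive is used as an upper bound, in seconds.
KEEPALIVE_MAX = 300


def read_sysctl(key, root="/proc/sys"):
    """Read an integer sysctl such as 'vm.max_map_count' from procfs."""
    return int(Path(root, *key.split(".")).read_text().split()[0])


def thp_disabled(enabled_text):
    """True if sysfs text like 'always madvise [never]' selects 'never'."""
    # The active choice is the bracketed token.
    return "[never]" in enabled_text.split()


def audit_sysctls(values, keepalive_max=KEEPALIVE_MAX):
    """Return human-readable problems for a {sysctl_name: int} mapping."""
    problems = []
    for key, minimum in MINIMUMS.items():
        if key in values and values[key] < minimum:
            problems.append(f"{key}={values[key]} is below {minimum}")
    for key, expected in EXACT.items():
        if key in values and values[key] != expected:
            problems.append(f"{key}={values[key]}, expected {expected}")
    keepalive = values.get("net.ipv4.tcp_keepalive_time")
    if keepalive is not None and keepalive > keepalive_max:
        problems.append(
            f"net.ipv4.tcp_keepalive_time={keepalive} exceeds {keepalive_max}")
    return problems


if __name__ == "__main__":
    keys = list(MINIMUMS) + list(EXACT) + ["net.ipv4.tcp_keepalive_time"]
    values = {}
    for key in keys:
        try:
            values[key] = read_sysctl(key)
        except (OSError, ValueError):
            pass  # keys that cannot be read are skipped
    for problem in audit_sysctls(values):
        print(problem)
    thp = Path("/sys/kernel/mm/transparent_hugepage/enabled")
    if thp.exists() and not thp_disabled(thp.read_text()):
        print("transparent hugepages are not disabled")
```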
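You can check the block-device items per device under /sys/block/<device>/queue/. This sketch makes two assumptions. First, that the 8-32 readahead range above is in 512-byte sectors (the unit blockdev --getra reports), whereas read_ahead_kb is in KiB, so 8-32 sectors corresponds to 4-16 KiB. Second, that on newer multiqueue kernels, none and mq-deadline are the equivalents of noop and deadline. The function names are hypothetical, and check_device assumes the device is an SSD or a virtualized drive.

```python
from pathlib import Path

SECTOR_BYTES = 512  # unit used by `blockdev --getra` / `--setra`
# Assumption: the checklist's readahead range is expressed in 512-byte sectors.
READAHEAD_MIN, READAHEAD_MAX = 8, 32

# Legacy scheduler names plus their assumed multiqueue equivalents.
SSD_SCHEDULERS = {"noop", "deadline", "none", "mq-deadline"}
VM_SCHEDULERS = {"noop", "none"}


def kb_to_sectors(read_ahead_kb):
    """Convert read_ahead_kb (KiB) to 512-byte sectors."""
    return read_ahead_kb * 1024 // SECTOR_BYTES


def readahead_ok(sectors):
    return READAHEAD_MIN <= sectors <= READAHEAD_MAX


def active_scheduler(scheduler_text):
    """Return the bracketed entry from text like 'noop [deadline] cfq'."""
    for token in scheduler_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def scheduler_ok(scheduler_text, virtualized):
    allowed = VM_SCHEDULERS if virtualized else SSD_SCHEDULERS
    return active_scheduler(scheduler_text) in allowed


def check_device(device, virtualized=False):
    """Read sysfs for a device such as 'sda' (assumed SSD or virtual drive)."""
    queue = Path("/sys/block", device, "queue")
    problems = []
    sectors = kb_to_sectors(int((queue / "read_ahead_kb").read_text()))
    if not readahead_ok(sectors):
        problems.append(f"{device}: readahead is {sectors} sectors")
    scheduler = (queue / "scheduler").read_text()
    if not scheduler_ok(scheduler, virtualized):
        problems.append(f"{device}: scheduler is {active_scheduler(scheduler)}")
    return problems
```

A common kernel default of read_ahead_kb = 128 equals 256 sectors, well above the range the checklist suggests.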

Windows

  • Consider disabling NTFS "last access time" updates. This is analogous to disabling atime on Unix-like systems.
  • Format NTFS disks using the default allocation unit size of 4096 bytes.

Backups

  • Schedule periodic tests of your backup and restore process to have time estimates on hand and to verify that the process works.

Monitoring

  • Use MongoDB Cloud Manager, Ops Manager (an on-premises solution available in MongoDB Enterprise Advanced), or another monitoring system to monitor key database metrics and set up alerts for them. Include alerts for the following metrics:

    • replication lag
    • replication oplog window
    • assertions
    • queues
    • page faults
  • Monitor hardware statistics for your servers. In particular, pay attention to disk use, CPU, and available disk space.

    In the absence of disk space monitoring, or as a precaution:

    • Create a dummy 4 GB file on the storage.dbPath drive so that you can delete it to free space quickly if the disk becomes full.
    • If no other monitoring tool is available, a combination of cron+df can alert when disk space hits a high-water mark.
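If no monitoring tool is available, a small script run from cron can stand in for df. This Python sketch uses shutil.disk_usage to compare used space against a high-water mark. The 80% threshold, the example dbPath, and the function names are hypothetical. The optional reserve_placeholder helper creates the precautionary file with os.posix_fallocate, which allocates real blocks. A file merely extended with truncate would be sparse and reserve nothing. os.posix_fallocate is available on Linux but not on every platform.

```python
import os
import shutil

HIGH_WATER = 0.80  # hypothetical threshold: alert at 80% used
DB_PATH = "/var/lib/mongodb"  # hypothetical storage.dbPath


def usage_fraction(total, used):
    """Fraction of the volume in use; treat an unknown (zero) total as full."""
    return used / total if total else 1.0


def check_disk(path, high_water=HIGH_WATER):
    """Return an alert message if the volume holding path is above high_water."""
    usage = shutil.disk_usage(path)
    fraction = usage_fraction(usage.total, usage.used)
    if fraction >= high_water:
        return f"{path}: {fraction:.0%} used (high-water mark {high_water:.0%})"
    return None


def reserve_placeholder(path, size_bytes=4 * 1024**3):
    """Create a non-sparse placeholder file (4 GiB by default); fails if it exists."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.posix_fallocate(fd, 0, size_bytes)  # allocate blocks, not a sparse hole
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Intended to be run periodically, e.g. from cron.
    if os.path.exists(DB_PATH):
        alert = check_disk(DB_PATH)
        if alert:
            print(alert)
```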
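Replication lag can be derived from replica set status output. Each member's optimeDate is the time of the last operation it applied, so a secondary's lag is roughly the primary's optimeDate minus its own. The sketch assumes a dict shaped like the output of rs.status(), using the fields name, stateStr, and optimeDate, with datetimes as returned by a driver. The 60-second threshold and the function names are hypothetical.

```python
from datetime import datetime, timedelta

LAG_THRESHOLD = timedelta(seconds=60)  # hypothetical alert threshold


def replication_lag(status):
    """Map each SECONDARY's name to how far its optimeDate trails the PRIMARY's."""
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        raise ValueError("status document has no PRIMARY member")
    return {
        m["name"]: primary["optimeDate"] - m["optimeDate"]
        for m in members
        if m["stateStr"] == "SECONDARY"  # only SECONDARY members are compared
    }


def lag_alerts(status, threshold=LAG_THRESHOLD):
    """Names of secondaries whose lag exceeds threshold, sorted for stable output."""
    return sorted(name for name, lag in replication_lag(status).items()
                  if lag > threshold)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    status = {
        "members": [
            {"name": "a.example.net:27017", "stateStr": "PRIMARY", "optimeDate": now},
            {"name": "b.example.net:27017", "stateStr": "SECONDARY",
             "optimeDate": now - timedelta(seconds=5)},
            {"name": "c.example.net:27017", "stateStr": "SECONDARY",
             "optimeDate": now - timedelta(minutes=5)},
            {"name": "d.example.net:27017", "stateStr": "ARBITER"},
        ]
    }
    print(lag_alerts(status))  # ['c.example.net:27017']
```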

Load Balancing

  • Configure load balancers to enable "sticky sessions" or "client affinity", with a sufficient timeout for existing connections.
  • Avoid placing load balancers between MongoDB cluster or replica set components.