Production best practices: performance and reliability

This article discusses performance and reliability best practices for Express applications deployed to production.

This topic clearly falls into the “devops” world, spanning both traditional development and operations. Accordingly, the information is divided into two parts: things to do in your code, and things to do in your environment / setup.

Things to do in your code

The following sections describe things you can do in your code to improve your application’s performance.

Use gzip compression

Gzip compression can greatly decrease the size of the response body and hence increase the speed of a web app. Use the compression middleware for gzip compression in your Express app. For example:

const compression = require('compression')
const express = require('express')
const app = express()

app.use(compression())

For a high-traffic website in production, the best way to put compression in place is to implement it at a reverse proxy level (see Use a reverse proxy). In that case, you do not need to use compression middleware. For details on enabling gzip compression in Nginx, see Module ngx_http_gzip_module in the Nginx documentation.

Don’t use synchronous functions

Synchronous functions and methods tie up the executing process until they return. A single call to a synchronous function might return in a few microseconds or milliseconds; however, in high-traffic websites, these calls add up and reduce the performance of the app. Avoid their use in production.

Although Node and many modules provide synchronous and asynchronous versions of their functions, always use the asynchronous version in production. The only time when a synchronous function can be justified is upon initial startup.

You can use the --trace-sync-io command-line flag to print a warning and a stack trace whenever your application uses a synchronous API. Of course, you wouldn’t want to use this in production, but rather to ensure that your code is ready for production. See the node command-line options documentation for more information.
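As a minimal sketch of the distinction above, the following reads a file synchronously once at startup and asynchronously afterwards. The file name and the loadConfig helper are hypothetical and exist only to make the example runnable.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

// Hypothetical config file, written here only so the sketch is self-contained.
const configPath = path.join(os.tmpdir(), 'example-config.json')
fs.writeFileSync(configPath, JSON.stringify({ greeting: 'hello' }))

// Acceptable: a synchronous read that happens once, during startup.
const startupConfig = JSON.parse(fs.readFileSync(configPath, 'utf8'))

// Preferred while serving requests: the promise-based API does not
// block the event loop while the file is being read.
async function loadConfig () {
  const text = await fs.promises.readFile(configPath, 'utf8')
  return JSON.parse(text)
}
```

Starting the app with node --trace-sync-io during testing should then help reveal synchronous calls that still run while requests are being served.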

Do logging correctly

In general, there are two reasons for logging from your app: debugging and logging app activity (essentially, everything else). Using console.log() or console.error() to print log messages to the terminal is common practice in development. But these functions are synchronous when the destination is a terminal or a file, so they are not suitable for production, unless you pipe the output to another program.

For debugging

If you’re logging for purposes of debugging, then instead of using console.log(), use a special debugging module like debug. This module enables you to use the DEBUG environment variable to control what debug messages are sent to console.error(), if any. To keep your app purely asynchronous, you’d still want to pipe console.error() to another program. But then, you’re not really going to debug in production, are you?

For app activity

If you’re logging app activity (for example, tracking traffic or API calls), instead of using console.log(), use a logging library like Pino, which is the fastest and most efficient option available.

Handle exceptions properly

Node apps crash when they encounter an uncaught exception. If you do not handle exceptions and take appropriate action, your Express app will crash and go offline. If you follow the advice in Ensure your app automatically restarts below, then your app will recover from a crash. Fortunately, Express apps typically have a short startup time. Nevertheless, you want to avoid crashing in the first place, and to do that, you need to handle exceptions properly.

To ensure you handle all exceptions, use the following techniques: try-catch and promises.

Before diving into these topics, you should have a basic understanding of Node/Express error handling: using error-first callbacks, and propagating errors in middleware. Node uses an “error-first callback” convention for returning errors from asynchronous functions, where the first parameter to the callback function is the error object, followed by result data in succeeding parameters. To indicate no error, pass null as the first parameter. The callback function must correspondingly follow the error-first callback convention to meaningfully handle the error. And in Express, the best practice is to use the next() function to propagate errors through the middleware chain.
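The two conventions described above can be sketched together as follows. The readJsonFile helper, the settingsHandler route handler, and the settings path are hypothetical names chosen for illustration; only fs.readFile and the (req, res, next) handler signature come from Node and Express.

```javascript
const fs = require('node:fs')

// Hypothetical settings file; adjust the path for your app.
const SETTINGS_PATH = '/etc/myapp/settings.json'

// Error-first callback: callback(err) on failure, callback(null, result) on success.
function readJsonFile (filePath, callback) {
  fs.readFile(filePath, 'utf8', (err, text) => {
    if (err) return callback(err)
    let parsed
    try {
      parsed = JSON.parse(text)
    } catch (parseErr) {
      return callback(parseErr)
    }
    callback(null, parsed)
  })
}

// Express-style route handler: pass any error to next() instead of throwing.
function settingsHandler (req, res, next) {
  readJsonFile(SETTINGS_PATH, (err, settings) => {
    if (err) return next(err)
    res.json(settings)
  })
}

// Mounting it would look like: app.get('/settings', settingsHandler)
```

Parsing happens outside the final callback call so that an exception thrown by the callback itself is not mistaken for a parse error.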

For more on the fundamentals of error handling, see:

Use try-catch

Try-catch is a JavaScript language construct that you can use to catch exceptions in synchronous code. Use try-catch, for example, to handle JSON parsing errors as shown below.

Here is an example of using try-catch to handle a potential process-crashing exception. This middleware function accepts a query field parameter named “params” that is a JSON object.

app.get('/search', (req, res) => {
  // Simulating async operation
  setImmediate(() => {
    const jsonStr = req.query.params
    try {
      const jsonObj = JSON.parse(jsonStr)
      res.send('Success')
    } catch (e) {
      res.status(400).send('Invalid JSON string')
    }
  })
})

However, try-catch works only for synchronous code. Because the Node platform is primarily asynchronous (particularly in a production environment), try-catch won’t catch a lot of exceptions.

Use promises

When an error is thrown in an async function or a rejected promise is awaited inside an async function, those errors will be passed to the error handler as if calling next(err). This applies to Express 5 and later; in Express 4, rejected promises are not forwarded automatically. For example:

app.get('/', async (req, res, next) => {
  const data = await userData() // If this promise fails, it will automatically call `next(err)` to handle the error.

  res.send(data)
})

app.use((err, req, res, next) => {
  res.status(err.status ?? 500).send({ error: err.message })
})

Also, you can use asynchronous functions for your middleware, and the router will handle errors if the promise rejects, for example:

app.use(async (req, res, next) => {
  // Express provides res.locals (not req.locals) for request-scoped data.
  res.locals.user = await getUser(req)

  next() // This will be called if the promise does not throw an error.
})

Best practice is to handle errors as close to where they occur as possible. So while this is now handled in the router, it’s best to catch the error in the middleware and handle it without relying on separate error-handling middleware.
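A minimal sketch of handling the failure inside the middleware itself might look like the following. The getUser lookup, the loadUser middleware name, and the 401 response are assumptions made for illustration, not part of Express.

```javascript
// Hypothetical async lookup, standing in for a database or API call.
async function getUser (req) {
  const token = req.headers.authorization
  if (!token) throw new Error('Missing credentials')
  return { id: 1, token }
}

// Middleware that handles its own failure instead of deferring
// to separate error-handling middleware.
async function loadUser (req, res, next) {
  try {
    res.locals.user = await getUser(req)
  } catch (err) {
    res.status(401).send({ error: err.message })
    return
  }
  next()
}

// Mounting it would look like: app.use(loadUser)
```

Calling next() outside the try block keeps errors thrown by later middleware from being caught here and misreported as a lookup failure.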

What not to do

One thing you should not do is to listen for the uncaughtException event, emitted when an exception bubbles all the way back to the event loop. Adding an event listener for uncaughtException will change the default behavior of the process that is encountering an exception; the process will continue to run despite the exception. This might sound like a good way of preventing your app from crashing, but continuing to run the app after an uncaught exception is a dangerous practice and is not recommended, because the state of the process becomes unreliable and unpredictable.

Additionally, using uncaughtException is officially recognized as crude. So listening for uncaughtException is just a bad idea. This is why we recommend things like multiple processes and supervisors: crashing and restarting is often the most reliable way to recover from an error.

We also don’t recommend using domains. The domain module generally doesn’t solve the problem and is deprecated.

Things to do in your environment / setup

The following sections describe things you can do in your system environment to improve your app’s performance.

Set NODE_ENV to “production”

The NODE_ENV environment variable specifies the environment in which an application is running (usually, development or production). One of the simplest things you can do to improve performance is to set NODE_ENV to production.

Setting NODE_ENV to “production” makes Express cache view templates, cache CSS files generated from CSS extensions, and generate less verbose error messages.

Tests indicate that just doing this can improve app performance by a factor of three!

If you need to write environment-specific code, you can check the value of NODE_ENV with process.env.NODE_ENV. Be aware that checking the value of any environment variable incurs a performance penalty, and so should be done sparingly.
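One way to keep such checks sparing is to read the variable once when the module loads and reuse the result. The sketch below assumes a hypothetical errorBody helper; only process.env.NODE_ENV comes from the text above.

```javascript
// Read the environment variable once, when the module loads,
// rather than on every request.
const isProduction = process.env.NODE_ENV === 'production'

// Hypothetical helper: builds an error response body and hides
// internal details when running in production.
function errorBody (err) {
  if (isProduction) return { error: 'Internal Server Error' }
  return { error: err.message, stack: err.stack }
}
```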

In development, you typically set environment variables in your interactive shell, for example by using export or your .bash_profile file. But in general, you shouldn’t do that on a production server; instead, use your OS’s init system (systemd). The next section provides more details about using your init system in general, but setting NODE_ENV is so important for performance (and easy to do) that it’s highlighted here.

With systemd, use the Environment directive in your unit file. For example:

# /etc/systemd/system/myservice.service
# The Environment directive belongs in the [Service] section.
[Service]
Environment=NODE_ENV=production

For more information, see Using Environment Variables In systemd Units.

Ensure your app automatically restarts

In production, you don’t want your application to be offline, ever. This means you need to make sure it restarts both if the app crashes and if the server itself crashes. Although you hope that neither of those events occurs, realistically you must account for both eventualities by using a process manager and an init system.

Node applications crash if they encounter an uncaught exception. The foremost thing you need to do is to ensure your app is well-tested and handles all exceptions (see Handle exceptions properly for details). But as a fail-safe, put a mechanism in place to ensure that if and when your app crashes, it will automatically restart.

Use a process manager

In development, you started your app simply from the command line with node server.js or something similar. But doing this in production is a recipe for disaster. If the app crashes, it will be offline until you restart it. To ensure your app restarts if it crashes, use a process manager. A process manager is a “container” for applications that facilitates deployment, provides high availability, and enables you to manage the application at runtime.

In addition to restarting your app when it crashes, a process manager can enable you to:

Historically, it was popular to use a Node.js process manager like PM2. See their documentation if you wish to do this. However, we recommend using your init system for process management.

Use an init system

The next layer of reliability is to ensure that your app restarts when the server restarts. Systems can still go down for a variety of reasons. To ensure that your app restarts if the server crashes, use the init system built into your OS. The main init system in use today is systemd.

There are two ways to use init systems with your Express app:

Systemd

Systemd is a Linux system and service manager. Most major Linux distributions have adopted systemd as their default init system.

A systemd service configuration file is called a unit file, with a filename ending in .service. Here’s an example unit file to manage a Node app directly. Replace the values enclosed in <angle brackets> with values for your system and app:

[Unit]
Description=<Awesome Express App>

[Service]
Type=simple
ExecStart=/usr/local/bin/node </projects/myapp/index.js>
WorkingDirectory=</projects/myapp>

User=nobody
Group=nogroup

# Environment variables:
Environment=NODE_ENV=production

# Allow many incoming connections
LimitNOFILE=infinity

# Allow core dumps for debugging
LimitCORE=infinity

StandardInput=null
StandardOutput=syslog
StandardError=syslog
Restart=always

[Install]
WantedBy=multi-user.target

For more information on systemd, see the systemd reference (man page).

Run your app in a cluster

In a multi-core system, you can increase the performance of a Node app by many times by launching a cluster of processes. A cluster runs multiple instances of the app, ideally one instance on each CPU core, thereby distributing the load and tasks among the instances.

[Figure: Balancing between application instances using the cluster API]

IMPORTANT: Since the app instances run as separate processes, they do not share the same memory space. That is, objects are local to each instance of the app. Therefore, you cannot maintain state in the application code. However, you can use an in-memory datastore like Redis to store session-related data and state. This caveat applies to essentially all forms of horizontal scaling, whether clustering with multiple processes or multiple physical servers.

In clustered apps, worker processes can crash individually without affecting the rest of the processes. Apart from performance advantages, failure isolation is another reason to run a cluster of app processes. Whenever a worker process crashes, always make sure to log the event and spawn a new process using cluster.fork().

Using Node’s cluster module

Clustering is made possible with Node’s cluster module. This enables a primary process (called the master process in older Node versions) to spawn worker processes and distribute incoming connections among the workers.
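The following is a minimal sketch of that arrangement, including the log-and-respawn step described earlier. It uses Node’s built-in http module in place of an Express app so that it has no dependencies. The port and the function names are assumptions, and cluster.isPrimary requires Node 16 or later (older versions use cluster.isMaster).

```javascript
const cluster = require('node:cluster')
const http = require('node:http')
const os = require('node:os')

// Hypothetical port chosen for illustration.
const PORT = 3000

// In the primary process: fork one worker per CPU core and
// replace any worker that exits.
function startPrimary (workerCount = os.cpus().length) {
  for (let i = 0; i < workerCount; i++) cluster.fork()

  cluster.on('exit', (worker, code, signal) => {
    console.error(`worker ${worker.process.pid} exited (${signal || code}); starting a new one`)
    cluster.fork()
  })
}

// In each worker: run the server. An Express app would call
// app.listen(PORT) here instead.
function startWorker () {
  return http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`)
  }).listen(PORT)
}

// Entry point, left as a function so that loading this file
// starts nothing by itself.
function main () {
  if (cluster.isPrimary) startPrimary()
  else startWorker()
}
```

Calling main() at the bottom of the file starts the cluster. Note that this sketch respawns unconditionally, so a worker that crashes immediately on startup would be restarted in a tight loop; a real deployment would likely add a delay or a restart limit.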

Using PM2

If you deploy your application with PM2, then you can take advantage of clustering without modifying your application code. You should ensure your application is stateless first, meaning no local data is stored in the process (such as sessions, websocket connections and the like).

When running an application with PM2, you can enable cluster mode to run it in a cluster with a number of instances of your choosing, such as one matching the number of available CPUs on the machine. You can manually change the number of processes in the cluster using the pm2 command line tool without stopping the app.

To enable cluster mode, start your application like so:

# Start 4 worker processes
$ pm2 start npm --name my-app -i 4 -- start
# Auto-detect number of available CPUs and start that many worker processes
$ pm2 start npm --name my-app -i max -- start

This can also be configured within a PM2 process file (ecosystem.config.js or similar) by setting exec_mode to cluster and instances to the number of workers to start.

Once running, the application can be scaled like so:

# Add 3 more workers
$ pm2 scale my-app +3
# Scale to a specific number of workers
$ pm2 scale my-app 2

For more information on clustering with PM2, see Cluster Mode in the PM2 documentation.

Cache request results

Another strategy to improve the performance in production is to cache the result of requests, so that your app does not repeat the operation to serve the same request repeatedly.

Use a caching server like Varnish or Nginx (see also Nginx Caching) to greatly improve the speed and performance of your app.
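Caching servers commonly decide what to store from response headers such as Cache-Control, although whether a given Varnish or Nginx setup honors the header depends on how it is configured. As a rough sketch under that assumption, an Express-style middleware could mark responses as cacheable. The middleware name and the 60-second lifetime are arbitrary choices for illustration.

```javascript
// Express-style middleware that marks responses as cacheable
// by shared caches for 60 seconds.
function cacheForOneMinute (req, res, next) {
  // Only mark safe, read-only requests as cacheable.
  if (req.method === 'GET' || req.method === 'HEAD') {
    res.set('Cache-Control', 'public, max-age=60')
  }
  next()
}

// Mounting it would look like: app.use('/products', cacheForOneMinute)
```

Responses that vary per user should not be marked public, since a shared cache could then serve one user’s data to another.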

Use a load balancer

No matter how optimized an app is, a single instance can handle only a limited amount of load and traffic. One way to scale an app is to run multiple instances of it and distribute the traffic via a load balancer. Setting up a load balancer can improve your app’s performance and speed, and enable it to scale more than is possible with a single instance.

A load balancer is usually a reverse proxy that orchestrates traffic to and from multiple application instances and servers. You can easily set up a load balancer for your app by using Nginx or HAProxy.

With load balancing, you might have to ensure that requests that are associated with a particular session ID connect to the process that originated them. This is known as session affinity, or sticky sessions, and may be addressed by the suggestion above to use a data store such as Redis for session data (depending on your application). For a discussion, see Using multiple nodes.

Use a reverse proxy

A reverse proxy sits in front of a web app and performs supporting operations on the requests, apart from directing requests to the app. It can handle error pages, compression, caching, serving files, and load balancing among other things.

Handing over tasks that do not require knowledge of application state to a reverse proxy frees up Express to perform specialized application tasks. For this reason, it is recommended to run Express behind a reverse proxy like Nginx or HAProxy in production.