This page is part of the PCRE2 HTML documentation. It was generated automatically from the original man page. If there is any nonsense in it, please consult the man page, in case the conversion went wrong.
PCRE2 is the name used for a revised API for the PCRE library, which is a set of functions, written in C, that implement regular expression pattern matching using the same syntax and semantics as Perl, with just a few differences. After nearly two decades, the limitations of the original API were making development increasingly difficult. The new API is more extensible, and it was simplified by abolishing the separate "study" optimizing function; in PCRE2, patterns are automatically optimized where possible. Since forking from PCRE1, the code has been extensively refactored and new features introduced. The old library is now obsolete and is no longer maintained.
As well as Perl-style regular expression patterns, some features that appeared in Python and the original PCRE before they appeared in Perl are available using the Python syntax. There is also some support for one or two .NET and Oniguruma syntax items, and there are options for requesting some minor changes that give better ECMAScript (aka JavaScript) compatibility.
The source code for PCRE2 can be compiled to support strings of 8-bit, 16-bit, or 32-bit code units, which means that up to three separate libraries may be installed, one for each code unit size. The code unit size is not related to the bit size of the underlying hardware. In a 64-bit environment that also supports 32-bit applications, versions of PCRE2 that are compiled in both 64-bit and 32-bit modes may be needed.
The original work to extend PCRE to 16-bit and 32-bit code units was done by Zoltan Herczeg and Christian Persch, respectively.
In all three cases, strings can be interpreted either as one character per code unit, or as UTF-encoded Unicode, with support for Unicode general category properties. Unicode support is optional at build time (but is the default). However, processing strings as UTF code units must be enabled explicitly at run time. The version of Unicode in use can be discovered by running:
pcre2test -C
The three libraries contain identical sets of functions, with names ending in _8, _16, or _32, respectively (for example, pcre2_compile_8()). However, by defining PCRE2_CODE_UNIT_WIDTH to be 8, 16, or 32, a program that uses just one code unit width can be written using generic names such as pcre2_compile(), and the documentation is written assuming that this is the case.
In addition to the Perl-compatible matching function, PCRE2 contains an alternative function that matches the same compiled patterns in a different way. In certain circumstances, the alternative function has some advantages. For a discussion of the two matching algorithms, see the pcre2matching page.
Details of exactly which Perl regular expression features are and are not supported by PCRE2 are given in separate documents. See the pcre2pattern and pcre2compat pages. There is a syntax summary in the pcre2syntax page.
Some features of PCRE2 can be included, excluded, or changed when the library is built. The pcre2_config() function makes it possible for a client to discover which features are available. The features themselves are described in the pcre2build page. Documentation about building PCRE2 for various operating systems can be found in the README and NON-AUTOTOOLS_BUILD files in the source distribution.
The libraries contain a number of undocumented internal functions and data tables that are used by more than one of the exported external functions, but which are not intended for use by external callers. Their names all begin with "_pcre2", which hopefully will not provoke any name clashes. In some environments, it is possible to control which external symbols are exported when a shared library is built, and in these cases the undocumented symbols are not exported.
If you are using PCRE2 in a non-UTF application that permits users to supply arbitrary patterns for compilation, you should be aware of a feature that allows users to turn on UTF support from within a pattern. For example, an 8-bit pattern that begins with "(*UTF)" turns on UTF-8 mode, which interprets patterns and subjects as strings of UTF-8 code units instead of individual 8-bit characters. This causes both the pattern and any data against which it is matched to be checked for UTF-8 validity. If the data string is very long, such a check might use enough resources to degrade your application's performance. One way of guarding against this possibility is to use the pcre2_pattern_info() function to check the compiled pattern's options for PCRE2_UTF. Alternatively, you can set the PCRE2_NEVER_UTF option when calling pcre2_compile(). This causes a compile-time error if the pattern contains a UTF-setting sequence.
The use of Unicode properties for character types such as \d can also be enabled from within the pattern, by specifying "(*UCP)". This feature can be disallowed by setting the PCRE2_NEVER_UCP option.
If your application is one that supports UTF, be aware that validity checking can take time. If the same data string is to be matched many times, you can use the PCRE2_NO_UTF_CHECK option for the second and subsequent matches to avoid running redundant checks.
The use of the \C escape sequence in a UTF-8 or UTF-16 pattern can lead to problems, because it may leave the current matching point in the middle of a multi-code-unit character. The PCRE2_NEVER_BACKSLASH_C option can be used by an application to lock out the use of \C, causing a compile-time error if it is encountered. It is also possible to build PCRE2 with the use of \C permanently disabled.
Another way that performance can be hit is by running a pattern that has a very large search tree against a string that will never match. Nested unlimited repeats in a pattern are a common example. PCRE2 provides some protection against this: see the pcre2_set_match_limit() function in the pcre2api page. There is a similar function called pcre2_set_depth_limit() that can be used to restrict the amount of memory that is used.
The user documentation for PCRE2 comprises a number of different sections. In the "man" format, each of these is a separate "man page". In the HTML format, each is a separate page, linked from the index page. In the plain text format, the descriptions of the pcre2grep and pcre2test programs are in files called pcre2grep.txt and pcre2test.txt, respectively. The remaining sections, except for the pcre2demo section (which is a program listing), and the short pages for individual functions, are concatenated in pcre2.txt, for ease of searching. The sections are as follows:
  pcre2             this document
  pcre2-config      show PCRE2 installation configuration information
  pcre2api          details of PCRE2's native C API
  pcre2build        building PCRE2
  pcre2callout      details of the pattern callout feature
  pcre2compat       discussion of Perl compatibility
  pcre2convert      details of pattern conversion functions
  pcre2demo         a demonstration C program that uses PCRE2
  pcre2grep         description of the pcre2grep command (8-bit only)
  pcre2jit          discussion of just-in-time optimization support
  pcre2limits       details of size and other limits
  pcre2matching     discussion of the two matching algorithms
  pcre2partial      details of the partial matching facility
  pcre2pattern      syntax and semantics of supported regular expression patterns
  pcre2perform      discussion of performance issues
  pcre2posix        the POSIX-compatible C API for the 8-bit library
  pcre2sample       discussion of the pcre2demo program
  pcre2serialize    details of pattern serialization
  pcre2syntax       quick syntax reference
  pcre2test         description of the pcre2test command
  pcre2unicode      discussion of Unicode and UTF support
Philip Hazel
Retired from University Computing Service
Cambridge, England.
Putting an actual email address here is a spam magnet. If you want to email me, use my two names separated by a dot at gmail.com.
Last updated: 27 August 2021
Copyright © 1997-2021 University of Cambridge.