Return to the PCRE2 index page.
This page is part of the PCRE2 HTML documentation. It was generated automatically from the original man page. If there is any nonsense in it, please consult the man page, in case the conversion went wrong.
DIFFERENCES BETWEEN PCRE2 AND PERL
This document describes some of the differences in the ways that PCRE2 and Perl handle regular expressions. The differences described here are with respect to Perl version 5.32.0, but as both Perl and PCRE2 are continually changing, the information may at times be out of date.
1. PCRE2 has only a subset of Perl's Unicode support. Details of what it does have are given in the pcre2unicode page.
2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but they do not mean what you might think. For example, (?!a){3} does not assert that the next three characters are not "a". It just asserts that the next character is not "a" three times (in principle; PCRE2 optimizes this to run the assertion just once). Perl allows some repeat quantifiers on other assertions, for example, \b* (but not \b{3}, though oddly it does allow ^{3}), but these do not seem to have any use. PCRE2 does not allow any kind of quantifier on non-lookaround assertions.
3. Capture groups that occur inside negative lookaround assertions are counted, but their entries in the offsets vector are set only when a negative assertion is a condition that has a matching branch (that is, the condition is false). Perl may set such capture groups in other circumstances.
4. The following Perl escape sequences are not supported: \F, \l, \L, \u, \U, and \N when followed by a character name. \N on its own, matching a non-newline character, and \N{U+dd..}, matching a Unicode code point, are supported. The escapes that modify the case of following letters are implemented by Perl's general string-handling and are not part of its pattern matching engine. If any of these are encountered by PCRE2, an error is generated by default. However, if either of the PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX options is set, \U and \u are interpreted as ECMAScript interprets them.
5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is built with Unicode support (the default). The properties that can be tested with \p and \P are limited to the general category properties such as Lu and Nd, script names such as Greek or Han, and the derived properties Any and L&. Both PCRE2 and Perl support the Cs (surrogate) property, but in PCRE2 its use is limited. See the pcre2pattern documentation for details. The long synonyms for property names that Perl supports (such as \p{Letter}) are not supported by PCRE2, nor is it permitted to prefix any of these properties with "Is".
6. PCRE2 supports the \Q...\E escape for quoting substrings. Characters in between are treated as literals. However, this is slightly different from Perl in that $ and @ are also handled as literals inside the quotes. In Perl, they cause variable interpolation (but of course PCRE2 does not have variables). Also, Perl does "double-quotish backslash interpolation" on any backslashes between \Q and \E which, its documentation says, "may lead to confusing results". PCRE2 treats a backslash between \Q and \E just like any other character. Note the following examples:
  Pattern            PCRE2 matches     Perl matches

  \Qabc$xyz\E        abc$xyz           abc followed by the contents of $xyz
  \Qabc\$xyz\E       abc\$xyz          abc\$xyz
  \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz
  \QA\B\E            A\B               A\B
  \Q\\E              \                 \\E
The \Q...\E sequence is recognized both inside and outside character classes by both PCRE2 and Perl.
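The PCRE2 column of the table above can be reproduced with a small sketch. Python's `re` module has no \Q...\E, so the hypothetical helper `expand_quotes` below emulates the PCRE2 rule (everything between \Q and \E is literal, including $, @ and backslashes) by rewriting those spans with `re.escape`. It is an illustration under those assumptions, not PCRE2's implementation, and it does not handle an isolated \E or other corner cases.

```python
import re


def expand_quotes(pattern: str) -> str:
    # Hypothetical helper: rewrite each \Q...\E span as an escaped literal,
    # following the PCRE2 rule that nothing inside the span is special.
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith(r"\Q", i):
            end = pattern.find(r"\E", i + 2)
            if end == -1:
                # Assumption: an unterminated \Q quotes to the end of the pattern.
                end = len(pattern)
                next_i = end
            else:
                next_i = end + 2
            out.append(re.escape(pattern[i + 2:end]))
            i = next_i
        elif pattern[i] == "\\" and i + 1 < len(pattern):
            # Copy an ordinary escape pair (such as \$) through unchanged.
            out.append(pattern[i:i + 2])
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


def pcre2_style_match(pattern: str, subject: str) -> bool:
    # True if the whole subject matches the pattern after quote expansion.
    return re.fullmatch(expand_quotes(pattern), subject) is not None


# Rows from the table: (pattern, text listed under "PCRE2 matches").
rows = [
    (r"\Qabc$xyz\E", "abc$xyz"),
    (r"\Qabc\$xyz\E", r"abc\$xyz"),
    (r"\Qabc\E\$\Qxyz\E", "abc$xyz"),
    (r"\QA\B\E", r"A\B"),
    (r"\Q\\E", "\\"),
]
for pattern, subject in rows:
    print(pattern, pcre2_style_match(pattern, subject))
```

Each row should print True. Reproducing the Perl column would additionally require modelling variable interpolation, which this sketch deliberately leaves out.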
7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code}) constructions. However, PCRE2 does have a "callout" feature, which allows an external function to be called during pattern matching. See the pcre2callout documentation for details.
8. Subroutine calls (whether recursive or not) were treated as atomic groups up to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking into subroutine calls is now supported, as in Perl.
9. In PCRE2, if any of the backtracking control verbs are used in a group that is called as a subroutine (whether or not recursively), their effect is confined to that group; it does not extend to the surrounding pattern. This is not always the case in Perl. In particular, if (*THEN) is present in a group that is called as a subroutine, its action is limited to that group, even if the group does not contain any | characters. Note that such groups are processed as anchored at the point where they are tested.
10. If a pattern contains more than one backtracking control verb, the first one that is backtracked onto acts. For example, in the pattern A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the same as PCRE2, but there are cases where it differs.
11. There are some differences that are concerned with the settings of captured strings when part of a pattern is repeated. For example, matching "aba" against the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to "b".
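To make "set" versus "unset" concrete, the following sketch inspects the second capture after matching "aba" against the same pattern. It uses Python's `re` module, which is neither Perl nor PCRE2; it is included only as a runnable illustration, and its result happens to agree with the PCRE2 behaviour described above, in that the capture made during the first iteration is retained.

```python
import re

# Same pattern and subject as in item 11, run through Python's re engine.
m = re.match(r"^(a(b)?)+$", "aba")

outer = m.group(1)  # text captured by the outer group on its last iteration
inner = m.group(2)  # text captured by (b)?, or None if it were unset

print(repr(outer))  # 'a'
print(repr(inner))  # 'b'
```

An engine that follows Perl's behaviour here would report the second group as unset instead.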
12. PCRE2's handling of duplicate capture group numbers and names is not as general as Perl's. This is a consequence of the fact that PCRE2 works internally just with numbers, using an external table to translate between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B)), where the two capture groups have the same number but different names, is not supported, and causes an error at compile time. If it were allowed, it would not be possible to distinguish which group matched, because both names map to capture group number 1. To avoid this confusing situation, an error is given at compile time.
13. Perl used to recognize comments in some places that PCRE2 does not, for example, between the ( and ? at the start of a group. If the /x modifier is set, Perl used to allow white space between ( and ?, though the latest Perl releases give an error (for a while it was just deprecated). There may still be some cases where Perl behaves differently.
14. Perl, when in warning mode, gives warnings for character classes such as [A-\d] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE2 has no warning features, so it gives an error in these cases because they are almost certainly user mistakes.
15. In PCRE2, the upper/lower case character properties Lu and Ll are not affected when case-independent matching is specified. For example, \p{Lu} always matches an upper case letter. I think Perl has changed in this respect; in the release at the time of writing (5.32), \p{Lu} and \p{Ll} match all letters, regardless of case, when case independence is specified.
16. From release 5.32.0, Perl locks out the use of \K in lookaround assertions. From release 10.38 PCRE2 does the same by default. However, there is an option for re-enabling the previous behaviour. When this option is set, \K is acted on when it occurs in positive assertions, but is ignored in negative assertions.
17. PCRE2 provides some extensions to the Perl regular expression facilities. Perl 5.10 included new features that were not in earlier versions of Perl, some of which (such as named parentheses) were in PCRE2 for some time before. This list is with respect to Perl 5.32:
(a) Although lookbehind assertions in PCRE2 must match fixed length strings, each alternative toplevel branch of a lookbehind assertion can match a different length of string. Perl requires them all to have the same length.
(b) From PCRE2 10.23, backreferences to groups of fixed length are supported in lookbehinds, provided that there is no possibility of referencing a non-unique number or name. Perl does not support backreferences in lookbehinds.
(c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $ meta-character matches only at the very end of the string.
(d) A backslash followed by a letter with no special meaning is faulted. (Perl can be made to issue a warning.)
(e) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is inverted, that is, by default they are not greedy, but if followed by a question mark they are.
(f) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried only at the first matching position in the subject string.
(g) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY and PCRE2_NOTEMPTY_ATSTART options have no Perl equivalents.
(h) The \R escape sequence can be restricted to match only CR, LF, or CRLF by the PCRE2_BSR_ANYCRLF option.
(i) The callout facility is PCRE2-specific. Perl supports code blocks and variable interpolation, but not general hooks on every match.
(j) The partial matching facility is PCRE2-specific.
(k) The alternative matching function (pcre2_dfa_match()) matches in a different way and is not Perl-compatible.
(l) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) at the start of a pattern. These set overall options that cannot be changed within the pattern.
(m) PCRE2 supports non-atomic positive lookaround assertions. This is an extension to the lookaround facilities. The default, Perl-compatible lookarounds are atomic.
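Items (c) and (e) in the list above are easiest to see by comparison. Python's `re` module has no equivalent of PCRE2_DOLLAR_ENDONLY or PCRE2_UNGREEDY, so the sketch below is only an approximation under that assumption: it uses `\Z` (end of string only) to stand in for `$` with PCRE2_DOLLAR_ENDONLY set, and writes the lazy quantifier explicitly to show what a plain quantifier would do with PCRE2_UNGREEDY set.

```python
import re

subject = "abc\n"

# Default "$": also matches just before a newline that ends the string.
default_dollar = re.search(r"abc$", subject) is not None

# "\Z": matches only at the very end, standing in for "$" with
# PCRE2_DOLLAR_ENDONLY set (an emulation, not the PCRE2 option itself).
endonly_dollar = re.search(r"abc\Z", subject) is not None

# Greedy versus lazy repetition. With PCRE2_UNGREEDY set, a plain "a.*b"
# would behave like the lazy form here, and "a.*?b" like the greedy one.
greedy = re.search(r"a.*b", "aXbXb").group()
lazy = re.search(r"a.*?b", "aXbXb").group()

print(default_dollar, endonly_dollar)  # True False
print(greedy, lazy)                    # aXbXb aXb
```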
18. The Perl /a modifier restricts \d (digit matching) to pure ASCII, and the /aa modifier additionally restricts /i case-insensitive matching to pure ASCII, ignoring Unicode rules. This separation cannot be represented with PCRE2_UCP.
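The two kinds of restriction that Perl separates can be illustrated with Python's `re.ASCII` flag. This is only an analogy, assumed here for illustration: Python applies both restrictions with a single flag, so it does not express the /a versus /aa split either.

```python
import re

arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
lower_u_umlaut = "\u00fc"
upper_u_umlaut = "\u00dc"

# Digit matching: Unicode-aware by default, ASCII-only with re.ASCII.
unicode_digit = re.fullmatch(r"\d", arabic_three) is not None
ascii_digit = re.fullmatch(r"\d", arabic_three, re.ASCII) is not None

# Case-insensitive matching: Unicode-aware by default, ASCII-only with re.ASCII.
unicode_fold = re.fullmatch(lower_u_umlaut, upper_u_umlaut, re.IGNORECASE) is not None
ascii_fold = re.fullmatch(
    lower_u_umlaut, upper_u_umlaut, re.IGNORECASE | re.ASCII
) is not None

print(unicode_digit, ascii_digit)  # True False
print(unicode_fold, ascii_fold)    # True False
```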
19. Perl has different limits than PCRE2. See the pcre2limit documentation for details. In release 5.10, Perl switched from recursion to iteration, keeping the intermediate matches on the heap, which is about 10% slower but is not subject to any stack-overflow limit. PCRE2 made a similar change at release 10.30, and also has many build-time and run-time customizable limits.
Philip Hazel
Retired from University Computing Service
Cambridge, England.
Last updated: 30 August 2021
Copyright © 1997-2021 University of Cambridge.