MongoDB Manual

Collation

Collation allows users to specify language-specific rules for string comparison, such as rules for lettercase and accent marks.

You can specify collation for a collection or a view, an index, or specific operations that support collation.

Collation Document

A collation document has the following fields:

{
  locale: <string>,
  caseLevel: <boolean>,
  caseFirst: <string>,
  strength: <int>,
  numericOrdering: <boolean>,
  alternate: <string>,
  maxVariable: <string>,
  backwards: <boolean>,
  normalization: <boolean>
}

When specifying collation, the locale field is mandatory; all other collation fields are optional. For descriptions of the fields, see the field table below.

Default collation parameter values vary depending on which locale you specify. For a complete list of default collation parameters and the locales they are associated with, see Collation Default Parameters.
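As a minimal sketch of the document shape described above, the hypothetical validateCollation function below checks a collation document before it is sent to the server. It is not part of mongosh or any MongoDB driver; it only encodes the rule that locale is mandatory, the field types listed in the field table that follows, and the strength range 1 to 5. The server applies its own, more complete validation (for example, of locale names and allowed string values).

```javascript
// Hypothetical helper, not a MongoDB API: checks the basic shape of a
// collation document. The server performs its own, stricter validation.
const FIELD_TYPES = {
  locale: "string",
  caseLevel: "boolean",
  caseFirst: "string",
  strength: "number", // an integer from 1 to 5
  numericOrdering: "boolean",
  alternate: "string",
  maxVariable: "string",
  backwards: "boolean",
  normalization: "boolean",
};

function validateCollation(doc) {
  const errors = [];
  // locale is the only mandatory field.
  if (typeof doc.locale !== "string") {
    errors.push("locale is required and must be a string");
  }
  // Every other field is optional, but must have the documented type.
  for (const [field, value] of Object.entries(doc)) {
    const expected = FIELD_TYPES[field];
    if (expected === undefined) {
      errors.push(`unknown field: ${field}`);
    } else if (typeof value !== expected) {
      errors.push(`${field} must be a ${expected}`);
    }
  }
  // strength accepts the comparison levels 1 through 5 only.
  if ("strength" in doc && ![1, 2, 3, 4, 5].includes(doc.strength)) {
    errors.push("strength must be an integer from 1 to 5");
  }
  return errors;
}

console.log(validateCollation({ locale: "fr", strength: 1 })); // []
console.log(validateCollation({ strength: 2 }));
```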

Field (Type): Description
locale (string): The ICU locale. See Supported Languages and Locales for a list of supported locales.
To specify simple binary comparison, specify a locale value of "simple".
strength (integer): Optional. The level of comparison to perform. Corresponds to ICU Comparison Levels. Possible values are:
  • 1: Primary level of comparison. Collation performs comparisons of the base characters only, ignoring other differences such as diacritics and case.
  • 2: Secondary level of comparison. Collation performs comparisons up to secondary differences, such as diacritics. That is, collation performs comparisons of base characters (primary differences) and diacritics (secondary differences). Differences between base characters take precedence over secondary differences.
  • 3: Tertiary level of comparison. Collation performs comparisons up to tertiary differences, such as case and letter variants. That is, collation performs comparisons of base characters (primary differences), diacritics (secondary differences), and case and variants (tertiary differences). Differences between base characters take precedence over secondary differences, which take precedence over tertiary differences. This is the default level.
  • 4: Quaternary level. Limited to specific use cases: considering punctuation when levels 1 through 3 ignore punctuation, or processing Japanese text.
  • 5: Identical level. Limited to the specific use case of tie breaking.

See ICU Collation: Comparison Levels for details.

caseLevel (boolean): Optional. Flag that determines whether to include case comparison at strength level 1 or 2.
If true, include case comparison:
  • When used with strength: 1, collation compares base characters and case.
  • When used with strength: 2, collation compares base characters, diacritics (and possibly other secondary differences), and case.
If false, do not include case comparison at level 1 or 2. The default is false.
For more information, see ICU Collation: Case Level.
caseFirst (string): Optional. A field that determines the sort order of case differences during tertiary-level comparisons.
Possible values are:
  • "upper": Uppercase sorts before lowercase.
  • "lower": Lowercase sorts before uppercase.
  • "off": Default value. Similar to "lower" with slight differences. See http://userguide.icu-project.org/collation/customization for details of the differences.
numericOrdering (boolean): Optional. Flag that determines whether to compare numeric strings as numbers or as strings.
If true, compare as numbers. For example, "10" is greater than "2".
If false, compare as strings. For example, "10" is less than "2".
The default is false.
See numericOrdering Restrictions.
alternate (string): Optional. Field that determines whether collation should consider whitespace and punctuation as base characters for purposes of comparison.
Possible values are:
  • "non-ignorable": Whitespace and punctuation are considered base characters.
  • "shifted": Whitespace and punctuation are not considered base characters and are only distinguished at strength levels greater than 3.

See ICU Collation: Comparison Levels for more information.

The default is "non-ignorable".

maxVariable (string): Optional. Field that determines which characters are considered ignorable when alternate: "shifted". Has no effect if alternate: "non-ignorable".
Possible values are:
  • "punct": Both whitespace and punctuation are ignorable and not considered base characters.
  • "space": Whitespace is ignorable and not considered to be base characters.
backwards (boolean): Optional. Flag that determines whether strings with diacritics sort from the back of the string, as in some French dictionary orderings.
If true, compare from back to front.
If false, compare from front to back.
The default is false.
normalization (boolean): Optional. Flag that determines whether to check if text requires normalization and, if so, to perform it. Generally, the majority of text does not require this normalization processing.
If true, check whether the text is fully normalized and perform normalization to compare text.
If false, do not check.
The default is false.
See http://userguide.icu-project.org/collation/concepts#TOC-Normalization for details.
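To get a feel for several of these fields outside the database, the following sketch uses JavaScript's built-in Intl.Collator, which Node.js also implements on top of ICU. The mapping between Intl.Collator options and MongoDB collation fields shown in the comments is an approximation assumed for illustration (for example, sensitivity "base" is treated as strength: 1 and ignorePunctuation: true as alternate: "shifted"); Intl.Collator is not part of MongoDB's API, and the sample strings are arbitrary.

```javascript
// Rough analogies between MongoDB collation fields and Intl.Collator options.
// Intl.Collator is not a MongoDB API; in Node.js both sit on top of ICU.
const cmp = (options) => new Intl.Collator("en", options).compare;
const equal = (compare, a, b) => compare(a, b) === 0;

// strength: 1, 2, 3  ~  sensitivity "base", "accent", "variant"
const strength1 = cmp({ sensitivity: "base" });
const strength2 = cmp({ sensitivity: "accent" });
const strength3 = cmp({ sensitivity: "variant" });
console.log(equal(strength1, "cafe", "Caf\u00e9")); // true: only base letters compared
console.log(equal(strength2, "cafe", "Cafe"));      // true: case still ignored
console.log(equal(strength2, "cafe", "caf\u00e9")); // false: diacritic differs
console.log(equal(strength3, "cafe", "Cafe"));      // false: case differs

// strength: 1 with caseLevel: true  ~  sensitivity "case"
const strength1WithCase = cmp({ sensitivity: "case" });
console.log(equal(strength1WithCase, "a", "A"));      // false: case compared
console.log(equal(strength1WithCase, "a", "\u00e1")); // true: diacritics ignored

// caseFirst: "upper" / "lower"  ~  caseFirst "upper" / "lower"
const letters = ["b", "A", "a", "B"];
console.log([...letters].sort(cmp({ caseFirst: "upper" })).join("")); // AaBb
console.log([...letters].sort(cmp({ caseFirst: "lower" })).join("")); // aAbB

// numericOrdering: false / true  ~  numeric: false / true
console.log(cmp({ numeric: false })("10", "2") < 0); // true: compared as strings
console.log(cmp({ numeric: true })("10", "2") > 0);  // true: compared as numbers

// alternate: "shifted" at the default strength  ~  ignorePunctuation: true
console.log(equal(cmp({ ignorePunctuation: false }), "e-mail", "email")); // false
console.log(equal(cmp({ ignorePunctuation: true }), "e-mail", "email"));  // true
```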

Operations that Support Collation

You can specify collation for the following operations:

Note

You cannot specify multiple collations for an operation. For example, you cannot specify different collations per field, or, if performing a find with a sort, you cannot use one collation for the find and another for the sort.

Commands and their corresponding mongosh methods:
  • create: db.createCollection(), db.createView()
  • createIndexes [1]: db.collection.createIndex() [1]
  • aggregate: db.collection.aggregate()
  • distinct: db.collection.distinct()
  • findAndModify: db.collection.findAndModify(), db.collection.findOneAndDelete(), db.collection.findOneAndReplace(), db.collection.findOneAndUpdate()
  • find: cursor.collation() to specify collation for db.collection.find()
  • mapReduce: db.collection.mapReduce()
  • delete: db.collection.deleteOne(), db.collection.deleteMany(), db.collection.remove()
  • update: db.collection.updateOne(), db.collection.updateMany(), db.collection.replaceOne()
  • shardCollection: sh.shardCollection()
  • count: db.collection.count()
  • Individual update, replace, and delete operations in db.collection.bulkWrite().
[1] Some index types do not support collation. See Collation and Unsupported Index Types for details.

Behavior

Locale Variants

Some collation locales have variants, which employ special language-specific rules. To specify a locale variant, use the following syntax:

{ "locale" : "<locale code>@collation=<variant>" }

For example, to use the unihan variant of the Chinese collation:

{ "locale" : "zh@collation=unihan" }

For a complete list of all collation locales and their variants, see Collation Locales.
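To make the "<locale code>@collation=<variant>" syntax concrete, the sketch below splits such a string into its two parts. parseLocale is a hypothetical helper, not a MongoDB function, and it only handles the @collation= form shown above; it does not check whether the locale or variant actually exists.

```javascript
// Hypothetical helper, not a MongoDB API: splits a locale string of the form
// "<locale code>@collation=<variant>" into its locale and variant parts.
function parseLocale(value) {
  const match = /^([^@]+)(?:@collation=(.+))?$/.exec(value);
  if (match === null) {
    throw new Error(`unrecognized locale string: ${value}`);
  }
  // match[2] is undefined when no variant was given.
  return { locale: match[1], variant: match[2] ?? null };
}

console.log(parseLocale("zh@collation=unihan")); // { locale: 'zh', variant: 'unihan' }
console.log(parseLocale("fr"));                  // { locale: 'fr', variant: null }
```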

Collation and Views

  • You can specify a default collation for a view at creation time. If no collation is specified, the view's default collation is the "simple" binary comparison collator. That is, the view does not inherit the collection's default collation.
  • String comparisons on the view use the view's default collation. An operation that attempts to change or override a view's default collation fails with an error.
  • If you create a view from another view, you cannot specify a collation that differs from the source view's collation.
  • If you perform an aggregation that involves multiple views, such as with $lookup or $graphLookup, the views must have the same collation.
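The first and third rules above can be written down as a small toy model. Everything here is hypothetical and not MongoDB code: viewDefaultCollation covers a view defined directly on a collection, and sameCollation naively compares only the fields that are written out, whereas the server resolves locale defaults before comparing.

```javascript
// Toy model of the view rules above; not MongoDB code.
const SIMPLE = { locale: "simple" };

// Default collation of a view defined on a collection: the collation given
// at creation time, otherwise "simple". The collection's default collation
// is intentionally ignored, because views do not inherit it.
function viewDefaultCollation(createOptions, _collectionCollation) {
  return createOptions.collation ?? SIMPLE;
}

// Naive check: compares only the fields that are written out.
function sameCollation(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...keys].every((key) => a[key] === b[key]);
}

// A view created from another view may not specify a different collation.
function assertViewOnViewCollation(sourceViewCollation, createOptions) {
  const requested = createOptions.collation;
  if (requested !== undefined && !sameCollation(requested, sourceViewCollation)) {
    throw new Error("collation must match the source view's collation");
  }
}

console.log(viewDefaultCollation({}, { locale: "fr" })); // { locale: 'simple' }
```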

Collation and Index Use

To use an index for string comparisons, an operation must also specify the same collation. That is, an index with a collation cannot support an operation that performs string comparisons on the indexed fields if the operation specifies a different collation.

For example, the collection myColl has an index on a string field category with the collation locale "fr".

db.myColl.createIndex( { category: 1 }, { collation: { locale: "fr" } } )

The following query operation, which specifies the same collation as the index, can use the index:

db.myColl.find( { category: "cafe" } ).collation( { locale: "fr" } )

However, the following query operation, which by default uses the "simple" binary collator, cannot use the index:

db.myColl.find( { category: "cafe" } )
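The matching rule in this example can be sketched as a toy check. It is not the MongoDB query planner: indexServesStringComparisons is a hypothetical name, it assumes the collection has no default collation (so an operation without a collation uses "simple", as in the example above), and its naive comparison treats { locale: "fr" } and { locale: "fr", strength: 3 } as different even though the server may resolve them to the same collation.

```javascript
// Toy model, not the MongoDB query planner. Assumes the collection has no
// default collation, so an operation without one uses "simple".
const SIMPLE = { locale: "simple" };

// Naive check: compares only the fields that are written out.
function sameCollation(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...keys].every((key) => a[key] === b[key]);
}

// Can an index built with indexCollation serve the operation's string comparisons?
function indexServesStringComparisons(indexCollation = SIMPLE, opCollation = SIMPLE) {
  return sameCollation(indexCollation, opCollation);
}

const categoryIndexCollation = { locale: "fr" };
console.log(indexServesStringComparisons(categoryIndexCollation, { locale: "fr" })); // true
console.log(indexServesStringComparisons(categoryIndexCollation));                  // false
```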

For a compound index where the index prefix keys are not strings, arrays, or embedded documents, an operation that specifies a different collation can still use the index to support comparisons on the index prefix keys.

For example, the collection myColl has a compound index on the numeric fields score and price and the string field category; the index is created with the collation locale "fr" for string comparisons:

db.myColl.createIndex(
{ score: 1, price: 1, category: 1 },
{ collation: { locale: "fr" } } )

The following operations, which use "simple" binary collation for string comparisons, can use the index:

db.myColl.find( { score: 5 } ).sort( { price: 1 } )
db.myColl.find( { score: 5, price: { $gt: NumberDecimal( "10" ) } } ).sort( { price: 1 } )

The following operation, which uses "simple" binary collation for string comparisons on the indexed category field, can use the index to fulfill only the score: 5 portion of the query:

db.myColl.find( { score: 5, category: "cafe" } )
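The prefix rule can be approximated with a deliberately conservative toy model. It is not how the MongoDB query planner works internally: indexPrefixServed is a hypothetical name, it only handles equality predicates on plain JavaScript values (numbers stand in for NumberDecimal and friends), and it counts only a contiguous leading run of index keys, stopping at the first key that is missing from the query or that holds a string, array, or embedded document when the collations differ.

```javascript
// Toy model, not the MongoDB query planner. Equality predicates only.
// Strings, arrays, and embedded documents are compared using the collation.
function isCollationSensitive(value) {
  return typeof value === "string" || (typeof value === "object" && value !== null);
}

// Returns the leading run of index keys whose predicates can use index bounds.
function indexPrefixServed(indexKeys, query, collationsMatch) {
  const served = [];
  for (const field of Object.keys(indexKeys)) {
    if (!(field in query)) break; // only a contiguous prefix is counted
    if (!collationsMatch && isCollationSensitive(query[field])) break;
    served.push(field);
  }
  return served;
}

const index = { score: 1, price: 1, category: 1 };
console.log(indexPrefixServed(index, { score: 5, category: "cafe" }, false)); // [ 'score' ]
console.log(indexPrefixServed(index, { score: 5, price: 10, category: "cafe" }, false)); // [ 'score', 'price' ]
console.log(indexPrefixServed(index, { score: 5, price: 10, category: "cafe" }, true)); // [ 'score', 'price', 'category' ]
```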

Collation and Unsupported Index Types

The following indexes only support simple binary comparison and do not support collation:

  • text indexes
  • 2d indexes
Tip

To create a text or 2d index on a collection that has a non-simple collation, you must explicitly specify {collation: {locale: "simple"} } when creating the index.

Restrictions

numericOrdering

When numericOrdering is set to true, the following restrictions apply:

  • Only contiguous non-negative integer substrings of digits are considered in the comparisons.

    numericOrdering does not support:

    • +
    • -
    • decimal separators, like decimal points and decimal commas
    • exponents
  • Only Unicode code points in the general category Number, Decimal Digit (Nd) are treated as digits.
  • If a run of digits is longer than 254 characters, the excess characters are treated as a separate number.
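The Nd rule means numeric ordering is not limited to ASCII digits. The sketch below illustrates this with Arabic-Indic digits using Node.js's Intl.Collator, on the assumption that ICU's numeric collation (the numeric: true option) behaves like numericOrdering: true here; it is an analogy, not a MongoDB call.

```javascript
// Arabic-Indic digits (Unicode category Nd): "\u0662" is 2, "\u0661\u0660" is 10.
const two = "\u0662";
const ten = "\u0661\u0660";

// Intl.Collator stands in for MongoDB collation here; both rely on ICU.
const asStrings = new Intl.Collator("en", { numeric: false });
const asNumbers = new Intl.Collator("en", { numeric: true });

console.log(asStrings.compare(two, ten) > 0); // true: digit by digit, 2 > 1
console.log(asNumbers.compare(two, ten) < 0); // true: as numbers, 2 < 10
```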

Consider a collection with the following string values, which represent integers and decimal numbers:

db.c.insertMany(
[
{ "n" : "1" },
{ "n" : "2" },
{ "n" : "2.1" },
{ "n" : "-2.1" },
{ "n" : "2.2" },
{ "n" : "2.10" },
{ "n" : "2.20" },
{ "n" : "-10" },
{ "n" : "10" },
{ "n" : "20" },
{ "n" : "20.1" }
]
)

The following find query uses a collation document containing the numericOrdering parameter:

db.c.find(
{ }, { _id: 0 }
).sort(
{ n: 1 }
).collation( {
locale: 'en_US',
numericOrdering: true
} )

The operation returns the following results:

[
{ n: '-2.1' },
{ n: '-10' },
{ n: '1' },
{ n: '2' },
{ n: '2.1' },
{ n: '2.2' },
{ n: '2.10' },
{ n: '2.20' },
{ n: '10' },
{ n: '20' },
{ n: '20.1' }
]
  • numericOrdering: true sorts the string values in ascending order as if they were numeric values.
  • The two negative values -2.1 and -10 are not sorted in the expected order because numericOrdering does not support the - character.
  • The value 2.2 is sorted before the value 2.10 because numericOrdering does not support decimal values.
  • As a result, 2.2 and 2.10 are not compared as decimal numbers: the digit runs 2 and 10 after the decimal point are compared as separate integers, and 2 is less than 10.
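Because Node.js's Intl.Collator is also built on ICU, the same ordering can be reproduced outside the database, which is handy for experimenting with numericOrdering before running a query. This is an analogy under that assumption, not a MongoDB API: numeric: true stands in for numericOrdering: true, and the values are the ones inserted into db.c above.

```javascript
// The string values inserted into db.c in the example above.
const values = ["1", "2", "2.1", "-2.1", "2.2", "2.10", "2.20", "-10", "10", "20", "20.1"];

// numeric: true plays the role of numericOrdering: true.
const collator = new Intl.Collator("en-US", { numeric: true });

// "-" sorts as punctuation, and each digit run is compared as its own integer.
console.log([...values].sort(collator.compare).join(" "));
// -2.1 -10 1 2 2.1 2.2 2.10 2.20 10 20 20.1
```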