Collation

Collation allows users to specify language-specific rules for string comparison, such as rules for lettercase and accent marks.

You can specify collation for a collection or a view, an index, or specific operations that support collation.

To specify collation when you query documents in the MongoDB Atlas UI, see Specify Collation.
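
As a rough sketch of the three places where collation can be attached, the mongosh calls below set it on a new collection, on an index, and on a single query. The restaurants collection, the category field, and the wrapper function name are hypothetical, and `db` is assumed to be the mongosh database object.

```javascript
// Sketch only: `db` is assumed to be the mongosh database object.
// The collection name, field name, and function name are hypothetical.
function specifyCollationExamples(db) {
  const collation = { locale: "fr", strength: 2 };

  // 1. Default collation for a new collection.
  db.createCollection("restaurants", { collation: collation });

  // 2. Collation for one index.
  db.restaurants.createIndex({ category: 1 }, { collation: collation });

  // 3. Collation for one operation.
  return db.restaurants.find({ category: "cafe" }).collation(collation);
}
```

In mongosh, calling specifyCollationExamples(db) would run the three steps in order and return the query cursor.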

Collation Document

A collation document has the following fields:

{
locale: <string>,
caseLevel: <boolean>,
caseFirst: <string>,
strength: <int>,
numericOrdering: <boolean>,
alternate: <string>,
maxVariable: <string>,
backwards: <boolean>
}

When specifying collation, the locale field is mandatory; all other collation fields are optional. For descriptions of the fields, see Collation Document.

Default collation parameter values vary depending on which locale you specify. For a complete list of default collation parameters and the locales they are associated with, see Collation Default Parameters.

Field | Type | Description

locale

string

The ICU locale. See Supported Languages and Locales for a list of supported locales.

To specify simple binary comparison, specify a locale value of "simple".

strength

integer

Optional. The level of comparison to perform. Corresponds to ICU Comparison Levels. Possible values are:

Value | Description
1 | Primary level of comparison. Collation compares base characters only, ignoring other differences such as diacritics and case.
2 | Secondary level of comparison. Collation compares base characters (primary differences) and diacritics (secondary differences). Differences between base characters take precedence over secondary differences.
3 | Tertiary level of comparison. Collation compares base characters (primary differences), diacritics (secondary differences), and case and letter variants (tertiary differences). Differences between base characters take precedence over secondary differences, which take precedence over tertiary differences. This is the default level.
4 | Quaternary level. Limited to specific use cases: considering punctuation when levels 1-3 ignore punctuation, or processing Japanese text.
5 | Identical level. Limited to the specific use case of a tie breaker.

See ICU Collation: Comparison Levels for details.
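
The first three strength levels can be tried outside the database with JavaScript's built-in Intl.Collator, which is also backed by ICU in common runtimes. This is only an approximation: the mapping of sensitivity "base", "accent", and "variant" to strength 1, 2, and 3 is an assumption made for illustration, and the helper names are hypothetical.

```javascript
// Approximation: Intl.Collator "sensitivity" is assumed to map to strength as
// base -> 1, accent -> 2, variant -> 3. This is not MongoDB code.
const strengthToSensitivity = { 1: "base", 2: "accent", 3: "variant" };

// Returns true when the two strings compare as equal at the given strength.
function equalAtStrength(strength, a, b) {
  const collator = new Intl.Collator("fr", {
    sensitivity: strengthToSensitivity[strength],
  });
  return collator.compare(a, b) === 0;
}

const strengthDemo = [1, 2, 3].map((strength) => ({
  strength: strength,
  accentIgnored: equalAtStrength(strength, "cafe", "café"),
  caseIgnored: equalAtStrength(strength, "cafe", "cafE"),
}));
console.log(strengthDemo);
```

Under this mapping, level 1 treats all three spellings as equal, level 2 separates "café" from "cafe", and level 3 also separates "cafE" from "cafe".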

caseLevel

boolean

Optional. Flag that determines whether to include case comparison at strength level 1 or 2.

If true, include case comparison:

  • When used with strength:1, collation compares base characters and case.
  • When used with strength:2, collation compares base characters, diacritics (and possibly other secondary differences), and case.

If false, do not include case comparison at level 1 or 2. The default is false.

For more information, see ICU Collation: Case Level.
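
The combination of strength 1 with caseLevel: true can be approximated in plain JavaScript with Intl.Collator and sensitivity "case", which compares base characters and case but ignores accents. Treating the two as equivalent is an assumption made for illustration; this is not MongoDB code.

```javascript
// Approximation: sensitivity "case" is assumed to behave like
// { strength: 1, caseLevel: true }. This is not MongoDB code.
const caseLevelCollator = new Intl.Collator("fr", { sensitivity: "case" });

// Accents are ignored, so these two compare as equal.
const accentDiffers = caseLevelCollator.compare("cafe", "café") !== 0;

// Case is compared, so these two compare as different.
const caseDiffers = caseLevelCollator.compare("cafe", "cafE") !== 0;

console.log({ accentDiffers, caseDiffers });
```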

caseFirst

string

Optional. A field that determines the sort order of case differences during tertiary level comparisons.

Possible values are:

Value | Description
"upper" | Uppercase sorts before lowercase.
"lower" | Lowercase sorts before uppercase.
"off" | Default value. Similar to "lower" with slight differences. See https://unicode-org.github.io/icu/userguide/strings/properties.html#customization for details of differences.

numericOrdering

boolean

Optional. Flag that determines whether to compare numeric strings as numbers or as strings.

If true, compare as numbers. For example, "10" is greater than "2".

If false, compare as strings. For example, "10" is less than "2".

The default is false.

See numericOrdering Restrictions.
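
The "10" versus "2" comparison can be reproduced outside the database with Intl.Collator, whose numeric option plays a role similar to numericOrdering. This is a sketch under that assumption, not MongoDB code.

```javascript
// Approximation: the Intl.Collator "numeric" option stands in for
// numericOrdering. This is not MongoDB code.
const asNumbers = new Intl.Collator("en", { numeric: true });
const asStrings = new Intl.Collator("en", { numeric: false });

// Positive: "10" sorts after "2" when digit runs are compared as numbers.
const numericResult = asNumbers.compare("10", "2");

// Negative: "10" sorts before "2" when compared character by character.
const stringResult = asStrings.compare("10", "2");

console.log({ numericResult, stringResult });
```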

alternate

string

Optional. Field that determines whether collation should consider whitespace and punctuation as base characters for purposes of comparison.

Possible values are:

Value | Description
"non-ignorable" | Whitespace and punctuation are considered base characters.
"shifted" | Whitespace and punctuation are not considered base characters and are only distinguished at strength levels greater than 3.

See ICU Collation: Comparison Levels for more information.

The default is "non-ignorable".

maxVariable

string

Optional. Field that determines which characters are considered ignorable when alternate: "shifted". Has no effect if alternate: "non-ignorable".

Possible values are:

Value | Description
"punct" | Both whitespace and punctuation are ignorable and not considered base characters.
"space" | Whitespace is ignorable and not considered a base character.

backwards

boolean

Optional. Flag that determines whether strings with diacritics sort from the back of the string, such as with some French dictionary ordering.

If true, compare from back to front.

If false, compare from front to back.

The default value is false.

normalization

boolean

Optional. Flag that determines whether to check if text requires normalization and to perform normalization. Generally, the majority of text does not require this normalization processing.

If true, check if fully normalized and perform normalization to compare text.

If false, does not check.

The default value is false.

See https://unicode-org.github.io/icu/userguide/collation/concepts.html#normalization for details.

Operations that Support Collation

You can specify collation for the following operations:

Note

You cannot specify multiple collations for an operation. For example, you cannot specify different collations per field, or if performing a find with a sort, you cannot use one collation for the find and another for the sort.

Commands | mongosh Methods
create |
createIndexes [1] | db.collection.createIndex() [1]
aggregate | db.collection.aggregate()
distinct | db.collection.distinct()
findAndModify |
find | cursor.collation() to specify collation for db.collection.find()
mapReduce | db.collection.mapReduce()
delete |
update |
shardCollection |
count |
 | Individual update, replace, and delete operations in db.collection.bulkWrite().

[1] Some index types do not support collation. See Collation and Unsupported Index Types for details.

Behavior

Locale Variants

Some collation locales have variants, which employ special language-specific rules. To specify a locale variant, use the following syntax:

{ "locale" : "<locale code>@collation=<variant>" }

For example, to use the unihan variant of the Chinese collation:

{ "locale" : "zh@collation=unihan" }

For a complete list of all collation locales and their variants, see Collation Locales.
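
The variant syntax above is plain string concatenation, so a small helper can build the collation document. The function name below is hypothetical and is not part of mongosh or any driver; it only assembles the locale string in the documented form.

```javascript
// Hypothetical helper; not part of mongosh or any driver.
// Builds a collation document whose locale carries a variant suffix.
function collationWithVariant(localeCode, variant) {
  return { locale: `${localeCode}@collation=${variant}` };
}

const unihanCollation = collationWithVariant("zh", "unihan");
console.log(unihanCollation);
```

The resulting document can then be passed wherever a collation document is accepted, for example to cursor.collation().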

Collation and Views

  • You can specify a default collation for a view at creation time. If no collation is specified, the view's default collation is the "simple" binary comparison collator. That is, the view does not inherit the collection's default collation.
  • String comparisons on the view use the view's default collation. An operation that attempts to change or override a view's default collation will fail with an error.
  • If creating a view from another view, you cannot specify a collation that differs from the source view's collation.
  • If performing an aggregation that involves multiple views, such as with $lookup or $graphLookup, the views must have the same collation.

Collation and Index Use

To use an index for string comparisons, an operation must also specify the same collation. That is, an index with a collation cannot support an operation that performs string comparisons on the indexed fields if the operation specifies a different collation.

Warning

Because indexes that are configured with collation use ICU collation keys to achieve sort order, collation-aware index keys may be larger than index keys for indexes without collation.

A restaurants collection has the following documents:

db.restaurants.insertMany( [
{ _id: 1, category: "café", status: "Open" },
{ _id: 2, category: "cafe", status: "open" },
{ _id: 3, category: "cafE", status: "open" }
] )

The restaurants collection has an index on a string field category with the collation locale "fr".

db.restaurants.createIndex( { category: 1 }, { collation: { locale: "fr" } } )

The following query, which specifies the same collation as the index, can use the index:

db.restaurants.find( { category: "cafe" } ).collation( { locale: "fr" } )

However, the following query operation, which by default uses the "simple" binary collator, cannot use the index:

db.restaurants.find( { category: "cafe" } )

For a compound index where the index prefix keys are not strings, arrays, or embedded documents, an operation that specifies a different collation can still use the index to support comparisons on the index prefix keys.

For example, the collection restaurants has a compound index on the numeric fields score and price and the string field category; the index is created with the collation locale "fr" for string comparisons:

db.restaurants.createIndex(
{ score: 1, price: 1, category: 1 },
{ collation: { locale: "fr" } } )

The following operations, which use "simple" binary collation for string comparisons, can use the index:

db.restaurants.find( { score: 5 } ).sort( { price: 1 } )
db.restaurants.find( { score: 5, price: { $gt: Decimal128( "10" ) } } ).sort( { price: 1 } )

The following operation, which uses "simple" binary collation for string comparisons on the indexed category field, can use the index to fulfill only the score: 5 portion of the query:

db.restaurants.find( { score: 5, category: "cafe" } )

To confirm whether a query used an index, run the query with the explain() option.
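
One way to read the explain() result programmatically is to look for an index scan stage in the winning plan. The helper below is a hypothetical sketch: it assumes the explain document exposes queryPlanner.winningPlan and that index scans appear as a stage named "IXSCAN". The sample documents are hand-written stand-ins, not real server output.

```javascript
// Hypothetical helper: walks an explain() document and reports whether any
// stage nested under the winning plan is an index scan ("IXSCAN").
function usesIndexScan(explainOutput) {
  function visit(node) {
    if (node === null || typeof node !== "object") return false;
    if (node.stage === "IXSCAN") return true;
    return Object.values(node).some(visit);
  }
  return visit(explainOutput.queryPlanner.winningPlan);
}

// In mongosh, one might call it as:
// usesIndexScan(db.restaurants.find({ category: "cafe" }).collation({ locale: "fr" }).explain())

// Hand-written stand-ins for explain() output, trimmed to what the helper reads.
const indexedPlan = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "category_1" } },
  },
};
const collectionScanPlan = { queryPlanner: { winningPlan: { stage: "COLLSCAN" } } };

console.log(usesIndexScan(indexedPlan), usesIndexScan(collectionScanPlan));
```

An IXSCAN stage shows only that an index was scanned. To see which predicates the index actually covered, inspect the index bounds reported for that stage.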

Important

Matches against document keys, including embedded document keys, use simple binary comparison. This means that a query for a key like "type.café" will not match the key "type.cafe", regardless of the value you set for the strength parameter.

Collation and Unsupported Index Types

The following indexes only support simple binary comparison and do not support collation:

  • Text indexes
  • 2d indexes

Tip

To create a text or 2d index on a collection that has a non-simple collation, you must explicitly specify {collation: {locale: "simple"} } when creating the index.
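
A minimal sketch of the tip above, assuming `db` is the mongosh database object. The articles collection, the body field, and the function name are hypothetical, and the collection is assumed to have been created with a non-simple default collation.

```javascript
// Sketch only: `db` is assumed to be the mongosh database object.
// The collection name "articles" and the field name "body" are hypothetical.
function createSimpleTextIndex(db) {
  // The collection is assumed to have a non-simple default collation, so the
  // text index states the "simple" collation explicitly.
  return db.articles.createIndex(
    { body: "text" },
    { collation: { locale: "simple" } }
  );
}
```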

Restrictions

numericOrdering

When specifying numericOrdering as true, the following restrictions apply:

  • Only contiguous non-negative integer substrings of digits are considered in the comparisons.

    numericOrdering does not support:

    • +
    • -
    • decimal separators, like decimal points and decimal commas
    • exponents
  • Only Unicode code points in the Number or Decimal Digit (Nd) category are treated as digits.
  • If a digit length exceeds 254 characters, the excess characters are treated as a separate number.

Consider a collection with the following number and decimal values stored as strings:

db.c.insertMany(
[
{ "n" : "1" },
{ "n" : "2" },
{ "n" : "2.1" },
{ "n" : "-2.1" },
{ "n" : "2.2" },
{ "n" : "2.10" },
{ "n" : "2.20" },
{ "n" : "-10" },
{ "n" : "10" },
{ "n" : "20" },
{ "n" : "20.1" }
]
)

The following find query uses a collation document containing the numericOrdering parameter:

db.c.find(
{ }, { _id: 0 }
).sort(
{ n: 1 }
).collation( {
locale: 'en_US',
numericOrdering: true
} )

The operation returns the following results:

[
{ n: '-2.1' },
{ n: '-10' },
{ n: '1' },
{ n: '2' },
{ n: '2.1' },
{ n: '2.2' },
{ n: '2.10' },
{ n: '2.20' },
{ n: '10' },
{ n: '20' },
{ n: '20.1' }
]

  • numericOrdering: true sorts the string values in ascending order as if they were numeric values.
  • The two negative values -2.1 and -10 are not sorted in the expected sort order because they have unsupported - characters.
  • The value 2.2 is sorted before the value 2.10 because the numericOrdering parameter does not support decimal values.
  • As a result, the digits on each side of the decimal point are compared as separate integers, and 2 is less than 10.
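
The same ordering quirks can be explored without a database by sorting the values with Intl.Collator and its numeric option. This is an approximation that assumes the runtime's ICU data behaves like the en_US collation with numericOrdering: true; it is not MongoDB code.

```javascript
// Approximation: Intl.Collator with numeric: true stands in for
// { locale: "en_US", numericOrdering: true }. This is not MongoDB code.
const numericCollator = new Intl.Collator("en-US", { numeric: true });

const stringNumbers = [
  "1", "2", "2.1", "-2.1", "2.2", "2.10", "2.20", "-10", "10", "20", "20.1",
];

// Copy before sorting so the original insertion order stays available.
const sortedStringNumbers = [...stringNumbers].sort(numericCollator.compare);
console.log(sortedStringNumbers);
```

If the approximation holds, "2.2" lands before "2.10" and "-2.1" lands before "-10", because each run of digits is compared as its own integer and the - and . characters are compared as ordinary characters.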

Example

A restaurants collection has the following documents:

db.restaurants.insertMany( [
{ _id: 1, category: "café", status: "Open" },
{ _id: 2, category: "cafe", status: "open" },
{ _id: 3, category: "cafE", status: "open" }
] )

The following find() operation uses collation:

db.restaurants.find(
{ category: "cafe", status: "Open" }
).collation( { locale: "fr", strength: 1 } )

The operation returns the following documents:

[
{ _id: 1, category: 'café', status: 'Open' },
{ _id: 2, category: 'cafe', status: 'open' },
{ _id: 3, category: 'cafE', status: 'open' }
]

The operation specifies a collation with strength: 1, which means the query ignores differences in diacritics and case. As a result, even though no document exactly matches the case and accents specified in the filter, the operation returns all documents in the collection.
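
To see why every document matches, the filter can be mimicked in plain JavaScript with Intl.Collator. Treating sensitivity "base" as strength 1 and "variant" as strength 3 is an assumption made for illustration; this is an in-memory sketch, not how the server evaluates the query.

```javascript
// Approximation: sensitivity "base" stands in for strength 1 and "variant"
// for strength 3. This is not MongoDB code.
const restaurantDocs = [
  { _id: 1, category: "café", status: "Open" },
  { _id: 2, category: "cafe", status: "open" },
  { _id: 3, category: "cafE", status: "open" },
];

// Returns the _id of each document whose fields equal the filter values
// under the given collator.
function matchingIds(collator) {
  return restaurantDocs
    .filter(
      (doc) =>
        collator.compare(doc.category, "cafe") === 0 &&
        collator.compare(doc.status, "Open") === 0
    )
    .map((doc) => doc._id);
}

const looseMatches = matchingIds(new Intl.Collator("fr", { sensitivity: "base" }));
const strictMatches = matchingIds(new Intl.Collator("fr", { sensitivity: "variant" }));
console.log(looseMatches, strictMatches);
```

Under these assumptions the loose comparison matches all three documents, while the strict comparison matches none, since no document has both category "cafe" and status "Open" spelled exactly that way.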