Collation allows users to specify language-specific rules for string comparison, such as rules for lettercase and accent marks.
You can specify collation for a collection or a view, an index, or specific operations that support collation.
To specify collation when you query documents in the MongoDB Atlas UI, see Specify Collation.
Collation Document
A collation document has the following fields:
{
   locale: <string>,
   caseLevel: <boolean>,
   caseFirst: <string>,
   strength: <int>,
   numericOrdering: <boolean>,
   alternate: <string>,
   maxVariable: <string>,
   backwards: <boolean>
}
When specifying collation, the locale field is mandatory; all other collation fields are optional. For descriptions of the fields, see Collation Document.
Default collation parameter values vary depending on which locale you specify. For a complete list of default collation parameters and the locales they are associated with, see Collation Default Parameters.
| Field | Type |
|---|---|
| locale | string |
| strength | integer |
| caseLevel | boolean |
| caseFirst | string |
| numericOrdering | boolean |
| alternate | string |
| maxVariable | string |
| backwards | boolean |
| normalization | boolean |
Operations that Support Collation
You can specify collation for the following operations:
Note
You cannot specify multiple collations for an operation. For example, you cannot specify different collations per field, or if performing a find with a sort, you cannot use one collation for the find and another for the sort.
| Command |
|---|
| create |
| createIndexes |
| aggregate |
| distinct |
| findAndModify |
| find |
| mapReduce |
| delete |
| update |
| shardCollection |
| count |

Individual update, replace, and delete operations in db.collection.bulkWrite().
Behavior
Local Variants
Some collation locales have variants, which employ special language-specific rules. To specify a locale variant, use the following syntax:
{ "locale" : "<locale code>@collation=<variant>" }
For example, to use the unihan variant of the Chinese collation:
{ "locale" : "zh@collation=unihan" }
For a complete list of all collation locales and their variants, see Collation Locales.
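The variant syntax above can be assembled programmatically. The following is a minimal sketch in plain JavaScript; the helper names localeWithVariant and collationFor are hypothetical and are not part of mongosh or any MongoDB driver. The sketch only builds the collation document and does not check that the locale or variant is one MongoDB supports.

```javascript
// Hypothetical helper: builds the "locale" string described above.
// It does not validate that the locale or variant exists.
function localeWithVariant(localeCode, variant) {
  if (typeof localeCode !== "string" || localeCode.length === 0) {
    throw new TypeError("localeCode must be a non-empty string");
  }
  // Without a variant, the plain locale code is used as-is.
  return variant ? `${localeCode}@collation=${variant}` : localeCode;
}

// Hypothetical helper: returns a collation document. The locale field is
// mandatory; any other fields (strength, caseLevel, ...) are optional.
function collationFor(localeCode, variant, options = {}) {
  return { ...options, locale: localeWithVariant(localeCode, variant) };
}

const unihan = collationFor("zh", "unihan");
const french = collationFor("fr", undefined, { strength: 1 });
console.log(unihan, french);
```

The resulting objects could then be passed wherever a collation document is accepted, for example to the collation() cursor method.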
Collation and Views
- You can specify a default collation for a view at creation time. If no collation is specified, the view's default collation is the "simple" binary comparison collator. That is, the view does not inherit the collection's default collation.
- String comparisons on the view use the view's default collation. An operation that attempts to change or override a view's default collation will fail with an error.
- If creating a view from another view, you cannot specify a collation that differs from the source view's collation.
- If performing an aggregation that involves multiple views, such as with $lookup or $graphLookup, the views must have the same collation.
Collation and Index Use
To use an index for string comparisons, an operation must also specify the same collation. That is, an index with a collation cannot support an operation that performs string comparisons on the indexed fields if the operation specifies a different collation.
Warning
Because indexes that are configured with collation use ICU collation keys to achieve sort order, collation-aware index keys may be larger than index keys for indexes without collation.
A restaurants collection has the following documents:
db.restaurants.insertMany( [
   { _id: 1, category: "café", status: "Open" },
   { _id: 2, category: "cafe", status: "open" },
   { _id: 3, category: "cafE", status: "open" }
] )
The restaurants collection has an index on a string field category with the collation locale "fr".
db.restaurants.createIndex( { category: 1 }, { collation: { locale: "fr" } } )
The following query, which specifies the same collation as the index, can use the index:
db.restaurants.find( { category: "cafe" } ).collation( { locale: "fr" } )
However, the following query operation, which by default uses the "simple" binary collator, cannot use the index:
db.restaurants.find( { category: "cafe" } )
For a compound index where the index prefix keys are not strings, arrays, or embedded documents, an operation that specifies a different collation can still use the index to support comparisons on the index prefix keys.
For example, the collection restaurants has a compound index on the numeric fields score and price and the string field category; the index is created with the collation locale "fr" for string comparisons:
db.restaurants.createIndex(
   { score: 1, price: 1, category: 1 },
   { collation: { locale: "fr" } } )
The following operations, which use "simple" binary collation for string comparisons, can use the index:
db.restaurants.find( { score: 5 } ).sort( { price: 1 } )
db.restaurants.find( { score: 5, price: { $gt: Decimal128( "10" ) } } ).sort( { price: 1 } )
The following operation, which uses "simple" binary collation for string comparisons on the indexed category field, can use the index to fulfill only the score: 5 portion of the query:
db.restaurants.find( { score: 5, category: "cafe" } )
To confirm whether a query used an index, run the query with the explain() option.
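One way to read the explain() output is to look for an index scan stage in the winning plan. The following is a rough sketch in plain JavaScript, not an official utility: the helper names collectStages and usesIndexScan are hypothetical, and the two plan objects are trimmed, hand-written stand-ins for real explain() output, whose exact shape varies between server versions. The sketch assumes only that plan nodes carry a stage name such as "IXSCAN" or "COLLSCAN".

```javascript
// Hypothetical helper: walks a plan object and collects every "stage" name,
// regardless of how the stages are nested.
function collectStages(node, stages = []) {
  if (Array.isArray(node)) {
    for (const child of node) collectStages(child, stages);
  } else if (node !== null && typeof node === "object") {
    if (typeof node.stage === "string") stages.push(node.stage);
    for (const value of Object.values(node)) collectStages(value, stages);
  }
  return stages;
}

// Hypothetical helper: true if the winning plan contains an index scan.
function usesIndexScan(explainOutput) {
  const plan = explainOutput?.queryPlanner?.winningPlan;
  return collectStages(plan).includes("IXSCAN");
}

// Hand-written stand-ins for explain() output (illustrative only).
const sameCollationPlan = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "category_1" }
    }
  }
};
const simpleCollationPlan = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } }
};

console.log(usesIndexScan(sameCollationPlan));
console.log(usesIndexScan(simpleCollationPlan));
```

In mongosh, the argument would instead be the result of a call such as db.restaurants.find( { category: "cafe" } ).collation( { locale: "fr" } ).explain().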
Important
Matches against document keys, including embedded document keys, use simple binary comparison. This means that a query for a key like "type.café" will not match the key "type.cafe", regardless of the value you set for the strength parameter.
Collation and Unsupported Index Types
The following indexes only support simple binary comparison and do not support collation:
- text indexes
- 2d indexes
Tip
To create a text or 2d index on a collection that has a non-simple collation, you must explicitly specify {collation: {locale: "simple"} } when creating the index.
Restrictions
numericOrdering
When you set numericOrdering to true, the following restrictions apply:
- Only contiguous non-negative integer substrings of digits are considered in the comparisons. numericOrdering does not support:
  - +
  - -
  - decimal separators, like decimal points and decimal commas
  - exponents
- Only Unicode code points in the Number or Decimal Digit (Nd) category are treated as digits.
- If a digit length exceeds 254 characters, the excess characters are treated as a separate number.
Consider a collection with the following string number and decimal values:
db.c.insertMany(
   [
      { "n" : "1" },
      { "n" : "2" },
      { "n" : "2.1" },
      { "n" : "-2.1" },
      { "n" : "2.2" },
      { "n" : "2.10" },
      { "n" : "2.20" },
      { "n" : "-10" },
      { "n" : "10" },
      { "n" : "20" },
      { "n" : "20.1" }
   ]
)
The following find query uses a collation document containing the numericOrdering parameter:
db.c.find(
   { }, { _id: 0 }
).sort(
   { n: 1 }
).collation( {
   locale: 'en_US',
   numericOrdering: true
} )
The operation returns the following results:
[
   { n: '-2.1' },
   { n: '-10' },
   { n: '1' },
   { n: '2' },
   { n: '2.1' },
   { n: '2.2' },
   { n: '2.10' },
   { n: '2.20' },
   { n: '10' },
   { n: '20' },
   { n: '20.1' }
]
numericOrdering: true sorts the string values in ascending order as if they were numeric values.
- The two negative values -2.1 and -10 are not sorted in the expected numeric order because they contain the unsupported - character.
- The value 2.2 is sorted before the value 2.10 because the numericOrdering parameter does not support decimal values.
- As a result, the digits on each side of the decimal point are compared as separate integers, and 2 is less than 10.
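The ordering above can be approximated by comparing contiguous digit runs as integers and everything else as plain text. The following is a simplified sketch in plain JavaScript, not the ICU algorithm that MongoDB uses: the function names tokenize and compareNumericOrdering are hypothetical, only ASCII digits are recognized, the 254-character limit is not modeled, and non-digit text is compared by code unit rather than by locale rules.

```javascript
// Split a string into alternating runs of digits and non-digits,
// for example "-2.10" -> ["-", "2", ".", "10"].
function tokenize(value) {
  return value.match(/\d+|\D+/g) || [];
}

// Simplified comparator: digit runs are compared as non-negative integers,
// all other runs (including "-" and ".") are compared as plain text.
function compareNumericOrdering(a, b) {
  const ta = tokenize(a);
  const tb = tokenize(b);
  const shared = Math.min(ta.length, tb.length);
  for (let i = 0; i < shared; i++) {
    const x = ta[i];
    const y = tb[i];
    if (/^\d/.test(x) && /^\d/.test(y)) {
      const diff = BigInt(x) - BigInt(y);
      if (diff !== 0n) return diff < 0n ? -1 : 1;
    } else if (x !== y) {
      return x < y ? -1 : 1;
    }
  }
  // All shared runs are equal: the string with fewer runs sorts first.
  return ta.length - tb.length;
}

const values = ["1", "2", "2.1", "-2.1", "2.2", "2.10", "2.20", "-10", "10", "20", "20.1"];
const sorted = [...values].sort(compareNumericOrdering);
console.log(sorted);
```

For this particular data set the sketch produces the same order as the results listed above, including -2.1 before -10 and 2.2 before 2.10, because the sign and the decimal point are treated as ordinary text between integers.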
Example
A restaurants collection has the following documents:
db.restaurants.insertMany( [
   { _id: 1, category: "café", status: "Open" },
   { _id: 2, category: "cafe", status: "open" },
   { _id: 3, category: "cafE", status: "open" }
] )
The following find() operation uses collation:
db.restaurants.find(
   { category: "cafe", status: "Open" }
).collation( { locale: "fr", strength: 1 } )
The operation returns the following documents:
[
   { _id: 1, category: 'café', status: 'Open' },
   { _id: 2, category: 'cafe', status: 'open' },
   { _id: 3, category: 'cafE', status: 'open' }
]
The filter specifies a collation with strength: 1, which means the query ignores differences between case and letter variants. As a result, even though there is not a document that has an exact match with the specified case and letter variants in the filter, the operation returns all documents in the collection.
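The effect of a primary-strength comparison can be tried outside the database. The following sketch uses JavaScript's built-in Intl.Collator with sensitivity: "base" as a rough analogue of strength: 1; this is an assumption made for illustration, it is not how MongoDB evaluates the query, and results depend on the ICU data available in the JavaScript runtime. The documents and filter are copied from the example above.

```javascript
// Analogue only: "base" sensitivity ignores case and accent differences,
// which is similar in spirit to collation strength 1.
const primary = new Intl.Collator("fr", { sensitivity: "base" });

const docs = [
  { _id: 1, category: "café", status: "Open" },
  { _id: 2, category: "cafe", status: "open" },
  { _id: 3, category: "cafE", status: "open" }
];
const filter = { category: "cafe", status: "Open" };

// A document matches when every filter field compares equal under the
// collator. The same collator applies to both fields, mirroring the rule
// that an operation uses a single collation.
const matchedIds = docs
  .filter((doc) =>
    Object.entries(filter).every(
      ([field, value]) => primary.compare(doc[field], value) === 0
    )
  )
  .map((doc) => doc._id);

console.log(matchedIds);
```

Under these assumptions all three documents match, because "café", "cafe", and "cafE" compare equal at the base level, as do "Open" and "open".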