$substrBytes
Returns the substring of a string. The substring starts with the character at the specified UTF-8 byte index (zero-based) in the string and continues for the specified number of bytes.
$substrBytes has the following operator expression syntax:
{ $substrBytes: [ <string expression>, <byte index>, <byte count> ] }
Field | Type
string expression | string
byte index | number
byte count | number
The $substrBytes operator uses the indexes of UTF-8 encoded bytes, where each code point, or character, may use between one and four bytes to encode.
For example, US-ASCII characters are encoded using one byte. Characters with diacritic markings and additional Latin alphabetical characters (i.e. Latin characters outside of the English alphabet) are encoded using two bytes. Chinese, Japanese and Korean characters typically require three bytes, and other planes of Unicode (emoji, mathematical symbols, etc.) require four bytes.
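The byte widths listed above can be checked outside the database. The following is a minimal sketch in plain JavaScript (Node.js), not part of MongoDB itself; `utf8ByteWidths` is a hypothetical helper name introduced here for illustration.

```javascript
// TextEncoder always produces UTF-8, so the encoded length is the byte width.
const encoder = new TextEncoder();

// Hypothetical helper: UTF-8 byte width of each code point in `str`.
function utf8ByteWidths(str) {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(str, (ch) => encoder.encode(ch).length);
}

// "a" (US-ASCII), "é" (U+00E9), "寿" (U+5BFF), "😀" (U+1F600)
console.log(utf8ByteWidths("a\u00e9\u5bff\u{1F600}")); // [ 1, 2, 3, 4 ]
```

The "é" is written as the escape `\u00e9` on purpose: the same glyph can also be stored as "e" plus a combining accent, which would occupy three bytes across two code points.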
Be mindful of the content of the string expression: providing a byte index or byte count located in the middle of a UTF-8 character will result in an error.
$substrBytes differs from $substrCP in that $substrBytes counts the bytes of each character, whereas $substrCP counts the code points, or characters, regardless of how many bytes a character uses.
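To make the difference concrete, the sketch below applies the same index (0) and count (5) to one string under both interpretations. It is plain JavaScript (Node.js) that only imitates the two counting schemes; it is not how the server implements either operator.

```javascript
const word = "caf\u00e9t\u00e9ria"; // "cafétéria" with precomposed é (U+00E9)

// Code point view: index 0, count 5 selects five whole characters.
const byCodePoint = Array.from(word).slice(0, 5).join("");

// Byte view: index 0, count 5 selects five UTF-8 bytes
// (c, a, f, and the two bytes of é).
const bytes = new TextEncoder().encode(word);
const byByte = new TextDecoder().decode(bytes.subarray(0, 5));

console.log(byCodePoint); // cafét
console.log(byByte);      // café
```

The two results diverge as soon as the selected range contains a multi-byte character.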
Example | Results
{ $substrBytes: [ "abcde", 1, 2 ] } | "bc"
{ $substrBytes: [ "Hello World!", 6, 5 ] } | "World"
{ $substrBytes: [ "cafétéria", 0, 5 ] } | "café"
{ $substrBytes: [ "cafétéria", 5, 4 ] } | "tér"
{ $substrBytes: [ "cafétéria", 7, 3 ] } | Error: byte index 7 falls in the middle of the second "é"
{ $substrBytes: [ "cafétéria", 3, 1 ] } | Error: the range ends in the middle of the first "é"
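The rows above follow from where each byte offset lands relative to character boundaries. The following sketch reproduces that rule in plain JavaScript (Node.js). `substrBytesSketch` and `isContinuationByte` are hypothetical names, and the function is an approximation, not the server's implementation: it assumes non-negative integer arguments, clamps ranges that run past the end of the string, and its error messages are invented for this example.

```javascript
// UTF-8 continuation bytes have the bit pattern 10xxxxxx.
function isContinuationByte(b) {
  return (b & 0xc0) === 0x80;
}

// Hypothetical emulation of the byte-range rule; assumes non-negative integers.
function substrBytesSketch(str, byteIndex, byteCount) {
  const bytes = new TextEncoder().encode(str);
  // Assumption of this sketch: out-of-range values are clamped to the end.
  const start = Math.min(byteIndex, bytes.length);
  const end = Math.min(start + byteCount, bytes.length);

  // A range may neither start nor end on a continuation byte.
  if (start < bytes.length && isContinuationByte(bytes[start])) {
    throw new RangeError("byte index is in the middle of a UTF-8 character");
  }
  if (end < bytes.length && isContinuationByte(bytes[end])) {
    throw new RangeError("range ends in the middle of a UTF-8 character");
  }
  return new TextDecoder().decode(bytes.subarray(start, end));
}

const word = "caf\u00e9t\u00e9ria"; // bytes: c a f (é é) t (é é) r i a
console.log(substrBytesSketch("abcde", 1, 2)); // bc
console.log(substrBytesSketch(word, 0, 5));    // café
console.log(substrBytesSketch(word, 5, 4));    // tér

try {
  substrBytesSketch(word, 7, 3); // byte 7 is the second byte of an é
} catch (err) {
  console.log(err.message);
}
```

Checking only the first byte of the range and the byte just past its end is sufficient, because any character that starts inside the range is either fully contained or is cut at the end boundary.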
Consider an inventory collection with the following documents:
{ "_id" : 1, "item" : "ABC1", quarter: "13Q1", "description" : "product 1" }
{ "_id" : 2, "item" : "ABC2", quarter: "13Q4", "description" : "product 2" }
{ "_id" : 3, "item" : "XYZ1", quarter: "14Q2", "description" : null }
The following operation uses the $substrBytes operator to separate the quarter value (containing only single-byte US-ASCII characters) into a yearSubstring and a quarterSubstring. The quarterSubstring field represents the rest of the string from the specified byte index following the yearSubstring. Its byte count is calculated by subtracting the byte index from the length of the string, which is obtained using $strLenBytes.
db.inventory.aggregate(
  [
    {
      $project: {
        item: 1,
        yearSubstring: { $substrBytes: [ "$quarter", 0, 2 ] },
        quarterSubstring: {
          $substrBytes: [
            "$quarter", 2, { $subtract: [ { $strLenBytes: "$quarter" }, 2 ] }
          ]
        }
      }
    }
  ]
)
The operation returns the following results:
{ "_id" : 1, "item" : "ABC1", "yearSubstring" : "13", "quarterSubstring" : "Q1" }
{ "_id" : 2, "item" : "ABC2", "yearSubstring" : "13", "quarterSubstring" : "Q4" }
{ "_id" : 3, "item" : "XYZ1", "yearSubstring" : "14", "quarterSubstring" : "Q2" }
A collection named food contains the following documents:
{ "_id" : 1, "name" : "apple" }
{ "_id" : 2, "name" : "banana" }
{ "_id" : 3, "name" : "éclair" }
{ "_id" : 4, "name" : "hamburger" }
{ "_id" : 5, "name" : "jalapeño" }
{ "_id" : 6, "name" : "pizza" }
{ "_id" : 7, "name" : "tacos" }
{ "_id" : 8, "name" : "寿司sushi" }
The following operation uses the $substrBytes operator to create a three-byte menuCode from the name value:
db.food.aggregate(
  [
    {
      $project: {
        "name": 1,
        "menuCode": { $substrBytes: [ "$name", 0, 3 ] }
      }
    }
  ]
)
The operation returns the following results:
{ "_id" : 1, "name" : "apple", "menuCode" : "app" }
{ "_id" : 2, "name" : "banana", "menuCode" : "ban" }
{ "_id" : 3, "name" : "éclair", "menuCode" : "éc" }
{ "_id" : 4, "name" : "hamburger", "menuCode" : "ham" }
{ "_id" : 5, "name" : "jalapeño", "menuCode" : "jal" }
{ "_id" : 6, "name" : "pizza", "menuCode" : "piz" }
{ "_id" : 7, "name" : "tacos", "menuCode" : "tac" }
{ "_id" : 8, "name" : "寿司sushi", "menuCode" : "寿" }
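Because the code is three bytes long, names that begin with multi-byte characters yield fewer than three characters ("éc", "寿"). If a three-character code is wanted instead, the $substrCP operator mentioned earlier counts code points rather than bytes. The following is a sketch of that variant, assuming the same food collection; `menuCodePipeline` is a name introduced here for illustration.

```javascript
// Same projection as above, but counting code points instead of bytes.
const menuCodePipeline = [
  {
    $project: {
      name: 1,
      // $substrCP: [ <string expression>, <code point index>, <code point count> ]
      menuCode: { $substrCP: ["$name", 0, 3] }
    }
  }
];

// `db` and `printjson` exist only inside the MongoDB shell;
// the guard lets the snippet load in other JavaScript runtimes too.
if (typeof db !== "undefined") {
  db.food.aggregate(menuCodePipeline).forEach((doc) => printjson(doc));
}
```

With the documents above, this variant should produce "écl" for éclair and "寿司s" for 寿司sushi, while the all-ASCII names keep the same codes as before.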