Definition
$substrBytes
Returns the substring of a string. The substring starts with the character at the specified UTF-8 byte index (zero-based) in the string and continues for the number of bytes specified.

$substrBytes has the following operator expression syntax:

{ $substrBytes: [ <string expression>, <byte index>, <byte count> ] }

Field: string expression
Type: string
Description: The string from which the substring will be extracted. string expression can be any valid expression as long as it resolves to a string. For more information on expressions, see Expressions.
If the argument resolves to a value of null or refers to a field that is missing, $substrBytes returns an empty string.
If the argument resolves to neither a string nor null, and does not refer to a missing field, $substrBytes returns an error.

Field: byte index
Type: number
Description: Indicates the starting point of the substring. byte index can be any valid expression as long as it resolves to a non-negative integer or a number that can be represented as an integer (such as 2.0).
byte index cannot refer to a starting index located in the middle of a multi-byte UTF-8 character.

Field: byte count
Type: number
Description: Can be any valid expression as long as it resolves to a non-negative integer or a number that can be represented as an integer (such as 2.0).
byte count cannot result in an ending index that is in the middle of a UTF-8 character.
Behavior
The $substrBytes operator uses the indexes of UTF-8 encoded bytes, where each code point, or character, may use between one and four bytes to encode.
For example, US-ASCII characters are encoded using one byte. Characters with diacritic markings and additional Latin alphabetical characters (Latin characters outside of the English alphabet) are encoded using two bytes. Chinese, Japanese and Korean characters typically require three bytes, and other planes of Unicode (emoji, mathematical symbols, etc.) require four bytes.
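The byte widths listed above can be checked outside the database. The following is a minimal sketch in plain JavaScript, assuming a runtime that provides the standard TextEncoder global (Node.js or a modern browser); utf8ByteLengths is a hypothetical helper name, not a MongoDB function.

```javascript
// TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder();

// Hypothetical helper: report how many UTF-8 bytes each code point uses.
function utf8ByteLengths(str) {
  // Array.from iterates by code point, so a surrogate pair stays together.
  return Array.from(str, (ch) => ({ char: ch, bytes: encoder.encode(ch).length }));
}

// "a" (US-ASCII), "é" (U+00E9), "寿" (U+5BFF) and an emoji (U+1F600)
for (const { char, bytes } of utf8ByteLengths("a\u00e9\u5bff\u{1F600}")) {
  console.log(char, bytes); // byte counts are 1, 2, 3 and 4 respectively
}
```

The escape sequences are used only so that the example does not depend on how the file itself is encoded or normalized.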
It is important to be mindful of the content in the string expression, because providing a byte index or byte count located in the middle of a UTF-8 character will result in an error.
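To make the boundary rule concrete, here is a rough emulation in plain JavaScript. It is a sketch, not the server's implementation: substrBytesSketch is a hypothetical helper, the error messages are invented for illustration, and clamping offsets that run past the end of the string is an assumption of this sketch.

```javascript
const encoder = new TextEncoder();
const decoder = new TextDecoder("utf-8");

// Hypothetical helper, not a MongoDB API.
function substrBytesSketch(str, byteIndex, byteCount) {
  const bytes = encoder.encode(str);
  // Assumption of this sketch: offsets past the end are clamped.
  const start = Math.min(byteIndex, bytes.length);
  const end = Math.min(byteIndex + byteCount, bytes.length);

  // UTF-8 continuation bytes have the bit pattern 10xxxxxx, so an offset
  // that points at one lies in the middle of a character.
  const inMiddle = (i) => i < bytes.length && (bytes[i] & 0xc0) === 0x80;

  if (inMiddle(start)) {
    throw new RangeError(`byte index ${byteIndex} is inside a multi-byte character`);
  }
  if (inMiddle(end)) {
    throw new RangeError(`byte count ${byteCount} ends inside a multi-byte character`);
  }
  return decoder.decode(bytes.subarray(start, end));
}

console.log(substrBytesSketch("\u00e9clair", 0, 3)); // éc

try {
  // Offset 1 is the second byte of "é", so this throws.
  substrBytesSketch("\u00e9clair", 1, 2);
} catch (e) {
  console.log(e.message);
}
```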
$substrBytes differs from $substrCP in that $substrBytes counts the bytes of each character, whereas $substrCP counts the code points, or characters, regardless of how many bytes a character uses.
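The two units of measurement can be compared in plain JavaScript. This is only an illustration of counting bytes versus counting code points; byteLength and codePointLength are hypothetical helper names, and the sample strings are chosen for the example.

```javascript
const encoder = new TextEncoder();

// Number of UTF-8 bytes: the unit that $substrBytes counts in.
function byteLength(str) {
  return encoder.encode(str).length;
}

// Number of code points: the unit that $substrCP counts in.
function codePointLength(str) {
  return Array.from(str).length;
}

// "pizza", "jalapeño" and "寿司sushi", written with escapes for the
// non-ASCII characters.
for (const s of ["pizza", "jalape\u00f1o", "\u5bff\u53f8sushi"]) {
  console.log(s, byteLength(s), codePointLength(s));
}
// pizza: 5 bytes, 5 code points
// jalapeño: 9 bytes, 8 code points
// 寿司sushi: 11 bytes, 7 code points
```

For strings that contain only US-ASCII characters the two counts are equal, which is why the difference only matters for multibyte content.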
Example
Single-Byte Character Set
Consider an inventory collection with the following documents:
db.inventory.insertMany( [
   { _id: 1, item: "ABC1", quarter: "13Q1", description: "product 1" },
   { _id: 2, item: "ABC2", quarter: "13Q4", description: "product 2" },
   { _id: 3, item: "XYZ1", quarter: "14Q2", description: null }
] )
The following operation uses the $substrBytes operator to separate the quarter value (containing only single-byte US-ASCII characters) into a yearSubstring and a quarterSubstring. The quarterSubstring field represents the rest of the string from the specified byte index following the yearSubstring. Its length is calculated by subtracting the byte index from the length of the string, which is obtained using $strLenBytes.
db.inventory.aggregate(
  [
    {
      $project: {
        item: 1,
        yearSubstring: { $substrBytes: [ "$quarter", 0, 2 ] },
        quarterSubstring: {
          $substrBytes: [
            "$quarter", 2, { $subtract: [ { $strLenBytes: "$quarter" }, 2 ] }
          ]
        }
      }
    }
  ]
)
The operation returns the following results:
{ _id: 1, item: "ABC1", yearSubstring: "13", quarterSubstring: "Q1" }
{ _id: 2, item: "ABC2", yearSubstring: "13", quarterSubstring: "Q4" }
{ _id: 3, item: "XYZ1", yearSubstring: "14", quarterSubstring: "Q2" }
Single-Byte and Multibyte Character Set
Create a food collection with the following documents:
db.food.insertMany(
  [
     { _id: 1, name: "apple" },
     { _id: 2, name: "banana" },
     { _id: 3, name: "éclair" },
     { _id: 4, name: "hamburger" },
     { _id: 5, name: "jalapeño" },
     { _id: 6, name: "pizza" },
     { _id: 7, name: "tacos" },
     { _id: 8, name: "寿司sushi" }
  ]
)
The following operation uses the $substrBytes operator to create a three-byte menuCode from the name value:
db.food.aggregate(
  [
    {
      $project: {
        "name": 1,
        "menuCode": { $substrBytes: [ "$name", 0, 3 ] }
      }
    }
  ]
)
The operation returns the following results:
{ _id: 1, name: "apple", menuCode: "app" }
{ _id: 2, name: "banana", menuCode: "ban" }
{ _id: 3, name: "éclair", menuCode: "éc" }
{ _id: 4, name: "hamburger", menuCode: "ham" }
{ _id: 5, name: "jalapeño", menuCode: "jal" }
{ _id: 6, name: "pizza", menuCode: "piz" }
{ _id: 7, name: "tacos", menuCode: "tac" }
{ _id: 8, name: "寿司sushi", menuCode: "寿" }
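The results for "éclair" and "寿司sushi" contain fewer than three characters because the three-byte budget is spent on multibyte characters, and the operation succeeds because byte offset 3 happens to fall on a character boundary in every name. As a rough check in plain JavaScript (boundaryOffsets is a hypothetical helper, not a MongoDB function), the valid start and end offsets of a string can be listed like this:

```javascript
const encoder = new TextEncoder();

// Hypothetical helper: byte offsets at which a substring may start or end,
// i.e. the offsets that fall between characters (plus 0 and the total length).
function boundaryOffsets(str) {
  const offsets = [0];
  let total = 0;
  for (const ch of str) {
    // for...of iterates by code point, so each step is one character.
    total += encoder.encode(ch).length;
    offsets.push(total);
  }
  return offsets;
}

// "éclair": offsets 0, 2, 3, 4, 5, 6, 7
console.log(boundaryOffsets("\u00e9clair").join(", "));

// "寿司sushi": offsets 0, 3, 6, 7, 8, 9, 10, 11
console.log(boundaryOffsets("\u5bff\u53f8sushi").join(", "));
```

Because 3 appears in both lists, a byte count of 3 starting at index 0 is valid for these names. A byte count of 1 or 2 would end inside "寿" and, per the rule described in the Behavior section, would result in an error.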