
`$substrBytes` (expression operator)

Definition

$substrBytes

Returns the substring of a string. The substring starts with the character at the specified UTF-8 byte index (zero-based) in the string and continues for the number of bytes specified.

$substrBytes has the following operator expression syntax:

{ $substrBytes: [ <string expression>, <byte index>, <byte count> ] }

Field	Type	Description
`string expression`	string	The string from which the substring is extracted. `string expression` can be any valid expression as long as it resolves to a string. For more information on expressions, see Expressions. If the argument resolves to `null` or refers to a missing field, `$substrBytes` returns an empty string. If the argument resolves to any other non-string value, `$substrBytes` returns an error.
`byte index`	number	Indicates the starting point of the substring. `byte index` can be any valid expression as long as it resolves to a non-negative integer or a number that can be represented as an integer (such as 2.0). `byte index` cannot refer to a starting index located in the middle of a multi-byte UTF-8 character.
`byte count`	number	`byte count` can be any valid expression as long as it resolves to a non-negative integer or a number that can be represented as an integer (such as 2.0). `byte count` cannot result in an ending index that is in the middle of a UTF-8 character.

Behavior

The $substrBytes operator uses the indexes of UTF-8 encoded bytes, where each code point, or character, may use between one and four bytes to encode.

For example, US-ASCII characters are encoded using one byte. Characters with diacritic markings and additional Latin alphabetical characters (Latin characters outside of the English alphabet) are encoded using two bytes. Chinese, Japanese, and Korean characters typically require three bytes, and characters in other planes of Unicode (emoji, mathematical symbols, etc.) require four bytes.
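
The byte widths described above can be checked outside the database. The following is a minimal sketch in plain JavaScript (runnable in Node.js), not part of the operator itself; the helper name utf8ByteLength is hypothetical and introduced only for illustration.

```javascript
// Illustrative helper (hypothetical name): number of bytes a string
// occupies when encoded as UTF-8.
const widthEncoder = new TextEncoder();

function utf8ByteLength(str) {
  return widthEncoder.encode(str).length;
}

// One sample character from each width class mentioned in the text.
const samples = {
  "US-ASCII letter a": "a",
  "Latin letter with diacritic (U+00E9)": "\u00e9",
  "CJK character (U+5BFF)": "\u5bff",
  "emoji (U+1F600)": "\u{1F600}",
};

for (const [label, ch] of Object.entries(samples)) {
  console.log(label, "uses", utf8ByteLength(ch), "byte(s)");
}
```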

Be mindful of the content of the string expression: providing a byte index or byte count that falls in the middle of a UTF-8 character results in an error.

$substrBytes differs from $substrCP in that $substrBytes counts the bytes of each character, whereas $substrCP counts the code points, or characters, regardless of how many bytes a character uses.
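
To make the difference between byte positions and code point positions concrete, the sketch below converts a code point index (the unit $substrCP uses) into the corresponding UTF-8 byte index (the unit $substrBytes uses). It is plain JavaScript for Node.js, and byteOffsetOfCodePoint is a hypothetical helper name, not a database function.

```javascript
// Illustrative helper (hypothetical name): UTF-8 byte index at which the
// code point with the given zero-based index starts.
const offsetEncoder = new TextEncoder();

function byteOffsetOfCodePoint(str, codePointIndex) {
  // Array.from iterates a string by code point, not by UTF-16 unit.
  const prefix = Array.from(str).slice(0, codePointIndex).join("");
  return offsetEncoder.encode(prefix).length;
}

// "jalape\u00f1o" is "jalapeño": the final "o" is code point 7,
// but it starts at byte 8 because "ñ" occupies two bytes.
console.log(byteOffsetOfCodePoint("jalape\u00f1o", 7));
```

For strings that contain only US-ASCII characters the two indexes coincide, which is why the distinction only matters for multi-byte content.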

Example	Results
`{ $substrBytes: [ "abcde", 1, 2 ] }`	`"bc"`
`{ $substrBytes: [ "Hello World!", 6, 5 ] }`	`"World"`
`{ $substrBytes: [ "cafétéria", 0, 5 ] }`	`"café"`
`{ $substrBytes: [ "cafétéria", 5, 4 ] }`	`"tér"`
`{ $substrBytes: [ "cafétéria", 7, 3 ] }`	Errors with message: `"Error: Invalid range, starting index is a UTF-8 continuation byte."`
`{ $substrBytes: [ "cafétéria", 3, 1 ] }`	Errors with message: `"Error: Invalid range, ending index is in the middle of a UTF-8 character."`
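
The rows above can be reproduced with a small model of the boundary rules. This is a sketch in plain JavaScript (Node.js) under stated assumptions, not the server's implementation: substrBytesModel is a hypothetical name, the inputs are assumed to be a string and two non-negative integers, and returning an empty string when the byte index is past the end of the string is an assumption rather than something the text above states.

```javascript
// Illustrative model (hypothetical name) of the UTF-8 boundary checks.
const modelEncoder = new TextEncoder();
const modelDecoder = new TextDecoder();

// A UTF-8 continuation byte has the bit pattern 10xxxxxx.
function isContinuationByte(b) {
  return (b & 0xc0) === 0x80;
}

function substrBytesModel(str, byteIndex, byteCount) {
  const bytes = modelEncoder.encode(str);
  // Assumption: a start past the end of the string yields "".
  if (byteIndex >= bytes.length) return "";
  if (isContinuationByte(bytes[byteIndex])) {
    throw new Error("Invalid range, starting index is a UTF-8 continuation byte.");
  }
  const end = Math.min(byteIndex + byteCount, bytes.length);
  // The slice is cut mid-character if the byte just after it is a continuation byte.
  if (end < bytes.length && isContinuationByte(bytes[end])) {
    throw new Error("Invalid range, ending index is in the middle of a UTF-8 character.");
  }
  return modelDecoder.decode(bytes.subarray(byteIndex, end));
}

// "caf\u00e9t\u00e9ria" is "cafétéria"; each "é" occupies two bytes.
console.log(substrBytesModel("caf\u00e9t\u00e9ria", 0, 5));
console.log(substrBytesModel("caf\u00e9t\u00e9ria", 5, 4));
```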

Example

Single-Byte Character Set

Consider an inventory collection with the following documents:

db.inventory.insertMany( [
   { _id: 1, item: "ABC1", quarter: "13Q1", description: "product 1" },
   { _id: 2, item: "ABC2", quarter: "13Q4", description: "product 2" },
   { _id: 3, item: "XYZ1", quarter: "14Q2", description: null }
] )

The following operation uses the $substrBytes operator to separate the quarter value (containing only single-byte US-ASCII characters) into a yearSubstring and a quarterSubstring. The quarterSubstring field holds the rest of the string, starting at byte index 2, immediately after the yearSubstring. Its byte count is calculated by subtracting the byte index from the length of the string, which is obtained with $strLenBytes.

db.inventory.aggregate(
  [
    {
      $project: {
        item: 1,
        yearSubstring: { $substrBytes: [ "$quarter", 0, 2 ] },
        quarterSubstring: {
          $substrBytes: [
            "$quarter", 2, { $subtract: [ { $strLenBytes: "$quarter" }, 2 ] }
          ]
        }
      }
    }
  ]
)

The operation returns the following results:

{ _id: 1, item: "ABC1", yearSubstring: "13", quarterSubstring: "Q1" }
{ _id: 2, item: "ABC2", yearSubstring: "13", quarterSubstring: "Q4" }
{ _id: 3, item: "XYZ1", yearSubstring: "14", quarterSubstring: "Q2" }
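
The arithmetic behind the second substring (remaining byte count = byte length minus byte index) can be sketched outside the pipeline. The following plain JavaScript (Node.js) is only an illustration; splitAtByte is a hypothetical helper, and it assumes the split index falls on a character boundary, as it always does for US-ASCII input.

```javascript
// Illustrative helper (hypothetical name): split a string at a UTF-8 byte index.
const splitEncoder = new TextEncoder();
const splitDecoder = new TextDecoder();

function splitAtByte(str, byteIndex) {
  const bytes = splitEncoder.encode(str);
  // Same calculation as the $subtract / $strLenBytes expression:
  // the tail covers every byte from byteIndex to the end of the string.
  const remaining = bytes.length - byteIndex;
  const head = splitDecoder.decode(bytes.subarray(0, byteIndex));
  const tail = splitDecoder.decode(bytes.subarray(byteIndex, byteIndex + remaining));
  return [head, tail];
}

console.log(splitAtByte("13Q1", 2));
```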

Single-Byte and Multibyte Character Set

Create a food collection with the following documents:

db.food.insertMany(
 [
    { _id: 1, name: "apple" },
    { _id: 2, name: "banana" },
    { _id: 3, name: "éclair" },
    { _id: 4, name: "hamburger" },
    { _id: 5, name: "jalapeño" },
    { _id: 6, name: "pizza" },
    { _id: 7, name: "tacos" },
    { _id: 8, name: "寿司sushi" }
 ]
)

The following operation uses the $substrBytes operator to create a three-byte menuCode from the name value:

db.food.aggregate(
  [
    {
      $project: {
        "name": 1,
        "menuCode": { $substrBytes: [ "$name", 0, 3 ] }
      }
    }
  ]
)

The operation returns the following results:

{ _id: 1, name: "apple", menuCode: "app" }
{ _id: 2, name: "banana", menuCode: "ban" }
{ _id: 3, name: "éclair", menuCode: "éc" }
{ _id: 4, name: "hamburger", menuCode: "ham" }
{ _id: 5, name: "jalapeño", menuCode: "jal" }
{ _id: 6, name: "pizza", menuCode: "piz" }
{ _id: 7, name: "tacos", menuCode: "tac" }
{ _id: 8, name: "寿司sushi", menuCode: "寿" }
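
A fixed three-byte prefix works for every name in this collection, but a name such as "abé..." would place byte 3 in the middle of a character and cause an error. One way to reason about such input is to back the end position off to the nearest character boundary. The sketch below shows that idea in plain JavaScript (Node.js); safeBytePrefix is a hypothetical helper for illustration, not an operator or option provided by the database.

```javascript
// Illustrative helper (hypothetical name): longest prefix of at most
// maxBytes UTF-8 bytes that does not split a character.
const prefixEncoder = new TextEncoder();
const prefixDecoder = new TextDecoder();

function safeBytePrefix(str, maxBytes) {
  const bytes = prefixEncoder.encode(str);
  let end = Math.min(maxBytes, bytes.length);
  // If the byte just past the cut is a continuation byte (10xxxxxx),
  // the cut is mid-character, so move it back until it is not.
  while (end > 0 && end < bytes.length && (bytes[end] & 0xc0) === 0x80) {
    end--;
  }
  return prefixDecoder.decode(bytes.subarray(0, end));
}

// "ab\u00e9c" is "abéc": a three-byte cut would split "é", so the prefix is shortened.
console.log(safeBytePrefix("ab\u00e9c", 3));
```

Note that the resulting code may be shorter than the requested byte count, as with "éc" and "寿" in the results above.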

Tip

$substrCP
