$substrBytes (aggregation)

Definition

$substrBytes

Returns the substring of a string. The substring starts with the character at the specified UTF-8 byte index (zero-based) in the string and continues for the number of bytes specified.

$substrBytes has the following operator expression syntax:

{ $substrBytes: [ <string expression>, <byte index>, <byte count> ] }

Field	Type	Description
`string expression`	string	The string from which the substring will be extracted. `string expression` can be any valid expression as long as it resolves to a string. For more information on expressions, see Expressions. If the argument resolves to a value of `null` or refers to a field that is missing, `$substrBytes` returns an empty string. If the argument does not resolve to a string or `null` and does not refer to a missing field, `$substrBytes` returns an error.
`byte index`	number	Indicates the starting point of the substring. `byte index` can be any valid expression as long as it resolves to a non-negative integer or a number that can be represented as an integer (such as 2.0). `byte index` cannot refer to a starting index located in the middle of a multi-byte UTF-8 character.
`byte count`	number	The number of bytes to include in the substring. `byte count` can be any valid expression as long as it resolves to a non-negative integer or a number that can be represented as an integer (such as 2.0). `byte count` cannot result in an ending index that is in the middle of a UTF-8 character.

Behavior

The $substrBytes operator uses the indexes of UTF-8 encoded bytes, where each code point, or character, may use between one and four bytes to encode.

For example, US-ASCII characters are encoded using one byte. Characters with diacritic markings and additional Latin alphabetical characters (i.e. Latin characters outside of the English alphabet) are encoded using two bytes. Chinese, Japanese and Korean characters typically require three bytes, and other planes of Unicode (emoji, mathematical symbols, etc.) require four bytes.

Be mindful of the content of the string expression: providing a byte index or byte count that falls in the middle of a UTF-8 character results in an error.
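
To see where character boundaries fall before choosing a byte index or byte count, the byte offsets can be inspected locally. The following is a minimal sketch in plain JavaScript, not an aggregation expression. `utf8ByteMap` is a hypothetical helper name, and the sketch assumes a runtime that provides the standard `TextEncoder` (for example Node.js or a modern browser).

```javascript
// Report the starting byte offset and UTF-8 byte size of each code point.
// utf8ByteMap is a hypothetical helper, not a MongoDB function.
function utf8ByteMap(str) {
  const encoder = new TextEncoder(); // TextEncoder always produces UTF-8
  const map = [];
  let offset = 0;
  // for...of iterates by code point, so surrogate pairs stay together.
  for (const ch of str) {
    const size = encoder.encode(ch).length;
    map.push({ char: ch, start: offset, bytes: size });
    offset += size;
  }
  return map;
}

// Assuming "é" is the precomposed character U+00E9, it starts at byte 3
// and occupies bytes 3 and 4.
console.log(utf8ByteMap("café"));
```

Any offset listed under `start` is a valid byte index; an offset strictly between one `start` and the next falls inside a multi-byte character.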

$substrBytes differs from $substrCP in that $substrBytes counts the bytes of each character, whereas $substrCP counts the code points, or characters, regardless of how many bytes a character uses.
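
The difference between the two units can be illustrated outside the database. This is a rough sketch in plain JavaScript; `byteLength` and `codePointLength` are hypothetical helper names, and the sample strings are assumed to use precomposed characters (for example "ñ" as U+00F1).

```javascript
// Plain JavaScript illustration of byte counting versus code point counting.
// byteLength and codePointLength are hypothetical helper names.
const byteLength = (s) => new TextEncoder().encode(s).length; // UTF-8 bytes
const codePointLength = (s) => Array.from(s).length;          // code points

for (const word of ["pizza", "jalapeño", "寿司sushi"]) {
  console.log(word, byteLength(word), codePointLength(word));
}
// pizza 5 5
// jalapeño 9 8
// 寿司sushi 11 7
```

The two counts agree only for strings made entirely of single-byte characters, which is when byte indexes and code point indexes are interchangeable.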

Example	Results
{ $substrBytes: [ "abcde", 1, 2 ] }	"bc"
{ $substrBytes: [ "Hello World!", 6, 5 ] }	"World"
{ $substrBytes: [ "cafétéria", 0, 5 ] }	"café"
{ $substrBytes: [ "cafétéria", 5, 4 ] }	"tér"
{ $substrBytes: [ "cafétéria", 7, 3 ] }	Errors with message: `"Error: Invalid range, starting index is a UTF-8 continuation byte."`
{ $substrBytes: [ "cafétéria", 3, 1 ] }	Errors with message: `"Error: Invalid range, ending index is in the middle of a UTF-8 character."`
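
The boundary rules behind these rows can be approximated locally. The sketch below is a client-side emulation in plain JavaScript, not the server implementation: `substrBytesSketch` is a hypothetical helper, it assumes both numeric arguments are non-negative integers, and the error texts are simply copied from the table above.

```javascript
// Local emulation of the boundary rules described above.
// substrBytesSketch is a hypothetical helper, not part of any MongoDB driver.
function substrBytesSketch(str, byteIndex, byteCount) {
  const bytes = new TextEncoder().encode(str);
  // UTF-8 continuation bytes have the bit pattern 10xxxxxx.
  const isContinuation = (i) => i < bytes.length && (bytes[i] & 0xc0) === 0x80;

  if (isContinuation(byteIndex)) {
    throw new Error("Invalid range, starting index is a UTF-8 continuation byte.");
  }
  const end = Math.min(byteIndex + byteCount, bytes.length);
  if (isContinuation(end)) {
    throw new Error("Invalid range, ending index is in the middle of a UTF-8 character.");
  }
  return new TextDecoder().decode(bytes.slice(byteIndex, end));
}

console.log(substrBytesSketch("cafétéria", 0, 5)); // café
console.log(substrBytesSketch("cafétéria", 5, 4)); // tér
```

Calling the helper with `("cafétéria", 7, 3)` or `("cafétéria", 3, 1)` throws, mirroring the last two rows of the table.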

Example

Single-Byte Character Set

Consider an inventory collection with the following documents:

{ "_id" : 1, "item" : "ABC1", quarter: "13Q1", "description" : "product 1" }
{ "_id" : 2, "item" : "ABC2", quarter: "13Q4", "description" : "product 2" }
{ "_id" : 3, "item" : "XYZ1", quarter: "14Q2", "description" : null }

The following operation uses the $substrBytes operator to separate the quarter value (containing only single-byte US-ASCII characters) into a yearSubstring and a quarterSubstring. The quarterSubstring field represents the rest of the string from the specified byte index following the yearSubstring. Its byte count is calculated by subtracting the byte index from the length of the string, obtained with $strLenBytes.

db.inventory.aggregate(
  [
    {
      $project: {
        item: 1,
        yearSubstring: { $substrBytes: [ "$quarter", 0, 2 ] },
        quarterSubstring: {
          $substrBytes: [
            "$quarter", 2, { $subtract: [ { $strLenBytes: "$quarter" }, 2 ] }
          ]
        }
      }
    }
  ]
)

The operation returns the following results:

{ "_id" : 1, "item" : "ABC1", "yearSubstring" : "13", "quarterSubstring" : "Q1" }
{ "_id" : 2, "item" : "ABC2", "yearSubstring" : "13", "quarterSubstring" : "Q4" }
{ "_id" : 3, "item" : "XYZ1", "yearSubstring" : "14", "quarterSubstring" : "Q2" }
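
The arithmetic used for the second substring (string length in bytes minus the byte index) can be mirrored in plain JavaScript to check it by hand. This is only an illustrative sketch: `splitQuarter` is a hypothetical helper, and it assumes the input contains only single-byte characters, as the quarter values above do.

```javascript
// Local mirror of the pipeline arithmetic; splitQuarter is a hypothetical helper.
function splitQuarter(quarter) {
  const bytes = new TextEncoder().encode(quarter);
  const decoder = new TextDecoder();
  const yearBytes = 2;                        // byte count of yearSubstring
  const restBytes = bytes.length - yearBytes; // string length minus byte index
  return {
    yearSubstring: decoder.decode(bytes.slice(0, yearBytes)),
    quarterSubstring: decoder.decode(bytes.slice(yearBytes, yearBytes + restBytes)),
  };
}

console.log(splitQuarter("13Q4")); // { yearSubstring: '13', quarterSubstring: 'Q4' }
```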

Single-Byte and Multibyte Character Set

Create a food collection with the following documents:

db.food.insertMany(
 [
    { "_id" : 1, "name" : "apple" },
    { "_id" : 2, "name" : "banana" },
    { "_id" : 3, "name" : "éclair" },
    { "_id" : 4, "name" : "hamburger" },
    { "_id" : 5, "name" : "jalapeño" },
    { "_id" : 6, "name" : "pizza" },
    { "_id" : 7, "name" : "tacos" },
    { "_id" : 8, "name" : "寿司sushi" }
 ]
)

The following operation uses the $substrBytes operator to create a three-byte menuCode from the name value:

db.food.aggregate(
  [
    {
      $project: {
        "name": 1,
        "menuCode": { $substrBytes: [ "$name", 0, 3 ] }
      }
    }
  ]
)
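
Because the names mix one-, two- and three-byte characters, it can help to check locally that a three-byte cut lands on a character boundary for each name. The sketch below is plain JavaScript run outside the database. `safeByteCount` and `menuCodeSketch` are hypothetical helpers, the names are assumed to use precomposed characters, and the printed values are what this local sketch yields rather than server output.

```javascript
// Hypothetical helper: largest byte count <= maxBytes that ends on a
// character boundary, so a cut never lands inside a multi-byte character.
function safeByteCount(str, maxBytes) {
  const bytes = new TextEncoder().encode(str);
  let end = Math.min(maxBytes, bytes.length);
  // Step back while the cut would land on a continuation byte (10xxxxxx).
  while (end > 0 && end < bytes.length && (bytes[end] & 0xc0) === 0x80) {
    end -= 1;
  }
  return end;
}

// Hypothetical helper: decode the first (at most) three bytes of a name.
function menuCodeSketch(name) {
  const bytes = new TextEncoder().encode(name);
  return new TextDecoder().decode(bytes.slice(0, safeByteCount(name, 3)));
}

const names = ["apple", "banana", "éclair", "hamburger",
               "jalapeño", "pizza", "tacos", "寿司sushi"];
console.log(names.map(menuCodeSketch));
// [ 'app', 'ban', 'éc', 'ham', 'jal', 'piz', 'tac', '寿' ]
```

For every name in this collection `safeByteCount(name, 3)` is 3, so the fixed byte count in the pipeline does not split a character. A value passed to `byte count` could be precomputed this way when names are less predictable.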