$strLenBytes (aggregation)
Definition
$strLenBytes
Returns the number of UTF-8 encoded bytes in the specified string.
$strLenBytes has the following operator expression syntax:
{ $strLenBytes: <string expression> }
The argument can be any valid expression as long as it resolves to a string. For more information on expressions, see Expressions.
If the argument resolves to a value of null or refers to a missing field, $strLenBytes returns an error.
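Because null values and missing fields raise an error, a pipeline over inconsistent data may need a guard. The following is a minimal sketch, not an official recipe: safeStrLenBytes is a hypothetical helper name, and it assumes that treating null or missing values as an empty string (length 0) is acceptable for your use case. It relies on the standard $ifNull operator.

```javascript
// Hypothetical helper (not part of MongoDB): builds a $strLenBytes expression
// that substitutes "" when the field is null or missing.
function safeStrLenBytes(fieldPath) {
  return { $strLenBytes: { $ifNull: [fieldPath, ""] } };
}

// Example stage that uses the guarded expression on a "name" field.
const pipeline = [
  { $project: { name: 1, length: safeStrLenBytes("$name") } }
];

// In mongosh, this could be run as: db.<collection>.aggregate(pipeline)
console.log(JSON.stringify(pipeline));
```

This guard only covers null and missing fields. A value of another non-string type, such as a number, would still cause an error.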
Behavior
The $strLenBytes operator counts the number of UTF-8 encoded bytes in a string, where each character may use between one and four bytes.
For example, US-ASCII characters are encoded using one byte. Characters with diacritic markings and additional Latin alphabetical characters (i.e., Latin characters outside of the English alphabet) are encoded using two bytes. Chinese, Japanese and Korean characters typically require three bytes, and characters in the supplementary Unicode planes (most emoji, mathematical alphanumeric symbols, etc.) require four bytes.
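The byte widths described above follow from the code point ranges defined by UTF-8. The sketch below illustrates that rule in plain JavaScript; utf8Width and utf8Length are hypothetical helper names, and the code is a local approximation for well-formed strings, not MongoDB's implementation.

```javascript
// Number of bytes UTF-8 needs to encode a single Unicode code point.
function utf8Width(codePoint) {
  if (codePoint < 0x80) return 1;     // US-ASCII
  if (codePoint < 0x800) return 2;    // e.g. Latin letters with diacritics, Greek
  if (codePoint < 0x10000) return 3;  // rest of the Basic Multilingual Plane, e.g. most CJK
  return 4;                           // supplementary planes, e.g. most emoji
}

// Sum the widths over all code points of a string.
function utf8Length(str) {
  let total = 0;
  for (const ch of str) {             // for...of iterates by code point, not UTF-16 unit
    total += utf8Width(ch.codePointAt(0));
  }
  return total;
}

console.log(utf8Length("abcde"));                 // 5
console.log(utf8Length("caf\u00e9t\u00e9ria"));   // 11 ("cafétéria")
console.log(utf8Length("$\u20ac\u03bbG"));        // 7  ("$€λG")
console.log(utf8Length("\u5bff\u53f8"));          // 6  ("寿司")
```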
The $strLenBytes operator differs from the $strLenCP operator, which counts the code points in the specified string regardless of how many bytes each character uses.
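To see the difference between counting bytes and counting code points outside the database, the following sketch compares the two for a few strings. It uses the standard TextEncoder API, which always encodes to UTF-8, and string spreading, which splits by code point. The function names byteLength and codePointLength are illustrative only, and the results mirror rather than invoke $strLenBytes and $strLenCP.

```javascript
const encoder = new TextEncoder(); // TextEncoder always produces UTF-8

// Local analogue of $strLenBytes: size of the UTF-8 encoding.
function byteLength(str) {
  return encoder.encode(str).length;
}

// Local analogue of $strLenCP: number of Unicode code points.
function codePointLength(str) {
  return [...str].length;
}

const samples = [
  "cafeteria",
  "caf\u00e9t\u00e9ria", // "cafétéria"
  "\u5bff\u53f8",        // "寿司"
  "\u{1F363}"            // a single emoji from a supplementary plane
];

for (const s of samples) {
  console.log(s, "bytes:", byteLength(s), "code points:", codePointLength(s));
}
```

The two counts agree only when every character in the string is US-ASCII.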
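Example | Results | Notes
{ $strLenBytes: "abcde" } | 5 | Each character is encoded using one byte.
{ $strLenBytes: "Hello World!" } | 12 | Each character is encoded using one byte.
{ $strLenBytes: "cafeteria" } | 9 | Each character is encoded using one byte.
{ $strLenBytes: "cafétéria" } | 11 | é is encoded using two bytes.
{ $strLenBytes: "" } | 0 | Empty strings return 0.
{ $strLenBytes: "$€λG" } | 7 | € is encoded using three bytes. λ is encoded using two bytes.
{ $strLenBytes: "寿司" } | 6 | Each character is encoded using three bytes.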
Example
Single-Byte and Multibyte Character Set
Create a food collection with the following documents:
db.food.insertMany(
[
{ "_id" : 1, "name" : "apple" },
{ "_id" : 2, "name" : "banana" },
{ "_id" : 3, "name" : "éclair" },
{ "_id" : 4, "name" : "hamburger" },
{ "_id" : 5, "name" : "jalapeño" },
{ "_id" : 6, "name" : "pizza" },
{ "_id" : 7, "name" : "tacos" },
{ "_id" : 8, "name" : "寿司" }
]
)
The following operation uses the $strLenBytes operator to calculate the length of each name value:
db.food.aggregate(
[
{
$project: {
"name": 1,
"length": { $strLenBytes: "$name" }
}
}
]
)
The operation returns the following results:
{ "_id" : 1, "name" : "apple", "length" : 5 }
{ "_id" : 2, "name" : "banana", "length" : 6 }
{ "_id" : 3, "name" : "éclair", "length" : 7 }
{ "_id" : 4, "name" : "hamburger", "length" : 9 }
{ "_id" : 5, "name" : "jalapeño", "length" : 9 }
{ "_id" : 6, "name" : "pizza", "length" : 5 }
{ "_id" : 7, "name" : "tacos", "length" : 5 }
{ "_id" : 8, "name" : "寿司", "length" : 6 }
The documents with _id: 3 and _id: 5 each contain a diacritic character (é and ñ respectively) that requires two bytes to encode. The document with _id: 8 contains two Japanese characters that are encoded using three bytes each. This makes the length greater than the number of characters in name for the documents with _id: 3, _id: 5 and _id: 8.
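One way to find such documents directly is to compute both the byte length and the code point count and keep only the documents where they differ. The pipeline below is a sketch under that assumption; the variable name multibytePipeline and the output field names bytes and codePoints are illustrative choices, while $strLenCP, $match, $expr and $gt are standard operators.

```javascript
// Sketch: select documents whose name contains at least one multibyte character.
const multibytePipeline = [
  {
    // Compute both measures of length for each name.
    $project: {
      name: 1,
      bytes: { $strLenBytes: "$name" },
      codePoints: { $strLenCP: "$name" }
    }
  },
  {
    // Keep documents where the UTF-8 size exceeds the code point count.
    $match: { $expr: { $gt: ["$bytes", "$codePoints"] } }
  }
];

// In mongosh, this could be run as: db.food.aggregate(multibytePipeline)
console.log(JSON.stringify(multibytePipeline, null, 2));
```

Against the food collection above, this would be expected to keep only the documents with _id: 3, _id: 5 and _id: 8.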