Definition
$strLenBytes
Returns the number of UTF-8 encoded bytes in the specified string.
$strLenBytes has the following operator expression syntax:
{ $strLenBytes: <string expression> }
The argument can be any valid expression as long as it resolves to a string. For more information on expressions, see Expressions.
If the argument resolves to a value of null or refers to a missing field, $strLenBytes returns an error.
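Because a null or missing argument raises an error, a pipeline that may encounter such documents can substitute a default before measuring. The following is a minimal sketch, not part of the original page: it assumes an empty string is an acceptable stand-in, and both the field name `name` and the variable `guardedPipeline` are illustrative.

```javascript
// Hypothetical pipeline: $ifNull replaces a null or missing `name`
// with "" so that $strLenBytes always receives a string.
// A non-string value (for example a number) would still cause an error.
const guardedPipeline = [
  {
    $project: {
      name: 1,
      length: { $strLenBytes: { $ifNull: ["$name", ""] } }
    }
  }
];

// In mongosh: db.collection.aggregate(guardedPipeline)
```

With this guard, documents without a usable `name` report a length of 0 instead of aborting the aggregation; whether that is the right default depends on the application.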
Behavior
The $strLenBytes operator counts the number of UTF-8 encoded bytes in a string, where each character may use between one and four bytes.
For example, US-ASCII characters are encoded using one byte. Characters with diacritic markings and additional Latin alphabetical characters (Latin characters outside of the English alphabet) are encoded using two bytes. Chinese, Japanese and Korean characters typically require three bytes, and characters in other planes of Unicode (such as emoji and some mathematical symbols) require four bytes.
The $strLenBytes operator differs from the $strLenCP operator, which counts the code points in the specified string regardless of how many bytes each character uses.
| Result | Notes |
| 5 | |
| 12 | |
| 9 | |
| 11 | é is encoded using two bytes. |
| 0 | |
| 7 | € is encoded using three bytes. λ is encoded using two bytes. |
| 6 | |
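The difference between counting bytes and counting code points can be reproduced outside the database. The sketch below uses the standard `TextEncoder` API available in Node.js and browsers; it is not MongoDB's implementation, the helper names `utf8ByteLength` and `codePointLength` are hypothetical, and the sample strings are chosen for illustration rather than taken from the table above.

```javascript
// UTF-8 byte length, analogous to what $strLenBytes reports.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

// Code point count, analogous to what $strLenCP reports.
function codePointLength(str) {
  // The string iterator walks code points, not UTF-16 code units.
  return [...str].length;
}

// Illustrative samples (assumed, not from the original table):
// ASCII text, é, λ, €, 寿, and an emoji outside the Basic Multilingual Plane.
const samples = ["abc", "\u00e9", "\u03bb", "\u20ac", "\u5bff", "\u{1F600}"];
for (const s of samples) {
  console.log(JSON.stringify(s), utf8ByteLength(s), codePointLength(s));
}
```

Each single-character sample has a code point count of 1, while its byte count ranges from 1 to 4 depending on the code point's range.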
Example
Single-Byte and Multibyte Character Set
Create a food collection with the following documents:
db.food.insertMany(
[
{ "_id" : 1, "name" : "apple" },
{ "_id" : 2, "name" : "banana" },
{ "_id" : 3, "name" : "éclair" },
{ "_id" : 4, "name" : "hamburger" },
{ "_id" : 5, "name" : "jalapeño" },
{ "_id" : 6, "name" : "pizza" },
{ "_id" : 7, "name" : "tacos" },
{ "_id" : 8, "name" : "寿司" }
]
)
The following operation uses the $strLenBytes operator to calculate the length of each name value:
db.food.aggregate(
[
{
$project: {
"name": 1,
"length": { $strLenBytes: "$name" }
}
}
]
)
The operation returns the following results:
{ "_id" : 1, "name" : "apple", "length" : 5 }
{ "_id" : 2, "name" : "banana", "length" : 6 }
{ "_id" : 3, "name" : "éclair", "length" : 7 }
{ "_id" : 4, "name" : "hamburger", "length" : 9 }
{ "_id" : 5, "name" : "jalapeño", "length" : 9 }
{ "_id" : 6, "name" : "pizza", "length" : 5 }
{ "_id" : 7, "name" : "tacos", "length" : 5 }
{ "_id" : 8, "name" : "寿司", "length" : 6 }
The documents with _id: 3 and _id: 5 each contain a character with a diacritic (é and ñ respectively) that requires two bytes to encode. The document with _id: 8 contains two Japanese characters that are encoded using three bytes each. As a result, length is greater than the number of characters in name for the documents with _id: 3, _id: 5, and _id: 8.
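To see the gap between bytes and characters directly, both measurements can be projected side by side. This is a sketch under stated assumptions rather than part of the original example: the output field names `bytes` and `codePoints` and the variable `comparePipeline` are illustrative, and the names are assumed to be stored in precomposed form (for example, é as a single code point).

```javascript
// Hypothetical pipeline: report the UTF-8 byte length and the
// code point count of each name next to each other.
const comparePipeline = [
  {
    $project: {
      name: 1,
      bytes: { $strLenBytes: "$name" },
      codePoints: { $strLenCP: "$name" }
    }
  }
];

// In mongosh, against the food collection created above:
// db.food.aggregate(comparePipeline)
```

For names made up only of US-ASCII characters the two values should match; for the documents with _id: 3, _id: 5, and _id: 8, bytes would be expected to exceed codePoints.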