$binarySize (aggregation)

On this page本页内容

Definition定义

$binarySize

New in version 4.4.在版本4.4中新增

Returns the size of a given string or binary data value's content in bytes.返回给定字符串或二进制数据值内容的大小(以字节为单位)。

$binarySize has the following syntax:语法如下:

{ $binarySize: <string or binData> }

The argument can be any valid expression as long as it resolves to either a string or binary data value. 参数可以是任何有效的表达式,只要它解析为字符串或二进制数据值即可。For more information on expressions, see Expressions.有关表达式的详细信息,请参阅表达式

Behavior行为

The argument for $binarySize must resolve to either:$binarySize的参数必须解析为:

  • A string,一串,
  • A binary data value, or二进制数据值,或
  • null.null

If the argument is a string or binary data value, the expression returns the size of the argument in bytes.如果参数是字符串或二进制数据值,则表达式返回参数的大小(以字节为单位)。

If the argument is null, the expression returns null.如果参数为null,则表达式返回null

If the argument resolves to any other data type, $binarySize errors.如果参数解析为任何其他数据类型,$binarySize错误。

String Size Calculation字符串大小计算

If the argument for $binarySize is a string, the operator counts the number of UTF-8 encoded bytes in a string where each character may use between one and four bytes.如果$binarySize的参数是字符串,则运算符计算字符串中的UTF-8编码字节数,其中每个字符可以使用1到4个字节。

For example, US-ASCII characters are encoded using one byte. Characters with diacritic markings and additional Latin alphabetical characters (i.e. Latin characters outside of the English alphabet) are encoded using two bytes. 例如,US-ASCII字符使用一个字节进行编码。带有变音符号的字符和附加的拉丁字母字符(即英语字母表之外的拉丁字符)使用两个字节进行编码。Chinese, Japanese and Korean characters typically require three bytes, and other planes of unicode (emoji, mathematical symbols, etc.) require four bytes.中文、日文和韩文字符通常需要三个字节,其他unicode平面(表情符号、数学符号等)需要四个字节。

Consider the following examples:考虑以下示例:

ExampleResultsNotes
{ $binarySize: "abcde" }
5Each character is encoded using one byte.每个字符使用一个字节进行编码。
{ $binarySize: "Hello World!" }
12Each character is encoded using one byte.每个字符使用一个字节进行编码。
{ $binarySize: "cafeteria" }
9Each character is encoded using one byte.每个字符使用一个字节进行编码。
{ $binarySize: "cafétéria" }
11é is encoded using two bytes.使用两个字节进行编码。
{ $binarySize: "" }
0Empty strings return 0.空字符串返回0。
{ $binarySize: "$€λG" }
7 is encoded using three bytes. 使用三个字节进行编码。λ is encoded using two bytes.使用两个字节进行编码。
{ $binarySize: "寿司" }
6Each character is encoded using three bytes.每个字符使用三个字节进行编码。

Example示例

In mongosh, create a sample collection named images with the following documents:mongosh中,使用以下文档创建名为images的示例集合:

db.images.insertMany([
  { _id: 1, name: "cat.jpg", binary: new BinData(0, "OEJTfmD8twzaj/LPKLIVkA==")},
  { _id: 2, name: "big_ben.jpg", binary: new BinData(0, "aGVsZmRqYWZqYmxhaGJsYXJnYWZkYXJlcTU1NDE1Z2FmZCBmZGFmZGE=")},
  { _id: 3, name: "tea_set.jpg", binary: new BinData(0, "MyIRAFVEd2aImaq7zN3u/w==")},
  { _id: 4, name: "concert.jpg", binary: new BinData(0, "TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=")},
  { _id: 5, name: "empty.jpg", binary: new BinData(0, "") }
])

The following aggregation projects:以下聚合projects

  • The name fieldname字段
  • The imageSize field, which uses $binarySize to return the size of the document's binary field in bytes.imageSize字段,它使用$binarySize返回文档binary字段的大小(以字节为单位)。
db.images.aggregate([
  {
    $project: {
      "name": "$name",
      "imageSize": { $binarySize: "$binary" }
    }
  }
])

The operation returns the following result:该操作返回以下结果:

{ "_id" : 1, "name" : "cat.jpg", "imageSize" : 16 }
{ "_id" : 2, "name" : "big_ben.jpg", "imageSize" : 41 }
{ "_id" : 3, "name" : "teaset.jpg", "imageSize" : 16 }
{ "_id" : 4, "name" : "concert.jpg", "imageSize" : 269 }
{ "_id" : 5, "name" : "empty.jpg", "imageSize" : 0 }

Find Largest Binary Data查找最大二进制数据

The following pipeline returns the image with the largest binary data size:以下管道返回具有最大二进制数据大小的图像:

db.images.aggregate([
   // First Stage
   { $project: { name: "$name", imageSize: { $binarySize: "$binary" } }  },
   // Second Stage
   { $sort: { "imageSize" : -1 } },
   // Third Stage
   { $limit: 1 }
])
First Stage第一阶段

The first stage of the pipeline projects:管道projects的第一阶段:

  • The name fieldname字段
  • The imageSize field, which uses $binarySize to return the size of the document's binary field in bytes.imageSize字段,它使用$binarySize返回文档二进制字段的大小(以字节为单位)。

This stage outputs the following documents to the next stage:本阶段向下一阶段输出以下文件:

{ "_id" : 1, "name" : "cat.jpg", "imageSize" : 16 }
{ "_id" : 2, "name" : "big_ben.jpg", "imageSize" : 41 }
{ "_id" : 3, "name" : "teaset.jpg", "imageSize" : 16 }
{ "_id" : 4, "name" : "concert.jpg", "imageSize" : 269 }
{ "_id" : 5, "name" : "empty.jpg", "imageSize" : 0 }
Second Stage第二阶段

The second stage sorts the documents by imageSize in descending order.第二阶段按imageSize降序对文档进行排序

This stage outputs the following documents to the next stage:本阶段向下一阶段输出以下文件:

{ "_id" : 4, "name" : "concert.jpg", "imageSize" : 269 }
{ "_id" : 2, "name" : "big_ben.jpg", "imageSize" : 41 }
{ "_id" : 1, "name" : "cat.jpg", "imageSize" : 16 }
{ "_id" : 3, "name" : "teaset.jpg", "imageSize" : 16 }
{ "_id" : 5, "name" : "empty.jpg", "imageSize" : 0 }
Third Stage第三阶段

The third stage limits the output documents to only return the document appearing first in the sort order:第三阶段将输出文档限制为仅返回排序顺序中第一个出现的文档:

{ "_id" : 4, "name" : "concert.jpg", "imageSize" : 269 }
Tip提示
←  $avg (aggregation)$bottom (aggregation accumulator) →