Docs HomeMongoDB Manual

$sampleRate (aggregation)

Definition定义

$sampleRate

New in version 4.4.2. 4.4.2版新增。

Matches a random selection of input documents. 匹配随机选择的输入文档。The number of documents selected approximates the sample rate expressed as a percentage of the total number of documents.所选文档数近似于样本率,表示为文档总数的百分比。

The $sampleRate operator has the following syntax:$sampleRate运算符具有以下语法:

{ $sampleRate: <non-negative float> }

Behavior行为

The selection process uses a uniform random distribution. 选择过程使用均匀随机分布。The sample rate is a floating point number between 0 and 1, inclusive, which represents the probability that a given document will be selected as it passes through the pipeline.采样率是一个介于0和1之间(包括0和1)的浮点数,表示给定文档在通过管道时被选中的概率。

For example, a sample rate of 0.33 selects roughly one document in three.例如,0.33的采样率大约选择三分之一的文档。

This expression:此表达式:

{ $match: { $sampleRate: 0.33 } }

is equivalent to using the $rand operator as follows:相当于使用$rand运算符,如下所示:

{ $match: { $expr: { $lt: [ { $rand: {} }, 0.33 ] } } }

Repeated runs on the same data will produce different outcomes since the selection process is non-deterministic. 对同一数据重复运行将产生不同的结果,因为选择过程是不确定的。In general, smaller datasets will show more variability in the number of documents selected on each run. 通常,较小的数据集在每次运行时选择的文档数量上会显示出更多的可变性。As collection size increases, the number of documents chosen will approach the expected value for a uniform random distribution.随着集合规模的增加,所选择的文档数量将接近均匀随机分布的预期值。

Note

If an exact number of documents is required from each run, the $sample operator should be used instead of $sampleRate.如果每次运行都需要确切数量的文档,则应使用$sample运算符而不是$sampleRate

Examples实例

This code creates a small collection with 100 documents.这段代码创建了一个包含100个文档的小集合。

N = 100
bulk = db.collection.initializeUnorderedBulkOp()
for ( i = 0; i < N; i++) { bulk.insert( {_id: i, r: 0} ) }
bulk.execute()

The $sampleRate operator can be used in a pipeline to select random documents from the collection. In this example we use $sampleRate to select about one third of the documents.$sampleRate运算符可以在管道中用于从集合中选择随机文档。在本例中,我们使用$sampleRate来选择大约三分之一的文档。

db.collection.aggregate(
[
{ $match: { $sampleRate: 0.33 } },
{ $count: "numMatches" }
]
)

This is the output from 5 runs on the sample collection:这是在样本集合上运行5次的输出:

{ "numMatches" : 38 }
{ "numMatches" : 36 }
{ "numMatches" : 29 }
{ "numMatches" : 29 }
{ "numMatches" : 28 }
Tip

See also: 另请参阅: