The $rankFusion and $scoreFusion stages are available as Preview features. To learn more, see Preview Features.
Important
$scoreFusion is only available for deployments that use MongoDB 8.2+.
Definition
$scoreFusion
$scoreFusion first executes all input pipelines independently, then de-duplicates and combines the input pipeline results into a final scored result set.
$scoreFusion outputs a ranked set of documents based on the scores of the documents and the weights of their input pipelines. You can specify an arithmetic expression to compute the final score from the input pipeline scores. By default, $scoreFusion uses the average of a document's scores from the different input pipelines.
Use $scoreFusion to search for documents in a single collection based on multiple criteria and retrieve a final scored result set that factors in all specified criteria.
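To make the default combination concrete, here is a minimal plain-JavaScript sketch of averaging one document's scores across input pipelines. It does not talk to MongoDB. It assumes that each (already normalized) score is multiplied by its pipeline's weight, that a pipeline which did not return the document contributes 0, and that the average is taken over all input pipelines; these are illustrative assumptions, not a description of the server's internals. The function name combineAvg is hypothetical.

```javascript
// Hypothetical sketch of the default "avg" combination for one document.
// Assumptions (not confirmed by MongoDB): scores are already normalized,
// each score is multiplied by its pipeline weight, a pipeline that did not
// return the document contributes 0, and the average is over all pipelines.
function combineAvg(scores, pipelineNames, weights = {}) {
  let total = 0;
  for (const name of pipelineNames) {
    const score = scores[name] ?? 0;   // missing pipeline result -> 0
    const weight = weights[name] ?? 1; // unspecified weight defaults to 1 (0 is kept)
    total += weight * score;
  }
  return total / pipelineNames.length;
}

// Example: a document scored 0.5 by "vector" and 0.25 by "text".
const names = ["vector", "text"];
console.log(combineAvg({ vector: 0.5, text: 0.25 }, names)); // 0.375
console.log(combineAvg({ vector: 0.5, text: 0.25 }, names, { vector: 2 })); // 0.625
```

The `??` operator is used instead of `||` so that an explicit weight of 0 is honored rather than replaced by the default of 1.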
Syntax语法
The stage has the following syntax:
{ $scoreFusion: {
input: {
pipelines: {
<input-pipeline-name>: <expression>,
<input-pipeline-name>: <expression>,
...
},
normalization: "none|sigmoid|minMaxScaler"
},
combination: {
weights: {
<input-pipeline-name>: <numeric expression>,
<input-pipeline-name>: <numeric expression>,
...
},
method: "avg|expression",
expression: <expression>
},
scoreDetails: <boolean>
} }
Fields字段
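$scoreFusion takes the following fields:
Field | Type | Description
input | Object | Defines the input that $scoreFusion combines.
input.pipelines | Object | Map of input pipeline names to the aggregation pipelines whose results $scoreFusion combines. Each pipeline must be both a selection pipeline and a scoring pipeline.
input.normalization | String | Normalization applied to each input pipeline's scores before they are combined. Value can be none, sigmoid, or minMaxScaler.
combination | Object | Defines how to combine the input pipeline results.
combination.weights | Object | Map of input pipeline names to the weights applied to their scores. Any pipeline whose weight is unspecified defaults to 1. Each weight value must be a non-negative number (whole or decimal). Weight can be 0.
combination.method | String | Method for combining the scores: avg (default) or expression.
combination.expression | Expression | Arithmetic expression that computes the combined score when method is expression. Refer to each input pipeline's score as $$<input-pipeline-name>.
scoreDetails | Boolean | Whether to populate the scoreDetails metadata field for each document. Defaults to false.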
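The three normalization options can be sketched in plain JavaScript. The sigmoid formula 1 / (1 + e^-x) is consistent with the scoreDetails example on this page, where a raw score of 0.7987099885940552 normalizes to 0.6896984675751023. The minMaxScaler version assumes that scaling is computed over the scores returned by a single input pipeline, and mapping every score to 0 when all scores are equal is an assumption of this sketch; normalizeScores is a hypothetical helper, not a MongoDB API.

```javascript
// Sketch of the input.normalization options applied to the raw scores
// returned by one input pipeline. The equal-scores case for minMaxScaler
// is an assumption here (every score maps to 0).
function normalizeScores(rawScores, method) {
  switch (method) {
    case "none":
      // Leave scores unchanged (return a copy so callers can't alias input).
      return rawScores.slice();
    case "sigmoid":
      // Logistic function: maps any real score into (0, 1).
      return rawScores.map((s) => 1 / (1 + Math.exp(-s)));
    case "minMaxScaler": {
      // Rescales the pipeline's scores linearly into [0, 1].
      const min = Math.min(...rawScores);
      const max = Math.max(...rawScores);
      const range = max - min;
      return rawScores.map((s) => (range === 0 ? 0 : (s - min) / range));
    }
    default:
      throw new Error(`unknown normalization: ${method}`);
  }
}

console.log(normalizeScores([2, 4, 6], "minMaxScaler")); // [ 0, 0.5, 1 ]
```

Sigmoid keeps each score independent of the other results, while minMaxScaler depends on the spread of scores within the pipeline's result set.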
Behavior
Collections
You can only use $scoreFusion with a single collection. You cannot use this aggregation stage at a database scope.
De-Duplication
$scoreFusion de-duplicates the results across multiple input pipelines in the final output. Each unique input document appears at most once in the $scoreFusion output, regardless of how many times the document appears in the input pipeline outputs.
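The de-duplication step can be pictured as merging the per-pipeline result lists into one entry per document while remembering each pipeline's score. This is a minimal sketch that assumes documents are identified by _id and that each pipeline's results arrive as { doc, score } pairs; mergeByDocument and that input shape are hypothetical, chosen only for illustration.

```javascript
// Sketch of de-duplication: merge per-pipeline result lists into one entry
// per document _id, keeping each pipeline's score for that document.
// Assumption: documents are identified by _id.
function mergeByDocument(resultsByPipeline) {
  const merged = new Map(); // String(_id) -> { doc, scores: { pipelineName: score } }
  for (const [pipelineName, results] of Object.entries(resultsByPipeline)) {
    for (const { doc, score } of results) {
      const key = String(doc._id);
      if (!merged.has(key)) {
        merged.set(key, { doc, scores: {} });
      }
      merged.get(key).scores[pipelineName] = score;
    }
  }
  return [...merged.values()];
}

// Document 1 is returned by both pipelines but appears once in the output.
const merged = mergeByDocument({
  searchOne: [{ doc: { _id: 1 }, score: 0.9 }, { doc: { _id: 2 }, score: 0.5 }],
  searchTwo: [{ doc: { _id: 1 }, score: 0.3 }],
});
console.log(merged.length); // 2
```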
Input Pipelines
Each input pipeline must be both a Selection Pipeline and a Scoring Pipeline.
Selection Pipeline
A Selection Pipeline retrieves a set of documents from a collection without performing any modifications after retrieval. $scoreFusion compares documents across different input pipelines, which requires that all input pipelines output the same unmodified documents.
A selection pipeline must only contain the following stages:
| |
Scoring Pipeline
A scoring pipeline sorts documents based on their scores. $scoreFusion uses the order of the scored pipeline results to influence the output scores. Scoring pipelines must meet one of the following criteria:
Input Pipeline Names
Pipeline names in input must meet the following restrictions:
- Must not be an empty string.
- Must not start with a $.
- Must not contain the ASCII null character delimiter \0 anywhere in the string.
- Must not contain a period (.).
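The naming rules translate directly into a client-side check you might run before building a $scoreFusion stage. This is a sketch only: isValidPipelineName is a hypothetical helper, and the server still enforces the restrictions itself.

```javascript
// Checks the documented restrictions on $scoreFusion input pipeline names.
function isValidPipelineName(name) {
  return (
    typeof name === "string" &&
    name.length > 0 &&       // must not be an empty string
    !name.startsWith("$") && // must not start with $
    !name.includes("\0") &&  // must not contain the ASCII null character
    !name.includes(".")      // must not contain a period
  );
}

console.log(isValidPipelineName("searchOne")); // true
console.log(isValidPipelineName("$search"));   // false
```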
scoreDetails
If you set scoreDetails to true, $scoreFusion creates a scoreDetails metadata field for each document. The scoreDetails field contains information about the final ranking.
Note
When you set scoreDetails to true, $scoreFusion sets the scoreDetails metadata field for each document. By default, it doesn't automatically output the scoreDetails metadata field.
To view the scoreDetails metadata field, you must explicitly set it through the $meta expression in a stage like $project, $addFields, or $set.
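The scoreDetails field contains the following subfields:
Subfield | Description
value | Final score of the document, computed by combining the input pipeline scores.
description | Description of how $scoreFusion computed the final score.
normalization | Normalization method applied to the input pipeline scores (for example, sigmoid).
combination | Combination method used to compute the final score and, for a custom expression, the expression itself.
details | Array of score details for each input pipeline.
Each array entry in the details field contains the following subfields:
Subfield | Description
inputPipelineName | Name of the input pipeline.
inputPipelineRawScore | Score that the input pipeline produced for this document, before normalization.
weight | Weight applied to this input pipeline's score.
value | Score for this document from this input pipeline after normalization, derived from the pipeline's { $meta: 'score' } for this document.
details | Contents of the scoreDetails field of the input pipeline. If the input pipeline does not output a scoreDetails field, this field is an empty array.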
Warning
MongoDB does not guarantee any specific output format for scoreDetails.
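Because the format can change, application code that reads scoreDetails should avoid assuming a fixed shape. Below is a minimal sketch that uses the subfield names listed above (details, inputPipelineName, value) and skips anything that does not match; pipelineContributions is a hypothetical helper name, and the input is assumed to be a document's projected scoreDetails value.

```javascript
// Hypothetical helper: extract each input pipeline's score from a document's
// scoreDetails. Because the scoreDetails format is not guaranteed, it checks
// that each expected field exists and ignores entries that don't match.
function pipelineContributions(scoreDetails) {
  const contributions = {};
  const details = Array.isArray(scoreDetails?.details) ? scoreDetails.details : [];
  for (const entry of details) {
    if (typeof entry?.inputPipelineName === "string" && typeof entry.value === "number") {
      contributions[entry.inputPipelineName] = entry.value;
    }
  }
  return contributions;
}

// Shape taken from the scoreDetails example on this page (abridged).
console.log(pipelineContributions({
  details: [
    { inputPipelineName: "searchOne", value: 0.6896984675751023 },
    { inputPipelineName: "searchTwo", value: 0.950872574870045 },
  ],
}));
```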
Example
The following code block shows the scoreDetails field for a $scoreFusion operation with $vectorSearch and $search input pipelines named searchOne and searchTwo:
scoreDetails: {
value: 7.847857250621068,
description: 'the value calculated by combining the scores (either normalized or raw) across input pipelines from which this document is output from:',
normalization: 'sigmoid',
combination: {
method: 'custom expression',
expression: "{ string: { $sum: [ { $multiply: [ '$$searchOne', 10 ] }, '$$searchTwo' ] } }"
},
details: [
{
inputPipelineName: 'searchOne',
inputPipelineRawScore: 0.7987099885940552,
weight: 1,
value: 0.6896984675751023,
details: []
},
{
inputPipelineName: 'searchTwo',
inputPipelineRawScore: 2.9629626274108887,
weight: 1,
value: 0.950872574870045,
details: []
}
]
}
Explain Results
MongoDB converts $scoreFusion operations into a set of existing aggregation stages that, in combination, compute the output result prior to query execution. The Explain Results for a $scoreFusion operation show the full execution of the underlying aggregation stages that $scoreFusion uses to compose the final result.
Examples
This example uses a collection with embeddings and text fields. Create search and vectorSearch type indexes on the collection.
The following index definition automatically indexes all the dynamically indexable fields in the collection for running $search queries against the indexed fields.
search Index
db.embedded_movies.createSearchIndex(
"<INDEX_NAME>",
{
mappings: { dynamic: true }
}
)
The following index definition indexes the field that contains the embeddings in the collection for running $vectorSearch queries against that field.
vectorSearch Index
db.embedded_movies.createSearchIndex(
"<INDEX_NAME>",
"vectorSearch",
{
"fields": [
{
"type": "vector",
"path": "<FIELD_NAME>",
"numDimensions": <NUMBER_OF_DIMENSIONS>,
"similarity": "dotProduct"
}
]
}
);
The following aggregation pipeline uses $scoreFusion with the following input pipelines:
searchOne | 20 | Runs a vector search on the <FIELD_NAME> field indexed as vector type for the query term specified as embeddings (<QUERY_EMBEDDINGS>). The query considers up to 500 nearest neighbors, but limits the results to 20 documents.
searchTwo | 20 | Runs a full-text $search query for <QUERY_TERM> on the <FIELD_NAME> field.
db.embedded_movies.aggregate( [
{
$scoreFusion: {
input: {
pipelines: {
searchOne: [
{
"$vectorSearch": {
"index": "<INDEX_NAME>",
"path": "<FIELD_NAME>",
"queryVector": <QUERY_EMBEDDINGS>,
"numCandidates": <NUMBER_OF_NEAREST_NEIGHBORS_TO_CONSIDER>,
"limit": <NUMBER_OF_DOCUMENTS_TO_RETURN>
}
}
],
searchTwo: [
{
"$search": {
"index": "<INDEX_NAME>",
"text": {
"query": "<QUERY_TERM>",
"path": "<FIELD_NAME>"
}
}
},
]
},
normalization: "sigmoid"
},
combination: {
method: "expression",
expression: {
$sum: [
{$multiply: [ "$$searchOne", 10]}, "$$searchTwo"
]
}
},
"scoreDetails": true
}
},
{
"$project": {
_id: 1,
title: 1,
plot: 1,
scoreDetails: {"$meta": "scoreDetails"}
}
},
{ $limit: 20 }
] )
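This pipeline performs the following actions:
- Executes the input pipelines.
- Normalizes each input pipeline's scores with sigmoid and combines them with the custom expression, which multiplies the searchOne score by 10 and adds the searchTwo score.
- Projects _id, title, plot, and the scoreDetails metadata field for each document.
- Outputs the first 20 documents, which are the top 20 ranked results of the $scoreFusion pipeline.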
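The scoring in this pipeline can be simulated offline to see how the custom expression ranks documents. This plain-JavaScript sketch applies sigmoid normalization, computes 10 * searchOne + searchTwo, sorts in descending order, and keeps the top 20. Fed the raw scores from the scoreDetails example on this page (0.7987099885940552 and 2.9629626274108887), it yields 7.847857250621068 to within floating-point rounding. Treating a pipeline that did not return a document as contributing 0 is an assumption of this sketch, and fuseExample and its { _id, score } input shape are hypothetical.

```javascript
// Simulates the example's scoring: sigmoid normalization, then the custom
// expression 10 * searchOne + searchTwo, then sort descending and limit.
// Assumption: a pipeline that did not return a document contributes 0.
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

function fuseExample(searchOne, searchTwo, limit = 20) {
  // searchOne / searchTwo: arrays of { _id, score } (raw pipeline scores).
  const scores = new Map(); // _id -> { searchOne, searchTwo } (normalized)
  for (const [name, results] of [["searchOne", searchOne], ["searchTwo", searchTwo]]) {
    for (const { _id, score } of results) {
      const entry = scores.get(_id) ?? { searchOne: 0, searchTwo: 0 };
      entry[name] = sigmoid(score);
      scores.set(_id, entry);
    }
  }
  return [...scores.entries()]
    .map(([_id, s]) => ({ _id, score: 10 * s.searchOne + s.searchTwo }))
    .sort((a, b) => b.score - a.score) // highest combined score first
    .slice(0, limit);                  // mirrors { $limit: 20 }
}

// Raw scores taken from the scoreDetails example.
console.log(fuseExample(
  [{ _id: "m1", score: 0.7987099885940552 }],
  [{ _id: "m1", score: 2.9629626274108887 }],
));
```

The weight of 10 on searchOne means the vector search result dominates the ranking unless the full-text scores differ sharply.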