$scoreFusion (aggregation)

The $rankFusion and $scoreFusion stages are available as Preview features. To learn more, see Preview Features.

Important

$scoreFusion is only available for deployments that use MongoDB 8.2+.

Definition

$scoreFusion

$scoreFusion first executes all input pipelines independently, then de-duplicates and combines the input pipeline results into a final scored result set.

$scoreFusion outputs a ranked set of documents based on the documents' scores and the weights of their input pipelines. You can specify an arithmetic expression that computes the final score from the input pipeline scores. By default, it uses the average of a document's scores from the different input pipelines.

Use $scoreFusion to search for documents in a single collection based on multiple criteria and retrieve a final scored result set that factors in all specified criteria.

Syntax

The stage has the following syntax:

{ $scoreFusion: {
    input: {
      pipelines: {
        <input-pipeline-name>: <expression>,
        <input-pipeline-name>: <expression>,
        ...
      },
      normalization: "none|sigmoid|minMaxScaler"
    },
    combination: {
      weights: {
        <input-pipeline-name>: <numeric expression>,
        <input-pipeline-name>: <numeric expression>,
        ...
      },
      method: "avg|expression",
      expression: <expression>
    },
    scoreDetails: <boolean>
} }

Fields

$scoreFusion takes the following fields:

Field | Type | Description
input | Object | Defines the input that $scoreFusion combines.
input.pipelines | Object | Contains a map of pipeline names to the aggregation stages that define each pipeline. input.pipelines must contain at least one pipeline. If an input pipeline doesn't return a score, you must add a $score stage to that pipeline. All pipelines must operate on the same collection, and each pipeline must have a unique name. For more information on input pipeline restrictions, see Input Pipelines and Input Pipeline Names.
input.normalization | String | Normalizes the scores to the range 0 to 1 before combining the results. Value can be:
  • none - to not normalize.
  • sigmoid - to apply the $sigmoid expression.
  • minMaxScaler - to apply the $minMaxScaler window operator.
combination | Object | Optional. Defines how to combine the input pipeline results.
combination.weights | Object | Optional. Weights to apply to the normalized input pipeline scores when combining the results. Each weight corresponds to one input pipeline. If you don't specify a weight for a pipeline, its weight defaults to 1. Each weight must be a non-negative number (whole or decimal) and can be 0.
combination.method | String | Optional. Specifies the method for combining scores. If omitted, defaults to avg. Value can be:
  • avg - to calculate the average of the input scores.
  • expression - to apply a custom aggregation expression that you specify in the combination.expression field.
combination.expression | Arithmetic Expression | Optional. Specifies the logic for combining the input scores. This is the custom expression used when combination.method is set to expression. Within the expression, use the name of an input pipeline to represent that pipeline's score for a document. Mutually exclusive with combination.weights.
scoreDetails | Boolean | Optional. Specifies whether to include detailed scoring information from each input pipeline in the output document's metadata. If omitted, defaults to false.
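
The input.normalization options can be illustrated outside the database. The following is a minimal sketch in plain JavaScript, not the server's implementation: sigmoid and minMaxScaler are hypothetical helper names that mirror what the $sigmoid expression and the $minMaxScaler window operator compute with a 0-to-1 output range. Returning 0 when all scores in a set are equal is an assumption of this sketch.

```javascript
// Sketch of the score normalization options. Helper names are illustrative only.

// "sigmoid": maps any real-valued score into the open interval (0, 1).
function sigmoid(score) {
  return 1 / (1 + Math.exp(-score));
}

// "minMaxScaler": rescales scores relative to the smallest and largest
// score in the same result set.
function minMaxScaler(scores) {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  // Assumption of this sketch: if every score is identical, return 0 for each.
  if (max === min) return scores.map(() => 0);
  return scores.map((s) => (s - min) / (max - min));
}

const rawScores = [2, 4, 6];
console.log(rawScores.map(sigmoid));  // each value lies strictly between 0 and 1
console.log(minMaxScaler(rawScores)); // [ 0, 0.5, 1 ]
```

Note that sigmoid depends only on the individual score, while minMaxScaler depends on the other scores in the same set.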

Behavior

Collections

You can only use $scoreFusion with a single collection. You cannot use this aggregation stage at a database scope.

De-Duplication

$scoreFusion de-duplicates the results across multiple input pipelines in the final output. Each unique input document appears at most once in the $scoreFusion output, regardless of the number of times that the document appears in input pipeline outputs.
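
The de-duplication step can be pictured with a small plain-JavaScript sketch. It is illustrative only and is not how the server implements the stage: mergePipelineResults is a hypothetical helper, the input shape ({ _id, score } per result) is invented for the example, and identifying documents by _id is an assumption.

```javascript
// Sketch: merge the results of several input pipelines so that each document
// appears once, keeping the score it received from every pipeline.
// Assumption: documents are identified by their _id field.
function mergePipelineResults(resultsByPipeline) {
  const byId = new Map();
  for (const [pipelineName, docs] of Object.entries(resultsByPipeline)) {
    for (const { _id, score } of docs) {
      if (!byId.has(_id)) byId.set(_id, { _id, scores: {} });
      byId.get(_id).scores[pipelineName] = score;
    }
  }
  return [...byId.values()];
}

const fused = mergePipelineResults({
  searchOne: [{ _id: 1, score: 0.9 }, { _id: 2, score: 0.4 }],
  searchTwo: [{ _id: 2, score: 0.7 }, { _id: 3, score: 0.2 }],
});
console.log(fused.length); // 3: document 2 appears once, with two scores
```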

Input Pipelines

Each input pipeline must be both a Selection Pipeline and a Scoring Pipeline.

Selection Pipeline

A Selection Pipeline retrieves a set of documents from a collection without performing any modifications after retrieval. $scoreFusion compares documents across different input pipelines, which requires that all input pipelines output the same unmodified documents.

A selection pipeline can contain only the following stages:

Type | Stages
Search Stages
  • $match, including $match with legacy text search
  • $geoNear
  • $search
  • $vectorSearch

    Note

    If you use $geoNear in a selection pipeline, you cannot specify includeLocs or distanceField because those fields modify documents.

Ordering Stages
Pagination Stages

Scoring Pipeline

A scoring pipeline orders documents by their score. $scoreFusion uses the order of scored pipeline results to influence the output scores. Scoring pipelines must meet one of the following criteria:

  • Begin with one of the following ordered stages:

  • Contain an explicit $score stage if the preceding pipeline doesn't inherently return a score.

Input Pipeline Names

Pipeline names in input must meet the following restrictions:

  • Must not be an empty string
  • Must not start with a $
  • Must not contain the ASCII null character delimiter \0 anywhere in the string
  • Must not contain a period (.)
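
The naming restrictions above can be checked on the client before a pipeline is sent. The following sketch is a hypothetical helper, not part of any MongoDB driver or shell API; it encodes only the four rules listed above.

```javascript
// Sketch: check a candidate input pipeline name against the listed restrictions.
// Returns the violated rules; an empty array means the name passes all four.
function pipelineNameProblems(name) {
  const problems = [];
  if (name === "") problems.push("must not be an empty string");
  if (name.startsWith("$")) problems.push("must not start with $");
  if (name.includes("\0")) problems.push("must not contain the null character");
  if (name.includes(".")) problems.push("must not contain a period");
  return problems;
}

console.log(pipelineNameProblems("searchOne"));   // []
console.log(pipelineNameProblems("$search.one")); // [ 'must not start with $', 'must not contain a period' ]
```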

scoreDetails

If you set scoreDetails to true, $scoreFusion creates a scoreDetails metadata field for each document. The scoreDetails field contains information about the final ranking.

Note

When you set scoreDetails to true, $scoreFusion sets the scoreDetails metadata field for each document, but it doesn't automatically include that field in the output.

To view the scoreDetails metadata field, you must explicitly project it with the $meta expression in a stage like $project, $addFields, or $set.

The scoreDetails field contains the following subfields:

Field | Description
value | The numerical value of the score for this document.
description | A description of how $scoreFusion computed the final score.
normalization | The normalization method used to normalize the score.
combination | The combination method and expression used to combine the pipeline results.
details | An array where each array entry contains information about the input pipelines that output this document.

Each array entry in the details field contains the following subfields:

Field | Description
inputPipelineName | The name of the input pipeline that output this document.
inputPipelineRawScore | The score of the document from the pipeline before normalization.
weight | The weight of the input pipeline.
value | Optional. If the input pipeline outputs a { $meta: 'score' } for this document, value contains { $meta: 'score' }.
details | The scoreDetails field of the input pipeline. If the input pipeline does not output a scoreDetails field, this field is an empty array.

Warning

MongoDB does not guarantee any specific output format for scoreDetails.

Example

The following code block shows the scoreDetails field for a $scoreFusion operation with $search, $vectorSearch, and $match input pipelines:

scoreDetails: {
  value: 7.847857250621068,
  description: 'the value calculated by combining the scores (either normalized or raw) across input pipelines from which this document is output from:',
  normalization: 'sigmoid',
  combination: {
    method: 'custom expression',
    expression: "{ string: { $sum: [ { $multiply: [ '$$searchOne', 10 ] }, '$$searchTwo' ] } }"
  },
  details: [
    {
      inputPipelineName: 'searchOne',
      inputPipelineRawScore: 0.7987099885940552,
      weight: 1,
      value: 0.6896984675751023,
      details: []
    },
    {
      inputPipelineName: 'searchTwo',
      inputPipelineRawScore: 2.9629626274108887,
      weight: 1,
      value: 0.950872574870045,
      details: []
    }
  ]
}
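
The numbers in this output can be checked by hand. The sketch below is plain JavaScript, not a MongoDB API, and the variable names are illustrative. It applies the standard logistic (sigmoid) function to each inputPipelineRawScore and then evaluates the custom expression, which reproduces the value fields shown above to within floating-point rounding.

```javascript
// Sketch: recompute scoreDetails.value from the raw scores in the output above.
const applySigmoid = (x) => 1 / (1 + Math.exp(-x));

const exampleRawScores = {
  searchOne: 0.7987099885940552,
  searchTwo: 2.9629626274108887,
};

// normalization: 'sigmoid'
const normalized = {
  searchOne: applySigmoid(exampleRawScores.searchOne),
  searchTwo: applySigmoid(exampleRawScores.searchTwo),
};

// combination.expression: { $sum: [ { $multiply: [ "$$searchOne", 10 ] }, "$$searchTwo" ] }
const finalScore = normalized.searchOne * 10 + normalized.searchTwo;

console.log(normalized); // approximately { searchOne: 0.6897, searchTwo: 0.9509 }
console.log(finalScore); // approximately 7.8479
```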

Explain Results

Before query execution, MongoDB converts a $scoreFusion operation into a set of existing aggregation stages that, in combination, compute the output result. The Explain Results for a $scoreFusion operation show the full execution of the underlying aggregation stages that $scoreFusion uses to compose the final result.

Examples

This example uses a collection with embeddings and text fields. Create search and vectorSearch type indexes on the collection.

The following index definition automatically indexes all the dynamically indexable fields in the collection for running $search queries against the indexed fields.

search Index

db.embedded_movies.createSearchIndex(
  "<INDEX_NAME>",
  {
    mappings: { dynamic: true }
  }
)

The following index definition indexes the field that contains the embeddings in the collection for running $vectorSearch queries against that field.

vectorSearch Index

db.embedded_movies.createSearchIndex(
  "<INDEX_NAME>",
  "vectorSearch",
  {
    "fields": [
      {
        "type": "vector",
        "path": "<FIELD_NAME>",
        "numDimensions": <NUMBER_OF_DIMENSIONS>,
        "similarity": "dotProduct"
      }
    ]
  }
);

The following aggregation pipeline uses $scoreFusion with the following input pipelines:

Pipeline | Number of Documents Returned | Description
searchOne | 20 | Runs a vector search on the field indexed as vector type, using the query term's embeddings as the query vector. The query considers up to 500 nearest neighbors, but limits the results to 20 documents.
searchTwo | 20 | Runs a full-text search for the same term and limits the results to 20 documents.

db.embedded_movies.aggregate( [
  {
    $scoreFusion: {
      input: {
        pipelines: {
          searchOne: [
            {
              "$vectorSearch": {
                "index": "<INDEX_NAME>",
                "path": "<FIELD_NAME>",
                "queryVector": <QUERY_EMBEDDINGS>,
                "numCandidates": <NUMBER_OF_NEAREST_NEIGHBORS_TO_CONSIDER>,
                "limit": <NUMBER_OF_DOCUMENTS_TO_RETURN>
              }
            }
          ],
          searchTwo: [
            {
              "$search": {
                "index": "<INDEX_NAME>",
                "text": {
                  "query": "<QUERY_TERM>",
                  "path": "<FIELD_NAME>"
                }
              }
            }
          ]
        },
        normalization: "sigmoid"
      },
      combination: {
        method: "expression",
        expression: {
          $sum: [
            {$multiply: [ "$$searchOne", 10]}, "$$searchTwo"
          ]
        }
      },
      "scoreDetails": true
    }
  },
  {
    "$project": {
      _id: 1,
      title: 1,
      plot: 1,
      scoreDetails: {"$meta": "scoreDetails"}
    }
  },
  { $limit: 20 }
] )

This pipeline performs the following actions:

  • Executes the input pipelines
  • Combines the returned results
  • Outputs the first 20 documents, which are the top 20 ranked results of the $scoreFusion pipeline
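
As a variation, the same two input pipelines could be combined with the default avg method and per-pipeline weights instead of a custom expression. The sketch below only builds the stage document as a plain JavaScript object and does not contact a database. buildWeightedScoreFusion is a hypothetical helper, the weight values are arbitrary, and its checks reflect the field descriptions in this page rather than the server's actual validation. In particular, requiring every weight key to match a defined pipeline is an assumption.

```javascript
// Hypothetical helper: builds a $scoreFusion stage that uses the "avg" method
// with per-pipeline weights. It constructs the stage document only.
function buildWeightedScoreFusion(pipelines, weights, normalization = "sigmoid") {
  for (const [name, weight] of Object.entries(weights)) {
    // Assumption: every weight key must match a pipeline in input.pipelines.
    if (!Object.prototype.hasOwnProperty.call(pipelines, name)) {
      throw new Error(`weight given for unknown pipeline "${name}"`);
    }
    // From the field description: weights are non-negative numbers; 0 is allowed.
    if (typeof weight !== "number" || Number.isNaN(weight) || weight < 0) {
      throw new Error(`weight for "${name}" must be a non-negative number`);
    }
  }
  return {
    $scoreFusion: {
      input: { pipelines, normalization },
      // combination.expression is omitted: it is mutually exclusive with weights.
      combination: { weights, method: "avg" },
    },
  };
}

const stage = buildWeightedScoreFusion(
  {
    searchOne: [{ $vectorSearch: { /* same options as in the example above */ } }],
    searchTwo: [{ $search: { /* same options as in the example above */ } }],
  },
  { searchOne: 10, searchTwo: 1 } // arbitrary illustrative weights
);
console.log(JSON.stringify(stage, null, 2));
```

A pipeline omitted from weights keeps the default weight of 1.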