$text

Definition

$text

$text performs a text search on the content of the fields indexed with a text index. A $text expression has the following syntax:

Changed in version 3.2.

{
  $text:
    {
      $search: <string>,
      $language: <string>,
      $caseSensitive: <boolean>,
      $diacriticSensitive: <boolean>
    }
}

The $text operator accepts a text query document with the following fields:

Field Type Description

$search string

A string of terms that MongoDB parses and uses to query the text index. MongoDB performs a logical OR search of the terms unless specified as a phrase. See Behavior for more information on the field.

$language string

Optional. The language that determines the list of stop words for the search and the rules for the stemmer and tokenizer. If not specified, the search uses the default language of the index. For supported languages, see Text Search Languages.

If you specify a language value of "none", then the text search uses simple tokenization with no list of stop words and no stemming.

$caseSensitive boolean

Optional. A boolean flag to enable or disable case sensitive search. Defaults to false; i.e. the search defers to the case insensitivity of the text index.

For more information, see Case Insensitivity.

New in version 3.2.

$diacriticSensitive boolean

Optional. A boolean flag to enable or disable diacritic sensitive search against version 3 text indexes. Defaults to false; i.e. the search defers to the diacritic insensitivity of the text index.

Text searches against earlier versions of the text index are inherently diacritic sensitive and cannot be diacritic insensitive. As such, the $diacriticSensitive option has no effect with earlier versions of the text index.

For more information, see Diacritic Insensitivity.

New in version 3.2.

By default, the $text operator does not return results sorted by their scores. For more information on sorting by the text search scores, see the Text Score documentation.

Behavior

Restrictions

  • A query can specify, at most, one $text expression.
  • The $text query cannot appear in $nor expressions.
  • The $text query cannot appear in $elemMatch query expressions or $elemMatch projection expressions.
  • To use a $text query in an $or expression, all clauses in the $or array must be indexed.
  • You cannot use hint() if the query includes a $text query expression.
  • You cannot specify $natural sort order if the query includes a $text expression.
  • You cannot combine the $text expression, which requires a special text index, with a query operator that requires a different type of special index. For example, you cannot combine a $text expression with the $near operator.
  • Views do not support text search.

If you use the $text operator in aggregation, the following restrictions also apply:

  • The $match stage that includes a $text must be the first stage in the pipeline.
  • A text operator can only occur once in the stage.
  • The text operator expression cannot appear in $or or $not expressions.
  • The text search, by default, does not return the matching documents in order of matching scores. To sort by descending score, use the $meta aggregation expression in the $sort stage.
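
The aggregation restrictions above can be illustrated with a small sketch. The pipeline below places the $match stage with $text first and sorts by descending score with $meta. The search term "cake" and the articles collection are borrowed from the Examples section; the helper textOnlyInFirstStage is a hypothetical name used only for illustration and is not part of MongoDB.

```javascript
// A pipeline that satisfies the restrictions: $text appears once,
// inside the first $match stage, and $sort uses $meta for the score.
const pipeline = [
  { $match: { $text: { $search: "cake" } } },
  { $sort: { score: { $meta: "textScore" } } },
  { $project: { subject: 1, score: { $meta: "textScore" } } },
];

// Hypothetical helper: returns false if any stage other than the first
// is a $match stage with a top-level $text expression.
function textOnlyInFirstStage(stages) {
  return stages.every((stage, i) => {
    const hasText = stage.$match !== undefined && "$text" in stage.$match;
    return i === 0 || !hasText;
  });
}

console.log(textOnlyInFirstStage(pipeline)); // true

// In the shell, the pipeline would be run as:
//   db.articles.aggregate(pipeline)
```

The check is deliberately shallow: it only inspects the top level of each $match stage, so it is a reading aid rather than a validator.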

$search Field

In the $search field, specify a string of words that the text operator parses and uses to query the text index.

The text operator treats most punctuation in the string as delimiters, except a hyphen-minus (-), which negates a term, and escaped double quotes (\"), which specify a phrase.

Phrases

To match on a phrase, as opposed to individual terms, enclose the phrase in escaped double quotes (\"), as in:

"\"ssl certificate\""

If the $search string includes a phrase and individual terms, text search will only match the documents that include the phrase.

For example, given the following $search string:

"\"ssl certificate\" authority key"

The $text operator searches for the phrase "ssl certificate".

Negations

Prefixing a word with a hyphen-minus (-) negates the word:

  • The negated word excludes documents that contain the negated word from the result set.
  • When passed a search string that only contains negated words, text search will not match any documents.
  • A hyphenated word, such as pre-market, is not a negation. If used in a hyphenated word, the $text operator treats the hyphen-minus (-) as a delimiter. To negate the word market in this instance, include a space between pre and -market, i.e., pre -market.

The $text operator adds all negations to the query with the logical AND operator.
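
As a rough sketch of how terms, phrases, and negations combine in one $search string, the helper below assembles the string from three lists. The function name buildSearchString and the negated term "expired" are assumptions made for illustration; only the quoting and hyphen rules come from the text above.

```javascript
// Hypothetical helper (not a MongoDB API): composes a $search string.
// Phrases are wrapped in double quotes, and negated terms get a leading
// hyphen-minus that is always preceded by a space.
function buildSearchString({ terms = [], phrases = [], negated = [] } = {}) {
  const parts = [
    ...terms,
    ...phrases.map((p) => `"${p}"`),
    ...negated.map((t) => `-${t}`),
  ];
  return parts.join(" ");
}

const search = buildSearchString({
  terms: ["authority", "key"],
  phrases: ["ssl certificate"],
  negated: ["expired"], // illustrative term, not from the original page
});

console.log(search); // authority key "ssl certificate" -expired

// The string can then be used in a query document:
const query = { $text: { $search: search } };
// In the shell: db.articles.find(query)
```

Using a template literal avoids writing the \" escapes by hand; the resulting string still contains literal double quotes around the phrase.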

Match Operation

Stop Words

The $text operator ignores language-specific stop words, such as "the" and "and" in English.

Stemmed Words

For case insensitive and diacritic insensitive text searches, the $text operator matches on the complete stemmed word. So if a document field contains the word blueberry, a search on the term blue will not match. However, a search on blueberry or blueberries will match.

Case Sensitive Search and Stemmed Words

For case sensitive search (i.e. $caseSensitive: true), if the suffix stem contains uppercase letters, the $text operator matches on the exact word.

Diacritic Sensitive Search and Stemmed Words

For diacritic sensitive search (i.e. $diacriticSensitive: true), if the suffix stem contains the diacritic mark or marks, the $text operator matches on the exact word.

Case Insensitivity

Changed in version 3.2.

The $text operator defaults to the case insensitivity of the text index:

  • The version 3 text index is case insensitive for Latin characters with or without diacritics and characters from non-Latin alphabets, such as the Cyrillic alphabet. See text index for details.
  • Earlier versions of the text index are case insensitive for Latin characters without diacritic marks; i.e. for [A-Za-z].

$caseSensitive Option

To support case sensitive search where the text index is case insensitive, specify $caseSensitive: true.

Case Sensitive Search Process

When performing a case sensitive search ($caseSensitive: true) where the text index is case insensitive, the $text operator:

  • First searches the text index for case insensitive and diacritic insensitive matches.
  • Then, to return just the documents that match the case of the search terms, the $text query operation includes an additional stage to filter out the documents that do not match the specified case.

For case sensitive search (i.e. $caseSensitive: true), if the suffix stem contains uppercase letters, the $text operator matches on the exact word.

Specifying $caseSensitive: true may impact performance.
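
The two-step process above can be pictured with a toy model in plain JavaScript. This is only a sketch under strong assumptions: it splits on whitespace, ignores stemming and stop words, and scans an in-memory array instead of a text index. The names splitWords and caseSensitiveSearch are hypothetical; the sample subjects come from the Examples section.

```javascript
// Toy data: a subset of the sample subjects from the Examples section.
const subjects = [
  { _id: 1, subject: "coffee" },
  { _id: 2, subject: "Coffee Shopping" },
  { _id: 7, subject: "coffee and cream" },
];

// Simplified tokenizer (assumption): whitespace only, no stemming.
const splitWords = (s) => s.split(/\s+/).filter(Boolean);

function caseSensitiveSearch(docs, term) {
  // Step 1: case insensitive lookup, standing in for the index search.
  const candidates = docs.filter((d) =>
    splitWords(d.subject).some((w) => w.toLowerCase() === term.toLowerCase())
  );
  // Step 2: extra filter that keeps only documents matching the exact case.
  return candidates.filter((d) => splitWords(d.subject).includes(term));
}

console.log(caseSensitiveSearch(subjects, "Coffee").map((d) => d._id)); // [ 2 ]

// The corresponding shell query would be:
//   db.articles.find( { $text: { $search: "Coffee", $caseSensitive: true } } )
```

The second pass over the candidates is the part that corresponds to the additional filtering stage, which is why the option may cost extra time.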

See also

Stemmed Words

Diacritic Insensitivity

Changed in version 3.2.

The $text operator defaults to the diacritic insensitivity of the text index:

  • The version 3 text index is diacritic insensitive. That is, the index does not distinguish between characters that contain diacritical marks and their non-marked counterpart, such as é, ê, and e.
  • Earlier versions of the text index are diacritic sensitive.

$diacriticSensitive Option

To support diacritic sensitive text search against the version 3 text index, specify $diacriticSensitive: true.

Text searches against earlier versions of the text index are inherently diacritic sensitive and cannot be diacritic insensitive. As such, the $diacriticSensitive option for the $text operator has no effect with earlier versions of the text index.

Diacritic Sensitive Search Process

To perform a diacritic sensitive text search ($diacriticSensitive: true) against a version 3 text index, the $text operator:

  • First searches the text index, which is diacritic insensitive.
  • Then, to return just the documents that match the diacritic marked characters of the search terms, the $text query operation includes an additional stage to filter out the documents that do not match.

Specifying $diacriticSensitive: true may impact performance.

To perform a diacritic sensitive search against an earlier version of the text index, the $text operator searches the text index, which is diacritic sensitive.

For diacritic sensitive search, if the suffix stem contains the diacritic mark or marks, the $text operator matches on the exact word.
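
A similar toy model can illustrate the diacritic sensitive process against a version 3 index. It is a sketch under assumptions, not the server's algorithm: diacritics are removed with Unicode NFD normalization, words are split on whitespace, and stemming is ignored. The names foldWord, wordsOf, and diacriticSensitiveSearch are hypothetical.

```javascript
// Toy data: two sample subjects from the Examples section.
const cafes = [
  { _id: 5, subject: "Café Con Leche" },
  { _id: 8, subject: "Cafe con Leche" },
];

// Decompose characters and drop combining marks, then lowercase.
const stripDiacritics = (s) => s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
const foldWord = (s) => stripDiacritics(s).toLowerCase();
const wordsOf = (s) => s.split(/\s+/).filter(Boolean);

function diacriticSensitiveSearch(docs, term) {
  // Step 1: diacritic insensitive lookup, standing in for the index search.
  const candidates = docs.filter((d) =>
    wordsOf(d.subject).some((w) => foldWord(w) === foldWord(term))
  );
  // Step 2: keep only documents whose word carries the same diacritics.
  // Case is still ignored here, as with the default $caseSensitive: false.
  const exact = term.normalize("NFC").toLowerCase();
  return candidates.filter((d) =>
    wordsOf(d.subject).some((w) => w.normalize("NFC").toLowerCase() === exact)
  );
}

console.log(diacriticSensitiveSearch(cafes, "CAFÉ").map((d) => d._id)); // [ 5 ]

// The corresponding shell query would be:
//   db.articles.find( { $text: { $search: "CAFÉ", $diacriticSensitive: true } } )
```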

See also

Stemmed Words

Text Score

The $text operator assigns a score to each document that contains the search term in the indexed fields. The score represents the relevance of a document to a given text search query. The score can be part of a sort() method specification as well as part of the projection expression. The { $meta: "textScore" } expression provides information on the processing of the $text operation. See $meta projection operator for details on accessing the score for projection or sort.
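
As a minimal sketch of how the score can appear in both the projection and the sort specification, the hypothetical helper below returns the three documents a find() call would need. The name textScoreQuery and its scoreField parameter are assumptions for illustration; only the { $meta: "textScore" } expression comes from MongoDB.

```javascript
// Hypothetical helper (not a MongoDB API): builds the filter, projection,
// and sort documents for a text search ordered by relevance score.
function textScoreQuery(search, scoreField = "score") {
  const meta = { $meta: "textScore" };
  return {
    filter: { $text: { $search: search } },
    projection: { [scoreField]: meta },
    sort: { [scoreField]: meta },
  };
}

const q = textScoreQuery("coffee");
console.log(JSON.stringify(q.sort)); // {"score":{"$meta":"textScore"}}

// In the shell, the pieces would be used as:
//   db.articles.find(q.filter, q.projection).sort(q.sort)
```

Using the same field name in the projection and the sort keeps the sketch compatible with versions that require the two names to match.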

Examples

The following examples assume a collection named articles that has a version 3 text index on the field subject:

db.articles.createIndex( { subject: "text" } )

Populate the collection with the following documents:

db.articles.insertMany(
   [
     { _id: 1, subject: "coffee", author: "xyz", views: 50 },
     { _id: 2, subject: "Coffee Shopping", author: "efg", views: 5 },
     { _id: 3, subject: "Baking a cake", author: "abc", views: 90  },
     { _id: 4, subject: "baking", author: "xyz", views: 100 },
     { _id: 5, subject: "Café Con Leche", author: "abc", views: 200 },
     { _id: 6, subject: "Сырники", author: "jkl", views: 80 },
     { _id: 7, subject: "coffee and cream", author: "efg", views: 10 },
     { _id: 8, subject: "Cafe con Leche", author: "xyz", views: 10 }
   ]
)

Search for a Single Word

The following query specifies a $search string of coffee:

db.articles.find( { $text: { $search: "coffee" } } )

This query returns the documents that contain the term coffee in the indexed subject field, or more precisely, the stemmed version of the word:

{ "_id" : 2, "subject" : "Coffee Shopping", "author" : "efg", "views" : 5 }
{ "_id" : 7, "subject" : "coffee and cream", "author" : "efg", "views" : 10 }
{ "_id" : 1, "subject" : "coffee", "author" : "xyz", "views" : 50 }

Match Any of the Search Terms

If the search string is a space-delimited string, the $text operator performs a logical OR search on each term and returns documents that contain any of the terms.

The following query specifies a $search string of three terms delimited by spaces, "bake coffee cake":

db.articles.find( { $text: { $search: "bake coffee cake" } } )

This query returns documents that contain bake, coffee, or cake in the indexed subject field, or more precisely, the stemmed version of these words:

{ "_id" : 2, "subject" : "Coffee Shopping", "author" : "efg", "views" : 5 }
{ "_id" : 7, "subject" : "coffee and cream", "author" : "efg", "views" : 10 }
{ "_id" : 1, "subject" : "coffee", "author" : "xyz", "views" : 50 }
{ "_id" : 3, "subject" : "Baking a cake", "author" : "abc", "views" : 90 }
{ "_id" : 4, "subject" : "baking", "author" : "xyz", "views" : 100 }

Search for a Phrase

To match the exact phrase as a single term, escape the quotes.

The following query searches for the phrase coffee shop:

db.articles.find( { $text: { $search: "\"coffee shop\"" } } )

This query returns documents that contain the phrase coffee shop:

{ "_id" : 2, "subject" : "Coffee Shopping", "author" : "efg", "views" : 5 }

See also

Phrases

Exclude Documents That Contain a Term

A negated term is a term that is prefixed by a minus sign (-). If you negate a term, the $text operator excludes the documents that contain that term from the results.

The following example searches for documents that contain the word coffee but do not contain the term shop, or more precisely, the stemmed versions of the words:

db.articles.find( { $text: { $search: "coffee -shop" } } )

The query returns the following documents:

{ "_id" : 7, "subject" : "coffee and cream", "author" : "efg", "views" : 10 }
{ "_id" : 1, "subject" : "coffee", "author" : "xyz", "views" : 50 }

Search a Different Language

Use the optional $language field in the $text expression to specify a language that determines the list of stop words and the rules for the stemmer and tokenizer for the search string.

If you specify a language value of "none", then the text search uses simple tokenization with no list of stop words and no stemming.

The following query specifies es, i.e. Spanish, as the language that determines the tokenization, stemming, and stop words:

db.articles.find(
   { $text: { $search: "leche", $language: "es" } }
)

The query returns the following documents:

{ "_id" : 5, "subject" : "Café Con Leche", "author" : "abc", "views" : 200 }
{ "_id" : 8, "subject" : "Cafe con Leche", "author" : "xyz", "views" : 10 }

The $text expression can also accept the language by name, spanish. See Text Search Languages for the supported languages.

Text Search Score Examples

Return the Text Search Score

The following query performs a text search for the term cake and uses the $meta operator in the projection document to append the relevance score to each matching document:

db.articles.find(
   { $text: { $search: "cake" } },
   { score: { $meta: "textScore" } }
)

The returned document includes an additional field, score, that contains the document's relevance score:

{ "_id" : 3, "subject" : "Baking a cake", "author" : "abc", "views" : 90, "score" : 0.75 }

See also

$meta

Sort by Text Search Score

  • Starting in MongoDB 4.4, you can specify the { $meta: "textScore" } expression in the sort() without also specifying the expression in the projection. For example:

    db.articles.find(
       { $text: { $search: "cake" } }
    ).sort( { score: { $meta: "textScore" } } )

    As a result, you can sort the resulting documents by their search relevance without projecting the textScore.

    In earlier versions, to include the { $meta: "textScore" } expression in the sort(), you must also include the same expression in the projection.

  • Starting in MongoDB 4.4, if you include the { $meta: "textScore" } expression in both the projection and sort(), the projection and sort documents can have different field names for the expression.

    For example, in the following operation, the projection uses a field named score for the expression and the sort() uses the field named ignoredName.

    db.articles.find(
       { $text: { $search: "cake" } } ,
       { score: { $meta: "textScore" } }
    ).sort( { ignoredName: { $meta: "textScore" } } )

    In previous versions of MongoDB, if { $meta: "textScore" } is included in both the projection and sort, you must specify the same field name for the expression.

  • In MongoDB 4.2 and earlier, to sort by the text score, include the same $meta expression in both the projection document and the sort expression. The following query searches for the term coffee and sorts the results by descending score:

    db.articles.find(
       { $text: { $search: "coffee" } },
       { score: { $meta: "textScore" } }
    ).sort( { score: { $meta: "textScore" } } )

    The query returns the matching documents sorted by descending score.

See also

$meta

Return Top 2 Matching Documents

Use the limit() method in conjunction with sort() to return the top n matching documents.

The following query searches for the term coffee and sorts the results by descending score, limiting the results to the top two matching documents:

db.articles.find(
   { $text: { $search: "coffee" } },
   { score: { $meta: "textScore" } }
).sort( { score: { $meta: "textScore" } } ).limit(2)

See also

$meta

Text Search with Additional Query and Sort Expressions

The following query searches for documents where the author equals "xyz" and the indexed field subject contains the terms coffee or bake. The operation also specifies a sort order of ascending date, then descending text search score:

db.articles.find(
   { author: "xyz", $text: { $search: "coffee bake" } },
   { score: { $meta: "textScore" } }
).sort( { date: 1, score: { $meta: "textScore" } } )