Text Indexes

Overview

To run legacy text search queries, you must have a text index on your collection. MongoDB provides text indexes to support text search queries on string content. text indexes can include any field whose value is a string or an array of string elements. A collection can have only one text index, but that index can cover multiple fields.

Versions

text Index Version    Description
Version 3             MongoDB 3.2 introduces version 3 of the text index. Version 3 is the default version of text indexes created in MongoDB 3.2 and later.
Version 2             MongoDB 2.6 introduces version 2 of the text index. Version 2 is the default version of text indexes created in the MongoDB 2.6 and 3.0 series.
Version 1             MongoDB 2.4 introduces version 1 of the text index. MongoDB 2.4 supports only version 1.

To override the default version and specify a different version, include the option { "textIndexVersion": <version> } when creating the index.
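As a minimal sketch of the option above, the following helper creates a text index with an explicit version. The helper name createVersionedTextIndex is hypothetical, the reviews collection and comments field are illustrative, and the database handle is passed in as an argument rather than read from the shell global.

```javascript
// Hypothetical helper: create a text index with an explicit index version.
// `db` is the database handle (the `db` global in mongosh).
function createVersionedTextIndex(db, version) {
  return db.reviews.createIndex(
    { comments: "text" },          // index the comments field as text
    { textIndexVersion: version }  // override the default text index version
  );
}
```

In mongosh, calling createVersionedTextIndex(db, 2) would request a version 2 index instead of the default.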

Create Text Index

Important

A collection can have at most one text index.

Atlas Search (available in MongoDB Atlas) supports multiple full-text search indexes on a single collection. To learn more, see the Atlas Search documentation.

To create a text index, use the db.collection.createIndex() method. To index a field that contains a string or an array of string elements, include the field and specify the string literal "text" in the index document, as in the following example:

db.reviews.createIndex( { comments: "text" } )

You can index multiple fields for the text index. The following example creates a text index on the fields subject and comments:

db.reviews.createIndex(
   {
     subject: "text",
     comments: "text"
   }
 )

A compound index can include text index keys in combination with ascending/descending index keys. For more information, see Compound Index.

To drop a text index, use the index name. See Use the Index Name to Drop a text Index for more information.

Specify Weights

For a text index, the weight of an indexed field denotes the significance of the field relative to the other indexed fields in terms of the text search score.

For each indexed field in the document, MongoDB multiplies the number of matches by the weight and sums the results. Using this sum, MongoDB then calculates the score for the document. See $meta operator for details on returning and sorting by text scores.

The default weight is 1 for the indexed fields. To adjust the weights for the indexed fields, include the weights option in the db.collection.createIndex() method.

For more information on using weights to control the results of a text search, see Control Search Results with Weights.
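The weights option and the weighted sum described above can be sketched as follows. The helper names and the weight value 10 are hypothetical. weightedMatchSum only illustrates the "matches times weight, then sum" step; how MongoDB turns that sum into the final score is not specified on this page, so the function is not the actual scoring formula.

```javascript
// Hypothetical helper: weight subject matches 10x; comments keeps the default weight of 1.
function createWeightedTextIndex(db) {
  return db.reviews.createIndex(
    { subject: "text", comments: "text" },
    { weights: { subject: 10 } }
  );
}

// Illustration only: multiply per-field match counts by their weights and sum.
// Fields without an explicit weight use the default weight of 1.
function weightedMatchSum(matchesByField, weights) {
  let sum = 0;
  for (const [field, matches] of Object.entries(matchesByField)) {
    const weight = weights[field] ?? 1;
    sum += matches * weight;
  }
  return sum;
}
```

For instance, one match in subject and two in comments give weightedMatchSum({ subject: 1, comments: 2 }, { subject: 10 }) a value of 12, so the single subject match dominates the sum.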

Wildcard Text Indexes

Note

Wildcard Text Indexes are distinct from Wildcard Indexes. Wildcard indexes cannot support queries using the $text operator.

While Wildcard Text Indexes and Wildcard Indexes share the wildcard $** field pattern, they are distinct index types. Only Wildcard Text Indexes support the $text operator.

When creating a text index on multiple fields, you can also use the wildcard specifier ($**). With a wildcard text index, MongoDB indexes every field that contains string data for each document in the collection. The following example creates a text index using the wildcard specifier:

db.collection.createIndex( { "$**": "text" } )

This index allows for text search on all fields with string content. Such an index can be useful for highly unstructured data when it is unclear which fields to include in the text index, or for ad hoc querying.

Wildcard text indexes are text indexes on multiple fields. As such, you can assign weights to specific fields during index creation to control the ranking of the results. For more information on using weights to control the results of a text search, see Control Search Results with Weights.

Wildcard text indexes, as with all text indexes, can be part of a compound index. For example, the following creates a compound index on the field a as well as the wildcard specifier:

db.collection.createIndex( { a: 1, "$**": "text" } )

As with all compound text indexes, because the field a precedes the text index key, the query predicate must include an equality match condition on a in order to perform a $text search with this index. For information on compound text indexes, see Compound Text Indexes.
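A query that satisfies this requirement might look like the following sketch. The helper name searchWithPrefix and its parameters are hypothetical; coll stands for whichever collection holds the compound index above.

```javascript
// Hypothetical helper: combine an equality match on the leading key `a`
// with a $text search, as the compound text index requires.
function searchWithPrefix(coll, aValue, terms) {
  return coll.find({
    a: aValue,                 // equality condition on the key preceding the text key
    $text: { $search: terms }  // text search served by the "$**": "text" part
  });
}
```

In mongosh this could be called as searchWithPrefix(db.collection, 5, "coffee"), where the value 5 and the search term are placeholders.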

Case Insensitivity

Changed in version 3.2.

The version 3 text index supports the common C, simple S, and, for Turkish languages, the special T case foldings as specified in Unicode 8.0 Character Database Case Folding.

The case foldings expand the case insensitivity of the text index to include characters with diacritics, such as é and É, and characters from non-Latin alphabets, such as "И" and "и" in the Cyrillic alphabet.

Version 3 of the text index is also diacritic insensitive. As such, the index also does not distinguish between é, É, e, and E.

Previous versions of the text index are case insensitive for [A-Za-z] only; that is, case insensitive for Latin characters without diacritics only. For all other characters, earlier versions of the text index treat them as distinct.

Diacritic Insensitivity

Changed in version 3.2.

With version 3, the text index is diacritic insensitive. That is, the index does not distinguish between characters that contain diacritical marks and their non-marked counterparts, such as é, ê, and e. More specifically, the text index strips the characters categorized as diacritics in Unicode 8.0 Character Database Prop List.

Version 3 of the text index is also case insensitive to characters with diacritics. As such, the index also does not distinguish between é, É, e, and E.

Previous versions of the text index treat characters with diacritics as distinct.
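To make the combined effect of case and diacritic insensitivity concrete, here is a rough approximation in plain JavaScript. It is not MongoDB's implementation: it uses Unicode NFD normalization, removal of combining marks, and toLowerCase() as stand-ins for the Unicode 8.0 diacritic list and case foldings named above, and the function name foldForComparison is hypothetical.

```javascript
// Approximation only: reduce a term to a case- and diacritic-insensitive form.
function foldForComparison(term) {
  return term
    .normalize("NFD")        // separate base letters from combining marks
    .replace(/\p{M}/gu, "")  // drop the combining marks (diacritics)
    .toLowerCase();          // stand-in for Unicode case folding
}

// é, É, e, and E all reduce to the same form, as do "И" and "и".
const sameLatin = ["é", "É", "e", "E"].every((c) => foldForComparison(c) === "e");
const sameCyrillic = foldForComparison("И") === foldForComparison("и");
```

Under this approximation both sameLatin and sameCyrillic are true, which mirrors the version 3 behavior described above; earlier index versions would keep these characters distinct.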

Tokenization Delimiters

Changed in version 3.2.

For tokenization, the version 3 text index uses the delimiters categorized under Dash, Hyphen, Pattern_Syntax, Quotation_Mark, Terminal_Punctuation, and White_Space in Unicode 8.0 Character Database Prop List.

For example, if given a string "Il a dit qu'il «était le meilleur joueur du monde»", the text index treats «, », and spaces as delimiters.

Previous versions of the index treat « as part of the term "«était" and » as part of the term "monde»".
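The example above can be imitated with a small sketch that splits only on the three delimiters it names («, », and white space). This is an illustration, not the index's tokenizer: the full delimiter set covers whole Unicode categories, so the real index may split terms further. The function name is hypothetical.

```javascript
// Illustration only: split on «, », and white space, dropping empty tokens.
function splitOnExampleDelimiters(text) {
  return text.split(/[\s«»]+/u).filter((token) => token.length > 0);
}

const tokens = splitOnExampleDelimiters(
  "Il a dit qu'il «était le meilleur joueur du monde»"
);
// tokens contains "était" and "monde" without the surrounding « and ».
```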

Index Entries

The text index tokenizes and stems the terms in the indexed fields for the index entries. It stores one index entry for each unique stemmed term in each indexed field for each document in the collection. The index uses simple language-specific suffix stemming.

Supported Languages and Stop Words

MongoDB supports text search for various languages. text indexes drop language-specific stop words (for example, in English: the, an, a, and) and use simple language-specific suffix stemming. For a list of the supported languages, see Text Search Languages.

If you specify a language value of "none", then the text index uses simple tokenization with no list of stop words and no stemming.

To specify a language for the text index, see Specify a Language for Text Index.
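As a sketch of the "none" setting, the helper below creates a text index that skips stop-word removal and stemming. It relies on the default_language option of createIndex, which this page does not name; the helper name and the field parameter are hypothetical.

```javascript
// Hypothetical helper: text index with simple tokenization only
// (no stop-word list, no stemming).
function createUnstemmedTextIndex(coll, field) {
  return coll.createIndex(
    { [field]: "text" },
    { default_language: "none" }
  );
}
```

In mongosh, createUnstemmedTextIndex(db.reviews, "comments") would apply this to the comments field used in the earlier examples.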

sparse Property

text indexes are always sparse and ignore the sparse option. If a document lacks a text index field (or the field is null or an empty array), MongoDB does not add an entry for the document to the text index. For inserts, MongoDB inserts the document but does not add an entry to the text index.

For a compound index that includes a text index key along with keys of other types, only the text index field determines whether the index references a document. The other keys have no effect on whether the index references the document.

Restrictions

One Text Index Per Collection

A collection can have at most one text index.

Atlas Search (available in MongoDB Atlas) supports multiple full-text search indexes on a single collection. To learn more, see the Atlas Search documentation.

Text Search and Hints

You cannot use hint() if the query includes a $text query expression.

Text Index and Sort

Sort operations cannot obtain sort order from a text index, even from a compound text index; that is, sort operations cannot use the ordering in the text index.

Compound Index

A compound index can include a text index key in combination with ascending/descending index keys. However, these compound indexes have the following restrictions:

  • A compound text index cannot include any other special index types, such as multikey or geospatial index fields.
  • If the compound text index includes keys preceding the text index key, to perform a $text search, the query predicate must include equality match conditions on the preceding keys.
  • When creating a compound text index, all text index keys must be listed adjacently in the index specification document.

See also Text Index and Sort for additional limitations.

For an example of a compound text index, see Limit the Number of Entries Scanned.

Drop a Text Index

To drop a text index, pass the name of the index to the db.collection.dropIndex() method. To get the name of the index, run the db.collection.getIndexes() method.

For information on the default naming scheme for text indexes as well as overriding the default name, see Specify Name for text Index.
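The two steps above can be combined in a short sketch. It assumes that getIndexes() returns an array of index documents and that a text index reports _fts: "text" in its key document; both are assumptions not stated on this page, and the helper name dropTextIndex is hypothetical.

```javascript
// Hypothetical helper: look up the text index by its key pattern, then drop it by name.
function dropTextIndex(coll) {
  const textIndex = coll
    .getIndexes()
    .find((idx) => idx.key && idx.key._fts === "text"); // assumed text index marker
  if (!textIndex) {
    return null; // the collection has no text index
  }
  coll.dropIndex(textIndex.name);
  return textIndex.name;
}
```

Because a collection can have at most one text index, the first match is the only candidate. In mongosh, dropTextIndex(db.reviews) would return the name of the dropped index, or null if none exists.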

Collation Option

text indexes only support simple binary comparison and do not support collation.

To create a text index on a collection that has a non-simple collation, you must explicitly specify {collation: {locale: "simple"} } when creating the index.
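A minimal sketch of that requirement follows. The helper name and the field parameter are hypothetical; the collection is assumed to have been created with some non-simple default collation.

```javascript
// Hypothetical helper: create a text index on a collection whose default
// collation is not simple, by overriding the collation for this index.
function createTextIndexWithSimpleCollation(coll, field) {
  return coll.createIndex(
    { [field]: "text" },
    { collation: { locale: "simple" } } // text indexes support only simple binary comparison
  );
}
```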

Storage Requirements and Performance Costs

text indexes have the following storage requirements and performance costs:

  • text indexes can be large. They contain one index entry for each unique post-stemmed word in each indexed field for each document inserted.
  • Building a text index is very similar to building a large multikey index and takes longer than building a simple ordered (scalar) index on the same data.
  • When building a large text index on an existing collection, ensure that you have a sufficiently high limit on open file descriptors. See the recommended settings.
  • text indexes impact insertion throughput because MongoDB must add an index entry for each unique post-stemmed word in each indexed field of each new source document.
  • Additionally, text indexes do not store phrases or information about the proximity of words in the documents. As a result, phrase queries run much more effectively when the entire collection fits in RAM.

Text Search Support

The text index supports $text query operations. For examples of text search, see the $text reference page. For examples of $text operations in aggregation pipelines, see Text Search in the Aggregation Pipeline.