
How should max_expansions be used in an Elasticsearch match_phrase_prefix full-text query?

Java

德瑪西亞99 2019-01-16 12:58:46

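First, I'm using Elasticsearch 6.3. Here is the problem: the official documentation says the match_phrase_prefix query has a parameter, max_expansions, that controls how many terms the prefix is allowed to match, with a default of 50. Taking the query I like swi as an example, it first finds the first term that matches the prefix swi and then keeps collecting matching terms (in alphabetical order) until it runs out of matching terms or the number of terms exceeds max_expansions.

In my test, however, I deliberately created dozens of terms starting with swi and set max_expansions to 10, yet all of the matching results were returned. If you know why, please let me know. Many thanks.

GET matchphaseprefixtest/_search
{
  "query": {
    "match_phrase_prefix": {
      "message": {
        "query": "I like sw",
        "max_expansions": 10
      }
    }
  }
}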


1 answer

一只萌萌小番薯


~~Your query DSL uses match; change it to match_phrase_prefix.~~

The blog post How to Use Fuzzy Searches in Elasticsearch contains this passage:

It is important to understand that the max_expansions query limit works at the shard level, meaning that even if set to 1, multiple terms may match, all coming from different shards. This behavior can make it seem as if max_expansions is not in effect, so beware that counting unique terms that come are returned is not a valid way to determine if max_expansions is working.

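In short: max_expansions is applied at the shard level, so even when it is set to 1, a query can still match several terms, each coming from a different shard. This makes it look as if max_expansions has no effect, so keep in mind that counting the distinct terms in the returned results is not a valid way to check whether max_expansions is working.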

The blog post is about the fuzzy query, but my test results show that this parameter behaves the same way in the match_phrase_prefix query.
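The shard-level behavior can be illustrated with a small, self-contained Java model. This is a simplified sketch, not Elasticsearch's implementation: each hypothetical shard is treated as one sorted term dictionary that expands the prefix on its own and stops after maxExpansions terms, and round-robin placement stands in for Elasticsearch's hash-based routing. The names ShardExpansionModel, expandOnShard and countMatched are made up for illustration. With 40 hypothetical swi-prefixed terms spread over 5 shards (the default primary shard count for indices in 6.x), no shard reaches a limit of 10, so all 40 terms match; with a single shard, only 10 do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

public class ShardExpansionModel {
    // Simplified model: collect up to maxExpansions terms with the given prefix
    // from one shard's sorted term dictionary, in sorted order.
    static List<String> expandOnShard(SortedSet<String> shardTerms, String prefix, int maxExpansions) {
        List<String> expanded = new ArrayList<>();
        for (String term : shardTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix) || expanded.size() >= maxExpansions) {
                break; // past the prefix range, or the per-shard limit is reached
            }
            expanded.add(term);
        }
        return expanded;
    }

    // Spread numTerms hypothetical "swiNN" terms over numShards shards, expand
    // "swi" on every shard independently, and count the distinct terms matched overall.
    static int countMatched(int numShards, int numTerms, int maxExpansions) {
        List<SortedSet<String>> shards = new ArrayList<>();
        for (int s = 0; s < numShards; s++) {
            shards.add(new TreeSet<>());
        }
        for (int i = 0; i < numTerms; i++) {
            // Round-robin placement stands in for hash-based document routing.
            shards.get(i % numShards).add(String.format("swi%02d", i));
        }
        Set<String> matched = new TreeSet<>();
        for (SortedSet<String> shard : shards) {
            matched.addAll(expandOnShard(shard, "swi", maxExpansions));
        }
        return matched.size();
    }

    public static void main(String[] args) {
        System.out.println("5 shards: " + countMatched(5, 40, 10)); // 40
        System.out.println("1 shard:  " + countMatched(1, 40, 10)); // 10
    }
}
```

In this model the number of distinct matched terms grows with the number of shards while each shard still respects the limit, which is why counting the returned terms says little about whether max_expansions is in effect.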

Verification

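To verify this, specify a routing value so that all results come from a single shard. Note, however, that this shard holds only part of the documents, unless every document was indexed with the same routing value.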

GET matchphaseprefixtest/_search?routing=1
{
  "query": {
    "match_phrase_prefix": {
      "message": {
        "query": "I like sw",
        "max_expansions": 10
      }
    }
  }
}
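For the routed check above to cover every test document, the documents also have to be indexed with the same routing value. The Java sketch below (Java 11+, only java.net.http) shows one possible way to do both steps over HTTP. It assumes an unsecured node at http://localhost:9200, a fresh matchphaseprefixtest index, the `_doc` type and the per-item `routing` field in 6.x bulk requests, and hypothetical test messages of the form "I like swiNN"; the class and method names are made up, and the strings are not JSON-escaped, so it only suits plain ASCII input.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

public class RoutedPrefixCheck {
    // Assumption: a local, unsecured Elasticsearch 6.3 node.
    static final String ES_URL = "http://localhost:9200";

    // NDJSON body for _bulk: every document gets the same routing value, so all of
    // them land on one shard. 6.x still expects a type; "_doc" is assumed here.
    static String bulkBody(String index, String routing, List<String> messages) {
        StringBuilder sb = new StringBuilder();
        for (String msg : messages) {
            sb.append("{\"index\":{\"_index\":\"").append(index)
              .append("\",\"_type\":\"_doc\",\"routing\":\"").append(routing).append("\"}}\n");
            // No JSON escaping: messages are assumed to be plain ASCII text.
            sb.append("{\"message\":\"").append(msg).append("\"}\n");
        }
        return sb.toString(); // _bulk requires the body to end with a newline
    }

    static HttpRequest bulkRequest(String body) {
        // refresh=true makes the documents searchable right away
        return HttpRequest.newBuilder(URI.create(ES_URL + "/_bulk?refresh=true"))
                .header("Content-Type", "application/x-ndjson")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // The same match_phrase_prefix query, restricted to the shard chosen by ?routing=
    static HttpRequest searchRequest(String index, String routing, String text, int maxExpansions) {
        String body = "{\"query\":{\"match_phrase_prefix\":{\"message\":{\"query\":\"" + text
                + "\",\"max_expansions\":" + maxExpansions + "}}}}";
        return HttpRequest.newBuilder(URI.create(ES_URL + "/" + index + "/_search?routing=" + routing))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body)) // _search also accepts POST
                .build();
    }

    public static void main(String[] args) throws Exception {
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < 40; i++) {
            messages.add(String.format("I like swi%02d", i)); // hypothetical test data
        }
        String index = "matchphaseprefixtest";
        HttpClient client = HttpClient.newHttpClient();
        client.send(bulkRequest(bulkBody(index, "1", messages)), HttpResponse.BodyHandlers.ofString());
        HttpResponse<String> resp = client.send(
                searchRequest(index, "1", "I like sw", 10), HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.body()); // inspect hits.total
    }
}
```

Because each hypothetical message carries its own swiNN term and all of them sit on one shard, the hit count would be expected to stay at or below max_expansions for this data set, rather than returning all 40 documents.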

2019-02-12