NGram Tokenizer
A tokenizer of type nGram.
The following settings can be configured for an nGram tokenizer:
Setting | Description | Default value |
---|---|---|
min_gram | Minimum size in codepoints of a single n-gram | 1 |
max_gram | Maximum size in codepoints of a single n-gram | 2 |
token_chars | Character classes to keep in the tokens. Elasticsearch will split on characters that don't belong to any of these classes. | [] (keep all characters) |
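To make the interaction of min_gram and max_gram concrete, here is a minimal sketch in plain Python (not the actual Lucene implementation; the function name and loop order are illustrative) that emits every codepoint substring whose length falls between the two settings:

```python
def ngrams(text, min_gram=1, max_gram=2):
    """Emit every substring of min_gram..max_gram codepoints, left to right."""
    chars = list(text)  # one list entry per codepoint
    grams = []
    for start in range(len(chars)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(chars):
                grams.append("".join(chars[start:start + size]))
    return grams

print(ngrams("Sch"))        # defaults: ['S', 'Sc', 'c', 'ch', 'h']
print(ngrams("Sch", 2, 3))  # ['Sc', 'Sch', 'ch']
```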
token_chars accepts the following character classes:

Class | Description |
---|---|
letter | for example a, b, ï or 京 |
digit | for example 3 or 7 |
whitespace | for example " " or "\n" |
punctuation | for example ! or " |
symbol | for example $ or √ |
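The effect of token_chars can be sketched the same way: split the input wherever a character falls outside the kept classes, then n-gram each chunk on its own. The snippet below is again plain Python, approximating the letter and digit classes with an ASCII regex rather than the full Unicode classes Elasticsearch uses:

```python
import re

def ngram_tokenize(text, min_gram, max_gram):
    tokens = []
    # Split on any run of characters that is neither a letter nor a digit
    # (ASCII approximation of token_chars: ["letter", "digit"]).
    for chunk in re.split(r"[^A-Za-z0-9]+", text):
        for start in range(len(chunk)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(chunk):
                    tokens.append(chunk[start:start + size])
    return tokens

print(ngram_tokenize("FC Schalke 04", 2, 3))
# ['FC', 'Sc', 'Sch', 'ch', 'cha', 'ha', 'hal', 'al', 'alk', 'lk', 'lke', 'ke', '04']
```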
Example
```sh
curl -XPUT 'localhost:9200/test' -d '
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "my_ngram_analyzer" : {
                    "tokenizer" : "my_ngram_tokenizer"
                }
            },
            "tokenizer" : {
                "my_ngram_tokenizer" : {
                    "type" : "nGram",
                    "min_gram" : "2",
                    "max_gram" : "3",
                    "token_chars": [ "letter", "digit" ]
                }
            }
        }
    }
}'

curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_ngram_analyzer' -d 'FC Schalke 04'
# FC, Sc, Sch, ch, cha, ha, hal, al, alk, lk, lke, ke, 04
```
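Because token_chars keeps only letters and digits here, the spaces in "FC Schalke 04" act as token boundaries: FC, Schalke and 04 are n-grammed independently, so no n-gram ever spans a space.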