Edge NGram Tokenizer
A tokenizer of type edgeNGram.
This tokenizer is very similar to nGram but only keeps n-grams which
start at the beginning of a token.
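To make the difference concrete, here is a minimal Python sketch that contrasts plain n-grams with edge n-grams for a single token. The function names `ngrams` and `edge_ngrams` are hypothetical helpers written for illustration. They are not part of Elasticsearch, and the order in which the real tokenizer emits its n-grams may differ.

```python
def ngrams(token, min_gram, max_gram):
    """All n-grams of the token, taken from every start position."""
    return [
        token[start:start + size]
        for start in range(len(token))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(token)
    ]


def edge_ngrams(token, min_gram, max_gram):
    """Only the n-grams anchored at the beginning of the token."""
    return [
        token[:size]
        for size in range(min_gram, max_gram + 1)
        if size <= len(token)
    ]


print(ngrams("quick", 2, 3))       # n-grams from every position
print(edge_ngrams("quick", 2, 3))  # n-grams from position 0 only
```

Python strings are indexed by code point, so the sizes here are counted in the same unit as the tokenizer settings.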
The following settings can be set for an edgeNGram tokenizer
type:
| Setting | Description | Default value |
|---|---|---|
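| min_gram | Minimum size in codepoints of a single n-gram | 1 |
| max_gram | Maximum size in codepoints of a single n-gram | 2 |
| token_chars | Character classes to keep in the tokens. Elasticsearch will split on characters that don't belong to any of these classes. | [] (Keep all characters) |

token_chars accepts the following character classes:

| Character class | Examples |
|---|---|
| letter | for example a, b, ï or 京 |
| digit | for example 3 or 7 |
| whitespace | for example " " or "\n" |
| punctuation | for example ! or " |
| symbol | for example $ or √ |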
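
The way these settings interact can be sketched in Python. This is a rough emulation rather than the actual Lucene implementation: the helper `edge_ngram_tokenize` is hypothetical, and mapping the character classes onto Unicode general categories is an assumption made for illustration.

```python
import unicodedata

# Approximate tests for each token_chars class (an assumption for
# illustration; the real tokenizer may classify some characters differently).
CLASS_TESTS = {
    "letter": lambda ch: unicodedata.category(ch).startswith("L"),
    "digit": lambda ch: unicodedata.category(ch) == "Nd",
    "whitespace": lambda ch: ch.isspace(),
    "punctuation": lambda ch: unicodedata.category(ch).startswith("P"),
    "symbol": lambda ch: unicodedata.category(ch).startswith("S"),
}


def edge_ngram_tokenize(text, min_gram, max_gram, token_chars):
    """Split text on characters outside token_chars, then emit edge n-grams."""

    def keep(ch):
        # An empty token_chars list is treated here as "keep every character".
        return not token_chars or any(CLASS_TESTS[name](ch) for name in token_chars)

    # Step 1: split the input wherever a character is not in a kept class.
    words, current = [], ""
    for ch in text:
        if keep(ch):
            current += ch
        else:
            if current:
                words.append(current)
            current = ""
    if current:
        words.append(current)

    # Step 2: for each word, keep only prefixes of min_gram..max_gram code points.
    return [
        word[:size]
        for word in words
        for size in range(min_gram, max_gram + 1)
        if size <= len(word)
    ]


print(edge_ngram_tokenize("FC Schalke 04", 2, 5, ["letter", "digit"]))
# ['FC', 'Sc', 'Sch', 'Scha', 'Schal', '04']
```

Words shorter than min_gram produce no tokens at all, which is easy to overlook when choosing a minimum size.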
Example
curl -XPUT 'localhost:9200/test' -d '
{
"settings" : {
"analysis" : {
"analyzer" : {
"my_edge_ngram_analyzer" : {
"tokenizer" : "my_edge_ngram_tokenizer"
}
},
"tokenizer" : {
"my_edge_ngram_tokenizer" : {
"type" : "edgeNGram",
"min_gram" : "2",
"max_gram" : "5",
"token_chars": [ "letter", "digit" ]
}
}
}
}
}'
curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_edge_ngram_analyzer' -d 'FC Schalke 04'
# FC, Sc, Sch, Scha, Schal, 04
side deprecated
Versions up to 0.90.1 supported a side parameter, which is now deprecated. To
emulate the behavior of "side" : "BACK", use a
reverse token filter together
with the edgeNGram token filter. The
edgeNGram filter must be enclosed between two reverse filters, like this:
"filter" : ["reverse", "edgeNGram", "reverse"]
This reverses each token, builds front edge n-grams, and then reverses
each n-gram again, which has the same effect as the previous "side" : "BACK" setting.
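
The effect of the reverse, edgeNGram, reverse chain can be illustrated with a short Python sketch. The helpers `edge_ngrams` and `back_edge_ngrams` are hypothetical names used only to mimic the three filter steps on one token. They are not Elasticsearch APIs.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Front edge n-grams: prefixes of min_gram..max_gram code points."""
    return [
        token[:size]
        for size in range(min_gram, max_gram + 1)
        if size <= len(token)
    ]


def back_edge_ngrams(token, min_gram, max_gram):
    """Emulate "side" : "BACK" as reverse -> front edge n-grams -> reverse."""
    reversed_token = token[::-1]                            # first reverse filter
    front = edge_ngrams(reversed_token, min_gram, max_gram) # edgeNGram filter
    return [gram[::-1] for gram in front]                   # second reverse filter


print(back_edge_ngrams("Schalke", 2, 5))
# ['ke', 'lke', 'alke', 'halke']
```

Each resulting n-gram is anchored at the end of the original token, which is what a back-side edge n-gram would produce.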