Classic Tokenizer | Elasticsearch Reference [1.7]



Classic Tokenizer

A tokenizer of type `classic` is a grammar-based tokenizer that works well for English-language documents. It applies heuristics for the special treatment of acronyms, company names, email addresses, and internet host names. These rules do not always work, however, and the tokenizer performs poorly for most languages other than English.

The `classic` tokenizer type accepts the following setting:

Setting	Description
`max_token_length`	The maximum token length. A token that exceeds this length is discarded. Defaults to `255`.
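
As an illustration of how the setting above might be applied, the following minimal sketch builds an index-settings body that registers a `classic` tokenizer with a custom `max_token_length` and wires it into a custom analyzer. The names `my_classic` and `my_analyzer`, the helper function, and the value `100` are assumptions made for this example and do not come from the reference; the sketch only prints the JSON body and does not contact a cluster.

```python
import json


def classic_tokenizer_settings(max_token_length=255):
    """Build an index-settings body that registers a classic tokenizer.

    The tokenizer and analyzer names are hypothetical placeholders.
    """
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    # "my_classic" is a hypothetical name chosen for this sketch.
                    "my_classic": {
                        "type": "classic",
                        "max_token_length": max_token_length,
                    }
                },
                "analyzer": {
                    # "my_analyzer" is likewise hypothetical; it uses the tokenizer above.
                    "my_analyzer": {
                        "type": "custom",
                        "tokenizer": "my_classic",
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the body that could be sent when creating an index.
    print(json.dumps(classic_tokenizer_settings(max_token_length=100), indent=2))
```

The printed JSON could then be supplied as the request body when creating an index, after which the hypothetical `my_analyzer` would be available to field mappings in that index.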
