Standard Tokenizer
A tokenizer of type standard that provides a grammar-based tokenizer, well suited to most European-language documents. The tokenizer implements the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.

The following settings can be configured for a standard tokenizer type:
Setting | Description
---|---
`max_token_length` | The maximum token length. If a token is seen that exceeds this length then it is split at `max_token_length` intervals. Defaults to `255`.
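As a sketch of how this setting is applied in practice, the request below defines a custom analyzer built on a standard tokenizer with a lowered `max_token_length`. The index name (`my_index`) and the analyzer and tokenizer names (`my_analyzer`, `my_tokenizer`) are illustrative placeholders, not names used elsewhere in this documentation:

```
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "standard",
          "max_token_length": 5
        }
      }
    }
  }
}
```

With `max_token_length` set to 5, a token such as `jumping` would be split at 5-character intervals, producing `jumpi` and `ng`.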