Standard Tokenizer
A tokenizer of type standard that provides grammar-based tokenization,
which works well for most European-language documents. The tokenizer
implements the Unicode Text Segmentation algorithm, as specified in
Unicode Standard Annex #29.
The following are settings that can be set for a standard tokenizer
type:
| Setting | Description |
|---|---|
| `max_token_length` | The maximum token length. If a token is seen that exceeds this length then it is split at `max_token_length` intervals. Defaults to `255`. |
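The splitting behavior described above can be illustrated with a short sketch. This is not the tokenizer's actual implementation (which follows the full UAX #29 segmentation rules); it is a simplified approximation that uses a `\w+` word pattern purely to demonstrate how overlong tokens are cut at fixed intervals:

```python
import re

def standard_like_tokenize(text, max_token_length=255):
    # Illustrative sketch only: approximates word-boundary splitting
    # with a \w+ pattern rather than the full Unicode Text
    # Segmentation (UAX #29) algorithm the real tokenizer implements.
    tokens = []
    for match in re.finditer(r"\w+", text):
        word = match.group()
        # Tokens longer than max_token_length are split at
        # max_token_length intervals, as the setting describes.
        for i in range(0, len(word), max_token_length):
            tokens.append(word[i:i + max_token_length])
    return tokens

# A 13-character token split with max_token_length of 5:
print(standard_like_tokenize("The quickbrownfox", max_token_length=5))
# → ['The', 'quick', 'brown', 'fox']
```

Note that the split is purely positional: a 13-character token becomes chunks of 5, 5, and 3 characters, with no regard for sub-word boundaries.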