Letter Tokenizer
A tokenizer of type letter divides text at non-letters. That is, it
defines tokens as maximal strings of adjacent letters: every character
that is not a letter (whitespace, digits, punctuation such as hyphens
and apostrophes) ends the current token and is discarded. Note that
this does a decent job for most European languages, but a poor job for
some Asian languages, where words are not separated by spaces.
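To illustrate the splitting rule, here is a minimal Python sketch. It is not Elasticsearch's actual implementation. It assumes that "letter" means any character for which Python's str.isalpha() returns True (Unicode general categories Lu, Ll, Lt, Lm, Lo), and it ignores any other limits or details of the real tokenizer. The function name letter_tokenize is hypothetical.

```python
def letter_tokenize(text: str) -> list[str]:
    """Hypothetical sketch: split text into maximal runs of adjacent letters.

    A "letter" is assumed to be any character where str.isalpha() is True.
    Every other character ends the current token and is dropped.
    """
    tokens: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch.isalpha():
            # Extend the current run of letters.
            current.append(ch)
        elif current:
            # A non-letter closes the run; emit it as a token.
            tokens.append("".join(current))
            current = []
    if current:
        # Emit a trailing run that reaches the end of the text.
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    print(letter_tokenize("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
    # ['The', 'QUICK', 'Brown', 'Foxes', 'jumped', 'over', 'the', 'lazy', 'dog', 's', 'bone']
    print(letter_tokenize("我爱北京 ok"))
    # ['我爱北京', 'ok']
```

The second call shows why this approach struggles with languages written without spaces: the CJK characters are all letters, so the whole unspaced run comes out as a single token instead of separate words. Note also that the digit "2" disappears and "dog's" is split in two, because neither digits nor apostrophes count as letters.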