Snowball Analyzer
An analyzer of type snowball that uses the standard tokenizer, with the standard filter, lowercase filter, stop filter, and snowball filter.
The Snowball Analyzer is a stemming analyzer from Lucene that was originally based on the Snowball project from snowball.tartarus.org.
Sample usage:
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "type" : "snowball",
                    "language" : "English"
                }
            }
        }
    }
}
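For comparison, the filter chain listed at the top of this page can be written out as a custom analyzer. The following Python sketch only builds the settings body and prints it as JSON. The helper name snowball_like_custom_analyzer and the names my_snowball and my_custom_snowball are hypothetical, and the result is an approximation of the built-in snowball analyzer rather than a guaranteed equivalent. In particular, the standard token filter is not available in every Elasticsearch version, and the stop filter here uses whatever default stopwords the installed version provides.

```python
import json


def snowball_like_custom_analyzer(language="English"):
    """Build index settings for a custom analyzer mirroring the documented chain.

    Assumption: a custom analyzer with these filter names approximates the
    built-in snowball analyzer. All names prefixed with "my_" are illustrative.
    """
    return {
        "index": {
            "analysis": {
                "filter": {
                    # A snowball token filter configured for the chosen language.
                    "my_snowball": {"type": "snowball", "language": language},
                },
                "analyzer": {
                    "my_custom_snowball": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Same order as the documented chain:
                        # standard, lowercase, stop, snowball.
                        "filter": ["standard", "lowercase", "stop", "my_snowball"],
                    }
                },
            }
        }
    }


settings = snowball_like_custom_analyzer("English")
print(json.dumps(settings, indent=2))
```

Spelling the chain out this way is mainly useful when one of the steps needs to be swapped or reconfigured. When the defaults are acceptable, the shorter snowball type is simpler.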
The language parameter can have the same values as the snowball filter and defaults to English. Note that not all the language analyzers have a default set of stopwords provided.
The stopwords parameter can be used to provide stopwords for the languages that have no defaults, or simply to replace the default set with your own custom list. See the Stop Analyzer for more details. Default sets of stopwords for many of these languages are available, for instance, here and here.
A sample configuration (in YAML format) specifying Swedish with stopwords:
index :
    analysis :
        analyzer :
            my_analyzer:
                type: snowball
                language: Swedish
                stopwords: "och,det,att,i,en,jag,hon,som,han,på,den,med,var,sig,för,så,till,är,men,ett,om,hade,de,av,icke,mig,du,henne,då,sin,nu,har,inte,hans,honom,skulle,hennes,där,min,man,ej,vid,kunde,något,från,ut,när,efter,upp,vi,dem,vara,vad,över,än,dig,kan,sina,här,ha,mot,alla,under,någon,allt,mycket,sedan,ju,denna,själv,detta,åt,utan,varit,hur,ingen,mitt,ni,bli,blev,oss,din,dessa,några,deras,blir,mina,samma,vilken,er,sådan,vår,blivit,dess,inom,mellan,sådant,varför,varje,vilka,ditt,vem,vilket,sitta,sådana,vart,dina,vars,vårt,våra,ert,era,vilkas"
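Long stopword strings like the one above are easy to get wrong by hand. As a minimal sketch, the settings can be generated from a word list instead. The helper name snowball_analyzer_settings is hypothetical, and the sketch assumes that stopwords is passed as a comma-separated string, as in the YAML sample. It only builds the settings body and does not talk to a cluster.

```python
import json


def snowball_analyzer_settings(name, language="English", stopwords=None):
    """Build index settings for a snowball analyzer.

    `stopwords` is an optional iterable of words. Blank entries are dropped,
    and the rest are joined into a comma-separated string, matching the
    format of the YAML sample (an assumption of this sketch).
    """
    analyzer = {"type": "snowball", "language": language}
    if stopwords:
        cleaned = [word.strip() for word in stopwords if word.strip()]
        if cleaned:
            analyzer["stopwords"] = ",".join(cleaned)
    return {"index": {"analysis": {"analyzer": {name: analyzer}}}}


# A shortened Swedish list for illustration; note the stray whitespace and
# the empty entry, which the helper cleans up.
swedish = snowball_analyzer_settings(
    "my_analyzer", "Swedish", ["och", "det", "att", " i ", ""]
)
print(json.dumps(swedish, indent=2, ensure_ascii=False))
```

Omitting the stopwords argument leaves the key out of the settings entirely, so the analyzer falls back to whatever default set exists for that language, if any.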