
YAVA-Stemmer

Stemming, a procedure of automatic morphological analysis, has been an indispensable feature of information retrieval and text summarization since the early 1960s. The general idea underlying stemming is to identify words that have the same meaning but different forms by removing their suffixes and endings. Such identification is important for correct term weighting and significantly increases the effectiveness of information retrieval.
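To make the idea concrete, here is a minimal Python sketch of suffix stripping. The suffix list, the minimum stem length and the names strip_suffix and term_frequencies are hypothetical illustrations, not YAVA-Stemmer's rules (which target Russian). The point is only that several surface forms collapse into one term, so their frequencies are counted together when terms are weighted.

```python
from collections import Counter

# Hypothetical, deliberately tiny suffix list for illustration only;
# real stemmers (e.g. Porter) use far richer, context-sensitive rules.
SUFFIXES = ("ations", "ation", "ions", "ion", "ing", "ed", "s")


def strip_suffix(word: str, min_stem: int = 3) -> str:
    """Remove the longest matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def term_frequencies(tokens):
    """Count terms after stemming, so different forms of one word share a weight."""
    return Counter(strip_suffix(t) for t in tokens)


if __name__ == "__main__":
    tokens = ["connect", "connected", "connecting", "connection", "connections"]
    # Without stemming: five distinct terms, each with frequency 1.
    print(Counter(t.lower() for t in tokens))
    # With stemming: one term with frequency 5.
    print(term_frequencies(tokens))  # Counter({'connect': 5})
```

The minimum stem length is a simple guard against stripping too much from short words such as "is".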

By now a number of stemmers have been created for different languages; the best-known English ones are the Porter stemmer and the Paice/Husk (Lancaster) stemmer. Both are algorithmic: they work on lists of suffixes specific to English. Dictionary stemmers, by contrast, work on dictionaries of stems.

YAVA-Stemmer (Yatsko’s stemmer) is a Russian stemmer of a hybrid nature: it employs an extensive dictionary of stems as well as a list of suffixes and endings. The need for both methods is dictated by the complexity of Russian morphology. Owing to its advanced algorithm, YAVA-Stemmer works better than any other Russian stemmer, making fewer overstemming errors.
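The hybrid idea can be illustrated with a minimal Python sketch. It is not YAVA-Stemmer's algorithm (that is described in the paper cited below); the three-entry STEMS dictionary, the ENDINGS list and both function names are hypothetical. A purely algorithmic pass that removes the longest listed ending reduces both "остров" (island) and "острый" (sharp) to "остр", an overstemming error. The hybrid version first looks for a dictionary stem and so keeps "остров" apart, falling back to plain ending removal only for words whose stems are not in the dictionary.

```python
# Hypothetical stem dictionary and ending list, for illustration only.
STEMS = {"остров", "остр", "книг"}
ENDINGS = ("ами", "ого", "ый", "ой", "ов", "ах", "ам", "а", "и", "е", "у", "ы")


def strip_longest_ending(word: str, min_stem: int = 3) -> str:
    """Purely algorithmic step: remove the longest listed ending."""
    for ending in sorted(ENDINGS, key=len, reverse=True):
        if word.endswith(ending) and len(word) - len(ending) >= min_stem:
            return word[: -len(ending)]
    return word


def hybrid_stem(word: str) -> str:
    """Prefer the longest dictionary stem obtained by removing one listed
    ending (or none); fall back to plain ending removal for unknown words."""
    word = word.lower()
    candidates = [word] + [word[: -len(e)] for e in ENDINGS if word.endswith(e)]
    known = [c for c in candidates if c in STEMS]
    if known:
        return max(known, key=len)
    return strip_longest_ending(word)


if __name__ == "__main__":
    # Pure stripping gives "остр" for both words; the hybrid keeps "остров".
    for w in ["остров", "острый", "острова", "книгами", "столами"]:
        print(w, strip_longest_ending(w), hybrid_stem(w))
```

Here the dictionary check simply takes priority over the ending list; a realistic hybrid stemmer would need a far larger dictionary and rules for choosing between competing analyses.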

The details of the stemmer's algorithm are given in V. A. Yatsko's paper "Problems in the development of a stemmer" available at https://cyberleninka.ru/article/n/osobennosti-razrabotki-stemmera (in Russian). 
 

Stemmers are usually integrated into NLP systems and used during text preprocessing. We distribute YAVA-Stemmer as a stand-alone application purely for testing purposes; it can also be used for education and for term weighting. Go to the Downloads section to get the application.

 
