Yatsko's Linguistic Informatics
Research and Investigations
This page focuses on the theoretical concepts proposed by Dr. V.A. Yatsko.
His theoretical contributions include:
-
An innovative Y-method of automatic text classification based on deviations of stop-word frequencies from the Zipfian distribution [12]. The method has demonstrated high discriminative power in differentiating between various classes of text documents in English and Russian. To support the processing of Russian texts, an original stop-word list was created [2] (see the 'Publications' section).
-
Original undersampling and logarithmic equalizing techniques that neutralize discrepancies in text lengths, developed as preprocessing procedures for the Y-method [1].
-
An original classification of linguistic technologies based on two groups of classification criteria, semiotic and technological, which contributes to a systemic presentation of the linguistic informatics domain [4].
-
An original methodology for predicting US presidential election results [7].
-
An interpretation of Bradford's law in terms of a geometric progression (the Y-law). This interpretation can be used to compute threshold values when it is necessary to distinguish subsets within a set of objects (e.g. successful/unsuccessful applicants, developed/underdeveloped regions). Unlike Bradford's law, which distinguishes between 3 zones, the Y-law allows distinguishing between a large number of subsets [11; 13; 20].
-
The methodology of zonal-correlational analysis based on the Y-law. The methodology can be used for automatic text classification [13].
-
A symmetric term-weighting technique that allows coherent summaries to be created. This technique underlies the functioning of the UNIS and ETS summarizers [32; 30; 29; 7].
-
The "depth of user search" conception for assessing Internet information retrieval systems.
-
Ontologies for opinion mining localized for Russian and English [9; 22].
-
A classification of software used in foreign language education [24].
-
A methodology for assessing the quality of summaries by matching them against a reference dictionary extracted from the source text [27].
-
An original conception of the historical development of computer science, which distinguishes between its mathematical, technological, and linguistic foundations [8].
-
The Integrational Discourse Analysis (IDA) conception, which distinguishes between semantic, communicative, relational, modal, and pragmatic discourse dimensions [33; 34].
-
The Compositional Modelling methodology for modelling the logical and semantic structure of academic texts, based on the IDA conception [35].
-
Componential-predication analysis, a methodology for linguistic research. It has proved effective for the analysis of English possessive sentences [15; 16; 38].
-
An original classification of modi based on the conception of alienated knowledge [36].
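
To make the first item in the list above more concrete, the following Python sketch shows one way deviations of stop-word frequencies from a Zipfian distribution might be computed. It is an illustration only, not the published Y-method: the function name `zipf_deviations`, the tiny `STOP_WORDS` set, the tokenizer, and the use of the classic Zipf form f(r) = f(1) / r as the expected distribution are all assumptions introduced here. The undersampling and logarithmic equalizing preprocessing steps are not reproduced.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# it is NOT the stop-word list referred to in [2].
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is"}


def zipf_deviations(text, stop_words=STOP_WORDS):
    """Return {stop_word: observed_count - expected_count}.

    Stop words are ranked by descending observed count. The expected count
    for rank r is taken to be f(1) / r (classic Zipf with exponent 1),
    which is an assumption of this sketch.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in stop_words)
    ranked = counts.most_common()  # (word, count) pairs, most frequent first
    if not ranked:
        return {}
    top_count = ranked[0][1]
    return {
        word: count - top_count / rank
        for rank, (word, count) in enumerate(ranked, start=1)
    }


if __name__ == "__main__":
    sample = "The cat sat in the hat of the king, and the king is a cat."
    for word, deviation in zipf_deviations(sample).items():
        print(f"{word}: {deviation:+.2f}")
```

Under these assumptions, the resulting deviations could serve as a feature vector for a text, so that documents of different classes are compared by their deviation profiles.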