ANALYSER
Morphological analysis depends on the type of language. Kazakh belongs to the Turkic group of the Altaic language family, whose members are agglutinative languages. In such languages, words are formed by combining a root with morphemes: one or more suffixes and affixes are attached to the root, and each of them modifies or extends the meaning of the word.
Word formation rules.
The Kazakh alphabet consists of 42 letters. Of these, 37 represent single sounds: 12 vowels (А, Ә, Е, И, О, Ө, У, Ұ, Ү, Ы, І, Э) and 25 consonants (Б, В, Г, Ғ, Д, Ж, З, Й, К, Қ, Л, М, Н, Ң, П, Р, С, Т, Ф, Ш, Щ, Х, Ц, Ч, Һ). The remaining five letters (Ь, Ъ, Ю, Я, Ё) are either soundless or represent a combination of two sounds. In Kazakh, words are formed by adding suffixes and affixes to root words. Suffixes are always added before affixes. Many rules govern how a root combines with suffixes and affixes. Root words are classified into two groups, nominal and verbal, and the affixes they take differ. We focus only on nominal roots; the rules for verbal roots are left for future research. [1]
The affixes that can be added to nominal roots in Kazakh fall into the following four types:
- Plural: Kazakh has six affixes that express the plural form of a word.
- Personal possessive: Kazakh has six affixes that express the possessive forms corresponding to the personal pronouns.
- Case: Kazakh has seven affixes that express the different cases.
- Predicative person: words that follow a first-, second- or third-person pronoun usually take an additional predicative personal ending. [1]
These four types of affixes can be used separately or chained together. Suffixation in Kazakh is complex, especially when a root carries many suffixes. Affixes are added to word roots according to a set of rules (see figure). [3]
Kazakh word formation uses a number of phonetic harmony rules. Vowel harmony requires the vowels of a suffix or affix to be hard or soft in agreement with the last syllable of the root to which it is attached. [2]
Automated kazakh language morphological analyser
Hard vowels | а, о, ұ, ы, у
Soft vowels | ә, е, и, ө, ү, і, э, у
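The vowel harmony check can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `harmony`, the treatment of у as neutral (it appears in both rows of the table), and the fallback to "hard" when no other vowel is present are all assumptions made for the sketch.

```python
# Vowel classes taken from the table above. "у" is listed in both rows,
# so this sketch treats it as neutral and skips it (an assumption).
HARD_VOWELS = set("аоұы")
SOFT_VOWELS = set("әеиөүіэ")


def harmony(stem: str) -> str:
    """Return "hard" or "soft" according to the last unambiguous vowel."""
    for ch in reversed(stem.lower()):
        if ch in HARD_VOWELS:
            return "hard"
        if ch in SOFT_VOWELS:
            return "soft"
    # No unambiguous vowel was found; "hard" is an arbitrary default here.
    return "hard"


if __name__ == "__main__":
    for word in ("адам", "мектеп", "оқушы"):
        print(word, harmony(word))
```

Scanning from the right means the vowel of the last syllable decides the class, which is the rule stated above.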
For example, take адам {person} and the candidate plural affixes -дар and -дер. The last syllable, -дам, is hard, so the hard affix -дар must be added. The resulting word is адамдар {people}.
However, the hard plural affixes -лар and -тар also satisfy vowel harmony in this case. Why were they not chosen?
Another basic aspect of Kazakh phonology is consonant harmony. There are three groups of consonants:
Unvoiced consonants | п, ф, к, қ, т, с, ш, щ, х, ц, ч, һ
Voiced consonants | б, в, г, ғ, д, ж, з
Sonorant consonants | р, л, й, у, м, н, ң
[Figure: diagram with the boxes “Word”, “Vowel harmony check”, “Affix determination” and “Consonant harmony check”]
The affixing rule according to consonant harmony is shown in the table below:
Last letter of the stem | Affixes
vowel, or sonorant consonant р, й, у | -лар, -лер
voiced consonant other than б, в, г, д, or sonorant consonant м, л, н, ң | -дар, -дер
unvoiced consonant, or voiced consonant б, в, г, д | -тар, -тер
Only the plural affixes are considered here. The other affixes follow the same rules, with some differences that we will address during the implementation of this project.
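Combining the two harmony checks, the choice of a plural affix for a nominal stem might look like the sketch below. It is an illustration under stated assumptions rather than the authors' code: the names `harmony` and `plural_affix` are hypothetical, у is treated as neutral for vowel harmony, and any final letter not covered by the first two table rows falls through to the -дар/-дер row.

```python
HARD_VOWELS = set("аоұы")
SOFT_VOWELS = set("әеиөүіэ")
VOWELS = HARD_VOWELS | SOFT_VOWELS | {"у"}
UNVOICED = set("пфкқтсшщхцчһ")


def harmony(stem: str) -> str:
    """Vowel harmony: class of the last unambiguous vowel ("у" is skipped)."""
    for ch in reversed(stem.lower()):
        if ch in HARD_VOWELS:
            return "hard"
        if ch in SOFT_VOWELS:
            return "soft"
    return "hard"  # arbitrary default for this sketch


def plural_affix(stem: str) -> str:
    """Pick the plural affix from the last letter and the vowel class."""
    last = stem[-1].lower()
    if last in VOWELS or last in "рй":
        onset = "л"  # vowels and sonorants р, й, у
    elif last in UNVOICED or last in "бвгд":
        onset = "т"  # unvoiced consonants and voiced б, в, г, д
    else:
        onset = "д"  # remaining voiced consonants and sonorants м, л, н, ң
    return onset + ("ар" if harmony(stem) == "hard" else "ер")


if __name__ == "__main__":
    for stem in ("адам", "оқушы", "мектеп", "көл"):
        print(stem + plural_affix(stem))
```

Consonant harmony fixes the first letter of the affix and vowel harmony fixes its vowel, which is why адам takes -дар rather than -лар or -тар.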
The algorithm
Figure 1 – Morphological analysis (stages: root determination; determining affix type; finding the root word in the database)
Maksutkhan N. et al.
The Kazakh language database contains all root words, classified by part of speech.
Root determination algorithm
- The input word is the first «candidate root». After each step, the remaining «portion» becomes the next «candidate root».
- If the «candidate root» is not found in the Kazakh language database, delete one letter from its right end and search again. As soon as the database contains a word equal to the current «portion», stop: this «portion» is the root of the word.
Figure 2 – Root determination: the input оқушыларға {to students} is compared with the database and shortened after every “No” until the answer is “Yes”; the root found is оқушы {student}.
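The root determination steps above can be sketched as a loop that strips one letter at a time from the right. The function name `find_root` and the two-entry `TOY_LEXICON` are hypothetical stand-ins for the Kazakh language database, which is not reproduced here.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the root database: root word -> part of speech.
TOY_LEXICON: Dict[str, str] = {"оқушы": "noun", "адам": "noun"}


def find_root(word: str, lexicon: Dict[str, str]) -> Optional[str]:
    """Shorten the word from the right until the candidate is in the lexicon."""
    candidate = word.lower()
    while candidate:
        if candidate in lexicon:
            return candidate  # "Yes": this portion is the root
        candidate = candidate[:-1]  # "No": delete the last letter and retry
    return None  # no prefix of the word is a known root


if __name__ == "__main__":
    print(find_root("Оқушыларға", TOY_LEXICON))
```

Because the search starts from the full word, the longest matching root is returned first when several prefixes are in the lexicon.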
Next, the part of speech of the root is looked up. Then, using the last letter and last syllable of the root, each successive “portion” of the remaining letters is compared with the set of affixes that satisfy the vowel and consonant harmony of this root.
Figure 3 – Affix determination: each “portion” is compared with the permitted affixes, giving “There is no such affix” until a match is found. The first match is the plural affix; the remaining “portion” is then checked in the same way until the case affix is found.
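A minimal sketch of this affix determination step is given below. It grows the “portion” one letter at a time and compares it with the affixes permitted after the current stem. Several things here are assumptions and not taken from the paper: the names `allowed_affixes` and `split_affixes`, the restriction to two affix types (plural and dative case), and the simplified dative rule (-қа/-ке after unvoiced consonants and б, в, г, д, otherwise -ға/-ге).

```python
from typing import Dict, List, Tuple

HARD_VOWELS = set("аоұы")
SOFT_VOWELS = set("әеиөүіэ")
VOWELS = HARD_VOWELS | SOFT_VOWELS | {"у"}
UNVOICED = set("пфкқтсшщхцчһ")


def harmony(stem: str) -> str:
    """Vowel harmony: class of the last unambiguous vowel ("у" is skipped)."""
    for ch in reversed(stem):
        if ch in HARD_VOWELS:
            return "hard"
        if ch in SOFT_VOWELS:
            return "soft"
    return "hard"  # arbitrary default for this sketch


def allowed_affixes(stem: str) -> Dict[str, str]:
    """Toy affix set (plural and dative only) that fits the stem's harmony."""
    last, hard = stem[-1], harmony(stem) == "hard"
    devoicing = last in UNVOICED or last in "бвгд"
    if last in VOWELS or last in "рй":
        onset = "л"
    else:
        onset = "т" if devoicing else "д"
    plural = onset + ("ар" if hard else "ер")
    if devoicing:
        dative = "қа" if hard else "ке"
    else:
        dative = "ға" if hard else "ге"
    return {plural: "plural", dative: "case"}


def split_affixes(root: str, word: str) -> List[Tuple[str, str]]:
    """Segment the letters after the root into (affix, type) pairs."""
    stem, rest = root, word.lower()[len(root):]
    found: List[Tuple[str, str]] = []
    while rest:
        table = allowed_affixes(stem)
        for n in range(1, len(rest) + 1):
            portion = rest[:n]  # grow the portion by one letter
            if portion in table:
                found.append((portion, table[portion]))
                stem, rest = stem + portion, rest[n:]
                break
        else:
            raise ValueError(f"no permitted affix matches {rest!r}")
    return found


if __name__ == "__main__":
    print(split_affixes("оқушы", "оқушыларға"))
```

The permitted set is recomputed after every match because the stem that the next affix must agree with now ends in the affix just found.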
The result of morphological analysis:
Оқушы [noun, hard syllable, vowel] + лар [plural affix, hard syllable] + ға [case affix, hard syllable]
References
- Tujmebaev Zh. K. Kazahskij yazyk. Grammaticheskij spravochnik. – Almaty, 1996. – S. 34.
- Sharipbaev A., Bekmanova G. T. Sintez form slov tyurkskogo yazyka s pomoshch’yu semanticheskih nejronnyh setej. Sovremennye problemy prikladnoj matematiki i informacionnyh tekhnologij. – Tashkent: Al’-Horezmi, 2009. – 145 s.
- Sharipbaev A. A., Bekmanova G. T. Logicheskaya semantika slov v kazahskom yazyke // Materialy Vserossijskoj konferencii. Znanie-ontologiya-teoriya. – Novosibirsk: Zont, 2009. – S. 246-249.