Turkish Natural Language Processing by Kemal Oflazer


Turkish Natural Language Processing Summary

This book brings together work on Turkish natural language and speech processing over the last 25 years, covering numerous fundamental tasks ranging from morphological processing and language modeling, to full-fledged deep parsing and machine translation, as well as computational resources developed along the way to enable most of this work. Owing to its complex morphology and free constituent order, Turkish has proved to be a fascinating language for natural language and speech processing research and applications.
After an overview of the aspects of Turkish that make it challenging for natural language and speech processing tasks, this book discusses in detail the main tasks and applications of Turkish natural language and speech processing. A compendium of the work on Turkish natural language and speech processing, it is a valuable reference for new researchers considering computational work on Turkish, as well as a one-stop resource for commercial and research institutions planning to develop applications for Turkish. It also serves as a blueprint for similar work on other Turkic languages such as Azeri, Turkmen and Uzbek.

About Kemal Oflazer

Kemal Oflazer received his Ph.D. in computer science from Carnegie Mellon University in Pittsburgh, USA, and his M.Sc. in computer science and B.Sc. in electrical and electronics engineering from Middle East Technical University, Ankara, Turkey. He is currently a faculty member at Carnegie Mellon University in Doha, Qatar, where he is also the Associate Dean for Research. He has held visiting positions at the Computing Research Laboratory at New Mexico State University, Las Cruces, USA and at the Language Technologies Institute, Carnegie Mellon University. Prior to joining CMU-Qatar, he worked at Sabanci University in Istanbul, Turkey (2000-2008) and Bilkent University in Ankara, Turkey (1989-2000). He has worked extensively on developing natural language processing techniques and resources for Turkish. Oflazer's current research interests include statistical machine translation into morphologically complex languages, the use of NLP for language learning and machine learning for computational morphology. In addition, he was a member of the editorial boards of Computational Linguistics, the Journal of Artificial Intelligence Research, Machine Translation, and Research on Language and Computation and was a book review editor for Natural Language Engineering. He was a member of the nomination and advisory boards for EACL, and served as the program co-chair for ACL 2005, an area chair for COLING 2000, EACL 2003, ACL 2004, ACL 2012, and EMNLP 2013 and the organization committee co-chair for EMNLP 2014. Currently, he is an editorial board member of both Language Resources and Evaluation and Natural Language Engineering journals and is a member of the advisory board for SpringerBriefs in Natural Language Processing.
Murat Saraclar received his B.Sc. degree in 1994 from the Electrical and Electronics Engineering Department at Bilkent University, Ankara, Turkey, his M.S.E. degree in 1997 and Ph.D. degree in 2001 from the Electrical and Computer Engineering Department at the Johns Hopkins University, Baltimore, USA. From 2000 to 2005, he was with the multimedia services department at AT&T Labs Research, and in 2005 joined the Electrical and Electronic Engineering Department of Bogazici University, Istanbul, Turkey, where he is currently a full professor. He was a visiting research scientist at Google Inc., New York, USA (2011-2012) and an academic visitor at IBM T.J. Watson Research Center (2012-2013). Saraclar was awarded the AT&T Labs Research Excellence Award in 2002, the Turkish Academy of Sciences Young Scientist (TUBA-GEBIP) Award in 2009, and the IBM Faculty Award in 2010. He has published more than 100 articles in journals and conference proceedings. Furthermore, he served as an associate editor for IEEE Signal Processing Letters (2009-2012) and IEEE Transactions on Audio, Speech, and Language Processing (2012-2016). He was an editorial board member of Language Resources and Evaluation from 2012 to 2016, and is currently an editorial board member of Computer Speech and Language as well as a member of the IEEE Signal Processing Society Speech and Language Technical Committee (2007-2009, 2015-2018).

Table of Contents

1 Turkish and its Challenges for Language and Speech Processing
  1.1 Introduction
  1.2 Turkish Morphology
  1.3 Constituent Order and Morphology-Syntax Interface
  1.4 Applications
  1.5 State-of-the-art Tools and Resources for Turkish
  References

2 Morphological Processing for Turkish
  2.1 Introduction
  2.2 Overview of Turkish Morphology
  2.3 Morphophonology and Morphographemics
  2.4 Root Lexicons and Morphotactics
    2.4.1 Representational Convention
    2.4.2 Nominal Morphotactics
    2.4.3 Verbal Morphotactics
    2.4.4 Derivations
    2.4.5 Examples of Morphological Analyses
  2.5 The Architecture of the Turkish Morphological Processor
  2.6 Processing Real Texts
    2.6.1 Acronyms
    2.6.2 Numbers
    2.6.3 Foreign Words
    2.6.4 Unknown Words
  2.7 Multiword Processing
    2.7.1 Lexicalized Collocations
    2.7.2 Semi-lexicalized Collocations
    2.7.3 Non-lexicalized Collocations
  2.8 Conclusions
  References

3 Morphological Disambiguation for Turkish
  3.1 Introduction
  3.2 Challenges
  3.3 Previous Work
    3.3.1 Rule-based Methods
    3.3.2 Learning the Rules
    3.3.3 Models Based on Inflectional Group n-grams
    3.3.4 Discriminative Methods for Disambiguation
  3.4 Discussion
    3.4.1 Data Sets
    3.4.2 Experimental Results
  3.5 Conclusions
  References

4 Language Modeling for Turkish Text and Speech Processing
  Ebru Arisoy and Murat Saraclar
  4.1 Introduction
  4.2 Language Modeling
  4.3 Challenges in Statistical Language Modeling for Turkish
  4.4 Sub-lexical Units for Statistical Language Modeling
    4.4.1 Linguistic Sub-lexical Units
    4.4.2 Statistical Sub-lexical Units
  4.5 Statistical Language Modeling for Turkish
    4.5.1 Language Modeling with Linguistic Sub-lexical Units
    4.5.2 Statistical Sub-lexical Units - Morphs
  4.6 Discriminative Language Modeling for Turkish
    4.6.1 Discriminative Language Model
    4.6.2 Feature Sets for Turkish DLM
  4.7 Conclusions
  References

5 Turkish Speech Recognition
  Ebru Arisoy and Murat Saraclar
  5.1 Introduction
  5.2 Foundations of Automatic Speech Recognition
  5.3 Turkish Language Resources for ASR
    5.3.1 Turkish Acoustic and Text Data
    5.3.2 Linguistic Tools Used in Turkish ASR
  5.4 Turkish ASR Systems
    5.4.1 Newspaper Content Transcription System
    5.4.2 Turkish Broadcast News Transcription System
    5.4.3 LVCSR System for Call Center Conversations
  5.5 Conclusions
  References

6 Turkish Named Entity Recognition
  Reyyan Yeniterzi, Goekhan Tur and Kemal Oflazer
  6.1 Introduction
  6.2 NER on Turkish
  6.3 Task Description
    6.3.1 Representation
    6.3.2 Evaluating NER Performance
  6.4 Domain and Datasets
    6.4.1 Formal Texts
    6.4.2 Informal Texts
    6.4.3 Challenges of Informal Texts for NER
  6.5 Preprocessing for NER
    6.5.1 Tokenization
    6.5.2 Morphological Analysis
    6.5.3 Normalization
  6.6 Approaches used in Turkish NER
    6.6.1 Rule-based Approaches
    6.6.2 Hybrid Approaches
    6.6.3 Machine Learning Approaches
  6.7 Conclusions
  References

7 Dependency Parsing of Turkish
  7.1 Introduction
  7.2 Dependency Parsing
  7.3 Morphology and Dependency Relations in Turkish
    7.3.1 Dependency Relations in Turkish
  7.4 An Incremental Data-driven Statistical Dependency Parsing System
    7.4.1 Methodology
    7.4.2 Modeling Turkish
    7.4.3 Evaluation Metrics
  7.5 Related Work
  7.6 Conclusions
  References

8 Wide-coverage parsing, semantics and morphology
  8.1 Introduction
  8.2 Morphology and Semantics
  8.3 Radical Lexicalization and Predicate-Argument Structure of sub-lexical Elements
  8.4 Combinatory Categorial Grammar: CCG
  8.5 The Turkish Categorial Lexicon
    8.5.1 The Lexemic Model
    8.5.2 The Morphemic Model
  8.6 Parsing with Automatically Induced CCG Lexicons
  8.7 Conclusion
  References

9 Deep Parsing of Turkish with Lexical-Functional Grammar
  9.1 Introduction
  9.2 Lexical-Functional Grammar and Xerox Linguistic Environment
  9.3 Inflectional Groups as First-class Syntactic Citizens
  9.4 Previous Work
  9.5 LFG Analyses of Various Linguistic Phenomena
    9.5.1 Noun Phrases
    9.5.2 Adjective Phrases
    9.5.3 Adverbial Phrases
    9.5.4 Postpositional Phrases
    9.5.5 Temporal Phrases
  9.6 Sentential Derivations, Sentences and Free Constituent Order
    9.6.1 Sentential Derivations
    9.6.2 Sentences
    9.6.3 Handling Constituent Order Variations
  9.7 Coordination
  9.8 Valency Alternations
    9.8.1 Causatives
    9.8.2 Passives
  9.9 Non-canonical Objects
  9.10 Evaluation
    9.10.1 Manual Test Sets
    9.10.2 Sentence Test Suite
    9.10.3 Noun Phrase Test Suite
  9.11 Conclusions
  References

10 Statistical Machine Translation and Turkish
  10.1 Introduction
  10.2 Handling Morphology in Statistical Machine Translation
  10.3 The Morpheme Segmentation Approach
    10.3.1 Experiments and Results
    10.3.2 Word Repair
    10.3.3 Sample Translations
    10.3.4 Observations on the Morpheme Segmentation Approach
  10.4 The Syntax-to-Morphology Mapping Approach
    10.4.1 Mapping Source-side Syntax to Target-side Morphology
    10.4.2 Experimental Setup and Results
    10.4.3 Experiments with Constituent Reordering
  10.5 Conclusions
  References

11 Machine Translation Between Turkic Languages
  11.1 Introduction
  11.2 Turkic Languages
    11.2.1 Similarities and Differences of Turkic Languages
  11.3 Machine Translation between Turkic Languages
    11.3.1 Preprocessing
    11.3.2 Morphological Disambiguation
    11.3.3 Morphological Feature Transfer
    11.3.4 Lexical Transfer
    11.3.5 Statistical Disambiguation Module
    11.3.6 Sentence Level Rules
    11.3.7 Morphological Generation
  11.4 Machine Translation Evaluation on Turkic Languages
    11.4.1 Root Matching
    11.4.2 Feasible Suffix Pairs
  11.5 Conclusions
  References

12 Sentiment Analysis in Turkish
  12.1 Introduction
  12.2 Related Work
  12.3 Main Difficulties for Turkish Sentiment Analysis
  12.4 Practical Sentiment Analysis for Turkish
    12.4.1 Resources
    12.4.2 Methodology
  12.5 Experimental Evaluation
    12.5.1 Data
    12.5.2 Results
  12.6 Conclusions
  References

13 The Turkish Treebank
  13.1 Introduction
  13.2 What information needs to be represented?
    13.2.1 Representing Morphological Information
    13.2.2 Representing Syntactic Relations
    13.2.3 Example of a Treebank Sentence
  13.3 Evolution of the Turkish Treebank
    13.3.1 The CoNLL Format
    13.3.2 Branches of the Turkish Treebank
  13.4 The ITU Web Treebank
  13.5 The Annotation Tool
  13.6 The Turkish Universal Dependencies Treebank
  13.7 Conclusions
  References

14 Linguistic corpora: A view from Turkish
  14.1 Introduction
  14.2 Brief History of Corpus Linguistics
  14.3 Linguistic Corpora and Corpus Linguistics
  14.4 Use of Corpora in Linguistics
  14.5 Turkish Linguistic Corpora
    14.5.1 METU-Turkish Corpus
    14.5.2 Turkish National Corpus (TNC)
    14.5.3 Spoken Turkish Corpus (STC)
  14.6 Conclusions
  References

15 Turkish Wordnet
  15.1 Introduction
  15.2 Basic Structure of Turkish Wordnet
    15.2.1 Semantic Relations
    15.2.2 Linking Wordnets to Each Other
  15.3 Design Decisions
    15.3.1 Merge vs. Expand
    15.3.2 Parts-of-Speech, Definitions and Sense Numbers
    15.3.3 Lexical Gaps
    15.3.4 No Dangling Nodes or Relations
    15.3.5 Validating Semantic Relations
  15.4 The Development Process
    15.4.1 First Set of Concepts (Subset I)
    15.4.2 Extracting Semantic Relations from Monolingual Resources
    15.4.3 Second Set of Concepts (Subset II)
    15.4.4 Shifting to Princeton Wordnet 1.7.1
    15.4.5 Third Set of Concepts (Subset III)
    15.4.6 Shifting to Princeton Wordnet 2.0
    15.4.7 Adding Balkanet-specific Concepts
    15.4.8 Final Expansion
  15.5 Current Status of Turkish Wordnet
  15.6 Quality Validation and Coverage Tests
  15.7 Applications of Turkish Wordnet
    15.7.1 Capturing Semantic Relations through Morphology
    15.7.2 Turkish Wordnet in Use
  15.8 Conclusion and Directions for Future Work
  References

16 Turkish Discourse Bank: Connectives and Their Configurations
  16.1 Introduction
  16.2 The TDB Annotation Cycle
    16.2.1 Major Sources of Disagreements among Annotators
    16.2.2 The Discourse Annotation Tool for Turkish
  16.3 Connectives and Discourse Structure
  16.4 Discourse relation configurations in the TDB
    16.4.1 Independent Relations
    16.4.2 Full Embedding
    16.4.3 Nested Relations
    16.4.4 Shared Argument
    16.4.5 Properly Contained Argument
    16.4.6 Properly Contained Relation
    16.4.7 Partially Overlapping Arguments
    16.4.8 Pure Crossing
  16.5 Results and Conclusions

Additional information

SKU: NPB9783319901633
ISBN-13: 9783319901633
ISBN-10: 331990163X
Title: Turkish Natural Language Processing by Kemal Oflazer
Condition: New
Binding: Hardback
Publisher: Springer International Publishing AG
Publication date: 2018-08-02
Number of pages: 355
This is a new book - be the first to read this copy. With untouched pages and a perfect binding, your brand new copy is ready to be opened for the first time.
