We at the Chair for Natural Language Processing (Computer Science XII) try to make machines understand human language! In fact, we try to make them understand a great many different human languages. We primarily focus on written text (after all, speech can always be transcribed to text). Methodologically, the work of the group focuses on deep learning and representation learning methods for semantic modeling of natural language (that is, precise modeling of the meaning of natural language statements and text documents), with a special focus on multilingual representation learning and cross-language transfer of models for concrete NLP tasks.
Driven by deep learning advances, NLP has lately seen substantial progress, primarily due to the technical ability to (pre)train ever larger neural models on ever more text. Such progress can be exclusive, as its benefits are beyond reach for most of the world’s population (e.g., speakers of low-resource languages, or anyone who lacks the computational resources needed to train or use these models). Moreover, training ever larger language models based on complex neural architectures (for example, the popular Transformer) has a large carbon footprint, and such models tend to encode a wide range of negative societal stereotypes and biases (e.g., sexism, racism). At WüNLP we specifically address these challenges and aim to democratize state-of-the-art language technology. To this end, we pursue three research threads that we hope will lead to equitable, societally fair, and sustainable language technology: (i) sustainable, modular, and sample-efficient NLP models, (ii) fair and ethical (i.e., unbiased) NLP, and (iii) truly multilingual NLP, with a special focus on low-resource languages.
Text data is all around. Besides our core methodological NLP work, we also work on interdisciplinary projects in which we apply cutting-edge NLP methods to interesting problems from other disciplines, most prominently in the area of Computational Social Science (and so far most often in collaboration with political scientists).
Our Chair has international prominence and visibility. We regularly publish our research results at the very competitive top-tier NLP conferences (ACL, EMNLP, NAACL, EACL). Further, Prof. Glavaš served as an Editor-in-Chief for the ACL Rolling Review, the centralized reviewing service of the Association for Computational Linguistics. We have established numerous research collaborations, most prominently with the Language Technology Group of the University of Cambridge, CIS at LMU München, and UKP at TU Darmstadt.