The Leibniz ScienceCampus “Empirical Linguistics and Computational Language Modeling” aims to conduct innovative research that supports the high-quality automatic annotation of large-scale German-language corpus resources. It does so by inducing natural language processing models that adapt to domain, genre and language variety, with the goal of enabling advanced empirical research in linguistics as well as innovative applications in the humanities and the social sciences.
Central research themes are the corpus-based acquisition of linguistic models, especially distributional semantic models; the interfacing of corpora with linguistic and knowledge ontologies; and the corpus-linguistic and computational-linguistic analysis of genre effects in grammar and lexicon research. As a distinguishing feature, the Leibniz ScienceCampus will focus on the German language, which is considerably understudied compared with English. The computational modeling will take advantage of weak supervision and unsupervised learning techniques. Expected results include improved large-scale annotated corpus resources of contemporary German, enhanced with novel semantic annotation layers, and advanced NLP models and resources for the analysis of German corpora from different genres and domains.
The close cooperation between linguists and computational linguists in the ScienceCampus is designed to foster novel research methods in empirical linguistics. Improved genre- and domain-adaptive computational models will make it possible to address a wide range of applications in Digital Humanities and Language Technology. The Leibniz ScienceCampus will explore novel research questions in this area through collaborative, interdisciplinary incubator projects in empirical linguistics and Digital Humanities.
The Leibniz ScienceCampus “Empirical Linguistics and Computational Language Modeling” is a novel cooperation project between the Institute for German Language (IDS) Mannheim and the Institute of Computational Linguistics of Heidelberg University. The ScienceCampus also includes cooperation partners in linguistics and computer science at Heidelberg University and the University of Mannheim, as well as computational linguists at the Heidelberg Institute for Theoretical Studies (HITS).