Research Projects

  • SegCor - Segmentation of Oral Corpora

    This ANR/DFG-funded project aims to develop methods for the segmentation of German and French spoken language corpora.

    The aim of this project is to develop methods for the segmentation of spoken language. These methods are to be based on linguistic knowledge while remaining adequate both for the analysis of spoken language on various linguistic levels and for the development of tools in computational linguistics. A milestone of the project is the publication of guidelines for the systematic segmentation of various types of German and French verbal interaction.

    In the second stage, the possibilities of automated segmentation of spoken language corpora based on the segmentation guidelines will be tested and documented. In this way, the project not only improves the usability of the three databases involved but also deepens our knowledge of the structures of spoken language.

    SegCor is funded by the German Research Foundation (DFG) and the French National Research Agency (ANR). The project is a cooperation between the Pragmatics department of the Institute for the German Language (IDS Mannheim), the University of Lyon and the University of Orleans.

  • Part-of-Speech Tagging for FOLK (The Research and Teaching Corpus of Spoken German)

    The aim of my PhD project is the automatic annotation of the “Research and Teaching Corpus of Spoken German” (FOLK) with part-of-speech tags.

    The aim of the project is the automatic annotation of the “Research and Teaching Corpus of Spoken German” (FOLK) with part-of-speech tags. State-of-the-art taggers such as the TreeTagger with the STTS (Stuttgart-Tübingen Tagset) achieve an accuracy of only 60% to 80% on our transcripts of spoken German.

    Part-of-speech tagging (POS tagging) of spoken data requires different means of annotation than POS tagging of written and edited texts. To capture the features of spoken German, a distinct tagset is needed that accounts for the kinds of elements that occur only in speech. To create such a coherent tagset, the most prominent phenomena of spoken language need to be analyzed, especially with respect to how they differ from written language.

    The adaptation of the tagset was carried out in cooperation with the workgroup “speech particles” of the STTS workshop “The STTS-Tagset for part-of-speech annotation: state of affairs and perspectives” and with Prof. Dr. Hardarik Blühdorn of the Grammar department of the IDS Mannheim.

  • FOLK (Forschungs- und Lehrkorpus Gesprochenes Deutsch)

    As a research assistant, I transcribed and orthographically normalized data for the FOLK project.

    FOLK is the “Forschungs- und Lehrkorpus Gesprochenes Deutsch” (English: Research and Teaching Corpus of Spoken German). The project has set itself the aim of building a corpus of German conversations which a) covers a broad range of interaction types in private, institutional and public settings, b) is sufficiently large and diverse and of sufficient quality to support different qualitative and quantitative research approaches, c) is transcribed, annotated and made accessible according to current technological standards, and d) is available to the scientific community on a sound legal basis and without unnecessary restrictions of usage.

    (text by Thomas Schmidt 2014)

  • LiSe-DaZ (Linguistische Sprachstandserhebung – Deutsch als Zweitsprache)

    As a research assistant, I was in charge of organizing the standardization procedure as well as hiring and training research assistants for the LiSe-DaZ project (linguistic learning assessment of German as a second language).

    “LiSe-DaZ – Linguistische Sprachstandserhebung – Deutsch als Zweitsprache” (linguistic learning assessment of German as a second language) was developed by Prof. Dr. Rosemarie Tracy, University of Mannheim, and Prof. Dr. Petra Schulz, University of Frankfurt/Main, on behalf of the Baden-Württemberg foundation. The procedure makes it possible to assess the linguistic development of children between three and seven years old, both native speakers and speakers of German as a second language. For the first time, the development of the children's speaking abilities can be assessed according to various linguistic stages, and on this basis remedial teaching can be tailored specifically to the children's needs. LiSe-DaZ was standardized with 912 children from all over Germany. The tool was published by Hogrefe Verlag in summer 2011.