Corpus interoperability workshop, 11 March 2011

UFR études anglophones
8-10 Charles V
Audiovisual room

Alternatively, use the EVO videoconferencing system. Email nicolas.ballier AT univ-paris-diderot.fr for the password. Meeting URL

9h 00 Nicolas Ballier (Paris Diderot) Introduction: from corpora to corpus interoperability

9h 20 Philippe Martin (Paris Diderot) Using WinPitch for corpus interoperability: some case studies

Corpora are sometimes designed as one-off undertakings, with no plan for reuse. The talk will illustrate how WinPitch can be used to compare several transcribed and aligned sound files from the same corpus, as well as transcribed and aligned recordings from different corpora. Embedded routines have been added to query corpora released without any search facilities (NECTE), as well as corpora from former (CORAL-ROM) or ongoing projects (PFC, Rhapsodie, CRFC…) that use the Praat TextGrid format.

10h 00 discussion

10h 20 Detmar Meurers (Tübingen) Focus projection between theory and evidence: Towards using corpora for research linking syntax, prosody, and information structure

Research over the past decade has established that the nature of the integration of a sentence into the discourse can provide explanations for constraints previously stipulated in syntax. But to be able to further explore and refine this line of research, it is essential to have an explicit model of the interaction of syntax, information structure, and intonation as part of a formal linguistic architecture. Research investigating the interaction of syntax, information structure, and intonation has traditionally been theoretically driven, with the syntactic F-marking approach of Selkirk (1995) serving as one prominent foundation. At the same time, recent work mostly driven by pragmatic and semantic considerations has questioned the very foundation of such an approach. This includes the claim that focus projection as the fundamental means of connecting the focus exponent (the word carrying the nuclear pitch accent) and the semantically interpreted focus element is not needed at all (Roberts, 2006; Kadmon, 2006, 2009), or that it is not subject to syntactic constraints (Büring, 2006; Fanselow, 2008). Importantly, the new approaches do not just differ in terms of their theoretical interpretation, but they also make claims about a fundamentally different empirical landscape. In this joint work with Kordula De Kuthy, we want to bring together and compare the theoretical predictions with two sources of empirical evidence. After reviewing the published experimental results relating to focus projection in English, we explore where prosodically annotated, syntactically parsed corpora can provide empirical evidence for or against the different conceptualizations of focus projection.

11h 00 discussion

11h 20 break

11h 40 Isabelle Léglise (CNRS) The CLAPOTY project

This talk will present the ANR project “Contacts de Langues : Analyses Plurifactorielles assistées par Ordinateur et conséquences Typologiques” (Towards a multi-level, typological and computer-assisted analysis of contact-induced language change).

12h 20 discussion and general discussion

12h 50 — 13h 00 Nicolas Ballier and Natalie Kübler (Paris Diderot) Closing remarks and CLILLAC-ARP projects related to corpus interoperability

13h 00 END

The Corpus interoperability series

This series of meetings brings together linguists of different theoretical backgrounds and research interests to investigate the dialogue between different kinds of corpora: written and spoken corpora, multilingual corpora, native and learner corpora.

Next sessions

Our next sessions (dates to be announced) will be dedicated to:

1. Comparing native and non-native corpora: a dialogue between AIX-MARSEC and ANGLISH

The AIX-MARSEC corpus is a multi-layered corpus.
The ANGLISH corpus (Tortel 2009) has a similar Jassem-like tier of NRU and ANA, an MOP-based syllable tier, as well as word alignment and SAMPA transcription. How can we foster the analysis of rhythm and syllabification through complex queries that search both corpora?

Cyril Auran (Lille 3), Daniel Hirst (Aix), Anne Tortel (Paris 8)

2. From phonological syllable inventories to phonetic cues (and back)

On the one hand, a number of corpora have been annotated with syllable tiers in a PRAAT-like manner. On the other hand, some software, such as PHON, offers syllable structure and inventory queries based on phonological categories. Shall the twain ever meet?
