ALOES 2016 pre-conference workshop

Pre-conference workshop: learner scoring and automatic assessment for spoken data?

Thursday 31 March 2016

Bâtiment Olympe de Gouges, room 147, 1st floor

8h50 Welcome

9h10 Nicolas Ballier: Introduction

9h20 Nicolas Ballier (Paris Diderot, CLILLAC-ARP): Metrics for written and spoken data: an overview

9h40 Philippe Martin (Paris Diderot, LLF): WinPitch LTL and forced alignment for syllable division variability

10h10 Philippe Martin, Nicolas Ballier & Maelle Amand: Rhythmic variation and speaker classification

10h40 Thomas Gaillat (Paris Diderot, CLILLAC-ARP): Applying complexity metrics as post hoc placement tests

10h50 Discussion


11h15 Daniel Hirst (Aix-Marseille University, Laboratoire Parole et Langage, Aix-en-Provence): Providing visual and auditory feedback for improving L2 prosody. A demonstration with ProZed.

11h45 Sophie Herment & Anne Tortel (Aix-Marseille University, Laboratoire Parole et Langage, Aix-en-Provence): The pronunciation of unstressed initial <e> by French learners of English: perspectives for automatic assessment

It is widely acknowledged that learners often transfer features from their L1 to their L2 (see Gut 2009, amongst others). L1 vowels therefore tend to be pronounced in place of the appropriate L2 vowels. This is even more problematic in the case of unstressed reduced vowels for French learners of English. Even advanced learners have difficulty: although they often realize stressed syllables (especially vowels) very well, they tend to fail to pronounce unstressed syllables properly, particularly reduced vowels. This is because the two languages are completely different rhythmically: French syllables can hardly ever be reduced. The present study follows up on Tortel & Herment (2015), which focuses on the pronunciation of unstressed initial <e> by native English speakers and shows that a process of weak vowel centralizing and raising takes place in initial position, with a neutralization of the opposition between /ə/, /ɪ/ and /e/ in this context. Here we examine the pronunciation of the same vowel by French learners of English. The analyses are again based on the AixOx corpus (Herment et al. 2012, 2014), a collection of read speech in which the same 40 one-minute passages are read by 10 native English speakers and 20 French learners of English divided into two groups (beginners and advanced). The aim is (i) to examine the realizations by the two groups of learners and (ii) to compare them with the realizations produced by the native speakers (from Tortel & Herment 2015). Our study underlines the importance of vowel reduction and should provide guidelines for teaching English, and especially English rhythm. The ultimate goal is to contribute to the development of an automatic evaluation of prosody and, more particularly, to provide rhythmic evaluation criteria for the speech of French learners of English (Tortel, 2009).

14h30 Maelle Amand & Zakaria Touhami (Paris Diderot, CLILLAC-ARP): Could you say /læp˺tʰɒp˺/? The acquisition of unreleased and aspirated stops by French learners of English: perspectives for automatic assessment.

Stop non-release and stop aspiration are important criteria in English pronunciation teaching (Rojczyk 2013) and should be the object of specific training for students majoring in English studies, since French speakers rarely leave sentence-final stops unreleased (Von Dommeln 1983) and do not aspirate plosives. This three-phase study involves 30 second-year students of English. In phase 1, two groups were asked to read expressions containing medial and final voiceless stops. In phase 2, one group watched an explanatory video before reading the same expressions again, while the second group served as a control group. In phase 3, both groups read the same sentences one month later to check the durability of the acquisition of stop non-release and aspiration. The automatic extraction of voice onset times (VOTs) and the detection of bursts with Praat enabled us to rate the students’ performance. Results showed that the non-release of sentence-final plosives and stop VOTs were significantly closer to native speech patterns among the students who had watched the video. Based on these results, we suggest ways to build computer-aided pronunciation tasks (Kröger et al. 2010) using Praat, MCA in R and detailed assessment grids given to students during training.

15h00 Adrien Méli (Paris Diderot, CLILLAC-ARP): Formant analysis of learner data: formant tracking vs. mid-temporal values?


15h30 Geoffrey Stewart Morrison (Department of Linguistics, University of Alberta): Logistic regression modelling for speech perception data

A common approach in speech perception research is to create a series of synthetic speech tokens in which the acoustic properties of the speech change in small steps. Although these are a series of discrete points, they are usually referred to as continua. One end of a continuum may have acoustic properties typical of realisations of one phoneme, and the other end acoustic properties typical of realisations of another phoneme. The points on the continuum are played in random order to a listener, usually with every point presented multiple times, and on each presentation the listener responds by indicating the category they perceive, i.e., which phoneme they hear. Continua can be multidimensional, with different acoustic properties changing along different dimensions, e.g., first formant frequency changing on one dimension and vowel duration changing on another. Response categories can be two or more different phonemes. Different groups of listeners can come from different populations, e.g., first-language versus second-language speakers of the language to which the phoneme response categories belong. Once the results are collected, the researcher needs a means of quantifying them in a way which allows for easy comparison of the effects of changing different acoustic properties, or of the perception patterns of different individual listeners or groups of listeners. Logistic regression provides a solution to this problem. It makes use of all the data collected, is not overly susceptible to noise, and can be used to quantify both the location and the crispness of perceptual boundaries between different categories. This workshop provides an introduction to the use of logistic regression for speech perception research.

It focuses on concepts and covers the basic mathematical detail necessary to understand the application of logistic regression, but does not cover the more complex mathematics necessary to implement it from scratch: many software packages provide logistic regression tools, so the applied researcher does not need the latter. A conceptual and basic mathematical understanding of logistic regression will allow an applied researcher to make appropriate use of the tool and understand its output. Logistic regression is also widely used, but often not well understood, by sociolinguistics researchers, and this presentation may also be of interest to researchers from that field (who may know logistic regression by another name, such as VARBRUL or GoldVarb).
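
As a rough illustration of the idea described in this abstract (not the presenter's own code or materials), the following Python sketch fits a two-category logistic regression to responses along a one-dimensional continuum. The response counts, the listener labels "gradual" and "crisp", and the function name fit_logistic are all invented for the example. The boundary location is taken as the point where both categories are equally likely, and the slope coefficient is used as a simple indicator of boundary crispness.

```python
import math

def fit_logistic(steps, n_b, n_total, max_iter=50, tol=1e-10):
    """Fit P(B | x) = 1 / (1 + exp(-(b0 + b1 * x))) to binomial counts
    by Newton-Raphson. Returns the intercept b0 and the slope b1."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the information matrix
        for x, k, n in zip(steps, n_b, n_total):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = k - n * p    # observed minus expected "B" responses
            w = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system H * delta = g for the Newton step.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: a 7-step continuum, each step presented 10 times.
steps = [1, 2, 3, 4, 5, 6, 7]
presentations = [10] * len(steps)

# Number of "B" responses per step for two invented listeners.
listeners = {
    "gradual": [0, 1, 2, 5, 8, 9, 10],
    "crisp": [0, 0, 1, 5, 9, 10, 10],
}

results = {}
for name, counts in listeners.items():
    b0, b1 = fit_logistic(steps, counts, presentations)
    boundary = -b0 / b1          # continuum value where P(B) = 0.5
    results[name] = (boundary, b1)
    print(f"{name}: boundary at step {boundary:.2f}, slope {b1:.2f}")
```

Because both invented response patterns are symmetric around step 4, both fitted boundaries fall at that step, while the larger slope for the "crisp" listener reflects a sharper switch between categories. In practice, an established statistics package would normally be used in place of a hand-written fitting routine.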


There are many well-written introductions to logistic regression by different authors, aimed at readers with differing levels of mathematical background. Registrants for the workshop will be provided with the following publications written by the presenter. Some are introductory, and others provide examples of application to speech perception research. Workshop participants are requested to read through at least the first two publications listed and come to the workshop with any questions that arise.

Dr Morrison is an independent forensic consultant based in Vancouver, British Columbia, Canada. He is also an Adjunct Professor at the Department of Linguistics, University of Alberta. He has previously been Scientific Counsel, Office of Legal Affairs, INTERPOL General Secretariat; Director of the Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunications, University of New South Wales; a Subject Editor for the journal Speech Communication; and Chair of the Forensic Acoustics Subcommittee of the Acoustical Society of America. He has authored 50 refereed and invited academic publications, including 30 on forensic topics, and he has conducted research in collaboration with police services in Australia and in Europe. He has worked on forensic casework in Australia and in the United States, and has worked at the behest of both the prosecution and the defence. In 2015 he advised defence counsel in a US Federal Court Daubert hearing on the admissibility of a forensic voice comparison analysis proffered by the prosecution. Dr Morrison has presented workshops and tutorials on evaluation of forensic evidence at operational forensic laboratories, and at judicial and forensic science conferences around the world. He uses intuitive examples to introduce new concepts and gradually builds new knowledge on old. Workshop participants have thanked Dr Morrison for making material which they expected to be challenging easy to understand. Prior to his work in forensic science, Dr Morrison conducted speech perception research, particularly research on cross-language and second-language speech perception. He has published on the use of logistic regression as a tool in both the evaluation of forensic evidence and speech perception research.

18h00 Closing remarks

Participation in the workshop is free of charge, but participants must register by sending an email to maelle.amand@gmail.com to get access to the papers under discussion.

Getting here: Bâtiment Olympe de Gouges, rue Albert Einstein, 75013 Paris http://www.univ-paris-diderot.fr/EtudesAnglophones/pg.php?bc=CHVU&page=Venir

Contact person: Maelle Amand, maelle.amand@gmail.com

