A one-day workshop at Université de Paris, 30 Oct 2019
organised by Manon Bouyé and Nicolas Ballier
8 Place Paul Ricœur, 75013 Paris
Olympe de Gouges Building, room 115, first floor
The building is number 8 on this map.
With the financial support of the French Ministry for Europe and Foreign Affairs (Ministère de l’Europe et des Affaires étrangères, MEAE) and the French Ministry of Higher Education, Research and Innovation (Ministère de l’Enseignement supérieur, de la Recherche et de l’Innovation, MESRI).
9h00 Opening. N. Ballier : The Ulysse PHC project: aims, data and limitations
9h20 Thomas Gaillat : Investigating learner micro-systems and customizing CEFR criterial features: the micro-system feature set and its regex syntax. PDF
9h40 Discussion
10h30 Bernardo Stearns and Annanda Sousa : the user interface prototype demo
We hope to deliver Docker and GitHub versions of our user interface that allow you to paste a text, have a coffee while the text is processed, and then get the probability that the text belongs to a given CEFR level. PDF
10h45 Discussion
11h15 Bernardo Stearns for Andrew Simpkins : Classifying learner level
PDF
Overfitting? Comparison with a graded corpus
As a preliminary step, we have tested our current user interface with the CEFR ASAG corpus to check whether our model is biased towards the A1 level.
11h30 General discussion
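The A1-bias check mentioned in the 11h15 slot could be sketched roughly as follows. This is a minimal illustration only, not the project's actual evaluation code: the function names and the toy labels are invented for the example. It compares how often A1 is predicted with how often A1 occurs in the gold labels of a graded corpus.

```python
from collections import Counter

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def level_shares(labels):
    """Return the proportion of each CEFR level in a list of labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return {level: counts.get(level, 0) / len(labels) for level in LEVELS}


def a1_bias(gold, predicted):
    """Predicted A1 share minus gold A1 share (positive = A1 over-predicted)."""
    return level_shares(predicted)["A1"] - level_shares(gold)["A1"]


# Toy labels invented for illustration (hypothetical, not workshop data).
gold = ["A1", "A2", "B1", "B2", "A2", "B1", "C1", "A1"]
predicted = ["A1", "A1", "A1", "B2", "A1", "B1", "B2", "A1"]

print(round(a1_bias(gold, predicted), 3))
```

A clearly positive value on a graded corpus would suggest that the model predicts A1 more often than the gold labels warrant. A fuller check would also look at per-level recall.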
Posters displayed at Diderot and on a shared Google Drive for remote participants.
Thomas Gaillat et al. (Rennes) : Visualisations of linguistic profiles in learner written productions
Volodina, Elena (Gothenburg) Overview over text-based CEFR research for L2 Swedish: on the intersection between NLP, L2 corpora and CALL
Arnold et al. poster (paper presented at the CAp2018 conference). A paper adding syntactic complexity metrics to the CAp2018 dataset was also accepted for this French machine learning conference. Arnold, T., Ballier, N., Gaillat, T. & Lissón, P. (2018). Predicting CEFRL levels in learners of English on the basis of metrics and full texts. CAp2018 conference, Université de Rouen, 19-21 June 2018. http://cap2018.litislab.fr/, Paper 31 in the proceedings of the conference, https://arxiv.org/abs/1806.11099.
A blueprint was circulated pointing out potential future directions.
14h STRAND 1 Adding more metrics/NLP-based methods for error detection / problematic areas for learners
15h STRAND 2 Exploring the relation between Learner corpus annotation, language testing, and individual feedback to learners
16h30 coffee break
17h STRAND 3 Should we try to link learner corpus and learning analytics research, and what is there to be gained? Ideas for tracking development paths? (Fuchs, Götz & Werner 2016) How to develop learner profiles based on student input?
18h15 Closing remarks and future plans
18h30 End of the workshop
For this closing event of a European-funded project, we invite colleagues to share their ideas about the automatic analysis of learner corpora and how they can be applied to interlanguage analysis, CEFR level prediction, and error detection, and extended to support individual feedback to learners and learning analytics.
The morning session will present some of the results of this French-Irish project, “PHC Ulysse 2019”: the features of the EFCAMDAT corpus we used as the first step for our experiments, the methodology we developed, and our main findings. We will present our user interface prototype for the automatic detection of CEFR levels and discuss aspects such as the overfitting of a model based on the French and Spanish components of EFCAMDAT. We will also discuss the shared task we held on a portion of this
Posters recapitulating some of the issues will be discussed over coffee breaks.
Admission is free but registration is compulsory (on a first come, first served basis) on this webpage: https://framaforms.org/20191030-workshop-beyond-cefr-level-prediction-of-texts-in-learner-corpora-1570435104
The summary of the Ulysse PHC project can be found here: http://www.clillac-arp.univ-paris-diderot.fr/projets/ulysse2019
Discussants at Diderot :
Taylor Arnold (University of Richmond, https://math.richmond.edu/faculty/tarnold2) is Assistant Professor of Statistics at the University of Richmond and has a strong interest in NLP as a data scientist and digital humanist, see https://arxiv.org/abs/1806.11099
Detmar Meurers (University of Tübingen, http://purl.org/dm) is Professor of Computational Linguistics and head of the research group on Intelligent Computer-Assisted Language Learning there: http://icall-research.de
Discussants (videoconference):
Mick O’Donnell, Universidad Autónoma de Madrid, Departamento de Filología Española
https://uam.academia.edu/MickODonnell
See the WriCLE corpus, the TREACLE Project
http://www.treacle.es/publications.html
and the Adaptive Learning of English Grammar Online
http://alegro.org.es/
Elena Volodina (Gothenburg)
https://spraakbanken.gu.se/personal/elena
See the SweLL project - research infrastructure for Swedish as a second language
https://spraakbanken.gu.se/eng/swell_infra
Olga Vinogradova (Moscow, National Research University Higher School of Economics)
See the Realec project (Russian Error-Annotated Learner English Corpus)
http://web-corpora.net/realec/
See the 59 features (link and short description attached): https://docs.google.com/spreadsheets/d/1aoJSFVmcqA1QboB-ErQYHXfmLHyMNnuLaGnDmjq92uY/edit?usp=sharing
Contact person:
Nicolas Ballier : nicolas.ballier@univ-paris-diderot.fr
Director: Prof. Natalie Kübler
Centre de Linguistique Inter-langues,
de Lexicologie, de Linguistique Anglaise
et de Corpus-Atelier de Recherche sur la Parole
EA 3967
8 place Paul Ricœur
75013 Paris
Case courrier 7002
5 rue Thomas Mann
75205 Paris cedex 13