Beyond CEFR level prediction of texts in learner corpora: Exploring feedback to learners and learning analytics

A one-day workshop at Université de Paris, 30 October 2019
organised by Manon Bouyé and Nicolas Ballier
8 Place Paul Ricœur, 75013 Paris
Olympe de Gouges Building, room 115, first floor

The building is number 8 on this map.


MORNING: discussing our results

With the financial support of the French Ministry for Europe and Foreign Affairs (Ministère de l’Europe et des Affaires étrangères, MEAE) and the French Ministry of Higher Education, Research and Innovation (Ministère de l’Enseignement supérieur, de la Recherche et de l’Innovation, MESRI).

9h00 Opening. N. Ballier: The Ulysse PHC project: aims, data and limitations

9h20 Thomas Gaillat: Investigating learner micro-systems and customizing CEFR criterial features: the micro-system feature set and its regex syntax. PDF

9h40 Discussion

10h30 Bernardo Stearns and Annanda Sousa: The user interface prototype demo
We hope to deliver Docker and GitHub versions of our user interface, which lets you paste a text, have a coffee while the text is processed, and then get the probability that the text belongs to a given CEFR level. PDF

10h45 Discussion

11h15 Bernardo Stearns for Andrew Simpkins: Classifying learner level
Overfitting? Comparison with a graded corpus
As a preliminary step, we have tested our current user interface on the CEFR ASAG corpus to check whether our model is biased towards the A1 level.

11h30 General discussion

12h15 LUNCH BREAK (poster session at Diderot)

Posters are displayed at Diderot and on a shared Google Drive for remote participants.
Thomas Gaillat et al. (Rennes): Visualisations of linguistic profiles in learner written productions
Elena Volodina (Gothenburg): Overview over text-based CEFR research for L2 Swedish: on the intersection between NLP, L2 corpora and CALL

Arnold et al. poster (paper presented at the CAp2018 conference). A paper adding syntactic complexity metrics to the CAp2018 dataset was also accepted at this French machine learning conference. Arnold, T., Ballier, N., Gaillat, T. & Lissón, P. (2018). Predicting CEFRL levels in learners of English on the basis of metrics and full texts. CAp2018 conference, Université de Rouen, 19-21 June 2018, http://cap2018.litislab.fr/. Paper 31 in the proceedings of the conference, https://arxiv.org/abs/1806.11099.

AFTERNOON: Learner corpora and beyond: collecting and interpreting learning process and product data

A blueprint was circulated pointing out potential future directions.

14h STRAND 1 Adding more metrics/NLP-based methods for error detection / problematic areas for learners

15h STRAND 2 Exploring the relation between learner corpus annotation, language testing, and individual feedback to learners

16h30 Coffee break

17h STRAND 3 Should we try to link learner corpus and learning analytics research, and what is there to be gained? Ideas for tracking development paths? (Fuchs, Götz & Werner 2016) How to develop learner profiles based on student input?

18h15 Closing remarks and future plans

18h30 End of the workshop

Call for participation

As the closing event of a European-funded project, we invite colleagues to share their ideas about the automatic analysis of learner corpora and how it can be applied to interlanguage analysis, CEFR level prediction, and error detection, and extended to support individual feedback to learners and learning analytics.

The morning session will present some of the results of the French-Irish project “PHC Ulysse 2019”: the features of the EFCAMDAT corpus that we used as the first step in our experiments, the methodology we developed, and our main findings. We will present our user interface prototype for the automatic detection of CEFR levels and discuss issues such as the overfitting of a model based on the French and Spanish components of EFCAMDAT. We will also discuss the shared task we held on a portion of this

Posters recapitulating some of the issues will be discussed over the coffee breaks.

Admission is free, but registration is compulsory (on a first-come, first-served basis) on this webpage: https://framaforms.org/20191030-workshop-beyond-cefr-level-prediction-of-texts-in-learner-corpora-1570435104

The summary of the Ulysse PHC Project can be found here: http://www.clillac-arp.univ-paris-diderot.fr/projets/ulysse2019


Discussants at Diderot:

Taylor Arnold (University of Richmond, https://math.richmond.edu/faculty/tarnold2) is Assistant Professor of Statistics and has a strong interest in NLP as a data scientist and digital humanist. See https://arxiv.org/abs/1806.11099

Detmar Meurers (University of Tübingen, http://purl.org/dm) is Professor of Computational Linguistics and head of the research group on Intelligent Computer-Assisted Language Learning there: http://icall-research.de

Discussants (videoconference):

Mick O’Donnell (Universidad Autónoma de Madrid, Departamento de Filología Española)
See the WriCLE corpus, the TREACLE Project http://www.treacle.es/publications.html and the Adaptive Learning of English Grammar Online project http://alegro.org.es/

Elena Volodina (Gothenburg)
See the SweLL project - research infrastructure for Swedish as a second language https://spraakbanken.gu.se/eng/swell_infra

Olga Vinogradova (Moscow, National Research University Higher School of Economics)
See the REALEC project (Russian Error-Annotated Learner English Corpus) http://web-corpora.net/realec/

See the 59 features, with a short description of each, at this link: https://docs.google.com/spreadsheets/d/1aoJSFVmcqA1QboB-ErQYHXfmLHyMNnuLaGnDmjq92uY/edit?usp=sharing

Contact person:
Nicolas Ballier: nicolas.ballier@univ-paris-diderot.fr

