PHC (Partenariat Hubert Curien) Ulysses 2019 (ref. 43121RJ) (5,000 euros)
Investigating criterial features of learner English and AI-driven automatic language level assessment
With the financial support of the French Ministry for Europe and Foreign Affairs (Ministère de l’Europe et des Affaires étrangères, MEAE) and the French Ministry of Higher Education, Research and Innovation (Ministère de l’Enseignement supérieur, de la Recherche et de l’Innovation, MESRI).
One of the humanities and social sciences (SHS) projects selected for PHC Ulysses 2019 (24 projects were ultimately selected out of 78 submissions). On the French side, this binational project involves a former PhD student (Thomas Gaillat), a current PhD student of the laboratory (Manon Bouyé) and Nicolas Ballier, who coordinates the French part of the project.
This project aims to investigate criterial features in learner English and to build a proof-of-concept system for language level assessment in English. Our research focus is to identify linguistic features and to integrate them into a system based on Artificial Intelligence (AI). The purpose is to create a system that analyses essays written by learners of English and maps them to the levels of the Common European Framework of Reference for Languages (CEFRL).
Benefiting from Xiaofei Lu’s invitation to Paris, we have experimented with the tools he designed (Lexical Complexity Analyzer and L2 Syntactic Complexity Analyzer) as well as R packages such as {zipfR}, {koRpus} and {quanteda} to investigate complexity metrics for learner data.
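To give an idea of what a lexical complexity metric looks like in practice, here is a minimal Python sketch of two classic lexical diversity indices: the type-token ratio and the root type-token ratio (Guiraud's index). It does not reproduce the tools or R packages named above; the tokenizer, the function names and the sample sentence are all assumptions made for illustration.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_diversity(text):
    """Return simple lexical diversity indices for one text."""
    tokens = tokenize(text)
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    if n_tokens == 0:
        raise ValueError("text contains no word tokens")
    return {
        "tokens": n_tokens,
        "types": n_types,
        # Type-token ratio: distinct words divided by running words.
        "ttr": n_types / n_tokens,
        # Root TTR (Guiraud's index): less sensitive to text length than TTR.
        "root_ttr": n_types / math.sqrt(n_tokens),
    }


# Invented sample sentence, not taken from any learner corpus.
sample = "My tailor is rich and my tailor is happy"
metrics = lexical_diversity(sample)
print(metrics)
```

Both indices depend on text length, which is one reason why learner texts of very different sizes are hard to compare with raw ratios.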
The first paper below (in French, sorry) discussed the possibility of assigning levels to the ANGLISH corpus (Tortel, 2009) on the basis of vocabulary growth curves and complexity metrics. The metrics were applied to the spontaneous productions of intermediate learners, advanced learners and native speakers in the ANGLISH corpus. These metrics yielded a much less precise classification than syllable timing measured by syllable type (stressed, reduced), as shown by Ballier, Martin and Amand (2016).
Ballier, Nicolas & Gaillat, Thomas. 2016. “Classification d’apprenants francophones de l’anglais sur la base des métriques de complexité lexicale et syntaxique”. Actes de la conférence conjointe JEP-TALN-RECITAL 2016, volume 9: ELTAL, Paris, France, July 2016, pp. 1-14.
Ballier, Nicolas, Martin, Philippe & Amand, Maelle. 2016. “Variabilité des syllabes réalisées par des apprenants de l’anglais”. Actes de la conférence conjointe JEP-TALN-RECITAL 2016, volume 1: JEP 2016, Paris, France, pp. 732-740.
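The vocabulary growth curves mentioned above plot the number of distinct word types V(N) against the number of tokens N read so far. The following Python sketch shows one simple way to compute such a curve; it is an illustration under our own assumptions (whitespace tokenization, a hypothetical `vocabulary_growth` helper, an invented token sequence), not the procedure used in the paper.

```python
def vocabulary_growth(tokens, step=1):
    """Return (N, V(N)) pairs: number of distinct types after the first N tokens.

    A point is recorded every `step` tokens and at the end of the text.
    """
    seen = set()
    curve = []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        if n % step == 0 or n == len(tokens):
            curve.append((n, len(seen)))
    return curve


# Invented token sequence, for illustration only.
tokens = "the cat saw the dog and the dog saw the cat".split()
curve = vocabulary_growth(tokens, step=2)
print(curve)
```

A curve that flattens early suggests a small active vocabulary, whereas a curve that keeps rising suggests that new words are still being introduced.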
For Paula Lissón’s M1 MA thesis, we experimented with metrics to measure progression between first-year and second-year students in the longitudinal corpus Diderot-Longdale (Goutéreaux 2013).
Ballier, N. & Lissón, P. (2017). A corpus-based evaluation of readability metrics as indices of syntactic complexity in EFL learners’ written productions. LCR2017, Bolzano, 5-7 October 2017.
Ballier, N. & Lissón, P. (2017). Estudio de la aplicabilidad de la ley de Zipf y de la ley de Heaps en los corpus de aprendientes de inglés. CILC2017, Paris, 30-31 May 2017.
We designed the dataset for the data challenge of CAp2018, the yearly conference of the French machine learning community. The conference competition was entitled “My Tailor is rich! Predicting English level by analyzing writing styles”. We explained the metrics applied to the French component of the EFCAMDAT corpus. The training and testing datasets were hosted by the owner of the corpus for copyright reasons. A paper summarizing the mutual benefits of this competition has been submitted to an international journal.
A paper adding syntactic complexity metrics to the CAp2018 dataset was also accepted at this French machine learning conference. Arnold, T., Ballier, N., Gaillat, T. & Lissón, P. (2018). Predicting CEFRL levels in learners of English on the basis of metrics and full texts. CAp2018 conference, Université de Rouen, 19-21 June 2018. http://cap2018.litislab.fr/, Paper 31 in the proceedings of the conference, https://arxiv.org/abs/1806.11099.
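As a rough illustration of what predicting a CEFRL level from metrics involves, the Python sketch below uses a nearest-centroid classifier: each level is summarized by the mean of its metric vectors, and a new text is assigned to the closest level. This is a deliberately simple stand-in, not the method of the CAp2018 paper or of the challenge participants. The two features, the numeric values and the function names are invented for the example.

```python
import math
from collections import defaultdict


def centroids(training):
    """Average the metric vectors of each level: {level: [mean of each metric]}."""
    grouped = defaultdict(list)
    for level, vector in training:
        grouped[level].append(vector)
    return {
        level: [sum(column) / len(column) for column in zip(*vectors)]
        for level, vectors in grouped.items()
    }


def predict(vector, level_centroids):
    """Return the level whose centroid is closest in Euclidean distance."""
    return min(
        level_centroids,
        key=lambda level: math.dist(vector, level_centroids[level]),
    )


# Invented toy values, NOT corpus data:
# (mean sentence length in words, root type-token ratio)
training = [
    ("A1", [6.0, 3.0]),
    ("A1", [8.0, 4.0]),
    ("B1", [14.0, 6.0]),
    ("B1", [16.0, 7.0]),
    ("C1", [22.0, 9.0]),
    ("C1", [24.0, 10.0]),
]
model = centroids(training)
print(predict([15.0, 6.2], model))
```

With real data, the metrics would normally be standardized first, so that a feature with a large numeric range does not dominate the distance.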
Lissón, P. & Ballier, N. (2018). ‘Investigating lexical progression through lexical diversity metrics in a corpus of French L3’, Discours, 23, 2.
The main results of Paula Lissón’s M2 MA thesis, which investigated the complexity variable(s) behind French learners’ choices between “that” and zero in restrictive relative clauses in English, were presented at LCR2019. Lissón, P., Ballier, N. & Gerdes, K. (2019). On relativizer use in learner English: a corpus-based study. LCR2019, Warsaw, 12-14 September 2019.
Galway meeting (May 7-10)
Paris meeting (Oct 27-31)
Director: Prof. Natalie Kübler
Centre de Linguistique Inter-langues,
de Lexicologie, de Linguistique Anglaise
et de Corpus-Atelier de Recherche sur la Parole
EA 3967
8 place Paul Ricœur
75013 Paris
Case courrier 7002
5 rue Thomas Mann
75205 Paris cedex 13