WORKSHOP : Morpho-phonology with R

Acknowledgement:

This workshop was made possible by the European Science Foundation via the NetWordS network (supporting grant NetWordS - 09-RNP-089, awarded to Jesús Fernández-Domínguez).

Programme:

This three-day workshop is an introduction to data mining of lexical databases.
Day 1 focusses on queries on simple, nicely structured files with R.
Day 2 focusses on Perl scripts for queries on more complex databases.
Day 3 focusses on data visualisation, using R (ggplot2) and Excel / R tools.

This is meant as a hands-on session, a workshop in a computer pool (Room 208, Olympe de Gouges). The number of participants is limited to 18, unless participants bring their own laptops. Those who do are expected to have installed the relevant R packages and software on their machines.

Useful background reading and preparation

  • the LEXIQUE database
  • the CELEX database
  • EDP12
  • the CMU Dictionary
  • the Buscapalabras
  • R :

Tuesday 13 January : R and morpho-phonological databases (CSV format) : (NB & Véronique Pouillon)

Room 208 (computer pools) Bât. Olympe de Gouges
8 rue Albert Einstein

Data mining with R for simple files (CSV, such as Buscapalabras, data in Spanish)

0900 Nicolas Ballier Introduction
0930 Vincent Renner, Jesus Fernandez & Nicolas Ballier : blends in English, French and Spanish : the blonset conjecture
0945 NB : RStudio and R packages
1000 Véronique Pouillon : input, output files in R
1045 coffee break
1105 Initial simple queries
1230 lunch break

1400 More complex queries
1600 coffee break
1630 Even more complex queries
1730 Some limits or caveats for R as a concordancer
1800 end

Wednesday 14 January : Perl scripts for queries in morpho-phonological dictionaries (any format) (Véronique Pouillon & NB)

Room 208 (computer pools)

0945 NB : Perl for linguists
1000 Véronique Pouillon : Perl basic queries
1045 coffee break
1105 Initial simple queries
1230 lunch break

1400 More complex queries
1600 coffee break
1630 Even more complex queries
1730 Some limits or caveats for R vs. Perl
1800 end

Thursday 15 January : R & Excel, and data visualisation : Laura Goudet & Nicolas Ballier

Room 208 (computer pools) : 0930 - 1200, 1400 - 1700
Bât. Olympe de Gouges
8 rue Albert Einstein

This session will explain the logic of packages and demo some of the functions of the zipfR package.
It will also demo data visualisation with R and easy transitions from Excel to R (RExcel).
Linguistic datasets will be used for illustration purposes. The morning session will focus on words, wordles, bar charts, spider charts, and possible applications of the Levenshtein distance. The afternoon session will demo more complex data visualisation, boxplots and other plots for phonetics, using the {ggplot2} package.

Friday 16 January (morning) : Spoken Learner Corpora : work in progress

Room 163
Bât. Olympe de Gouges
8 rue Albert Einstein
75013 Paris

Discussants : Alain Diana, Nicolas Ballier

0900 Nicolas Ballier : Introduction
0915 - 0945 Adrien Méli : Vowels in the Longdale corpus: a longitudinal approach
0945 - 1015 Thomas Gaillat : Using R for this and that in the Longdale corpus
1015 - 1045 coffee break

1045 - 1130 Nicolas Ballier : Metrics in learner corpora (Longdale, ANGLISH, AixOx), a roadmap
1130 - 1200 Final discussion and next steps

Explorations

colloques/morpho_phon_withr_jan2015.txt · Last modified: 2015/01/13 04:06 (external edit)
