Taylor Arnold

Invited Professor at Diderot (LARCA/CLILLAC-ARP joint initiative)
professional webpage
Email: taylor.arnold@acm.org
  • April 24-27: Preparation and rehearsals: testing, mock bootcamp session on the morning of April 28, R configuration in the computer pools.
  • April 28, 14h-16h: Texts and Space (ODG, room M19, 1st floor). An informal exchange with geographers from the UMS RIATE and the CIST research group.

Taylor will present geospatial analysis and text mining as covered in his R textbook (45 minutes), followed by geospatial data and text mining within the UMS RIATE and the CIST research group (45 minutes). The discussion will offer a comparison with other R packages developed for cartography and with current ongoing projects. Discussant for spatial data: Claude Grasland (CIST & Géographie-cités). Discussant for text mining: Nicolas Ballier.

  • May 2, 14-17h: From Corpus Linguistics to Data Mining (room 720). A joint presentation by Prof. N. Ballier, presenting bilingual corpora and concordancers, and Taylor Arnold, introducing text mining techniques. This step-by-step presentation of text mining techniques is intended as a teaser for more intensive sessions. Discussants: Natalie Kübler (tbc), Nicolas Ballier.
  • May 5, 14-17h: Data, Metadata and Paradata in R-based Corpus Analysis (room 720). An open discussion of text mining techniques with linguists specialised in NLP, textometry and corpus linguistics. Topics: bilingual corpora; translation and neural networks; machine-based corpus traductology.

Discussants: Maria Zimina, Antonio Balvet, Nicolas Ballier

A 45-minute talk presenting the Photogrammar project, plus a short teaser for the book Humanities Data in R.

Between 1935 and 1945, the U.S. Federal Government employed a group of photographers to help build support for New Deal programs and U.S. entry into World War II. The group created a corpus of over 170,000 documentary photographs showing daily life in all of the then 48 states. The collection, known as the Farm Security Administration-Office of War Information (FSA-OWI) archive after the two government agencies that housed the photographers, is now a common source of historical evidence for scholars of 20th-century America. It has been digitized by the Library of Congress; as a work of the federal government, these digitized images are in the public domain. Photogrammar, which I helped found and currently co-direct, is a project for analyzing and visualizing the FSA-OWI collection. The web-based portion of the project (photogrammar.yale.edu) visualizes the photographs over historical maps, by historical classification schemes, and through an analysis of color composition. Computational techniques, predominantly from computer vision, have been used to infer and reconstruct metadata that is also displayed on the site. Details of some of these techniques are described in the DHQ article “Uncovering Latent Metadata in the FSA-OWI Photographic Archive”.

Photogrammar is supported by grants from the National Endowment for the Humanities and the American Council of Learned Societies, and has been very well received since the website’s launch in 2014. We have been invited to present our work at over a dozen institutions, including the Museum of Modern Art and the Smithsonian’s Archives of American Art. The site has attracted nearly 1 million unique visitors and over 6 million page views over the past 12 months. It has received coverage from various media outlets, including the BBC, The Atlantic, Slate, Le Monde, and NPR.

Discussant: François Brunet (LARCA)

  • May 17, 14-16h: LARCA seminar, Visualising Cultural Data (room TBA).

Summary: This talk will highlight the theoretical and practical applications of data visualization to the study of cultural data. We start by describing a formal structure for data visualization and for the data science process more generally. Connections will be drawn between these methods and specific formalisms in humanistic fields, including theories of the archive and knowledge production. The second part of the talk focuses on specific applications of exploratory data visualization to the study of the movement of people in New York City. We see how various visual techniques serve both to confirm some “common sense” conclusions and to challenge other widely held notions. The talk will finish by extending these techniques to the more complex data format of networks. Once again, data visualization will provide a powerful tool for extracting and displaying new forms of knowledge.

Biographical sketch: Taylor Arnold is Assistant Professor of Statistics at the University of Richmond. Prior appointments include Lecturer of Statistics at Yale University and Senior Scientist at AT&T Labs Research. His work centres on the computational and computing challenges of doing data analysis on large-scale datasets, with a focus on text and image processing. Arnold’s text Humanities Data in R (Springer 2015) addresses these challenges in the context of humanities applications, a main area of application for his work. He holds grants supporting related work from the National Endowment for the Humanities (NEH) and the American Council of Learned Societies (ACLS). A forthcoming text, A Computational Approach to Statistical Learning (CRC Press 2018), further explores the technical issues of applying these techniques at scale.

  • May 19: 30-minute talk, ‘Humanities Data in R’, LARCA PhD days.

A 30-minute plenary talk for PhD students in literature and civilisation: a teaser for the more intensive R sessions to take place on May 22-23-24. This talk is part of a two-day LARCA event for PhD students (May 18-19).

  • May 22-23-24: A three-day gentle introduction to R and Humanities Data in R, room 237 (computer pools), following some of the chapters of the book and its companion website: http://humanitiesdata.org/

Please register online: https://beta.doodle.com/poll/nrfkqq2pq8em5qna#table

user/nicolas_ballier/taylor_arnold_2017.txt · Last modified: 2021/11/30 02:57 (external edit)