Dear NLP group,

Invited talk by Professor Elena Volodina, Språkbanken Text,
Department of Swedish, Multilingualism, Language Technology, University of Gothenburg.

Time: Tuesday, June 3, 14:00-15:00
Place: Lilla Hörsalen

Title: Open access to research data and automatic pseudonymization. Two years with the Mormor Karl project.

Abstract: This talk will be devoted to the challenges of working with data that contains personal information. I will describe a set of experiments with automatic pseudonymization that we have performed within the Mormor Karl project. These include experiments with detection and labeling of personal categories using BERT models (Szawerna et al. 2024, 2025), attempts at using LLMs to "fill in the blanks" when substituting personal information with pseudonyms (as yet unpublished), and a study on whether pseudonyms can provoke biased automated classifications (Muñoz Sánchez et al. 2024).

The choice of models for our experiments is currently dictated by the sensitive nature of our data. To extend the choice from open-source to proprietary models, we are currently collecting a "pseudo-corpus" with fictitious personal information that we will be able to share freely for future research (you are welcome to contribute to the pseudo-corpus collection as well).
Finally, in this talk I will name several strategies to unify the research on automatic pseudonymization, and outline further challenges, the need for standardization, and a proposal for a shared task.

A warm welcome,

Hercules

_________________________________________________________________________
Dr. Hercules Dalianis, Professor
Department of Computer and Systems Sciences
ph:        +46 8 16 16 16        DSV/Stockholm University
mobile ph: +46 70 568 13 59   P.O. Box 7003, 164 07 Kista
email:     hercules@dsv.su.se   Stockholm, Sweden
www:       http://www.dsv.su.se/hercules/
_________________________________________________________________________