Dear NLPers and friends,
In our next NLP group meeting on June 12, 12:00-13:00 in room M20, there will be an invited talk by Maria Skeppstedt and Magnus Ahltorp.
The agenda for the NLP meeting is:
1. Round-the-table news (accepted papers, conference trips, grant applications, etc.)
2. Invited talk by Maria Skeppstedt (Research Engineer, Uppsala University) and Magnus Ahltorp (Computational Linguist, Nakajima Koen Research Institute)
Multilingual Text Visualisation with Word Rain and Topic Timelines
===================================================
The text visualisation methods Word Rain and Topic Timelines aim to provide the user with an overview of large text collections. We will describe the theory behind the methods, and why we developed them. We will also provide examples of how they are used for visualising and exploring corpora belonging to different genres, e.g. parliamentary corpora, patient organisation periodicals and texts about climate change.
===================================================
We also plan to arrange some kind of social event in the evening - more details to come!
Bring your lunch and we will arrange some “fika”. Welcome!
Aron & Hercules
——
Aron Henriksson
Associate Professor (Docent)
Department of Computer and Systems Sciences (DSV)
Stockholm University
P.O. Box 1073, SE-164 25 Kista, Sweden
Visiting address: Borgarfjordsgatan 12, Kista
Phone: +46-8-164985
Welcome to the half-time seminar of Korbinian Randl, entitled "Towards Trustworthy Classification with Large Language Models".
External reviewer: Professor Steffen Egger, University of Technology Nuremberg (UTN), Germany.
Internal reviewer: Dr. Peter Idestam-Almquist, DSV, Stockholm University
Main supervisor: Associate professor Tony Lindgren, DSV, Stockholm University
Supervisors:
Associate professor Aron Henriksson, DSV, Stockholm University
Assistant professor John Pavlopoulos, Department of Informatics, Athens University of Economics and Business, Greece
Time: Tuesday 10th of June, 13:00-16:00 (CET)
Place: L30, NOD-huset, DSV/Stockholms universitet, Borgarfjordsgatan 8, Kista.
Zoom: https://stockholmuniversity.zoom.us/j/65276614265?from=addon
Abstract
Large Language Models (LLMs) have become central to modern digital life, underpinning applications such as conversational AI, content generation, and software debugging. At the core of these systems lie transformer-based architectures, which excel at modeling context and semantics. This makes them strong candidates for the future of text classification. However, despite their capabilities, LLMs remain largely opaque "black boxes" with limited explainability. They are also prone to hallucinations - the generation of plausible-sounding but factually incorrect outputs - particularly when faced with input scenarios not encountered during training (Perković et al., 2024; Reddy et al., 2024). This unreliability poses serious challenges in safety-critical domains such as healthcare and food regulation.
This dissertation half-time report addresses the challenge of untrustworthy LLM behavior in classification settings by pursuing four core objectives: (a) First, it focuses on the development and refinement of local explainability methods that can shed light on individual LLM decisions and help make their behavior more interpretable. Such methods could, for example, reveal that an LLM relies on a spurious correlation in the data rather than on the actual, causally linked information for its classification. This objective is addressed in PAPER II and PAPER IV, which specifically evaluate the usefulness of LLM-generated self-explanations and find that counterfactual self-explanations can be a fast, valid, and plausible candidate. (b) Second, it evaluates these methods not only from the perspective of end-user understanding but also as diagnostic tools to identify flaws in data, model training, or architecture, thereby enabling trustworthiness-by-design. For example, knowing that an LLM relies on spurious correlations for its classification, one can curate the fine-tuning data to eliminate such correlations. While improving the model itself will be addressed in the second half of the PhD, the thesis explores using explanations for diagnostic purposes in PAPER V. (c) Third, the work shifts away from post-hoc explanations toward inherently interpretable prompting backends that guide LLM behavior during classification. As retraining an entire LLM is infeasible for most ML practitioners due to the enormous data and hardware requirements, such backends are a more accessible and actionable part of prompting pipelines. Furthermore, techniques like Conformal Prediction (CP; Vovk et al., 2005) or Retrieval-Augmented Generation (RAG; Lewis et al., 2020) can be used to guide LLM content generation and reduce hallucinations. PAPER I started exploring such backend methods based on CP, and future work will also target the application of RAG for this purpose. (d) Finally, the research supports its empirical contributions through the curation of a publicly available, privacy-compliant dataset that enables reproducible experimentation (PAPER I, PAPER III). Together, these objectives contribute toward safer, more transparent, and more trustworthy LLM deployment in sensitive contexts.
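For readers unfamiliar with the conformal prediction backends the abstract mentions, the core idea can be sketched in a few lines. The snippet below is a minimal, illustrative split-conformal classification sketch, not the method of PAPER I: all calibration data is synthetic, and in practice the class scores would come from the LLM classifier itself.

```python
import math
import random

random.seed(0)
n_cal, n_classes = 200, 3

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical calibration set: random class scores and true labels.
cal = []
for _ in range(n_cal):
    scores = softmax([random.gauss(0, 1) for _ in range(n_classes)])
    label = random.randrange(n_classes)
    cal.append((scores, label))

# Nonconformity score: 1 - probability assigned to the true class.
nonconf = sorted(1.0 - scores[label] for scores, label in cal)

# Conformal quantile: with miscoverage level alpha, prediction sets
# contain the true label with roughly (1 - alpha) probability.
alpha = 0.1
k = math.ceil((n_cal + 1) * (1 - alpha))
q = nonconf[min(k, n_cal) - 1]

def prediction_set(scores):
    """All labels whose nonconformity stays below the threshold q."""
    return [c for c in range(n_classes) if 1.0 - scores[c] <= q]
```

Rather than forcing a single (possibly hallucinated) label, the backend returns a set of labels; an empty or large set signals low confidence, which is exactly the kind of behavior one wants in safety-critical classification.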
Best regards,
Tony Lindgren
Ph. D., Docent, Head of the Systems Analysis and Security Unit
Department of Computer and Systems Sciences
Stockholm University
Postbox 7003, 164 07 Kista, Sweden
Visiting address: Borgarfjordsgatan 12, Kista
Phone: +46-8-16 17 01
Mobile: +46-70-190 68 28
http://dsv.su.se
Tony Lindgren is inviting you to a scheduled Zoom meeting.
Join Zoom Meeting
https://stockholmuniversity.zoom.us/j/65276614265?from=addon
Meeting ID: 652 7661 4265
---
One tap mobile
+46850163827,,65276614265# Sweden
+46850500828,,65276614265# Sweden
---
Dial by your location
* +46 8 5016 3827 Sweden
* +46 8 5050 0828 Sweden
* +46 8 5050 0829 Sweden
* +46 8 5052 0017 Sweden
* +46 850 539 728 Sweden
* +46 8 4468 2488 Sweden
Meeting ID: 652 7661 4265
Find your local number: https://stockholmuniversity.zoom.us/u/cebEmSJCeE
---
Join by SIP
* 65276614265@109.105.112.236
* 65276614265@109.105.112.235
---
Join by H.323
* 109.105.112.236
* 109.105.112.235
Meeting ID: 652 7661 4265
Dear NLP group
Predoc seminar by Thomas Vakili, Wednesday June 4, 2025, 13:00-15:00 in room L30,
with the title "Preserving the Privacy of Clinical Language Models".
External reviewer: Elena Volodina, Gothenburg University
Main supervisor: Hercules Dalianis, DSV
Supervisor: Aron Henriksson, DSV
Professor closest to the subject: Panagiotis Papapetrou, DSV
Read Abstract
https://internt.dsv.su.se/sv/node/1833
Warm welcome
Hercules
_________________________________________________________________________
Dr. Hercules Dalianis, Professor
Department of Computer and Systems Sciences
DSV/Stockholm University
P.O. Box 7003, 164 07 Kista, Stockholm, Sweden
ph: +46 8 16 16 16
mobile ph: +46 70 568 13 59
email: hercules@dsv.su.se
www: http://www.dsv.su.se/hercules/
_________________________________________________________________________