Hi all,
Here is a reminder about Korbinian’s half-time seminar tomorrow at 1pm in L30 with the
title: "Towards Trustworthy Classification with Large Language Models”.
Best,
Aron
Begin forwarded message:
From: Tony Lindgren <tony(a)dsv.su.se>
Subject: Korbinian half-time seminar
Date: 26 May 2025 at 15:55:47 CEST
To: Peter Idestam-Almquist <pi(a)dsv.su.se>, Steffen Eger <steffen.eger(a)utn.de>,
John Pavlopoulos <annis(a)aueb.gr>, Ioannis Pavlopoulos <ioannis(a)dsv.su.se>,
Aron Henriksson <aronhen(a)dsv.su.se>
Cc: "datascience(a)dsv.su.se" <datascience(a)dsv.su.se>se>,
"nlp(a)dsv.su.se" <nlp(a)dsv.su.se>
Welcome to the half-time seminar of Korbinian Randl with the title "Towards
Trustworthy Classification with Large Language Models".
External reviewer: Professor Steffen Eger, University of Technology Nuremberg (UTN),
Germany.
Internal reviewer: Dr. Peter Idestam-Almquist, DSV, Stockholm University
Main supervisor: Associate professor Tony Lindgren, DSV, Stockholm University
Supervisors:
Associate professor Aron Henriksson, DSV, Stockholm University
Assistant professor John Pavlopoulos, Department of Informatics, Athens University of
Economics and Business, Greece
Time: Tuesday 10th of June, 13:00-16:00 (CEST)
Place: L30, NOD-huset, DSV/Stockholm University, Borgarfjordsgatan 8, Kista.
Zoom:
https://stockholmuniversity.zoom.us/j/65276614265?from=addon
Abstract
Large Language Models (LLMs) have become central to modern digital life, underpinning
applications such as conversational AI, content generation, and software debugging. At the
core of these systems lie transformer-based architectures, which excel at modeling context
and semantics. This makes them strong candidates for the future of text classification.
However, despite their capabilities, LLMs remain largely opaque “black boxes” with limited
explainability. They are also prone to hallucinations – the generation of
plausible-sounding but factually incorrect outputs – particularly when faced with input
scenarios not encountered during training (Perković et al., 2024; Reddy et al., 2024).
This unreliability poses serious challenges in safety-critical domains such as healthcare
and food regulation.
This dissertation half-time report addresses the challenge of untrustworthy LLM behavior
in classification settings by pursuing four core objectives: (a) First, it focuses on the
development and refinement of local explainability methods that can shed light on
individual LLM decisions and help make their behavior more interpretable. Such systems
could, for example, uncover that an LLM relies on some spurious correlation in the data
rather than the actual, causally linked information for its classification. This
objective is addressed in PAPER II and PAPER IV, which specifically evaluate the usefulness
of LLM-generated self-explanations and find that counterfactual self-explanations can be a
fast, valid, and plausible candidate. (b) Second, it evaluates these methods not only from
the perspective of end-user understanding but also as diagnostic tools to identify flaws
in data, model training, or architecture—thereby enabling trustworthiness-by-design. For
example, knowing that an LLM relies on spurious correlations for its classification, one
can curate the data used for fine-tuning and eliminate such correlations.
While the improvement of the model will need to be addressed in the second half of the
PhD, the thesis explores using explanations for diagnostic purposes in PAPER V. (c) Third,
the work shifts away from post-hoc explanations toward inherently interpretable prompting
backends that guide LLM behavior during classification. As retraining an entire LLM is
infeasible for most ML practitioners due to the enormous data and hardware requirements,
such backends
are a more accessible and actionable part of prompting pipelines. Furthermore, techniques
like Conformal Prediction (CP; Vovk et al., 2005) or Retrieval Augmented Generation (RAG; Lewis
et al., 2020) can be used to guide LLM content generation and reduce hallucinations.
PAPER I started exploring such backend methods based on CP, but future work will also
target the application of RAG for such purposes. (d) Finally, the research supports its
empirical contributions
through the curation of a publicly available, privacy-compliant dataset that enables
reproducible experimentation (PAPER I, PAPER III). Together, these objectives contribute
toward safer, more transparent, and more trustworthy LLM deployment in sensitive contexts.
Best regards,
Tony Lindgren
Ph. D., Docent, Head of the Systems Analysis and Security Unit
Department of Computer and Systems Sciences
Stockholm University
Postbox 7003, 164 07 Kista, Sweden
Visiting address: Borgarfjordsgatan 12, Kista
Phone: +46-8-16 17 01,
Mobile: +46-70-190 68 28
http://dsv.su.se
Tony Lindgren is inviting you to a scheduled Zoom meeting.
Join Zoom Meeting
https://stockholmuniversity.zoom.us/j/65276614265?from=addon
Meeting ID: 652 7661 4265
---
One tap mobile
+46850163827,,65276614265# Sweden
+46850500828,,65276614265# Sweden
---
Dial by your location
• +46 8 5016 3827 Sweden
• +46 8 5050 0828 Sweden
• +46 8 5050 0829 Sweden
• +46 8 5052 0017 Sweden
• +46 850 539 728 Sweden
• +46 8 4468 2488 Sweden
Meeting ID: 652 7661 4265
Find your local number:
https://stockholmuniversity.zoom.us/u/cebEmSJCeE
---
Join by SIP
• 65276614265@109.105.112.236
• 65276614265@109.105.112.235
---
Join by H.323
• 109.105.112.236
• 109.105.112.235
Meeting ID: 652 7661 4265