ELLIS UniReps Speaker Series

We’re excited to launch a Speaker Series in collaboration with the European Laboratory for Learning and Intelligent Systems (ELLIS) community, focusing on key topics relevant to our community.

When, How, and Why Do Neural Models Learn Similar Representations?

The ELLIS UniReps Speaker Series explores the phenomenon of representational alignment, where different neural models, both biological and artificial, develop similar internal representations when exposed to comparable stimuli. This raises key theoretical and practical questions:

  • When do similar representations emerge across models?
  • Why does this alignment occur, and what underlying principles drive it?
  • How can we leverage this alignment for applications such as model merging, model re-use, and fine-tuning?

Each monthly session features two talks:

  • 🔵 Keynote talk – A broad overview by a senior researcher, providing context on a key topic.
  • 🔴 Flash talk – A focused presentation by an early-career researcher (such as a PhD student or postdoc), highlighting recent findings or ongoing work.

You can nominate yourself or another researcher as a speaker by filling out our nomination form.

The series provides a platform for early-career researchers to share their work and fosters interdisciplinary discussions across deep learning, neuroscience, cognitive science, and mathematics.

Join the speaker series Google group here. In addition, you can follow the latest updates on our Twitter and BlueSky profiles!

Below you can find the calendar of upcoming appointments:

Calendar

July 8th Appointment

  • 🗓️ When: 8th July 2025 – 16:00 CET
  • 📍 Where: Zoom link
  • 🎙️ Keynote: Victor Veitch (Google DeepMind/University of Chicago)
    • Title: TBD
    • Abstract: TBD
  • 🎙️ Flash Talk: Luigi Gresele (University of Copenhagen)
    • Title: All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling
    • Abstract: We analyze identifiability as a possible explanation for the ubiquity of linear properties across language models, such as the vector difference between the representations of “easy” and “easiest” being parallel to that between “lucky” and “luckiest”. For this, we ask whether finding a linear property in one model implies that any model that induces the same distribution has that property, too. To answer that, we first prove an identifiability result to characterize distribution-equivalent next-token predictors, lifting a diversity requirement of previous results. Second, based on a refinement of relational linearity [Paccanaro and Hinton, 2001; Hernandez et al., 2024], we show how many notions of linearity are amenable to our analysis. Finally, we show that under suitable conditions, these linear properties either hold in all or none distribution-equivalent next-token predictors. This talk is based on joint work with Emanuele Marconato, Sébastien Lachapelle and Sebastian Weichwald.
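
As a purely illustrative aside on the linear properties mentioned in the flash-talk abstract above, the sketch below checks whether the difference vector from “easy” to “easiest” is roughly parallel to the one from “lucky” to “luckiest” in a small open model’s embedding space. The choice of GPT-2 and the use of averaged sub-word input embeddings are simplifying assumptions made here; they are not the representations or the identifiability setting analyzed in the talk.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small open model used purely for illustration (an assumption, not the talk's setting).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
embeddings = model.get_input_embeddings()  # the model's token-embedding matrix

def word_vector(word: str) -> torch.Tensor:
    # Represent a word as the average of its sub-word input embeddings.
    ids = tokenizer(" " + word, add_special_tokens=False)["input_ids"]
    return embeddings(torch.tensor(ids)).mean(dim=0)

with torch.no_grad():
    diff_easy = word_vector("easiest") - word_vector("easy")
    diff_lucky = word_vector("luckiest") - word_vector("lucky")
    cos = torch.nn.functional.cosine_similarity(diff_easy, diff_lucky, dim=0)

print(f"cosine similarity of the two difference vectors: {cos.item():.3f}")
```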

June Appointment

  • 🗓️ When: 20th June 2025 – 16:00 CET
  • 📍 Where: Zoom link
  • 📹 Meeting recording
  • 🎙️ Keynote: Mariya Toneva (Max Planck Institute for Software Systems)
    • Title: Aligning Language Models to the Human Brain
    • Abstract: In this talk, I will introduce brain-tuning, a method that aligns language models to the human brain by fine-tuning language models with brain data recorded while individuals listen to natural speech. Despite using fMRI data that corresponds to less than 1% of the models’ pretraining data, brain-tuning 1) improves alignment with semantic brain regions, 2) reduces reliance on low-level features for this alignment, and 3) excitingly, substantially improves performance on semantic downstream tasks. Together, this method and these findings strengthen the utility of speech language models as model organisms of language in the brain, and provide new opportunities for cross-pollination between cognitive neuroscience and AI.
  • 🎙️ Flash Talk: Lenka Tětková (Technical University of Denmark)
    • Title: On convex decision regions in deep network representations
    • Abstract: How aligned are machine representations with the way humans understand concepts? In this talk, I’ll explore this question through the lens of convexity in machine-learned latent spaces—a property long studied in cognitive science for its role in generalization, few-shot learning, and communication. Inspired by Gärdenfors’ theory of conceptual spaces, we develop new tools to measure convexity in real-world model representations and apply them across layers of state-of-the-art deep networks. We find that many concept regions — across domains like vision, language, audio, and even medical data — are approximately convex. What’s more, convexity tends to increase with fine-tuning and can even predict fine-tuning performance in pretrained models. These results suggest that convexity is a meaningful, robust property of learned representations, with implications for improving generalization and understanding human-machine alignment.
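
As a rough intuition for the convexity measure discussed in the flash-talk abstract above, here is a deliberately simplified Euclidean probe: for pairs of latent points assigned to the same concept, it checks how often points on the connecting segment keep that label. The synthetic latent space and the nearest-centroid concept decoder are placeholders chosen for brevity; the graph-based convexity measure in the talk differs in its details.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestCentroid

rng = np.random.default_rng(0)

# Synthetic "latent space": four separated concept clusters in 16 dimensions (placeholder data).
X, y = make_blobs(n_samples=600, n_features=16, centers=4, random_state=0)
decoder = NearestCentroid().fit(X, y)  # simple stand-in for a concept classifier

def convexity_score(X, y, n_pairs=500, n_interior=5):
    # Fraction of interior points on same-concept segments that keep the concept label.
    hits, total = 0, 0
    for _ in range(n_pairs):
        concept = rng.integers(y.max() + 1)
        i, j = rng.choice(np.where(y == concept)[0], size=2, replace=False)
        for t in np.linspace(0.0, 1.0, n_interior + 2)[1:-1]:
            point = (1 - t) * X[i] + t * X[j]
            hits += int(decoder.predict(point[None, :])[0] == concept)
            total += 1
    return hits / total

print(f"approximate convexity of concept regions: {convexity_score(X, y):.1%}")
```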
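
On the keynote side, a common way brain alignment is quantified in this line of work is with an encoding model: fit a ridge regression from a model layer’s activations to fMRI responses and score held-out prediction correlations per voxel. The sketch below shows that recipe on synthetic data; the shapes, the placeholder activations, and the encoding-model choice are assumptions for illustration and need not match the exact pipeline used in the talk.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_trs, n_features, n_voxels = 500, 128, 50  # fMRI time points, model features, voxels (assumed sizes)

layer_acts = rng.normal(size=(n_trs, n_features))       # placeholder model activations per fMRI time point
true_weights = rng.normal(size=(n_features, n_voxels))  # synthetic ground-truth mapping
fmri = layer_acts @ true_weights + 5.0 * rng.normal(size=(n_trs, n_voxels))  # noisy synthetic responses

X_tr, X_te, Y_tr, Y_te = train_test_split(layer_acts, fmri, test_size=0.2, random_state=0)
encoder = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(X_tr, Y_tr)
pred = encoder.predict(X_te)

# Alignment score: mean Pearson correlation between predicted and measured voxel responses.
corrs = [np.corrcoef(pred[:, v], Y_te[:, v])[0, 1] for v in range(n_voxels)]
print(f"mean voxel-wise prediction correlation: {np.mean(corrs):.3f}")
```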

May Appointment

  • 🗓️ When: 29th May 2025 – 16:30 CET
  • 📍 Where: Zoom link
  • 📹 Meeting recording
  • 🎙️ Keynote: Andrew Lampinen (Google DeepMind)
    • Title: Representation Biases: when aligned representations do not imply aligned computations
    • Abstract: We often study a system’s representations to learn about its computations, or intervene on its representations to try to fix it. However, the relationship between representation and computation is not always straightforward. In this talk, I will discuss a recent paper (https://openreview.net/forum?id=aY2nsgE97a) in which we study this relationship in controlled settings. We find that feature representations are substantially biased towards certain types of features (linear over nonlinear, prevalent over less prevalent), even when the features play an equivalent computational role in the model’s outputs. These phenomena hold across a wide range of models and tasks. I will discuss implications of these feature biases for downstream analyses like regression and RSA, and their relation to our recent finding that simplifying models for analysis may not generalize well out of distribution (https://openreview.net/forum?id=YJWlUMW6YP). These results raise important questions over how to interpret and use representational analysis tools.
  • 🎙️ Flash Talk: Jack Lindsey (Anthropic)
    • Title: On the Biology of a Large Language Model
    • Abstract: In this talk, I’ll describe a new method for revealing mechanisms in language models. First, we train a “replacement model” that substitutes the model’s neurons with sparsely active “features” which are easier to interpret. Then, for a given model input/output, we summarize the intermediate computational steps taken by the model with an interactive attribution graph, which depicts causal interactions between features. We apply attribution graphs to study phenomena of interest in a production-scale language model, including multi-step computations, planning, unfaithful reasoning, hallucinations, and hidden motivations.
    • 💻 Codebase; Interface; Explanation
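
To give a flavor of the “sparsely active features” mentioned in the flash-talk abstract above, the sketch below trains a minimal sparse-autoencoder-style stand-in on toy activations and reports how sparse the learned features end up. The toy data, layer sizes, and the plain L1-penalized autoencoder are assumptions for illustration only; the replacement model and attribution graphs described in the talk are substantially more involved.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model, d_features = 64, 512      # assumed sizes, far from a production-scale model
acts = torch.randn(4096, d_model)  # placeholder activations standing in for a model's internals

class SparseFeatures(nn.Module):
    # Minimal sparse-autoencoder-style layer: encode activations into many
    # non-negative features, decode back, and penalize feature activity (L1).
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        features = torch.relu(self.encoder(x))
        return self.decoder(features), features

sae = SparseFeatures(d_model, d_features)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_weight = 1e-3

for _ in range(200):
    recon, features = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_weight * features.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

recon, features = sae(acts)
fraction_active = (features > 0).float().mean().item()
print(f"fraction of features active per input: {fraction_active:.3f}")
```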