ELLIS UniReps Speaker Series

We’re excited to launch a Speaker Series in collaboration with the European Laboratory for Learning and Intelligent Systems (ELLIS) community, focusing on key topics relevant to our field.

When, How, and Why Do Neural Models Learn Similar Representations?

The ELLIS UniReps Speaker Series explores the phenomenon of representational alignment, where different neural models—both biological and artificial—develop similar internal representations when exposed to comparable stimuli. This raises key theoretical and practical questions:

  • When do similar representations emerge across models?
  • Why does this alignment occur, and what underlying principles drive it?
  • How can we leverage this alignment to explore applications such as model merging, model re-use, and fine-tuning?

Each monthly session features two talks:

  • 🔵 Keynote talk – A broad overview by a senior researcher, providing context on a key topic.
  • 🔴 Flash talk – A focused presentation by an early-career researcher (such as a PhD student or postdoc), highlighting recent findings or ongoing work.

You can nominate yourself or another researcher as a speaker by filling out our nomination form.

The series provides a platform for early-career researchers to share their work and fosters interdisciplinary discussions across deep learning, neuroscience, cognitive science, and mathematics.

Join the speaker series Google group here. In addition, you can follow the latest updates on our Twitter and BlueSky profiles!

Below you can find the calendar of upcoming scheduled appointments:

Calendar

February Appointment

  • 🗓️ When: 26th February 2026 – 16:00 CET
  • 📍 Where: Zoom link
  • 🎙️ Keynote: Antonio Orvieto (ELLIS Institute Tübingen and MPI)
    • Title: Improving Capabilities of Efficient Foundation Models beyond Pure Language Modeling
    • Abstract: In this talk, we discuss recent alternatives to Transformers based on linear RNNs and linear attention. While these new models (e.g., Mamba, Hyena, DeltaNet, RWKV) offer improved throughput and extremely long context processing abilities (e.g., the entire human genome), they often suffer from poor trainability and compute/expressivity tradeoffs. After providing an introduction to these architectures, we present central results on universality and complexity in the literature (e.g., https://arxiv.org/abs/2402.19047). Building on the presented insights and motivated by the need to reason beyond next-token prediction, we discuss a new architecture, Fixed-Point Mamba (spotlight at NeurIPS 2025; https://arxiv.org/abs/2503.10799), which can automatically enhance expressivity through an adaptive compute mechanism. We also discuss new insights into improving the capabilities of recurrent models using complex-number algebra: our new Selective RoPE architecture (ICLR 2026, https://arxiv.org/abs/2511.17388).
  • 🎙️ Flash Talk: Sara Kangaslahti (Harvard University)
    • Title: Boomerang Distillation Enables Zero-Shot Model Size Interpolation
    • Abstract: Large language models (LLMs) are typically deployed under diverse memory and compute constraints. Existing approaches build model families by training each size independently, which is prohibitively expensive and provides only coarse-grained size options. In this work, we identify a novel phenomenon that we call boomerang distillation: starting from a large base model (the teacher), one first distills down to a small student and then progressively reconstructs intermediate-sized models by re-incorporating blocks of teacher layers into the student without any additional training. This process produces zero-shot interpolated models of many intermediate sizes whose performance scales smoothly between the student and teacher, often matching or surpassing pretrained or distilled models of the same size. We further analyze when this type of interpolation succeeds, showing that alignment between teacher and student through pruning and distillation is essential. Boomerang distillation thus provides a simple and efficient way to generate fine-grained model families, dramatically reducing training cost while enabling flexible adaptation across deployment environments.
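
The layer re-incorporation described in the abstract can be sketched in miniature. This is a hedged toy illustration, not the paper's implementation: `make_teacher`, `make_student`, and `interpolate` are hypothetical names, and the "layers" are plain arithmetic functions rather than transformer blocks; the only point is the mechanism of splicing aligned teacher layers back into a distilled student to obtain intermediate sizes without training.

```python
# Toy sketch of boomerang-style model size interpolation.
# All names are hypothetical; "layers" are simple functions on numbers.

def make_teacher():
    # Six teacher "layers": layer k adds k to its input.
    return [lambda x, k=k: x + k for k in range(6)]

def make_student(teacher, keep=(0, 2, 4)):
    # A distilled student keeps an aligned subset of layer positions;
    # in the paper, this alignment comes from pruning and distillation.
    return {pos: teacher[pos] for pos in keep}

def interpolate(teacher, student, reinserted):
    # Build an intermediate model: student layers plus blocks of teacher
    # layers re-incorporated at the given positions, with no retraining.
    positions = sorted(set(student) | set(reinserted))
    return [teacher[p] if p in reinserted else student[p] for p in positions]

def run(model, x):
    for layer in model:
        x = layer(x)
    return x

teacher = make_teacher()
student = make_student(teacher)
# Re-incorporating teacher layers 1 and 3 yields a 5-layer model whose
# size sits between the 3-layer student and the 6-layer teacher.
mid = interpolate(teacher, student, reinserted={1, 3})
print(len(student), len(mid), len(teacher))  # 3 5 6
```

Varying `reinserted` gives a whole family of intermediate sizes from a single teacher–student pair, which is the "fine-grained model families" point of the abstract.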

January Appointment

  • 🗓️ When: 15th January 2026 – 16:00 CET
  • 📍 Where: Zoom link
  • 📹 Meeting recording
  • 🎙️ Keynote: Noa Garcia (Osaka University)
    • Title: Evaluation in visual recognition: What are we really measuring?
    • Abstract: Evaluation practices are fundamental to machine learning research. They determine whether our models work as intended, generalize to new data, and adhere to scientific principles, all through the design of trustworthy datasets and benchmarks. In this talk, we ask whether current evaluation methods actually meet these goals. Focusing on visual recognition, we examine how data leakage, bias, and shortcut learning may be undermining the reliability of experimental results.
  • 🎙️ Flash Talk: Thaddäus Wiedemer (MPI-IS Tübingen, Google DeepMind)
    • Title: Video models are zero-shot learners and reasoners
    • Abstract: The remarkable zero-shot capabilities of Large Language Models (LLMs) have propelled natural language processing from task-specific models to unified, generalist foundation models. This transformation emerged from simple primitives: large, generative models trained on web-scale data. Curiously, the same primitives apply to today’s generative video models. Could video models be on a trajectory towards general-purpose vision understanding, much like LLMs developed general-purpose language understanding? We demonstrate that Veo 3 can solve a broad variety of tasks it wasn’t explicitly trained for: segmenting objects, detecting edges, editing images, understanding physical properties, recognizing object affordances, simulating tool use, and more. These abilities to perceive, model, and manipulate the visual world enable early forms of visual reasoning like maze and symmetry solving. Veo’s emergent zero-shot capabilities indicate that video models are on a path to becoming unified, generalist vision foundation models.

December Appointment

  • 🗓️ When: 18th December 2025 – 16:00 CET
  • 📍 Where: Zoom link
  • 📹 Meeting recording
  • 🎙️ Keynote: Meenakshi Khosla (UC San Diego)
    • Title: Comparative Analysis of Neural Representations: Tools, Limits, and Emerging Principles
    • Abstract: Comparing neural representations across brains and artificial models has become a central tool for understanding intelligence. Yet the field relies on a small set of similarity metrics whose assumptions, invariances, and failure modes are often poorly understood. In this talk, I will present recent work aimed at advancing comparative analysis along four complementary axes. First, I introduce new tools that capture previously overlooked dimensions of representational structure, revealing trends in alignment that standard measures miss. Second, I benchmark and stress-test widely used comparison methods, revealing when they succeed, when they fail, and what kinds of structure they implicitly privilege or ignore. Third, I apply these tools at scale across diverse biological and artificial systems to probe patterns of alignment, divergence, and potential universality. Finally, I investigate the mechanisms that drive representational convergence in the first place, asking why different systems arrive at similar solutions. Together, this work argues for a more principled approach to representational comparisons and highlights the importance of understanding our measurements alongside the systems we seek to compare.
  • 🎙️ Flash Talk: Raj Magesh Gauthaman (Johns Hopkins University)
    • Title: Universal scale-free representations in human visual cortex
    • Abstract: How does the human brain encode complex visual information? While previous research has characterized individual dimensions of visual representation in cortex, we still lack a comprehensive understanding of how visual information is organized across the full range of neural population activity. Here, analyzing fMRI responses to natural scenes across multiple individuals, we discover that neural representations in human visual cortex follow a remarkably consistent scale-free organization—their variance decay is consistent with a power-law distribution, detected across four orders of magnitude of latent dimensions. This scale-free structure appears consistently across multiple visual regions and across individuals, suggesting it reflects a fundamental organizing principle of visual processing. Critically, when we align neural responses across individuals using hyperalignment, we find that these representational dimensions are largely shared between people, revealing a universal high-dimensional spectrum of visual information that emerges despite individual differences in brain anatomy and visual experience. Traditional analysis approaches in cognitive neuroscience have focused primarily on a small number of high-variance dimensions, potentially missing crucial aspects of visual representation. Our results demonstrate that visual information is distributed across the full dimensionality of cortical activity in a systematic way, thus revealing a key property of neural coding in visual cortex. These findings suggest that we need to move beyond low-dimensional characterizations to fully understand how the brain represents the visual world.

November Appointment

  • 🗓️ When: 18th November 2025 – 16:00 CET
  • 📍 Where: Zoom link
  • 📹 Meeting recording
  • 🎙️ Keynote: Yilun Du (Harvard University)
    • Title: Generalizing Outside the Training Distribution with Compositional Generative Models
    • Abstract: Generative models are typically trained to directly fit a probability distribution given a dataset of samples. I’ll introduce the idea of compositional generative modeling, where we decompose generative models into simpler components which we recombine to fit more complex distributions we do not explicitly have data for. To accomplish this, I’ll introduce Energy-Based Models and the algebra through which they can be composed. Finally, I’ll illustrate how this approach allows us to generalize and solve complex visual generation, reasoning, and planning problems.
  • 🎙️ Flash Talk: Sihyun Yu (KAIST)
    • Title: Representation Matters in Training Diffusion Transformers
    • Abstract: In this talk, I’ll introduce a simple yet effective regularization method called REPresentation Alignment (REPA) for training diffusion transformers (DiTs). By aligning the projected noisy input hidden states in DiTs with clean image representations obtained from external pretrained visual encoders, we demonstrate that this straightforward approach significantly improves both training efficiency and generation quality when applied to popular diffusion- and flow-based transformers such as DiTs and SiTs. In addition, I’ll discuss how REPA has reshaped the way we train DiTs under the “representation-for-generation” paradigm, including extensions such as REPA-E, VA-VAE, and RAE.
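
The alignment idea behind REPA can be caricatured in a few lines. This is a hedged sketch under simplifying assumptions, not the method's actual implementation: `align_loss` is a hypothetical name, plain Python lists stand in for projected DiT hidden states and pretrained-encoder features, and the real method operates on patch-level features inside a full training loop.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def align_loss(hidden_states, encoder_features):
    # REPA-style regularizer (sketch): push each projected noisy hidden
    # state toward the clean encoder feature of the same image by
    # minimizing 1 - cosine similarity, averaged over the batch.
    losses = [1.0 - cosine(h, z) for h, z in zip(hidden_states, encoder_features)]
    return sum(losses) / len(losses)

# Parallel (aligned) vectors give ~zero loss; orthogonal ones give 1.
print(round(align_loss([[1.0, 2.0]], [[2.0, 4.0]]), 6))  # 0.0
print(round(align_loss([[1.0, 0.0]], [[0.0, 1.0]]), 6))  # 1.0
```

In training, such a term would be added to the usual diffusion/flow objective with a weighting coefficient, so the generative loss and the representation-alignment regularizer are minimized jointly.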

October 2nd Appointment

  • 🗓️ When: 28th October 2025 – 16:00 CET
  • 📍 Where: Zoom link
  • 📹 Meeting recording
  • 🎙️ Keynote: Stefano Fusi (Zuckerman Institute, Columbia)
    • Title: The dynamics of the geometry of abstraction
    • Abstract: Neurons in the mammalian brain often exhibit complex, non-linear responses to multiple task variables (mixed selectivity). Despite the diversity of these responses, which are seemingly disorganized, it is often possible to observe an interesting structure in the representational geometry: task-relevant variables are encoded in approximately orthogonal subspaces in the neural activity space. This encoding is a signature of low-dimensional disentangled representations; it is typically the result of a process of abstraction and allows linear readouts to readily generalize to novel situations. We show that these representations are observed in multiple brain areas in human and non-human primates. We then studied how the geometry changes during the decision-making process in 5 different brain areas (the hippocampus, dorsolateral prefrontal cortex, anterior cingulate cortex, orbitofrontal cortex, and the amygdala) of non-human primates, and how the analysis of the geometry dynamics can be used to understand the underlying neural mechanisms. We finally show how the representational geometry changes with learning in humans. This work is a collaboration with the Salzman and Rutishauser groups.
  • 🎙️ Flash Talk: Daniel Kunin (UC Berkeley)
    • Title: Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
    • Abstract: What features artificial neural networks learn, and how, remains an open question. I’ll introduce Alternating Gradient Flows (AGF), an algorithmic framework that describes the dynamics of feature learning in two-layer networks trained from small initialization. Prior works have shown that gradient flow in this regime exhibits a staircase-like loss curve, alternating between plateaus where neurons slowly align to useful directions and sharp drops where neurons rapidly grow in norm. AGF approximates this behavior as an alternating two-step process: maximizing a utility function over dormant neurons and minimizing a cost function over active ones. We show that AGF unifies and extends existing saddle-to-saddle analyses in fully connected linear networks and attention-only linear transformers, where the learned features are singular modes and principal components, respectively. In diagonal linear networks, we prove AGF converges to gradient flow in the limit of vanishing initialization. Applying AGF to quadratic networks trained to perform modular addition, we give the first complete characterization of the training dynamics, revealing that networks learn Fourier features in decreasing order of coefficient magnitude.

October 1st Appointment

  • 🗓️ When: 8th October 2025 – 16:00 CET
  • 📍 Where: Zoom link
  • 🎙️ Keynote: Maximilian Seitzer (Meta)
    • Title: DINOv3: Training Vision Foundation Models with Large-Scale Self-Supervised Learning
    • Abstract: The promise of self-supervised learning (SSL) is to utilize large unlabelled collections of data to obtain foundational representations useful across a broad range of tasks. DINOv3 makes a large step towards realizing this promise. In this talk, I will discuss the ingredients necessary to scale SSL training to 7B parameters and 1.7B images, including choices on data, modeling, and post-training. In particular, I will introduce the “Gram anchoring” strategy that effectively recovers high quality dense features degrading during long training. The resulting model, DINOv3, achieves state-of-the-art performance across a wide range of benchmarks without requiring fine-tuning, surpassing both self- and weakly-supervised models. Furthermore, the same algorithm applied to satellite imagery obtains state-of-the-art results on geospatial tasks, highlighting the potential of SSL across domains.
  • 🎙️ Flash Talk: Joséphine Raugel (Meta)
    • Title: What made them like us? Disentangling the factors of convergence between brains and computer vision models
    • Abstract: Many AI models trained on natural images develop representations that resemble those of the human brain. However, the exact factors that drive this brain-model similarity remain poorly understood. In order to disentangle how the model architecture, training recipe and data type independently lead a neural network to develop brain-like representations, we trained a family of self-supervised vision transformers (DINOv3) that systematically varied these different factors. We compare their representations of natural images to those of the human brain recorded with both ultra-high field functional magnetic resonance imaging (fMRI) and magneto-encephalography (MEG), providing high resolution in spatial and temporal analyses. We assess the brain-model similarity with three complementary metrics focusing on overall representational similarity, topographical organization, and temporal dynamics. We show that all three factors – model size, training amount, and image type – independently and interactively impact each of these brain similarity metrics. In particular, the largest DINOv3 models trained with the largest amount of human-centric images reach the highest brain-similarity scores. Importantly, this emergence of brain-like representations in AI models follows a specific chronology during training: models first align with the early representations of the sensory cortices, and only align with the late and prefrontal representations of the brain with considerably more training data. Finally, this developmental trajectory is indexed by both structural and functional properties of the human cortex: the representations that are acquired last by the models specifically align with the cortical areas with the largest developmental expansion, the largest thickness, the least myelination, and the slowest timescales. 
Overall, these findings disentangle the interplay between architecture and experience in shaping how artificial neural networks come to see the world as humans do, thus offering a promising framework to understand how the human brain comes to represent its visual world.

September Appointment

  • 🗓️ When: 16th September 2025 – 16:00 CET
  • 📍 Where: Zoom link
  • 📹 Meeting recording
  • 🎙️ Keynote: Jacqueline Gottlieb (Zuckerman Institute, Columbia)
    • Title: Meta-Level Control of Learning and Information Foraging in Natural Settings
    • Abstract: Over the past centuries, neuroscience and psychology have examined behavior in highly simplified settings in which relevant information is carefully curated and given to participants by default. In natural settings, however, we face an existential problem of information selection: we must autonomously decide, based on the practically infinite set of information that confronts us from the environment and memory, when and to what to attend, when and what to commit to memory, and when and what to learn. I propose that challenges of information selection are resolved by a meta-level controller that monitors processes of attention, learning and memory and decides when and how to recruit them based on their anticipated benefits and computational costs. I will discuss a biologically plausible model that implements reinforcement meta-level control (RML) in a circuit comprising the anterior cingulate cortex and the neuromodulators dopamine and norepinephrine. I will discuss the power of the model to explain empirical data (behavior and single-neuron activity) on uncertainty-based attention control, and its implications for the detection of learnability – the ability to detect learnable patterns given unlabeled mixtures of true and random associations that characterize natural settings.
  • 🎙️ Flash Talk: William Dorrell (UCL Gatsby)
    • Title: Normative Theory of Structured Working Memory Representations in PFC
    • Abstract: The prefrontal cortex is hypothesised to form the mind’s workspace. Among its putative functions, two well-supported ideas are working memory and the representation of structural schemas. Building on observations going back decades, recent work has examined prefrontal representations that combine working memory and structural components, such as in a sequence working memory task: you see a sequence of stimuli ABC and have to recall them (working memory) in the order presented (schema). These representations show intriguing structure: each memory is encoded in a different subspace within the neural activity, and the structural layout of the trial is embedded in the relationship between the subspaces. Modelling work has shown that simple recurrent neural network (RNN) models trained on the same tasks develop similar representations. This prompts a normative question: why do both PFC and RNNs learn this representation? And what determines the intricate structure of these representations; for example, why are the subspaces aligned in some tasks but orthogonal in others? We develop a normative theory to reason about these schematic memory representations. Our theory studies the optimal representation for structured working memory tasks under biologically relevant constraints. By studying the optimal codes applied to different settings, we are able to relate representational differences to task structure; for example, subspace alignment arises from the correlations between memories. Similarly, we reason about the optimal implementation of different algorithms, allowing us to infer the underlying algorithm from the subtle subspace choices the brain makes, in ways that are impossible from a more naive analysis. In sum, we are able to precisely frame the computation that might arise from these representations, and use it to reason about how neurons should instantiate such computations.
Further, the theory is structurally identical to recent normative grid cell theories, demonstrating its use as a shared normative framework for reasoning about the cortical implementation of algorithms and representations.

August Appointment

  • 🗓️ When: 27th August 2025 – 16:00 CET
  • 📍 Where: Zoom link
  • 📹 Meeting recording
  • 🎙️ Keynote: Leland McInnes (Tutte Institute)
    • Title: Rethinking Unsupervised Learning
    • Abstract: With the vast troves of unlabelled data unlocked by neural embeddings, classical unsupervised learning techniques are finding new life. Unfortunately, these techniques are often ill-equipped to deal with high-dimensional embedding vectors. Why is high-dimensional space different? How do we build new techniques that can work more effectively?
  • 🎙️ Flash Talk: Yu (Demi) Qin (National Renewable Energy Laboratory)
    • Title: Learning to Compare Complex Shapes in Data — 100× Faster with Merge Tree Neural Networks
    • Abstract: Understanding the shape of data matters across many fields; for scalar fields, merge trees capture that shape. In this talk, I’ll show how learning-based methods make these comparisons fast and accurate. We present the Merge Tree Neural Network (MTNN), which learns to compare topological summaries of scalar fields with both speed and precision. MTNN maps merge trees to vector embeddings using a graph neural network and adds a topological attention mechanism that highlights structure-critical nodes. Across real datasets, MTNN delivers over 100× speedup with <0.1% error, making topological comparison practical at scale. This work received the IEEE VIS 2024 Best Paper Award. Paper Link

July 31st Appointment

  • 🗓️ When: 31st July 2025 – 16:00 CET
  • 📍 Where: Zoom link
  • 📹 Meeting recording
  • 🎙️ Keynote: Bastian Rieck (University of Fribourg)
    • Title: Shapes, Spaces, Simplices, and Structure: Geometry, Topology, and Machine Learning
    • Abstract: A large driver contributing to the undeniable success of deep-learning models is their ability to synthesise task-specific features from data. For a long time, the predominant belief was that ‘given enough data, all features can be learned.’ However, as large language models are hitting diminishing returns in output quality while requiring an ever-increasing amount of training data and compute, new approaches are required. One promising avenue involves focusing more on aspects of modelling, which involves the development of novel inductive biases such as invariances that cannot be readily gleaned from the data. This approach is particularly useful for data sets that model real-world phenomena, as well as applications where data availability is scarce. Given their dual nature, geometry and topology provide a rich source of potential inductive biases. In this talk, I will present novel advances in harnessing multi-scale geometrical-topological characteristics of data. A special focus will be given to show how geometry and topology can improve representation learning tasks. Underscoring the generality of a hybrid geometrical-topological perspective, I will furthermore showcase applications from a diverse set of data domains, including point clouds, graphs, and higher-order combinatorial complexes.
  • 🎙️ Flash Talk: Florentin Guth (NYU)
    • Title: On the universality of neural encodings in CNNs
    • Abstract: Deep networks achieve remarkable performance on many high-dimensional datasets, yet we cannot answer simple questions about what they have learned. For instance, do they learn the same “features” no matter their initialization? What about when we change the architecture or the training dataset? I will show how to meaningfully compare weights of deep networks using an alignment procedure on their hidden layers. We find that CNNs trained on image classification tasks share a common set of universal features, even for deep layers. These results explain, at a more fundamental level, observed similarities in neural representations and the success of transfer learning, and pave the way for principled foundation models. Paper Link

July 8th Appointment

  • 🗓️ When: 8th July 2025 – 16:00 CET
  • 📍 Where: Zoom link
  • 📹 Meeting recording
  • 🎙️ Keynote: Victor Veitch (Google DeepMind/University of Chicago)
    • Title: The Geometry of Large Language Models
    • Abstract: I’ll discuss some results around how language models encode semantic relationships familiar to humans as geometric relationships between their activations. We will focus in particular on the ideas of causal separability—concepts like language and subject can be freely varied—and hierarchical semantics—corgi is a kind of dog is a kind of animal. We will see how using the structure of the softmax link function lets us formalize how concepts are represented as vectors, and derive that the representations of causally separable and hierarchically related concepts must satisfy certain (bi)-orthogonality relationships.
  • 🎙️ Flash Talk: Luigi Gresele (University of Copenhagen)
    • Title: All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling
    • Abstract: We analyze identifiability as a possible explanation for the ubiquity of linear properties across language models, such as the vector difference between the representations of “easy” and “easiest” being parallel to that between “lucky” and “luckiest”. For this, we ask whether finding a linear property in one model implies that any model that induces the same distribution has that property, too. To answer that, we first prove an identifiability result to characterize distribution-equivalent next-token predictors, lifting a diversity requirement of previous results. Second, based on a refinement of relational linearity [Paccanaro and Hinton, 2001; Hernandez et al., 2024], we show how many notions of linearity are amenable to our analysis. Finally, we show that under suitable conditions, these linear properties either hold in all or none distribution-equivalent next-token predictors. This talk is based on joint work with Emanuele Marconato, Sébastien Lachapelle and Sebastian Weichwald.
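
A linear property of the kind described in the abstract can be checked numerically. A hedged toy example: the embedding values below are invented for illustration (a real check would use actual model representations), and the test is simply whether the two pairs' difference vectors are parallel.

```python
import math

def diff(u, v):
    # Elementwise difference of two equal-length vectors.
    return [a - b for a, b in zip(u, v)]

def cosine(u, v):
    # Cosine similarity; 1.0 means the vectors are parallel.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical embeddings, constructed so the superlative offset is a
# shared direction (as relational linearity would predict).
easy, easiest = [1.0, 0.0, 2.0], [2.0, 1.0, 2.0]     # offset [1, 1, 0]
lucky, luckiest = [0.0, 3.0, 1.0], [2.0, 5.0, 1.0]   # offset [2, 2, 0]

d1 = diff(easiest, easy)
d2 = diff(luckiest, lucky)
print(round(cosine(d1, d2), 6))  # 1.0 -> the two offsets are parallel
```

The talk's "all or none" result concerns when such a property, once found in one model, must also hold in every distribution-equivalent next-token predictor.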

June Appointment

  • 🗓️ When: 20th June 2025 – 16:00 CET
  • 📍 Where: Zoom link
  • 📹 Meeting recording
  • 🎙️ Keynote: Mariya Toneva (Max Planck Institute for Software Systems)
    • Title: Aligning Language Models to the Human Brain
    • Abstract: In this talk, I will introduce brain-tuning, a method that aligns language models to the human brain by fine-tuning language models with brain data recorded while individuals listen to natural speech. Despite using fMRI data that corresponds to less than 1% of the models’ pretraining data, brain-tuning 1) improves alignment with semantic brain regions, 2) reduces reliance on low-level features for this alignment, and 3) excitingly, substantially improves performance on semantic downstream tasks. Together, this method and findings strengthen the utility of speech language models as model organisms of language in the brain, and provide new opportunities for cross-pollination between cognitive neuroscience and AI.
  • 🎙️ Flash Talk: Lenka Tětková (Technical University of Denmark)
    • Title: On convex decision regions in deep network representations
    • Abstract: How aligned are machine representations with the way humans understand concepts? In this talk, I’ll explore this question through the lens of convexity in machine-learned latent spaces—a property long studied in cognitive science for its role in generalization, few-shot learning, and communication. Inspired by Gärdenfors’ theory of conceptual spaces, we develop new tools to measure convexity in real-world model representations and apply them across layers of state-of-the-art deep networks. We find that many concept regions — across domains like vision, language, audio, and even medical data — are approximately convex. What’s more, convexity tends to increase with fine-tuning and can even predict fine-tuning performance in pretrained models. These results suggest that convexity is a meaningful, robust property of learned representations, with implications for improving generalization and understanding human-machine alignment.

May Appointment

  • 🗓️ When: 29th May 2025 – 16:30 CET
  • 📍 Where: Zoom link
  • 📹 Meeting recording
  • 🎙️ Keynote: Andrew Lampinen (Google DeepMind)
    • Title: Representation Biases: when aligned representations do not imply aligned computations
    • Abstract: We often study a system’s representations to learn about its computations, or intervene on its representations to try to fix it. However, the relationship between representation and computation is not always straightforward. In this talk, I will discuss a recent paper (https://openreview.net/forum?id=aY2nsgE97a) in which we study this relationship in controlled settings. We find that feature representations are substantially biased towards certain types of features (linear over nonlinear, prevalent over less prevalent), even when the features play an equivalent computational role in the model’s outputs. These phenomena hold across a wide range of models and tasks. I will discuss implications of these feature biases for downstream analyses like regression and RSA, and their relation to our recent finding that simplifying models for analysis may not generalize well out of distribution (https://openreview.net/forum?id=YJWlUMW6YP). These results raise important questions over how to interpret and use representational analysis tools.
  • 🎙️ Flash Talk: Jack Lindsey (Anthropic)
    • Title: On the Biology of a Large Language Model
    • Abstract: In this talk, I’ll describe a new method for revealing mechanisms in language models. First, we train a “replacement model” that substitutes the model’s neurons with sparsely active “features” which are easier to interpret. Then, for a given model input/output, we summarize the intermediate computational steps taken by the model with an interactive attribution graph, which depicts causal interactions between features. We apply attribution graphs to study phenomena of interest in a production-scale language model, including multi-step computations, planning, unfaithful reasoning, hallucinations, and hidden motivations.
    • 💻 Codebase; Interface; Explanation