Our mission is to accelerate the pace of machine learning.

We work at the forefront of machine learning and AI research, both through our own research lab and through the tools we build in partnership with some of the world's best labs.

Read the research →

§1

Research

Transformer Lab is dedicated to exploring the frontier of artificial intelligence. We conduct research across diverse domains in machine learning and publish our findings in the open.

The lab's defining properties are velocity and versatility. We pursue diverse challenges across distinct domains of machine learning, with a bias toward novelty and a deep love for the technically intriguing.

Selected publications

Recent results from our lab's work.

Words, Actions, and Neurons: When a Language Model Agent's Self-Report Lags its Behavior, and When it Lies. Asaria, Salomone, Gandhi. [INTERPRETABILITY] Read the article → July 27, 2026
An agent has three read-outs of its strategy: what it does, what it says, and what a probe reads from its neurons. When a Qwen2.5-3B agent honestly learns to cooperate, its words lag its actions — the narration-vs-behavior gap peaks near 0.47, then closes to zero. When a phi-4 impostor lies in Among Us, the tell stays in its neurons: a role-controlled probe separates deceptive from honest statements at 0.865 AUROC.
More Canadian than American: A Five-Model Audit of US vs. Canadian Value Defaults in Language Models. Asaria, Salomone, Gandhi. [LLM] Read the article → July 14, 2026
Five instruction-tuned models answered ten World Values Survey items, and their answers were compared with the real opinions of US and Canadian respondents. Every model aligned more often with Canadian public opinion (in 57 of 83 significant comparisons, or 69%), overturning the assumption that LLMs default to American values.
Learning, or Just Retrieving? A Copy-Baseline Audit of Zero-Shot Foundation Models in Time-Series and Single-Cell Tasks. Asaria, Salomone, Gandhi. [EVALUATION] Read the article → July 7, 2026
A modality-agnostic audit asking how much of a zero-shot foundation model's apparent skill a cheap retrieve-and-copy baseline already accounts for, across time-series forecasting and single-cell genomics. Introduces the Retrieval-Explained Fraction to separate genuine learning from retrieval.
Does the Silent Workspace Anticipate the Thought? Trace-Aligned Jacobian-Lens Analysis of a Reasoning Model. Asaria, Salomone, Gandhi. [INTERPRETABILITY] Read the article → July 7, 2026
Reasoning models externalize a <think> trace, giving a ground-truth future against which to test whether the Jacobian "lens" really reads the tokens a model is silently disposed to say. On Qwen3-8B, the silent readout shows little sign of an emergent anticipatory workspace.
The Language Is the Lever: Prompt Language, More Than Training Origin, Shapes How Open LLMs Answer Contested Questions. Asaria, Salomone, Gandhi. [LLM] Read the article → July 2, 2026
A controlled bilingual audit of eight size-matched open models (four built in China, four in the West), each answering 145 contested-topic probes in English and Chinese. The language of the prompt, more than the model's training origin, shapes the stance it expresses.
Beyond FAD and CLAP: A Modern Perceptual Re-Ranking and a Controllability Audit of Open-Source Instrumental Music Generators. Salomone, Gandhi, Asaria. [AUDIO] Read the article → June 25, 2026
An encoder-diverse perceptual re-ranking and controllability audit of three open-source instrumental music generators. The best-sounding model is also the only one you can reliably steer — and the reported FAD/CLAP wins don't survive a circularity-free evaluation stack.
How Small Can a Seismic Phase Picker Be? A 34,000-Parameter Model Matches a Pretrained Deep Baseline Under Leakage-Controlled Evaluation. Asaria, Salomone, Gandhi. [EFFICIENCY] Read the article → June 25, 2026
Under a leakage-controlled split of STEAD, a 33,610-parameter 1-D U-Net reaches 0.76 mean P/S pick-F1, beating a pretrained PhaseNet roughly eight times its size (0.64). Standard random splits leak near-duplicate windows and quietly inflate the case for bigger models.
Whose Reasoning Is It? Distilled Reasoning Models Are Faithful to Provided Chains but Only Sparsely to Their Own. Asaria, Salomone, Gandhi. [LLM] Read the article → June 25, 2026
Measures whether a distilled reasoning model's answer actually depends, under intervention, on its chain-of-thought. The models are faithful to reasoning provided to them, but only sparsely to the reasoning they generate themselves.

→ Read all of our research

§2

Research Tooling

Our lab doesn't just release papers and code; we also partner with the world's best labs, across academia and industry, to unlock velocity for their researchers (and their researchers' agents). The tools we build are designed to accelerate the entire research loop, from planning to publication.

platform

Transformer Lab

The workbench our researchers live in: train, tune, evaluate, and inspect models across modalities from one interface.

open source · self-hostable

orchestration

GPU-cluster coordination

Hundreds of distributed jobs across RunPod, Lambda, AWS, Azure, GCP, and in-house hardware — coordinated automatically.

multi-cloud · autoscaling

announcing soon

Intelligence collection

A new approach to gathering and grounding knowledge — not quite ready to reveal. For now we're sharing it only with our closest partners.

stay tuned

announcing soon

Intelligence orchestration

Still under wraps — for now we're sharing it only with our closest partners.