Synthesis Blog

Together, we’re building the future of computer vision and machine learning.
Featured Post

AI Safety III: Interpretability

In the third post on AI safety (first, second), we turn to interpretability, which has emerged as one of the most promising directions in AI safety research, offering some real hope for understanding the “giant inscrutable matrices” of modern AI models. We will discuss recent progress, from early feature visualization to cutting-edge sparse autoencoders that can isolate individual concepts such as “unsafe code”, “sycophancy”, or “the Golden Gate Bridge” within frontier models. We also move from interpreting individual neurons to mapping entire computational circuits, and even show how LLMs can spontaneously develop RL algorithms. In my opinion, recent breakthroughs in interpretability represent genuine advances towards the existentially important goal of building safe AI systems.
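For readers who want a concrete picture before diving into the post: a sparse autoencoder of the kind mentioned above is, at its core, a wide, sparsity-regularized reconstruction model trained on a network’s internal activations. The sketch below is a minimal illustration under assumed dimensions and penalty weight, not code from the post itself; the class name and hyperparameters are placeholders.

```python
# Minimal sparse autoencoder (SAE) sketch for interpretability work:
# learn an overcomplete dictionary of features over model activations.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_hidden: int = 16384):  # illustrative sizes
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU keeps feature activations non-negative; together with an L1 penalty
        # this pushes each hidden feature to fire for a single interpretable concept.
        f = torch.relu(self.encoder(x))
        x_hat = self.decoder(f)
        return x_hat, f

sae = SparseAutoencoder()
x = torch.randn(32, 768)                 # stand-in batch of residual-stream activations
x_hat, f = sae(x)
loss = ((x_hat - x) ** 2).mean() + 1e-3 * f.abs().mean()  # reconstruction + sparsity
```

After training on real activations, individual columns of the decoder act as candidate “features”, which is what makes it possible to point at a direction and label it with a concept.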

All Posts
June 3, 2025

In the third post on AI safety (first, second), we turn to interpretability, which has emerged as one of the…

May 8, 2025

In this post, the second in the series (after “Concepts and Definitions”), we embark on a comprehensive exploration of Goodhart’s…

April 17, 2025

In October 2023, I wrote a long post on the dangers of AGI and why we as humanity might not…

March 21, 2025

Today, I want to discuss two recently developed AI systems that can help with one of the holy grails of…

February 25, 2025

Some of the most important AI advances in 2024 were definitely test-time reasoning LLMs, or large reasoning models (LRMs), that…

January 28, 2025

We interrupt your regularly scheduled programming to discuss a paper released on New Year’s Eve: on December 31, 2024, Google…
