Synthesis Blog

Together, we’re building the future of computer vision and machine learning.
Featured Post

AI Safety II: Goodharting and Reward Hacking

In this post, the second in the series (after “Concepts and Definitions”), we embark on a comprehensive exploration of Goodhart’s law: how optimization processes can undermine their intended goals by optimizing proxy metrics. Goodharting lies at the heart of what is so terrifying about making AGI, so this is a key topic for AI safety. Starting with the classic taxonomy of regressional, extremal, causal, and adversarial Goodharting, we trace these patterns from simple mathematical models and toy RL environments to the behaviours of state-of-the-art reasoning LLMs. Along the way, we show how Goodharting manifests in modern machine learning through shortcut learning, reward hacking, goal misgeneralization, and even reward tampering, with striking examples from current RL agents and LLMs.

All Posts
May 8, 2025

In this post, the second in the series (after “Concepts and Definitions”), we embark on a comprehensive exploration of Goodhart's…

April 17, 2025

In October 2023, I wrote a long post on the dangers of AGI and why we as humanity might not…

March 21, 2025

Today, I want to discuss two recently developed AI systems that can help with one of the holy grails of…

February 25, 2025

Some of the most important AI advances in 2024 were definitely test-time reasoning LLMs, or large reasoning models (LRMs), that…

January 28, 2025

We interrupt your regularly scheduled programming to discuss a paper released on New Year’s Eve: on December 31, 2024, Google…

January 17, 2025

It is time to discuss some applications. Today, I begin with using LLMs for programming. There is at least one…

All Series