Applications
- Biometrics & security
  
  ID verification
  
  Facial identification and verification for consumer and security applications.
  
  Security
  
  Activity recognition and threat detection across camera views.
  
  Consumer devices & applications
  
  AR/VR/XR
  
  Spatial computing, gesture recognition, and gaze estimation for headsets.
  
  Virtual try-on
  
  Millions of identities and clothing options to train best-in-class models.
  
  Biometrics & security
  
  Driver monitoring
  
  Simulate driver and occupant behavior captured with multi-modal cameras.
  
  Pedestrian detection
  
  Simulate edge cases and rare events to ensure the robust performance of autonomous vehicles.
Resources
- AI Safety IV: Sparks of Misalignment
  
  This is the last, fourth post in our series...
  
  Read More
  
  Visit the Blog
  
  Gen AI
  
  Gen AI book
  
  HumanAPI
  
  Data visualizer
  
  API documentation
  
  Synthetic data
  
  Synthetic data book
  
  Synthetic data guide
  
  Synthetic data whitepaper
  
  Industry survey
  
  OpenSynthetics
Company
- About us
  
  Overview
  
  Ethics
  
  Press
  
  Newsroom
  
  Press Kit
  
  Join Our Team
  
  Together, we’re building the future of computer vision & machine learning
  
  Explore Careers

Category: Research Topics

September 25, 2024

OpenAI’s o1-preview: the First LLM That Can Answer My Questions

OpenAI’s o1-preview has been all the buzz lately. While this model is based on the general GPT-4o architecture, it boasts much improved reasoning capabilities: it can ponder the question for about a minute, reason through multiple possibilities, and arrive at solutions that could not be generated from a single try of GPT-4o. In this post, I discuss the o1-preview model but mainly present its most striking advantage over all previous LLMs: it can meaningfully answer questions from a quiz game called “What? Where? When?”. At this point, that probably does not sound all that exciting compared to winning math competitions and answering PhD-level questions on science, but let me elaborate.

September 18, 2024

Using RAG to Enrich LLMs

We continue our series on LLMs and various ways to make them better. We have already discussed ways to increase the context size, world models that arise in LLMs and other generative models, and LLM fine-tuning including RLHF, LoRA, and more. Today we consider another key idea that can make LLMs far more effective and useful in practice: retrieval-augmented generation, or RAG. We discuss the basic idea of RAG, its recursive agentic extensions, the R[e]ALM approach that integrates retrieval into LM training, and some key problems of modern RAG approaches. We then look in detail at knowledge graphs and how they are being used in RAG, and conclude with a reminder that even simple approaches can work well and a list of directions for future work.

August 13, 2024

Fine-Tuning LLMs: RLHF, LoRA, and Instruction Tuning

We continue our series on generative AI. We have discussed Transformers, large language models, and some specific aspects of Transformers – but are modern LLMs still running on the exact same Transformer decoders as the original GPT? Yes and no; while the basics remain the same, there has been a lot of progress in recent years. Today, we briefly review some of the most important ideas in fine-tuning LLMs: RLHF, LoRA, instruction tuning, and recursive self-improvement. These ideas are key in turning a token prediction machine into a useful tool for practical applications.

July 2, 2024

Do Androids Dream? World Models in Modern AI

One of the most striking AI advances this spring was OpenAI’s Sora, a video generation model that sets new standards for video consistency and diversity. Interestingly, the official report on Sora is titled “Video generation models as world simulators”. It notes that Sora has emerging simulation capabilities and is on a “promising path towards the development of capable simulators of the physical and digital world”. Today, we discuss world models in modern artificial intelligence: what they are, how they have progressed over the last few years, and where they may go in the future.

April 8, 2024

Lost in Context: How Much Can You Fit into a Transformer

The announcement of Gemini 1.5 by Google was all but eclipsed by OpenAI’s video generation model Sora. Still, there was one very important thing there: the promise of processing a context window of up to 1 million tokens. A very recent announcement of new Claude models by Anthropic also boasts context windows of up to 1M tokens, with 200K tokens available at launch. Today, we discuss what context windows are, why they are a constraint for Transformer-based models, how researchers have been trying to extend the context windows of modern LLMs, and how we can tell whether a large context window actually works in a useful way. By virtue of nominative determinism, this is a very long post even by the standards of this blog, so brace yourself and let’s go!

February 13, 2024

The Unreasonable Ineffectiveness of AI for Math

One of the most interesting pieces of AI-related news for me recently was a paper by DeepMind researchers that presented a new mathematical result found by large language models: new constructions for the cap set problem. In this post, we take a step back and discuss the general relation between math and AI. A mathematical proof is easy to verify but may be very hard to find. But there are AI-shaped holes in looking for a proof: math involves multi-step reasoning and planning, hard theorems need to be decomposed into lemmas, there are search strategies involved… However, mathematics has turned out to be unexpectedly difficult for AI. We discuss what people have been doing with AI in math and how LLMs can help mathematicians right now.

December 4, 2023

Generative AI, Part 0: Background on Transformers

Here at Synthesis AI, we have decided to release the “Generative AI” series in e-book form; expect a full-fledged PDF with all the images soon. But when I started collecting the posts into a single coherent whole, I couldn’t help but feel the huge, glaring omission of the most important topic in modern AI, the secret sauce that drives the entire field of ML nowadays: self-attention layers introduced in the original Transformer architecture. I had not planned to cover them before since there are plenty of other excellent sources, but in a larger format Transformers have become an inevitability. So today, I post the chapter on Transformers, which seems to be by far the longest post ever on this blog. We will discuss how the Transformer works, introduce the two main families of models based on self-attention, BERT and GPT, and discuss how Transformers can handle images as well.

October 10, 2023

Generative AI VIII: AGI Dangers and Perspectives

This is the last post in the “Generative AI” series. Today, we look into the future and discuss where the current trends take us, what dangers artificial general intelligence (AGI) might hold for us, and whether we are ready for these dangers (spoiler: not at all). I will present the case for AGI doomers and discuss the main arguments, but please keep in mind that in this post, everything is mostly speculation (although there actually are attempts to put this speculation on firm mathematical ground).

September 20, 2023

Generative AI VII: The AI Spring of 2023

Last time, we finished all intended mathematical content, so it is time for us to wrap up the generative AI series. We will do it over two installments. Today, we discuss and summarize the (lots of) news that has come out of the AI space over the last half year. It all conveniently falls into the generative AI space, with expanding capabilities leading to both extreme excitement and serious security concerns. So how are current AI models different from older ones and when are we going to actually have AGI? It all started with GPT-3.5…

August 9, 2023

Generative AI VI: Stable Diffusion, DALL-E 2, and Midjourney

Congratulations, my friends, we have finally come to the end of the series! Although… well, not quite (see below), but we have definitely reached the end of what I had planned originally. Last time, we discussed diffusion-based models, mentioning, if not fully going through, all their mathematical glory. This time, we are going to put diffusion-based models together with multimodal latent spaces and variational autoencoders with discrete latent codes, getting to Stable Diffusion and DALL-E 2, and then we will discuss Midjourney and associated controversies. Not much new math today: we have all the Lego blocks, and it only remains to fit them all together.

Biometrics & security

ID verification

Security

Consumer devices & applications

AR/VR/XR

Virtual try-on

Biometrics & security

Driver monitoring

Pedestrian detection

Gen AI

HumanAPI

Synthetic data

About us

Press

Join Our Team

Embrace Synthetic Data.