Category: Featured Post

Lost in Context: How Much Can You Fit into a Transformer

The announcement of Gemini 1.5 by Google was all but eclipsed by OpenAI’s video generation model Sora. Still, it contained one very important promise: processing a context window of up to 1 million tokens. Anthropic’s very recent announcement of new Claude models also boasts context windows of up to 1M tokens, with 200K tokens available at launch. Today, we discuss what context windows are, why they are a constraint for Transformer-based models, how researchers have been trying to extend the context windows of modern LLMs, and how we can tell whether a large context window is actually useful. By virtue of nominative determinism, this is a very long post even by the standards of this blog, so brace yourself and let’s go!

Generative AI VIII: AGI Dangers and Perspectives

This is the last post in the “Generative AI” series. Today, we look into the future and discuss where current trends are taking us, what dangers artificial general intelligence (AGI) might hold for us, and whether we are ready for these dangers (spoiler: not at all). I will present the case made by AGI doomers and discuss their main arguments, but please keep in mind that in this post, everything is mostly speculation (although there are actual attempts to put this speculation on firm mathematical ground).

Generative AI V: Diffusion-based models

By now, we have discussed nearly all components of modern generative AI: variational autoencoders, discrete latent spaces, how they combine with Transformers in DALL-E, and how to learn a joint latent space for images and text. Only one component is left, diffusion-based models, but it’s a big one! Today, we discuss the main idea of diffusion-based models and go over basic diffusion models such as DDPM and DDIM. Expect a lot of math, but it will all pay off in the end.

Variational Autoencoders (VAEs): Generative AI I

It might seem like generative models go through a new phase every couple of years: we heard about Transformers, then flow-based models were all the rage, then diffusion-based models… But in fact, new ideas build on top of older ones. Following our overview post, today we start an in-depth dive into generative AI. We consider the variational autoencoder (VAE), an idea introduced in 2013, if not earlier, that is still very relevant and still underlies state-of-the-art generative models such as Stable Diffusion. We will not go into all the gory mathematical details, but I hope to explain the necessary intuition.

Generative AI Models in Image Generation: Overview

Some of the most widely publicized results in machine learning in recent years have been related to image generation. You heard of DALL-E a year ago, and now you’ve heard of DALL-E 2, Midjourney, and Stable Diffusion, right? With this post, I’m starting a new series where I will explain the inner workings of these models, how they differ, and how they fit into the general scheme of deep generative models. Today, we begin with a general overview.

Easier, Faster, More Powerful: Fall ’22 Release

Introducing Synthesis Humans & Synthesis Scenarios

Our mission here at Synthesis AI has been the same since our initial launch: To enable more capable and ethical AI. Our unique platform couples generative AI with cinematic CGI pipelines to enable the on-demand generation of photorealistic, diverse and perfectly labeled images and videos. Synthesis AI gives ML practitioners more tools, greater accuracy, and finer control over their data for developing, training and tuning computer vision models.

The Fall ’22 Release stays true to our mission by introducing two new products, Synthesis Humans and Synthesis Scenarios, both built on top of our core data generation platform. The new products add features that help ML practitioners build and deploy more sophisticated models and ship CV products faster and more cost-effectively.

Synthesis Humans

Synthesis Humans enables ML practitioners to create sophisticated production-scale models, providing over 100,000 unique identities and the ability to modify dozens of attributes, including emotion, body type, clothing and movement. An intuitive user interface (UI) allows developers to create labeled training data quickly, and a comprehensive API (formerly HumanAPI) supports teams that prefer programmatic access and control. Synthesis Humans is ideal for generating detailed facial and body images and videos with never-before-available rich annotations, offering 100 times greater depth and breadth of diversity than any other provider. A broad range of computer vision use cases currently benefit from synthetic and synthetic-hybrid approaches to model training and deployment, including:

ID verification. Biometric facial identification is used widely to ensure consumer privacy and protection. Applications include smartphones, online banking, contactless ticketing, home and enterprise access, and other instances of user authentication. Robust, unbiased model performance requires large amounts of diverse facial data. This data is difficult to obtain given privacy and regulatory constraints, and publicly available datasets are insufficient for production systems. Synthesis Humans provides the most diverse data in a fully privacy-compliant manner to enable the development of more robust and less biased ID verification models, complete with confounds such as facial hair, glasses, hats, and masks.

Driver and passenger monitoring. Car manufacturers, suppliers and AI companies are looking to build computer vision systems to monitor driver state and help improve safety. Recent EU regulations have catalyzed the development of more advanced solutions, but the diverse, high-quality in-car data needed to train AI models is labor-intensive and expensive to obtain. Synthesis Humans can accurately model diverse drivers, key behaviors, and the in-cabin environment (including passengers) to enable the cost-effective and efficient development of more capable models. A driver or machine operator’s gaze, emotional state, and use of a smartphone or similar device are key variables for training ML models.

Avatars. Avatar development relies on photorealistic capture and recreation of humans in the digital realm. Developing avatars and the core ML models behind them requires vast amounts of diverse, labeled data. Synthesis Humans provides richly labeled 3D data across the broadest set of demographics available. We continue to lead the industry by providing 5,000 dense landmarks, which allow for a highly nuanced and realistic understanding of the human face.

Virtual try-on. New virtual try-on technologies are emerging to provide immersive digital consumer experiences. Synthesis Humans offers 100K unique identities, dozens of body types, and millions of clothing combinations to enable ML engineers to develop robust models for human body form and pose. Synthesis Humans provides fine-grained subsegmentation controls over face, body, clothing and accessories.

VFX. Creating realistic character motion and facial movements requires complex motion capture systems and facial rigs. New AI models are in development to capture body pose, motion, and detailed facial features without the use of expensive and proprietary lighting, rigging and camera systems. AI is also automating much of the labor-intensive process of hand animation, background removal, and effects animation. Synthesis Humans is able to provide the needed diverse video data with detailed 3D labels to enable the development of core AI models.

AI fitness. The ability of computer vision systems to assess pose and form will usher in a new era in fitness, where virtual coaches are able to provide real-time feedback. For these models to work accurately and robustly, detailed 3D labeled human data is required across body types, camera positions, environments, and exercise variations. Synthesis Humans delivers vast amounts of detailed human body motion data to catalyze the development of new AI fitness applications for both individual and group training activities.

Synthesis Scenarios

Synthesis Scenarios is the first synthetic data technology that enables complex multi-human simulations across a varied set of environments. With fine-grained controls, computer vision teams can craft data scenarios to support sophisticated multi-human model development. Synthesis Scenarios enables new ML applications in multi-person settings, where more than one person needs to be accounted for, analyzed, and modeled. Emerging computer vision use cases that involve more than one person in model training and deployment include:

Autonomy & pedestrian detection. Safety is key to the deployment and widespread use of autonomous vehicles. The ability to detect pedestrians, understand their intent, and react appropriately is essential for safe and robust performance. Synthesis Scenarios provides detailed multi-human simulation to enable the development of more precise and sophisticated pedestrian detection and behavioral understanding across ages, body shapes, clothing and poses.

AR/VR/Metaverse. AR/VR and metaverse applications require vast amounts of diverse, labeled data, particularly when multiple people and their avatars are interacting virtually. Synthesis Scenarios supports the development of multi-person tracking and interaction models for metaverse applications.

Security. Synthesis Scenarios enables the simulation of complex multi-human scenarios across environments, including home, office and outdoor spaces, enabling the cost-effective and privacy-compliant development of access control and security systems. Camera settings are fully configurable and customizable.

Teleconferencing. With the surge in remote work, we have become dependent on high-quality video conferencing solutions. However, low-bandwidth connections, poor image quality and lighting, and a lack of engagement analysis tools significantly degrade the experience. Synthesis Scenarios can be used to train new machine learning models that improve video quality and the teleconferencing experience, with advanced capabilities for hairstyles, clothing, full body pose landmarks, attention monitoring, and multiple camera angles.

Synthesis AI was built by ML practitioners for ML practitioners. If you’ve got a human-centric computer vision project that might benefit from synthetic data, and you’ve exhausted all of the publicly available datasets to train your ML models, we can help. Reach out to set up a quick demo and learn how to incorporate synthetic data into your ML pipeline.

Synthetic Data, Technical Standards & the Metaverse

What role should standards play in the development of the Metaverse? That’s the question we’ll be tackling at our upcoming panel discussion, “Technical Standards for the MetaVerse,” part of the MetaBeat event taking place online and in San Francisco on October 4, 2022. Synthesis AI CEO and founder Yashar Behzadi will be joined onstage by Rev Lebaredian, NVIDIA; Neil Trevett, Khronos Group; and Javier Bello Ruiz, IMVERSE. Dean Takahashi, lead writer for GamesBeat, will moderate.

CVPR ’22, Part IV: Synthetic Data Generation

We continue the long series of reviews of CVPR 2022 papers related to synthetic data. We’ve had three installments so far, devoted to new datasets, use cases for synthetic data, and a very special use case: digital humans. Today, we will discuss papers that can help with generating synthetic data, so expect a lot of 3D model reconstruction, new generative models (especially in 3D), and generally a lot of CGI-related goodness (image generated by DALL-E Mini with the prompt “robot designer making a 3D mesh”).
