Generative AI IV: CLIP and multimodal retrieval

Last time, we discussed DALL-E, a model that brings together text Transformers and a discrete VAE for images. While DALL-E was a huge step forward and generated a lot of buzz for generative AI back in 2021, modern generative models such as DALL-E 2 are built from different components. One of them is usually a multimodal encoder that maps different modalities (e.g., text and images) into the same latent space. Today, we discuss such encoders and then look at a specific practical problem where they have become instrumental over the last couple of years: text-video retrieval, that is, searching for video content by text queries.

Generative AI III: DALL·E

Today, we continue our discussion of generative AI, a direction that keeps transforming many different industries. Last time, we reviewed the difference between continuous and discrete latent spaces, and how the VQ-VAE architecture (based on variational autoencoders that we discussed before) manages to learn a discrete latent space in the form of a codebook. Today, we will put this idea into further practice with our first real text-to-image model, OpenAI’s DALL-E.

Inside AI Ask Me Anything (AMA) with CEO and Founder, Yashar Behzadi

Synthesis AI CEO and Founder Yashar Behzadi recently sat down for an Ask Me Anything with our friends at Inside AI. The discussion was wide-ranging, touching on what’s next for generative AI, how synthetic data is generated and used, overcoming model bias, and the ethics of AI systems. If you’re tinkering with generative AI and need representative synthetic human data for developing ML models ethically, we’d love to talk — contact us any time.

Synthetic Data: The Early Days, Part I

Previously on this blog, we have discussed the data problem: why machine learning may be hitting a wall, how one-shot and zero-shot learning can help, why reinforcement learning does not need data at all, and how unlabeled datasets can inform even supervised learning tasks. Today, we begin discussing our main topic: synthetic data. Let us start from the very beginning: how synthetic data was done in the early days of computer vision…

