Category: Uncategorized

Top 5 Applications of Synthetic Data

It’s been a while since we last met on this blog. Today, we are having a brief interlude in the long series of posts on how to make machine learning models better with synthetic data (that’s a long and still unfinished series: Part I, Part II, Part III, Part IV, Part V, Part VI). I will give a brief overview of five primary fields where synthetic data can shine. You will see that most of them are related to computer vision, which is natural for synthetic data based on 3D models. Still, it makes sense to clarify where exactly synthetic data is already working well and where we expect synthetic data to shine in the nearest future.

Continue reading
Driving Model Performance with Synthetic Data VI: Real-to-Synthetic

Today we continue the series on using synthetic data to improve machine learning models.This is the sixth part of the series (Part I, Part II, Part III, Part IV, Part V). In this (relatively) short interlude I will discuss an interesting variation of GAN-based refinement: making synthetic data from real. Why would we ever want to do that if the final goal is always to make the model work on real data rather than synthetic? In this post, we will see two examples from different domains that show both why and how.

Continue reading
Driving Model Performance with Synthetic Data V: Synthetic-to-Real Refinement

We continue the series on synthetic data as it is used in machine learning today. This is a fifth part of an already pretty long series (part 1, part 2, part 3, part 4), and it’s far from over, but I try to keep each post more or less self-contained. Today, however, we pick up from last time, so if you have not read Part 4 yet I suggest to go through it first. In that post, we discussed synthetic-to-real refinement for gaze estimation, which suddenly taught us a lot about modern GAN-based architectures. But eye gaze still remains a relatively small and not very variable problem, so let’s see how well synthetic data does in other computer vision applications. Again, expect a lot of GANs and at least a few formulas for the loss functions.

Continue reading
Driving Model Performance with Synthetic Data IV: Gaze Estimation and GANs

With the Christmas and New Year holidays behind us, let’s continue our series on how to improve the performance of machine learning models with synthetic data. Last time, I gave a brief introduction into domain adaptation, distinguishing between its two main variations: refinement, where synthetic images are themselves changed before they are fed into model training, and model-based domain adaptation, where the training process changes to adapt to training on different domains. Today, we begin with refinement for the same special case of eye gaze estimation that kickstarted synthetic data refinement a few years ago and still remains an important success story for this approach, but then continue and extend the story of refinement to other computer vision problems. Today’s post will be more in-depth than before, so buckle up and get ready for some GANs!

Continue reading
Driving Model Performance with Synthetic Data III: Domain Adaptation Overview

Today, I continue the series about different ways of improving model performance with synthetic data. We have already discussed simple augmentations in the first post and “smart” augmentations that make more complex transformations of the input in the second. Today we go on to the next sub-topic: domain adaptation. We will stay with domain adaptation for a while, and in the first post on this topic I would like to present a general overview of the field and introduce the most basic approaches to domain adaptation.

Continue reading
Driving Model Performance with Synthetic Data II: Smart Augmentations

Last time, I started a new series of posts, devoted to different ways of improving model performance with synthetic data. In the first post of the series, we discussed probably the simplest and most widely used way to generate synthetic data: geometric and color data augmentation applied to real training data. Today, we take the idea of data augmentation much further. We will discuss several different ways to construct “smart augmentations” that make much more involved transformations of the input but still change the labeling only in predictable ways.

Continue reading
Driving Model Performance with Synthetic Data I: Augmentations in Computer Vision

Welcome back, everybody! It’s been a while since I finished the last series on object detection with synthetic data (here is the series in case you missed it: part 1, part 2, part 3, part 4, part 5). So it is high time to start a new series. Over the next several posts, we will discuss how synthetic data and similar techniques can drive model performance and improve the results. We will mostly be talking about computer vision tasks. We begin this series with an explanation of data augmentation in computer vision; today we will talk about simple “classical” augmentations, and next time we will turn to some of the more interesting stuff.

Continue reading
Object Detection with Synthetic Data V: Where Do We Stand Now?

This is the last post in my mini-series on object detection with synthetic data. Over the first four posts, we introduced the problem, discussed some classical synthetic datasets for object detection, talked about some early works that have still relevant conclusions and continued with a case study on retail and food object detection. Today we consider two papers from 2019 that still represent the state of the art in object detection with synthetic data and are often used as generic references to the main tradeoffs inherent in using synthetic data. We will see and discuss those tradeoffs too. Is synthetic data ready for production and how does it compare with real in object detection? Let’s find out. (header image source)

Continue reading
Object Detection with Synthetic Data IV: What’s in the Fridge?

We continue the series on synthetic data for object detection. Last time, we stopped in 2016, with some early works on synthetic data for deep learning that still have implications relevant today. This time, we look at a couple of more recent papers devoted to multiple object detection for food and small vendor items. As we will see today, such objects are a natural application for synthetic data, and we’ll see how this application has evolved in the last few years.

Continue reading
Object Detection with Synthetic Data III: Choose Your Cues Wisely

Today, I continue the series on synthetic data for object detection. In the first post of the series, we discussed the object detection problem itself and real world datasets for it, and the second was devoted to popular synthetic datasets of common objects. The time has come to put this data in practice: in this and subsequent posts, we will discuss common contemporary object detection architectures and see how adding synthetic data fares for object detection as reported in literature. In each post, I will give a detailed account of one paper that stands out in my opinion and briefly review one or two more. We begin in 2015.

Continue reading