June 29, 2020

Synthetic Data Research Review: Context-Agnostic Cut-and-Paste

We have been talking about the history of synthetic data for quite some time, but it’s time to get back to 2020! I’m preparing a new series, but in the meantime, today we discuss a paper called “Learning From Context-Agnostic Synthetic Data” by MIT researchers Charles Jin and Martin Rinard, recently released on arXiv (it’s less than a month old). They present a new way to train on synthetic data based on few-shot learning, claiming to need very few synthetic examples; in essence, their paper extends the cut-and-paste approach to generating synthetic datasets. Let’s find out more and, pardon the pun, give their results some context.

June 11, 2020

Synthetic Data for Early Robots, Part II: MOBOT and the Problems of Simulation

Last time, we talked about robotic simulations in general: what they are and why they are inevitable for robotics based on machine learning. We even touched upon some of the more philosophical implications of simulations in robotics, discussing early concerns on whether simulations are indeed useful or may become a dead end for the field. Today, we will see the next steps of robotic simulations, showing how they progressed after the last post with the example of MOBOT, a project developed in the first half of the 1990s at the University of Kaiserslautern. This is another relatively long read and the last post in the “History of Synthetic Data” series.

May 19, 2020

Synthetic Data for Robots, Part I: Are Simulations Good For Robotics?

In the previous two blog posts, we have discussed the origins and first applications of synthetic data. The first part showed how early computer vision used simple line drawings for scene understanding algorithms and how synthetic datasets were necessary as test sets to compare different computer vision algorithms. In the second part, we saw how self-driving cars were made in the 1980s and how the very first application of machine learning in computer vision for autonomous vehicles, the ALVINN system, was trained on synthetic data. Today, we begin the discussion of early robotics and the corresponding synthetic simulators… but this first part will be a bit more philosophical than usual.

May 5, 2020

Synthetic Data: The Early Days, Part II

We continue from last time, when we began a discussion of the origins and first applications of synthetic data: using simple artificial drawings for specific problems and using synthetically generated datasets to compare different computer vision algorithms. Today, we will learn how people made self-driving cars in the 1980s and see that as soon as computer vision started tackling real-world problems with machine learning, it could not avoid synthetic data.

April 23, 2020

Synthetic Data: The Early Days, Part I

Previously on this blog, we have discussed the data problem: why machine learning may be hitting a wall, how one-shot and zero-shot learning can help, how come reinforcement learning does not need data at all, and how unlabeled datasets can inform even supervised learning tasks. Today, we begin discussing our main topic: synthetic data. Let us start from the very beginning: how synthetic data was done in the early days of computer vision…

April 14, 2020

The Data Problem IV: Can Unlabeled Data Help?

In the first three posts of this series, we have seen several ways to overcome the data problem in machine learning: first we posed the problem, then discussed one-shot and zero-shot learning, and in the third post presented the reinforcement learning way of using no data at all. In this final installment, we discuss the third direction that modern machine learning takes to help with the lack of labeled data: how can we use unlabeled data to help inform machine learning models?

April 7, 2020

The Data Problem III: Machine Learning Without Data

Today, we continue our series on the data problem in machine learning. In the first post, we realized that we are already pushing the boundaries of possible labeled datasets. In the second post, we discussed one way to avoid huge labeling costs: using one-shot and zero-shot learning. Now we are in for a quick overview of the kind of machine learning that might go without data at all: reinforcement learning.

March 30, 2020

The Data Problem II: One-Shot and Zero-Shot Learning

In the previous post, we posed what we consider the main problem of modern machine learning: increasing appetite for data that cannot be realistically satisfied if current trends persist. This means that current trends will not persist — but what is going to replace them? How can we build machine learning systems at ever increasing scale without increasing the need for huge hand-labeled datasets? Today, we consider one possible answer to this question: one-shot and zero-shot learning.

March 23, 2020

The Data Problem Part I: Issues and Solutions

Today, we are kicking off the Synthesis AI blog. In these posts, we will speak mostly about our main focus, synthetic data, that is, artificially created data used to train machine learning models. But before we begin to dive into the details of synthetic data generation and use, I want to start with the problem setting. Why do we need synthetic data? What is the problem and are there other ways to solve it? This is exactly what we will discuss in the first series of posts.

March 6, 2020

ClearGrasp: Our Collaboration with Google Robotics

Optical 3D range sensors, like RGB-D cameras and LIDAR, have found widespread use in robotics to generate rich and accurate 3D maps of the environment, from self-driving cars to autonomous manipulators. However, despite the ubiquity of these complex robotic systems, transparent objects (like a glass container) can confound even a suite of expensive sensors that are commonly used.