Bias Reduction
in Any Application

In most datasets, men, light-skinned people, and adults aged 18-40 are overrepresented. Bias is bound to occur in any real-life data collection method because of geographic constraints. It’s simply too time-consuming to legally collect enough data at scale to balance every combination of skin tone, gender, and age, not to mention camera position and lighting variations. Our datasets help fill in the gaps and re-balance your model.
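
As a rough illustration of what re-balancing involves, the sketch below counts how many samples an existing dataset has per group and how many additional samples would bring every group up to the size of the largest one. The group names and counts are made up for the example, and `top_up_counts` is a hypothetical helper, not part of any API described on this page.

```python
from collections import Counter


def top_up_counts(labels, target=None):
    """Return how many extra samples each group needs to reach `target`.

    If `target` is None, every group is topped up to the size of the
    largest group already present.
    """
    counts = Counter(labels)
    goal = max(counts.values()) if target is None else target
    return {group: max(goal - n, 0) for group, n in counts.items()}


# Made-up example: skin-tone labels of an existing, unbalanced dataset.
existing_labels = ["light"] * 700 + ["medium"] * 200 + ["dark"] * 100
needed = top_up_counts(existing_labels)
print(needed)  # {'light': 0, 'medium': 500, 'dark': 600}
```

The same counting can be applied to combinations of attributes (for example skin tone and age bracket together) by using tuples as labels.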

Skin Tone

Our synthesized identities represent people from all around the world, including people of multiracial backgrounds. Pick and choose the ones you need to balance your existing model, or get a fully balanced set from us, ready to go.

Procedural Facial Attributes

Even in 2020, the largest providers of vision APIs, such as Google, Microsoft, and IBM, still showed significant gender bias in their results*. Easily generate fully accessorized male and female identities with our programmatic API, or download a pre-made dataset to make your models fairer.

*According to 2020 research by Wunderman Thompson.

Age

Our datasets feature adults aged 18 and older, evenly distributed in combination with other attributes. With our API, you can tune the images to exactly what your model needs to reduce its bias.

Camera Angle & Pose

Publicly available datasets primarily feature subjects from a frontal viewpoint, like pictures taken on the red carpet. But in most situations, either the camera isn’t head-on or people aren’t standing still and posing for a shot. Our datasets provide camera angles and head poses from every direction.

Lighting & Environment

Pictures taken by real people in real situations are rarely as well lit as studio shots, and professional flash hardware is seldom available. With fully customizable environments and lights, you can train your model to handle shots robustly across a wide range of conditions.

100x Cheaper.
1000x Faster Turnaround.

The average amount spent on a single image for full segmentation is $6.40*, and any additional labels cost extra. Our synthetic data provides full segmentation, landmarks, surface normals, and more for as little as $0.03 per image.

Of course, that’s only the labeling cost. Procuring the images to label is incredibly time-consuming as well. It can take most companies weeks or months to legally collect diverse images of individuals’ faces. Our datasets are available immediately, and our programmatic API returns generated images and labels in minutes to hours.

*Based on scale.ai pricing, January 2021.
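
To put the two per-image labeling prices quoted above side by side, here is a small worked calculation. The dataset size of 100,000 images is a hypothetical figure chosen for the example, and $0.03 is the "as little as" price, so the result is a best-case comparison of labeling cost only.

```python
# Per-image prices quoted in the text above, in US cents to avoid float rounding.
MANUAL_CENTS_PER_IMAGE = 640     # $6.40 for full segmentation
SYNTHETIC_CENTS_PER_IMAGE = 3    # $0.03, the "as little as" price


def labeling_cost_usd(num_images, cents_per_image):
    """Total labeling cost in dollars for `num_images` images."""
    return num_images * cents_per_image / 100


num_images = 100_000  # hypothetical dataset size
manual = labeling_cost_usd(num_images, MANUAL_CENTS_PER_IMAGE)
synthetic = labeling_cost_usd(num_images, SYNTHETIC_CENTS_PER_IMAGE)
ratio = MANUAL_CENTS_PER_IMAGE / SYNTHETIC_CENTS_PER_IMAGE

print(f"manual: ${manual:,.0f}, synthetic: ${synthetic:,.0f}, ratio: {ratio:.0f}x")
# manual: $640,000, synthetic: $3,000, ratio: 213x
```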

Better. Stronger.
Our datasets not only provide training data affordably and almost instantly, they also offer far more than human-collected and human-labeled data, so you can build more advanced, more ethical computer vision models.
Pixel-Perfect Accuracy
100% accurate ground truth, every time. Eliminate your QA step on every label.
Privacy-First
Get peace of mind: with non-real humans, privacy concerns are history.
Less Bias
Even sampling across skin tones, ethnicities, and ages, for more ethical machine vision.
New Label Types
Use cutting-edge models with depth, normals, dense 3D landmarks, and subsegmentation.
Broader Distributions
Combine identities, hair styles, facial hair, makeup, hats, glasses, face masks, lighting conditions, and camera angles for trillions of possibilities – all at the speed of writing JSON with our API.
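
To show how quickly attribute combinations multiply, the sketch below expands a handful of options into one JSON-serializable spec per combination. The field names and values are hypothetical and chosen only for illustration; they are not the actual API schema.

```python
import itertools
import json
import math

# Hypothetical attribute options; the real API's field names and values may differ.
options = {
    "hair_style": ["short", "long", "bald"],
    "facial_hair": ["none", "beard"],
    "glasses": [False, True],
    "camera_yaw_degrees": [-45, 0, 45],
}

# The number of distinct combinations is the product of the option counts.
total = math.prod(len(values) for values in options.values())

# Expand the options into one JSON-serializable spec per combination.
specs = [dict(zip(options, combo)) for combo in itertools.product(*options.values())]

print(total)  # 36
print(json.dumps(specs[0]))
```

Four attributes with two or three options each already give 36 combinations; every additional attribute multiplies that count again, which is how the totals grow so large.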
Get Going Fast
Check out our snippets to jump-start the training process.
FaceApiDataset
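import matplotlib.pyplot as plt
from face_api_dataset import FaceApiDataset, Modality

# Open the dataset and fetch its first item.
dataset = FaceApiDataset("test_dataset")
item = dataset[0]

# Show the RGB image.
plt.figure(figsize=(20, 20))
plt.imshow(item[Modality.RGB])

# Draw the landmarks over the image.
# landmark_show is not defined in this snippet; it is assumed to be a plotting helper you provide.
plt.figure(figsize=(20, 20))
landmark_show(item[Modality.RGB], item[Modality.LANDMARKS])
plt.show()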
Domain Adapt
As with all synthetic data, there’s a shift between our domain and the one captured by real cameras. Although no single domain adaptation approach fits every use case, we stand on the shoulders of giants to get great results.
Adaptive Batch Normalization
Adaptive Batch Normalization is a simple technique that can be applied to any network with batch normalization layers and combined with all the other techniques for surprisingly good results.
Adversarial Domain Adaptation
Adversarial domain adaptation and its task-specific modifications usually yield a strong improvement. The downside is that they typically require heavy pipeline modifications.
Refinement
Image-to-image translation methods coupled with a self-regularization loss allow dataset-level refinement. While these methods require an additional pipeline to train, that pipeline is completely independent and does not require modifications to the main training pipeline.
Combined methods
For the best results, the methods above should typically be combined.
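
To make the first technique above concrete: the core idea of Adaptive Batch Normalization is to keep the trained weights and replace the normalization statistics estimated on the training (synthetic) domain with statistics estimated on the target (real) domain. The sketch below shows that idea on plain Python lists so that it has no dependencies. It is a simplified illustration, not a description of any particular pipeline; in a deep learning framework you would instead re-estimate the running mean and variance of each batch normalization layer by passing unlabeled real images through the network. All function names and feature values here are hypothetical.

```python
from statistics import fmean, pvariance


def feature_stats(batch):
    """Per-feature mean and variance over a batch of feature vectors."""
    columns = list(zip(*batch))
    means = [fmean(col) for col in columns]
    variances = [pvariance(col, mu) for col, mu in zip(columns, means)]
    return means, variances


def normalize(batch, means, variances, eps=1e-5):
    """Batch-norm style normalization (learned scale and shift omitted)."""
    return [
        [(x - m) / (v + eps) ** 0.5 for x, m, v in zip(row, means, variances)]
        for row in batch
    ]


# Made-up activations: the real-domain features are shifted and more spread out.
synthetic_batch = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
real_batch = [[4.0, 2.0], [6.0, 6.0], [8.0, 10.0]]

source_stats = feature_stats(synthetic_batch)
target_stats = feature_stats(real_batch)

# Without adaptation: real data normalized with synthetic-domain statistics.
unadapted = normalize(real_batch, *source_stats)
# Adapted: same normalization, but with statistics re-estimated on real data.
adapted = normalize(real_batch, *target_stats)

unadapted_means = [fmean(col) for col in zip(*unadapted)]
adapted_means = [fmean(col) for col in zip(*adapted)]
print(unadapted_means)  # far from zero: the domain shift leaks into the features
print(adapted_means)    # approximately zero: the features are re-centered
```

Because only the statistics change, this step needs no labels from the real domain, which is part of why it combines easily with the other techniques.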
Ready to Grow With You
We’re here to help you build your solution on our programmatic data platform.
Scales out of the box

Our technology seamlessly scales in the cloud with our customers’ demands, from R&D phases with small amounts of data to production requirements of terabytes of data.

With everything available via an API, we integrate seamlessly with your workflows from day 1.

Learn More about our API

Machine Learning Development Support

If your team needs a little more machine learning muscle, our experts are ready to jump in. We’ll help reduce your time to market, so don’t hesitate to reach out.

Contact Us