The role of synthetic data in developing solutions for autonomous driving is hard to overstate. In a recent post, I touched upon virtual outdoor environments for training autonomous driving agents, a huge topic that we will no doubt return to later. But today, I want to talk about a much more specialized topic in the same field: driver safety monitoring. It turns out that synthetic data can help here as well, and today we will see how. This is a companion post for our recent press release.
What Is Driver Safety Monitoring, and Why Are Manufacturers Forced to Care?
Car-related accidents remain a major source of fatalities and trauma all around the world. The United States, for instance, has about 35,000 motor vehicle fatalities and over 2 million injuries per year, which may pale in comparison to the COVID pandemic or cancer but still amounts to a lot of unnecessary suffering.
In fact, significant progress has been achieved in reducing these deaths and injuries in recent years. Here are the statistics of road traffic fatalities in Germany over the last few years:
And here is the same plot for France (both stop at 2019 because it would be unfair to make road traffic comparisons during the pandemic lockdowns):
Obviously, the European Union is doing something right in its regulation of road traffic. A large part of this is the new safety measures that are gradually being made mandatory in the EU. And the immediate occasion for this post is a new set of regulations regarding driver safety monitoring.
Starting from 2022, it will be mandatory for car manufacturers in the European Union to install the following safety features: “warning of driver drowsiness and distraction (e.g. smartphone use while driving), intelligent speed assistance, reversing safety with camera or sensors, […] lane-keeping assistance, advanced emergency braking, and crash-test improved safety belts”. With these regulations, the European Commission plans to “save over 25,000 lives and avoid at least 140,000 serious injuries by 2038”.
On paper, this sounds marvelous: why not have a system that wakes you up if you fall asleep behind the wheel and helps you stay in your lane when you’re distracted? But how can systems like this work? And what is the place of synthetic data in all of this? Let’s find out.
Driver Drowsiness Detection with Deep Learning
We cannot cover everything, so let’s dive into the details of one specific aspect of safety monitoring: drowsiness detection. It is central both to the new regulations and to actual car accidents: falling asleep at the wheel is very common. You don’t even have to be completely asleep: 5-10 seconds of what is called a microsleep episode is more than enough for an accident to occur. So how can a smart car notice that you are about to fall asleep and warn you in time?
The gold standard for recognizing brain states such as sleep is, of course, electroencephalography (EEG), that is, measuring the electrical activity of the brain. Recent research has applied deep learning to the analysis of EEG data, and it appears that even relatively simple solutions based on convolutional and recurrent networks are enough to recognize sleep and drowsiness with high accuracy. For instance, a recent work by Zurich researchers Malafeev et al. (2020) shows excellent results in the detection of microsleep episodes with a simple architecture like this:
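To make the idea concrete, here is a minimal PyTorch sketch of such a convolutional-recurrent classifier over raw EEG windows. It follows the general conv-then-LSTM pattern rather than the exact Malafeev et al. architecture, and all layer sizes, window lengths, and sampling rates are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MicrosleepNet(nn.Module):
    """A toy CNN+LSTM for single-channel EEG windows (illustrative only)."""
    def __init__(self, n_classes=2, hidden=64):
        super().__init__()
        # 1D convolutions extract local waveform features from raw EEG
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=7, stride=2), nn.ReLU(),
        )
        # the LSTM aggregates those features over the whole time window
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):         # x: (batch, 1, time_samples)
        f = self.conv(x)          # (batch, 32, t')
        f = f.transpose(1, 2)     # (batch, t', 32), as the LSTM expects
        _, (h, _) = self.lstm(f)
        return self.head(h[-1])   # logits: awake vs. microsleep

logits = MicrosleepNet()(torch.randn(8, 1, 3000))  # e.g., 15 s at 200 Hz
```

The point is only the overall shape of the pipeline: local features from 1D convolutions, temporal aggregation in a recurrent layer, and a small classification head.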
But short of requiring all drivers to wear a headpiece with EEG electrodes, this kind of data will not be available in a real car. EEG is commonly used to collect and label real datasets in this field, but for actual drowsiness detection we need some other signal.
There are two signals that are actually important here. First, steering patterns: a simple sensor can track the steering angle and velocity, and a system can then recognize troubling patterns in the driver’s steering. For example, if a driver barely steers at all for some time and then brings the car back on track with a quick jerking motion, that is probably a sign that the driver is getting sleepy or distracted. Leading manufacturers such as Volvo and Bosch already offer solutions based on steering patterns.
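To give a feel for what “recognizing troubling patterns” might mean, here is a toy heuristic over a steering-angle time series; all thresholds are made-up placeholders, not values from any production system:

```python
import numpy as np

def drowsy_steering_flag(angles, dt=0.02, quiet_thresh=0.5,
                         jerk_thresh=60.0, quiet_secs=3.0):
    """Flag the 'long lull followed by a sharp correction' pattern.

    angles: steering wheel angle in degrees, sampled every dt seconds.
    All thresholds are illustrative placeholders.
    """
    velocity = np.abs(np.diff(angles)) / dt   # steering speed, deg/s
    quiet_len = int(quiet_secs / dt)          # samples in the lull window
    for i in range(quiet_len, len(velocity)):
        lull = velocity[i - quiet_len:i].max() < quiet_thresh
        jerk = velocity[i] > jerk_thresh
        if lull and jerk:    # barely steering for a while, then a sharp jerk
            return True
    return False
```

Real systems, of course, combine many such features and learn the decision boundary from data rather than hand-tuning thresholds.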
Steering patterns, however, are just one possible signal, and quite an indirect one. Moreover, once another component of the very same EU regulations, automatic lane-keeping assistance, is in place, steering becomes largely automated and these patterns stop working. A much more direct idea is to use computer vision to detect the signs of drowsiness on the driver’s face.
When Volvo introduced their steering-based system in 2007, their representative said: “We often get questions about why we have chosen this concept instead of monitoring the driver’s eyes. The answer is that we don’t think that the technology of monitoring the driver’s eyes is mature enough yet.” By 2021, computer vision has come a long way, and recent works on the subject show excellent results.
The most telling sign is, of course, the driver’s eyes closing. There is an entire field of study devoted to detecting closed eyes and blinking (blinks become longer and more frequent when you’re drowsy). In 2014, Song et al. presented the now-standard Closed Eyes in the Wild (CEW) dataset, modeled after the classical Labeled Faces in the Wild (LFW) dataset but with closed eyes; here is a sample of CEW (top row) and LFW (bottom row):
Since then, eye closedness and blink detection has steadily improved, usually with various convolutional pipelines, and by now it is definitely ready to become an important component of car safety systems.
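One classical technique here, the eye aspect ratio (EAR) of Soukupová and Čech, computes a closedness score directly from facial landmarks; the works cited above use more elaborate convolutional pipelines, but a minimal EAR sketch shows the basic idea:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks per eye, in the usual 68-point ordering."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # vertical lid-to-lid distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal corner-to-corner distance
    return (v1 + v2) / (2.0 * h)           # small value means the eye is closed

def long_closure(ears, fps=30, ear_thresh=0.2, max_closed_s=0.5):
    """True if the eyes stay closed for longer than max_closed_s seconds."""
    run = 0
    for ear in ears:                       # ears: per-frame EAR values
        run = run + 1 if ear < ear_thresh else 0
        if run / fps > max_closed_s:
            return True
    return False
```

Thresholding EAR at roughly 0.2 is a common rule of thumb; a deployed system would calibrate it per camera setup and per driver.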
We don’t have to restrict ourselves to the eyes, of course. The entire facial expression can provide important clues (did you yawn while reading this?). For example, Shen et al. (2020) recently proposed a multi-feature pipeline with separate convolutional processing streams for the driver’s head, eyes, and mouth:
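Here is a minimal sketch of what such a multi-stream design might look like; it illustrates the general idea only and is not the actual Shen et al. architecture (crop sizes, widths, and the fusion scheme are assumptions):

```python
import torch
import torch.nn as nn

def region_cnn(out_dim=128):
    """A tiny backbone for one cropped face region (illustrative)."""
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, out_dim),
    )

class MultiRegionDrowsiness(nn.Module):
    """Separate convolutional streams for head, eyes, and mouth crops."""
    def __init__(self):
        super().__init__()
        self.head_stream = region_cnn()
        self.eyes_stream = region_cnn()
        self.mouth_stream = region_cnn()
        self.classifier = nn.Linear(3 * 128, 2)   # drowsy vs. alert

    def forward(self, head, eyes, mouth):
        # concatenate the per-region features and classify the fused vector
        fused = torch.cat([self.head_stream(head),
                           self.eyes_stream(eyes),
                           self.mouth_stream(mouth)], dim=1)
        return self.classifier(fused)

model = MultiRegionDrowsiness()
logits = model(torch.randn(4, 3, 64, 64),    # head crops
               torch.randn(4, 3, 32, 64),    # eye-region crops
               torch.randn(4, 3, 32, 64))    # mouth crops
```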
Another important recent work comes from Affectiva, a company we have recently collaborated with on eye gaze estimation. Joshi et al. (2020) classify drowsiness based on facial expressions captured in a 10-second video, during which the driver may progress between different states of drowsiness. Their pipeline is based on features extracted by Affectiva’s own SDK for recognizing facial expressions:
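In the same spirit, a sequence-level classifier over per-frame expression features might look like the sketch below; the feature dimension, clip length, and drowsiness levels are placeholders, and this is not Affectiva’s actual pipeline:

```python
import torch
import torch.nn as nn

class DrowsinessOverTime(nn.Module):
    """Classify a short clip from a sequence of per-frame facial features."""
    def __init__(self, n_features=20, hidden=64, n_levels=4):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_levels)   # e.g., KSS-like drowsiness levels

    def forward(self, feats):       # feats: (batch, frames, n_features)
        _, h = self.gru(feats)      # keep the final hidden state
        return self.head(h[-1])

clip = torch.randn(2, 300, 20)      # 10 s at 30 fps, 20 features per frame
logits = DrowsinessOverTime()(clip)
```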
None of these systems is perfect, of course, but it is clear by now that computer vision can provide important clues for detecting and evaluating the driver’s state, triggering warnings that can help avoid road traffic accidents and ultimately save lives. So where does synthetic data come into this picture?
Synthetic Data for Drowsiness Detection
On this blog, we have discussed many times (e.g., recently and very recently) the conditions under which synthetic data especially shines in computer vision. These include situations where existing real datasets may be biased, environmental features that are not covered in real data (different cameras, lighting conditions, etc.), and, generally, situations that call for extensive variability and randomization, which is much easier to achieve in synthetic data than in real datasets.
Guess what: driver safety is definitely one of those situations! First, cameras installed in real cars shoot from positions that are far from standard for the usual datasets. Here are some frames from a sample video that Joshi et al. processed in the paper we referenced above:
Compare this with, say, the standard frontal photographs characteristic of Labeled Faces in the Wild that we also showed above; obviously, some domain transfer is needed between these two settings, while a synthetic 3D model of a head can be shot from any angle.
Second, where will real data come from? We could collect real datasets and label them semi-automatically with the help of EEG monitoring, but that would be far from perfect for training computer vision models: real drivers will not be wearing an EEG device, so a model trained on images of instrumented drivers would see headgear that never appears in deployment. Also, real datasets of this kind will inevitably be very small: it is obviously very difficult and expensive to collect even thousands of samples of people falling asleep at the wheel, let alone millions.
Third, you are most likely to fall asleep when you’re driving at night, and night driving means your face is probably illuminated very poorly. You can use NIR (near-infrared) or ToF NIR (time-of-flight near-infrared) cameras to “see in the dark”. But pupils (well, retinas) behave differently in the NIR modality, and this effect can vary across ethnicities. Such camera modalities and challenging lighting conditions are, again, relatively easy to achieve in synthetic datasets but hard to find in real ones. For example, available NIR datasets such as NVGaze or the MRL Eye Dataset were created for AR/VR applications, not from an in-car camera perspective.
That is why we at Synthesis AI are moving into this field (see our recent press release), and we hope to make important contributions that will make road traffic safer for all of us. We are already collaborating with automobile and autonomous vehicle manufacturers and Tier 1 suppliers in this market.
To make this work, we will need additional effort to model car interiors, the cameras used by car manufacturers, and other environmental features, but the heart of this project remains the FaceAPI that we have already developed. This easy-to-use API can produce millions of unique 3D models with different combinations of identities, clothing, accessories, and, importantly for this project, facial expressions. FaceAPI can already produce a wide variety of emotions, including, of course, closed eyes and drowsiness, but we plan to expand this feature set further.
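To give a feel for the parametric approach, here is a purely hypothetical sketch of what a generation request might look like; the field names, values, and endpoint are illustrative inventions, not the real FaceAPI schema:

```python
# Hypothetical request shape: every field below is an illustrative invention,
# NOT the real FaceAPI schema.
job = {
    "count": 10_000,                                    # unique rendered faces
    "camera": {"mount": "in_cabin", "modalities": ["rgb", "nir"]},
    "identity": {"age_range": [18, 75], "ethnicity": "balanced"},
    "expression": {"eye_closure": [0.0, 1.0], "yawn": True,
                   "head_yaw_deg": [-60, 60]},
    "accessories": ["glasses", "face_mask"],
    "outputs": ["image", "depth_map", "normal_map", "landmarks"],
}
# import requests
# requests.post("https://api.example.com/v1/generate", json=job)  # placeholder URL
```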
Here is an example of our automatically generated synthetic data from an in-car perspective, complete with depth and normal maps:
Synthetic Data for Driver Attention
But you don’t have to literally fall asleep to cause a traffic accident. Unfortunately, it often suffices to get momentarily distracted: look at your phone, take your hands off the wheel for a second to adjust your coffee cup… all with the same, sometimes tragic, consequences. Thus, another, no less important application of computer vision to driver safety is monitoring driver attention and possible distractions. This becomes all the more important as driverless cars become increasingly common and autopilots take up more and more of the total time at the wheel: it is much easier to get distracted when you are not actually driving the car.
First, there is the monitoring of large-scale motions such as taking your hands off the wheel. This falls into the classical field of scene understanding (see, e.g., Xiao et al. (2018)): “are the driver’s hands on the wheel?” is a typical scene understanding question that goes beyond simple object detection of the hands and the wheel. Answering such questions, however, usually relies on classical computer vision problems such as instance segmentation, as in the sketch below.
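Once instance masks for the hands and the wheel are available, the final decision can be a simple geometric check. A minimal sketch, with the mask format and overlap threshold as assumptions:

```python
import numpy as np

def hands_on_wheel(hand_masks, wheel_mask, min_overlap=0.15):
    """Decide from boolean instance masks whether either hand touches the wheel.

    hand_masks: list of HxW boolean arrays, one per detected hand.
    wheel_mask: HxW boolean array for the steering wheel.
    The overlap threshold is illustrative; in practice one would also dilate
    the wheel mask slightly to capture near-contact.
    """
    for hand in hand_masks:
        area = hand.sum()
        overlap = np.logical_and(hand, wheel_mask).sum()
        if area > 0 and overlap / area >= min_overlap:
            return True
    return False
```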
Second, it is no less important to track small-scale motions such as eye gaze. Gaze estimation is an important computer vision problem with applications of its own, but it is also obviously useful for driver safety. We have already discussed applications of synthetic data to eye gaze estimation on this blog, with a special focus on domain adaptation.
Obviously, all of these problems belong to the field of computer vision, and all the standard arguments for the use of synthetic data apply here as well. Thus, we expect synthetic data produced by our engines to be extremely useful for driver attention monitoring. In the next example, also produced by FaceAPI, we can compare a regular RGB image and the corresponding near-infrared image for two drivers who may be distracted. Note that eye gaze direction is clearly visible in our synthetic pictures, as are larger-scale features:
There’s even more that can be varied parametrically. Here are some examples with head turns, yawning, eye closure, and accessories such as face masks and glasses.
All in all, we strongly believe that high-quality synthetic data for computer vision systems can help car manufacturers advance their safety systems and reduce road traffic accidents not only in the European Union but all over the world. Here at Synthesis AI, we are devoted to removing the obstacles to further advances of machine learning, especially for such a great cause!