Facial recognition models fail to recognize Black, Middle Eastern, and Latino people more often than those with lighter skin. That’s according to a study by researchers at Wichita State University, who benchmarked popular algorithms trained on datasets containing tens of thousands of facial images.
While the study has limitations in that it investigated models that haven’t been fine-tuned for facial recognition, it adds to a growing body of evidence that facial recognition is susceptible to bias. A paper last fall by University of Colorado, Boulder researchers demonstrated that AI from Amazon, Clarifai, Microsoft, and others maintained accuracy rates above 95% for cisgender men and women but misidentified trans men as women 38% of the time. Independent benchmarks of major vendors’ systems by the Gender Shades project and the National Institute of Standards and Technology (NIST) have demonstrated that facial recognition technology exhibits racial and gender bias and have suggested that current facial recognition programs can be wildly inaccurate, misclassifying people upwards of 96% of the time.
The researchers focused on three models (VGG, ResNet, and InceptionNet) that were pretrained on 1.2 million images from the open source ImageNet dataset. They tailored each for gender classification using images from UTKFace and FairFace, two large facial recognition datasets. UTKFace contains over 20,000 images of White, Black, Indian, and Asian faces scraped from public databases around the web, while FairFace comprises 108,501 photos of White, Black, Indian, East Asian, Southeast Asian, Middle Eastern, and Latino faces sourced from Flickr and balanced for representativeness.
In the first of several experiments, the researchers sought to evaluate and compare the fairness of the different models in the context of gender classification. They found that accuracy hovered around 91% for all three, with ResNet attaining higher rates than VGG and InceptionNet on the whole. But they also report that ResNet classified men more reliably than the other models did; by contrast, VGG obtained higher accuracy rates for women.
Model performance also varied depending on the race of the person. VGG obtained higher accuracy rates for women, with the exception of Black women, and higher rates for men, with the exception of Latino men. Middle Eastern men had the highest accuracy values, followed by Indian and Latino men, but Southeast Asian men had high false negative rates, meaning they were more likely to be classified as women rather than men. Black women, meanwhile, were often misclassified as male.
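To make the metrics in this comparison concrete, here is a minimal sketch of how per-group accuracy and a false negative rate for men could be computed from a classifier's predictions. It is not the researchers' code. The function name `per_group_metrics`, the record layout, and the toy data are all hypothetical, and the sketch assumes "male" is treated as the positive class, which matches the article's description of a false negative as a man classified as a woman.

```python
from collections import defaultdict


def per_group_metrics(records):
    """Compute accuracy and male false negative rate for each group.

    `records` is an iterable of (group, true_label, predicted_label)
    tuples, where labels are the strings "male" or "female".
    This layout is an assumption made for illustration.
    """
    counts = defaultdict(
        lambda: {"n": 0, "correct": 0, "men": 0, "men_missed": 0}
    )
    for group, truth, pred in records:
        c = counts[group]
        c["n"] += 1
        c["correct"] += int(truth == pred)
        if truth == "male":
            c["men"] += 1
            # A false negative here is a man predicted as a woman.
            c["men_missed"] += int(pred != "male")

    results = {}
    for group, c in counts.items():
        results[group] = {
            "accuracy": c["correct"] / c["n"],
            # None when the group contains no men, so the rate is undefined.
            "male_fnr": c["men_missed"] / c["men"] if c["men"] else None,
        }
    return results


# Made-up toy predictions for two placeholder groups (not study data).
toy_records = [
    ("group_a", "male", "male"),
    ("group_a", "male", "female"),
    ("group_a", "female", "female"),
    ("group_a", "female", "female"),
    ("group_b", "male", "male"),
    ("group_b", "female", "male"),
]
metrics = per_group_metrics(toy_records)
```

Reporting the two numbers side by side matters because a group can have reasonable overall accuracy while one gender within it is misclassified far more often than the other.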
All of these biases were exacerbated when the researchers trained the models on UTKFace alone, which isn’t balanced to mitigate skew. (UTKFace has no dedicated categories for Middle Eastern or Latino faces and, unlike FairFace, doesn’t split Asian faces into East Asian and Southeast Asian.) Middle Eastern men obtained the highest accuracy rates followed by Indian, Latino, and White men, while Latino women outperformed all other women (followed by East Asian and Middle Eastern women). Meanwhile, the accuracy for Black and Southeast Asian women was reduced even further.
“Overall, [the] models with architectural differences varied in performance with consistency towards specific gender-race groups … Therefore, the bias of the gender classification system is not due to a particular algorithm,” the researchers wrote. “These results suggest that a skewed training dataset can further escalate the difference in the accuracy values across gender-race groups.”
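One simple way to quantify a "difference in the accuracy values across gender-race groups" is the gap between the best- and worst-served groups. The sketch below is an illustration under stated assumptions, not the metric the researchers used. The helper `accuracy_gap`, the placeholder group names, and every number are hypothetical and chosen only to show the calculation.

```python
def accuracy_gap(accuracy_by_group):
    """Return (best_group, worst_group, gap) for a dict of accuracies.

    `accuracy_by_group` maps a group name to an accuracy in [0, 1].
    The gap is the best group's accuracy minus the worst group's.
    """
    if not accuracy_by_group:
        raise ValueError("accuracy_by_group must not be empty")
    best = max(accuracy_by_group, key=accuracy_by_group.get)
    worst = min(accuracy_by_group, key=accuracy_by_group.get)
    return best, worst, accuracy_by_group[best] - accuracy_by_group[worst]


# Hypothetical accuracies for the same model under two training sets.
# These values are invented for illustration and are not study results.
trained_on_balanced = {"group_a": 0.93, "group_b": 0.90, "group_c": 0.88}
trained_on_skewed = {"group_a": 0.94, "group_b": 0.87, "group_c": 0.80}

balanced_result = accuracy_gap(trained_on_balanced)
skewed_result = accuracy_gap(trained_on_skewed)

print(f"balanced: gap of {balanced_result[2]:.2f} "
      f"between {balanced_result[0]} and {balanced_result[1]}")
print(f"skewed:   gap of {skewed_result[2]:.2f} "
      f"between {skewed_result[0]} and {skewed_result[1]}")
```

A wider gap under the skewed training set, as in this invented example, is the pattern the quoted passage describes: overall accuracy can stay high while the spread between groups grows.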
In future work, the coauthors plan to study the impact of variables like pose, illumination, and makeup on classification accuracy. Previous research has found that photographic technology and techniques can favor lighter skin, including everything from sepia-tinged film to low-contrast digital cameras.