How do computers see faces and other objects?

Female head with biometric markers (Photo: VCG)

Computers began recognizing human faces in images decades ago, but artificial intelligence systems now rival people’s ability to classify objects in photos and videos.

That’s sparking increased interest from government agencies and businesses, which are eager to bestow vision skills on all sorts of machines. Among them: self-driving cars, drones, personal robots, in-store cameras and medical scanners that can search for skin cancer. There are also our own phones, some of which can now be unlocked with a glance.

How does it work?

Algorithms designed to detect facial features and recognize individual faces have grown more sophisticated since early efforts decades ago.

A common method has involved measuring facial dimensions, such as the distance between the nose and ear or from one corner of the eye to the other. Those measurements can then be turned into numbers and compared with similar data extracted from other images. The closer the two sets of numbers, the better the match.
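The measure-and-compare idea can be sketched in a few lines of Python. This is only an illustration, not how any particular product works: the three measurements, the names in `known_faces` and the helper functions are all made up for the example, and real systems use many more features and far more careful matching.

```python
import math

# Hypothetical measurements in millimetres, invented for illustration:
# (nose-to-ear distance, eye corner-to-corner width, nose-to-chin length)
known_faces = {
    "person_a": (110.0, 31.0, 68.0),
    "person_b": (124.0, 28.5, 74.0),
    "person_c": (117.0, 33.0, 61.0),
}

def face_distance(a, b):
    """Euclidean distance between two sets of measurements; smaller means more alike."""
    return math.dist(a, b)

def best_match(probe, gallery):
    """Return the name and distance of the gallery entry closest to the probe."""
    name = min(gallery, key=lambda n: face_distance(probe, gallery[n]))
    return name, face_distance(probe, gallery[name])

# Measurements taken from a new photo (also invented).
probe = (111.5, 30.5, 67.0)
name, dist = best_match(probe, known_faces)
print(name, round(dist, 2))
```

Here the new photo lands closest to `person_a`, because its three numbers differ least from that entry. A practical system would also set a cutoff distance, so that a face resembling no one in the gallery is reported as unknown rather than forced onto the nearest entry.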

Such analysis is now aided by greater computing power and huge troves of digital imagery that can be easily stored and shared.

From faces to objects (and pets)

“Face recognition is an old topic. It’s always been pretty good. What really got everyone’s attention is object recognition,” says Michael Brown, a computer science professor at Toronto’s York University who helps organize the annual Conference on Computer Vision and Pattern Recognition.

Research over the past decade has focused on the development of brain-like neural networks that can automatically “learn” to recognize what’s in an image by looking for patterns in big data sets. But humans continue to help make machines smarter by labeling photos, as happens when Facebook users tag a friend.
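To give a feel for what “learning” from labeled photos means, here is a toy sketch of a single artificial neuron, known as a perceptron. It is vastly simpler than the deep neural networks used in practice, which learn from raw pixels rather than hand-picked features. The two features, the cat and dog labels and every number below are hypothetical and chosen only to keep the example small.

```python
# Hypothetical hand-made features per photo: (ear_pointiness, snout_length),
# each scaled from 0 to 1. Labels are supplied by people, as with tagged
# photos: +1 means cat, -1 means dog.
labeled_photos = [
    ((0.9, 0.2), +1), ((0.2, 0.8), -1),
    ((0.8, 0.3), +1), ((0.3, 0.9), -1),
    ((0.7, 0.1), +1), ((0.1, 0.7), -1),
]

def train(examples, epochs=50):
    """Perceptron rule: adjust the weights each time a labeled example is misclassified."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), label in examples:
            score = w[0] * x1 + w[1] * x2 + b
            if label * score <= 0:  # wrong side of the boundary (or exactly on it)
                w[0] += label * x1
                w[1] += label * x2
                b += label
                mistakes += 1
        if mistakes == 0:  # every training example is now classified correctly
            break
    return w, b

def predict(w, b, features):
    """Classify a new photo's features with the learned weights."""
    score = w[0] * features[0] + w[1] * features[1] + b
    return "cat" if score > 0 else "dog"

w, b = train(labeled_photos)
print(predict(w, b, (0.85, 0.25)))
print(predict(w, b, (0.15, 0.85)))
```

Nobody writes a rule saying what a cat looks like. The program starts with no opinion and shifts its weights only when a human-supplied label shows it guessed wrong, which is why the quantity and quality of labeled images matter so much.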

An annual image recognition competition that ran from 2010 to 2017 drew top researchers from companies like Google and Microsoft. Among the revelations: computers can do better than humans at distinguishing between various Welsh corgi breeds, in part because they can more quickly absorb the knowledge it takes to make those distinctions.

But computers have been confused by more abstract forms, such as statues.

The “coded gaze”

The growing use of face recognition by law enforcement has highlighted longstanding concerns about racial and gender bias.

A study led by MIT computer scientist Joy Buolamwini found that face recognition systems built by companies including IBM and Microsoft were much more likely to misidentify darker-skinned people, especially women. (Buolamwini called this effect “the coded gaze.”) Both Microsoft and IBM recently announced efforts to make their systems less biased by using bigger and more diverse photo repositories to train their software.