Algorithms Are Biased. This Face Recognition Founder Plans To Provide Data Sets For People Of Color
In face recognition, there’s a huge problem around diversity and the ability of algorithms to recognize people of different skin tones, different shades and genders.
It’s called algorithmic bias, and it has had tragic consequences for Black people and people of color in the U.S., where 50 percent of us have our faces in databases that may be available to law enforcement.
What’s the reason for the algorithmic bias in face recognition? Insufficient training data to teach the algorithm the difference between women and men, and between people with darker and lighter skin tones, according to Brian Brackeen, CEO and founder of Miami-based face recognition firm Kairos.
Brackeen announced today at SXSW (South By Southwest) that his company plans to produce a data set of people who have been traditionally marginalized.
He plans to make it freely available to the entire world, he said, “so that all of us can create better algorithms that reflect all humanity.”
If Kairos succeeds, it will be a world first, Brackeen told Moguldom on the sidelines of a SXSW session entitled “Face Recognition: Please Search Responsibly.”
Brackeen spoke on a panel at SXSW with Clare Garvie, founding executive director of the Center on Privacy & Technology at Georgetown Law; and Arun Ross, a professor in the Department of Computer Science and Engineering at Michigan State University.
“That’s incredibly good news,” Garvie said in response to Brackeen’s announcement.
“One of the challenges that developers face is an absence of good diverse data sets,” Garvie told Moguldom. “The idea that Kairos or anyone would take the initiative to create a diverse dataset and then make it available to anyone who needs that is incredible. It’s really great. I hope and I trust that they’ve put the thought into it to make sure that it is indeed diverse and it is large enough. I look forward to seeing what they’re creating.”
Garvie said she’s not aware of anyone making a data set like this public. Some data sets, such as certain mugshot databases, have historically been public.
“But as far as I’m aware, these have not been rigorously designed to be diverse in terms of gender, race and age, etc.,” Garvie said.
Right now, most data sets like the ones Microsoft and Google use come from university campuses, so they’re trained on the populations at those universities, Brackeen said. “So if a campus like USC has more white males than other groups, the algorithm will know better how to identify white male faces than Black female faces.
“What we’re saying is we’re going to grab all this data from all over the world and we’re going to share it so the algorithm can perform better.”
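Brackeen’s campus-data example can be sketched numerically. This is a toy illustration, not anything from Kairos: all the match scores below are invented, and the “training” is just a simple threshold search. But it shows the mechanism he describes — a decision threshold tuned on a 90/10 imbalanced pool ends up serving the underrepresented group worse.

```python
# Toy sketch (hypothetical numbers): tuning a face-verification threshold
# on an imbalanced pool favors the overrepresented group.

def accuracy(threshold, genuine, impostor):
    """Fraction of correct decisions: genuine pairs should score >= threshold
    (accepted), impostor pairs should score below it (rejected)."""
    correct = sum(s >= threshold for s in genuine)
    correct += sum(s < threshold for s in impostor)
    return correct / (len(genuine) + len(impostor))

# Invented match-score distributions for two demographic groups.
# Group B's genuine scores sit lower -- standing in for a model that saw
# far fewer training faces like theirs.
a_genuine, a_impostor = [0.55, 0.60, 0.70, 0.80, 0.90], [0.10, 0.20, 0.30, 0.40, 0.50]
b_genuine, b_impostor = [0.35, 0.40, 0.50, 0.60], [0.05, 0.10, 0.20, 0.30]

# The tuning pool mirrors a 90/10 imbalanced dataset: group A weighted 9x.
pool_genuine = a_genuine * 9 + b_genuine
pool_impostor = a_impostor * 9 + b_impostor

# Pick the threshold that maximizes overall accuracy on the pool.
candidates = sorted(set(pool_genuine + pool_impostor))
best = max(candidates, key=lambda t: accuracy(t, pool_genuine, pool_impostor))

acc_a = accuracy(best, a_genuine, a_impostor)
acc_b = accuracy(best, b_genuine, b_impostor)
print(f"threshold={best:.2f}  group A accuracy={acc_a:.2f}  group B accuracy={acc_b:.2f}")
```

With these made-up numbers the tuned threshold lands where group A is classified perfectly while most of group B’s genuine matches are rejected — the per-group accuracy gap appears even though the same algorithm is applied to everyone.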
The way Brackeen describes it, the task sounds easy.
“It’s really hard,” he said. “You can’t just go out and take images off the internet. You’ve got to go out, you’ve got to ask people, you’ve got to pay people to offer images, different angles, different lighting, and then collect all those images into one thing, and say ‘This is a male, this is a female, this is a person from Nigeria vs Canada.'”
The labeling and collection are time-consuming and expensive, Brackeen said, “and we’re going to do that for everyone because we think it’s very important.”
Everyone in the world?
“A million images or 10 million images,” Brackeen said. He plans to pay people for those images but then make the datasets available to everyone so the algorithms can learn from the images.
Is he going to pay mainly people of color?
“We’re going to pay people of all races, all shades, all genders, but yes, making sure they’re all equally represented in the data,” he said.
Brackeen said he expects the project to take more than a year to complete. “We’re looking at the logistics right now,” he said. “I hope people share this information so they can make the world a better place — a more equitable place.”