Group develops tool to prevent machine-learning bias, regulatory risk

14 June 2017. By Mike Swift

A report released last year by the Federal Trade Commission cautioned that "Big Data," while it carries the promise of significant benefits for consumers, could also allow personal information to be unwittingly used in ways that discriminate or increase inequality.

The FTC's January 2016 "Big Data" report posed four policy questions that company executives need to answer to avoid a possible FTC enforcement action, including: How representative is your data?

Now, a prominent digital rights group is trying to go further, building a first-of-its-kind tool intended to help programmers and executives prevent a biased algorithm or flawed database from creating a commercial product that could harm consumers or trigger regulatory problems for the company that created it.

"What prompted the tool is the fact that algorithms are being used to make very significant decisions about people and their lives — decisions about credit and housing and what opportunities people have access to," said Natasha Duarte, a policy analyst with the Center for Democracy and Technology who is leading the effort. "Even things that can seem trivial, like advertising, can be problematic when you're talking about excluding specific groups of people."

The CDT tool will consist of a checklist of more than 50 detailed questions that companies and data scientists can ask themselves about their data and algorithms, including: "Was your data collected by people or a system that was operating with quotas or a particular incentive structure?" And: "Who is under-represented or absent from your data?"

Ultimately, CDT hopes to create an interactive online product, but Duarte said in an interview with MLex that the organization is still awaiting the outcome of a grant application to fund the design of such a tool. For now, Duarte and her team are working over the coming weeks to finalize the text and organization of the questions before CDT makes them public. She first revealed CDT's plan at a recent conference at Facebook for privacy lawyers and researchers.

Amid an explosion in the use of machine-learning products intended to allow businesses to interpret vast troves of data, there have been some notable early artificial intelligence failures.

In 2015, Google was forced to apologize after news reports surfaced that its newly launched, artificial intelligence-driven Photos app had identified black people as "gorillas."

Investigative reporters at the Seattle Times discovered that LinkedIn searches for a female name often prompted suggestions for a similar-sounding male name, but almost never vice versa. LinkedIn quickly fixed the bias.

In 2016, "Tay," an A.I.-driven chatbot that a Microsoft research team released on Twitter to improve the company's understanding of conversational language, soon began associating "feminism" with "cults" and making racist statements after it ingested millions of unfiltered postings on the social network. Microsoft's goal had been to create an autonomous entity on Twitter that sounded like a human teenager.

More recently, a Massachusetts A.I. researcher training an algorithm to organize restaurant reviews noticed that it was improperly downgrading Mexican restaurants. He realized that by teaching the algorithm natural language from text it read on the Web, he had built the bias in: the words "Mexican" and "illegal" appear together so often online that the learned association dragged down the restaurants' ratings.
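
That failure mode is easy to reproduce in miniature. The sketch below, written in Python with made-up sentiment numbers standing in for values a model might learn from web text, shows how a rating built by averaging word-level sentiment can penalize a review simply because one word carries a spurious negative association; none of the words, scores or functions here come from the researcher's actual system.

```python
# Minimal sketch (hypothetical numbers) of how averaging learned word-sentiment
# scores can bias a review rating. A real system would derive these scores from
# a model trained on web text; here they are hard-coded for illustration.

# Hypothetical per-word sentiment scores, as a web-trained model might learn them.
# "mexican" is pulled negative purely by co-occurrence with words like "illegal",
# not by anything reviewers said about the food.
WORD_SENTIMENT = {
    "great": 0.8,
    "tasty": 0.7,
    "service": 0.1,
    "italian": 0.0,
    "mexican": -0.6,
}

def review_score(review: str) -> float:
    """Average the sentiment of known words in a review (toy model)."""
    words = [w.strip(".,!?").lower() for w in review.split()]
    scores = [WORD_SENTIMENT[w] for w in words if w in WORD_SENTIMENT]
    return sum(scores) / len(scores) if scores else 0.0

if __name__ == "__main__":
    # Identical praise, different cuisine word -- the Mexican review scores lower.
    print(review_score("Great tasty Italian food"))  # 0.5
    print(review_score("Great tasty Mexican food"))  # 0.3
```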

Despite those high-profile failures, companies are rushing into the new world of A.I. Research group Forrester predicted that investment in artificial intelligence systems would jump by 300 percent between 2016 and 2017.

The FTC's "Big Data" report did not discourage the use of artificial intelligence, and there has been no enforcement action to date. At the time, Maureen Ohlhausen, now acting FTC chairman, said the report gave "undue credence to hypothetical harms."

But more recently, Terrell McSweeny, the Democratic member of the FTC, said companies that use A.I. need to do a better job of explaining how it will be used and should incorporate ethics by design.

A.I. is no longer the exclusive province of large tech companies such as Google or Microsoft; increasingly, it is within the reach of much smaller companies and a much wider population of engineers. That means in-house lawyers would be wise to stay in active touch with their engineers' use of A.I.

"Five years ago, machine learning was something only Google did for products; three years ago Yahoo was doing it for marketing. Now, even summer interns are doing it, and it gets good results in many domains," said Luis Villa, a former in-house lawyer at Mozilla and the Wikimedia Foundation.

"Therefore, as lawyers, your [Silicon Valley] company is almost certainly experimenting with it, and maybe using it in production, so you need to proactively find out about it, since they won't think to ask you. It's quickly becoming standard," said Villa, echoing a warning he issued as a speaker at a technology legal conference at Stanford University on Monday.

CDT believes many companies using machine learning want to protect themselves from bringing flawed products to market.

"A lot of the time, the harms from biased automatic decision-making systems are unintentional," Duarte said. "It's not that companies and engineers are setting out to discriminate. They want to do the right thing. They want to build tools that work. So what we wanted to do was give them ... the tools that allow them to think about raising questions about bias."