Artificial intelligence is barely limited by international borders, as businesses, universities and governments tap into a global pool of ideas, algorithms and talent. Yet the AI programs that result from this global gold rush may still reflect deep cultural divisions.
New research shows how government censorship affects AI algorithms and can influence applications created with these algorithms.
Margaret Roberts, a political science professor at UC San Diego, and Eddie Yang, a doctoral student there, examined AI language algorithms trained on two sources: the Chinese version of Wikipedia, which is blocked in China; and Baidu Baike, a similar site operated by China’s dominant search engine Baidu, which is subject to government censorship. Baidu did not respond to a request for comment.
The researchers were curious whether the censorship of certain words and phrases could be learned by AI algorithms and find its way into software that uses those algorithms. That could influence the language a chatbot or voice assistant uses, the phrasing chosen by a translation program, or the text suggested by autocomplete tools.
The type of language algorithm used learns by analyzing how words appear together in large amounts of text. It represents different words as connected nodes in a physical space; the closer the words, the more similar their meaning.
A translation program can infer the meaning of an unknown word by examining these relationships in two different languages, for example.
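To make the idea of "closeness" concrete, here is a minimal sketch of how similarity between word vectors is commonly measured, using cosine similarity. The three-dimensional vectors and the helper names `cosine_similarity` and `nearest` are invented for illustration; real language models use hundreds of dimensions learned from text, and nothing here is taken from the study.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, chosen by hand so that "king" and "queen"
# point in nearly the same direction and "banana" does not.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "banana": [0.1, 0.05, 0.95],
}


def nearest(word, vectors):
    """Return the other word whose vector is most similar to `word`."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


if __name__ == "__main__":
    print(nearest("king", toy_vectors))
```

In this toy space, "king" sits far closer to "queen" than to "banana", which is the sense in which nearby words are treated as similar in meaning.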
The UCSD researchers found key differences in the resulting AI algorithms, differences that they said appear to reflect information censored in China. For example, the algorithm trained on Chinese Wikipedia represented “democracy” closer to positive words, such as “stability.” The algorithm trained on Baidu Baike represented “democracy” closer to “chaos.”
Roberts and Yang then used the algorithms to build two programs that assess the sentiment (the positive versus negative meaning) of news headlines. They found that the one trained on Chinese Wikipedia gave more positive scores to headlines that mentioned terms such as “election,” “freedom” and “democracy,” while the one trained on Baidu Baike assigned more positive scores to headlines featuring “surveillance,” “social control” and “CCP.” The study will be presented at the 2021 Conference on Fairness, Accountability, and Transparency (FAccT) in March.
In recent years, researchers have highlighted how racial and gender biases can lurk in many artificial intelligence systems. Algorithms trained on text taken from the web or old books, for example, learn to reproduce the biases displayed by the human authors of that text. In 2018, Google researchers demonstrated cultural biases in image recognition algorithms, which may, for example, recognize only Western wedding scenes.
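One simple way to picture such a comparison is to ask, for each set of word vectors, whether a target word lies closer to a list of positive words or a list of negative ones. The sketch below is an assumption about how that could be done, not the researchers' actual method; the function `association_gap` and the two-dimensional vectors in `corpus_a` and `corpus_b` are made up and do not come from the study's data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def association_gap(vectors, target, positive, negative):
    """Mean similarity of `target` to positive words minus mean similarity to negative words."""
    pos = sum(cosine(vectors[target], vectors[w]) for w in positive) / len(positive)
    neg = sum(cosine(vectors[target], vectors[w]) for w in negative) / len(negative)
    return pos - neg


# Hypothetical embeddings standing in for models trained on two different corpora.
# The numbers are invented so that "democracy" leans opposite ways in each.
corpus_a = {"democracy": [0.9, 0.2], "stability": [0.8, 0.3], "chaos": [0.1, 0.9]}
corpus_b = {"democracy": [0.2, 0.9], "stability": [0.8, 0.3], "chaos": [0.1, 0.9]}

if __name__ == "__main__":
    for name, vectors in (("corpus_a", corpus_a), ("corpus_b", corpus_b)):
        gap = association_gap(vectors, "democracy", ["stability"], ["chaos"])
        print(name, round(gap, 3))
```

A positive gap means the target word sits nearer the positive list in that model; a negative gap means it sits nearer the negative list. A real analysis would use much longer word lists and test whether the difference is statistically meaningful.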
Roberts notes that the differences seen in their study may not be entirely due to government censorship. Some may be the result of self-censorship or simply of cultural differences between those who write the articles in the encyclopedia. But she says it’s important to recognize that government policy can lead to other forms of bias in AI systems. “We see this as a starting point for trying to understand how government-formed training data appears in machine learning,” says Roberts.
Roberts argues that researchers and policymakers need to think about how governments might in the future influence how AI systems are trained, in order to make censorship more effective or to export particular values.
Graeme Hirst, a professor at the University of Toronto who specializes in computational linguistics and natural language processing, has some qualms about the study’s methodology. Without carefully studying the differences between Chinese Wikipedia and Baidu Baike, Hirst says, it’s difficult to attribute variations in the algorithms to censorship. It’s also possible that Chinese Wikipedia contains anti-Chinese or overtly pro-democracy content, he says. Hirst adds that it is not clear how the sentiment analysis was carried out and whether bias may have been introduced there.
Others see it as a welcome contribution to the field.
“In a sense, it’s not surprising,” says Suresh Venkatasubramanian, a professor at the University of Utah who studies the ethics of AI and cofounded the FAccT conference.
Venkatasubramanian points out that AI algorithms trained on Western news articles might contain their own anti-Chinese biases. “But I think it’s still important to do the work to show that it’s happening,” he says. “Then you can start to wonder how it looks, how to measure it, what does it look like, etc.”