FAccT 2021. Journalism, data leverage, education, and language models

Summary of Day 3 at FAccT 2021: Julia Angwin’s The Markup, language models, measurement, and data leverage

March 10, 2021

Keynote: Algorithms, Accountability, and Journalism

The final keynote was given by Julia Angwin, an investigative journalist and co-founder of The Markup, “a nonprofit newsroom that investigates how powerful institutions are using technology to change our society.” I was already familiar with her body of work because she is famous for her ProPublica article on racial bias in COMPAS, a risk assessment software. Her talk had two parts. First, she described how The Markup is run and the methods it uses to produce investigative reports. She then walked through several examples of their work from the past year, most of which involved auditing algorithms.

The Markup’s staff consists of engineers and journalists. It was interesting to hear her say that the engineers are essentially investigative journalists themselves: they collaborate with the journalists on the team but bring technical expertise to tech reporting. Once The Markup has drafted a report, it goes through an extensive vetting process (called the “bulletproof” stage) in which the team actively seeks out critiques from external experts. Notably, each work product comes with a pair of publications: a news article aimed at the general public and a detailed methodology write-up. They have also made a practice of publishing their datasets and code so that others can replicate their analyses and apply them to their own projects.

When she gave examples of their algorithm auditing projects, it was fascinating to learn about the novel approaches they have taken to probe algorithms. In one project, they simply analyzed search results; in others, they created numerous web accounts to try to reverse-engineer the opaque decisions that algorithms make. The Markup also invests in building tools, such as Blacklight, which runs privacy tests on websites in a virtual browser to inspect them for privacy violations, and Citizen Browser, in which participating citizens use The Markup’s app to collect data from Facebook so that the team can examine the decisions Facebook makes.

During the Q&A, one memorable remark she made was that all data is political. Whether it is leaked, obtained through formal protocols, or publicly available, data collection starts with a certain intention. She pointed out that there is no national database on police violence in the US, which itself reveals the political will and intentions of the US government. To counter this, she said it is really important to know the limitations of your data and to be very transparent about them.

She mentioned that The Markup often co-publishes pieces with other news organizations to widen distribution. She said the process is sometimes challenging because these organizations do not have technical experts who can review the lengthy methodology write-ups. This resonated with me the most because it is what I have been experiencing while volunteering at a local nonprofit. When an audience member asked how individual tech workers can help, she said the most straightforward way to help journalists is to support them financially, especially local news organizations.

Data Leverage

In the tech world, there is a tendency to trivialize data labor, and tech companies also engage in problematic data collection practices. What can the public, who are in a very vulnerable position, do about it? Data Leverage: A Framework for Empowering the Public in its Relationship with Technology Companies explores several ways to leverage the fact that we, the public, are data providers. The authors define data leverage as the power to influence a company held by those who implicitly or explicitly contribute the data on which that company relies. The larger goal of identifying data leverage is to give people more agency. They explore three options: data strikes, data poisoning, and conscious data contribution. Each option has its pros and cons, and some involve a more complex legal landscape. Personally, I found data poisoning the most interesting; it involves contributing fake or random data (somewhat similar to adversarial attacks).
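
To make two of these levers concrete, here is a toy Python simulation. It is my own illustration under stated assumptions, not the paper’s method: a 1-nearest-neighbour classifier stands in for a company’s model, the data are synthetic 2-D clusters, “poisoning” is modeled as contributors submitting random labels, and a “strike” as contributors withholding their data. All function names (`make_data`, `poison`, `strike`, `predict_1nn`, `accuracy`) are hypothetical.

```python
import random


def make_data(n, rng):
    """Toy data: two well-separated 2-D Gaussian clusters labelled 0 and 1."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        cx, cy = (-2.0, -2.0) if label == 0 else (2.0, 2.0)
        data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    return data


def poison(train, fraction, rng):
    """Data poisoning (assumed form): a random `fraction` of contributors submit random labels."""
    poisoned = list(train)
    for i in rng.sample(range(len(poisoned)), int(fraction * len(poisoned))):
        point, _ = poisoned[i]
        poisoned[i] = (point, rng.randint(0, 1))
    return poisoned


def strike(train, fraction, rng):
    """Data strike (assumed form): a random `fraction` of contributors withhold their data."""
    keep = len(train) - int(fraction * len(train))
    return rng.sample(train, keep)


def predict_1nn(train, point):
    """Hypothetical stand-in for the company's model: 1-nearest-neighbour classifier."""
    def sq_dist(item):
        (x, y), _ = item
        return (x - point[0]) ** 2 + (y - point[1]) ** 2
    return min(train, key=sq_dist)[1]


def accuracy(train, test):
    """Fraction of held-out points the model labels correctly."""
    return sum(predict_1nn(train, p) == y for p, y in test) / len(test)


if __name__ == "__main__":
    rng = random.Random(0)
    train, test = make_data(300, rng), make_data(300, rng)
    for frac in (0.0, 0.4, 0.8):
        print(f"poison {frac:.0%}: accuracy = {accuracy(poison(train, frac, rng), test):.2f}")
    for frac in (0.0, 0.5, 0.9):
        print(f"strike {frac:.0%}: accuracy = {accuracy(strike(train, frac, rng), test):.2f}")
```

In this toy setup, accuracy drops as the poisoned fraction grows, while a strike barely matters until nearly all contributors withdraw, because the two clusters are easy to separate even from a handful of points.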

Measurement

Any attempt to address FAccT topics with mathematical models begins with constructing a measurement, because we are trying to measure unobservable, abstract concepts. Measurement and Fairness proposes a measurement modeling framework for fairness research in computer science. The authors focus on fairness specifically because it is an essentially contested construct: it is heavily context-dependent, and some of its definitions conflict with one another. More generally, deciding which measurements and metrics to use requires great caution. To address these problems, the authors use two criteria: construct reliability and construct validity. The former checks whether similar inputs return similar measurements. The latter checks whether the measurements are meaningful and useful, for example whether they capture the relevant events and whether the consequences of using them have been considered.

Education

Since last year, the conference has included more papers about ethics education. This year, “You Can’t Sit With Us”: Exclusionary Pedagogy in AI Ethics Education stood out. This survey paper collected more than 250 AI ethics courses in computer science curricula from more than 100 universities around the world and analyzed them for common patterns. Sadly, they found a predominant pattern of exclusion in many courses. The authors found that this exclusion took many forms: the discipline not valuing other ways of knowing, a lack of collaboration with other disciplines, and a lack of interest in learning from others’ work. Given that computer science alone cannot solve AI ethics problems, this seemed very worrisome.

Language Models

What happens when an authoritarian government and AI meet? Censorship of Online Encyclopedias: Implications for NLP Models explores how censorship in training data influences downstream NLP applications, using Chinese language models. One of the datasets they examined was Baidu Baike, a censored online encyclopedia that is often used to train Chinese language models. They first examined word embeddings, looking at where words such as democracy, surveillance, social control, and CCP (Chinese Communist Party) sit relative to positive and negative words. They found that democracy was more often associated with negative words, while the other words in the example showed the opposite pattern. They also found a similar pattern in a sentiment classification application that uses online news headlines. These results are concerning because, as the authors point out, such applications can be used to monitor public opinion and curate social media posts, essentially serving as a highly effective propaganda machine on a massive scale.
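
One common way to quantify a word’s position relative to positive and negative words is an association score: its mean cosine similarity to a set of positive words minus its mean similarity to a set of negative words. The sketch below assumes that scoring rule (similar in spirit to the Word Embedding Association Test); the paper’s exact procedure may differ. The tiny 3-D vectors and the placeholder words `target_a` and `target_b` are made up purely to show the arithmetic; real analyses use embeddings trained on each corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(word, positive, negative, vectors):
    """Mean similarity to positive words minus mean similarity to negative words."""
    pos = sum(cosine(vectors[word], vectors[p]) for p in positive) / len(positive)
    neg = sum(cosine(vectors[word], vectors[n]) for n in negative) / len(negative)
    return pos - neg


# Hypothetical toy vectors, chosen arbitrarily to illustrate the computation.
toy_vectors = {
    "good": [0.9, 0.1, 0.0],
    "happy": [0.8, 0.2, 0.1],
    "bad": [-0.9, 0.1, 0.0],
    "chaos": [-0.8, 0.0, 0.2],
    "target_a": [-0.6, 0.5, 0.1],
    "target_b": [0.7, 0.4, 0.0],
}

if __name__ == "__main__":
    for target in ("target_a", "target_b"):
        score = association(target, ["good", "happy"], ["bad", "chaos"], toy_vectors)
        print(f"{target}: {score:+.3f}")
```

A negative score means the target word sits closer to the negative words than to the positive ones; computed over a list of target words, scores like these are one way to surface the kind of pattern described above.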

For those who have been following the news about Timnit Gebru, the AI ethics researcher who was fired by Google, On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? was the paper at the center of it. The paper explores the potential risks of the current trend in language modeling, in which researchers and practitioners focus on ever larger models. As the paper shows, these models have billions, sometimes trillions, of parameters. The first risk the authors raise is environmental and financial cost. One study estimated that training a single BERT base model (without hyperparameter tuning) requires as much energy as a trans-American flight. This becomes more troubling when you consider that most language models serve English-speaking communities, while the communities most affected by climate change are largely not those. There are more sinister risks too, such as training data that still lacks diversity (i.e., “big” doesn’t mean “diverse”) and lacks oversight, which leads to harmful effects. Plus, because so much of the ML community’s effort and resources go into building large language models, other research directions are naturally overlooked. The authors suggest simple mitigation strategies: step back and think, evaluate various approaches, ask whether we really need large models, and conduct pre-mortem analyses.