Critiquing and rethinking at ACM FAT* 2020

Summary of FAT* 2020 in Barcelona, Spain. “Critiquing and rethinking” was the organizers’ new attempt to open up a discussion among multidisciplinary stakeholders.
conference
fairness
ML
Author

Hongsup Shin

Published

March 1, 2020

ACM Conference on Fairness, Accountability, and Transparency (ACM FAT*) is a computer science conference that aims to bring together “researchers and practitioners interested in fairness, accountability, and transparency (FAccT) in sociotechnical systems” to nurture multidisciplinary approaches and collaborations. I have been fortunate to attend the conference every year since it started in 2018. I attended the entire event, which included the tutorials, the main conference, and the Critiquing and Rethinking Accountability, Fairness and Transparency (CRAFT) sessions. (FYI, the conference became affiliated with ACM in 2019 (Atlanta, GA) and changed its name to ACM FAccT immediately following this year’s conference in Barcelona, Spain.)

What’s new

Compared to last year’s event, the organizers made several changes to reflect that FAccT topics require a holistic view rather than a purely computer-science one. In addition to the computer science (CS) track, they created tracks dedicated to social sciences and humanities (SSH) and to legal scholarship (LAW). They also added the CRAFT sessions, inspired by the NeurIPS 2018 workshop “Critiquing and Correcting Trends in Machine Learning” (CRACT). These sessions focus on critiques of the FAccT field itself, such as its blind spots, omissions, and alternative possibilities that take a more holistic approach. The aim is to give voice to those who bear the impact of these systems, to experiment with daring formats, and to reach beyond academia.

Tutorials

This year, the tutorials were divided into four categories (hands-on, translation, implications, and invited). The hands-on tutorials covered explanatory tools (Google’s What-If and IBM’s “AI Explainability 360”) and the University of Colorado’s fairness-aware recommendation package, librec-auto. These are useful tools, but I figured I could learn them on my own from their websites. Instead, I attended the translation and implications tutorials: the former covers topics in ethics, law, and policy, and the latter presents case studies of FAccT topics in industry. Interestingly, some of the translation tutorials focused on positionality, the social and political context that influences, and potentially biases, a person’s unique but partial understanding of the world. Here is a summary of the tutorials I attended.

Explainable AI in Industry: Practical Challenges and Lessons Learned

This tutorial discussed several techniques for providing explainability in ML models and introduced several case studies. Generally, there are two approaches to explainable AI: we can build an inherently interpretable model, such as a linear model or a decision tree, or we can produce post-hoc explanations for a given model. We focused on the latter. The post-hoc explanation approach can be rephrased as an “attribution” problem: we want to know why a model makes a certain prediction and attribute the decision to the features of the input. The main technique we discussed, integrated gradients1, addresses this by interpreting explanations in terms of feature gradients. For instance, if I change a feature X and the target Y changes, we may consider the change in X as an explanation for the change in Y. Integrated gradients are a model-agnostic take on this idea: we accumulate (integrate) the gradients of the model output with respect to each feature along a path from a baseline input to the actual input, so that feature regions where the prediction changes dramatically (i.e., large gradients) contribute most to the explanation (a minimal numerical sketch follows below). One application example was diabetic retinopathy, where a model predicts the severity of the disorder from retinal images. The method was able to provide explanations for the deep learning model’s predictions by locating retinal lesions. Although this method is useful for its model-agnostic approach, it still lacks a global explanation.
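To make the idea concrete, here is a minimal numerical sketch of integrated gradients. It is not the tutorial’s code: the model is a hypothetical fixed-weight logistic regression, and the function names, weights, and step count are made up for illustration.

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline=None, steps=50):
    """Approximate integrated gradients for a scalar-valued model.

    grad_f  : callable returning the gradient of the model output w.r.t. a 1-D input
    x       : input to explain (1-D numpy array)
    baseline: reference input (defaults to all zeros)
    steps   : number of points in the Riemann-sum approximation of the path integral
    """
    if baseline is None:
        baseline = np.zeros_like(x)
    # Average the gradients along the straight-line path from the baseline to x.
    alphas = np.linspace(0.0, 1.0, steps)
    avg_grad = np.mean([grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0)
    # Scale by the input difference to obtain per-feature attributions.
    return (x - baseline) * avg_grad

# Hypothetical differentiable model: logistic regression with fixed weights.
w, b = np.array([1.5, -2.0, 0.5]), 0.1
f = lambda x: 1.0 / (1.0 + np.exp(-(x @ w + b)))
grad_f = lambda x: f(x) * (1.0 - f(x)) * w  # analytic gradient of the sigmoid output

x = np.array([0.8, 0.3, 1.2])
attr = integrated_gradients(grad_f, x)
print(attr)                                     # per-feature attributions
print(attr.sum(), f(x) - f(np.zeros_like(x)))   # completeness: these should roughly match
```

The last line checks the completeness property: the attributions approximately sum to the difference between the model’s output at the input and at the baseline.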

Leap of FATE: Human Rights as a Complementary Framework for AI Policy and Practice

Even though AI governance has emerged as a hot topic, many are disappointed by the current ethics-based framework for AI-related problems. The speakers suggested that a human rights-based approach can provide a more rigorous and specific groundwork for AI governance. Existing treaties and resolutions such as the Universal Declaration of Human Rights (UDHR) and the International Covenant on Civil and Political Rights (ICCPR) are good examples. Adopting a human rights perspective for AI governance provides several benefits. First, these are well-established universal standards that have existed for many decades. Second, they carry more currency than ethics guidelines and thus provide better accountability, especially when they are violated (i.e., a human rights violation). Since these resolutions are much more specific than ethics guidelines, it is possible to conduct human rights impact assessments of AI technologies and the companies that own them.

Two computer scientists and a cultural scientist get hit by a driver-less car: Situating knowledge in the cross-disciplinary study of F-A-T in machine learning

This tutorial focused on positionality and attempted to give the audience first-hand experience of how researchers’ perspectives and approaches differ across disciplines. After a brief introduction, we were split into several groups and read three papers from different disciplines: computer science, social science, and philosophy. The computer science paper was about identifying racial bias in online comments. The social science paper discussed the intricate relationship between racial minorities and their particular usage of certain words in English. In addition to using several questionable methods, the authors of the first paper clearly lacked domain knowledge. As a result, the paper’s conclusion inevitably made a blanket statement on the topic, which was inaccurate and ignorant. It was a good hands-on exercise on the importance of domain knowledge and of collaboration across multiple disciplines, which FAccT-related studies usually require.

Main conference

There were three keynote talks and 15 sessions of talks for accepted papers. The sessions were grouped into the following topics:

  • Accountability
  • Explainability
  • Auditing/Assessment
  • Fairness
  • Ethics and Policy
  • Values
  • Data collection
  • Cognition and education
  • Sensitive attributes

Keynotes

The keynotes were given by Ayanna Howard from the Georgia Institute of Technology, Yochai Benkler from Harvard Law School, and Nani Jansen Reventlow from the Digital Freedom Fund, which supports partners in Europe in advancing digital rights through strategic litigation. Howard’s talk was about human biases in trusting AI. She provided examples from her experiments using robots and described how human participants interacted with them. In one experiment, she found that humans can over-trust an AI’s decisions (a robot’s decisions in the experiment). She noted that understanding cognitive bias is crucial when designing an AI system. Benkler’s talk was on the role of technology in political economy. He discussed how technology has contributed to increases in productivity but has also aggravated the concentration of wealth, leading to more severe economic inequality. Finally, Reventlow discussed how litigation can sometimes be used strategically to make industry and government more accountable. She mentioned that the best way to build a strategic litigation case is to embed the litigation in a broader effort and to bring together various stakeholders such as activists and policymakers.

Accountability

The first paper2 criticizes the term “algorithmic accountability” as inherently vague and argues that it needs specification in terms of “the actor, the forum, the relationship between the two, the content and criteria of the account, and finally the consequences which may result from the account.” Raji et al.3 also noted that this vagueness leaves a gap in algorithmic accountability. The authors showed ways to overcome it by illustrating how other industries, such as aerospace, medical devices, and finance, conduct audits and carry out governance. Finally, Katell et al.4 gave a concrete example in which researchers co-developed algorithmic accountability interventions (the “Algorithmic Equity Toolkit”) by adopting participatory action research methods in collaboration with the American Civil Liberties Union of Washington (ACLU of Washington).

Explainability

There were many interesting papers in this session. Sokol and Flach5 provided convenient yet exhaustive explainability fact sheets for project assessment. Kaminski and Malgieri6 criticized that, even though explainability is addressed in the GDPR, there is no consensus on what it entails. They also made the excellent point that an explanation does not necessarily provide a justification of fairness. Barocas et al.7 criticized the faulty assumptions often made when models provide explainability or actionability. For instance, to change the model’s prediction for an individual, an explanation may suggest that the person make more money, which is not exactly actionable. They suggest that there should be fiduciary obligations around explainability and that we should study how people actually respond to explanations. Lucic et al.8 created Monte Carlo Bounds for Reasonable Predictions (MC-BRP), which, for the n most important features, generates the feature values that would have resulted in a reasonable prediction (i.e., a predicted value within an expected range), along with the general trend between each feature and the target variable, such as whether the target increases with that feature (see the toy sketch after this paragraph). Bhatt et al.9 conducted a survey on how industry handles explainability in deployment. Their results showed that explainability is mostly used for debugging and that the goals of explainability are often not clearly defined. Some of the ML engineers they talked to also noted that providing explainability in real-time deployments is technically challenging. Hancox-Li10 discussed the so-called “Rashomon effect,” where multiple models of similar performance provide different explanations, which makes it difficult for stakeholders to choose which explanation to believe.
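As a rough, hedged illustration of the MC-BRP idea as summarized above (not the authors’ implementation), the sketch below Monte-Carlo-samples values for each important feature and reports the range of values that keeps the model’s prediction inside a “reasonable” interval. The model, feature indices, perturbation scale, and the definition of a reasonable range are all hypothetical.

```python
import numpy as np

def mc_bounds(model, x, top_features, reasonable_range, n_samples=5000, scale=1.0, seed=0):
    """Toy sketch: for each important feature, find the range of perturbed values that
    keeps the model's prediction inside `reasonable_range`, holding other features fixed."""
    rng = np.random.default_rng(seed)
    lo, hi = reasonable_range
    bounds = {}
    for j in top_features:
        # Monte Carlo sample candidate values for feature j around its current value.
        candidates = rng.normal(loc=x[j], scale=scale, size=n_samples)
        kept = []
        for v in candidates:
            x_pert = x.copy()
            x_pert[j] = v
            if lo <= model(x_pert) <= hi:
                kept.append(v)
        if kept:  # report the range of values that led to a "reasonable" prediction
            bounds[j] = (float(min(kept)), float(max(kept)))
    return bounds

# Hypothetical linear forecasting model and an input whose prediction is deemed unreasonable.
w = np.array([2.0, -1.0, 0.5])
model = lambda x: float(x @ w)
x = np.array([3.0, 0.5, 1.0])
print(mc_bounds(model, x, top_features=[0, 1], reasonable_range=(0.0, 4.0)))
```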

Auditing/Assessment

Black et al.11 presented a model-agnostic tool that can be used in the exploration stage to identify discriminatory behavior in ML systems. The concept is roughly based on counterfactuals, but the authors noted that coming up with the right causal diagram is often challenging. Instead, they used an optimal transport map, which transforms one probability distribution into another while minimizing a given cost defined over their respective supports. For example, we might use an optimal transport map from the distribution of men to the distribution of women in order to obtain (female, male) pairs of inputs with which to query the model. If the model’s output differs for the two people in a pair, that may be evidence that the model discriminates on the basis of gender (a toy illustration follows this paragraph). Rodolfa et al.12 presented a case study, conducted in collaboration with a local government, showing how the choice of fairness metric shapes the balance between equity and model performance and how it affects particular demographic groups. Lum et al.13 audited a pre-trial risk assessment tool from San Francisco, CA, and found that overbooking (i.e., booking charges that are later dropped or end in an acquittal) increased the recommended level of pre-trial supervision. Sánchez-Monedero et al.14 criticized that some automated hiring systems developed in the US are used in the UK and other European countries, where the transparency and data protection law background is quite different from that of the US. Borradaile et al.15 investigated how the Corvallis (Oregon) Police Department has used social media to monitor certain users and found that the department had outsourced the task and that racial bias existed among those who were monitored.
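Returning to the pairing idea described at the start of this paragraph, here is a minimal, hedged sketch, not the FlipTest implementation: with two equal-sized samples, a cost-minimizing one-to-one matching serves as a discrete stand-in for the optimal transport map, and we flag matched pairs whose predictions disagree. The data, weights, and model are all hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)

# Hypothetical feature vectors for two equal-sized groups (e.g., group A and group B).
group_a = rng.normal(loc=0.0, scale=1.0, size=(100, 3))
group_b = rng.normal(loc=0.3, scale=1.0, size=(100, 3))

# Hypothetical model under audit: a fixed-weight linear classifier.
w = np.array([1.0, -0.5, 2.0])
predict = lambda X: (X @ w > 0.5).astype(int)

# With equal-sized empirical distributions, the optimal transport map reduces to a
# minimum-cost one-to-one matching over pairwise (squared Euclidean) costs.
cost = cdist(group_a, group_b, metric="sqeuclidean")
rows, cols = linear_sum_assignment(cost)

# Query the model on each matched pair; pairs whose predictions "flip" are
# potential evidence that the model treats the two groups differently.
preds_a = predict(group_a[rows])
preds_b = predict(group_b[cols])
n_flipped = int(np.sum(preds_a != preds_b))
print(f"{n_flipped} of {len(rows)} matched pairs received different predictions")
```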

Fairness

Slack et al.16 developed Fair-MAML, which demonstrates fair K-shot learning for cases where the number of training samples is extremely small. Barabas et al.17 introduced the concept of “studying up,” commonly used in anthropology, which refers to studying those who hold “the relative upper hand” in terms of the agency and authority they have in a given context, such as established legal systems or law enforcement. Pujol et al.18 presented a study of the conflict between differential privacy and fairness, and suggested customizing the privacy implementation for each task instead of applying differential privacy uniformly to all tasks. Liu et al.19 looked at the complicated interplay between a model deployed for automated decision making in hiring and the individuals who are affected by it. Because the model’s decisions reward people disproportionately, people tend to change their behavior in response to how those decisions are made. The authors showed that subsidizing the cost of investment creates better equilibria for the disadvantaged group. Yang et al.20 examined the ImageNet dataset and found problems in how the “person” category was constructed, for instance a lack of diversity in the images and bias in the annotations. Binns21 investigated the apparent conflict between individual and group fairness and clarified that it rests on a misconception; resolving it requires considering the broader context and focusing on the sources of unfairness.

Ethics and Policy

Washington and Kuo22 found that corporate AI ethics codes conflate consumers with society and are largely silent on agency. They introduced the concept of digital differential vulnerability to explain disproportionate exposure to harm within data technology and suggested recommendations for future ethics codes. Bietti23 discussed a current trend in the tech industry where self-regulation or hands-off governance prevails and “ethics” is increasingly identified with technology companies’ self-regulatory efforts and shallow appearances of ethical behavior, so-called “ethics washing.” Abebe et al.24 argue that computational research has valuable roles to play in addressing social problems.

Values

Dotan and Milli25 examined the rise and fall of certain model types. They argue that the rise of a model type is self-reinforcing and that the way model types are evaluated encodes loaded social and political values such as the centralization of power, privacy, and environmental concerns. Venkatasubramanian and Alfano26 criticize the current conceptualization of algorithmic recourse, the systematic process of reversing unfavorable decisions by algorithms and bureaucracies across a range of counterfactual scenarios. They suggest that stakeholder and expert panels should jointly establish acceptable action sets (i.e., not a one-way process), that there should be a role for fiduciaries charged with acting on behalf of those they represent, and that changes over time should be handled through ongoing engagement.

Data collection

In this session, various authors criticized the fact that current data collection practices in industry and government lack transparency and a standardized approach. Geiger et al.27 investigated human-annotated datasets widely used in the ML community and found that these so-called benchmark datasets have questionable reliability in the first place. Marda and Narayan28 investigated a predictive policing system in New Delhi and uncovered a lack of public accountability mechanisms, biases present within the Delhi Police’s data collection practices, and ways in which these translate into predictive policing.

Cognition and education

Bates et al.29 collected data from the MSc Data Science teaching team at the University of Sheffield to reflect on their experiences of working at the intersection of disciplines, i.e., FATE (FAT plus ethics). Based on their findings, they suggest creating empathetic learning spaces for interdisciplinary teaching teams and collaborating with decolonization experts to avoid Eurocentrism, while keeping data science competency in mind.

Sensitive attributes

Bogen et al.30 bring up the topic of collecting data on sensitive attributes such as race and gender, specifically for antidiscrimination interventions. They presented case studies from several industry domains, such as credit, employment, and healthcare, and found that practices vary widely across domains, and even within a domain in certain cases, depending, for example, on whether companies are bound by legal restrictions or can collect self-reported identification as data. The paper encourages various stakeholders to actively help chart a path forward that takes both policy goals and technical needs into account.

CRAFT sessions

CRAFT stands for Critiquing and Rethinking Accountability, Fairness and Transparency. These sessions are designed to bring more diverse voices to FAT* topics. I attended the following sessions.

When Not to Design, Build, or Deploy

Given the recent push for moratoria on facial recognition, protests around the sale of digital technologies, and the ongoing harms to marginalized groups from automated systems such as risk prediction, a broader discussion around how, when, and why to say no (as academics, practitioners, activists, and as a society) seems both relevant and urgent.

Notable topics the panel discussed included the need for more case studies on unnecessary surveillance driven by AI systems, raising tech workers’ agency (accompanied by ethics training for engineers), and making interventions at various points in a production pipeline. On how to formalize this effort in terms of policy, some mentioned that engineers can make released models more difficult to replicate, societies can regulate AI the way they regulate weapons, and governments can bring legal enforcement to bear on the issue.

Centering Disability Perspectives in Algorithmic Fairness, Accountability & Transparency

It is vital to consider the unique risks and impacts of algorithmic decision-making for people with disabilities. The diverse nature of potential disabilities poses unique challenges. Many disabled people choose not to disclose their disabilities, making auditing and accountability tools particularly hard to design and operate. Further, the variety inherent in disability poses challenges for collecting representative training data in any quantity sufficient to better train more inclusive and accountable algorithms.

Speakers mentioned that disability perspectives bring together AI systems design and accessibility issues. This is similar to the lack-of-diversity problem in face recognition training data: deployed AI systems may simply fail to recognize disabled people. This requires ML engineers to consider whom their models might fail for and to consider inclusive and viable solutions in advance. The disabled community also experiences intersectionality: how disabled people are represented in the media and society does not reflect the diverse spectrum of the community. This is also related to the fact that different demographic groups have different socioeconomic and genetic backgrounds, which affect their probability of becoming disabled.

Reflection and suggestions

Overall, it was nice to see the conference growing rapidly every year. I appreciated the organizers’ move to create separate tracks for social science and law to encourage more holistic views in research, and I enjoyed the CRAFT sessions very much. Due to the interdisciplinary nature of the FAT* topics, I sometimes sensed that the one-way communication of the talk format places severe limits on how we move forward from here. The CRAFT sessions were an attempt to mitigate this issue and to bring together practitioners on the ground, such as activists, lawyers, policymakers, and industry data scientists.

I also see room for improvement. First of all, every talk was strictly limited to 8 minutes, which was too short. Especially for the computer-science papers, some speakers showed many math equations on their slides, which is probably not the best strategy for an 8-minute talk. Some speakers did a great job of laying out the background knowledge, but not all of them did. If we are truly interested in a multidisciplinary effort, improving communication is crucial.

Speaking of communication, at the very end of the conference there was a town hall meeting, and the Q&A was extremely short. I think this deserves more attention: we are a group with varied backgrounds, and to make impactful changes from our research, it is important to open the floor to more discussion. This could be mitigated by organizing birds-of-a-feather (BoF) sessions or group lunches for certain topics. The conference is growing fast, which means we will have many newcomers every year. It would help them to have easy-to-network venues.

The conference this year used an anonymous board for submitting questions online and did not use its Slack channel much. I understand the need for anonymity, but the Slack channel was extremely useful for establishing networks between researchers from different domains and for socializing. Tutorial preparation could also have been announced on the Slack channel, as it was last year. For some tutorials, attendees didn’t get enough information about the preparation, which was a missed opportunity.

This issue was brought up last year, but the conference may want to create a shared knowledge base to get all attendees on the same page. There are many definitions of key terms such as fairness, and there are important historical case studies that researchers refer to all the time. It would be nice to have standardized material that is open to all attendees before the conference so that we can all be well informed and follow the conference material more easily.

This might be a radical and debatable idea, but if the conference is truly interested in tackling practical and current issues related to the FAT* topics, it might consider bringing in more industry people, especially those who are often criticized by researchers, to make them more accountable and help them create better solutions instead of ostracizing them. Not all companies may participate, but considering the urgency of the issue, it wouldn’t do any harm to reach out to industry and bring them to the table. This would also create opportunities for researchers: if industry stakeholders share their difficulties in implementing good AI governance in practice, researchers can help tackle them.

References

Footnotes

  1. M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic Attribution for Deep Networks,” arXiv:1703.01365 [cs], Jun. 2017.↩︎

  2. M. Wieringa, “What to account for when accounting for algorithms: a systematic literature review on algorithmic accountability,” in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona Spain, 2020, pp. 1–18, doi: 10.1145/3351095.3372833.↩︎

  3. I. D. Raji et al., “Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing,” p. 12, 2020.↩︎

  4. M. Katell et al., “Toward situated interventions for algorithmic equity: lessons from the field,” in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona Spain, 2020, pp. 45–55, doi: 10.1145/3351095.3372874.↩︎

  5. K. Sokol and P. Flach, “Explainability fact sheets: a framework for systematic assessment of explainable approaches,” in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona Spain, 2020, pp. 56–67, doi: 10.1145/3351095.3372870.↩︎

  6. M. E. Kaminski and G. Malgieri, “Multi-layered explanations from algorithmic impact assessments in the GDPR,” in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona Spain, 2020, pp. 68–79, doi: 10.1145/3351095.3372875.↩︎

  7. S. Barocas, A. D. Selbst, and M. Raghavan, “The hidden assumptions behind counterfactual explanations and principal reasons,” in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona Spain, 2020, pp. 80–89, doi: 10.1145/3351095.3372830.↩︎

  8. A. Lucic, H. Haned, and M. de Rijke, “Why does my model fail?: contrastive local explanations for retail forecasting,” in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona Spain, 2020, pp. 90–98, doi: 10.1145/3351095.3372824.↩︎

  9. U. Bhatt et al., “Explainable Machine Learning in Deployment,” p. 10, 2020.↩︎

  10. L. Hancox-Li, “Robustness in Machine Learning Explanations: Does It Matter?,” p. 8, 2020.↩︎

  11. E. Black, S. Yeom, and M. Fredrikson, “FlipTest: fairness testing via optimal transport,” in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona Spain, 2020, pp. 111–121, doi: 10.1145/3351095.3372845.↩︎

  12. K. T. Rodolfa, E. Salomon, L. Haynes, I. H. Mendieta, J. Larson, and R. Ghani, “Case study: predictive fairness to reduce misdemeanor recidivism through social service interventions,” in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona Spain, 2020, pp. 142–153, doi: 10.1145/3351095.3372863.↩︎

  13. K. Lum, C. Boudin, and M. Price, “The impact of overbooking on a pre-trial risk assessment tool,” in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona Spain, 2020, pp. 482–491, doi: 10.1145/3351095.3372846.↩︎

  14. J. Sánchez-Monedero, L. Dencik, and L. Edwards, “What does it mean to ‘solve’ the problem of discrimination in hiring?: social, technical and legal perspectives from the UK on automated hiring systems,” in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona Spain, 2020, pp. 458–468, doi: 10.1145/3351095.3372849.↩︎

  15. G. Borradaile, B. Burkhardt, and A. LeClerc, “Whose Tweets are Surveilled for the Police: An Audit of a Social-Media Monitoring Tool via Log Files,” p. 11, 2020.↩︎

  16. D. Slack, S. A. Friedler, and E. Givental, “Fairness Warnings and Fair-MAML: Learning Fairly with Minimal Data,” p. 10, 2020.↩︎

  17. C. Barabas, C. Doyle, J. Rubinovitz, and K. Dinakar, “Studying Up: Reorienting the study of algorithmic fairness around issues of power,” p. 10, 2020.↩︎

  18. D. Pujol, R. McKenna, S. Kuppam, M. Hay, A. Machanavajjhala, and G. Miklau, “Fair Decision Making Using Privacy-Protected Data,” p. 11, 2020.↩︎

  19. L. T. Liu, A. Wilson, N. Haghtalab, A. T. Kalai, C. Borgs, and J. Chayes, “The Disparate Equilibria of Algorithmic Decision Making when Individuals Invest Rationally,” p. 11, 2020.↩︎

  20. K. Yang, K. Qinami, L. Fei-Fei, J. Deng, and O. Russakovsky, “Towards Fairer Datasets: Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy,” p. 12, 2020.↩︎

  21. R. Binns, “On the apparent conflict between individual and group fairness,” Proc. 2020 Conf. Fairness Account. Transpar., pp. 514–524, Jan. 2020, doi: 10.1145/3351095.3372864.↩︎

  22. A. L. Washington and R. Kuo, “Whose side are ethics codes on?: power, responsibility and the social good,” Proc. 2020 Conf. Fairness Account. Transpar., pp. 230–240, Jan. 2020, doi: 10.1145/3351095.3372844.↩︎

  23. E. Bietti, “From Ethics Washing to Ethics Bashing,” p. 10, 2020.↩︎

  24. R. Abebe, S. Barocas, J. Kleinberg, K. Levy, M. Raghavan, and D. G. Robinson, “Roles for Computing in Social Change,” p. 9, 2020.↩︎

  25. R. Dotan and S. Milli, “Value-laden Disciplinary Shifts in Machine Learning,” arXiv:1912.01172 [cs, stat], Dec. 2019.↩︎

  26. S. Venkatasubramanian and M. Alfano, “The philosophical basis of algorithmic recourse,” p. 10, 2020.↩︎

  27. R. S. Geiger et al., “Garbage In, Garbage Out? Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled Training Data Comes From?,” p. 12, 2020.↩︎

  28. V. Marda and S. Narayan, “Data in New Delhi’s predictive policing system,” Proc. 2020 Conf. Fairness Account. Transpar., pp. 317–324, Jan. 2020, doi: 10.1145/3351095.3372865.↩︎

  29. J. Bates et al., “Integrating FATE/Critical Data Studies into Data Science Curricula: Where are we going and how do we get there?,” p. 11, 2020.↩︎

  30. M. Bogen, A. Rieke, and S. Ahmed, “Awareness in Practice: Tensions in Access to Sensitive Attribute Data for Antidiscrimination,” p. 9, 2020.↩︎