Can machine learning measure the impact of bias on U.S. asylum decisions?

Vyoma Raman
Human Rights Center
9 min read · Oct 7, 2022
Illustration by Adobe Stock

By Catherine Vera and Vyoma Raman

Sandra and Ana’s reasons for fleeing Tegucigalpa, Honduras, were largely identical. Both women were actively involved in making their children’s school a safe haven from gang violence. Both women made the decision to flee home after receiving death threats against their families from armed gang members. Although their reasons for seeking refuge in the United States aligned, the outcomes of their cases diverged when they entered the U.S. immigration court system. Sandra’s case was tried in San Francisco, California, and Ana’s was tried in Charlotte, North Carolina. After recounting their respective escapes from Tegucigalpa, only Sandra was granted asylum.¹

The divergence of these women’s experiences is not an anomaly. A team of student researchers² from the Human Rights Center’s (HRC) Investigations Lab at Berkeley Law recently developed a novel methodology to study such inequities in the U.S. asylum process. They found that the decision of U.S. immigration courts to either grant or deny a refugee’s request for asylum can be heavily influenced by “extraneous factors,” characteristics such as hearing time and years of judge experience that have nothing to do with the merits of an asylum seeker’s claim. According to the student researchers, asylum decisions are affected by individual and systemic bias, which can be quantified in terms of extraneous factors such as “political partisanship,” defined as the prevailing political climate of the state in which the asylum court sits, and the individual variability of the presiding judge, based on their gender and the nature of their previous work experience.

The Backstory

In May 2021, the non-governmental organization Human Rights First (HRF) filed two Freedom of Information Act (FOIA) requests, one with the Executive Office for Immigration Review (EOIR) and another with the Department of Homeland Security (DHS), inquiring about immigration court and detention data containing information on race. For instance, a field for “complexion” — presumably referring to skin tone — is contained in the I-213, a form used by the DHS in deportation cases. Given the U.S.’s long history of racialized immigration policies, from the Chinese Exclusion Act to the recent deportation of Haitians at the southern border,³ HRF wished to analyze the impact of race on immigration custody decisions, detentions, and asylum outcomes with the support of a data science research team at the HRC.

When EOIR and DHS responded to the FOIA requests months later, they provided some of the data requested, such as bond amounts and asylum seeker nationality, but failed to include information on asylum seeker race or complexion. Although one could argue that the data is too sensitive for the agencies to share externally, this omission severely restricts investigation into the impact of race on asylum adjudication and detention decisions and, by consequence, the ability of independent bodies to hold EOIR and DHS accountable for a potential racial divide in asylum policies and implementation.

This concern is particularly unnerving given that lawyers have long known that a judge’s biases, whether conscious or unconscious, can have a significant impact on a case. This has often been referred to as the “what the judge had for breakfast” effect, which recognizes that extraneous factors can cause the judge to be in a particular mood, leading to rulings that are influenced not just by reason, but also by emotion.⁴ Lawyers also know that courts in certain geographic regions may be markedly more favorable, or less favorable, to specific types of cases or parties, leading to “forum shopping,” a process wherein a lawyer will endeavor to have a case tried in the venue most likely to favor their client.⁵ Knowledge of judicial partialities and venue proclivities is generally the result of a lawyer’s intuition and experience, or the subject of lore in a local legal community.

In response to the exclusion of race-related data by the EOIR and DHS, the HRC researchers devised an alternate method of holding these agencies accountable: by contributing to a growing body of research which provides the empirical evidence necessary to confirm beliefs about biases in the judicial process, and specifically the asylum process. Drawing on prior literature and expert domain knowledge, they created and analyzed a system to quantify systemic biases across geographic and temporal boundaries. A critical application of this research is to guide policy-makers endeavoring to bring more fairness into the asylum system.⁶

The Work

In a paper to be presented at the Association for Computing Machinery 2022 Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, the team compiled and analyzed nearly 6 million U.S. immigration court proceedings across 228 case features — many of which, such as the judge’s gender or the political leaning of the state in which the trial was held, should ostensibly have no impact on the outcome of the case. Nonetheless, using an array of data science techniques, including predictive modeling and time series analysis, the HRC’s Investigations Lab found that combining extraneous factors into scores for partisanship and consistency was highly predictive of case outcomes. Clearly, this result is contrary to the professed fairness of the American justice system.
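To make the modeling step concrete, here is a minimal sketch in Python of the general approach, not the team’s actual pipeline: train a classifier on extraneous case features alone and see how well they predict grant or deny outcomes. The file name and column names below are hypothetical placeholders.

```python
# Minimal sketch (not the team's actual pipeline): if extraneous factors were
# truly irrelevant, a model trained on them alone should predict outcomes
# no better than chance. File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

cases = pd.read_csv("asylum_proceedings.csv")  # hypothetical dataset

extraneous_features = [
    "judge_gender", "judge_years_experience", "court_state_partisan_lean",
    "hearing_hour", "president_party_at_decision",
]
X = pd.get_dummies(cases[extraneous_features], drop_first=True)
y = cases["asylum_granted"]  # 1 = granted, 0 = denied

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# An AUC near 0.5 would mean the extraneous features carry no signal.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC using extraneous features only: {auc:.2f}")
```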

Prior researchers have also confirmed significant inequities in the U.S. asylum system as a result of extraneous factors. For example, one study found that U.S. immigration judges denied 1.4% more asylum petitions the day after their city’s NFL football team lost a game, 1% more on snowy days, and 2.3% more on windy days.⁷ Immigration judges were up to 3.3 percentage points more likely to reject the current asylum petition if they had approved the previous one.⁸ Hearings that started just before lunch or just before the end of the day had a higher than average grant rate.⁹ Investigators from Reuters examined the cases of Sandra and Ana among thousands of others, conducting a large-scale analysis of decision variability across immigration courts. They found broad variations between regions and individual judges in asylum case outcomes.¹⁰ Introducing machine learning¹¹ techniques to legal analytics enables researchers not only to detect patterns, but also to make predictions about judicial outcomes for a given set of inputs.

The team’s algorithm uses two scoring metrics to computationally measure individual and systemic variability. First, “inter-judge cohort consistency” postulates that an individual judge’s ruling is “fair” if most other judges from the same court, hearing similar cases, would have ruled in the same way.¹² For each case assigned to a particular judge, the algorithm computes the proportion of other judges who reached the same outcome for other cases in the cohort. Next, it aggregates these into an overall consistency score for each judge to measure how frequently a particular judge agrees with other judges on similar cases in the cohort. To measure systemic variability, the algorithm characterizes a judicial decision as “partisan” if outcomes of similar cases vary based on the political climate in which they are decided. Specifically, the “political climate” of a state depends upon (1) the political party of the president in office at the time, and (2) the political party supported by the majority of voters in the state in which the court is located during the last presidential election. The algorithm then computes an aggregated score representing the potential level of partisan influence on the current decision, to indicate the degree to which a decision outcome may change depending on political climate.
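To illustrate, here is a simplified sketch of the consistency computation, assuming a hypothetical table with judge_id, cohort_id (cases from the same court with similar characteristics), and granted columns; the paper’s actual cohort construction and aggregation are more involved.

```python
# Simplified sketch of inter-judge cohort consistency (assumptions: cohorts are
# precomputed, and agreement is measured against other judges' individual
# decisions rather than the paper's exact aggregation).
import pandas as pd

cases = pd.read_csv("asylum_proceedings.csv")  # hypothetical dataset

def consistency_scores(df: pd.DataFrame) -> pd.Series:
    rows = []
    for _, cohort in df.groupby("cohort_id"):
        for _, case in cohort.iterrows():
            # Decisions by *other* judges on similar cases in the same cohort.
            others = cohort.loc[cohort["judge_id"] != case["judge_id"], "granted"]
            if others.empty:
                continue
            # Proportion of those decisions that match this case's outcome.
            rows.append({
                "judge_id": case["judge_id"],
                "agreement": (others == case["granted"]).mean(),
            })
    # Aggregate per-case agreement into one consistency score per judge.
    return pd.DataFrame(rows).groupby("judge_id")["agreement"].mean()

print(consistency_scores(cases).sort_values().head())
```

A partisanship score could be sketched analogously, grouping similar cases by political climate (the president’s party and the state’s majority party at the last presidential election) and comparing grant rates across climates.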

One of the most significant findings was that immigration court decisions are predominantly impacted by these two extraneous factors — the surrounding political climate (“partisanship”) and the individual variability of the presiding judge (“inter-judge cohort consistency”). These combined effects accounted for nearly 60% of total decision variability, meaning that these two factors — which are unrelated to the merits of an asylum case — play a significant role in how a presiding judge will rule.
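The paper reports its own variance decomposition; as a rough and purely illustrative stand-in, one could estimate what share of a model’s predictive signal the two scores carry, for example with permutation importance. The file, column names, and choice of model below are assumptions, not the team’s method.

```python
# Illustrative stand-in (not the paper's decomposition): estimate the share of
# predictive importance attributable to the partisanship and consistency scores.
# Assumes a hypothetical CSV whose feature columns are already numeric.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

cases = pd.read_csv("asylum_cases_with_scores.csv")  # hypothetical dataset
feature_cols = [c for c in cases.columns if c != "asylum_granted"]
X, y = cases[feature_cols], cases["asylum_granted"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

imp = permutation_importance(model, X_test, y_test, n_repeats=5, random_state=0)
importances = pd.Series(imp.importances_mean, index=feature_cols).clip(lower=0)

two_scores = ["partisanship_score", "consistency_score"]  # hypothetical names
share = importances[two_scores].sum() / importances.sum()
print(f"Share of predictive importance from the two scores: {share:.0%}")
```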

The project also revealed the influence of other extraneous factors on asylum outcomes. For example, 26% of asylum seekers who were represented by an attorney were granted asylum, whereas only 3% of asylum seekers who were unrepresented received asylum. Whether or not an asylum seeker has an attorney should not influence a judge as to the merits of the underlying case.¹³ Another significant finding was that female judges tended to have lower average consistency scores than male judges, meaning that in their rulings, female judges on the whole tend to exercise a higher degree of independence from their cohort than male judges.

The Implications

Court clerks are responsible for randomly assigning asylum cases to a particular immigration judge. The outcome of this arbitrary process may be one of the biggest determinants of whether asylum is granted or denied — which can mean the difference between life and death.¹⁴ It is highly problematic that asylum rulings are heavily dependent on arbitrary factors, particularly a judge’s conscious or unconscious personal biases, or the influence of other irrelevant factors. This project urges increased scrutiny and accountability in asylum adjudication, and recommends improved and standardized variability metrics to better diagnose and monitor the impact of extraneous factors on asylum adjudication.

Co-authors Vyoma Raman and Catherine Vera (L-R) present their research at the ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization on October 7, 2022. Photo by Sethu Sankar.

Click here to read the entire research paper. Click here to read the fact sheet. Click here to read the press release. For more information or press inquiries, please contact Human Rights Center communications specialist Maggie Andresen at mandresen@berkeley.edu or communications.hrc@berkeley.edu.

[1]: Mica Rosenberg, Reade Levinson, and Ryan McNeill. Reuters Investigates: Heavy Odds. October 17, 2017. https://www.reuters.com/investigates/special-report/usa-immigration-asylum/. “They fled danger at home to make a high-stakes bet on U.S. immigration courts […] The difference between residency and deportation depends largely on who hears the case, and where.”

[2]: Student team leads: Catherine Vera and Vyoma Raman. Team members: Aarushi Karandikar, Aayush Patel, Aliyah Behimino, Carolyn Wang, CJ Manna, Elena Wuellhorst, Ellie Wong, Karina Cortes Garcia, Helia Sadeghi, Maggie Carroll, Rosie Foulds, and Upasana Dilip. Project Advisers: Dr. Alexa Koenig, Stephanie Croft, and Sofia Kooner. The team is composed of an interdisciplinary body of students and advisors specializing in disciplines ranging from data science and computer science to human rights and immigration law.

[3]: Sullivan, Eileen. 2022. “U.S. Accelerated Expulsions of Haitian Migrants in May.” The New York Times, June 9, 2022, sec. U.S. https://www.nytimes.com/2022/06/09/us/politics/haiti-migrants-biden.html.

[4]: Chen and Loecher (see note 7), citing Frederick Schauer, Thinking Like a Lawyer, Harvard University Press, Cambridge, MA, 2009.

[5]: American Tort Reform Foundation’s (ATRF) Judicial Hellholes Report, 2021–2022. https://www.judicialhellholes.org/wp-content/uploads/2021/12/ATRA_JH21_layout_FINAL.pdf

[6]: For example, this research project provided data which was cited in white papers submitted by Human Rights First to the Biden Administration regarding due process concerns in an aspect of the asylum process known as “credible fear determinations.” Pretense of Protection: Biden Administration and Congress Should Avoid Exacerbating Expedited Removal Deficiencies https://www.humanrightsfirst.org/sites/default/files/PretenseofProtection.pdf

[7]: Daniel L. Chen, Markus Loecher, Mood and the Malleability of Moral Reasoning: The Impact of Irrelevant Factors on Judicial Decisions, September 21, 2019. Available at SSRN: https://ssrn.com/abstract=2740485 or http://dx.doi.org/10.2139/ssrn.2740485.

[8]: Daniel Chen, Tobias J. Moskowitz, Kelly Shue, Decision-Making Under the Gambler’s Fallacy: Evidence from Asylum Judges, Loan Officers, and Baseball Umpires, 131 Quarterly J. Econ. 3 (2016). These judges may be susceptible to “the gambler’s fallacy,” i.e., a judge who sees a high-quality case will predict that the next case is likely to be lower in quality in a probabilistic sense even before seeing the next case. Alternatively, a judge may take such action in order to avoid the appearance of being too lenient or too harsh. Id. at 6–7.

[9]: Daniel L. Chen & Jess Eagel, Can Machine Learning Help Predict the Outcome of Asylum Adjudications? In Proceedings of the 16th Edition of the International Conference on Artificial Intelligence and Law, 237–40. ICAIL ’17. New York, NY, USA: Association for Computing Machinery, 2017. https://doi.org/10.1145/3086512.3086538. Interestingly, a different pattern was found among Israeli parole judges, who were more harsh just before meal breaks, and then returned to baseline immediately thereafter. Shai Danziger, Jonathan Levav & Liora Avnaim-Pesso, Extraneous Factors in Judicial Decisions. In Proceedings of the National Academy of Sciences 108, no. 17 (April 26, 2011): 6889–92. https://doi.org/10.1073/pnas.1018033108.

[10]: Mica Rosenberg, Reade Levinson, and Ryan McNeill. Reuters Investigates: Heavy Odds. October 17, 2017. https://www.reuters.com/investigates/special-report/usa-immigration-asylum/.

[11]: Machine learning is a subfield of artificial intelligence. It utilizes statistical methods to design algorithms which are iteratively refined or “trained” by being applied to actual data. As the algorithms become more accurate, they are increasingly able to make classifications or predictions about the likely outcome for a given set of inputs.

[12]: Clearly, this postulation is subject to limitations. For example, some specific courthouses are known as asylum “deserts” meaning that hardly any judges sitting in that courthouse ever grant asylum to any applicants. In denying asylum to an applicant, a judge sitting in that court would be deemed “fair” under the algorithm, although more accurately the judge is simply ruling in a way which is consistent with colleagues at that courthouse, and the decision may not be truly “fair” in the traditional sense of that word. It was necessary to make similar assumptions in the algorithm for the sake of simplicity, expediency, or to constrain the complexity of computations so as not to exceed available computational resources. These limitations are addressed more fully in the team’s forthcoming research paper.

[13]: This discrepancy in outcomes between represented and unrepresented applicants may be partly explained by the fact that many attorneys may only accept cases that they believe have merit and a significant chance of attaining a successful outcome.

[14]: Andrew Schoenholtz, Jaya Ramji-Nogales & Philip Schrag, Refugee Roulette: Disparities in Asylum Adjudication, 60 Stan. L. Rev. 295, 300 (2007–2008).
