Predictive Policing and Algorithmic (Un)Fairness
Although statistics is not a field typically associated with social impact, Kristian Lum, Research Assistant Professor of Computer and Information Science, uses her expertise in statistical and machine learning models to address important societal issues. She hopes to amplify the voices of less powerful groups and individuals in conversations about the technology that affects them. Much of her work focuses on the ways in which data and machine learning can behave in unexpected or unfair ways and, in some cases, on how to change that by developing and deploying models more fairly.
One example of algorithms used in high-stakes decision making is predictive policing in the criminal justice system. Predictive policing models use analytical techniques to identify likely targets for police intervention, with the aim of preventing or solving crimes. However, relying on machine learning-based systems does not actually remove human bias from the process; it can reinforce and, in some cases, amplify historical racial biases in law enforcement.
Kristian first became interested in investigating predictive policing while working at the Human Rights Data Analysis Group (HRDAG), a non-profit that undertakes quantitative projects to uncover human rights abuses and advance human rights. While there, she primarily developed new statistical methods for estimating the size of hidden populations, especially the number of unrecorded casualties in violent conflicts. These estimates matter not only for constructing an accurate historical narrative of a conflict, but also because the numbers can affect policy.
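As a simplified, hypothetical illustration of this kind of population-size estimation, the classical two-list capture-recapture (Lincoln-Petersen) estimator can be sketched in a few lines of Python. The numbers below are invented, and HRDAG's actual methods, including the Bayesian multi-list techniques Kristian has worked on, are considerably more sophisticated.

```python
def lincoln_petersen(n1, n2, m):
    """Two-list capture-recapture estimate of a hidden population's size.

    n1, n2: number of individuals documented on each of two lists
    m:      number of individuals appearing on both lists
    """
    if m == 0:
        raise ValueError("no overlap between lists: estimator is undefined")
    return n1 * n2 / m

# Hypothetical example: two casualty lists of 200 and 150 records,
# with 50 individuals matched across both lists.
estimate = lincoln_petersen(200, 150, 50)  # -> 600.0
```

The intuition is that the overlap between independently collected lists reveals how much of the population the documentation effort has missed; real multi-list models relax the strong independence assumptions this toy estimator makes.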
From there, Kristian explored the many uses of predictive modeling in the criminal justice system and began researching the risk assessment models used to inform judicial decision-making. These models recommend levels of supervision to human decision-makers based on the model's estimate of how likely an individual is to have a bad outcome, such as being re-arrested. However, training data that encodes historical patterns of racially disparate enforcement can lead to predictions that reinforce bias.
Risk assessment is one of the most commonly studied application areas in algorithmic fairness. In practice, this means examining how algorithmic predictions can obscure and legitimize biased inputs and perpetuate unfair decisions and outcomes. In some cases, her work also addresses how these problems can be mitigated with statistical methods that explicitly account for bias in the data, or in the process that produces the data.
Modeling the Spread of COVID-19
Kristian explains that she has always let her interests guide her research. “This has led to a somewhat non-linear research path throughout my career so far,” she says.
Recently, Kristian has extended her work to use mathematical and statistical models to study the spread of COVID-19. “In one of those projects, we modeled the spread of COVID-19 between and within communities and jails and found that jails, due to their porous nature, can be accelerants to the spread of the pandemic,” she describes.
In another project, Kristian and collaborators built a statistical framework for estimating the parameters of common epidemiological models using only data on deaths from COVID-19 and information on the clinical progression of the disease, information they consider among the least unreliable of what is available.
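As a hedged sketch of the general idea, not the authors' actual framework, one can fit a transmission rate in a toy SIR-style model so that simulated cumulative deaths match an observed death series. Every parameter value and function name below is hypothetical and chosen only for illustration.

```python
def simulate_deaths(beta, gamma=0.1, ifr=0.01, N=100_000, I0=10, days=60):
    """Daily Euler-step SIR model with a fixed infection fatality rate.

    Returns the cumulative death count for each day.
    """
    S, I, D = N - I0, float(I0), 0.0
    deaths = []
    for _ in range(days):
        new_infections = beta * S * I / N
        new_removals = gamma * I
        S -= new_infections
        I += new_infections - new_removals
        D += new_removals * ifr  # a fixed fraction of removals die
        deaths.append(D)
    return deaths

def fit_beta(observed, candidate_betas):
    """Grid search: pick the beta whose simulated deaths best match the data."""
    return min(
        candidate_betas,
        key=lambda b: sum((s - o) ** 2 for s, o in zip(simulate_deaths(b), observed)),
    )

# Synthetic data generated with beta = 0.3 is recovered by the search.
observed = simulate_deaths(0.3)
best = fit_beta(observed, [0.1, 0.2, 0.3, 0.4])
```

Real frameworks of this kind use likelihood-based or Bayesian inference rather than a grid search, and model the clinical delay from infection to death; the sketch only shows why death counts alone can constrain transmission parameters.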
As Kristian wraps up several long-term projects she’s been working on, she’s especially interested in using this opportunity to start thinking about some new research directions. Next up, she’s excited to work on individual-level variability in recidivism prediction.
Kristian’s Background
Kristian received her PhD in Statistical Science from Duke University, and joined Penn in 2020 after spending six years at the Human Rights Data Analysis Group as Lead Statistician. Previously, she worked as a data scientist at DataPad, a small technology startup, and was an Assistant Research Professor and Statistician in the Virginia Bioinformatics Institute at Virginia Tech.
One of the reasons Kristian went into statistics was because of its usefulness for answering questions across many scientific fields. “I have pretty broad interests and love applying statistical methods and ideas to exciting questions,” she says. “As interesting questions arise, I am really enthusiastic about finding collaborators with the relevant area expertise to work with!”
Kristian has consulted for a number of city governments on policy issues and risk assessment, and is a key organizer on the steering committee of the ACM FAccT (formerly FAT*) conference, which brings together researchers and practitioners interested in fairness, accountability, and transparency in socio-technical systems. She is also the primary developer of the DGA package, open source software that implements a popular Bayesian method for population estimation.
Between work and her one-year-old daughter, there’s not a whole lot of time left in the day. When Kristian does find some extra time, she likes sewing, cycling, and weeding; pre-pandemic, she also really enjoyed traveling.