Understanding Disparities in Policing Outcomes: A Guide to Data Sources, Methods, and Interpretation
Summary
Public concern about racial and ethnic disparities in policing outcomes has intensified, driving increased demands for transparency, accountability, and data-driven assessment of police actions. In response, law enforcement agencies are increasingly publishing enforcement data, issuing analytical reports, and partnering with researchers to examine traffic stops, arrests, searches, and use of force. While these efforts are essential, interpreting disparity analyses requires careful attention to data quality, measurement choices, and methodological limitations. This guide is designed to help readers understand commonly used data sources and statistical techniques and to interpret internal or external law enforcement agency reports on policing outcomes (e.g., traffic stops, arrests, and use of force).
The guide first addresses measurement and units of analysis, emphasizing that how outcomes such as stops, arrests, and use of force are defined and counted fundamentally shapes analytical conclusions. Variations across agencies in reporting requirements, data fields, and units of analysis (e.g., stop-level, subject-level, incident-level, or officer-level) can lead to inconsistent or misleading comparisons if not clearly documented and understood.
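To make the unit-of-analysis point concrete, the sketch below counts the same hypothetical enforcement activity two ways. All records are invented for illustration; the only assumption is that an agency can link subjects to incidents.

```python
# Hypothetical example: the same enforcement activity counted under two
# different units of analysis. All records below are invented.

# Each record: (incident_id, subject_id, force_used)
records = [
    ("INC-1", "S-1", True),
    ("INC-1", "S-2", True),   # one incident involving two subjects
    ("INC-2", "S-3", False),
    ("INC-3", "S-4", True),
]

# Incident-level count: each incident contributes at most one use-of-force event.
incident_level = len({inc for inc, _, force in records if force})

# Subject-level count: every subject who experienced force is counted.
subject_level = sum(1 for _, _, force in records if force)

print(incident_level)  # 2 incidents involved force
print(subject_level)   # 3 subjects experienced force
```

The two counts differ (2 vs. 3) even though they describe the same activity, which is why cross-agency comparisons are misleading unless the unit of analysis is documented.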
Next, the guide reviews key data sources used to examine policing outcomes. Official police administrative data—such as calls for service, stop data, arrests, and use of force reports—form the backbone of most disparity analyses and offer important strengths. Specifically, they systematically and quantitatively document police activity and allow for consistent internal comparisons over time. At the same time, these data are collected for administrative purposes, largely reflect only the officer’s perspective, and often lack key contextual detail and temporal sequencing needed to fully explain observed disparities. The guide highlights how additional data sources, such as report narratives, body-worn camera footage, surveys, interviews, and policy reviews, can provide critical context but should be viewed as complementary rather than definitive evidence of disparities.
The core of the guide synthesizes common analytical approaches used to quantitatively assess disparities, including descriptive statistics, bivariate analyses, benchmark analyses, veil-of-darkness tests, interrupted time series analyses, multivariate regression models, predicted probabilities, and outcome tests. For each method, the guide explains what the analysis does, how results should be interpreted, and—critically—what conclusions can and cannot be drawn. A key takeaway is that findings can vary substantially depending on analytical choices, particularly the selection of benchmark populations, and that statistically significant results do not necessarily indicate substantively meaningful differences.
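Two of the methods above, benchmark analysis and the outcome ("hit rate") test, can be sketched in a few lines. All figures below are invented, and the residential-population benchmark is only one of several defensible choices; as the guide stresses, results can shift substantially with a different benchmark.

```python
# Hypothetical sketch of two common disparity analyses. All numbers are
# invented; real analyses must justify the benchmark and add context.

# Benchmark analysis: compare each group's share of stops to its share
# of a chosen benchmark population (here, a residential population).
stops = {"Group A": 600, "Group B": 400}
benchmark = {"Group A": 0.70, "Group B": 0.30}

total_stops = sum(stops.values())
for group, n in stops.items():
    stop_share = n / total_stops
    ratio = stop_share / benchmark[group]  # >1 suggests overrepresentation
    print(f"{group}: stop share {stop_share:.2f}, "
          f"benchmark share {benchmark[group]:.2f}, ratio {ratio:.2f}")

# Outcome test: if searches of one group yield contraband at a notably
# lower rate, that may suggest a lower evidentiary threshold was applied
# to that group -- but it cannot, by itself, establish bias or intent.
searches = {"Group A": 120, "Group B": 110}
hits = {"Group A": 36, "Group B": 22}

for group in searches:
    hit_rate = hits[group] / searches[group]
    print(f"{group}: search hit rate {hit_rate:.2f}")
```

In this toy data, Group B is stopped at 1.33 times its benchmark share and its searches yield contraband less often (0.20 vs. 0.30), a pattern that would warrant further examination rather than a conclusion about bias.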
Police officer decision-making is complex and shaped by legal, situational, organizational, and contextual factors that are often not fully captured in administrative data. Therefore, a central premise of the guide is that it is prudent to employ a holistic, multi-method approach to understanding possible disparities in policing outcomes. Each data source, method, and statistical technique offers distinct strengths and limitations. When used together as part of a holistic assessment, they provide a stronger basis for understanding disparities in policing outcomes than any single approach can offer.
A multi-method approach is advisable because no statistical technique can, on its own, attribute racial differences in policing outcomes to racial bias or discrimination by individual officers or the organization. When used and interpreted appropriately, disparity analyses provide important contextual information and can support transparency, inform supervision and training, identify areas for improvement or further examination, and guide evidence-based organizational change—while avoiding overstatement of what the data can reveal about bias or intent.