How to Estimate Community Crime Rates?

This article predicts community crime rates from demographic and socio-economic data to answer two specific questions: Which attributes of a community matter most in estimating local crime rates? Do crime rates have local features, that is, are they heterogeneous across locations?

To answer these questions, we are going to…

  1. Construct a hierarchical regression model (a Bayesian mixed effects model) to predict crime rates.
  2. Use US community data to estimate the parameters of interest.
  3. Test whether crime rates are heterogeneous across communities.

Data Description and Treatment

The US Communities and Crime Unnormalized Data Set combines:

  • Demographic and socio-economic data from the 1990 Census,
  • Law enforcement data from the 1990 Law Enforcement Management and Administrative Statistics (LEMAS) survey, and
  • Crime data from the 1995 FBI UCR.

The data contain 2215 observations (communities) from 44 US states, with 147 attributes. The dimension was reduced via Principal Component Analysis (PCA) and Factor Analysis (FA), and the following 9 attributes were selected:

x1: percentage of residents aged 16-29
x2: percentage of youth who did not graduate from high school
x3: unemployment rate (percentage)
x4: divorce rate (percentage)
x5: percentage of kids in single-parent families
x6: percentage of permanent population (from 1985)
x7: log local population
x8: log per capita income
x9: log population density
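
How PCA and FA were run on the 147 attributes is not described here, so the following is only a minimal sketch of the PCA idea, assuming a plain covariance matrix and power iteration. The toy data and the function names `covariance_matrix` and `first_principal_component` are hypothetical and are not taken from the data set.

```python
import math

def covariance_matrix(data):
    """Sample covariance matrix; each row of `data` is one observation."""
    n, p = len(data), len(data[0])
    means = [sum(row[k] for row in data) / n for k in range(p)]
    centered = [[row[k] - means[k] for k in range(p)] for row in data]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
            for a in range(p)]

def first_principal_component(cov, iters=500):
    """Leading eigenvector and eigenvalue of `cov` via power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
    eigenvalue = sum(v[a] * cv[a] for a in range(p))
    return v, eigenvalue

# Made-up toy data: column 2 is exactly twice column 1,
# column 3 alternates between 0 and 1.
toy = [[1.0, 2.0, 0.0], [2.0, 4.0, 1.0], [3.0, 6.0, 0.0], [4.0, 8.0, 1.0]]
cov = covariance_matrix(toy)
loadings, top_eigenvalue = first_principal_component(cov)

# Share of total variance carried by the first component.
explained_share = top_eigenvalue / sum(cov[k][k] for k in range(len(cov)))
```

In this toy example the first component carries most of the variance because two columns are redundant. With real attributes measured on different scales, one would usually standardize each column before computing the covariance.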

Please note that we removed missing values and dropped every state with fewer than 10 observations, since a linear mixed effects regression cannot be fit for a state whose number of observations is smaller than the number of attributes. This leaves J = 34 US states.
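
The filtering rule above can be sketched as follows. This is a minimal illustration on made-up rows, assuming each record is a dictionary with a hypothetical `state` key and `None` marking a missing value; it is not the preprocessing code used for the article.

```python
from collections import Counter

def keep_states_with_enough_rows(rows, min_obs=10):
    """Drop rows with missing values, then drop states with fewer than min_obs rows."""
    complete = [r for r in rows if all(v is not None for v in r.values())]
    counts = Counter(r["state"] for r in complete)
    return [r for r in complete if counts[r["state"]] >= min_obs]

# Made-up data: state "A" has 12 complete rows, "B" has 9,
# and "C" has 11 rows of which one has a missing value.
rows = (
    [{"state": "A", "x1": 0.1 * i, "y": 1.0} for i in range(12)]
    + [{"state": "B", "x1": 0.1 * i, "y": 1.0} for i in range(9)]
    + [{"state": "C", "x1": (None if i == 0 else 0.1 * i), "y": 1.0} for i in range(11)]
)

kept = keep_states_with_enough_rows(rows)
kept_states = sorted({r["state"] for r in kept})
```

Here state "B" is dropped because it has only 9 rows, while "C" survives with exactly 10 complete rows. The threshold of 10 matches the 9 attributes plus the intercept.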

Hierarchical Regression Model

Model Assumptions

Demographic and socio-economic characteristics vary across states, so the model should allow for across-state heterogeneity. The corresponding assumptions of the hierarchical regression model are:
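
A model of this kind could be written as below. This is a sketch under assumptions, not necessarily the exact specification used here: the symbols β, θ, Σ, σ² and σ0² appear in this article, while the index notation, the inverse-gamma form of the variance prior, and the parameter ν0 are assumed for illustration.

```latex
\begin{aligned}
y_{i,j} &= \beta_j^{\top} x_{i,j} + \varepsilon_{i,j},
  & \varepsilon_{i,j} &\sim \mathcal{N}(0, \sigma_j^2)
  && \text{(community } i \text{ in state } j\text{)} \\
\beta_j &\sim \mathcal{N}(\theta, \Sigma),
  & j &= 1, \dots, J
  && \text{(state-level coefficients)} \\
\sigma_j^2 &\sim \text{Inverse-Gamma}\!\left(\tfrac{\nu_0}{2}, \tfrac{\nu_0 \sigma_0^2}{2}\right)
  &&&& \text{(state-level variances)}
\end{aligned}
```

Under this form, x_{i,j} holds a constant 1 for the intercept followed by x1 to x9, so each β_j has 10 components.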

The estimates of θ and Σ are used as parameters of our prior distributions. Because this prior belief comes from the data, the model should be interpreted, roughly, as a Bayesian regression with weak but unbiased prior information. We use a Metropolis-Hastings algorithm (a combination of Metropolis and Gibbs steps) to approximate all parameters of interest.
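
To illustrate what combining Gibbs and Metropolis steps means, here is a minimal sketch on a much simpler model than the one in this article: a single normal sample with unknown mean and variance. The flat prior on the mean, the 1/σ² prior on the variance, the step size, and all names are assumptions made for the example only.

```python
import math
import random

def log_target_eta(eta, mu, y):
    """Log posterior of eta = log(sigma^2) given mu, up to a constant.

    Assumes the prior p(sigma^2) proportional to 1/sigma^2 and includes
    the Jacobian of the log transform.
    """
    ss = sum((yi - mu) ** 2 for yi in y)
    return -0.5 * len(y) * eta - 0.5 * ss / math.exp(eta)

def metropolis_within_gibbs(y, n_iter=5000, step=0.5, seed=0):
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    mu, eta = ybar, 0.0
    draws = []
    for _ in range(n_iter):
        # Gibbs step: mu | sigma^2, y ~ Normal(ybar, sigma^2 / n) under a flat prior.
        mu = rng.gauss(ybar, math.sqrt(math.exp(eta) / n))
        # Metropolis step: symmetric random-walk proposal on eta = log(sigma^2).
        proposal = eta + rng.gauss(0.0, step)
        log_ratio = log_target_eta(proposal, mu, y) - log_target_eta(eta, mu, y)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            eta = proposal
        draws.append((mu, math.exp(eta)))
    return draws

# Simulated data (not from the crime data set): true mean 3, true sd 2.
data_rng = random.Random(1)
y = [data_rng.gauss(3.0, 2.0) for _ in range(200)]

draws = metropolis_within_gibbs(y)[1000:]  # discard the first 1000 draws as burn-in
post_mean_mu = sum(mu for mu, _ in draws) / len(draws)
post_mean_sigma2 = sum(s2 for _, s2 in draws) / len(draws)
```

The mean is drawn exactly from its full conditional (the Gibbs part), while the variance, which is given no closed-form update in this sketch, is moved by an accept/reject step (the Metropolis part).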

Full Conditional Distributions

Mean and HPD Interval of β

With these approximated parameters, we can construct a general predictive model for community crime rates (including the intercept term β0).
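
As a sketch of how a posterior mean, an HPD interval, and a prediction can be computed from MCMC draws, consider the following. The draws and coefficient values are made up for illustration, and `hpd_interval` and `predict` are hypothetical helper names; the HPD computation assumes a unimodal posterior.

```python
import statistics

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the draws (unimodal posterior assumed)."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, min(n, round(mass * n)))  # number of draws inside the interval
    widths = [(xs[i + k - 1] - xs[i], i) for i in range(n - k + 1)]
    _, start = min(widths)
    return xs[start], xs[start + k - 1]

def predict(beta, x):
    """Linear predictor: beta[0] is the intercept, beta[1:] pair with the attributes."""
    return beta[0] + sum(b * xk for b, xk in zip(beta[1:], x))

# Made-up draws for one coefficient, with a single outlier.
draws = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
posterior_mean = statistics.fmean(draws)
interval = hpd_interval(draws, mass=0.9)

# Made-up coefficients for a model with an intercept and two attributes.
y_hat = predict([1.0, 2.0, -0.5], [3.0, 4.0])
```

In this toy case the 90% HPD interval is (0.0, 8.0), which excludes the outlier, and the prediction is 1 + 2·3 − 0.5·4 = 5. For the model in this article, `beta` would hold the posterior means of β0 through β9 and `x` the nine attributes x1 through x9.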

Model Diagnostics

The residual test shows that the normality assumption is well satisfied, and that the Bayesian regression with a hierarchical structure on the parameters fits the data better than an OLS model, which assumes homogeneity.

Convergence Test for θ: ACF
Convergence Test for θ: Trace plot
Convergence Test for σ0²: ACF
Convergence Test for σ0²: Trace plot

The approximated σ0² shows strong autocorrelation (effective sample size = 14377 out of 50000 draws). However, the diagnostics indicate that the MCMC simulation has converged (is stationary) and mixes (is ergodic) well: Gelman and Rubin's diagnostic for σ0² is ≈ 1, with an upper bound of 1.02.
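
For readers unfamiliar with these two diagnostics, here is a minimal sketch of how they can be computed. It uses simulated autocorrelated chains rather than the σ0² draws from this analysis, and the ESS estimator, which truncates the autocorrelation sum at the first non-positive lag, is one simple choice among several. All function names are hypothetical.

```python
import math
import random
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor for several chains of equal length."""
    m, n = len(chains), len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    grand = statistics.fmean(means)
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = statistics.fmean([statistics.variance(c) for c in chains])
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)

def effective_sample_size(chain, max_lag=200):
    """n / (1 + 2 * sum of autocorrelations), truncated at the first non-positive lag."""
    n = len(chain)
    mean = statistics.fmean(chain)
    dev = [x - mean for x in chain]
    var0 = sum(d * d for d in dev) / n
    rho_sum = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * var0)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

def ar1_chain(n, phi, seed):
    """Simulated autocorrelated chain: x_t = phi * x_{t-1} + standard normal noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

chains = [ar1_chain(2000, 0.5, seed) for seed in range(4)]
r_hat = gelman_rubin(chains)
ess = effective_sample_size(chains[0])
```

A diagnostic value close to 1 suggests the chains agree with each other, while an effective sample size well below the number of draws reflects autocorrelation. For comparison, 14377 out of 50000 corresponds to roughly 29% of the draws.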

Heterogeneity Diagnostics

There exists evident heterogeneity among variances in different states.

Heterogeneity of β

The assumption of heterogeneity in β is reasonable, since the state-level β vary much more than their prior mean θ.

Heterogeneity of σ²

The assumption of heterogeneity in σ² is reasonable, since the σ² of different states have quite different posterior distributions.


To build a predictive model for community crime rates, we developed the Bayesian linear regression model in 3 steps:

  1. We used PCA and FA to select the independent variables.
  2. In the hierarchical regression model, which allows heterogeneity across 34 states, we used a Metropolis-Hastings algorithm to approximate all the parameters of interest. Model diagnostics indicate satisfactory performance of the hierarchical regression model and support our assumption of heterogeneity.
  3. The approximated parameters enable us to construct a predictive model for community crime rates, not only for the US as a whole but also for each state.

Remarks: Weaknesses

  • Observations are not evenly sampled across states. We have data for 44 states: California contributes 278 observations (12.6% of all observations), while Maryland contributes only 9 (0.4%).
  • No data on local police departments.
  • We assume community crime rates are i.i.d. within each state, which is not quite realistic. Further research should take correlation among communities within the same state into account.
  • Some approximated coefficients are not quite reasonable; for example, the percentage of kids in single-parent families has a negative effect on the crime rate.