Overview

A summary of our project to predict food inspection outcomes in Chicago.

--

Introduction

There are more than 15,000 food establishments in Chicago but fewer than three dozen food inspectors. Roughly 15% of these establishments will have at least one critical violation, and many violations are discovered long after they have occurred, exposing the public to the risk of food-borne illness.

To address this problem, the City of Chicago developed a model to forecast which establishments were most likely to have critical violations and prioritized them for inspection. In a pilot study, the data-optimized order of inspections identified unsafe establishments earlier than the usual workflow, by 7.5 days on average.

Data

Building on the intuition gained from Chicago’s project, we developed a model to predict food inspection outcomes. Our data sources were the publicly available dataset of about 130,000 food inspections conducted in Chicago since 1 January 2010, together with business license, crime, and 311 sanitation-code complaint data from Chicago’s open data portal. We also requested daily climate data from the National Centers for Environmental Information (NCEI).
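
As an illustration, here is a minimal sketch of pulling the inspections data with pandas via the portal's Socrata CSV endpoint; the dataset ID (4ijn-s7e5) and the inspection_date field name are assumptions to verify on data.cityofchicago.org:

```python
from urllib.parse import urlencode

import pandas as pd

# Socrata CSV endpoint for the Food Inspections dataset; the dataset ID
# below is an assumption to confirm on the open data portal.
base = "https://data.cityofchicago.org/resource/4ijn-s7e5.csv"
query = urlencode({
    "$where": "inspection_date >= '2010-01-01'",  # SoQL date filter
    "$limit": 300000,  # the endpoint returns only 1,000 rows by default
})
inspections = pd.read_csv(f"{base}?{query}", parse_dates=["inspection_date"])
print(inspections.shape)
```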

In preparing our data for analysis, we derived new variables that we felt were potential predictors of the inspection outcome. These include the proportion of past inspections failed, days since the last inspection, the most recent inspection outcome, and the three-day rolling mean of daily maximum temperatures. We regrouped inspection type and facility type into broad categories. We then joined the main inspections dataset with climate data by date, with business license data by license ID, and with crime and sanitation data by geographic “cluster”.
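
A sketch of these derived features on a toy frame; the column names (license_id, inspection_date, failed) are illustrative assumptions rather than the actual schema:

```python
import pandas as pd

# Toy frame standing in for the inspections data, sorted by date
# within each establishment.
df = pd.DataFrame({
    "license_id": [1, 1, 1, 2, 2],
    "inspection_date": pd.to_datetime(["2015-01-05", "2015-06-10",
                                       "2016-02-01", "2015-03-15",
                                       "2015-09-20"]),
    "failed": [0, 1, 0, 1, 1],
}).sort_values(["license_id", "inspection_date"])
grp = df.groupby("license_id")

# Proportion of past inspections failed, shifted one step so an
# inspection's own outcome never feeds its own predictor.
df["past_fail_rate"] = grp["failed"].transform(
    lambda s: s.shift(1).expanding().mean())

# Days since the establishment's previous inspection, and its outcome.
df["days_since_last"] = grp["inspection_date"].diff().dt.days
df["last_outcome"] = grp["failed"].shift(1)

# Three-day rolling mean of daily max temperature, merged in by date.
weather = pd.DataFrame({"date": pd.date_range("2015-01-01", "2016-12-31")})
weather["tmax"] = 60.0  # placeholder for the NCEI series
weather["tmax_3day"] = weather["tmax"].rolling(3).mean()
df = df.merge(weather[["date", "tmax_3day"]],
              left_on="inspection_date", right_on="date", how="left")
print(df)
```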

From our data exploration, we found that past failure, inspection type, and geographic “cluster” are associated with a failed outcome. Failure rates tend to rise in the mid-year months, which correspond to higher temperatures. There is also geographic variation in crime and sanitation complaints that may be related to the outcome.
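
A quick way to check the seasonal pattern, again on synthetic rows with assumed column names:

```python
import pandas as pd

# Failure rate by calendar month on a few synthetic inspections.
df = pd.DataFrame({
    "inspection_date": pd.to_datetime(["2015-01-10", "2015-07-04",
                                       "2015-07-20", "2015-12-01"]),
    "failed": [0, 1, 1, 0],
})
monthly_fail_rate = (df.assign(month=df["inspection_date"].dt.month)
                       .groupby("month")["failed"].mean())
print(monthly_fail_rate)
# The cluster-level view is analogous:
# df.groupby("cluster")["failed"].mean().sort_values(ascending=False)
```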

Model

Prior to modelling, we handled missing values, applied one-hot encoding to the categorical variables, and split our data into training and test sets in a 70:30 ratio. Using a binary outcome (pass or fail), we created two baseline models as benchmarks: an “all-zero” model that predicts every inspection fails, and a “random label” model that predicts failures with probability equal to the overall proportion of failures.
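
The preparation and baselines might look like the following sketch, using scikit-learn's DummyClassifier for the benchmarks and synthetic stand-in data:

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the prepared features and pass/fail labels
# (1 = fail here; the real label encoding may differ).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "past_fail_rate": rng.random(1000),
    "facility_type": rng.choice(["restaurant", "grocery"], 1000),
})
y = rng.integers(0, 2, 1000)

# Impute, one-hot encode, then split 70:30. shuffle=False keeps rows in
# chronological order for the time-series validation described below.
X = pd.get_dummies(X.fillna(X.median(numeric_only=True)),
                   columns=["facility_type"])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=False)

# Baselines: one constant label for everything, and labels drawn at the
# training set's failure rate (scikit-learn's "stratified" strategy).
constant = DummyClassifier(strategy="constant", constant=1).fit(X_train, y_train)
random_label = DummyClassifier(strategy="stratified").fit(X_train, y_train)
print(constant.score(X_test, y_test), random_label.score(X_test, y_test))
```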

We then modelled our data using various classifiers and combined them with logistic regression as the meta-learner in a “stacked” ensemble. We also tuned model parameters via cross-validation, using a time-series cross-validator to prevent information from lagged predictors leaking across validation sets.
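
A sketch of the stacking and tuning setup with scikit-learn; the base estimators and parameter grid here are placeholders, not our exact configuration:

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Stacked ensemble with a logistic regression meta-learner; a
# TimeSeriesSplit keeps lagged features from leaking across folds.
tscv = TimeSeriesSplit(n_splits=5)
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=tscv,  # out-of-fold predictions for the meta-learner, in time order
)
search = GridSearchCV(stack, param_grid={"rf__n_estimators": [200, 500]},
                      cv=tscv, scoring="roc_auc")
# search.fit(X_train, y_train)  # rows must be sorted chronologically
```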

We evaluated our models using overall test accuracy, sensitivity, specificity, and F1 score. Using an out-of-sample test set (the last two months of inspection data), we simulated how the models would be applied in real life to predict failures and prioritize inspections. Each model was used to order the inspections and was then scored on the average number of days earlier (or later) failures were inspected compared with the actual dates, and on the proportion of failures caught in the first month.
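
The simulation scoring could be sketched as below; the daily inspection capacity (per_day) is a made-up constant, and the column names are the same assumptions as earlier:

```python
import numpy as np
import pandas as pd

def simulate(test, proba, per_day=20):
    """Rank held-out inspections by predicted failure probability and
    assign them to sequential inspection days; per_day is an assumed
    daily inspection capacity, used only for illustration."""
    order = test.assign(p=proba).sort_values("p", ascending=False)
    start = test["inspection_date"].min()
    order["simulated_date"] = start + pd.to_timedelta(
        np.arange(len(order)) // per_day, unit="D")
    fails = order[order["failed"] == 1]
    # Positive = the failure is found earlier than it actually was.
    days_earlier = (fails["inspection_date"] - fails["simulated_date"]).dt.days
    caught_month1 = fails["simulated_date"] < start + pd.Timedelta(days=30)
    return days_earlier.mean(), caught_month1.mean()
```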

Conclusion

The baseline “all-zero” and “random label” models had test accuracies of 22% and 34% respectively. Our best fitted model was the random forest, with a test accuracy of 83% and an area under the ROC curve (AUC) of 0.89.

In a two-month simulation study, the model reduced the time taken to discover food establishments that failed their inspections by two weeks compared with the current inspection process. 90% of the failures were caught in the first month using an inspection order ranked by predicted failure probability, compared with only 64% under the business-as-usual workflow.

Top predictors of inspection failure included the proportion of past inspections failed, the three-day average high temperature, nearby sanitation complaints and burglaries, geographic location, and time since the last inspection.
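
For completeness, ranking predictors by a random forest's impurity-based importance looks like this; the fit below uses synthetic data purely to show the step, with feature names mirroring those reported above:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic design matrix; in practice X is the one-hot-encoded features.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((500, 4)),
                 columns=["past_fail_rate", "tmax_3day",
                          "sanitation_complaints", "days_since_last"])
y = (X["past_fail_rate"] + rng.normal(0, 0.3, 500) > 0.5).astype(int)
rf = RandomForestClassifier(random_state=42).fit(X, y)

# Sort features by their impurity-based importance scores.
print(pd.Series(rf.feature_importances_, index=X.columns)
        .sort_values(ascending=False))
```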

--