Conclusion

Our closing thoughts

--

Discussion

[Figure: Top 10 features in our random forest model]

Our results were similar to Chicago’s project in some respects and different in others. Like theirs, our random forest model performed very well and relied on similar important features. For example, our top two predictors were the proportion of past inspections failed and a binary past-failure flag; these correspond to the previous history of critical violations in Chicago’s model.

Our other top predictors were also similar, such as the three-day average high temperature, the density of nearby sanitation complaints and nearby burglaries, geographical district and “cluster”, and days since the last inspection. Regarding the association with burglaries, one hypothesis is that areas with higher property crime are also more likely to have sanitation problems.
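Ranking predictors like this is typically done with the impurity-based importances a fitted random forest exposes. The sketch below is illustrative only: the feature names and toy data stand in for the real inspection dataset, which is not shown here.

```python
# Hedged sketch: ranking predictors by importance with scikit-learn.
# Feature names and data are illustrative, not the project's actual dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative stand-ins for the predictors discussed in the text.
feature_names = [
    "past_fail_proportion", "past_fail_flag", "avg_high_temp_3day",
    "sanitation_complaint_density", "burglary_density", "district",
    "cluster", "days_since_last_inspection",
]
X, y = make_classification(n_samples=500, n_features=len(feature_names),
                           n_informative=4, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Sort features by mean decrease in impurity, highest first.
order = np.argsort(rf.feature_importances_)[::-1]
for i in order:
    print(f"{feature_names[i]}: {rf.feature_importances_[i]:.3f}")
```

The importances sum to one across features, so each value can be read as a relative share of the model’s splitting power.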

Comparing our top two models, the random forest slightly outperformed the AdaBoost model in overall test accuracy (83% vs. 81%) and area under the ROC curve (0.89 vs. 0.87). In the simulation study, however, the AdaBoost model performed on par with, if not slightly better than, the random forest: over a two-month period, prioritizing inspections with the AdaBoost model reduced the average time to catch a failed inspection by 15 days, compared with 14 days for the random forest.
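A head-to-head comparison like this is straightforward to reproduce with scikit-learn. The sketch below uses synthetic data and default hyperparameters, so the numbers it prints are not the project’s reported figures; it only shows the shape of the evaluation.

```python
# Hedged sketch of comparing two classifiers on held-out accuracy and AUC.
# Synthetic data; the real features and model tuning are not shown here.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("random forest", RandomForestClassifier(random_state=0)),
                    ("AdaBoost", AdaBoostClassifier(random_state=0))]:
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: accuracy={acc:.2f}, AUC={auc:.2f}")
```

AUC is computed from the predicted probability of the positive class, not the hard labels, which is why `predict_proba` is used for that metric.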

In general, forecasting with either model would catch 90% of the failures within the first month, compared with only 64% under the existing workflow. These “real-life” measures of performance exceed what Chicago’s team obtained in their simulation study (failures found 7.5 days earlier on average; 69% caught in the first month with forecasting vs. 55% without).
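The simulation metric amounts to comparing the average position of failed inspections in the original schedule against a schedule sorted by model risk score. The toy computation below uses made-up data and scores to show the mechanics; it is not the project’s actual simulation.

```python
# Illustrative sketch of the simulation metric: average day a failed
# inspection is reached under the existing schedule vs. a model-prioritized
# one. Failure outcomes and risk scores here are fabricated for the example.
import numpy as np

rng = np.random.default_rng(0)
n = 60                                   # one inspection per day, two months
fail = rng.random(n) < 0.3               # which inspections failed
risk = fail * 0.5 + rng.random(n) * 0.5  # toy model score, higher for failures

def mean_day_of_failure(order):
    """Average schedule position (day) at which a failure is encountered."""
    days = np.arange(1, n + 1)
    return days[fail[order]].mean()

baseline = mean_day_of_failure(np.arange(n))          # existing schedule
prioritized = mean_day_of_failure(np.argsort(-risk))  # highest risk first

print(f"baseline: {baseline:.1f} days, prioritized: {prioritized:.1f} days")
```

Because the toy risk score separates failures from passes, the prioritized schedule surfaces failures much earlier; real models separate imperfectly, so the gain is smaller, as the 14–15 day reductions above reflect.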

Limitations

The premise of our project relies on the explicit assumption that violations are time invariant: that is, a food establishment found to have a critical violation on a particular day would also have had a violation if inspected earlier. Of course, this is not always true. In evaluating our models, the relatively short two-month window helps mitigate factors that make violations time dependent. However, it also means that temperature, which varies over longer periods, had limited effect as a differentiating factor.

Additional data could also be used to supplement this model. For example, restaurant review data from online sites such as TripAdvisor or Yelp can indicate the sanitary conditions of an establishment. The use of online consumer reviews to predict food inspection outcomes has been demonstrated in prior studies.

Other datasets on Chicago’s open data portal are potentially useful as well, such as rodent baiting requests. We also considered features such as type of cuisine, average menu item price and employee wage. Unfortunately, data on these variables were not readily available.
