The ML Production Readiness of Tesla’s Autopilot


Feature expectations are captured in a schema.

All features are beneficial.

No feature’s cost is too much.

Features adhere to meta-level requirements.

The data pipeline has appropriate privacy controls.

New features can be added quickly.

All input feature code is tested.

Model Development

Model specs are reviewed and submitted.

Offline and online metrics correlate.

All hyperparameters have been tuned.

The impact of model staleness is known.

A simpler model is not better.

Model quality is sufficient on important data slices.

The model is tested for considerations of inclusion.


Training is reproducible.

Model specs are unit tested.

The ML pipeline is Integration tested.

Model quality is validated before serving.

The model is debuggable.

Models are canaried before serving.

Serving models can be rolled back.


Dependency changes result in notification.

Data invariants hold for inputs.

Training and serving are not skewed.

Models are not too stale.

Models are numerically stable.

Computing performance has not regressed.

Prediction quality has not regressed.




Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store