A Deep Learning System from an Engineer’s Perspective

From Engineering Deep Learning Systems by Chi Wang and Donald Szeto

Figure 1. An overview of a typical deep learning system that includes basic components to support a deep learning development cycle. In later chapters we discuss each component in detail and explain how they fit into this big picture.
  • Dataset manager
  • Model trainer
  • Model server
  • Metadata and artifacts store
  • Workflow manager
  • Model metrics store

Hosted services

You may wonder whether you need to design, build, and host every deep learning system component on your own. You do not: open source and hosted alternatives exist for each of them. We hope that learning the fundamentals of each component, how the components fit into the big picture, and how they are used by different roles will help you make the best decision for your use case.

Application Programming Interface

The entry point of our deep learning system is an application programming interface (API) that is accessible over a network. We opted for an API because the system needs to support not only human user interfaces, but also applications and possibly other systems.

Dataset Manager

Before a model can be trained, there must be data. The job of the dataset manager is to help organize data into units called datasets. These datasets are bounded in size and are tagged with metadata that describes them, for example, that a dataset contains images encoded by a certain algorithm. Both the data and the metadata of datasets can be used during model training.
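To make the idea of a bounded, tagged dataset concrete, here is a minimal in-memory sketch. The names `Dataset`, `DatasetManager`, `register`, `find`, and `max_records` are hypothetical and chosen only for illustration; they are not the book's API, and a real dataset manager would sit on durable storage.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A bounded unit of data plus the metadata that describes it."""
    name: str
    records: list                                 # the data itself, e.g. file paths
    metadata: dict = field(default_factory=dict)  # descriptive tags


class DatasetManager:
    """Hypothetical in-memory manager, for illustration only."""

    def __init__(self, max_records: int):
        self.max_records = max_records  # keeps every dataset bounded in size
        self._datasets = {}

    def register(self, dataset: Dataset) -> None:
        """Store a dataset, rejecting any that exceed the size bound."""
        if len(dataset.records) > self.max_records:
            raise ValueError(f"{dataset.name} exceeds {self.max_records} records")
        self._datasets[dataset.name] = dataset

    def find(self, **tags) -> list:
        """Return datasets whose metadata matches every given tag."""
        return [
            d for d in self._datasets.values()
            if all(d.metadata.get(k) == v for k, v in tags.items())
        ]


manager = DatasetManager(max_records=3)
manager.register(
    Dataset("cats-v1", ["a.jpg", "b.jpg"], {"type": "image", "encoding": "jpeg"})
)
manager.register(Dataset("reviews-v1", ["r1.txt"], {"type": "text"}))

# A trainer could select its input by metadata rather than by location.
jpeg_sets = manager.find(type="image", encoding="jpeg")
```

Looking datasets up by metadata is what lets training code ask for "JPEG image datasets" without knowing where the files live.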

Model Trainer

Once you have good data, the logical next step is to train on it to produce a model. Most of this functionality is provided by frameworks such as TensorFlow or PyTorch, and in this book we do not aim to reinvent it. Rather, we will focus on how to perform model training efficiently and securely in a resource-constrained scenario, and explore advanced training techniques such as hyperparameter tuning and distributed training. We will also talk about experimentation, in which multiple models are trained and their performance is compared.

Model Server

Once models are trained, they can be used to produce inferences on data that the trainer has not seen before. As with training, many frameworks can produce inferences using models built within the same framework. Again, in this book we are not going to explain how to produce inferences from models. We will instead focus on serving architectures that can serve multiple models at high traffic volume.
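As a rough illustration of serving several models behind one entry point, the sketch below routes each request to a model by name. `ModelServer`, `load`, and `predict` are hypothetical names, and the "models" are plain Python callables standing in for framework-specific predictors; concerns such as batching, scaling, and network transport are deliberately left out.

```python
from typing import Callable, Dict, List


class ModelServer:
    """Hypothetical multi-model server: routes requests by model name."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[List[float]], float]] = {}

    def load(self, name: str, predict_fn: Callable[[List[float]], float]) -> None:
        """Make a trained model available under the given name."""
        self._models[name] = predict_fn

    def predict(self, name: str, features: List[float]) -> float:
        """Run inference with the named model on unseen input."""
        if name not in self._models:
            raise KeyError(f"model {name!r} is not loaded")
        return self._models[name](features)


server = ModelServer()
# Stand-ins for real trained models.
server.load("sum-model", lambda xs: sum(xs))
server.load("max-model", lambda xs: max(xs))

sum_result = server.predict("sum-model", [1.0, 2.0, 3.0])
max_result = server.predict("max-model", [1.0, 2.0, 3.0])
```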

Metadata and Artifacts Store

This is the store where trainer code, inference code, and trained models are kept together with the metadata that describes them. This metadata preserves the relationships between datasets, trainer code, inference code, trained models, inferences, and metrics, providing complete traceability in the system. Certain static metrics, such as model training metrics, may also reside in this store. Later in the book, we will discuss the importance of this store for experimentation and advanced training techniques.
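The traceability described here can be pictured as records that link each trained model back to the dataset and code that produced it. The following is a minimal sketch under that assumption; `ModelRecord`, `MetadataStore`, and all field names and URIs are hypothetical and not taken from the book.

```python
from dataclasses import dataclass


@dataclass
class ModelRecord:
    """Metadata linking a trained model to everything that produced it."""
    model_id: str
    dataset_id: str
    trainer_code_version: str
    artifact_uri: str        # where the model file is stored (hypothetical URI)
    training_metrics: dict   # static metrics captured at training time


class MetadataStore:
    """Hypothetical in-memory metadata and artifacts index."""

    def __init__(self) -> None:
        self._models = {}

    def record_model(self, record: ModelRecord) -> None:
        self._models[record.model_id] = record

    def lineage(self, model_id: str) -> dict:
        """Trace a model back to its dataset, trainer code, and artifact."""
        r = self._models[model_id]
        return {
            "dataset": r.dataset_id,
            "trainer_code": r.trainer_code_version,
            "artifact": r.artifact_uri,
        }

    def models_trained_on(self, dataset_id: str) -> list:
        """Answer the reverse question: which models used this dataset?"""
        return sorted(
            m.model_id for m in self._models.values() if m.dataset_id == dataset_id
        )


store = MetadataStore()
store.record_model(
    ModelRecord("model-1", "cats-v1", "abc123", "store://models/model-1", {"accuracy": 0.91})
)
store.record_model(
    ModelRecord("model-2", "cats-v1", "def456", "store://models/model-2", {"accuracy": 0.94})
)

trace = store.lineage("model-2")
siblings = store.models_trained_on("cats-v1")
```

Being able to query in both directions is what makes comparing experiment runs practical: given a dataset, you can list every model trained on it and compare their recorded training metrics.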

Workflow Manager

The workflow manager is the glue that ties all executions within the system together. A typical workflow would be:

  1. Launch model training on new datasets
  2. Deploy the trained model to the model server if it passes some predefined criteria
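The two steps above can be sketched as a small function that trains and then deploys only when a criterion is met. Everything here is an assumption made for illustration: `run_workflow`, the stub `train` and `deploy` callables, and the use of an accuracy threshold as the "predefined criteria".

```python
from typing import Callable, List


def run_workflow(
    dataset_id: str,
    train: Callable[[str], dict],
    deploy: Callable[[dict], None],
    min_accuracy: float,
) -> bool:
    """Train on a dataset, then deploy only if the model passes the gate."""
    model = train(dataset_id)               # step 1: launch training
    if model["accuracy"] >= min_accuracy:   # predefined criterion (assumed)
        deploy(model)                       # step 2: deploy to the model server
        return True
    return False


deployed: List[str] = []


def fake_train(dataset_id: str) -> dict:
    """Stub trainer that returns a fixed accuracy per dataset."""
    accuracy = {"cats-v1": 0.95, "cats-v2": 0.60}[dataset_id]
    return {"model_id": f"model-for-{dataset_id}", "accuracy": accuracy}


def fake_deploy(model: dict) -> None:
    """Stub deployment that just remembers what was deployed."""
    deployed.append(model["model_id"])


good = run_workflow("cats-v1", fake_train, fake_deploy, min_accuracy=0.9)
bad = run_workflow("cats-v2", fake_train, fake_deploy, min_accuracy=0.9)
```

Passing the trainer and deployer in as callables keeps the workflow logic independent of how either component is implemented.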

Model Metrics Store

In contrast to the static model training metrics that may live in the metadata and artifacts store, the model metrics store holds time series metrics that are generated while models are being served. In this book, we will not talk about how to build the store, but we will discuss important metrics that should be captured and explore existing options for storing them.
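To show what "time series metrics" means in practice, here is a toy store that keeps timestamped points per model and metric and answers range queries. `ModelMetricsStore`, `record`, and `query` are hypothetical names, and the latency metric is only an example; in a real system an existing time series database would take this role.

```python
from collections import defaultdict
from typing import List, Tuple


class ModelMetricsStore:
    """Hypothetical in-memory time series store keyed by (model, metric)."""

    def __init__(self) -> None:
        self._series = defaultdict(list)

    def record(self, model_id: str, metric: str, timestamp: int, value: float) -> None:
        """Append one observation taken while the model was serving."""
        self._series[(model_id, metric)].append((timestamp, value))

    def query(
        self, model_id: str, metric: str, start: int, end: int
    ) -> List[Tuple[int, float]]:
        """Return points with start <= timestamp <= end, oldest first."""
        points = self._series.get((model_id, metric), [])
        return sorted(p for p in points if start <= p[0] <= end)


metrics = ModelMetricsStore()
# Timestamps are plain integers (e.g. seconds) to keep the sketch simple.
metrics.record("model-1", "latency_ms", 100, 12.5)
metrics.record("model-1", "latency_ms", 160, 40.0)
metrics.record("model-1", "latency_ms", 130, 14.0)

window = metrics.query("model-1", "latency_ms", start=100, end=140)
```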


