Accelerate Your Deep Learning Experiments with IBM’s Neural Network Modeler
Designing, then coding, neural networks is difficult and error prone, so we asked ourselves a few questions about how to improve the process:
- How can we make it easier to build and understand neural network architectures?
- How do we visually design neural networks while preserving the benefits provided by different deep learning frameworks?
- How can we integrate neural network design, training, and evaluation into a seamless workflow?
- How can collaboration help teams evolve neural network architectures more quickly and efficiently?
In short, how can we get to a design like this as fast as possible?
TL;DR: We are happy to announce the beta release of Neural Network Modeler. Now available in IBM Watson Studio, this preview release of Neural Network Modeler hints at a future where intuitive graphical interfaces accelerate the design of deep learning experiments by facilitating the optimization and sharing of your neural networks.
Before speaking in detail about Neural Network Modeler’s technical functionality, let’s review the business problems it addresses.
Barriers to Designing Neural Networks
Very few organizations have realized the potential of neural network modeling or made the leap to integrate deep learning into their business processes on an enterprise-wide basis. As a result, if companies are using deep learning at all, adoption tends to be fragmented. Small teams or individual data scientists in different departments are working on their own projects, using whichever tools and frameworks they prefer, without much collaboration or knowledge of what their peers are doing.
This is a perfectly viable approach if deep learning is considered a niche, experimental, or peripheral activity. But if we believe that deep learning is the future of data science, we need to get smarter about how we scale it up for enterprise adoption.
There are three primary bottlenecks that stand in the way of mainstream adoption of deep learning:
First, neural network design is a complex and highly specialized field. Although many good universities now include deep learning as part of their computer science curriculum, it may take a few more years until the talent pool is deep enough for most companies to hire the specialists they need.
So how can we reduce the barrier to entry for training neural networks so more people will understand configuration screens like this?
To ease adoption in the meantime, we need tools that make it easier for today’s data science teams to design new neural networks, and — just as important — to understand, re-use and enhance networks that their peers have already created.
Lack of standardization
The second, related issue is standardization. The sudden popularity of deep learning has led to a proliferation of open source frameworks such as TensorFlow, Keras, Caffe and PyTorch. These frameworks have arisen and evolved to meet the varying needs of different data science communities and problem areas.
Once a network is designed, getting the source code should be as simple as this:
The resulting lack of standardization makes sharing and re-using models much more challenging. For example, a data scientist who typically builds neural networks in TensorFlow may find it very difficult to understand a solution coded in a framework such as Caffe or PyTorch.
To solve the standardization problem, we either need to enforce the use of a single framework across all deep learning teams, or find a common language that can describe networks in a framework-agnostic way.
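One way to picture such a framework-agnostic common language is a declarative layer specification: plain data that any framework-specific tool could consume. The structure and field names below are a hypothetical illustration, not Neural Network Modeler's actual interchange format:

```python
# A hypothetical framework-neutral network description. Because it is plain
# data, it can be shared and inspected regardless of which framework each
# data scientist prefers for training.
network_spec = {
    "name": "mnist_classifier",
    "layers": [
        {"type": "input",  "shape": [28, 28, 1]},
        {"type": "conv2d", "filters": 32, "kernel": 3, "activation": "relu"},
        {"type": "flatten"},
        {"type": "dense",  "units": 10, "activation": "softmax"},
    ],
}

def layer_types(spec):
    """Return the ordered list of layer types in a network spec."""
    return [layer["type"] for layer in spec["layers"]]

print(layer_types(network_spec))  # ['input', 'conv2d', 'flatten', 'dense']
```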
The third and final issue is operational. Neural network design is just one stage of a much larger workflow, which includes the training, evaluation, deployment, monitoring, and enhancement of deep learning models.
And even neural network design itself is not a one-shot task: it is a matter of iteratively conducting experiments until a network is found that meets the needs of the business.
The result of designing a neural network is a range of hyperparameters that must be tuned to determine if the network can be trained well enough to meet the target problem’s performance needs. If the answer is NO, then all is not lost. We now know how well this particular network performed against the data. And much can be learned simply by evaluating the optimal hyperparameters that emerge from the hyperparameter optimization process.
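The tuning loop described above can be sketched as a simple random search. Here the `evaluate` function is a stand-in for a real training run, and the hyperparameter names and ranges are illustrative, not what an HPO service would actually use:

```python
import random

def evaluate(hparams):
    """Stand-in for a real training run: returns a validation score.
    For illustration, we pretend the best learning rate is near 0.01."""
    return -abs(hparams["learning_rate"] - 0.01)

def random_search(trials=20, seed=0):
    """Sample hyperparameters at random and keep the best-scoring set."""
    rng = random.Random(seed)
    best_hparams, best_score = None, float("-inf")
    for _ in range(trials):
        hparams = {
            "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform sample
            "batch_size": rng.choice([32, 64, 128]),
        }
        score = evaluate(hparams)
        if score > best_score:
            best_hparams, best_score = hparams, score
    return best_hparams, best_score

best, score = random_search()
print(best)
```

Even when the best run falls short of the target, the search results show which regions of the hyperparameter space are promising, which is exactly the feedback the iterative design loop relies on.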
Experimentation continues until an optimal neural network is discovered and the trained model is ready for deployment. But even then the workflow is not complete. The feedback loop continues as the deployed model is continuously evaluated, until the network stops performing well and the cycle begins anew.
However, a deep learning practitioner doesn’t just have to be an expert in network design; they also have to understand the infrastructure used to train their models — which typically involves managing clusters of GPUs.
To make matters worse, once a model reaches the deployment stage, they may also need to understand the infrastructure that will run the model in production in the cloud — a completely different architecture from the training systems.
Considering the scarcity of deep learning expertise, it’s vital that organizations find ways to help specialists spend most of their time working on the parts of the process that make use of their unique skills. That means we need to find a way to automate as much of the infrastructure management work as possible.
Introducing Neural Network Modeler
In combination with the beta release of a new hyperparameter optimization (HPO) microservice in Watson Machine Learning, the Neural Network Modeler gets us closer to addressing all three of the primary bottlenecks to designing neural networks.
The Modeler provides a drag-and-drop interface where users can quickly create a new neural network by selecting, configuring, and composing as many different layers as they need.
Unlike traditional approaches to network design, which are typically based on typing code into a text editor or IDE, this visual approach makes it easier for new practitioners to get started with deep learning, as well as helping more experienced teams understand each other’s designs more quickly.
Once a data scientist has finished designing a network, they can simply click to generate the code required to train the model, using whichever framework the user prefers (currently TensorFlow, PyTorch or Caffe2).
The tool therefore effectively acts as an abstraction layer across frameworks: it allows data scientists to view and re-use each other’s network designs, even if they prefer to use a different framework to actually perform the training and evaluation.
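Such code generation can be pictured as a mapping from framework-neutral layer descriptions to framework-specific source. The sketch below emits PyTorch-style code from a list of layer dicts; the spec format, template table, and field names are hypothetical, not Neural Network Modeler's actual generator:

```python
# Hypothetical sketch: translate a framework-neutral layer list into
# PyTorch-style source code. A second template table could target Keras or
# Caffe2 from the same spec, which is what makes the design framework-agnostic.
TEMPLATES = {
    "dense":   "nn.Linear({in_units}, {units})",
    "relu":    "nn.ReLU()",
    "softmax": "nn.Softmax(dim=1)",
}

def generate_pytorch(layers):
    """Emit the source of an nn.Sequential from a list of layer dicts."""
    lines = ["model = nn.Sequential("]
    for layer in layers:
        lines.append("    " + TEMPLATES[layer["type"]].format(**layer) + ",")
    lines.append(")")
    return "\n".join(lines)

code = generate_pytorch([
    {"type": "dense", "in_units": 784, "units": 128},
    {"type": "relu"},
    {"type": "dense", "in_units": 128, "units": 10},
    {"type": "softmax"},
])
print(code)
```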
The code that is generated by Neural Network Modeler can then be packaged up with training data sets to create a “training definition.” The training definition is then published in Watson Studio, and can be used to configure and execute the training process in a GPU cluster.
For more complex projects, users can also define an “experiment” — a collection of training definitions, together with metadata that tells the GPU cluster how to conduct training sessions with multiple models or sets of hyperparameters. This empowers users to run sophisticated experiments without having to worry about infrastructure configuration and management.
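Conceptually, an experiment bundles training definitions with the metadata for a hyperparameter sweep. The sketch below models that relationship with plain dataclasses; the class and field names are illustrative, not Watson Studio's actual API:

```python
from dataclasses import dataclass, field
from itertools import product

@dataclass
class TrainingDefinition:
    """Generated model code bundled with its training data (names illustrative)."""
    name: str
    framework: str      # e.g. "tensorflow", "pytorch", "caffe2"
    source_file: str
    dataset: str

@dataclass
class Experiment:
    """A set of training definitions plus the hyperparameter grid to sweep."""
    definitions: list
    hparam_grid: dict = field(default_factory=dict)

    def runs(self):
        """Yield one (definition, hparams) pair per scheduled training run."""
        keys = sorted(self.hparam_grid)
        for defn in self.definitions:
            for values in product(*(self.hparam_grid[k] for k in keys)):
                yield defn, dict(zip(keys, values))

exp = Experiment(
    definitions=[TrainingDefinition("mnist-cnn", "pytorch", "train.py", "mnist.tar.gz")],
    hparam_grid={"learning_rate": [0.1, 0.01], "batch_size": [32, 64]},
)
print(len(list(exp.runs())))  # 4 runs: 2 learning rates x 2 batch sizes
```

The point of the metadata is that the scheduler, not the user, expands the grid into individual training runs on the GPU cluster.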
Once the training and evaluation process is complete, and the best model is ready for deployment, the user can simply click to publish it in the Watson Studio repository, where other data scientists can find and use it on their own datasets.
Take the next step