Democratizing GPR (Ground Penetrating Radar) with deep learning
A high-performance and cost-effective deployment using PyTorch and AWS Inferentia
The problem
At Screening Eagle, our mission is to protect the built world. We provide hardware and software for monitoring the health of buildings and infrastructure. Ground Penetrating Radar (GPR) technology is one of our core products. GPR is a well-established method of monitoring civil structures. It uses radar waves to image inside walls and underground, and it allows mapping structures such as reinforcement bars inside walls or pipes buried underground, without the need to break or dig anything.
GPR is extremely useful but requires highly trained specialists to read its output. The elements detected by GPR are hard to see and hidden in noise, and the physical characteristics of both the objects and the background contribute to the final image, making interpretation difficult. Very specialised knowledge is required to understand the output, and insights are available only after extensive manual processing and visual inspection.
At Screening Eagle we simplify this process by using deep learning to automatically identify elements present in the GPR image.
We use computer vision to recognise what objects are present and where they are, a task commonly called segmentation. Segmenting an image means that every single pixel is assigned to a category (e.g., ‘water pipe’ or ‘reinforcement bar’ or just ‘background’), so as to have a high resolution description of the content of the image.
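To make the idea of per-pixel assignment concrete, here is a minimal sketch in plain Python. It assumes a model has already produced one score per class for every pixel; the class list and the scores are made up for illustration and are not taken from our models.

```python
# Hypothetical class list, taken from the categories named in the text.
CLASSES = ["background", "reinforcement bar", "water pipe"]


def segment(scores):
    """Turn per-pixel class scores into a label map.

    scores[row][col] is a list with one score per entry in CLASSES.
    Returns a grid of the same height and width holding class indices.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# A tiny 2x2 "image" with made-up scores, one score per class for each pixel.
scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]],
    [[0.1, 0.2, 0.7], [0.8, 0.1, 0.1]],
]
label_map = segment(scores)
named = [[CLASSES[i] for i in row] for row in label_map]
```

Each pixel simply takes the class with the highest score, which is the usual last step after a segmentation network's forward pass.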
With regard to the framework, PyTorch is a natural choice for teams that want to be at the forefront of computer vision research. Its community is very active and growing rapidly, and the latest architectures are readily available for implementation. PyTorch is flexible: it allows us to experiment with different architectures, gives us the freedom to combine them, and lets us build custom models adapted to our specialised task.
Synthetic Data Generation
A good model is nothing without good training data, but good labels for GPR images are scarce. The location of something inside a concrete block must be determined with millimeter accuracy. There are only three ways to obtain precise labels to guide the AI training: (1) we know in advance how the block is formed, scan it with GPR and use the prior knowledge as targets; (2) we have a given block, we scan it with the GPR and then we break it to see what is inside; (3) we ask experts to judge what is represented in GPR images, but in that case the labels would be only as precise as the experts are. All these options are too costly and not practical in real life. To have a sufficiently large and precise dataset to train a deep learning model, we concluded it was more efficient to create our own data by simulation. We do so using GprMax, an open source software for computational electromagnetism.
This approach has many advantages. Firstly, there is complete control of the ground truth of each and every one of the samples (utilities location, material features). Secondly, uncommon conditions can be simulated, which contributes to the detection and interpretation of anomalies in real structures. Finally, we can avoid biased datasets by adding random variability to our simulated structures or scenarios. We are living the dream of having on-demand and virtually unlimited data! With tools like GprMax, we obtain the simulation as a simple matrix, which is perfect to be used as PyTorch input. It also allows us to iterate rapidly over our research.
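As a rough illustration of adding random variability while keeping full control of the ground truth, the sketch below draws randomized scene descriptions with a seeded generator. Every field name and value range here is a hypothetical placeholder, not our actual simulation configuration, and the step that turns a description into a simulator input is left out.

```python
import random


def sample_scenario(rng):
    """Draw one randomized scene description (all fields are hypothetical)."""
    n_bars = rng.randint(1, 5)
    return {
        # Relative permittivity of the host material (illustrative range).
        "host_permittivity": rng.uniform(4.0, 10.0),
        "bars": [
            {
                "x_m": rng.uniform(0.05, 0.95),      # horizontal position
                "depth_m": rng.uniform(0.03, 0.30),  # cover depth
                "radius_m": rng.choice([0.004, 0.006, 0.008, 0.010]),
            }
            for _ in range(n_bars)
        ],
    }


def sample_dataset(n_samples, seed=0):
    """Generate n_samples scene descriptions, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [sample_scenario(rng) for _ in range(n_samples)]


scenarios = sample_dataset(100, seed=42)
```

Because each description is stored alongside the simulated image, the exact position and size of every object is known, which is what makes the labels precise.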
From Wrappers To Our Own Library
At the beginning of our journey, we integrated PyTorch with a popular wrapper, PyTorch Lightning, which aims to abstract away deep learning boilerplate and let us focus on the science. Both PyTorch and PyTorch Lightning are intuitive to use and very well documented. The segmentation results came in quickly and were promising. Soon we got excited to test the newest segmentation models and customise them for our specific needs. The more we experimented, the more the wrapper started feeling too tight for our goals. Custom functionalities were hard to implement, and the high level of abstraction in PyTorch Lightning was becoming a complication more than a time-saving solution.
At that point we were ready to jump out of the wrapper and to leverage PyTorch’s full power. While we built our own framework from scratch, we still rely heavily on PyTorch building blocks. This way we got the best of both worlds: abstraction when launching training sessions, and more control over the training and validation for-loops. Additionally, writing our own framework gave us the chance to learn the ins and outs of PyTorch by going deeper into detail, which in the end was a pleasurable and fruitful learning experience.
Infrastructure
While we were working on improving the segmentation results, and on creating realistic and large synthetic datasets to accomplish the task, the amount of time and resources required by the process was growing rapidly. We turned to Distributed Data Parallel (DDP) processing to decrease the computation time. At its core, DDP is a way to achieve data parallelism: the model is replicated across multiple GPUs, and each replica processes a different part of the data. Our own implementation uses parts of the PyTorch library such as torch.distributed, torch.multiprocessing and DistributedSampler.
Thanks to these tools, we were able to implement our custom training loop using DDP, by spawning multiple processes and distributing the dataset (with the DistributedSampler) and the model (by wrapping it in the DistributedDataParallel class) across multiple machines.
The code lines that achieve what’s explained above are as follows:
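import torch
import torch.distributed as dist
from torch.utils.data.distributed import DistributedSampler

# Join the process group; rank and world_size identify this process.
dist.init_process_group(backend='nccl', init_method='env://', world_size=world_size, rank=rank)
# Wrap the model so that gradients are synchronised across processes.
self.distributed_model = torch.nn.parallel.DistributedDataParallel(self.model, device_ids=[gpu], find_unused_parameters=self.find_unused_parameters)
# Give each process its own share of the training and validation data.
train_sampler = DistributedSampler(self.train_dataset, shuffle=self.shuffle, num_replicas=world_size, rank=rank)
valid_sampler = DistributedSampler(self.valid_dataset, shuffle=self.shuffle, num_replicas=world_size, rank=rank)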
We achieved a significant speed-up for our training, and full control over the loop procedure.
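To show what distributing the dataset means in practice, the following plain-Python sketch mimics the idea behind DistributedSampler when shuffling is turned off: the index list is padded so it divides evenly, and each process takes every num_replicas-th index starting at its own rank. It is a simplified illustration, not the PyTorch implementation, and shard_indices is a hypothetical helper name.

```python
import math


def shard_indices(dataset_len, num_replicas, rank):
    """Return the sample indices one process would see (no shuffling).

    Pads the index list so it divides evenly, then gives each rank every
    num_replicas-th index starting at its own rank.
    """
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    # Repeat indices from the start so every rank gets the same count.
    padding = total_size - len(indices)
    indices += (indices * math.ceil(padding / len(indices)))[:padding]
    return indices[rank:total_size:num_replicas]


shards = [shard_indices(10, num_replicas=4, rank=r) for r in range(4)]
```

With 10 samples and 4 processes, every process receives 3 indices, and two samples are reused as padding so that all processes run the same number of steps.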
To make sure that all the operations connected to training the model run smoothly, we need a large infrastructure and significant hardware and software management skills. At Screening Eagle, a dedicated team has full control and care of the infrastructure, so that the AI research team is relieved from managing it and can focus fully on the AI work.
Kubernetes, as an orchestration platform, provided us with the perfect solution. Kubernetes is an ‘open source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation’. It uses manifests to specify the desired state of an object that Kubernetes will maintain. A manifest is a specification of a Kubernetes API object in JSON or YAML format. Using kustomize (a Kubernetes tool) and CRDs (Custom Resource Definitions), our SRE team built an abstraction layer that makes the manifests fully customizable with little effort and little knowledge of Kubernetes manifests. As researchers, we just need to fill in a Kubernetes manifest with the specifics of the task we need to run (such as number of GPUs, type of hardware, hyperparameters, model, dataset and so on), and let the cluster handle the orchestration of hardware and pods, instance scaling and driver management.
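For readers unfamiliar with manifests, here is a minimal sketch of one built as a Python dictionary and printed as JSON. It describes a plain Pod that requests GPUs through the nvidia.com/gpu resource exposed by the NVIDIA device plugin; it is not our CRD-based abstraction, and the pod name, image and arguments are hypothetical.

```python
import json


def training_pod_manifest(name, image, gpus, args):
    """Build a plain Pod manifest that requests a number of GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "trainer",
                    "image": image,
                    "args": args,
                    # Extended resources such as GPUs are requested under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


manifest = training_pod_manifest(
    name="segmentation-training",                          # hypothetical
    image="registry.example.com/trainer:latest",           # hypothetical
    gpus=2,
    args=["--epochs", "50", "--dataset", "synthetic-v1"],  # hypothetical flags
)
print(json.dumps(manifest, indent=2))
```

The manifest only declares the desired state; the scheduler is responsible for finding a node that can satisfy the GPU request.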
One issue we ran into early on was that we used drivers installed directly on the host, but we quickly moved to Kubernetes operators. This lets us delegate driver management entirely to software in the Kubernetes way (declaratively, using manifests), instead of relying on manual installations or classic provisioning techniques.
Generating synthetic datasets and running the training-testing cycle require the coordination of complex workflows to parallelise operations and therefore speed up the process as much as possible. Metrics collection plays a fundamental role in this. Metrics are the only way to get insights on how things are working, so we try to have metrics for every single process. GPU usage metrics allow us to scale our cloud instances up and down. Training metrics guide the decisions regarding the viability of a training run.
Currently we are working on improving these metrics and processes even further to move on to Auto-Machine Learning. We collect our metrics using Prometheus, the standard monitoring tool in the Kubernetes world. On top of it, our custom tools, services and training software need custom implementations: here is where Argo Workflows comes in.
Argo Workflows is an orchestrator that runs on top of Kubernetes and allows us to declare complex workflows using Kubernetes manifests. For example, to create a new synthetic dataset, we perform four main steps:
- Based on thousands of material definitions and parameters, we create thousands of material patterns that will be used in the next step. We define a level of parallelism in the workflow according to the number of samples we want to create. Argo then creates a large number of pods for this step, and Kubernetes tries to execute all of them, scaling up cloud instances as needed.
- When all the pods of the previous step finish, a new set of thousands of pods is created to use these materials to render our dataset samples.
- Once we have all the samples, a new step creates different versions of them using image transformations.
- Finally, the new dataset is validated, defined in our research cluster as a new PVC to be claimed by the training pods, and uploaded to S3 to make it available for other use cases in the cloud and also as a backup.
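The control flow of these four steps can be sketched in plain Python, with a thread pool standing in for pods and a parallelism setting standing in for the workflow's level of parallelism. Argo workflows are declared in manifests rather than written like this, and every function below is a hypothetical placeholder for the real work.

```python
from concurrent.futures import ThreadPoolExecutor


def make_material(i):
    return {"material_id": i}                      # step 1 placeholder


def render_sample(material):
    return {"sample_of": material["material_id"]}  # step 2 placeholder


def augment(sample):
    # Step 3 placeholder: the original plus one transformed copy.
    return [sample, {**sample, "flipped": True}]


def validate(dataset):
    return all("sample_of" in s for s in dataset)  # step 4 placeholder


def build_dataset(n_samples, parallelism):
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # Each list(...) waits for the whole step to finish before the next
        # one starts, like a workflow step that waits for all of its pods.
        materials = list(pool.map(make_material, range(n_samples)))
        samples = list(pool.map(render_sample, materials))
        augmented = [s for group in pool.map(augment, samples) for s in group]
    if not validate(augmented):
        raise ValueError("dataset failed validation")
    return augmented


dataset = build_dataset(n_samples=8, parallelism=4)
```

The key property is the barrier between steps: work inside a step fans out in parallel, while the steps themselves run strictly in order.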
Optimizing Deployment for Scale with AWS Inferentia
The implementation of this infrastructure abstraction was a powerful breakthrough and allowed us to advance fast toward the solution of our case. Once our demands started to increase, a natural evolution was to look to the cloud to scale. We chose AWS cloud as it offers the broadest and deepest portfolio of compute resources.
We moved to a hybrid cloud, adding cloud instances to our clusters to perform AI tasks alongside our on-premise machines.
A fundamental advantage of a hybrid cloud is instance autoscaling. Autoscaling lets us request a specific hardware type or number of machines simply by specifying it in the manifests. Because on-premise machines are fixed and cannot scale up on demand, cloud instances meet our needs dynamically by joining our on-premise clusters as new nodes.
Our solution has the benefit of providing on-site and real-time insight into what is underground. It can also run on any portable device and create segmented maps of any GPR image uploaded by the users. The challenge here is to maintain high-speed computational power in a cost-effective manner. To overcome this challenge, we looked at the Amazon EC2 accelerated computing portfolio.
After exploring a few options, we found that AWS Inferentia-powered Amazon EC2 Inf1 instances were the best fit for high-performance and cost-effective ML acceleration. These instances and the AWS Neuron SDK allowed us to maintain hardware portability and take advantage of the latest technologies without being tied to vendor-specific software libraries. We adapted our Docker images and software to launch inference tasks on Inf1 instances, and we adapted our clusters to request and manage Inferentia instances. Engineers can now request as many Inferentia machines as they need simply by asking for Neuron devices, and we take advantage of powerful hardware while reducing inference costs by 50% compared with GPU-based instances on AWS.
Our solution involves the following tools:
- PyTorch-Neuron tracing API to trace the model
- AWS Inferentia instances and AWS Neuron, its own software development kit (SDK), to keep GPU-level speed while having a 50% lower cost in production
- inferencer-framework library in conjunction with the NN-Inferencer library, C++ multi-platform/multi-target libraries that respectively perform the full deep learning pipeline and the inference itself
PyTorch-Neuron provides a JIT-traced exported PyTorch model, which is also the main format required by the inferencer-framework library. The tracing is accomplished with the Python API in a single line:
model_neuron = torch.neuron.trace(model, example_inputs=[image])
We use a GoCD CI/CD server pipeline for exporting the trained model. It runs a Kubernetes pod with a Docker container that runs a GoAgent with the Neuron SDK, PyTorch, PyTorch-Neuron and other compilers and development tools.
For inference we use a micro-service based on a CLI application running in the cloud that provides high-speed inference on demand.
A CLI prepares the data for inference by using the Inferencer-Framework library API. This library abstracts the fundamental processes required for inference: capturing or getting the input data, data preprocessing, launching the inference process, and then post processing the output according to the application requirements.
Inferencer-Framework is able to work on batches of data. This mode considerably reduces the computation time: every stage runs continuously, so each stage already has its output prepared when the next one requests it. The inference step is carried out by another library: the NN-inferencer. This library is responsible for processing the input and arranging it as the neural network model expects, running the inference, and processing the output coming from the neural network model.
The diagram splits into two parts:
- Load configuration: The configuration for the inference processes for capturing, preprocessing and postprocessing must be loaded from YAML files. Each configuration file belongs to a neural network model and an application.
- Initialise all the subprocesses in parallel threads with core affinity for optimising the resources of the CPU that manages them.
The core process of the library is the inference block. It has four major steps:
- First, the application captures input data as a tensor for detections.
- After capturing, the data should be preprocessed to conform to the input format of the neural network model.
- Then, the inference process is launched where the NN-Inferencer runs its own processes.
- After that, the output of the neural network model is post-processed according to the application requirements. For example, if the output is a mask, it may need to be filtered by a threshold.
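The four steps above can be sketched as a small pipeline in which every stage runs in its own thread and passes batches to the next one through a queue, so a stage can already work on the next batch while later stages are busy. This is a Python illustration of the idea only: the real libraries are written in C++, core affinity is not modelled, and the stage functions are hypothetical stand-ins.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(fn, inbox, outbox):
    """Run fn on every item from inbox and forward results to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(fn(item))


def preprocess(batch):
    return [x / 255.0 for x in batch]          # scale raw values to [0, 1]


def infer(batch):
    return [min(1.0, 2.0 * x) for x in batch]  # stand-in for the model


def postprocess(batch, threshold=0.5):
    return [1 if p >= threshold else 0 for p in batch]  # threshold into a mask


def run_pipeline(batches):
    queues = [queue.Queue(maxsize=2) for _ in range(4)]
    workers = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate([preprocess, infer, postprocess])
    ]
    results = []

    def collect():
        while True:
            item = queues[-1].get()
            if item is STOP:
                return
            results.append(item)

    workers.append(threading.Thread(target=collect))
    for w in workers:
        w.start()
    for batch in batches:  # the "capture" step feeds the first queue
        queues[0].put(batch)
    queues[0].put(STOP)
    for w in workers:
        w.join()
    return results


masks = run_pipeline([[0, 51, 102], [153, 204, 255]])
```

The bounded queues keep a fast stage from running far ahead of a slow one, and the single thread per stage preserves the order of the batches.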
The NN-inferencer library is written in C++, which makes it multi-platform/multi-target compatible, thanks to a CMake project.
The technology currently supported is libTorch (the PyTorch backend implementation in C++). As it will run on Inferentia-based instances with the JIT-traced model exported with the AWS Neuron SDK, it has to be compiled and linked against the libtorchneuron.so library located in the torch_neuron/lib/ package directory, together with the libTorch libraries (CPU support only).
These lines are an example of the linking using CMake:
# Linking libTorch library (base link and CPU support)
target_link_libraries(${PROJECT_NAME} PRIVATE "$ENV{LIBTORCH_PATH}/lib/libc10.so")
target_link_libraries(${PROJECT_NAME} PRIVATE "$ENV{LIBTORCH_PATH}/lib/libtorch.so")
target_link_libraries(${PROJECT_NAME} PRIVATE "$ENV{LIBTORCH_PATH}/lib/libtorch_cpu.so")
if (USE_AWS_NEURON)
    # Linking libTorch AWS Neuron library
    target_link_libraries(${PROJECT_NAME} PRIVATE "$ENV{LIBTORCH_AWS_NEURON_PATH}/lib/libtorchneuron.so")
    target_link_libraries(${PROJECT_NAME} PRIVATE "$ENV{LIBTORCH_AWS_NEURON_PATH}/lib/libnrt.so")
endif()
Conclusion
We made incredible advances in automating the detection of bars inside concrete. Our software will soon incorporate the model, saving users time and resources and democratizing the use of GPR technology. Thanks to PyTorch, the journey to a working model was relatively easy and we were in a position to experiment as much as we needed. Once we had a working model, deploying it cost-effectively and at scale without losing performance was key, and this is where AWS Inf1 instances helped. In order to translate the research work into a product our users can enjoy, Python’s compatibility with AWS’ Inf1 instances was essential. The potential for AI in this industry is still all to be discovered, and as our ambition grows, so does our team! We are always looking for talent to join us in our journey.
This article was written by the Screening Eagle AI team: Selene Gallo, PhD, Senior AI research engineer, Antonio Sanchez, research engineer, Luis Redondo, PhD, Team Lead, Guillermo Del Valle, research engineer, Diego Torres, Senior Engineer Expert Programmer, Julian Martinez, SRE Lead and Jesús Hormigo, Chief of Cloud & AI Officer. Together, we aim to transform the GPR industry using AI! We are a young and dynamic team that brings together a great deal of knowledge in computer vision and operations. We are always looking for new talent to join us! Reach out if you are interested!