From idea to AI deployment: using deep learning for finger-vein recognition

Ling Jin
Published in AlgoDeep · 8 min read · May 6, 2019


In this era of data explosion, data scientist has become one of the most promising jobs. People outside the field might picture data scientists as brilliant kings and queens of data who earn very high salaries and wield complicated algorithms every day to hunt for treasure in tons of chaotic data. In reality, data scientists are the ones who get the dirty work done.

Coding is cool, but deploying is a real struggle

Photo by Christopher Gower on Unsplash

If you are a data scientist working on AI projects, we know you are suffering.

Currently, before diving into analyses and algorithms, we data scientists have to make sure the execution environment is ready, especially for AI projects. Each AI project needs its own environment, and issues with incompatible packages or software keep happening and take time to fix. Meanwhile, the clock is ticking, the deadline is approaching, and we still have not written a single line of meaningful code. When we eventually finish the code and start training models, part of our energy goes into monitoring the training. For deep learning models, allocating and monitoring one or more GPUs is another headache. As for deploying the model, well, it is never straightforward.

Data scientists are quite often expected to be generalists, working as data engineers, software engineers, and even system engineers at the same time. However, the valuable part of a data scientist’s work is mining useful information from data and transforming it into business value. What companies need are full-stack data scientists who have business insight, knowledge of mathematics and statistics, and the ability to turn ideas into algorithms through programming, not general engineers who also build and maintain data science tools [1].

Building a data science team has become essential for any company afraid of being left behind in this wave of digital transformation. It is not an easy team to build, however, given the high expense and the uncertain short-term returns. That makes an efficient process all the more important: the sooner algorithms are deployed, the sooner the value of data science turns into a competitive advantage.

A dynamic data science team depends largely on a reliable data platform [2].

Life is short, deploy quickly

Is there a way to deliver rapidly in response to the demands of clients and markets? Of course! Let’s talk about a finger-vein recognition project that we carried out swiftly. If you would like to know more details about this project, feel free to read our second article about using deep learning to perform biometric authentication through finger-vein recognition.

Credit to Wired.com

You have probably heard of fingerprint recognition, whose most famous use case is the iPhone’s Touch ID. Finger-vein recognition is another biometric authentication method. Biometric authentication systems based on fingerprints, iris, voice, or facial images can be fooled with fake inputs. A finger-vein based system is harder to deceive, because it only authenticates living people: the vein pattern is captured under near-infrared light and is only visible in a living finger.

A biometric authentication system has two possible modes: identification (who is this person?) and verification (is this person who they claim to be?). The objective of this finger-vein recognition project is to support both modes. Our model is a CNN that fine-tunes a pre-trained VGG-16, based on this paper [3] with slight modifications.
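To give an idea of what such a model looks like, here is a minimal Keras sketch of fine-tuning a pre-trained VGG-16 for identification. The input size, number of subjects, frozen layers, and head architecture are illustrative assumptions, not the exact settings of our model.

```python
# Minimal VGG-16 fine-tuning sketch; sizes and layer choices are assumptions.
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

NUM_SUBJECTS = 100           # hypothetical number of enrolled users
INPUT_SHAPE = (224, 224, 3)  # VGG-16's default input size

# Load VGG-16 pre-trained on ImageNet, without its classification head.
base = VGG16(weights="imagenet", include_top=False, input_shape=INPUT_SHAPE)

# Freeze all but the last few layers, so only those are fine-tuned.
for layer in base.layers[:-4]:
    layer.trainable = False

# New head for identification: classify a finger-vein image among known subjects.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_SUBJECTS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

For the verification mode, one common approach is to extract embeddings from an intermediate layer of the same network and accept a claimed identity when the distance between the enrolled embedding and the presented one falls below a threshold.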

Credit to Finextra

From idea to deployment, the finger-vein recognition project took us less than two weeks. During this time, we focused mainly on data preprocessing and on trying different models. An AI project is not a simple, predictable path from A to B: during experiments, data scientists learn from poor results in order to get better ones. Quick results are essential, because they let us improve models and adjust strategies according to the outcome.

Coding, collaborating, training, analyzing, deploying, and nothing else

In a data science team, each project uses different datasets, environments, and models. We want data scientists to collaborate on a project without interfering with other projects, which requires the various projects to be independent of each other. One way to achieve this is to use a container service such as Docker. However, writing or finding a proper Docker image, and building it without causing conflicts in the system, requires a data scientist to know at least how Docker works and how to solve potential technical problems.

But that takes time, and it is often not secure.

We carried out the finger-vein project mainly in Jupyter Notebook, using Python 3.6 with packages such as NumPy, Matplotlib, OpenCV, Keras, and TensorFlow. Package versions sometimes have to match a project’s existing code, and it is annoying to build a different image for every version of every package.
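To make the overhead concrete, here is roughly what such an image looks like: a hypothetical Dockerfile pinning the stack above. The exact versions are illustrative, not the ones we actually used.

```dockerfile
# Hypothetical image for the finger-vein project; pinned versions are illustrative.
FROM python:3.6-slim

# System libraries that OpenCV needs at runtime.
RUN apt-get update && apt-get install -y --no-install-recommends \
        libglib2.0-0 libsm6 libxext6 libxrender1 \
    && rm -rf /var/lib/apt/lists/*

# Pin package versions so the environment is reproducible.
RUN pip install --no-cache-dir \
        numpy==1.16.3 \
        matplotlib==3.0.3 \
        opencv-python==4.1.0.25 \
        tensorflow==1.13.1 \
        keras==2.2.4 \
        jupyter

WORKDIR /workspace
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--no-browser", "--allow-root"]
```

Maintaining one such image per project, and per combination of package versions, adds up quickly.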

In our project, we used the AlgoDeep AI platform to go swiftly from idea to deployment.

The platform automates the process of building AI projects. Once the datasets are ready, data scientists can start their work immediately, without tedious tasks such as building training environments. Each workspace is its own container, and you can choose the package versions for your project. Just. Focus. On. AI.

Starting Jupyter Notebook directly in a workspace (the platform is in French)

The workspace also allows efficient collaboration with other data scientists: we can work together in the same workspace, or, if a data scientist has an idea of their own, they can create a new workspace within the same application.

Creating a new workspace

We can collaborate without disturbing each other. Once a workspace is set up, data scientists can start coding right away! 🤩🤩

Deep learning models and large-scale machine learning models generally spend a significant amount of time in training. Sometimes we have to leave the PC in the middle of a training session, or the Internet connection drops for a while, and then the notebook runs into problems or we even lose the training. If we instead train from bash inside a tmux session on Linux, it is hard to know when the training has finished and what the results are.
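One defensive pattern, independent of any platform, is to checkpoint the weights during training so that an interrupted session does not lose everything. A minimal Keras sketch, assuming the model from the sketch above and preprocessed arrays x_train, y_train, x_val, y_val:

```python
# Save the best weights seen so far after each epoch, so a crash loses little work.
from tensorflow.keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint(
    "checkpoints/finger_vein_{epoch:02d}.h5",  # illustrative path
    monitor="val_accuracy",
    save_best_only=True,
)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=50,
          callbacks=[checkpoint])
```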

Training the model and setting up an email alert easily

With the platform, we just click one button, “start training”, to launch a training session of the finger-vein model on GPUs. What’s more, we can choose to receive an email notification when the training finishes, so in the meantime we can do something else without worrying about the run. It saves a lot of time compared to conventional ways of training a model!
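If you want a similar alert without the platform, a rough do-it-yourself version is a custom Keras callback built on Python’s standard smtplib. The SMTP server and addresses below are placeholders:

```python
# Rough DIY email alert when training ends; server and addresses are placeholders.
import smtplib
from email.message import EmailMessage
from tensorflow.keras.callbacks import Callback

class EmailOnTrainEnd(Callback):
    def on_train_end(self, logs=None):
        msg = EmailMessage()
        msg["Subject"] = "Finger-vein model: training finished"
        msg["From"] = "trainer@example.com"
        msg["To"] = "me@example.com"
        msg.set_content(f"Training finished. Last logs: {logs}")
        with smtplib.SMTP("smtp.example.com") as server:
            server.send_message(msg)

# model.fit(..., callbacks=[EmailOnTrainEnd()])
```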

Real-time training evaluation

Moreover, during a training session the training metrics are visualized in real time, so you can see how the model performs right away, without waiting for the end of the training. And interrupting a run that is not performing well is, of course, an option.
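Outside the platform, the closest standard tool for this is TensorBoard, which Keras feeds through a callback. A minimal sketch, with an illustrative log directory:

```python
# Log metrics during training so they can be watched live in TensorBoard.
from tensorflow.keras.callbacks import TensorBoard

tensorboard = TensorBoard(log_dir="logs/finger_vein")  # illustrative directory

# model.fit(..., callbacks=[tensorboard])
# Then, from a terminal:  tensorboard --logdir logs/finger_vein
```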

Deploying your model

Once training is complete and you have selected the best-performing model for your task, it is time to deploy that model and turn it into a high-performance API.

Deploying a model is often a hassle: data scientists speak one language, while engineering and operations expect their mother tongue to be the norm. Not ideal. With the platform, all the complex steps of listing and encapsulating the versions of every library and framework used to train the model are abstracted away, and a managed container is launched with everything ready for high-speed inference.

In fact, it gets as easy as 1, 2, 3:

1. Select the training run of the best-performing model and start the deployment process (literally one click).
2. Verify that the model responds correctly to a request, either via manual entry (say, for a project involving time series) or via a file (say, for a project involving images or more complex data). You can also add some pre- and post-processing steps as a last-minute adjustment.
3. Publish the deployment: the platform then generates an endpoint linked to the API, expecting the data format you provided as a test and serving the “predictions” to any application in your project!
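Conceptually, the endpoint that comes out of this is close to the following Flask sketch. The route, preprocessing, and model path are assumptions for illustration, not the platform’s actual implementation:

```python
# Conceptual sketch of an image-prediction endpoint; not the platform's actual code.
import numpy as np
import cv2
from flask import Flask, request, jsonify
from tensorflow.keras.models import load_model

app = Flask(__name__)
model = load_model("finger_vein_model.h5")  # illustrative path

@app.route("/predict", methods=["POST"])
def predict():
    # Pre-processing: decode the uploaded image and resize it to the model input.
    raw = np.frombuffer(request.files["image"].read(), dtype=np.uint8)
    img = cv2.imdecode(raw, cv2.IMREAD_COLOR)
    img = cv2.resize(img, (224, 224)).astype("float32") / 255.0

    # Inference + post-processing: return the most likely subject and its score.
    probs = model.predict(img[np.newaxis, ...])[0]
    return jsonify({"subject": int(np.argmax(probs)),
                    "score": float(np.max(probs))})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```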

Life is simpler with AlgoDeep

A reliable data platform is vital for successful AI projects. In a conventional setup, the steps of an AI project are rarely automated. The AlgoDeep AI platform provides the essential features that ease the work of data scientists: ready-made training environments, collaborative workspaces, real-time performance visualization, automatic GPU allocation, and automated deployment.

Conducting an AI project becomes easy with the AlgoDeep AI platform. 🥳

References

[1] Jeff Magnusson. Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department.

[2] Eric Colson. Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist and the Perils of Division of Labor Through Function.

“It is important to note that this amount of autonomy and diversity in skill granted to the full-stack data scientists depends greatly on the assumption of a solid data platform on which to work.”

[3] H. G. Hong, M. B. Lee, and K. R. Park. Convolutional Neural Network-Based Finger-Vein Recognition Using NIR Image Sensors. Sensors (Basel), 2017.
