Project Talks: Deep Learning Framework Comparison

WAT.ai
10 min read · Jul 4, 2023


Deep learning frameworks used in our project

Are you interested in AI and want to learn about the software used to build deep neural networks like the ones that power ChatGPT and Stable Diffusion? Read more below about our WAT.ai team members’ experiences working on our project comparing the performance of various deep learning frameworks like TensorFlow, PyTorch, and more!

What is WAT.ai?

WAT.ai is a student design team focused on building undergraduate talent in artificial intelligence at the University of Waterloo. Through 8-month-long projects, team members build relevant and impactful experience by working on research and industry applications of AI. The Deep Learning Frameworks Comparison (DLFC) project was one such project, running from September 2022 to May 2023.

Many of our core members joined the frameworks team because they were passionate about AI and wanted to develop skills in deep learning. People with different backgrounds and skill sets can join the teams. We had a few new members who had never touched deep learning and were eager to learn! Team members could collaborate with like-minded individuals and learn from each other's experiences. Plus, the project is a great thing to highlight to employers on a resume!

WAT.ai members also get a chance to attend cool events like the Canadian Undergraduate Conference for AI (CUCAI). WAT.ai also hosts its own education sessions and events at the University of Waterloo, open to anyone interested!

What is your project all about?

The goal of our project was to compare the speed of various software frameworks used to build neural networks. Currently, the two most popular deep learning frameworks are TensorFlow and PyTorch, both in the Python programming language. However, there are many other open-source alternatives in both Python and Julia that often claim to have speed benefits over the others. We built a couple of networks from scratch and evaluated their training and testing speed on standard datasets. Each team member was responsible for a different framework: TensorFlow, PyTorch, MXNet, and JAX in Python, and Flux and KNet in Julia.

Hear from our team members!

We asked our team members about their experiences on the project; here's what they had to say! Note: responses have been edited for length and clarity.

Can you introduce yourself?

Trevor: I am a 1st year MASc student in Systems Design Engineering and a Technical Project Manager (TPM) of the DLFC team.

Anusha: I am a 4th year BMath student in Data Science and a TPM of the DLFC team.

Musaab: I am a 4th year BASc student in Nanotechnology Engineering and a Core Member of the DLFC team.

Urban: I am a 4th year BASc student in Mechatronics Engineering and a Core Member of the DLFC team.

Yash: I am a 3rd year BASc student in Management Engineering and a Core Member of the DLFC team.

Why did you join WAT.ai?

Trevor: I joined WAT.ai because I wanted to be a part of a new design team focused on developing skills in the AI field through unique projects.

Anusha: I’m quite interested in working with data but lack experience in the AI side of things. I wanted to get more experience in AI and ML and wanted a way to apply what I learned in my coursework.

Musaab: I have mostly delved into AI projects and learning by myself, so this gave me the opportunity to work with a group on a project. I also saw this as a great learning opportunity and an introduction to the various frameworks seen in industry.

Urban: I joined WAT.ai because I was interested in joining a design team working specifically on AI, and to further enhance my knowledge in the field as I wrap up my studies.

Yash: I wanted to gain experience learning more about deep learning and developing my skill set in AI. Furthermore, I wanted to gain more technical experience that I could put on my resume, which also contributed to why I joined WAT.ai.

How was your experience with your framework, did you enjoy using it?

Trevor: As a TPM, I worked with several frameworks to assist people in their work. I enjoyed learning about how each of the frameworks does things differently, and I gained an appreciation for how the popular frameworks like PyTorch and TensorFlow have made an effort to make their APIs easy to use.

Anusha: I worked with Julia's Flux library for this project. This was my first experience with Julia, so it was a lot of fun picking up a new programming language. Additionally, I believe I learned more about neural networks through this library, since the models were not as easy to implement as in some of the popular Python frameworks. Implementing everything from a lower level gave me a deeper understanding of why certain things work.

Musaab: This was my first experience using MXNet, but I will say it was a fun one. I was fortunate that it shared a lot of similarities with PyTorch, so I was able to carry over a lot of my knowledge and concepts and apply them to MXNet.

Urban: I focused on JAX. I have used other frameworks before, such as TensorFlow, Keras, and PyTorch, but never JAX. Overall, it was much more challenging than any of the other frameworks, as you needed to build all the components from scratch: defining the vectors and matrices for all the weights and parameters, manually updating them, and applying activation functions. While I did enjoy working on the network at a lower level, JAX was difficult and harder to learn.

Yash: This was my first experience using KNet, a framework written in Julia. I found KNet interesting, as it helped me go out of my comfort zone and learn a new programming language. Furthermore, KNet is geared toward research and is not heavily utilized in industry. Overall, I had challenges using KNet, but I learned a lot and developed my skill set.

What were some of the struggles you found using your framework, and how did you resolve them?

Trevor: I learned a lot about the functional point of view of creating deep learning models through working with Flux, KNet, and JAX. It was a very different way of thinking compared to the object-oriented viewpoint of most of the Python frameworks. JAX was probably the most challenging to use, and I had to consult its documentation a lot while developing the models and training code. However, it paid off to see the performance benefits of using JAX!

Anusha: Since Julia is mostly used in academia rather than industry, it was quite hard to find tutorials for specific things, unlike PyTorch and TensorFlow, which have a lot of tutorials. However, Julia, and specifically Flux, has tons of documentation, and that helped me resolve some of the problems. It also taught me the importance of having good documentation and of going through documentation in detail, as it can provide a lot of helpful information.

Musaab: I think the major struggles I faced mostly had to do with outdated functions or documentation. Several things were either never implemented in MXNet or were no longer present, and the workarounds took a while to figure out.

Urban: As mentioned, the low-level nature of JAX made the initial learning curve much steeper. Additionally, the documentation, although not bad, was lacking for certain types of models. The steep learning curve plus the limited examples made development difficult. Ultimately, the only way to resolve this was to gain a really strong understanding of the underpinnings of JAX and to be very knowledgeable about the details of neural networks. You just have to put in the grunt work.

Yash: When I encountered an issue, I immediately looked at the KNet documentation for a possible solution. Most of the time, it worked. However, sometimes I would discover the documentation was outdated. This resulted in me asking peers who had experience with a similar Julia framework for help. They helped out and provided feedback on implementations for the different tasks I had.

What were some key takeaways you got from being in WAT.ai?

Trevor: There are a lot of keen students interested in learning about AI and I’m excited to continue with this team in the next year to develop more exciting projects!

Anusha: Firstly, I learned how to pick up a new programming language quickly. I also got to enhance my skills for reading and understanding documentation, while learning the importance of having good documentation. Most importantly, I got to learn quite a bit about image processing and implementing neural networks!

Musaab: My biggest takeaway was that there is quite a lot I still have to learn, and I am excited for this learning journey.

Urban: For one, I got a better understanding of neural networks. I also got exposed to JAX, which requires more work but in the end gives you more precise control. If you want to build more performant solutions, JAX seems to be a good choice, which is also supported by our findings. Overall, it reminded me of just how much more there is to learn.

Yash: I was able to learn and further develop my skill set in deep learning. I was also able to network with other individuals and meet a lot of cool people. Through WAT.ai, I was able to learn what kind of a person I am, allowing me to grow and learn in machine learning.

Project Details

Now that you've heard a bit more about our team, let's dive a bit deeper into our project. As a recap, each of our team members used a different framework to build neural networks from scratch. We then compared the speed of training and evaluation of the networks. We also researched the differences in the mechanics of the frameworks and compared how easy each was to use as a developer.

The Python-based frameworks TensorFlow and PyTorch are "beginner friendly" and good choices for someone taking their first step into deep learning. Other members worked with more challenging Python frameworks, namely MXNet and JAX. JAX, in particular, takes a functional programming approach to writing neural network code, which is quite different from the object-oriented approach of TensorFlow and PyTorch.
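To give a flavour of that difference, here is a minimal sketch (illustrative only, not our benchmark code) of a single dense layer written in JAX's functional style. The parameters live in a plain dictionary and are passed explicitly into a pure function, instead of being stored inside a layer object as they would be in TensorFlow or PyTorch:

```python
import jax
import jax.numpy as jnp

def init_dense(key, n_in, n_out):
    # Parameters are plain arrays in a dict, not attributes of an object.
    w_key, _ = jax.random.split(key)
    return {
        "w": jax.random.normal(w_key, (n_in, n_out)) * jnp.sqrt(2.0 / n_in),
        "b": jnp.zeros(n_out),
    }

def dense(params, x):
    # A pure function: the output depends only on what is passed in.
    return x @ params["w"] + params["b"]

params = init_dense(jax.random.PRNGKey(0), 784, 128)
out = jax.nn.relu(dense(params, jnp.ones((32, 784))))
```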

Furthermore, some of our members took an extra challenge to learn frameworks based on Julia, namely Flux and KNet. Julia is a newer programming language that has many similarities to Python, MATLAB, and R, and is designed for performance, especially for numerical and scientific computing. So on top of learning about neural networks, those team members also learned a new programming language.

Each team member used their framework to build, train, and test networks for two tasks: a multi-layer perceptron (MLP) applied to the MNIST dataset, and a ResNet convolutional neural network applied to the CIFAR-10 dataset. We'll break down those two tasks below!

Task 1: MLP on MNIST

The first network we built was a multi-layer perceptron (MLP) for image classification on the MNIST handwritten digits dataset. MLPs are considered the simplest type of neural network, though the same structure also appears in more complex architectures. An MLP consists of an input layer, one or more hidden layers, and an output layer. Each layer consists of artificial neurons, which process their inputs by computing a weighted sum and then applying an activation function. The output of each neuron is sent to all neurons in the next layer. In our task, we used the MLP to classify a handwritten digit from the MNIST dataset into one of ten classes. Through building an MLP, we also learned about other key concepts in deep learning, such as data splitting, loss functions, optimizers, and metrics.

MLP neuron structure. Image source: https://www.oreilly.com/library/view/r-deep-learning/9781788478403/0c4ae722-74b3-422b-a67d-4b21e4aa1c96.xhtml
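For a concrete picture, here is a minimal MLP for MNIST sketched in PyTorch. The layer sizes, optimizer, and other details are illustrative assumptions rather than the exact configuration we benchmarked:

```python
import torch
from torch import nn

# 28x28 MNIST images are flattened into 784 inputs, passed through one
# hidden layer of neurons, and mapped to 10 output classes (digits 0-9).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),  # hidden layer: weighted sum of all inputs
    nn.ReLU(),                # activation function
    nn.Linear(128, 10),       # output layer: one score per class
)

loss_fn = nn.CrossEntropyLoss()                   # loss function
optimizer = torch.optim.Adam(model.parameters())  # optimizer

def train_step(images, labels):
    # One optimization step on a batch of images and integer labels.
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```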

Task 2: ResNet CNN on CIFAR-10

The second network we built was a convolutional neural network (CNN) for image classification on the CIFAR-10 tiny image dataset. CNNs use the convolution operation to efficiently process images by applying shared filters across the spatial dimensions. Similar to how images have RGB colour channels, CNNs also process images in different channels and can combine information across channels to create higher-level feature representations, which are fed into an output classification layer. For this network, we implemented the ResNetv2 architecture. ResNet is a popular architecture for image processing and was the foundation for many innovations in subsequent neural network architectures. Through this task, we also learned about data augmentation and how to use GPUs for neural network training.

CIFAR-10 tiny image classes and examples. Image source: https://www.cs.toronto.edu/~kriz/cifar.html
ResNetv2 block architecture from “Identity Mappings in Deep Residual Networks” (He et al., 2016). Image source: https://arxiv.org/abs/1603.05027
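To make the residual idea concrete, here is a rough sketch of a ResNetv2-style pre-activation block in PyTorch. The channel counts and layer details are illustrative and do not come from our project code:

```python
import torch
from torch import nn

class PreActBlock(nn.Module):
    """ResNetv2 pre-activation block: BatchNorm and ReLU come before each
    convolution, and the input is added back to the output (identity shortcut)."""

    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        return out + x  # the shortcut lets gradients flow straight through

# A batch of 8 CIFAR-10-sized feature maps: (batch, channels, height, width)
block = PreActBlock(channels=16)
features = block(torch.randn(8, 16, 32, 32))
```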

Results

After testing all the frameworks, we found that JAX was one of the fastest frameworks in both the MLP and CNN tasks, with an impressive 8.5x to 17x speedup over the other frameworks on the CNN per-epoch training time metric on GPU. For more details on the code, our methods, and results, please refer to our GitHub page and the paper we submitted to CUCAI.

Results for CNN average epoch training time. Image source: Own work
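For readers curious how a per-epoch timing metric like this can be collected, the sketch below shows one framework-agnostic way to wall-clock epochs; our actual harness lives in the GitHub repo, so treat this only as an illustration. One caveat: frameworks that dispatch work asynchronously (such as JAX on GPU) need a synchronization call, for example block_until_ready() on the outputs, before the timer stops, or the measurement will undercount.

```python
import time

def average_epoch_time(train_one_epoch, num_epochs=10):
    # Wall-clock each call to a framework-specific train_one_epoch() function
    # and return the mean, i.e. the per-epoch training time reported above.
    times = []
    for _ in range(num_epochs):
        start = time.perf_counter()
        train_one_epoch()  # must block until all GPU work is finished
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)
```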

Wrap up

That's the end of our Project Talks post! Thank you so much for taking the time to read our blog. Feel free to contact our members with questions through email or LinkedIn. We hope you are eager to join the team and be our companions on the deep learning journey! If you are interested in learning more about what we did, check out our project's GitHub and paper.

Lastly, if you want to stay up to date with WAT.ai, we encourage you to follow the WAT.ai LinkedIn page. Stay tuned for posts about how to join new projects and AI education sessions running in September 2023!


WAT.ai

Fostering the future of artificial intelligence at Waterloo. Website: https://watai.ca/