Code like a pro(-ish); right from 101 — Tools from a Deep Learning Perspective

Shubham Agarwal
Published in Analytics Vidhya
7 min read · Jan 27, 2020
Photo by Kevin Ku on Unsplash

TL;DR: This post covers the following items, which I believe are helpful when you start coding in Python as a Deep Learning practitioner. I hope it helps you learn at least something new and relevant!

1. IDE: PyCharm + configuring remote SSH
2. Environments: Conda, pip and Python3
3. Terminal Multiplexer: Tmux vs Screen
4.a: GitHub Repositories: Contribute PRs; raise issues; open-sourcing; GSoC
4.b: Gists: Maintain your frequently used code
5. Notebooks: Visualize your data + configuring remote SSH
6. Deep Learning specific:
6.a: Framework: PyTorch vs TensorFlow
6.b: Shell scripting: A step towards reproducibility
6.c: Reusing code: Write less code and do more research!
6.d: Monitoring usage: some gists
6.e: Debugging DL code: nice suggestions by A. Karpathy
7. Code quality: Comments, naming convention, Clever vs Simple

Suppose you are making a curry (the DL model) and you can easily find the recipe (the mathematics/concepts). The ingredients (tools) you use still directly impact the end result. Knowing the right tools should not be a luxury but a necessity for getting the results (and research) you want. This post mostly discusses some of the best tools out there for Python. Python developers have recently outnumbered Java developers, and Python is also the most popular language among Deep Learning practitioners. Reference: this, this, this and this. Let’s dive into the ingredients necessary for building DL models.

IDE:

Never underestimate the power of an IDE: PyCharm to the rescue

IMO, an IDE can drastically reduce your coding and debugging time while also helping you follow industry standards. I loved coding with RStudio (an IDE for R) as a Data Scientist. When I switched to Python, however, I struggled to find a proper IDE. I asked many people around, and back then Sublime Text was the main viable option. As of now, I know of two IDEs that are really good for Python programmers: Visual Studio Code (VS Code) and PyCharm.

I love PyCharm, and most of the material below is about it. I highly recommend installing PyCharm as your Python IDE (especially if you are on Windows). Go to the website and download the Community edition, which is open source and free. Here are some very basic functionalities that I noted down while learning PyCharm. It is obviously not a complete list, but it can get you started. PyCharm is much more than that: it also provides multiple plugins, such as Markdown, Jupyter and Bash support (to name a few).

SSH mounting: If you are doing serious work in DL, chances are that you develop locally and run your code on servers.
1. One option is to use sshfs to mount your remote server’s repository on your local machine. Here the code actually resides on the server, and the mount acts as a kind of “symlink” on your machine: everything you do is reflected on the server.
2. PyCharm’s SSH mounting (deployment) functionality works the opposite way. Your code resides on your local machine, and PyCharm uploads and synchronizes it to the servers. This has the added advantage of distributing your code to multiple servers. Say you have resources (GPUs) on different servers with no common directory structure; PyCharm lets you maintain an exact replica of your local code on all of them. Plus, you can debug directly on the servers and use their conda environments.
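For option 1, here is a minimal Python sketch that builds the sshfs mount and unmount commands. It assumes sshfs (and FUSE) are installed locally. The user, host and paths are hypothetical placeholders, and the reconnect option is just one common choice. Printing the command first lets you check it before you run it with subprocess.run(cmd, check=True) on an existing, empty mount directory.

```python
import platform
import shlex
from typing import List, Sequence


def sshfs_mount_cmd(user: str, host: str, remote_dir: str, local_dir: str,
                    options: Sequence[str] = ("reconnect",)) -> List[str]:
    """Build an sshfs command that mounts user@host:remote_dir at local_dir."""
    cmd = ["sshfs", f"{user}@{host}:{remote_dir}", local_dir]
    for opt in options:
        # Each sshfs/FUSE option is passed with its own -o flag.
        cmd += ["-o", opt]
    return cmd


def unmount_cmd(local_dir: str) -> List[str]:
    """Linux unmounts FUSE filesystems with fusermount -u; macOS uses umount."""
    if platform.system() == "Darwin":
        return ["umount", local_dir]
    return ["fusermount", "-u", local_dir]


if __name__ == "__main__":
    # Hypothetical user, host and paths: replace them with your own server details.
    mount = sshfs_mount_cmd("alice", "gpu-server.example.com",
                            "/home/alice/project", "/tmp/project-mnt")
    print(shlex.join(mount))  # shlex.join needs Python 3.8+
    print(shlex.join(unmount_cmd("/tmp/project-mnt")))
```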

Follow my gist to configure ssh mounting using PyCharm.

Environments for Python

You might be working on different projects simultaneously, some requiring Python 2, 3.6 or 3.7. I personally recommend switching to Python 3 and the latest versions of libraries. But if you are replicating someone else’s code, sticking to the specified requirements probably makes sense, to avoid breaking it. In any case, I create a separate environment for each project and favor Conda over virtualenv. See these instructions for installation.
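A small guard like the one below can make a script fail fast when it is launched with the wrong interpreter or environment. It is a hypothetical helper, not taken from the post. It relies on the CONDA_DEFAULT_ENV variable that conda activate sets. The environment name myproject is a placeholder.

```python
import os
import sys
from typing import Optional, Tuple

# Typical setup in the shell, with a hypothetical environment name:
#   conda create -n myproject python=3.7
#   conda activate myproject


def check_environment(min_python: Tuple[int, int] = (3, 6),
                      expected_env: Optional[str] = None) -> str:
    """Fail fast if the interpreter or conda environment is not the expected one.

    Returns the name of the active conda environment ("" if none is active).
    """
    if sys.version_info[:2] < min_python:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        raise RuntimeError(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")
    # `conda activate` exports CONDA_DEFAULT_ENV with the active environment's name.
    active_env = os.environ.get("CONDA_DEFAULT_ENV", "")
    if expected_env is not None and active_env != expected_env:
        shown = active_env or "none"
        raise RuntimeError(f"expected conda env {expected_env!r}, but {shown!r} is active")
    return active_env


if __name__ == "__main__":
    print("active conda env:", check_environment() or "none")
```

At the top of a project’s entry script, check_environment(expected_env="myproject") would stop the run before any time is wasted.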

Upgrade to Python 3+
Switch to the latest version of Python/PyTorch and all these bugs would disappear!

Terminal

When you are a coder, you definitely need a terminal multiplexer (see the figure below). I used to use Screen and then switched to Tmux. See this for Tmux vs Screen, and this tmux cheatsheet for reference. Very handy!
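One common use is keeping a long training run alive after you disconnect from a server by starting it inside a detached tmux session. The sketch below wraps the standard tmux new-session -d -s <name> <command> invocation from Python. The session name and the train.py command are hypothetical.

```python
import shlex
import shutil
import subprocess
from typing import List


def tmux_launch_cmd(session: str, command: str) -> List[str]:
    """Build a tmux command that runs `command` in a new, detached session."""
    # -d: start detached (do not attach); -s: name of the new session.
    # tmux hands the single command string to the shell, so pipes and redirects work.
    return ["tmux", "new-session", "-d", "-s", session, command]


def launch_in_tmux(session: str, command: str) -> None:
    """Start `command` in a detached tmux session named `session`."""
    if shutil.which("tmux") is None:
        raise FileNotFoundError("tmux is not installed on this machine")
    subprocess.run(tmux_launch_cmd(session, command), check=True)
    print(f"Started {session!r}; reattach with: tmux attach -t {shlex.quote(session)}")


if __name__ == "__main__":
    # Hypothetical session and command; tee keeps a log after the session closes.
    cmd = tmux_launch_cmd("train-run1", "python train.py 2>&1 | tee train.log")
    print(shlex.join(cmd))  # call launch_in_tmux(...) to actually start it
```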

On the same note, I personally favor bash for my experiments, although I have also tried iTerm and ZSH. Here are some comparisons and blogs. Whatever suits you!

Git — GitHub / Gists

Keep all your code in git repositories. This gives you proper version control, lets you share your code across platforms, and protects it if any mishap happens. See this public gist on how to install git. I personally prefer GitHub. Since January 2019, GitHub has allowed free accounts to create unlimited private repositories.

Git has a lot of concepts, like branching, merging and cherry-picking, which come with practice. See this for resources. My personal suggestion is to contribute a lot to other open-source repositories by raising PRs/issues: 1. You pick up coding standards from other people. 2. You benefit society as well as your own profile. 3. It gives you immense pleasure as a coder when your PR gets accepted in a famous repo.

As you can see, I maintain a lot of GitHub gists for code that I am pretty sure I will reuse. I definitely recommend doing the same. IMO, this promotes writing clean, standard functions/classes (OOP concepts) that are reusable across different projects.
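Here is an example of the kind of small, reusable function worth keeping in a gist. It is a hypothetical seeding helper, not taken from the gists linked above. NumPy and PyTorch are seeded only if they are installed, so the same file can be dropped into any project.

```python
import os
import random


def seed_everything(seed: int = 42) -> None:
    """Seed the common sources of randomness so that runs are repeatable."""
    random.seed(seed)
    # Only affects Python subprocesses started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed in this environment
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    except ImportError:
        pass  # PyTorch is not installed in this environment


if __name__ == "__main__":
    seed_everything(0)
    print(random.random())
```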

If you are an undergrad/Master’s student, I would definitely recommend trying to contribute to an open-source organization through programs like Google Summer of Code (GSoC). See my blog for references. It is not impossible, but it takes some effort: they just want to see how enthusiastic you are and whether you have basic coding skills (such as OOP concepts). I made a dashboard in Shiny (R) as a sample for my GSoC application. :) Be innovative!

Notebooks — Jupyter

Be a professional — Focus on data!

Most of the available datasets have some kind of bias. It is good to know beforehand what kind of data we are dealing with. Garbage In == Garbage Out. Data analysis is really important, and Jupyter notebooks come in really handy for it. Install Jupyter in an existing environment with: conda install jupyter. You can start Jupyter on a server and access it from your local browser over SSH (port forwarding), as mentioned here.
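For a first look at bias in a classification dataset, checking the label distribution is a cheap place to start. The helpers below are a minimal, standard-library sketch. In a notebook, pandas’ Series.value_counts(normalize=True) gives a similar table in one line. The toy cat/dog labels are made up.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def label_distribution(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Fraction of examples per label, most frequent label first."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.most_common()}


def imbalance_ratio(labels: Iterable[Hashable]) -> float:
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    labels = ["cat"] * 90 + ["dog"] * 10  # toy, heavily skewed dataset
    print(label_distribution(labels))  # {'cat': 0.9, 'dog': 0.1}
    print(imbalance_ratio(labels))     # 9.0
```

A ratio far above 1 is a hint to look at class weighting, resampling or more careful evaluation metrics before training.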

DL Related:

1. Framework

While there are a lot of articles that compare the available DL frameworks (like this, this, this and this), two of them are the most popular: PyTorch (among researchers) and TensorFlow (in industry). PyTorch is backed by Facebook and TensorFlow by Google. I have used both for different projects, and I personally find PyTorch code more Pythonic and easier to debug because of its dynamic graphs. While doing research, if I find a really nice TF implementation for my project, I stick to it (at least for replication). Otherwise, PyTorch is my first choice for reimplementation. There is no clear-cut winner: you should decide which one is more intuitive to you and better suited to your application (research vs deployment). I have heard really nice feedback about TF 2.0 with its Keras APIs.

2. Shell scripting

Shell scripts ease the flow of pipelined/modular Python code while passing arguments deterministically. They are also very important for reproducibility. I try to use relative paths when running Python scripts from shell scripts. See some very basic commands here and here.
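On the Python side, the pattern can look like the following sketch. Every knob is a command-line argument, relative paths resolve against the working directory that the shell script sets, and the full configuration is printed so the run can be repeated. The script name train.py and its arguments are hypothetical.

```python
import argparse
import json
from pathlib import Path
from typing import List, Optional

# A wrapper script (hypothetical run.sh) might contain:
#   cd "$(dirname "$0")"    # run from the repo root so relative paths resolve
#   python train.py --data-dir data --lr 0.001 --seed 42


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Train a model (hypothetical example).")
    parser.add_argument("--data-dir", type=Path, default=Path("data"))  # relative path
    parser.add_argument("--save-dir", type=Path, default=Path("checkpoints"))
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--seed", type=int, default=42)
    return parser.parse_args(argv)  # argv=None means "read sys.argv"


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = parse_args(argv)
    # Log every argument so that the exact run can be repeated later.
    print(json.dumps({k: str(v) for k, v in vars(args).items()}, sort_keys=True))
    return args


if __name__ == "__main__":
    main()
```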

3. Monitoring your model run

Use Tmux to split panes. See this gist to actively monitor your model run. Try to time your code as well. See this gist.
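For timing, one lightweight option is a context manager built on time.perf_counter. It is a hypothetical helper, not the gist linked above. It records the elapsed wall-clock time of any block under a name you choose.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timer(name: str, log: Dict[str, float]) -> Iterator[None]:
    """Measure the wall-clock time of a block and store it in log[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so partial timings are still recorded.
        log[name] = time.perf_counter() - start
        print(f"{name}: {log[name]:.3f}s")


if __name__ == "__main__":
    timings: Dict[str, float] = {}
    with timer("sum", timings):
        sum(range(10 ** 6))
```

CUDA kernels run asynchronously. When timing GPU code in PyTorch, call torch.cuda.synchronize() at the end of the timed block; otherwise the measured time can be misleadingly short.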


4. Reusing code

Some researchers told me “Write as little code as possible. Do more research!”, which I totally agree with. Your research ideas matter more than the code (if you are doing a Ph.D.). Always try to find starter code and build on top of it. Or, if you are following a research paper that provides code, start by replicating their results with it. Don’t hesitate to email the authors and ask for help. Never suffer alone; ask for help!

Some senior researchers argue that we should start by implementing from scratch. However, the DL field is evolving so rapidly that things do not remain static. It is kind of a trade-off, IMO. For example, seq2seq ruled NLP for some time, but now almost everybody has shifted to transformers and their variants. Most of the people I know build upon this cool transformers repo by HuggingFace.

I disagree with this. Try to build on top of others’ code rather than implementing from scratch.

My two cents: look for a repo with more than 10 stars or forks (for some confidence that the code is mostly bug-free) and a last commit within the past month. There is no hard and fast rule; you can always explore and pick code that you find understandable.
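The rule of thumb above can be written down as a tiny function, mostly to make the thresholds explicit. It is only a sketch. The star, fork and commit values would come from the repository page, and the example numbers are made up.

```python
from datetime import date, timedelta


def looks_maintained(stars: int, forks: int, last_commit: date, today: date,
                     min_count: int = 10, max_age_days: int = 30) -> bool:
    """Rough heuristic: more than 10 stars or forks, and a commit within about a month."""
    popular = stars > min_count or forks > min_count
    recent = today - last_commit <= timedelta(days=max_age_days)
    return popular and recent


if __name__ == "__main__":
    # Made-up numbers for two hypothetical repositories.
    print(looks_maintained(250, 40, date(2020, 1, 10), today=date(2020, 1, 27)))  # True
    print(looks_maintained(3, 1, date(2020, 1, 20), today=date(2020, 1, 27)))     # False
```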

5. Debugging DL codes

See the nice recipe blog post by Andrej Karpathy and another nice resource. My suggestion: modularize as much as possible and write test cases for every specific module. Model bugs are really difficult to find; modular unit tests are your safety net.
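The “one module, one test” idea can be sketched with a hypothetical padding helper, of the kind found in many NLP data pipelines. The unittest cases below pin down its behavior. The function and tests are illustrative and not taken from Karpathy’s post.

```python
import unittest
from typing import List


def pad_batch(sequences: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    """Right-pad token-id sequences to the length of the longest one."""
    max_len = max((len(s) for s in sequences), default=0)
    return [s + [pad_id] * (max_len - len(s)) for s in sequences]


class TestPadBatch(unittest.TestCase):
    def test_all_rows_have_same_length(self):
        batch = pad_batch([[1, 2, 3], [4], [5, 6]])
        self.assertEqual({len(row) for row in batch}, {3})

    def test_original_tokens_are_kept(self):
        self.assertEqual(pad_batch([[7], [8, 9]], pad_id=-1), [[7, -1], [8, 9]])

    def test_empty_batch(self):
        self.assertEqual(pad_batch([]), [])

# Run with (hypothetical file name): python -m unittest -v test_pad_batch.py
```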

Code quality

It is really important to add comments to your code. I consider them a message to your future self or to anybody else who reads it. I try to follow the Golden Rule while coding: “do as you would be done by”. Use instructive names for functions/variables/classes. Read Google’s Python Style Guide at least once in your lifetime. Use type hints (the typing module) in Python 3.6+ (reference and cheat sheet).

I personally also prefix my commit messages with my initials (like SA: commit) to differentiate my commits (on git) while collaborating in an organization. This helps me refer to my commits quickly. I also reference important changes and fixed issues in my commit messages. Use branches when working with multiple people, and take responsibility for your code. See a very nice write-up by AI2 researchers about coding practices in general. Write clean, simple, decipherable code rather than clever code.
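Instructive names, type hints and a Google-style docstring can work together, as in this hypothetical metric function. The name accuracy and its argument names are just examples.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], targets: Sequence[int]) -> float:
    """Fraction of predictions that match the targets.

    Args:
        predictions: Predicted class ids, one per example.
        targets: Ground-truth class ids, same length as `predictions`.

    Returns:
        Accuracy in [0, 1].

    Raises:
        ValueError: If the inputs are empty or have different lengths.
    """
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equally long")
    num_correct = sum(p == t for p, t in zip(predictions, targets))
    return num_correct / len(targets)
```

For example, accuracy([1, 0, 1, 1], [1, 0, 0, 1]) returns 0.75. A reader knows what goes in and what comes out without reading the body.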

Clever vs Simple coding by @fchollet
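Here is a toy illustration of the point. It is my own example, not taken from @fchollet’s post. The two functions build the same vocabulary (word to index, keeping words seen at least min_count times) from a list of sentences. Both return identical results, but only the second is easy to read, debug and extend.

```python
from collections import Counter
from typing import Dict, List


# "Clever": correct, but the reader has to unpack several ideas at once.
def vocab_clever(sentences: List[str], min_count: int = 2) -> Dict[str, int]:
    return {w: i for i, w in enumerate(sorted(w for w, c in Counter(w for s in sentences for w in s.split()).items() if c >= min_count))}


# Simple: the same result, one step per line, each intermediate value named.
def vocab_simple(sentences: List[str], min_count: int = 2) -> Dict[str, int]:
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
    frequent_words = sorted(word for word, count in counts.items() if count >= min_count)
    return {word: index for index, word in enumerate(frequent_words)}


if __name__ == "__main__":
    sentences = ["the cat sat", "the dog sat", "a cat"]
    print(vocab_simple(sentences))  # {'cat': 0, 'sat': 1, 'the': 2}
```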

Thank you for being a brave reader! Tools always change; coding practices usually don’t. These are some of the best tools out there, IMO, as of January 2020.

Coding is like an art: the best coders are always humble and always believe that there is room for improvement. Learning never stops! I, too, thirst to write the perfect code. Write simple, clean, concise, reproducible and responsible code. Happy coding!

Acknowledgment: Thanks to everyone who helped! Special thanks to Team Alana, where we also focused on coding practices while participating in the Alexa Prize 2018, and to Raghav Goyal, with whom I collaborated in the VisDial Challenge 2018. Feel free to drop a response with your suggestions for collaborative improvement!


Learning never stops! Opinions my own. Grad Student. Multimodal Conv AI. Homepage: https://shubhamagarwal92.github.io