ML Cloud Computing Part 1: Setting up Paperspace
When we first started our journey, one of the biggest bottlenecks I came across was the lack of access to GPU crunching power. Being a Mac user, my options were woefully limited. Compounding things is the fact that Apple uses AMD cards, which have nothing near the TensorFlow driver support that Nvidia and CUDA offer. The options available to you are as follows:
- Build a dedicated ML Linux box with Nvidia GPUs: This is the approach my co-founder Brett Koonce went with. Cost is going to be in the $1,500 range.
- eGPUs: With the bandwidth that Thunderbolt 3 now offers and the new macOS Nvidia driver support offered in Mojave, eGPUs are now a great option. The card and enclosure will set you back close to $800.
- Cloud computing: Why buy a GPU when you can rent one as needed by the hour? Google Cloud, AWS, and Azure offer such services. Depending on the hardware configuration, the cost can be anywhere from $0.20 to $5 an hour.
I opted for the cloud approach since it gives me the most flexibility with the smallest investment. And I can continue to use the Mac development environment.
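To put the options side by side, here is a rough back-of-the-envelope sketch of how many rented GPU hours add up to the price of buying hardware. It only uses the ballpark figures from the list above, and it ignores things like electricity, storage fees, and resale value, so treat the numbers as an illustration rather than a real quote. The function name `break_even_hours` is just something I made up for this example.

```shell
# Prices are in whole cents so plain integer shell arithmetic is enough.
linux_box_cents=150000   # roughly $1,500 dedicated Linux box
egpu_cents=80000         # roughly $800 card plus enclosure
cheap_rate_cents=20      # $0.20 per hour, low end of the cloud range
pricey_rate_cents=500    # $5 per hour, high end of the cloud range

# Print how many rented GPU hours cost the same as buying the hardware.
#   $1: hardware price in cents
#   $2: cloud rate in cents per hour
break_even_hours() {
    echo $(( $1 / $2 ))
}

echo "eGPU vs cheap cloud GPU:       $(break_even_hours "$egpu_cents" "$cheap_rate_cents") hours"
echo "eGPU vs pricey cloud GPU:      $(break_even_hours "$egpu_cents" "$pricey_rate_cents") hours"
echo "Linux box vs cheap cloud GPU:  $(break_even_hours "$linux_box_cents" "$cheap_rate_cents") hours"
echo "Linux box vs pricey cloud GPU: $(break_even_hours "$linux_box_cents" "$pricey_rate_cents") hours"
```

With these figures, the eGPU only pays for itself after 4,000 hours at the low end of cloud pricing, or 160 hours at the high end. If you are just getting started and training occasionally, renting tends to win.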
In theory, this sounds great, but it was surprising how much initial setup was required. I am going to do my best to document the process and hopefully save someone who is new to all this some time. I am optimizing for two things: ease of use and cost. After kicking the tires on AWS, Azure, and Google Cloud, I ended up settling on a service called Paperspace.
Part 1: Let's get started and set up a Paperspace account and machine.
Step 1: Create an account
- Head on over to https://www.paperspace.com/
- After verifying your email, you should be in.
- Make sure to set up billing.
- Use this referral code to get a $10 credit: FUDOQSH.
Step 2: Create a Machine
- Go to https://www.paperspace.com/console/machines.
- (1) Navigate to Core -> Compute.
- (2) Select the “New Machine” button or click the big green plus.
- (3) Go ahead and pick the region that is closest to you.
- I am in SF so I am opting for the west coast.
A quick note. Many times, Paperspace will not let you create a machine unless you give them more information. Hopefully, if you have billing set up, this will not be an issue. If an annoying popup appears when you try to create the machine, fill out the infobox and email their support. You should be good to go within a few hours.
- (4) Select the Public Templates tab.
- (5) Select the ML-in-a-Box template.
- (6) I would recommend the P4000 or the P5000.
- If you are just starting out, pick a cheaper option and when performance starts bottlenecking, you can email Paperspace and they will upgrade you to what you need… or you can create another box.
- (7) 100 GB should be more than enough. If you plan on working with large datasets you can upgrade later.
- (8) Toggle off Auto Snapshot.
- (9) Toggle on Public IP.
- (10) Hopefully you have already set up billing, so all you need to do now is select “Create your Paperspace”.
- After a few minutes, your machine will be ready and they will send you an email.
- NOTE. When the machine is ready it defaults to ‘ON,’ so if you leave after this step the machine will be on and billing your account. You must come back and shut it down after Paperspace finishes provisioning your new machine.
- After a few minutes, you should get an email letting you know your machine has successfully been provisioned.
- Make a TextMate file and copy the temporary password to it.
- (11) Your machine is now on and ready for use! Click the gear icon to get to the system info.
- Under system info, you can access all the information you need to manage and connect to your machine.
- (12) System info. You will need the hostname and Public IP later in this tutorial. For easy access save this info into your TextMate file.
- (13) Machine actions. Here you can shut down or restart the machine.
- (14) This is a new addition and very helpful. If you have a long job, or you want to protect yourself from accidentally leaving the machine on for a month and racking up a $400 bill, you can set an auto shutdown timeout.
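To see how quickly a forgotten machine gets to that $400 figure, here is a small sketch of the arithmetic. The 56 cents per hour rate is a hypothetical number I picked for illustration, not a quoted Paperspace price, and `monthly_cost` is a made-up helper name. Check the current pricing page for your machine type.

```shell
# Print the cost of running a machine for a 30-day month.
#   $1: hours the machine is on per day
#   $2: hourly rate in cents (hypothetical, check real pricing)
monthly_cost() {
    total_cents=$(( $1 * 30 * $2 ))
    printf '$%d.%02d\n' $(( total_cents / 100 )) $(( total_cents % 100 ))
}

monthly_cost 24 56   # left on around the clock: $403.20
monthly_cost 2 56    # shut down after about two hours a day: $33.60
```

The gap between those two numbers is the whole argument for setting the auto shutdown timeout.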
You are all set! Next, let’s set up SSH so that it is easy to connect to your machine through terminal.
Part 2: Making it easy to access your Paperspace machine.
Step 1: Let's try to SSH into your machine using the standard way
- Launch terminal. (Cmd + Space, then type Terminal)
- Make sure you have started your Paperspace machine. I recommend downloading the Paperspace app. It makes starting and stopping the remote machine a lot easier.
- Type in the ssh command from the email. It connects as the paperspace user at your machine's public IP.
- You will see a prompt asking if you want to add the new fingerprint. Go ahead and type yes.
- It will then prompt you for a password. Use the one provided in the email.
- To quickly verify everything is working, type ls into the console. You should see a list of root folders on the Paperspace machine.
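The first login can be sketched roughly like this. I am assuming the command in the email has the usual `ssh user@host` shape with the paperspace user. The IP below is a placeholder from the documentation-only address range, and `connect_paperspace` is a helper name I made up, so swap in the public IP you saved earlier.

```shell
# Placeholder address; replace it with your machine's public IP.
PAPERSPACE_IP="203.0.113.10"

# Open an interactive SSH session as the paperspace user.
#   $1: public IP of the Paperspace machine
connect_paperspace() {
    ssh "paperspace@$1"
}

# Usage, once the machine has been started:
#   connect_paperspace "$PAPERSPACE_IP"
# Answer yes to the fingerprint prompt, paste the temporary password,
# then run ls on the remote machine to confirm you are in.
```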
Step 2: Simplifying Logging into Paperspace
Having to type in your password every time is going to slow down your workflow, so this step, while tedious, is worthwhile. The following is taken from the fast.ai setup instructions.
- On the local machine do the following.
- If you do not have Homebrew, install that first. https://brew.sh/
- Install ssh-copy-id. In terminal, type:
brew install ssh-copy-id
Step 3: Ensure public keys are available
- If you don’t have an .ssh directory in your home folder, create it.
- If you don’t have an id_rsa.pub file in your ~/.ssh folder, create it (run ssh-keygen and hit Enter 3 times).
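Both checks can be rolled into one small helper. This is a minimal sketch, not the fast.ai instructions verbatim: `ensure_ssh_key` is a made-up name, the `chmod 700` is my own addition because SSH is picky about permissions on that folder, and I pass `-f` to `ssh-keygen` so it skips the file location prompt and only asks for a passphrase (hit Enter twice to leave it empty).

```shell
# Make sure an SSH directory and an RSA key pair exist.
#   $1: SSH directory (defaults to ~/.ssh)
ensure_ssh_key() {
    ssh_dir="${1:-$HOME/.ssh}"

    # Create the directory if it is missing and keep it private.
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"

    # Only generate a new key pair when no public key is present,
    # so an existing key is never overwritten.
    if [ ! -f "$ssh_dir/id_rsa.pub" ]; then
        ssh-keygen -t rsa -f "$ssh_dir/id_rsa"
    fi
}

# Usage:
#   ensure_ssh_key
```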
Step 4: Copy public key to Paperspace
- Replace IP address in syntax below with your own, and run command.
ssh-copy-id -i ~/.ssh/id_rsa.pub paperspace@###.###.###.###
Step 5: Add Paperspace info to your SSH config file
- Make sure you are in the right directory, the .ssh folder in your home directory.
- If you don’t have a config file, create one. You can create it with the nano editor.
- Add these contents to your config file (replace IP address here with your Paperspace IP address).
# StrictHostKeyChecking no
LocalForward 8888 localhost:8889
- To save the file in nano, press Ctrl + O and then Enter.
- To exit nano, press Ctrl + X.
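If you would rather skip nano, the config entry can be appended from the terminal. This is a sketch under assumptions: only the StrictHostKeyChecking and LocalForward lines come from the contents shown above. The `Host paperspace` alias and the `HostName`, `User`, and `IdentityFile` lines are my guess at a typical entry for this setup, and `add_paperspace_host` is a made-up helper name.

```shell
# Append a Host entry for the Paperspace machine to an SSH config file.
#   $1: public IP of the Paperspace machine
#   $2: config file path (defaults to ~/.ssh/config)
add_paperspace_host() {
    ip=$1
    config_file="${2:-$HOME/.ssh/config}"

    # "paperspace" is an assumed alias; pick any name you like.
    # LocalForward sends local port 8888 to port 8889 on the remote machine.
    cat >> "$config_file" <<EOF
Host paperspace
    HostName $ip
    User paperspace
    IdentityFile ~/.ssh/id_rsa
    # StrictHostKeyChecking no
    LocalForward 8888 localhost:8889
EOF

    # Keep the config readable only by you.
    chmod 600 "$config_file"
}

# Usage (placeholder IP, replace with your own):
#   add_paperspace_host 203.0.113.10
```

With an entry like this in place, `ssh paperspace` would be enough to connect, since SSH looks up the address, user, and key from the alias.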
Step 6: Use your newly created SSH key-pair
- Type the ssh command for your new host entry into terminal, using the Host name from your config file.
Great success, no more passwords!
Part 3: Making it easy to transfer files to your remote machine.
Using the terminal to navigate the file structure of your remote machine with ssh can get tedious, especially when you are going to be uploading large amounts of training data.
Step 1: Download and install Cyberduck https://cyberduck.io/
- Cyberduck makes it easy to sync files between your Mac and the remote. This will come in extra handy when you start transferring all your ML data between systems.
Step 2: Set up your Paperspace remote in Cyberduck
- Launch Cyberduck and click the add bookmark button (1)
- (2) Select SFTP from the dropdown
- (3) Pick a nickname for the bookmark
- (4) Type in your Paperspace public IP
- (5) Put paperspace as the username
- (6) Point to your ssh private key folder
- Now close out of the bookmark
- Click on your paperspace_box bookmark
- (7) Fill in your user name
- (8) Fill in your paperspace password
- (9) Make sure to point to your ssh private key folder
- (10) Select add to keychain
- Click login. Cyberduck will remember all the info after this step.
And there you have it! You can now browse your Paperspace box.
Going through this the first time can be tedious, but as you will see, it is well worth the upfront investment when you start training your own ML models. What would normally take days on your MacBook now will take only hours or even minutes.
In the next part of this series, we will walk you through setting up PyCharm to use Paperspace as a remote number cruncher.