So at this point in FIRE, you should know you can’t just feed sentences into a neural network.

You’re missing the keyword:

TENSORS!!!

That sounds scary but it’s really just a fancy word for arrays; trust me.

I will be demonstrating how to train an RNN to write like Shakespeare.

First, we will deal with the data.
Then we will define our model architecture and, finally, do the training.

For simplicity, the network will predict at the character level. So given a sequence of characters, predict the next characte. (See what I did there?)
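To make "characters into tensors" concrete, here's a minimal sketch of one-hot encoding a string. The function name and the idea of passing in an explicit alphabet are mine, just for illustration; they're not from the actual training code:

```python
def one_hot_encode(text, alphabet):
    # map each character to its index in the alphabet
    index = {ch: i for i, ch in enumerate(alphabet)}
    vectors = []
    for ch in text:
        vec = [0] * len(alphabet)  # one slot per known character
        vec[index[ch]] = 1         # flip on the slot for this character
        vectors.append(vec)
    return vectors  # a 2-D "tensor": sequence length x alphabet size
```

Feed the network a sequence of these vectors and train it to output the vector for the next character.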

Keep in mind while the…


So in order to work with the matterport Mask-RCNN repo on Kaggle, we need to set up some things, and then we’ll use git to load the repo onto a Kaggle kernel.

Creating a Kaggle Python Notebook

Once you get on the competition page, select the “Kernels” tab, then the blue “New Kernel” button. On the pop-up, choose the Notebook option.

Select “New Kernel” on this screen.

Connecting to the Internet (and Other Settings)


This will be short (although that’s what I said to myself last time).

Let’s say we have a generator in Python that looks something like this:

from random import shuffle

def generate_data(data):
    i = 0
    shuffle(data)
    while True:
        # if we reached the end of the data, restart and reshuffle
        if i >= len(data):
            shuffle(data)
            i = 0
        yield data[i].image, data[i].mask
        # move forward to the next image
        i += 1

To make our lives easier, we’ll assume we have a method that applies a particular augmenter (from the imgaug library) to some data. It has the following signature:

def augment_data(data, augmenter):
    # method implementation

We care…


What is the first thing you should do?

SSH into the server!

The most basic command looks something like:

ssh user@[ip-address]

I will first give a tutorial on the configuration setup. You can skip this if you just need to restart JupyterLab.

Configuration

So first, I highly recommend using an environment:

pip3 install --user pipenv

This will install pipenv for the user. You can read more here:

Now, if you just launch JupyterLab directly in this SSH shell, you will find that once you close the SSH connection, you terminate JupyterLab!

So what do we do?

We use tmux…
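As a sketch of that tmux workflow (the session name `jupyter` and launching through pipenv are my assumptions, not requirements):

```shell
# start a named tmux session
tmux new -s jupyter

# inside the session, launch JupyterLab (here via pipenv)
pipenv run jupyter lab --no-browser --port 8888

# detach with Ctrl-b then d; JupyterLab keeps running.
# later, reattach to the session with:
tmux attach -t jupyter
```

Because the process lives inside the tmux session rather than your SSH shell, closing the SSH connection no longer kills it.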


So far, you might’ve been okay with just writing and sharing code via Google Drive, email, or being physically next to each other, but we can use Git (and GitHub) to make it all much easier.

If you just want the commands, skip down to “The Commands.”

The Why

Here are common reasons to switch to a version control system (Git is the most popular one):

  • share and collaborate on code with multiple people
  • make a backup of your code (e.g. “I tried something that broke my code, I need to roll back to a version before I broke it.”)
  • release your…

For more details, check Google’s documentation.

Your dataset is supposed to be stored on a disk attached to your instance. You’ll have to mount the disk to access the data. So here’s how.

After you’ve logged into the instance, in the terminal, use the lsblk command to list all attached disks and find the disk that you want to mount.

$ sudo lsblk
In this example, sdc is the device ID.

Then create a directory [DATA_DIR] as the mount point for the disk. This would be the directory where you can access the data.

$ sudo mkdir -p [DATA_DIR]

Use mount to mount the disk to the…
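That mount step typically looks something like this (assuming the `sdc` device found with `lsblk` above and a filesystem already on the disk; check Google’s documentation for the exact options for your setup):

```shell
# mount the data disk at the mount point created above
sudo mount -o discard,defaults /dev/sdc [DATA_DIR]

# verify it is mounted and see its free space
df -h [DATA_DIR]
```

After this, the data on the disk is readable under [DATA_DIR] like any other directory.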


Quick Background on Mask R-CNN

Mask R-CNN is a machine learning model that generates a bounding box and a segmentation mask for each instance of an object. It extends Faster R-CNN with an added mask branch. This is a great foundation for instance segmentation projects and has produced strong results compared to other models.

I will be using the nucleus data in ASN 6 as an example.

All the training code for this example can be found here

Remember to use git clone to copy the repository!

Training

First step, load the data.

For your custom dataset, you should create a class with three methods that allow you to…
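As a sketch (following the matterport repo’s nucleus sample; the class name and method bodies here are placeholders, not the actual assignment code), the three methods are typically a dataset loader plus `load_mask` and `image_reference`:

```python
# In the real code this would subclass mrcnn.utils.Dataset
class NucleusDataset:
    def load_nucleus(self, dataset_dir):
        # register the class and each image via self.add_class / self.add_image
        ...

    def load_mask(self, image_id):
        # return (masks, class_ids): a [height, width, instance_count]
        # boolean array plus an array of class IDs, one per instance
        ...

    def image_reference(self, image_id):
        # return a string identifying the image's source (for debugging)
        ...
```

The training script then builds one of these for the training split and one for validation, calls `prepare()`, and hands both to the model.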


So according to the Kaggle competition posted here: https://www.kaggle.com/c/umd-fire171-asn6-image-segmentation-challenge-2019

“The pixels are one-indexed and numbered from top to bottom, then left to right: 1 is pixel (1,1), 2 is pixel (2,1), etc.”

So we need some way to decode this run-length encoding into a tensor.

Whoa, a tensor? Sounds scary, right?

Nah, it is basically just an array of arrays, and you can do array operations on it. In this case, we just need a binary array where a 1 marks the pixels that belong to the instance.
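Here’s a minimal sketch of that decoding in plain Python (the function name and the list-of-lists representation are mine; in practice you’d build a NumPy array). The RLE string is pairs of (start, run length), using the one-indexed, top-to-bottom-then-left-to-right pixel numbering from the quote above:

```python
def rle_decode(rle, height, width):
    # rle is a space-separated string of (start, length) pairs
    mask = [[0] * width for _ in range(height)]
    nums = list(map(int, rle.split()))
    for start, length in zip(nums[0::2], nums[1::2]):
        for p in range(start - 1, start - 1 + length):
            # pixels are numbered top to bottom, then left to right
            row, col = p % height, p // height
            mask[row][col] = 1
    return mask
```

For example, the run "1 3" in a 2x2 image marks pixels (1,1), (2,1), and (1,2).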

This conversion is important because we get better spatial information from it. (I think most…


Shout-out to Tim! So this is where we left off last week from Tim’s blog:

https://medium.com/@tmthylin/writing-an-image-data-preprocessor-using-davis-2019-9ebc45702ca3

First, take a look at this:

Montage of images from the DAVIS dataset (credit: DAVIS Challenge)

When starting in any new field, the hardest thing is always to just jump in and start playing around. In deep learning, the first step (and usually the linchpin) is to look at the data, so we’ll want an organized way to load the image data and work with it in our Python code.

The goal here is to get you from having a dataset to implementing a basic (but extensible) image processing pipeline that we can feed straight into Keras. Our working example will be the DAVIS 2019 Challenge dataset, but this will apply to other image-based datasets…
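As a first step toward that pipeline, here’s a sketch of listing one sequence’s frames; the `list_frames` name is mine, and I’m assuming the standard DAVIS folder layout:

```python
from pathlib import Path

def list_frames(davis_root, sequence):
    # DAVIS stores frames as JPEGImages/480p/<sequence>/00000.jpg, 00001.jpg, ...
    frame_dir = Path(davis_root) / "JPEGImages" / "480p" / sequence
    # sorting the zero-padded names puts frames in temporal order
    return sorted(frame_dir.glob("*.jpg"))
```

From here, each path can be read with an image library and stacked into arrays to feed Keras.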

FIRE Capital One Machine Learning is an Innovation and Research stream that provides undergrad students with authentic research experience in Machine Learning.
