Glut1 is the gene corresponding to the GLUT1 protein, the first glucose transporter to be characterized.
We have two types of mice that we use for experiments.
- Wild Type (WT)
- Knock Out (KO)
Wild Type mice will produce the GLUT1 protein whereas the Knock Out mice will not.
What is Genotyping?
Genotyping is a technique that lets researchers detect genetic differences between samples.
We can use genotyping to determine which mice are WT and which are KO.
For my NYCDSA Deep Learning project, I built a convolutional neural network (CNN) to classify images as WT or KO.
Step 1: Create Directory Structure
Training Set = 48
Validation Set = 15
Test Set = 5
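As a rough sketch of this step: `flow_from_directory` expects each split folder to contain one subfolder per class. The class folder names `WT` and `KO` and the helper names below are assumptions for illustration, not the project's actual layout.

```python
import tempfile
from pathlib import Path


def make_layout(root, splits=("train", "validation", "test"), classes=("WT", "KO")):
    """Create root/<split>/<class>/ folders (class names are assumed)."""
    root = Path(root)
    for split in splits:
        for cls in classes:
            (root / split / cls).mkdir(parents=True, exist_ok=True)
    return root


def count_images(root, exts=(".png", ".jpg", ".jpeg")):
    """Count image files under each split folder, across all class subfolders."""
    counts = {}
    for split_dir in Path(root).iterdir():
        if split_dir.is_dir():
            counts[split_dir.name] = sum(
                1
                for f in split_dir.rglob("*")
                if f.is_file() and f.suffix.lower() in exts
            )
    return counts


if __name__ == "__main__":
    # Build the layout in a throwaway directory and report the (empty) counts.
    with tempfile.TemporaryDirectory() as tmp:
        data_root = make_layout(Path(tmp) / "data")
        print(count_images(data_root))
```

Running `count_images` on the real `data` folder is a quick way to check that the split sizes match the numbers listed above.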
Step 2: Import Dependencies
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"]="3"
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K
Step 3: Path for Data
train_data_dir = 'data/train'
validation_data_dir = 'data/validation'
test_data_dir = 'data/test'
Step 4: Set Image Dimensions
img_width, img_height = 30, 120
# Put the channel axis first or last, depending on the backend configuration
if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)
Step 6: Image Augmentation Configuration
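# Training images are rescaled and randomly sheared, zoomed, and flipped
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
# Validation and test images are only rescaled
test_datagen = ImageDataGenerator(
    rescale=1. / 255)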
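To make the augmentation settings more concrete, here is a minimal pure-Python sketch of what rescaling and a random horizontal flip do to a tiny grayscale image stored as nested lists. It is only an illustration of the idea, not how Keras implements it, and the function names are hypothetical.

```python
import random


def rescale(image, factor=1.0 / 255):
    """Multiply every pixel by factor, mapping 0..255 into 0..1."""
    return [[px * factor for px in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right by reversing each row."""
    return [row[::-1] for row in image]


def augment(image, rng, flip_prob=0.5):
    """Rescale, then flip with probability flip_prob."""
    out = rescale(image)
    if rng.random() < flip_prob:
        out = horizontal_flip(out)
    return out


if __name__ == "__main__":
    tiny = [[0, 128, 255], [255, 128, 0]]
    print(augment(tiny, random.Random(0)))
```

Because the flip is random, the network sees slightly different versions of the same 48 training images in each epoch, which is the point of augmenting such a small data set.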
Step 7: Separate Image into Classes
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=1,
    class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=1,
    class_mode='binary')
# shuffle=False keeps test predictions in the same order as test_generator.filenames
test_generator = test_datagen.flow_from_directory(
    test_data_dir,
    target_size=(img_width, img_height),
    batch_size=1,
    class_mode='binary',
    shuffle=False)
Step 8: Create CNN Network
model = Sequential()
# Three convolution + max-pooling blocks
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Fully connected classifier with a single sigmoid output
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.summary()
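To see what `model.summary()` should report, the layer shapes and parameter count can be worked out by hand. The sketch below assumes the Keras defaults used above (3x3 kernels, stride 1, no padding, 2x2 pooling); the helper names are hypothetical.

```python
def conv_valid(size, kernel=3):
    """Output size of an unpadded, stride-1 convolution along one axis."""
    return size - kernel + 1


def pool(size, window=2):
    """Output size of non-overlapping max pooling along one axis."""
    return size // window


def trace(width, height, channels=3, filters=(32, 32, 64), dense_units=64):
    """Return (final feature-map shape, flattened size, total parameters)."""
    params = 0
    in_ch = channels
    w, h = width, height
    for out_ch in filters:
        # 3x3 weights per input/output channel pair, plus one bias per filter
        params += 3 * 3 * in_ch * out_ch + out_ch
        w, h = pool(conv_valid(w)), pool(conv_valid(h))
        in_ch = out_ch
    flat = w * h * in_ch
    params += flat * dense_units + dense_units  # Dense(64)
    params += dense_units * 1 + 1               # Dense(1)
    return (w, h, in_ch), flat, params


if __name__ == "__main__":
    print(trace(30, 120))
```

For the 30 x 120 input used here, this gives a final feature map of 2 x 13 x 64, a flattened vector of 1,664 values, and 135,265 trainable parameters, most of them in the first dense layer.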
Step 9: Compile
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
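For intuition about the loss and metric chosen here, the following is a minimal pure-Python sketch of binary cross-entropy and accuracy for a single sigmoid output. The clipping value `eps` and the 0.5 threshold are assumptions for illustration; Keras computes these internally on tensors.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


if __name__ == "__main__":
    labels = [1, 0, 1]
    probs = [0.9, 0.4, 0.2]
    print(binary_crossentropy(labels, probs), accuracy(labels, probs))
```

A confident wrong prediction (such as 0.2 for a true label of 1) is penalized much more heavily by the loss than by accuracy, which only records it as one miss.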
Step 10: Fitting/Training the Model
STEP_SIZE_TRAIN = train_generator.n // train_generator.batch_size
STEP_SIZE_VALID = validation_generator.n // validation_generator.batch_size
model.fit_generator(generator=train_generator,
                    steps_per_epoch=STEP_SIZE_TRAIN,
                    validation_data=validation_generator,
                    validation_steps=STEP_SIZE_VALID,
                    epochs=10)
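The step sizes are plain integer division of the number of images by the batch size. A small sketch of that arithmetic, with a ceiling variant added as an assumption for cases where the batch size does not divide the set evenly:

```python
import math


def steps_floor(n_samples, batch_size):
    """Number of full batches per epoch (what the // expression computes)."""
    return n_samples // batch_size


def steps_ceil(n_samples, batch_size):
    """Number of batches needed to cover every sample at least once."""
    return math.ceil(n_samples / batch_size)


if __name__ == "__main__":
    # With batch_size=1, each epoch takes one step per image.
    print(steps_floor(48, 1), steps_floor(15, 1))
    # With a hypothetical batch size of 32, floor division covers only
    # 32 of the 48 training images in a single epoch.
    print(steps_floor(48, 32), steps_ceil(48, 32))
```

With the batch size of 1 used in this project the two variants agree, so each epoch runs 48 training steps and 15 validation steps.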
Step 11: Evaluate the Model
model.evaluate_generator(generator=validation_generator, steps = 15)
Step 12: Predict the Output
test_generator.reset()
pred = model.predict_generator(test_generator, steps=5)
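Step 13: Export Results into a CSV File
import numpy as np
# The model has a single sigmoid output, so threshold the probability at 0.5
# (np.argmax over a single column would always return 0)
predicted_class_indices = (pred > 0.5).astype(int).ravel()
labels = train_generator.class_indices
labels = dict((v, k) for k, v in labels.items())
predictions = [labels[k] for k in predicted_class_indices]
import pandas as pd
# Filenames line up with predictions only if test_generator uses shuffle=False
filenames = test_generator.filenames
results = pd.DataFrame({"Filename": filenames,
                        "Predictions": predictions})
results.to_csv("results.csv", index=False)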
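Since the network ends in one sigmoid unit, each prediction is a single probability for the class with index 1. Here is a minimal sketch of turning such probabilities into class names, using only the standard library. The mapping `{"KO": 0, "WT": 1}` is an assumed example of what `class_indices` might look like, and the 0.5 threshold is likewise an assumption.

```python
def probabilities_to_labels(pred_rows, class_indices, threshold=0.5):
    """Map rows like [[0.1], [0.9]] to class names via a probability threshold."""
    index_to_label = {index: name for name, index in class_indices.items()}
    labels = []
    for row in pred_rows:
        probability = row[0]  # one sigmoid output per image
        labels.append(index_to_label[int(probability > threshold)])
    return labels


if __name__ == "__main__":
    example_pred = [[0.08], [0.93], [0.51]]
    example_classes = {"KO": 0, "WT": 1}  # hypothetical class_indices
    print(probabilities_to_labels(example_pred, example_classes))
```

Keeping the raw probabilities alongside the labels in the CSV can also help later, since values near 0.5 flag images the model was unsure about.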
Results!
Training Set:
Validation:
Test:
Future Directions
- To inspect a few examples of correctly/incorrectly classified images
- Continue to add more images to data set
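One possible starting point for the first item: the filenames returned by a directory generator include the class subfolder, so the true label can be read from the path and compared with the prediction. This is only a sketch; the folder names and helper names are hypothetical.

```python
from pathlib import PurePosixPath


def true_label(filename):
    """Read the class name from a path such as 'KO/img1.png'."""
    return PurePosixPath(filename.replace("\\", "/")).parent.name


def split_by_correctness(filenames, predictions):
    """Return (correct, incorrect) lists of (filename, predicted_label) pairs."""
    correct, incorrect = [], []
    for name, predicted in zip(filenames, predictions):
        bucket = correct if true_label(name) == predicted else incorrect
        bucket.append((name, predicted))
    return correct, incorrect


if __name__ == "__main__":
    names = ["KO/a.png", "WT/b.png", "WT/c.png"]  # made-up example paths
    preds = ["KO", "KO", "WT"]
    right, wrong = split_by_correctness(names, preds)
    print("correct:", right)
    print("incorrect:", wrong)
```

The `incorrect` list gives the files to open and inspect by eye.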
End!!
I would like to thank Dr. Cho, Jon Krohn, Alex Choy and Hyun Min Park for their help on the project.
GitHub
FloydHub
https://www.floydhub.com/hankkim7012/projects/glut1ko/workspaces
Reference
https://github.com/the-deep-learners/nyc-ds-academy/blob/master/notebooks/deep_net_in_keras.ipynb
https://github.com/keras-team/keras/blob/master/keras/preprocessing/image.py