How to speed up neural network training: PART II
Using Batch Normalization, covered in depth in the previous post, the actual savings in training time and the gain in performance are evident.
Below is the code I used to implement a 20-layer convolutional network, with and without Batch Normalization, together with the respective results.
As a benchmark I used the popular MNIST, a database of handwritten digits: the network learns to recognize human handwriting and to correctly classify the digits it is shown.
The application itself is very simple, and a convolutional network, let alone one with 20 layers, is overkill: its depth brings no real advantage (a simple feedforward network would do the job), other than providing a robust test of how effective Batch Normalization is at reducing training time and at producing better results with very deep architectures.
The complete architecture is available in my GitHub repository at this link.
I implemented the network in TensorFlow, so first of all I import the framework. TensorFlow can download the MNIST database directly, so:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True, reshape=False)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
The MNIST handwritten digits look like this:

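If you want to look at them yourself, a minimal sketch along these lines (not part of the original code, and assuming matplotlib is installed) displays one of the training digits:
import matplotlib.pyplot as plt
# take the first training image (shape 28x28x1, values in [0, 1]) and its one-hot label
image = mnist.train.images[0].reshape(28, 28)
digit = mnist.train.labels[0].argmax()
plt.imshow(image, cmap='gray')
plt.title('Label: {}'.format(digit))
plt.show()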
Experiment Without Batch Normalization
I first implemented the neural network without Batch Normalization, so that I could compare the results afterwards.
The function below, fully_connected, creates a layer of "neurons" using the TensorFlow API, connecting the N units coming from the previous layer to num_units units in the new layer. Each output goes through a ReLU activation, which transforms the value as max(0, x): it picks the maximum between zero and the output value, so it returns zero for a negative output and the output itself otherwise (for example, ReLU(-2) = 0 and ReLU(3) = 3).
def fully_connected(prev_layer, num_units):
    layer = tf.layers.dense(prev_layer, num_units, activation=tf.nn.relu)
    return layer
The conv_layer function generates the convolutional layer, i.e. it creates a filter whose depth, in this case, grows proportionally with the position of the layer in the network (the layer_depth parameter). The purpose of the function is simply to stack n convolutional filters in order to test the performance of BN, so the exact choice of the convolutional parameters is of little importance.
def conv_layer(prev_layer, layer_depth):
    strides = 2 if layer_depth % 3 == 0 else 1
    layer = tf.layers.conv2d(prev_layer, layer_depth*4, 3, strides=strides, padding="same", activation=tf.nn.relu)
    return layer
The training of the network is implemented by the train function, which takes the learning rate, the batch size and the number of batches as parameters.
The function creates two placeholders, i.e. two "slots" for inputs and labels (very convenient to use once you understand the structure of the graph).
In essence, inputs is a placeholder that takes on different values in the training, validation and testing phases: the current batch during training, the validation batch while validating the network, and the test inputs during testing.
Twenty convolutional layers are then generated, whose layer_depth is determined by their position in the network (layer_i);
The final output is then reshaped (with tf.reshape) by flattening dimensions 1, 2 and 3 (height, width and channels, where the channel dimension is 1 because the images are grayscale) into a single dimension, leaving dimension 0 (the batch) untouched, so that it can be fed to the feed-forward layer (fully_connected).
The output of the network is the logits matrix. logits is not built with the fully_connected function, in order to avoid applying the ReLU activation.
The error of the network is then computed with cross-entropy, and the gradient step with the Adam optimizer.
Every 25 batches the training loss and accuracy are printed, every 100 batches the loss and accuracy on the validation images, and finally the accuracy is measured on 100 test images:
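One detail worth noting: the loss in the code is tf.nn.sigmoid_cross_entropy_with_logits, which treats each of the 10 output units as an independent binary problem. With one-hot labels $y$ and logits $z$, the value averaged by tf.reduce_mean is the standard per-element binary cross-entropy (a reference formula, not taken from the original code):

$$\ell = \frac{1}{10N}\sum_{i=1}^{N}\sum_{k=1}^{10}\Big[-y_{ik}\log\sigma(z_{ik}) - (1-y_{ik})\log\big(1-\sigma(z_{ik})\big)\Big], \qquad \sigma(z)=\frac{1}{1+e^{-z}}$$

Since the logits are close to zero at initialization, each term starts out around $\ln 2 \approx 0.693$, which is exactly where the batch-0 validation loss sits in the runs below.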
def train(num_batches, batch_size, learning_rate):
    inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
    labels = tf.placeholder(tf.float32, [None, 10])
    # 20 convolutional filters
    layer = inputs
    for layer_i in range(1, 20):
        layer = conv_layer(layer, layer_i)
    layer_shape = layer.get_shape().as_list()
    layer = tf.reshape(layer, shape=[-1, layer_shape[1]*layer_shape[2]*layer_shape[3]])
    # fully connected with relu activation
    fc_layer = fully_connected(layer, 100)
    # logits - fully connected without relu activation
    logits = tf.layers.dense(fc_layer, 10)
    model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
    opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch_i in range(num_batches):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # train this batch
            sess.run(opt, {inputs: batch_xs, labels: batch_ys})
            if batch_i % 100 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
                                                              labels: mnist.validation.labels})
                print('Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
            elif batch_i % 25 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys})
                print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))
        # At the end, score the final accuracy for both the validation and test sets
        acc = sess.run(accuracy, {inputs: mnist.validation.images,
                                  labels: mnist.validation.labels})
        print('Final validation accuracy: {:>3.5f}'.format(acc))
        acc = sess.run(accuracy, {inputs: mnist.test.images,
                                  labels: mnist.test.labels})
        print('Final test accuracy: {:>3.5f}'.format(acc))
        # Score accuracy on 100 individual test images
        correct = 0
        for i in range(100):
            correct += sess.run(accuracy, feed_dict={inputs: [mnist.test.images[i]],
                                                     labels: [mnist.test.labels[i]]})
        print("Accuracy on 100 samples:", correct/100)

num_batches = 800
batch_size = 64
learning_rate = 0.002

tf.reset_default_graph()
with tf.Graph().as_default():
    train(num_batches, batch_size, learning_rate)

Batch: 0: Validation loss: 0.69067, Validation accuracy: 0.09900
Batch: 25: Training loss: 0.36572, Training accuracy: 0.04688
Batch: 50: Training loss: 0.32663, Training accuracy: 0.14062
Batch: 75: Training loss: 0.32599, Training accuracy: 0.07812
Batch: 100: Validation loss: 0.32638, Validation accuracy: 0.11260
Batch: 125: Training loss: 0.32680, Training accuracy: 0.07812
Batch: 150: Training loss: 0.32655, Training accuracy: 0.06250
Batch: 175: Training loss: 0.32512, Training accuracy: 0.12500
Batch: 200: Validation loss: 0.32640, Validation accuracy: 0.09900
Batch: 225: Training loss: 0.32114, Training accuracy: 0.20312
Batch: 250: Training loss: 0.32470, Training accuracy: 0.12500
Batch: 275: Training loss: 0.32804, Training accuracy: 0.06250
Batch: 300: Validation loss: 0.32597, Validation accuracy: 0.09900
Batch: 325: Training loss: 0.32647, Training accuracy: 0.06250
Batch: 350: Training loss: 0.32461, Training accuracy: 0.15625
Batch: 375: Training loss: 0.32900, Training accuracy: 0.06250
Batch: 400: Validation loss: 0.32559, Validation accuracy: 0.09760
Batch: 425: Training loss: 0.32604, Training accuracy: 0.03125
Batch: 450: Training loss: 0.32605, Training accuracy: 0.07812
Batch: 475: Training loss: 0.32633, Training accuracy: 0.07812
Batch: 500: Validation loss: 0.32533, Validation accuracy: 0.11260
Batch: 525: Training loss: 0.32370, Training accuracy: 0.10938
Batch: 550: Training loss: 0.32568, Training accuracy: 0.14062
Batch: 575: Training loss: 0.32590, Training accuracy: 0.10938
Batch: 600: Validation loss: 0.32578, Validation accuracy: 0.09860
Batch: 625: Training loss: 0.32537, Training accuracy: 0.09375
Batch: 650: Training loss: 0.32583, Training accuracy: 0.10938
Batch: 675: Training loss: 0.32538, Training accuracy: 0.12500
Batch: 700: Validation loss: 0.32562, Validation accuracy: 0.10700
Batch: 725: Training loss: 0.32717, Training accuracy: 0.14062
Batch: 750: Training loss: 0.32713, Training accuracy: 0.07812
Batch: 775: Training loss: 0.32434, Training accuracy: 0.09375
Final validation accuracy: 0.11260
Final test accuracy: 0.11350
Accuracy on 100 samples: 0.14
Without Batch Normalization, the accuracy of the network is barely 14%!
Implementing BATCH NORMALIZATION:
High Level
High-level implementation with the TensorFlow API tf.layers.batch_normalization;
the fully_connected function generates a fully connected layer of neurons whose output values are normalized with BN.
A few differences with respect to the fully_connected function without batch_normalization:
- The fc layer is created without the bias unit, because the bias is already accounted for by the equation gamma*x + beta (see the formula right after this list)
- The activation function (ReLU, in this case) is applied after BN (activation is set to None)
- The is_training parameter is added, to tell the network whether it is in the training phase or in the inference phase.
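As a reminder of what tf.layers.batch_normalization computes (the standard Batch Normalization transform, summarized here for convenience), for every unit

$$\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y = \gamma\,\hat{x} + \beta$$

where $\mu_B$ and $\sigma_B^2$ are the mean and variance of the current batch. Any constant bias added before this step would be subtracted away together with $\mu_B$, so the learnable $\beta$ already plays the role of the bias: this is why use_bias is set to False.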
def fully_connected(prev_layer, num_units, is_training):
    layer = tf.layers.dense(prev_layer, num_units, use_bias=False, activation=None)
    layer = tf.layers.batch_normalization(layer, epsilon=0.001, beta_initializer=tf.zeros_initializer(),
                                          gamma_initializer=tf.ones_initializer(), training=is_training)
    out = tf.nn.relu(layer)
    return out
In the conv_layer function, with batch_normalization:
- I added the is_training parameter
- I set use_bias to False and activation to None, for the same reasons listed above
def conv_layer(prev_layer, layer_depth, is_training):
    strides = 2 if layer_depth % 3 == 0 else 1
    conv_layer = tf.layers.conv2d(prev_layer, layer_depth*4, 3, strides=strides, padding='same',
                                  activation=None, use_bias=False)
    conv_layer = tf.layers.batch_normalization(conv_layer, training=is_training)
    conv_layer = tf.nn.relu(conv_layer)
    return conv_layer
Training With Batch Normalization
The differences with respect to the train function without BN:
- I created the is_training placeholder as a boolean, so it can be set to True during training and to False during inference
- The Adam optimizer is called inside tf.control_dependencies so that the population statistics get updated (NB: during testing the outputs are normalized using the mean and variance computed during training; the update rule is sketched just below).
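To make that last point concrete, the update ops collected in tf.GraphKeys.UPDATE_OPS maintain exponential moving averages of the batch statistics; with decay $d$ (the layer's momentum parameter), each training step performs roughly

$$\mu_{pop} \leftarrow d\,\mu_{pop} + (1-d)\,\mu_B, \qquad \sigma^2_{pop} \leftarrow d\,\sigma^2_{pop} + (1-d)\,\sigma^2_B$$

and at inference time (is_training set to False) the layer normalizes with $\mu_{pop}$ and $\sigma^2_{pop}$ instead of the batch statistics. The same update appears written out explicitly, with decay = 0.99, in the low-level implementation further down.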
def train(num_batches, batch_size, learning_rate):
    inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
    labels = tf.placeholder(tf.float32, [None, 10])
    is_training = tf.placeholder(tf.bool)
    # 20 convolutional filters
    layer = inputs
    for layer_i in range(1, 20):
        layer = conv_layer(layer, layer_i, is_training)
    layer_shape = layer.get_shape().as_list()
    layer = tf.reshape(layer, shape=[-1, layer_shape[1]*layer_shape[2]*layer_shape[3]])
    # fully connected with batch normalization
    fc_layer = fully_connected(layer, 100, is_training)
    # logits - fully connected without relu activation
    logits = tf.layers.dense(fc_layer, 10)
    model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
    # run the optimizer only after the batch-norm statistics have been updated
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch_i in range(num_batches):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # train this batch
            sess.run(opt, {inputs: batch_xs, labels: batch_ys, is_training: True})
            if batch_i % 100 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
                                                              labels: mnist.validation.labels,
                                                              is_training: False})
                print('Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
            elif batch_i % 25 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys, is_training: False})
                print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))
        # At the end, score the final accuracy for both the validation and test sets
        acc = sess.run(accuracy, {inputs: mnist.validation.images,
                                  labels: mnist.validation.labels, is_training: False})
        print('Final validation accuracy: {:>3.5f}'.format(acc))
        acc = sess.run(accuracy, {inputs: mnist.test.images,
                                  labels: mnist.test.labels, is_training: False})
        print('Final test accuracy: {:>3.5f}'.format(acc))
        # Score accuracy on 100 individual test images
        correct = 0
        for i in range(100):
            correct += sess.run(accuracy, feed_dict={inputs: [mnist.test.images[i]],
                                                     labels: [mnist.test.labels[i]],
                                                     is_training: False})
        print("Accuracy on 100 samples:", correct/100)

num_batches = 800
batch_size = 64
learning_rate = 0.002

tf.reset_default_graph()
with tf.Graph().as_default():
    train(num_batches, batch_size, learning_rate)

Batch: 0: Validation loss: 0.69178, Validation accuracy: 0.09860
Batch: 25: Training loss: 0.60032, Training accuracy: 0.07812
Batch: 50: Training loss: 0.49489, Training accuracy: 0.12500
Batch: 75: Training loss: 0.41710, Training accuracy: 0.12500
Batch: 100: Validation loss: 0.37155, Validation accuracy: 0.11260
Batch: 125: Training loss: 0.34586, Training accuracy: 0.12500
Batch: 150: Training loss: 0.33138, Training accuracy: 0.15625
Batch: 175: Training loss: 0.35367, Training accuracy: 0.12500
Batch: 200: Validation loss: 0.35802, Validation accuracy: 0.14740
Batch: 225: Training loss: 0.25998, Training accuracy: 0.46875
Batch: 250: Training loss: 0.23626, Training accuracy: 0.51562
Batch: 275: Training loss: 0.18696, Training accuracy: 0.60938
Batch: 300: Validation loss: 0.15098, Validation accuracy: 0.75120
Batch: 325: Training loss: 0.07846, Training accuracy: 0.87500
Batch: 350: Training loss: 0.09178, Training accuracy: 0.82812
Batch: 375: Training loss: 0.09913, Training accuracy: 0.85938
Batch: 400: Validation loss: 0.06519, Validation accuracy: 0.90240
Batch: 425: Training loss: 0.06764, Training accuracy: 0.87500
Batch: 450: Training loss: 0.10690, Training accuracy: 0.90625
Batch: 475: Training loss: 0.01458, Training accuracy: 0.98438
Batch: 500: Validation loss: 0.03870, Validation accuracy: 0.94860
Batch: 525: Training loss: 0.04785, Training accuracy: 0.92188
Batch: 550: Training loss: 0.06010, Training accuracy: 0.92188
Batch: 575: Training loss: 0.04038, Training accuracy: 0.95312
Batch: 600: Validation loss: 0.08740, Validation accuracy: 0.89280
Batch: 625: Training loss: 0.05582, Training accuracy: 0.92188
Batch: 650: Training loss: 0.02143, Training accuracy: 0.98438
Batch: 675: Training loss: 0.01115, Training accuracy: 0.98438
Batch: 700: Validation loss: 0.06566, Validation accuracy: 0.91520
Batch: 725: Training loss: 0.08521, Training accuracy: 0.89062
Batch: 750: Training loss: 0.01286, Training accuracy: 0.98438
Batch: 775: Training loss: 0.03955, Training accuracy: 0.93750
Final validation accuracy: 0.95140
Final test accuracy: 0.95090
Accuracy on 100 samples: 0.99
The very same convolutional network, trained on the same number of batches, reaches 99% accuracy with batch_normalization (against 14% for the network without BN)!
Low Level
Below is the low-level implementation in TensorFlow (the results, of course, are practically the same):
def fully_connected(prev_layer, num_units, is_training):
    # fc layer with no bias and no activation, in order to apply batch normalization afterwards
    fc_layer = tf.layers.dense(prev_layer, num_units, use_bias=False, activation=None)
    # beta and gamma are the parameters of the equation y = gamma*x + beta
    beta = tf.Variable(tf.zeros([num_units]))
    gamma = tf.Variable(tf.ones([num_units]))
    # mean and variance of the population
    pop_mean = tf.Variable(tf.zeros([num_units]), trainable=False)
    pop_variance = tf.Variable(tf.ones([num_units]), trainable=False)
    epsilon = 1e-3
    def batch_norm_training():
        decay = 0.99
        # extracting mean and variance from the current batch using tf.nn.moments
        batch_mean, batch_variance = tf.nn.moments(fc_layer, [0])
        # updating the population mean and variance for the inference phase
        train_mean = tf.assign(pop_mean, pop_mean*decay + batch_mean*(1 - decay))
        train_variance = tf.assign(pop_variance, pop_variance*decay + batch_variance*(1 - decay))
        with tf.control_dependencies([train_mean, train_variance]):
            return tf.nn.batch_normalization(fc_layer, mean=batch_mean, variance=batch_variance,
                                             offset=beta, scale=gamma, variance_epsilon=epsilon)
    def batch_norm_inference():
        return tf.nn.batch_normalization(fc_layer, mean=pop_mean, variance=pop_variance,
                                         offset=beta, scale=gamma, variance_epsilon=epsilon)
    batch_norm_out = tf.cond(is_training, batch_norm_training, batch_norm_inference)
    # returning the normalized value after the relu activation
    return tf.nn.relu(batch_norm_out)
Conv_layer
def conv_layer(prev_layer, layer_depth, is_training):
    # as before
    strides = 2 if layer_depth % 3 == 0 else 1
    # defining the filter/feature map
    in_ch = prev_layer.get_shape().as_list()[3]
    out_ch = layer_depth*4
    weights = tf.Variable(tf.truncated_normal([3, 3, in_ch, out_ch], stddev=0.05))
    # conv layer
    conv_layer = tf.nn.conv2d(prev_layer, weights, strides=[1, strides, strides, 1], padding='SAME')
    # as before
    gamma = tf.Variable(tf.ones([out_ch]))
    beta = tf.Variable(tf.zeros([out_ch]))
    pop_mean = tf.Variable(tf.zeros([out_ch]), trainable=False)
    pop_variance = tf.Variable(tf.ones([out_ch]), trainable=False)
    epsilon = 1e-3
    def batch_norm_training():
        # as before
        decay = 0.99
        # extracting mean and variance from the current batch using tf.nn.moments
        # (over the batch, height and width dimensions)
        batch_mean, batch_variance = tf.nn.moments(conv_layer, [0, 1, 2], keep_dims=False)
        # updating the population mean and variance for the inference phase
        train_mean = tf.assign(pop_mean, pop_mean*decay + batch_mean*(1 - decay))
        train_variance = tf.assign(pop_variance, pop_variance*decay + batch_variance*(1 - decay))
        with tf.control_dependencies([train_mean, train_variance]):
            return tf.nn.batch_normalization(conv_layer, mean=batch_mean, variance=batch_variance,
                                             offset=beta, scale=gamma, variance_epsilon=epsilon)
    def batch_norm_inference():
        return tf.nn.batch_normalization(conv_layer, mean=pop_mean, variance=pop_variance,
                                         offset=beta, scale=gamma, variance_epsilon=epsilon)
    batch_norm_out = tf.cond(is_training, batch_norm_training, batch_norm_inference)
    # returning the normalized value after the relu activation
    return tf.nn.relu(batch_norm_out)
Training..
def train(num_batches, batch_size, learning_rate):
    inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
    labels = tf.placeholder(tf.float32, [None, 10])
    is_training = tf.placeholder(tf.bool)
    # 20 convolutional filters
    layer = inputs
    for layer_i in range(1, 20):
        layer = conv_layer(layer, layer_i, is_training)
    layer_shape = layer.get_shape().as_list()
    layer = tf.reshape(layer, shape=[-1, layer_shape[1]*layer_shape[2]*layer_shape[3]])
    # fully connected with batch normalization
    fc_layer = fully_connected(layer, 100, is_training)
    # logits - fully connected without relu activation
    logits = tf.layers.dense(fc_layer, 10)
    model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
    opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch_i in range(num_batches):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # train this batch
            sess.run(opt, {inputs: batch_xs, labels: batch_ys, is_training: True})
            if batch_i % 100 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
                                                              labels: mnist.validation.labels,
                                                              is_training: False})
                print('Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
            elif batch_i % 25 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys, is_training: False})
                print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))
        # score the final accuracy for the validation and test sets
        acc = sess.run(accuracy, {inputs: mnist.validation.images,
                                  labels: mnist.validation.labels, is_training: False})
        print('Final validation accuracy: {:>3.5f}'.format(acc))
        acc = sess.run(accuracy, {inputs: mnist.test.images,
                                  labels: mnist.test.labels, is_training: False})
        print('Final test accuracy: {:>3.5f}'.format(acc))
        # score accuracy on 100 individual test images
        correct = 0
        for i in range(100):
            correct += sess.run(accuracy, feed_dict={inputs: [mnist.test.images[i]],
                                                     labels: [mnist.test.labels[i]],
                                                     is_training: False})
        print("Accuracy on 100 samples:", correct/100)

num_batches = 800
batch_size = 64
learning_rate = 0.002

tf.reset_default_graph()
with tf.Graph().as_default():
    train(num_batches, batch_size, learning_rate)

Batch: 0: Validation loss: 0.69078, Validation accuracy: 0.10700
Batch: 25: Training loss: 0.58244, Training accuracy: 0.10938
Batch: 50: Training loss: 0.47270, Training accuracy: 0.04688
Batch: 75: Training loss: 0.39788, Training accuracy: 0.07812
Batch: 100: Validation loss: 0.36073, Validation accuracy: 0.10700
Batch: 125: Training loss: 0.35411, Training accuracy: 0.03125
Batch: 150: Training loss: 0.32505, Training accuracy: 0.14062
Batch: 175: Training loss: 0.35393, Training accuracy: 0.10938
Batch: 200: Validation loss: 0.40405, Validation accuracy: 0.11260
Batch: 225: Training loss: 0.47686, Training accuracy: 0.20312
Batch: 250: Training loss: 0.68870, Training accuracy: 0.10938
Batch: 275: Training loss: 0.41925, Training accuracy: 0.15625
Batch: 300: Validation loss: 0.82895, Validation accuracy: 0.11440
Batch: 325: Training loss: 0.76750, Training accuracy: 0.07812
Batch: 350: Training loss: 0.46322, Training accuracy: 0.28125
Batch: 375: Training loss: 0.39270, Training accuracy: 0.37500
Batch: 400: Validation loss: 0.39302, Validation accuracy: 0.36840
Batch: 425: Training loss: 0.41512, Training accuracy: 0.40625
Batch: 450: Training loss: 0.36016, Training accuracy: 0.54688
Batch: 475: Training loss: 0.58151, Training accuracy: 0.46875
Batch: 500: Validation loss: 0.18255, Validation accuracy: 0.72000
Batch: 525: Training loss: 0.25641, Training accuracy: 0.67188
Batch: 550: Training loss: 0.13348, Training accuracy: 0.79688
Batch: 575: Training loss: 0.26048, Training accuracy: 0.67188
Batch: 600: Validation loss: 0.05418, Validation accuracy: 0.91400
Batch: 625: Training loss: 0.03811, Training accuracy: 0.95312
Batch: 650: Training loss: 0.03032, Training accuracy: 0.95312
Batch: 675: Training loss: 0.02503, Training accuracy: 0.96875
Batch: 700: Validation loss: 0.02738, Validation accuracy: 0.96080
Batch: 725: Training loss: 0.02848, Training accuracy: 0.96875
Batch: 750: Training loss: 0.01631, Training accuracy: 0.93750
Batch: 775: Training loss: 0.03970, Training accuracy: 0.93750
Final validation accuracy: 0.94620
Final test accuracy: 0.95230
Accuracy on 100 samples: 0.98