Does not fit training set well on cost function?
Does not fit dev set well on cost function?
Does not fit test set well on cost function?
Does not perform well in real world?
If there is a big gap between human error and training error, focus on reducing bias.
If there is only a small gap between human and training error, the model is doing fine on the training set; focus instead on reducing variance, i.e. the gap between training and dev set performance.
Typically human error is close to Bayes error.
Output size is given by ⌊(n - f + 2p)/s⌋ + 1
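The formula above can be checked with a small helper (a sketch; the function name is just for illustration):

```python
def conv_output_size(n, f, p=0, s=1):
    """Spatial output size of a convolution: floor((n - f + 2p) / s) + 1,
    for input size n, filter size f, padding p, stride s."""
    return (n - f + 2 * p) // s + 1

# e.g. a 28x28 input with a 5x5 filter, no padding, stride 1:
print(conv_output_size(28, 5))        # -> 24
# a 7x7 input with a 3x3 filter, padding 1, stride 2:
print(conv_output_size(7, 3, p=1, s=2))  # -> 4
```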
Common settings
Motivation
1×1 convolutional filters can be used to shrink the channel dimension of an input volume. Say the input volume is 28x28x192; convolving with 32 filters of size 1x1x192 outputs a 28x28x32 volume. Such 1×1 convolutions can decrease the number of required arithmetic operations (the "bottleneck" layer idea).

When one needs to compare two images (face verification, logo verification), a similarity function can be trained once, without a need to retrain the network when a new image enters the database.
d(img1, img2) = degree of difference between images
A Siamese network obtains encodings f(x_i). The goal is to learn parameters such that
if x_i, x_j are the same person, then ||f(x_i) - f(x_j)|| is small
if x_i, x_j are different people, then ||f(x_i) - f(x_j)|| is large
Training can be done using the triplet loss, which is defined over an anchor image A, a positive image P, and a negative image N.
Naively: d(A,P) < d(A,N)
For technical reasons (to avoid the trivial solution f(x) = 0, which satisfies the naive constraint), we introduce a margin a and require d(A,P) - d(A,N) + a ≤ 0.
Given A, P, N we can construct the loss L(A,P,N) = max(d(A,P) - d(A,N) + a, 0). The total cost is the sum of this loss over all training triplets (A, P, N). The idea is that as long as you manage to get d(A,P) - d(A,N) + a ≤ 0, the loss is 0; otherwise the loss is positive (not good).
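The triplet loss above can be sketched in NumPy for a single triplet, taking d(A,P) and d(A,N) to be squared L2 distances between encodings (the function name is illustrative):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Triplet loss for one (anchor, positive, negative) example:
    max(d(A,P) - d(A,N) + alpha, 0), with d the squared L2 distance."""
    d_ap = np.sum((f_a - f_p) ** 2)  # distance anchor <-> positive
    d_an = np.sum((f_a - f_n) ** 2)  # distance anchor <-> negative
    return max(d_ap - d_an + alpha, 0.0)

# If the negative is far and the positive is close, the loss is 0:
f_a, f_p, f_n = np.zeros(3), np.zeros(3), np.ones(3)
print(triplet_loss(f_a, f_p, f_n))  # -> 0.0
```

During training this per-triplet loss is summed over all triplets in the training set.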
High bias solutions (underfitting training set)
High variance solutions (overfitting training set, low dev set accuracy)
Regularization techniques:
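Two common variance-reduction techniques, L2 regularization and inverted dropout, can be sketched as follows (a minimal NumPy illustration, not tied to any particular framework; function names are illustrative):

```python
import numpy as np

def l2_cost(cost, weights, lam, m):
    """L2 regularization: add (lambda / (2m)) * sum of squared weights
    to the unregularized cost, for m training examples."""
    return cost + (lam / (2 * m)) * sum(np.sum(W ** 2) for W in weights)

def dropout(a, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob,
    and rescale survivors by 1/keep_prob so expected activations match."""
    mask = rng.random(a.shape) < keep_prob
    return (a * mask) / keep_prob
```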
Vanishing and exploding gradients can be attacked via appropriate weight initialization, such as He initialization.
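He initialization draws weights from a zero-mean Gaussian with variance 2/n_in, which keeps activation variance roughly stable across ReLU layers. A minimal sketch (the function name and seed handling are illustrative):

```python
import numpy as np

def he_init(n_in, n_out, seed=0):
    """He initialization: W ~ N(0, 2/n_in), shape (n_in, n_out)."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
```

For tanh activations, Xavier initialization (variance 1/n_in) is the usual alternative.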
from collections import OrderedDict
from collections import namedtuple
from itertools import product

class RunBuilder():
    @staticmethod
    def get_runs(params):
        Run = namedtuple('Run', params.keys())
        runs = [Run(*v) for v in product(*params.values())]
        return runs

params = OrderedDict(
    lr = [0.01, 0.001],
    batch_size = [100, 1000]
)

for run in RunBuilder.get_runs(params):
    print(f"{run}, {run.lr}")
    # training
from sklearn.metrics import confusion_matrix

cmt = confusion_matrix(targets, predictions)
import itertools
import numpy as np
import matplotlib.pyplot as plt

def plot_confusion_matrix(cm, classes, normalize=False,
                          title='Confusion matrix', cmap=plt.cm.Blues):
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')
    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
names = (
    'T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
    'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'
)
plt.figure(figsize=(10, 10))
plot_confusion_matrix(cmt, names)