Input: image
Output: a label (e.g., cat, dog, hotdog) or probabilities across labels
We give an algorithm a dataset that includes the right answers (a training set), and it learns a function that approximates the mapping from inputs to answers.
After that, it can make predictions on new (but similar) data that it’s never seen.
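To make the train-then-predict loop concrete, here is a minimal sketch using a nearest-centroid classifier on invented 2-D points. The classifier, the `fit` and `predict` names, and the data values are all illustrative assumptions, not part of the original talk; a real image model learns a far richer function, but the workflow is the same.

```python
# Toy supervised learning: nearest-centroid classifier on made-up 2-D points.

def fit(features, labels):
    """Learn one centroid (mean point) per label from the training set."""
    sums, counts = {}, {}
    for point, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, point):
    """Return the label whose centroid is closest to the new point."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(point, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Training set: inputs paired with the right answers (values are invented).
train_x = [[1.0, 1.2], [0.8, 1.0], [4.0, 4.2], [4.2, 3.8]]
train_y = ["cat", "cat", "dog", "dog"]

model = fit(train_x, train_y)
new_label = predict(model, [3.9, 4.0])  # a point that is not in the training set
print(new_label)
```

The new point sits near the "dog" examples, so the learned function assigns it that label even though it never appeared during training.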
Convolutional Neural Networks 💥
But first, let’s go over neural nets...
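import numpy as np
def unit(inputs, weights, bias):
    # activation_function is any nonlinearity (e.g., sigmoid or ReLU), defined elsewhere
    return activation_function(np.dot(inputs, weights) + bias)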
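A runnable version of the unit above, as a sketch: it assumes a sigmoid for the activation function (the slide does not say which one is used) and swaps `np.dot` for a plain Python sum so it runs without NumPy. The input, weight, and bias values are made up for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def unit(inputs, weights, bias, activation_function=sigmoid):
    """One neuron: weighted sum of the inputs, plus a bias, passed through a nonlinearity."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation_function(weighted_sum + bias)


# Illustrative values only: three inputs, three weights, zero bias.
output = unit([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.0)
print(output)
```

A neural net stacks many of these units into layers; training adjusts the weights and biases so the final outputs match the right answers.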
aka ConvNets aka CNNs
A powerful hammer for computer vision nails
Very similar to ordinary neural nets but with an architecture better suited to handle image inputs
3 main pieces:
We take small filters and slide them over the image spatially. Different filters respond to different things in the image. Some may like edges, others may prefer yellow regions, etc.
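The sliding-filter idea can be sketched in a few lines. This is a simplified, assumption-laden version: a single-channel image stored as nested lists, no padding, stride 1, and a hand-written vertical-edge filter (in a real ConvNet the filter values are learned). The name `slide_filter` is hypothetical. Like ConvNet layers, it does not flip the kernel.

```python
def slide_filter(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    response = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Multiply the filter with the patch under it and sum the products.
            total = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        response.append(row)
    return response


# A tiny image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A filter that "likes" dark-to-bright vertical edges.
vertical_edge = [[-1, 1], [-1, 1]]

response = slide_filter(image, vertical_edge)
for row in response:
    print(row)
```

The response is large only where the filter sits on the dark-to-bright boundary and zero over the flat regions, which is what "different filters respond to different things" means in practice.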
Keras: a high-level neural network library, written in Python
Wraps an API similar to scikit-learn around the Theano or TensorFlow backend.
Modular and user friendly -- easy to construct new models and/or leverage pretrained ones
VGG16 -- a 16-layer ConvNet trained on ImageNet data, ~140M parameters, runner-up in the 2014 ImageNet (ILSVRC) classification challenge
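import keras
from keras.applications.vgg16 import decode_predictions
vgg = keras.applications.VGG16(weights="imagenet", include_top=True)
img = get_image("kitty.jpg")  # helper not shown; assumed to return a preprocessed image batch
predictions = vgg.predict(img)
for p in decode_predictions(predictions)[0]:
    print("✨ {} (prob = {:0.3f})".format(p[1], p[2]))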
✨ Egyptian_cat (prob = 0.213)
✨ kit_fox (prob = 0.213)
✨ tabby (prob = 0.138)
✨ red_fox (prob = 0.107)
✨ tiger_cat (prob = 0.098)
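Probabilities like the ones printed above come from a softmax over the network's final scores, followed by a sort to pick the top labels. The sketch below shows that step in plain Python; the three labels and their raw scores are invented, and `top_k` is a hypothetical helper standing in for what `decode_predictions` does with the real ImageNet labels.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def top_k(labels, probabilities, k):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up labels and raw scores for a single image.
labels = ["cat", "dog", "hotdog"]
scores = [2.0, 1.0, 0.1]

probabilities = softmax(scores)
for label, prob in top_k(labels, probabilities, k=2):
    print("{} (prob = {:0.3f})".format(label, prob))
```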
In general, transfer learning refers to the process of leveraging the knowledge learned in one model for the training of another model.
More specifically, we’ll load a state-of-the-art model (VGG16), sub out the last classifier layer, and use the rest of the ConvNet as a fixed feature extractor for our new dataset.
from keras.layers import Dense
from keras.models import Model
# new softmax layer with number of classes in our dataset
new_classification_layer = Dense(num_classes, activation="softmax")
# connect new layer to the second-to-last layer in VGG, and keep a reference to its output
out = new_classification_layer(vgg.layers[-2].output)
# create new network between VGG’s input layer and (new) output
model_new = Model(vgg.input, out)
# make all layers (except last) untrainable by freezing weights
for layer in model_new.layers[:-1]:
    layer.trainable = False
# ensure the last layer is trainable (not frozen)
model_new.layers[-1].trainable = True
# compile and fit model
model_new.compile(loss="categorical_crossentropy", optimizer="adadelta", metrics=["accuracy"])
model_new.fit(x_train, y_train, batch_size=128, epochs=50, validation_data=(x_val, y_val))
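One detail the fit call relies on: the `categorical_crossentropy` loss expects labels such as `y_train` and `y_val` as one-hot vectors, not integer class indices. The sketch below shows, in plain Python, what that encoding looks like and how the loss scores a prediction. The `one_hot` helper and the example probabilities are illustrative assumptions; in Keras, `keras.utils.to_categorical` performs the encoding.

```python
import math


def one_hot(class_index, num_classes):
    """Encode an integer class index as a vector with a single 1.0."""
    vector = [0.0] * num_classes
    vector[class_index] = 1.0
    return vector


def categorical_crossentropy(target, predicted, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


num_classes = 3  # assumed size of the new dataset's label set
y_example = one_hot(1, num_classes)

# Made-up softmax outputs: one confident and right, one confident and wrong.
confident_right = categorical_crossentropy(y_example, [0.05, 0.90, 0.05])
confident_wrong = categorical_crossentropy(y_example, [0.90, 0.05, 0.05])
print(y_example, confident_right, confident_wrong)
```

The loss is small when the new softmax layer puts most of its probability on the right class and large when it does not; since every other layer is frozen, training only adjusts that last layer's weights to push the loss down.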
Stanford’s Convolutional Neural Networks for Visual Recognition course notes
Practical Deep Learning For Coders (by fast.ai)
#1 suggestion (if I may) -- get your hands dirty with a toy project and fill in the gaps in your knowledge along the way.
The secret of getting ahead is getting started.