Demo

  Last updated: May 2nd, 2020

Quick Note

Though they are not strictly connected, I recommend going through the 4 sections in order to learn not only how to train a model, but also how to build your own custom training procedure.

Building Your Model

source

1. Import

Importing batch_norm also recursively imports all of its dependencies, reducing the need for many import statements.

from batch_norm import *

2. Layers

Building a model is incredibly simple. The comments in the snippet show each layer's output dimensions for clarity. Now that we have a model, it will be used in the final two sections of this demo.

model = Sequential(Reshape((1, 28, 28)),
                   Conv(c_in=1, c_out=4, k_s=5, stride=2, pad=1), # 4, 13, 13
                   AvgPool(k_s=2, pad=0), # 4, 12, 12
                   BatchNorm(4),
                   Conv(c_in=4, c_out=16, stride=2, leak=1.), # 16, 5, 5
                   BatchNorm(16),
                   Flatten(),
                   Linear(400, 64), # 16 * 5 * 5 -> 400
                   ReLU(),
                   Linear(64, 10, True))
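The output dimensions in the comments follow the usual convolution arithmetic, out = floor((n + 2*pad - k) / stride) + 1. As a quick standalone sanity check of the sizes above (assuming, from the model repr shown later, that AvgPool defaults to stride 1 and the second Conv defaults to a 3x3 kernel with no padding):

```python
def conv_out(n, k, stride=1, pad=0):
    """Spatial output size of a conv/pooling layer over an n x n input."""
    return (n + 2 * pad - k) // stride + 1

s = conv_out(28, k=5, stride=2, pad=1)  # first Conv  -> 13
s = conv_out(s, k=2, stride=1, pad=0)   # AvgPool     -> 12
s = conv_out(s, k=3, stride=2, pad=0)   # second Conv -> 5
print(s * s * 16)  # flattened features into Linear(400, 64): 400
```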

3. Display Model

Custom __repr__ methods allow classes to be displayed neatly. This also works for nested models, as shown in this notebook.

model
(Model)
    Reshape(1, 28, 28)
    Conv(1, 4, 5, 2)
    AvgPool(2, 1)
    BatchNorm()
    Conv(4, 16, 3, 2)
    BatchNorm()
    Flatten()
    Linear(400, 64)
    ReLU()
    Linear(64, 10)
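As a rough illustration (not the library's actual implementation), a nested display like the one above can be produced by having the container indent its children's reprs; the Layer and Sequential classes here are simplified stand-ins:

```python
class Layer:
    def __init__(self, name, *args):
        self.name, self.args = name, args

    def __repr__(self):
        return f"{self.name}({', '.join(map(str, self.args))})"

class Sequential:
    def __init__(self, *layers):
        self.layers = list(layers)

    def __repr__(self):
        # indent each child's repr so nested models display as a tree
        lines = []
        for layer in self.layers:
            lines += ["    " + line for line in repr(layer).splitlines()]
        return "(Model)\n" + "\n".join(lines)

print(Sequential(Layer("Flatten"), Layer("Linear", 400, 64)))
# (Model)
#     Flatten()
#     Linear(400, 64)
```

Because Sequential indents whatever repr its children produce, a Sequential nested inside another is indented one level deeper automatically.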

4. Display Parameters

Printing out parameter information is also useful for debugging. Here is how to print all of the model's parameters.

for p in model.parameters():
    print(p)
shape: (4, 1, 5, 5), grad: True
shape: (4,), grad: True
shape: (1, 4, 1, 1), grad: True
shape: (1, 4, 1, 1), grad: True
shape: (16, 4, 3, 3), grad: True
shape: (16,), grad: True
shape: (1, 16, 1, 1), grad: True
shape: (1, 16, 1, 1), grad: True
shape: (400, 64), grad: True
shape: (64,), grad: True
shape: (64, 10), grad: True
shape: (10,), grad: True
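As a sanity check, the shapes printed above can be multiplied out to get the model's total trainable parameter count:

```python
from math import prod

# parameter shapes as printed by model.parameters()
shapes = [(4, 1, 5, 5), (4,), (1, 4, 1, 1), (1, 4, 1, 1),
          (16, 4, 3, 3), (16,), (1, 16, 1, 1), (1, 16, 1, 1),
          (400, 64), (64,), (64, 10), (10,)]
total = sum(prod(s) for s in shapes)
print(total)  # 27050 trainable parameters
```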

If you want to inspect only a selected layer, here is how.

print(f'layer2: {model.layers[1]}\n')
for p in model.layers[1].parameters():
    print(p)
layer2:     Conv(1, 4, 5, 2)

shape: (4, 1, 5, 5), grad: True
shape: (4,), grad: True

Making a Callback

source

1. Import

from callback import *

2. Making the Callback

I chose LearningRateSearch as our example since it will be used in the next section of this demo. Before each batch, the callback tries a new candidate learning rate while keeping track of the best learning rate (measured by loss) so far, and it automatically stops training once the loss increases 10x or enough learning rates have been tried.

class LearningRateSearch(Callback):
    def __init__(self, max_iter=1000, min_lr=1e-4, max_lr=1):
        self.max_iter = max_iter # max number of candidate learning rates to try
        self.min_lr = min_lr  # lowest/starting candidate learning rate
        self.max_lr = max_lr  # highest candidate learning rate
        self.cur_lr = min_lr  # current candidate learning rate holder 
        self.best_lr = min_lr # recorded learning rate with the lowest loss
        self.best_loss = float('inf') # lowest loss so far
        
    def before_batch(self): 
        # assert training state
        if not self.model.training: return
        # calculate new candidate learning rate
        position = self.iters_count / self.iters
        self.cur_lr = self.min_lr * (self.max_lr/self.min_lr)**position
        # set learning rate in optimizer
        self.optimizer.hyper_params['learning_rate'] = self.cur_lr
        
    def after_step(self):
        # stop when either tried enough times or loss starts increasing
        if self.iters_count >= self.max_iter or self.loss > self.best_loss*10:
            raise CancelTrainException()
        # update best loss and best learning rate
        if self.loss < self.best_loss:
            self.best_loss = self.loss
            self.best_lr = self.cur_lr
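Note that the interpolation in before_batch is geometric rather than linear: candidate learning rates are evenly spaced on a log scale, so the sweep spends as many steps between 1e-4 and 1e-3 as between 0.1 and 1. A standalone sketch of that sweep (the variable names here are illustrative):

```python
min_lr, max_lr, n = 1e-4, 1.0, 5

# geometric interpolation: equal *ratios* between consecutive candidates,
# mirroring min_lr * (max_lr/min_lr)**position from before_batch
lrs = [min_lr * (max_lr / min_lr) ** (i / (n - 1)) for i in range(n)]
print(lrs)  # roughly 1e-4, 1e-3, 1e-2, 1e-1, 1
```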

Model Training

source

1. Import

As in the previous sections, import the module, then grab the data bunch and loss function.

from stateful_optim import *

2. Data Bunch, Loss Function

data_bunch = get_data_bunch(*get_mnist_data(), batch_size=64)
loss_fn = CrossEntropy()
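For reference, CrossEntropy here presumably computes softmax cross-entropy over the 10 class logits. A pure-Python sketch of that computation for a single example (a simplification of what the library's batched class would do; the internals of CrossEntropy are an assumption):

```python
import math

def cross_entropy(logits, target):
    # log-sum-exp written in the numerically stable shifted form
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    # negative log-likelihood of the target class under softmax
    return log_sum - logits[target]

print(cross_entropy([2.0, 1.0, 0.1], 0))
```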

3. Cosine Parameter Scheduling

New callback alert! Please refer to the code documentation to familiarize yourself with the ParamScheduler callback. Here we build a custom cosine schedule for the learning rate that cycles each epoch, using the learning rate found in the last section.

schedule = combine_schedules([0.4, 0.6], one_cycle_cos(lr/3, lr*3, lr/3))
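Without relying on the library's exact one_cycle_cos and combine_schedules signatures (which are assumptions here), the idea can be sketched as two cosine phases joined end to end: 40% of each epoch warming up from lr/3 to lr*3, then 60% cooling back down to lr/3. The cos_schedule and combine helpers below are hypothetical stand-ins:

```python
import math

def cos_schedule(start, end):
    # cosine interpolation from start to end as pos goes 0 -> 1
    return lambda pos: start + (end - start) * (1 - math.cos(math.pi * pos)) / 2

def combine(fractions, schedules):
    # dispatch pos into the right phase, rescaled to that phase's own [0, 1]
    def _inner(pos):
        acc = 0.0
        for frac, sched in zip(fractions, schedules):
            if pos <= acc + frac:
                return sched((pos - acc) / frac)
            acc += frac
        return schedules[-1](1.0)  # pos past the end (tiny float overshoot)
    return _inner

lr = 0.1  # stand-in for the searched learning rate
schedule = combine([0.4, 0.6],
                   [cos_schedule(lr / 3, lr * 3),   # warm-up phase
                    cos_schedule(lr * 3, lr / 3)])  # cool-down phase
print(schedule(0.0), schedule(0.4), schedule(1.0))
```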

4. Model, Adam, Callbacks, Learner

Same as before, create the model, optimizer, and callbacks. Notice that LearningRateSearch is no longer needed and that ParamScheduler now drives training with a dynamic learning rate each epoch. I also added StatsLogging to print loss and accuracy per epoch.

model = get_conv_final_model(data_bunch)
optimizer = adam_opt(model, learning_rate=lr, weight_decay=1e-4)
callbacks = [ParamScheduler('learning_rate', schedule), StatsLogging(), Recorder()]
learner = Learner(data_bunch, model, loss_fn, optimizer, callbacks)
print(learner)
(DataBunch) 
    (DataLoader) 
        (Dataset) x: (50000, 784), y: (50000,)
        (Sampler) total: 50000, batch_size: 64, shuffle: True
    (DataLoader) 
        (Dataset) x: (10000, 784), y: (10000,)
        (Sampler) total: 10000, batch_size: 128, shuffle: False
(Model)
    Reshape(1, 28, 28)
    Conv(1, 4, 5, 2)
    AvgPool(2, 1)
    BatchNorm()
    Conv(4, 16, 3, 2)
    BatchNorm()
    Flatten()
    Linear(400, 64)
    ReLU()
    Linear(64, 10)
(CrossEntropy)
(StatefulOpt) steppers: ['adam', 'l2_reg'], stats: ['ExpWeightedGrad', 'ExpWeightedSqrGrad', 'StepCount']
(Callbacks) ['TrainEval', 'ParamScheduler', 'StatsLogging', 'Recorder']

5. Train Model

Training is as simple as calling the fit method with the number of epochs. As shown below, the validation accuracy rises to 97.3% in just 3 epochs.

learner.fit(3)
Epoch - 1
train metrics - [5.624208450317383e-06, 0.89082]
valid metrics - [2.2692584991455077e-05, 0.9622]

Epoch - 2
train metrics - [6.513986587524414e-06, 0.95746]
valid metrics - [2.0359230041503905e-05, 0.9706]

Epoch - 3
train metrics - [6.942987442016601e-06, 0.96688]
valid metrics - [1.5890932083129882e-05, 0.9731]

6. Loss and Learning Rate

Lastly, plot the loss and learning rate recorded by the Recorder callback.

learner.callbacks[3].plot_losses()
screenshot

As shown below, the learning rate traces a cosine-ish cycle each epoch, confirming that our ParamScheduler callback is working properly.

learner.callbacks[3].plot_parameter('learning_rate')
screenshot