What are iteration, epoch, batch size, and step size?
-
A forward pass and a backward pass together make one iteration.
-
During one iteration we can pass either a subset of the dataset (a “mini-batch”) or the entire dataset (a “batch”).
-
One full pass through the entire dataset (as a full batch or in mini-batches) is known as an epoch. One epoch contains (number_of_items / batch_size) iterations.
-
Step size usually refers to the learning rate: how large an update is applied to the weights at each iteration (see the training-loop sketch below).
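A minimal PyTorch-style sketch tying these terms together (the toy data, model, and hyperparameter values are placeholders, not a recommendation):
```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy data: 1000 items, batch_size 100 -> 1000 / 100 = 10 iterations per epoch
X, y = torch.randn(1000, 20), torch.randn(1000, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=100, shuffle=True)

model = nn.Linear(20, 1)
loss_fn = nn.MSELoss()
# lr is the step size: how far we move along the gradient each update
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):                  # one epoch = one full pass over the dataset
    for xb, yb in loader:               # each mini-batch = one iteration
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)   # forward pass
        loss.backward()                 # backward pass (compute gradients)
        optimizer.step()                # weight update scaled by lr
```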
https://stackoverflow.com/questions/36740533/what-are-forward-and-backward-passes-in-neural-networks
https://spell.ml/blog/lr-schedulers-and-adaptive-optimizers-YHmwMhAAACYADm6F
What is the effect of varying batch size on asymptotic test accuracy?
Why is using mini-batch gradient descent a good choice?
What does a large gap between the training and validation curves signify?
If there’s a big gap between the training and the validation curves, the model is clearly overfitting strongly.
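A quick way to see this, assuming you have recorded per-epoch losses (the numbers below are made up purely for illustration):
```python
import matplotlib.pyplot as plt

# Hypothetical per-epoch losses recorded during training
train_loss = [0.90, 0.55, 0.35, 0.22, 0.15, 0.10, 0.07, 0.05]
val_loss   = [0.92, 0.60, 0.45, 0.40, 0.41, 0.44, 0.48, 0.53]

plt.plot(train_loss, label="training loss")
plt.plot(val_loss, label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
# Training loss keeps falling while validation loss rises after a few epochs:
# that widening gap is the visual signature of overfitting.
```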
Reasons we get NaN in loss values when training a neural network
How do you decide the learning rate for training a model?
How to speed up your learning?
(From the Coursera Deep Learning Specialization)
Use learning rate decay!
The idea: with a fixed learning rate, as you move towards convergence you may keep wandering around the optimum without ever reaching it, so reducing the LR over time helps you converge.
During the initial phase, when the LR is large, learning is still relatively fast: early in training you can afford to take larger steps. As you approach convergence, a lower learning rate lets you take smaller steps.
The decay rate is a HYPERPARAMETER that one needs to choose, just like the learning rate.
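A small sketch of one common decay schedule, lr = lr0 / (1 + decay_rate * epoch); the specific values below are placeholders:
```python
def decayed_lr(initial_lr, decay_rate, epoch):
    """1/t-style decay: the learning rate shrinks as training progresses."""
    return initial_lr / (1 + decay_rate * epoch)

initial_lr, decay_rate = 0.2, 1.0   # both are hyperparameters you must tune
for epoch in range(5):
    print(epoch, round(decayed_lr(initial_lr, decay_rate, epoch), 3))
# 0 0.2, 1 0.1, 2 0.067, 3 0.05, 4 0.04
```
In practice, frameworks ship ready-made schedulers (e.g. torch.optim.lr_scheduler.StepLR in PyTorch); the hand-rolled version above just shows the idea.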
Vanishing or Exploding Gradients
- can be mitigated by careful weight initialization (a partial solution)
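For example, He and Xavier initialization scale the initial weights by the layer's fan-in so activations (and hence gradients) keep a roughly constant scale across layers. A rough NumPy sketch, with arbitrary layer sizes:
```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out):
    # He initialization: std = sqrt(2 / fan_in), typically used with ReLU layers;
    # keeps activation variance roughly constant so gradients are less likely
    # to vanish or explode as depth grows.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def xavier_init(fan_in, fan_out):
    # Xavier/Glorot initialization: std = sqrt(1 / fan_in), often used with tanh/sigmoid.
    return rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))

W = he_init(512, 256)
print(W.std())  # roughly sqrt(2/512) ≈ 0.0625
```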