Vanishing vs Exploding Gradient

| Parameter | Vanishing Gradient | Exploding Gradient |
|---|---|---|
| Problem statement | When the gradient (the rate of change of the loss with respect to a weight) becomes very small, there is almost no training signal for the network to learn from, so the weights barely update in subsequent iterations. | When the gradient becomes very large, each step changes the weights drastically, so the updates overshoot and training becomes unstable. |
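A minimal sketch that makes the contrast concrete, assuming PyTorch is available. It stacks many linear layers, backpropagates a dummy loss, and prints the gradient norm at the first and last layers. The width (64), depth (30), gain values, and the helper `layer_grad_norms` are illustrative choices, not from the original post.

```python
import torch
import torch.nn as nn

def layer_grad_norms(depth, gain, activation):
    """Build a deep MLP, backprop a dummy loss, return each linear layer's grad norm."""
    torch.manual_seed(0)
    layers = []
    for _ in range(depth):
        linear = nn.Linear(64, 64)
        nn.init.xavier_normal_(linear.weight, gain=gain)
        layers += [linear, activation()]
    model = nn.Sequential(*layers)

    loss = model(torch.randn(8, 64)).pow(2).mean()
    loss.backward()
    return [m.weight.grad.norm().item() for m in model if isinstance(m, nn.Linear)]

# Sigmoid derivatives are at most 0.25, so the repeated chain-rule products
# shrink the signal reaching the early layers (vanishing gradient).
vanish = layer_grad_norms(depth=30, gain=1.0, activation=nn.Sigmoid)

# Oversized initial weights amplify the signal at every layer (exploding gradient).
explode = layer_grad_norms(depth=30, gain=3.0, activation=nn.ReLU)

print(f"vanishing: layer 1 grad {vanish[0]:.2e} vs layer 30 grad {vanish[-1]:.2e}")
print(f"exploding: layer 1 grad {explode[0]:.2e} vs layer 30 grad {explode[-1]:.2e}")
```

In both cases the effect compounds multiplicatively with depth: a per-layer factor slightly below 1 drives early-layer gradients toward zero, while a factor above 1 drives them toward enormous values.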