Model Capacity and Overfitting

Regularization

Weight penalty: add a penalty term P(W) on the weights W to the loss L(θ), giving the regularized loss
$$
L_r(\theta)=L(\theta)+P(W)
$$
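
The notes do not say which penalty P(W) is meant. As a minimal sketch, assume the common L2 (weight decay) choice, P(W) = λ Σ w². The function names and the strength `lam` are illustrative, not from the notes.

```python
import numpy as np

def l2_penalty(W, lam):
    """Assumed penalty: P(W) = lam * sum of squared weights."""
    return lam * np.sum(W ** 2)

def regularized_loss(data_loss, W, lam=0.01):
    """L_r = L + P(W), with the data loss L passed in as a number."""
    return data_loss + l2_penalty(W, lam)

# Hypothetical weights and data loss, just to exercise the formula.
W = np.array([[1.0, -2.0], [0.5, 0.0]])
loss = regularized_loss(data_loss=0.3, W=W, lam=0.1)
```

Larger `lam` pushes the weights toward zero and lowers the effective capacity of the model. Biases are usually left out of the penalty, which is one reading of why the penalty is written on W rather than on all of θ.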

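Early stopping: stop training when the error on held-out (validation) data begins to increase, and keep the parameters from the best epoch. The test set should not be used for this decision.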
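
A rough sketch of this stopping rule follows. It assumes one held-out error value is recorded per epoch. The `patience` parameter (how many non-improving epochs to tolerate before stopping) is a common addition and not part of the notes.

```python
def early_stopping_epoch(val_errors, patience=2):
    """Return the epoch index with the lowest held-out error seen
    before the error failed to improve for `patience` epochs in a row."""
    best_err = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            # New best: remember it and reset the wait counter.
            best_err, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # error has stopped improving; stop training
    return best_epoch

# Hypothetical per-epoch held-out errors.
val_errors = [0.50, 0.40, 0.35, 0.36, 0.38, 0.30]
best = early_stopping_epoch(val_errors, patience=2)
```

In a real training loop the model parameters would be checkpointed whenever a new best is found and restored after stopping. With a small `patience`, a later improvement (the final 0.30 here) is never seen, which is the trade-off of stopping early.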

Data augmentation: easy for classification. Modify an input sample slightly without changing its class label, and add the result to the training data.
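
As an illustration, the sketch below perturbs a 2-D image array with a random horizontal flip and a little Gaussian noise, returning the label unchanged. The specific transformations and `noise_std` are assumptions. Which transformations preserve the label depends on the task (a flip is fine for many natural images but not for digits or text).

```python
import numpy as np

def augment(image, label, rng, noise_std=0.01):
    """Return a slightly modified copy of `image` with the same label."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip (reverse the column order)
    # Small additive noise; assumed not to change the class.
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    return out, label

rng = np.random.default_rng(0)
image = np.arange(12, dtype=float).reshape(3, 4)  # toy 3x4 "image"
aug_image, aug_label = augment(image, label=3, rng=rng)
```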

Dropout: during each minibatch, randomly set the output of some neurons in layer l to zero.
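
A minimal sketch of this masking step applied to the activations of one layer. Rescaling the surviving outputs by 1/(1 - p) ("inverted dropout") is a common convention assumed here, not something the notes state. It keeps the expected activation unchanged, so nothing needs to be rescaled at test time.

```python
import numpy as np

def dropout(activations, p_drop, rng, training=True):
    """Zero each activation independently with probability `p_drop`."""
    if not training or p_drop == 0.0:
        return activations  # no units are dropped at test time
    keep = rng.random(activations.shape) >= p_drop  # True = unit survives
    # Inverted dropout (assumed): rescale survivors to preserve the mean.
    return activations * keep / (1.0 - p_drop)

rng = np.random.default_rng(0)
h = np.ones((4, 5))                     # toy outputs of layer l for a minibatch
h_train = dropout(h, 0.5, rng)          # entries are either 0.0 or 2.0
h_eval = dropout(h, 0.5, rng, training=False)
```

A fresh mask is drawn on every call, so each minibatch trains a different thinned version of the network.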

Hyperparameter selection: grid search or trial and error. Requires a validation set.
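
A small sketch of grid search over one hyperparameter. It assumes, purely for illustration, a ridge regression model on synthetic data, with an L2 penalty strength `lam` as the hyperparameter. The grid values and the 40/20 train/validation split are arbitrary choices.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def mse(X, y, w):
    """Mean squared error of the linear predictor X @ w."""
    return float(np.mean((X @ w - y) ** 2))

# Synthetic data (hypothetical), split into training and validation parts.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=60)
X_train, y_train = X[:40], y[:40]
X_val, y_val = X[40:], y[40:]

# Fit on the training part for each grid value, score on the validation part.
grid = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]
val_errors = {lam: mse(X_val, y_val, fit_ridge(X_train, y_train, lam))
              for lam in grid}
best_lam = min(val_errors, key=val_errors.get)
```

The validation set is needed because training error alone always favors the least regularized model. A separate test set, untouched during the search, is still required to report the final error.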