Regularization penalizes large weights to reduce overfitting
- it creates a data interpolation that contains intentional error (by throwing away/hiding parameters), missing some/all of the data points
- this makes the resulting function more “predictable”/“smooth”
There is, therefore, a trade-off between sacrificing fit quality on the ORIGINAL data and gaining better accuracy on new points. If you regularize too much, you will underfit.
For instance, in a linear model:
\begin{equation} \min_{\theta} \|y - X \theta\|_{2}^{2} + \lambda \| \theta \|_{2}^{2} \end{equation}
This gives an analytical (closed-form) solution, obtained by setting the gradient with respect to $\theta$ to zero:
\begin{equation} \theta = \qty(X^{\top}X + \lambda I)^{-1} X^{\top} y \end{equation}
Or, you can write the…
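As a minimal numerical sketch of the closed form above (assuming numpy; the synthetic data, shapes, and `lam` value are made up for illustration, not from the notes):

```python
import numpy as np

# Hypothetical synthetic data: n samples, d features.
rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))
theta_true = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ theta_true + 0.1 * rng.normal(size=n)

lam = 1.0  # regularization strength lambda

# Closed-form ridge solution: theta = (X^T X + lambda I)^{-1} X^T y
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(theta_ridge)  # large weights are shrunk toward zero, but not exactly zero
```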
Lasso
The lasso uses an $L_1$ norm on the weights:
\begin{equation} \min_{\theta} \|y - X \theta\|_{2}^{2} + \lambda \| \theta \|_{1} \end{equation}
which will downselect weights that are not useful (driving them to exactly zero). Unlike ridge, this problem has no closed-form solution.
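Because there is no closed form, the lasso is solved iteratively. One standard approach (an assumption here, not something specified in the notes) is proximal gradient descent (ISTA), sketched below with the same kind of made-up data as the ridge example:

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the L1 penalty: shrink toward zero, zero out small entries.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iters=2000):
    # Minimize ||y - X theta||_2^2 + lam * ||theta||_1 by proximal gradient (ISTA).
    L = 2.0 * np.linalg.norm(X, 2) ** 2   # Lipschitz constant of the smooth term
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = 2.0 * X.T @ (X @ theta - y)           # gradient of the squared-error term
        theta = soft_threshold(theta - grad / L, lam / L)  # gradient step, then shrinkage
    return theta

# Illustrative synthetic data (same setup as the ridge sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + 0.1 * rng.normal(size=50)

print(lasso_ista(X, y, lam=5.0))  # coefficients on the useless features are typically driven to 0
```

The soft-thresholding step is what produces the sparsity: at each iteration, any coefficient whose magnitude falls below the threshold is set to exactly zero, which is how the lasso "downselects" unhelpful weights.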