constituents
\(m\) data points \(\qty(x_i,y_{i})\)
requirements
we desire \(c_{j}\) such that:
\begin{equation} y = c_1 + c_{2} x + c_3 x^{2} + \dots \end{equation}
Given our set of basis functions \(\phi_{j}(x)\) for input \(x\), our goal is:
\begin{equation} y = c_1 \phi_{1} + c_2 \phi_{2} + \dots + c_{n}\phi_{n} \end{equation}
the \(\phi_{j}\) are the model functions which determine the form of our model (analogous to how an architecture determines a neural network).
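As a sketch (assuming numpy; the data points and the monomial choice of basis here are hypothetical), finding the \(c_j\) is a linear least-squares problem over the matrix \(A_{ij} = \phi_j(x_i)\):

```python
import numpy as np

# Hypothetical example data (m = 4 points sampled from y = 1 + x + x^2).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 7.0, 13.0])

# Basis functions phi_j(x); here a monomial basis as an example choice.
basis = [lambda t: np.ones_like(t), lambda t: t, lambda t: t**2]

# Build A with A[i, j] = phi_j(x_i), then solve for c in the least-squares sense.
A = np.column_stack([phi(x) for phi in basis])
c, *_ = np.linalg.lstsq(A, y, rcond=None)
print(c)  # coefficients c_1, c_2, c_3
```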
additional information
Monomial Basis and Vandermonde Matrix
to do this, we write the system in the following matrix form, using the monomial basis:
\begin{equation} \mqty(1 & x_1 & x_1^{2} \\ 1 & x_2 & x_2^{2} \\ 1 & x_3 & x_3^{2}) \mqty(c_1 \\ c_2 \\ c_3) = \mqty(y_1 \\ y_2 \\ y_3) \end{equation}
inverting this matrix gives us the answer; this matrix is called the Vandermonde matrix. See problem with Vandermonde matrix
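A minimal sketch (assuming numpy and three hypothetical data points with distinct \(x_i\)): build the Vandermonde matrix and solve for the coefficients directly:

```python
import numpy as np

# Three hypothetical data points, sampled from y = 1 + x^2.
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 5.0])

# Vandermonde matrix: row i is [1, x_i, x_i^2].
V = np.vander(x, N=3, increasing=True)
c = np.linalg.solve(V, y)
print(c)  # coefficients c_1, c_2, c_3
```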
Lagrange Basis
\begin{equation} \phi_{k} \qty(x) = \frac{\prod_{i\neq k} \qty(x-x_{i})}{\prod_{i \neq k} \qty(x_{k}- x_{i})} \end{equation}
notice this makes the matrix \(A\) the identity, but evaluating the model is more expensive because each basis function requires the full product over the nodes, computed in sequence.
Unlike the Vandermonde matrix, this basis also has no ill-conditioning problem.
However, every basis function now has full degree (quadratic in the three-point example).
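As a sanity check (a sketch assuming numpy and three hypothetical nodes), the matrix \(A_{ik} = \phi_{k}(x_i)\) built from the Lagrange basis comes out as the identity:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])  # hypothetical interpolation nodes

def lagrange_phi(k, t, nodes):
    """Evaluate the k-th Lagrange basis function at t."""
    num = np.prod([t - nodes[i] for i in range(len(nodes)) if i != k])
    den = np.prod([nodes[k] - nodes[i] for i in range(len(nodes)) if i != k])
    return num / den

# A[i, k] = phi_k(x_i): phi_k(x_k) = 1 and phi_k(x_i) = 0 for i != k.
A = np.array([[lagrange_phi(k, xi, x) for k in range(len(x))] for xi in x])
print(A)
```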
Newton Basis
\begin{equation} \phi_{k} \qty(x) = \prod_{i=1}^{k-1} \qty(x - x_{i}) \end{equation}
the first basis function is \(1\) (the empty product), the second is linear, the third is quadratic, and so on: in the three-point example, only the \(k=3\) term is quadratic.
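A sketch (numpy, with the same hypothetical three points as above): the Newton-basis matrix \(A_{ik} = \phi_{k}(x_i)\) is lower triangular, so the coefficients come from one cheap triangular solve:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])  # hypothetical nodes, sampled from y = 1 + x^2
y = np.array([1.0, 2.0, 5.0])

def newton_phi(k, t, nodes):
    """phi_k(t) = prod_{i<k} (t - x_i); the empty product for k = 0 is 1."""
    return np.prod([t - nodes[i] for i in range(k)])

# A[i, k] = phi_k(x_i) is lower triangular: phi_k(x_i) = 0 whenever i < k.
A = np.array([[newton_phi(k, xi, x) for k in range(len(x))] for xi in x])
c = np.linalg.solve(A, y)
print(A)
print(c)
```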
overfitting
For \(m\) data points, you can draw a unique degree-\((m-1)\) polynomial which fits the points exactly.
overfitting can occur, so we perform regularization
This works as long as the points are not degenerate: the \(x_{i}\) must be distinct (no two points stacked on a vertical line).
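A sketch of one common regularization, ridge (Tikhonov) regression, which penalizes large coefficients; the data, polynomial degree, and value of `lam` here are all hypothetical choices for illustration:

```python
import numpy as np

# Noisy samples of a smooth function (hypothetical data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)

A = np.vander(x, N=10, increasing=True)  # degree-9 monomial design matrix
c_exact = np.linalg.solve(A, y)          # interpolates, fitting the noise exactly

# Ridge: minimize ||A c - y||^2 + lam * ||c||^2, solved via the normal equations.
lam = 1e-3
c_ridge = np.linalg.solve(A.T @ A + lam * np.eye(10), A.T @ y)
print(np.linalg.norm(c_exact), np.linalg.norm(c_ridge))
```

The regularized coefficients are far smaller in norm, at the cost of no longer fitting the data exactly.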
problem with Vandermonde matrix
at higher powers, the columns tend to become nearly parallel: this is bad because a small adjustment to the data can then require enormous coefficient values
- and if two columns of the matrix are parallel, our rank would be at most \(n-1\)
- …meaning we don’t span \(\mathbb{R}^{n}\)
- …we may not have a solution, because some targets in \(\mathbb{R}^{n}\) cannot be hit, while others can be hit in many ways by trading off the parallel columns
in general: if any columns become linearly dependent, they may be combined in an infinite number of ways; that is, we want our Vandermonde matrix to have full rank.
near-singular matrix problem
importantly: even if the matrix is merely close to singular, we will have this problem.
- with limited precision, we will struggle when columns/linear combinations of columns are too close to being parallel
- the matrix may not be invertible in finite-precision arithmetic
- condition numbers can help judge how close our matrix is to being non-invertible
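A quick numerical sketch (numpy, equally spaced nodes in \([0,1]\) as a hypothetical choice): the condition number of the Vandermonde matrix grows rapidly with the number of points:

```python
import numpy as np

# Condition number of the monomial-basis Vandermonde matrix as n grows.
conds = {}
for n in (5, 10, 15):
    V = np.vander(np.linspace(0.0, 1.0, n), increasing=True)
    conds[n] = np.linalg.cond(V)
    print(n, conds[n])
```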