constituents
\(m\) data points \(\qty(x_i,y_{i})\)
requirements
we desire \(c_{j}\) such that:
\begin{equation} y = c_1 + c_{2} x + c_3 x^{2} + \dots \end{equation}
Given our set of basis functions \(\phi_{j}(x)\) for input \(x\), our goal is:
\begin{equation} y = c_1 \phi_{1} + c_2 \phi_{2} + \dots + c_{n}\phi_{n} \end{equation}
the \(\phi_{j}\) are the model functions which determine the form of our model (analogous to how an architecture determines a neural network).
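As a sketch (assuming numpy; the data points and the monomial choice of basis here are hypothetical), finding the \(c_j\) is a linear least-squares problem over the matrix \(A_{ij} = \phi_j(x_i)\):

```python
import numpy as np

# Hypothetical example data (m = 4 points sampled from y = 1 + x + x^2).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 7.0, 13.0])

# Basis functions phi_j(x); here a monomial basis as an example choice.
basis = [lambda t: np.ones_like(t), lambda t: t, lambda t: t**2]

# Build A with A[i, j] = phi_j(x_i), then solve for c in the least-squares sense.
A = np.column_stack([phi(x) for phi in basis])
c, *_ = np.linalg.lstsq(A, y, rcond=None)
print(c)  # coefficients c_1, c_2, c_3
```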
additional information
Monomial Basis and Vandermonde Matrix
to do this, we write the system in the following matrix form, using the monomial basis:
\begin{equation} \mqty(1 & x_1 & x_1^{2} \\ 1 & x_2 & x_2^{2} \\ 1 & x_3 & x_3^{2}) \mqty(c_1 \\ c_2 \\ c_3) = \mqty(y_1 \\ y_2 \\ y_3) \end{equation}
inverting this matrix gives us the answer; this matrix is called the Vandermonde matrix. See problem with Vandermonde matrix
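A minimal sketch (assuming numpy and three hypothetical data points with distinct \(x_i\)): build the Vandermonde matrix and solve for the coefficients directly:

```python
import numpy as np

# Three hypothetical data points, sampled from y = 1 + x^2.
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 5.0])

# Vandermonde matrix: row i is [1, x_i, x_i^2].
V = np.vander(x, N=3, increasing=True)
c = np.linalg.solve(V, y)
print(c)  # coefficients c_1, c_2, c_3
```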
Lagrange Basis
\begin{equation} \phi_{k} \qty(x) = \frac{\prod_{i\neq k} \qty(x-x_{i})}{\prod_{i \neq k} \qty(x_{k}- x_{i})} \end{equation}
notice this makes the matrix \(A\) the identity, but evaluating the model is more expensive because each basis function requires the full product over the nodes, computed in sequence.
Unlike the Vandermonde matrix, this basis also has no ill-conditioning problem.
However, every basis function now has full degree (quadratic in the three-point example).
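As a sanity check (a sketch assuming numpy and three hypothetical nodes), the matrix \(A_{ik} = \phi_{k}(x_i)\) built from the Lagrange basis comes out as the identity:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])  # hypothetical interpolation nodes

def lagrange_phi(k, t, nodes):
    """Evaluate the k-th Lagrange basis function at t."""
    num = np.prod([t - nodes[i] for i in range(len(nodes)) if i != k])
    den = np.prod([nodes[k] - nodes[i] for i in range(len(nodes)) if i != k])
    return num / den

# A[i, k] = phi_k(x_i): phi_k(x_k) = 1 and phi_k(x_i) = 0 for i != k.
A = np.array([[lagrange_phi(k, xi, x) for k in range(len(x))] for xi in x])
print(A)
```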
Newton Basis
\begin{equation} \phi_{k} \qty(x) = \prod_{i=1}^{k-1} \qty(x - x_{i}) \end{equation}
the first basis function is \(1\) (the empty product), the second is linear, the third is quadratic, and so on: in the three-point example, only the \(k=3\) term is quadratic.
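A sketch (numpy, with the same hypothetical three points as above): the Newton-basis matrix \(A_{ik} = \phi_{k}(x_i)\) is lower triangular, so the coefficients come from one cheap triangular solve:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])  # hypothetical nodes, sampled from y = 1 + x^2
y = np.array([1.0, 2.0, 5.0])

def newton_phi(k, t, nodes):
    """phi_k(t) = prod_{i<k} (t - x_i); the empty product for k = 0 is 1."""
    return np.prod([t - nodes[i] for i in range(k)])

# A[i, k] = phi_k(x_i) is lower triangular: phi_k(x_i) = 0 whenever i < k.
A = np.array([[newton_phi(k, xi, x) for k in range(len(x))] for xi in x])
c = np.linalg.solve(A, y)
print(A)
print(c)
```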
overfitting
For \(m\) data points, you can draw a unique degree-\((m-1)\) polynomial which fits the points exactly.
overfitting can occur, so we perform regularization
This works as long as the points are not degenerate: the \(x_{i}\) must be distinct (no two points stacked on a vertical line).
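A sketch of one common regularization, ridge (Tikhonov) regression, which penalizes large coefficients; the data, polynomial degree, and value of `lam` here are all hypothetical choices for illustration:

```python
import numpy as np

# Noisy samples of a smooth function (hypothetical data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)

A = np.vander(x, N=10, increasing=True)  # degree-9 monomial design matrix
c_exact = np.linalg.solve(A, y)          # interpolates, fitting the noise exactly

# Ridge: minimize ||A c - y||^2 + lam * ||c||^2, solved via the normal equations.
lam = 1e-3
c_ridge = np.linalg.solve(A.T @ A + lam * np.eye(10), A.T @ y)
print(np.linalg.norm(c_exact), np.linalg.norm(c_ridge))
```

The regularized coefficients are far smaller in norm, at the cost of no longer fitting the data exactly.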
problem with Vandermonde matrix
at higher powers, the columns tend to become nearly parallel: this is bad because a small adjustment to the data can then require enormous coefficient values
- and if two columns of the matrix are parallel, our rank would be at most \(n-1\)
- …meaning we don’t span \(\mathbb{R}^{n}\)
- …we may not have a solution, because some targets in \(\mathbb{R}^{n}\) cannot be hit, while others can be hit in many ways by trading off the parallel columns
in general: if any columns become linearly dependent, they may be combined in an infinite number of ways; that is, we want our Vandermonde matrix to have full rank.
near-singular matrix problem
importantly: even if the matrix is merely close to singular, we will have this problem.
- with limited precision, we will struggle when columns/linear combinations of columns are too close to being parallel
- the matrix may not be invertible in finite-precision arithmetic
- condition numbers can help judge how close our matrix is to being non-invertible
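A quick numerical sketch (numpy, equally spaced nodes in \([0,1]\) as a hypothetical choice): the condition number of the Vandermonde matrix grows rapidly with the number of points:

```python
import numpy as np

# Condition number of the monomial-basis Vandermonde matrix as n grows.
conds = {}
for n in (5, 10, 15):
    V = np.vander(np.linspace(0.0, 1.0, n), increasing=True)
    conds[n] = np.linalg.cond(V)
    print(n, conds[n])
```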