Preference Elicitation
For the Weight Method, for instance, we need to find a weight \(w\) such that:
\begin{equation} f = w^{\top}\mqty[f_1 \\ \vdots \\ f_{N}] \end{equation}
where weight \(w \in \triangle_{N}\).
To do this, we essentially infer the weighting scheme by asking “do you like system \(a\) or system \(b\)”.
- first, we show the user two candidate designs with objective values \((a_1, a_2, a_3, \dots)\) and \((b_1, b_2, b_3, \dots)\) and we ask “which one do you like better”
- say our user WLOG chose \(b\) over \(a\)
- so we want to design a \(w\) such that \(w^{\top} b < w^{\top} a\) (the preferred design gets the lower aggregated objective)
- meaning, we solve for a \(w\) such that…
\begin{align} \max_{w}&\ (a - b)^{\top} w \\ \text{such that}&\ \bold{1}^{\top} w = 1 \\ &\ w \geq 0 \end{align}
unlike the rest of everything, we are MAXIMIZING here: since the user chose \(b\), we want the margin \(w^{\top}a - w^{\top}b = (a-b)^{\top}w\) to be as large as possible.
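As a minimal sketch, since the objective and constraints are all linear, this can be handed to scipy’s linprog; the designs \(a\) and \(b\) below are made-up placeholders, and this assumes a single preference query:

# sketch: infer w from one "b over a" preference via an LP
import numpy as np
from scipy.optimize import linprog

a = np.array([0.8, 0.3, 0.5])  # objective values of design a (hypothetical)
b = np.array([0.4, 0.6, 0.2])  # objective values of design b (hypothetical)

# maximize (a - b)^T w  <=>  minimize (b - a)^T w
res = linprog(c=b - a,
              A_eq=np.ones((1, len(a))), b_eq=[1.0],  # 1^T w = 1
              bounds=[(0.0, 1.0)] * len(a))           # w >= 0 on the simplex
w = res.x  # a weighting consistent with the stated preference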
Sampling Plans
Many methods (local models, population methods, etc.) require a set of samples of the objective value, so we need a plan for where to sample…
Full Factorial
Grid it up.
- easy to implement
- good results
- bad: sample count grows exponentially with dimension
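A minimal sketch, assuming a box-shaped domain with \(m\) evenly spaced levels per dimension (the function name here is illustrative):

# sketch: m levels per dimension => m**d points total
import itertools
import numpy as np

def full_factorial(lower, upper, m):
    axes = [np.linspace(lo, hi, m) for lo, hi in zip(lower, upper)]
    return np.array(list(itertools.product(*axes)))

points = full_factorial([0.0, 0.0], [1.0, 1.0], m=5)  # 25 points in 2D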
Random Sampling
Use a pseudorandom generator to pick points in your space.
- allows for any number of evaluations you specify
- statistically, the points clump when you do this!
- also need lots of samples to get good coverage
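A one-line sketch with numpy, assuming a unit box:

# sketch: 100 pseudorandom points in a 3-dimensional unit box
import numpy as np

rng = np.random.default_rng(0)
samples = rng.uniform(low=0.0, high=1.0, size=(100, 3))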
Uniform Projection
We take the set of points and require that its projection onto each dimension is uniform. To implement this, we grid up each dimension and shuffle the grid ordering of each dimension individually. Then, we read off the coordinates to create the points:
# in 3 dimensions; axis_min/axis_max delimit the grid levels
import random

seq = list(range(axis_min, axis_max))    # one grid level per sample
d1 = random.sample(seq, len(seq))        # random.shuffle works in place, so
d2 = random.sample(seq, len(seq))        # use random.sample for shuffled copies
d3 = random.sample(seq, len(seq))
sampling_points = list(zip(d1, d2, d3))  # read coordinates off each dimension
Stratified Sampling
- perform Uniform Projection
- within each grid cell, make a smaller grid and perform Uniform Projection again inside it
Space-Filling Metrics
Pairwise Distances
This metric compares sampling plans; it requires each set to have the same number of points.
- compute the Euclidean distance between every pair of points
- take the smallest of these distances, and call that the “pairwise distance” of the set
Limitation: the metric is dominated by the single closest pair, so a plan with just one pair of close points scores as badly as one with many. So maybe use Morris-Mitchell.
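A quick sketch of this metric with scipy, assuming Euclidean distances:

# sketch: score a plan X (an n-by-d array) by its minimum pairwise distance
import numpy as np
from scipy.spatial.distance import pdist

def min_pairwise_distance(X):
    return pdist(X).min()  # Euclidean by default; larger is better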
Morris-Mitchell
We have a hyperparameter \(q\), an exponent we sweep over a range of values to stress the distances at different scales. Consider \(d_{i}\) to be the \(i\)-th pairwise distance between the points, measured with the \(L_p\) norm for your choice of \(p\). Then, define:
\begin{equation} \Phi_{q}(X) = \qty(\sum_{i}^{}d_{i}^{-q})^{\frac{1}{q}} \end{equation}
and we try to solve for our set of points \(X\) such that:
\begin{equation} \min_{X} \max_{q \in \{1,2,5,10,20,50,100\}} \Phi_{q}(X) \end{equation}
“minimize \(\Phi_{q}\) at the worst possible exponent \(q\)”
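A sketch of the criterion, with \(p\) left as a parameter (Euclidean by default):

# sketch: Morris-Mitchell criterion for a plan X (an n-by-d array)
import numpy as np
from scipy.spatial.distance import pdist

def phi_q(X, q, p=2):
    d = pdist(X, metric="minkowski", p=p)    # all pairwise L_p distances
    return np.sum(d ** (-float(q))) ** (1.0 / q)

def morris_mitchell(X, qs=(1, 2, 5, 10, 20, 50, 100)):
    return max(phi_q(X, q) for q in qs)      # score to minimize over plans X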
Space-Filling Subset
A Space-Filling Subset is a subset \(S\) of a point set \(X\) which minimizes the maximum distance between a point in \(S\) and its closest point in \(X\) (i.e. making \(S\) a good representative of \(X\)).
\begin{equation} d_{\max}(X,S) = \max_{x \in X} \min_{s \in S} |s -x|_{p} \end{equation}
where you can choose any \(p\)-norm you’d like.
greedy local search
Choose the single best point to add to \(S\), namely the one that minimizes \(d_{\max}\); then choose another point, and another one, until \(S\) reaches the desired size.
exchange algorithm
randomly initialize \(S\), then repeatedly swap a point in \(S\) with a point in \(X \setminus S\), keeping swaps that decrease \(d_{\max}\)
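A sketch of the greedy variant with the Euclidean norm (the function names are mine):

# sketch: greedy local search for a space-filling subset of size k
import numpy as np

def d_max(X, S):
    # for each x in X, distance to its nearest point in S; take the worst case
    dists = np.linalg.norm(X[:, None, :] - S[None, :, :], axis=-1)
    return dists.min(axis=1).max()

def greedy_subset(X, k):
    chosen = []
    for _ in range(k):
        rest = [i for i in range(len(X)) if i not in chosen]
        best = min(rest, key=lambda i: d_max(X, X[chosen + [i]]))
        chosen.append(best)
    return X[chosen]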
Surrogate Models
Once we have finished sampling, we need to create a model parameterized by \(\theta\) which minimizes the error. In particular, we want to choose \(\theta\) such that:
\begin{equation} \min_{\theta} |y - \hat{y}|_{p} \end{equation}
for some model \(\hat{y}(x)\), with matching actual result \(y\).
Linear Model
\begin{equation} \hat{f} = w_0 + \bold{w}^{\top}\bold{x} \end{equation}
where, stacking the samples into a design matrix \(X\) (each row is \([1, \bold{x}^{\top}]\)) and folding \(w_0\) into \(\theta\), we now want:
\begin{equation} \min_{\theta} |y - X \theta|_{2}^{2} \end{equation}
this is CLOSED FORM! by applying the pseudoinverse:
\begin{equation} \theta = X^{\dagger} y \end{equation}
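In code, a sketch of the fit, assuming xs is an n-by-d array of samples and ys their objective values:

# sketch: fit a linear surrogate via the pseudoinverse
import numpy as np

def fit_linear(xs, ys):
    X = np.hstack([np.ones((len(xs), 1)), xs])  # prepend a 1 column for w0
    theta = np.linalg.pinv(X) @ ys              # theta = X^dagger @ y
    return theta                                # theta[0] = w0, theta[1:] = w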
Basis Functions
We can make this slightly non-linear by choosing a set of basis functions \(\bold{B}(\bold{x})\), each taking \(x\) as input (for instance, the terms of a polynomial); the matrix \(B\) of basis functions evaluated at the samples is then used for optimization:
\begin{equation} \min_{\theta} |y - B \theta|_{2}^{2} \end{equation}
Radial Basis Functions
\begin{equation} \psi(x,c) = \psi(|x - c|) \end{equation}
a radial basis function is a basis function that acts only on the distance to a center point \(c\). You can choose any kernel \(\psi\) you’d like.
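A sketch with a Gaussian kernel (one possible choice of \(\psi\)) and centers placed at the sample points themselves:

# sketch: Gaussian RBF surrogate, centers at the sample points
import numpy as np

def rbf_features(xs, centers, gamma=1.0):
    d = np.linalg.norm(xs[:, None, :] - centers[None, :, :], axis=-1)
    return np.exp(-gamma * d**2)                # psi(|x - c|) = exp(-gamma r^2)

def fit_rbf(xs, ys, gamma=1.0):
    B = rbf_features(xs, xs, gamma)             # design matrix of basis values
    return np.linalg.pinv(B) @ ys               # least-squares theta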
Regularization
Especially for noisy data, you ideally want some kind of regularization: see Regularization
L2 Regularization
\begin{equation} \min_{\theta} || y - B(x)\theta ||_{2}^{2} + \lambda || \theta ||^{2}_{2} \end{equation}
this is kind of like a multi-objective Weight Method: \(\lambda\) weighs the fit error against the parameter magnitude.
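A sketch of the closed-form ridge solution, assuming B is the basis matrix from above:

# sketch: L2-regularized (ridge) fit, theta = (B^T B + lambda I)^{-1} B^T y
import numpy as np

def fit_ridge(B, y, lam=1e-2):
    n = B.shape[1]
    return np.linalg.solve(B.T @ B + lam * np.eye(n), B.T @ y)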