Houjun Liu

best-action worst-state

best-action worst-state is a lower bound for alpha vectors:

\begin{equation} r_{baws} = \max_{a} \sum_{k=1}^{\infty} \gamma^{k-1} \min_{s}R(s,a) \end{equation}
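Because \(\min_{s} R(s,a)\) does not depend on \(k\), the sum is a geometric series. Assuming \(0 \leq \gamma < 1\) (an assumption not stated above, but needed for the infinite sum to converge), it collapses to a closed form:

\begin{equation} r_{baws} = \max_{a} \frac{\min_{s} R(s,a)}{1-\gamma} = \frac{1}{1-\gamma} \max_{a} \min_{s} R(s,a) \end{equation}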

The alpha vector corresponding to this bound has the same value \(r_{baws}\) in every slot (one slot per state).

This value is the discounted return we are guaranteed by committing to the single best action while assuming that, at every step, we are stuck in the worst state for that action.
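A minimal sketch of computing this bound, assuming the reward function is given as a table `R[s][a]` and that \(0 \leq \gamma < 1\) so the geometric series sums to \(1/(1-\gamma)\). The function name `baws_lower_bound` and the toy reward table are hypothetical, made up for illustration:

```python
def baws_lower_bound(R, gamma):
    """Best-action worst-state alpha vector.

    R[s][a] is the reward for taking action a in state s (assumed layout).
    Returns a list with one entry per state, all equal to r_baws.
    """
    if not 0 <= gamma < 1:
        raise ValueError("gamma must be in [0, 1) for the sum to converge")

    n_states = len(R)
    n_actions = len(R[0])

    # For each action, the worst reward it can get over all states.
    worst_per_action = [
        min(R[s][a] for s in range(n_states)) for a in range(n_actions)
    ]

    # Pick the action whose worst case is best, then sum the geometric series.
    r_baws = max(worst_per_action) / (1 - gamma)

    # The alpha vector holds the same value in every slot.
    return [r_baws] * n_states


# Toy example: 3 states, 2 actions (illustrative numbers only).
R = [
    [-1.0, 0.0],
    [2.0, -5.0],
    [0.5, 1.0],
]
alpha = baws_lower_bound(R, gamma=0.5)
print(alpha)
```

Here action 0 has worst-state reward \(-1\) and action 1 has worst-state reward \(-5\), so the bound uses action 0 and every slot of the alpha vector is \(-1 / (1 - 0.5) = -2\).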