To evaluate the lower bound:
\begin{equation} \alpha_{a}^{k+1} (s) = R(s,a) + \gamma \sum_{s’}^{} T(s’|s,a) \alpha_{a}^{k}(s’) \end{equation}
we are essentially sticking with an action and do conditional plan evaluation of a policy that do one action into the future