One alpha vector per action:
\begin{equation} \alpha^{(k+1)}_{a}(s) = R(s,a) + \gamma \sum_{o} \max_{a'} \sum_{s'} O(o|a,s')T(s'|s,a) \alpha_{a'}^{(k)}(s') \end{equation}
Time complexity per iteration: \(O(|S|^{2}|A|^{2}|O|)\)
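As a minimal sketch, one iteration of this update can be written directly as nested loops. The function name `update_alphas` and the nested-list layouts (`alphas[a][s]`, `R[s][a]`, `T[s][a][s2]`, `O[a][s2][o]`) are assumptions made for illustration, not something the notes prescribe. The five nested loops over \(a\), \(s\), \(o\), \(a'\), \(s'\) are where the \(|S|^{2}|A|^{2}|O|\) cost comes from.

```python
def update_alphas(alphas, R, T, O, gamma):
    """Apply one iteration of the alpha-vector update.

    Assumed (hypothetical) layouts:
      alphas[a][s]  current alpha vector for action a, evaluated at state s
      R[s][a]       reward R(s, a)
      T[s][a][s2]   transition probability T(s2 | s, a)
      O[a][s2][o]   observation probability O(o | a, s2)
    """
    n_states = len(R)
    n_actions = len(R[0])
    n_obs = len(O[0][0])

    new_alphas = [[0.0] * n_states for _ in range(n_actions)]
    for a in range(n_actions):
        for s in range(n_states):
            total = 0.0
            for o in range(n_obs):
                # Max over the next action a2 of the inner sum over next states s2.
                total += max(
                    sum(
                        O[a][s2][o] * T[s][a][s2] * alphas[a2][s2]
                        for s2 in range(n_states)
                    )
                    for a2 in range(n_actions)
                )
            new_alphas[a][s] = R[s][a] + gamma * total
    return new_alphas
```

Starting from all-zero alpha vectors and calling `update_alphas` repeatedly until the values stop changing is one way to use it; the stopping rule is left to the caller.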