SARSOP

Big problem: curse of dimensionality and the curse of history.

PBVI and HSVI tries to sample the belief simplex generally. But instead we should try to sample OPTIMAL REACHABLE SET.

Background

Recall one-step lookahead in POMDP. The difficulty here is that the sum over all of the alpha-vectors is still very hard. So, in PBVI, we only do this to a small set of beliefs

SARSOP

sample \(R^{*}\)
backup
prune

Initialization

choose an initial belief, action, and observation using “suitable heuristics”. Initialize a set of alpha vectors corresponding to this belief.

Sampling

compute \(b’ = update(b,a,o)\)
add node \(b’\) to the tree

So far, this is just PBVI, HSVI. The point is that we only want to update the reachable set.

To do this, we now take the new \(b’\), we give an upper bound via FIB, and a lower bound with blind lower bound over the alpha vectors you already got.

Now:

where \(\mathcal{R}^{*}\) is a reachable space tree set from \(b_0\).

Backup

PBVI Backup on the beliefs you sampled to update your alpha vectors.

Pruning

We can prune anything that’s suboptimal: every step, we perform alpha vector pruning at every step.

Limitations

HSVI is better at handling systems with lower uncertainty.

Does not make an attempt at challenges of dimensionality
Make unproven theoretical claims
Don’t compare to domain contraction
Compare algorithm to a single alternative
Compared to continuous state spaces
Subsection headings