Big problem: curse of dimensionality and the curse of history.
PBVI and HSVI tries to sample the belief simplex generally. But instead we should try to sample OPTIMAL REACHABLE SET.
Background
Recall one-step lookahead in POMDP. The difficulty here is that the sum over all of the alpha-vectors is still very hard. So, in PBVI, we only do this to a small set of beliefs
SARSOP
- sample \(R^{*}\)
- backup
- prune
Initialization
choose an initial belief, action, and observation using “suitable heuristics”. Initialize a set of alpha vectors corresponding to this belief.
Sampling
- compute \(b’ = update(b,a,o)\)
- add node \(b’\) to the tree
So far, this is just PBVI, HSVI. The point is that we only want to update the reachable set.
To do this, we now take the new \(b’\), we give an upper bound via FIB, and a lower bound with blind lower bound over the alpha vectors you already got.
Now:
where \(\mathcal{R}^{*}\) is a reachable space tree set from \(b_0\).
Backup
PBVI Backup on the beliefs you sampled to update your alpha vectors.
Pruning
We can prune anything that’s suboptimal: every step, we perform alpha vector pruning at every step.
Limitations
HSVI is better at handling systems with lower uncertainty.
- Does not make an attempt at challenges of dimensionality
- Make unproven theoretical claims
- Don’t compare to domain contraction
- Compare algorithm to a single alternative
- Compared to continuous state spaces
- Subsection headings