One-Liner
“the impact of approximation decreases with the number of steps from the root node”
Novelty
- combines an alpha-vector value function representation with forward heuristic search to select which belief states to back up
- 100x faster than PBVI
- scales to huge environments
Goal: minimize “regret” (the difference in value between the returned policy and the optimal policy)
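A rough sketch of how the regret goal ties to the bounds: HSVI keeps an upper and a lower bound on the optimal value, and the regret at the initial belief is at most the gap between them. The gap a belief is allowed to have is loosened by the discount factor with depth, which is the idea behind the one-liner. The function names (`lower_bound`, `width`, `excess_uncertainty`), the toy alpha-vectors, and the upper-bound value are illustrative assumptions, not taken from the paper's code.

```python
def lower_bound(alpha_vectors, belief):
    """Lower bound at a belief: max over alpha-vectors of the dot product alpha . b."""
    return max(sum(a * p for a, p in zip(alpha, belief)) for alpha in alpha_vectors)


def width(upper, lower):
    """Gap between the two bounds at one belief; bounds the regret there."""
    return upper - lower


def excess_uncertainty(upper, lower, epsilon, gamma, depth):
    """Bound gap minus the tolerance allowed at this depth.

    The tolerance epsilon * gamma**(-depth) grows with depth, so the same gap
    matters less the further the belief is from the root.
    """
    return width(upper, lower) - epsilon * gamma ** (-depth)


# Toy two-state example (all numbers are made up for illustration).
alpha_vectors = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
b0 = [0.5, 0.5]
upper_b0 = 1.0  # assumed upper-bound value at b0
lower_b0 = lower_bound(alpha_vectors, b0)
regret_bound = width(upper_b0, lower_b0)

for depth in range(3):
    print(depth, excess_uncertainty(upper_b0, lower_b0, epsilon=0.1, gamma=0.5, depth=depth))
```

A search trial would keep descending only while the excess uncertainty stays positive; once it reaches zero or below, the belief is "good enough" for its depth.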
Novelty HSVI 2
- evaluates the upper bound by projecting a belief onto the convex hull of the stored upper-bound points (HSVI2: via approximate convex hull projection)
- uses the blind-policy lower bound
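A minimal sketch of the two bound ideas above, under stated assumptions. The blind-policy lower bound builds one alpha-vector per action by evaluating "always repeat this action". For the upper bound, the sketch uses sawtooth interpolation, one common cheap stand-in for an exact convex hull projection; whether it matches HSVI2's approximation in every detail is an assumption here. The function names, the `R[s][a]` / `T[s][a][s2]` layout, and the toy numbers are all hypothetical.

```python
def blind_lower_bound(R, T, gamma, iters=200):
    """One alpha-vector per action: value of repeating that action forever.

    R[s][a] is the reward, T[s][a][s2] the transition probability (assumed layout).
    Uses fixed-point iteration from zero, so the result is approximate.
    """
    n_states, n_actions = len(R), len(R[0])
    alphas = []
    for a in range(n_actions):
        alpha = [0.0] * n_states
        for _ in range(iters):
            alpha = [
                R[s][a] + gamma * sum(T[s][a][s2] * alpha[s2] for s2 in range(n_states))
                for s in range(n_states)
            ]
        alphas.append(alpha)
    return alphas


def sawtooth_upper_bound(corner_values, points, belief):
    """Approximate upper-bound value at a belief.

    corner_values[s] is the bound at the belief concentrated on state s;
    points is a list of (belief_i, value_i) interior points.
    """
    v0 = sum(b * v for b, v in zip(belief, corner_values))  # corner interpolation
    best = v0
    for b_i, v_i in points:
        v0_i = sum(b * v for b, v in zip(b_i, corner_values))
        # Largest fraction of b_i that fits inside the query belief.
        ratio = min(belief[s] / b_i[s] for s in range(len(belief)) if b_i[s] > 0)
        best = min(best, v0 + ratio * (v_i - v0_i))
    return best


# Toy model: two states that never change, two actions (made-up numbers).
R = [[1.0, 0.0], [0.0, 1.0]]
T = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
alphas = blind_lower_bound(R, T, gamma=0.5)

corners = [2.0, 2.0]
points = [([0.5, 0.5], 1.0)]
ub_mid = sawtooth_upper_bound(corners, points, [0.5, 0.5])
ub_off = sawtooth_upper_bound(corners, points, [0.75, 0.25])
ub_corner = sawtooth_upper_bound(corners, points, [1.0, 0.0])
```

The sawtooth version only ever combines one interior point with the corners, which avoids solving a linear program per query at the cost of a looser bound.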