POMDPs with continuous actions (and observations) are hard. The usual starting points: POMCP, or an explicit belief update plus MCTS.
So instead, let's try improving on that. Unlike plain POMCP, each node keeps not only a particle collection \(B(h)\) but also \(W(h)\), the weights of the sampled states. Naively applying POMCP to continuous states and observations gives an extremely wide tree, because each sampled state and observation differs from every one seen before, so every node is created fresh and visited only once.
double progressive widening
We want to limit observation branching by sampling: keep only a bounded number of sampled observation children per action node. Applied naively, though, each observation node still contains a single state particle, so the search converges to a suboptimal QMDP-like policy: beyond the first step the tree behaves as if there were no state uncertainty, so it never values information-gathering actions.
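A minimal sketch of the widening test that double progressive widening applies to both the action and observation branches; the function name and the way \(k\) and \(\alpha\) are passed in are illustrative, not from any particular library.

```python
def may_add_child(num_children: int, num_visits: int, k: float, alpha: float) -> bool:
    """Allow a new child only while the number of children is at most k * N**alpha,
    so the branching factor grows slowly with the node's visit count N."""
    return num_children <= k * num_visits ** alpha
```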
POMCPOW
- Get an action from the ActionProgressiveWiden procedure (progressive widening on the action branch, with UCB over the existing action children).
- Generate an observation from the simulator. If the action node already has too many observation children, discard the new observation and reuse one of the existing children instead.
- Either way, append the sampled next state to that observation node's particle collection, weighted by the observation likelihood \(Z(o \mid s, a, s')\) (see the sketch after this list).
Hyperparameters: \(k\) and \(\alpha\) bound the number of children per node at \(k N^\alpha\) (with separate values for the action and observation branches), and \(C\) is the UCB exploration constant.
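A hedged Python sketch of the POMCPOW tree-search step described above. It assumes a generative model `gen(s, a) -> (s_next, o, r)`, an observation density `obs_pdf(o, s, a, s_next)` playing the role of \(Z(o \mid s, a, s')\), a reward function `reward(s, a, s_next)`, and a `sample_action()` callable; all names and the dictionary-based bookkeeping are illustrative, not a reference implementation.

```python
import math
import random
from collections import defaultdict

class POMCPOWSketch:
    """Sketch of POMCPOW's search. Histories are tuples of alternating actions
    and observations (both assumed hashable)."""

    def __init__(self, gen, obs_pdf, reward, sample_action, gamma=0.95,
                 k_a=10.0, alpha_a=0.5, k_o=5.0, alpha_o=0.05,
                 c_ucb=10.0, max_depth=20):
        self.gen, self.obs_pdf, self.reward = gen, obs_pdf, reward
        self.sample_action = sample_action   # callable returning a random action
        self.gamma, self.c_ucb, self.max_depth = gamma, c_ucb, max_depth
        self.k_a, self.alpha_a, self.k_o, self.alpha_o = k_a, alpha_a, k_o, alpha_o
        self.N = defaultdict(int)      # visit counts for h and (h, a)
        self.Q = defaultdict(float)    # action-value estimates for (h, a)
        self.M = defaultdict(int)      # how often each observation was generated
        self.B = defaultdict(list)     # state particles B(hao)
        self.W = defaultdict(list)     # particle weights W(hao), i.e. Z(o|s,a,s')
        self.Ca = defaultdict(list)    # action children of h
        self.Co = defaultdict(list)    # observation children of (h, a)

    def _action_prog_widen(self, h):
        # Add a new action while |Ca(h)| <= k_a * N(h)^alpha_a, then pick by UCB.
        if len(self.Ca[h]) <= self.k_a * self.N[h] ** self.alpha_a:
            self.Ca[h].append(self.sample_action())
        def ucb(a):
            ha = h + (a,)
            if self.N[ha] == 0:
                return float("inf")
            return self.Q[ha] + self.c_ucb * math.sqrt(math.log(self.N[h] + 1) / self.N[ha])
        return max(self.Ca[h], key=ucb)

    def _rollout(self, s, depth):
        # Random-policy rollout used to value newly created observation nodes.
        total, discount = 0.0, 1.0
        for _ in range(depth):
            a = self.sample_action()
            s, _, r = self.gen(s, a)
            total += discount * r
            discount *= self.gamma
        return total

    def simulate(self, s, h, depth):
        if depth == 0:
            return 0.0
        a = self._action_prog_widen(h)
        ha = h + (a,)
        s2, o, r = self.gen(s, a)
        if len(self.Co[ha]) <= self.k_o * self.N[ha] ** self.alpha_o:
            self.M[ha + (o,)] += 1     # keep the newly generated observation
        else:
            # Too many observation children: discard o and reuse an existing
            # child, chosen in proportion to how often it has been generated.
            weights = [self.M[ha + (x,)] for x in self.Co[ha]]
            o = random.choices(self.Co[ha], weights=weights)[0]
        hao = ha + (o,)
        # Append s' to the node's weighted particle collection (assumes the
        # observation density is positive for at least one particle).
        self.B[hao].append(s2)
        self.W[hao].append(self.obs_pdf(o, s, a, s2))
        if o not in self.Co[ha]:
            # New observation node: add it and estimate its value with a rollout.
            self.Co[ha].append(o)
            total = r + self.gamma * self._rollout(s2, depth - 1)
        else:
            # Existing node: resample s' from the weighted particles and recurse.
            s2 = random.choices(self.B[hao], weights=self.W[hao])[0]
            r = self.reward(s, a, s2)
            total = r + self.gamma * self.simulate(s2, hao, depth - 1)
        self.N[h] += 1
        self.N[ha] += 1
        self.Q[ha] += (total - self.Q[ha]) / self.N[ha]
        return total

    def plan(self, sample_root_state, n_iters=1000):
        # Run simulations from the empty-history root, sampling the root state
        # from the current belief each time, then return the best root action.
        for _ in range(n_iters):
            self.simulate(sample_root_state(), (), self.max_depth)
        return max(self.Ca[()], key=lambda a: self.Q[(a,)])
```

With discrete observations the `o not in self.Co[ha]` check lets repeated observations share a node; with continuous observations every kept observation starts a new node, which is exactly the branching the widening limit is guarding against.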
PFT-DPW (Particle Filter Trees with Double Progressive Widening)
- MCTS, run over belief nodes rather than state nodes
- Particle filters to represent and update the belief at each node
- Double Progressive Widening on both the action and observation branches (a belief-transition sketch follows below)
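A hedged sketch of the belief-MDP transition a PFT-style search would use at each node, assuming the same `gen(s, a) -> (s_next, o, r)` and `obs_pdf(o, s, a, s_next)` interfaces as above; the function name `pft_step` and the list-of-(state, weight) belief representation are illustrative. MCTS with the widening test from the first sketch is then run over the resulting belief nodes.

```python
import random

def pft_step(belief, a, gen, obs_pdf):
    """Simulate one belief-MDP transition: pick a representative particle,
    generate an observation from it, then run a weighted particle-filter
    update of the whole belief against that observation.
    `belief` is a list of (state, weight) pairs."""
    states, weights = zip(*belief)
    # Sample one state from the belief to generate the observation branch.
    s = random.choices(states, weights=weights)[0]
    _, o, _ = gen(s, a)
    total_w = sum(weights)
    new_belief, reward = [], 0.0
    for si, wi in belief:
        si2, _, ri = gen(si, a)                # propagate every particle
        wi2 = wi * obs_pdf(o, si, a, si2)      # reweight by observation likelihood
        new_belief.append((si2, wi2))
        reward += (wi / total_w) * ri          # belief reward = weighted mean reward
    # Weights can be normalized (and particles resampled) here, as in any particle filter.
    return new_belief, o, reward
```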