POMDPs with continuous actions (and observations) are hard. The usual starting points: POMCP, or an explicit belief update plus MCTS.
So instead, let's try improving on that. Unlike plain POMCP, each node keeps not only a particle collection \(B(h)\) but also \(W(h)\), the weights of the sampled states. Naively applying POMCP to continuous states and observations gives an extremely wide tree, because each sampled state and observation differs from every one seen before, so every node is created fresh and visited only once.
double progressive widening
We want to limit observation branching by sampling: keep only a bounded number of sampled observation children per action node. Applied naively, though, each observation node still contains a single state particle, so the search converges to a suboptimal QMDP-like policy: beyond the first step the tree behaves as if there were no state uncertainty, so it never values information-gathering actions.
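A minimal sketch of the widening test that double progressive widening applies to both the action and observation branches; the function name and the way \(k\) and \(\alpha\) are passed in are illustrative, not from any particular library.

```python
def may_add_child(num_children: int, num_visits: int, k: float, alpha: float) -> bool:
    """Allow a new child only while the number of children is at most k * N**alpha,
    so the branching factor grows slowly with the node's visit count N."""
    return num_children <= k * num_visits ** alpha
```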
POMCPOW
- Get an action from the ActionProgressiveWiden procedure (progressive widening on the action branch, with UCB over the existing action children).
- Generate an observation from the simulator. If the action node already has too many observation children, discard the new observation and reuse one of the existing children instead.
- Either way, append the sampled next state to that observation node's particle collection, weighted by the observation likelihood \(Z(o \mid s, a, s')\) (see the sketch after this list).
Hyperparameters: \(k\) and \(\alpha\) bound the number of children per node at \(k N^\alpha\) (with separate values for the action and observation branches), and \(C\) is the UCB exploration constant.
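A hedged Python sketch of the POMCPOW tree-search step described above. It assumes a generative model `gen(s, a) -> (s_next, o, r)`, an observation density `obs_pdf(o, s, a, s_next)` playing the role of \(Z(o \mid s, a, s')\), a reward function `reward(s, a, s_next)`, and a `sample_action()` callable; all names and the dictionary-based bookkeeping are illustrative, not a reference implementation.

```python
import math
import random
from collections import defaultdict

class POMCPOWSketch:
    """Sketch of POMCPOW's search. Histories are tuples of alternating actions
    and observations (both assumed hashable)."""

    def __init__(self, gen, obs_pdf, reward, sample_action, gamma=0.95,
                 k_a=10.0, alpha_a=0.5, k_o=5.0, alpha_o=0.05,
                 c_ucb=10.0, max_depth=20):
        self.gen, self.obs_pdf, self.reward = gen, obs_pdf, reward
        self.sample_action = sample_action   # callable returning a random action
        self.gamma, self.c_ucb, self.max_depth = gamma, c_ucb, max_depth
        self.k_a, self.alpha_a, self.k_o, self.alpha_o = k_a, alpha_a, k_o, alpha_o
        self.N = defaultdict(int)      # visit counts for h and (h, a)
        self.Q = defaultdict(float)    # action-value estimates for (h, a)
        self.M = defaultdict(int)      # how often each observation was generated
        self.B = defaultdict(list)     # state particles B(hao)
        self.W = defaultdict(list)     # particle weights W(hao), i.e. Z(o|s,a,s')
        self.Ca = defaultdict(list)    # action children of h
        self.Co = defaultdict(list)    # observation children of (h, a)

    def _action_prog_widen(self, h):
        # Add a new action while |Ca(h)| <= k_a * N(h)^alpha_a, then pick by UCB.
        if len(self.Ca[h]) <= self.k_a * self.N[h] ** self.alpha_a:
            self.Ca[h].append(self.sample_action())
        def ucb(a):
            ha = h + (a,)
            if self.N[ha] == 0:
                return float("inf")
            return self.Q[ha] + self.c_ucb * math.sqrt(math.log(self.N[h] + 1) / self.N[ha])
        return max(self.Ca[h], key=ucb)

    def _rollout(self, s, depth):
        # Random-policy rollout used to value newly created observation nodes.
        total, discount = 0.0, 1.0
        for _ in range(depth):
            a = self.sample_action()
            s, _, r = self.gen(s, a)
            total += discount * r
            discount *= self.gamma
        return total

    def simulate(self, s, h, depth):
        if depth == 0:
            return 0.0
        a = self._action_prog_widen(h)
        ha = h + (a,)
        s2, o, r = self.gen(s, a)
        if len(self.Co[ha]) <= self.k_o * self.N[ha] ** self.alpha_o:
            self.M[ha + (o,)] += 1     # keep the newly generated observation
        else:
            # Too many observation children: discard o and reuse an existing
            # child, chosen in proportion to how often it has been generated.
            weights = [self.M[ha + (x,)] for x in self.Co[ha]]
            o = random.choices(self.Co[ha], weights=weights)[0]
        hao = ha + (o,)
        # Append s' to the node's weighted particle collection (assumes the
        # observation density is positive for at least one particle).
        self.B[hao].append(s2)
        self.W[hao].append(self.obs_pdf(o, s, a, s2))
        if o not in self.Co[ha]:
            # New observation node: add it and estimate its value with a rollout.
            self.Co[ha].append(o)
            total = r + self.gamma * self._rollout(s2, depth - 1)
        else:
            # Existing node: resample s' from the weighted particles and recurse.
            s2 = random.choices(self.B[hao], weights=self.W[hao])[0]
            r = self.reward(s, a, s2)
            total = r + self.gamma * self.simulate(s2, hao, depth - 1)
        self.N[h] += 1
        self.N[ha] += 1
        self.Q[ha] += (total - self.Q[ha]) / self.N[ha]
        return total

    def plan(self, sample_root_state, n_iters=1000):
        # Run simulations from the empty-history root, sampling the root state
        # from the current belief each time, then return the best root action.
        for _ in range(n_iters):
            self.simulate(sample_root_state(), (), self.max_depth)
        return max(self.Ca[()], key=lambda a: self.Q[(a,)])
```

With discrete observations the `o not in self.Co[ha]` check lets repeated observations share a node; with continuous observations every kept observation starts a new node, which is exactly the branching the widening limit is guarding against.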
PFT-DPW (Particle Filter Trees with Double Progressive Widening)
- MCTS, run over belief nodes rather than state nodes
- Particle filters to represent and update the belief at each node
- Double Progressive Widening on both the action and observation branches (a belief-transition sketch follows below)
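A hedged sketch of the belief-MDP transition a PFT-style search would use at each node, assuming the same `gen(s, a) -> (s_next, o, r)` and `obs_pdf(o, s, a, s_next)` interfaces as above; the function name `pft_step` and the list-of-(state, weight) belief representation are illustrative. MCTS with the widening test from the first sketch is then run over the resulting belief nodes.

```python
import random

def pft_step(belief, a, gen, obs_pdf):
    """Simulate one belief-MDP transition: pick a representative particle,
    generate an observation from it, then run a weighted particle-filter
    update of the whole belief against that observation.
    `belief` is a list of (state, weight) pairs."""
    states, weights = zip(*belief)
    # Sample one state from the belief to generate the observation branch.
    s = random.choices(states, weights=weights)[0]
    _, o, _ = gen(s, a)
    total_w = sum(weights)
    new_belief, reward = [], 0.0
    for si, wi in belief:
        si2, _, ri = gen(si, a)                # propagate every particle
        wi2 = wi * obs_pdf(o, si, a, si2)      # reweight by observation likelihood
        new_belief.append((si2, wi2))
        reward += (wi / total_w) * ri          # belief reward = weighted mean reward
    # Weights can be normalized (and particles resampled) here, as in any particle filter.
    return new_belief, o, reward
```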