Motivation
Consider navigating through a large crowd with sudden changes: the events that matter most are unlikely, so they rarely show up in samples drawn from the likely states. So, we want to bring in another distribution based on importance rather than likelihood.
Goals
DESPOT with Importance Sampling
- take our initial belief
- sample trajectories according to the importance-sampling distribution
- calculate the values of those trajectories
- obtain a value estimate as the weighted average of those values (see the sketch below)
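To make the loop concrete, here is a minimal sketch in Python; `sample_q`, `rollout`, and `weight` are hypothetical callables standing in for DESPOT's actual scenario machinery:

```python
def is_value_estimate(sample_q, rollout, weight, n=1000):
    """Sketch of the importance-sampled root value estimate.

    sample_q() -> one trajectory xi drawn from the importance distribution q
    rollout(xi) -> total discounted reward v of that trajectory
    weight(xi) -> importance weight w(xi) = p(xi | b, pi) / q(xi | b, pi)
    """
    total, norm = 0.0, 0.0
    for _ in range(n):
        xi = sample_q()    # sample from the importance distribution
        v = rollout(xi)    # value of the sampled trajectory
        w = weight(xi)     # ratio defined in the next section
        total += w * v
        norm += w
    return total / norm    # weighted average of the values
```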
Importance Sampling of trajectories
We define an importance distribution over trajectories \(\xi\):
\begin{equation} q(\xi | b,\pi) = q(s_0) \prod_{t=0}^{D-1} q(s_{t+1}, o_{t+1} | s_{t}, a_{t+1}) \end{equation}
where \(D\) is the planning depth.
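The corresponding importance weight of a trajectory factors step by step into ratios of the true model \(p\) over \(q\). A sketch, where all callables are hypothetical placeholders for the model's densities:

```python
def trajectory_weight(states, actions, obs, p0, q0, p_step, q_step):
    """w(xi) = p(xi | b, pi) / q(xi | b, pi), computed factor by factor.

    states[t]  = s_t for t = 0..D
    actions[t] = a_{t+1}; obs[t] = o_{t+1} (the step taken out of s_t)
    p0, q0: initial-state densities; p_step, q_step: (s', o' | s, a) densities.
    """
    w = p0(states[0]) / q0(states[0])
    for t in range(len(states) - 1):
        w *= (p_step(states[t + 1], obs[t], states[t], actions[t])
              / q_step(states[t + 1], obs[t], states[t], actions[t]))
    return w
```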
Background
Importance Sampling
Suppose you have a function \(f(s)\) that is hard to integrate directly, yet you want:
\begin{equation} \mu = \mathbb{E}_{p}(f(s)) = \int_{S} f(s)p(s) \dd{s} \end{equation}
How would you sample \(f(s)\) effectively such that you end up with an estimate \(\hat{\mu}\) that's close enough?
Well, what if you had an importance distribution \(q: S \to [0,1]\), which tells you how “important” a particular state is to the expected value? Then we can weight each sample by a new quantity called the “importance weight”:
\begin{equation} w(s) = \frac{p(s)}{q(s)} \end{equation}
This gives us the self-normalized estimator:
\begin{equation} \hat{\mu} = \frac{\sum_{n} f(s_{n}) w(s_{n})}{\sum_{n} w(s_{n})} \end{equation}
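A concrete toy run mirroring the rare-event motivation above: estimating the tail probability \(P(s > 3)\) under \(p = N(0,1)\) by sampling from the shifted proposal \(q = N(3,1)\) (the choice of \(q\) here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: mu = E_p[f(s)] where f is the indicator of a rare event under p,
# mirroring the "unlikely but important" motivation above.
f = lambda s: (s > 3.0).astype(float)
p = lambda s: np.exp(-s**2 / 2) / np.sqrt(2 * np.pi)          # N(0, 1) density
q = lambda s: np.exp(-(s - 3.0)**2 / 2) / np.sqrt(2 * np.pi)  # N(3, 1) density

n = 10_000
s_mc = rng.standard_normal(n)      # plain Monte Carlo: sample from p
s_is = rng.normal(3.0, 1.0, n)     # importance sampling: sample from q
w = p(s_is) / q(s_is)              # importance weights w(s) = p(s)/q(s)

print("plain MC estimate:", f(s_mc).mean())
print("IS estimate:      ", (f(s_is) * w).sum() / w.sum())
print("true value:       ", 0.00135)   # 1 - Phi(3) for the standard normal
```

With 10,000 samples, the plain estimate rests on roughly a dozen tail hits, while every importance sample lands in the region that matters; the weights then correct for the shifted sampling.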
Theoretical guarantees
There exists an optimal importance distribution for sampling states from the belief, which minimizes the variance of the estimator:
\begin{equation} q(s) = \frac{b(s)}{w_{\pi}(s)} \end{equation}
where
\begin{equation} w_{\pi}(s) = \frac{\mathbb{E}_{b} \qty( \sqrt{[\mathbb{E}(v|s, \pi )]^{2} + Var(v|s, \pi )})}{\sqrt{[\mathbb{E}(v|s, \pi )]^{2} + Var(v|s, \pi )}} \end{equation}
which measures how important a state is, where \(v\) is the total discounted reward obtained by following policy \(\pi\). The numerator is just the belief-average of the denominator, i.e. the normalizing constant that makes \(q\) a proper distribution; states with large expected value or high value variance get small \(w_{\pi}(s)\) and are therefore sampled more often.
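Over a discrete state set, \(q\) can be computed directly from per-state value statistics. A minimal sketch, assuming you already have \(b(s)\), \(\mathbb{E}(v|s,\pi)\), and \(Var(v|s,\pi)\) stored as dicts (names hypothetical):

```python
import math

def optimal_importance_distribution(belief, mean_v, var_v):
    """q(s) = b(s) / w_pi(s) over a discrete state set (a sketch).

    belief[s] = b(s); mean_v[s] = E(v | s, pi); var_v[s] = Var(v | s, pi).
    """
    score = {s: math.sqrt(mean_v[s]**2 + var_v[s]) for s in belief}
    z = sum(belief[s] * score[s] for s in belief)  # E_b[sqrt(...)]: the normalizer
    return {s: belief[s] * score[s] / z for s in belief}
```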