Motivation
Consider navigating through a large crowd with sudden changes: the events that matter most are unlikely, so they rarely show up in samples drawn from the likely states. So, we want to bring in another distribution based on importance rather than likelihood.
Goals
DESPOT with Importance Sampling
- take our initial belief
- sample trajectories according to the importance-sampling distribution
- calculate the values of those trajectories
- obtain a value estimate as the weighted average of those values (see the sketch below)
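To make the loop concrete, here is a minimal sketch in Python; `sample_q`, `rollout`, and `weight` are hypothetical callables standing in for DESPOT's actual scenario machinery:

```python
def is_value_estimate(sample_q, rollout, weight, n=1000):
    """Sketch of the importance-sampled root value estimate.

    sample_q() -> one trajectory xi drawn from the importance distribution q
    rollout(xi) -> total discounted reward v of that trajectory
    weight(xi) -> importance weight w(xi) = p(xi | b, pi) / q(xi | b, pi)
    """
    total, norm = 0.0, 0.0
    for _ in range(n):
        xi = sample_q()    # sample from the importance distribution
        v = rollout(xi)    # value of the sampled trajectory
        w = weight(xi)     # ratio defined in the next section
        total += w * v
        norm += w
    return total / norm    # weighted average of the values
```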
Importance Sampling of trajectories
We define an importance distribution over trajectories \(\xi\):
\begin{equation} q(\xi | b,\pi) = q(s_0) \prod_{t=0}^{D-1} q(s_{t+1}, o_{t+1} | s_{t}, a_{t+1}) \end{equation}
where \(D\) is the planning depth.
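The corresponding importance weight of a trajectory factors step by step into ratios of the true model \(p\) over \(q\). A sketch, where all callables are hypothetical placeholders for the model's densities:

```python
def trajectory_weight(states, actions, obs, p0, q0, p_step, q_step):
    """w(xi) = p(xi | b, pi) / q(xi | b, pi), computed factor by factor.

    states[t]  = s_t for t = 0..D
    actions[t] = a_{t+1}; obs[t] = o_{t+1} (the step taken out of s_t)
    p0, q0: initial-state densities; p_step, q_step: (s', o' | s, a) densities.
    """
    w = p0(states[0]) / q0(states[0])
    for t in range(len(states) - 1):
        w *= (p_step(states[t + 1], obs[t], states[t], actions[t])
              / q_step(states[t + 1], obs[t], states[t], actions[t]))
    return w
```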
Background
Importance Sampling
Suppose you have a function \(f(s)\) that is hard to integrate directly, yet you want:
\begin{equation} \mu = \mathbb{E}_{p}(f(s)) = \int_{S} f(s)p(s) \dd{s} \end{equation}
How would you sample \(f(s)\) effectively such that you end up with an estimate \(\hat{\mu}\) that's close enough?
Well, what if you had an importance distribution \(q: S \to [0,1]\), which tells you how “important” a particular state is to the expected value? Then we can weight each sample by a new quantity called the “importance weight”:
\begin{equation} w(s) = \frac{p(s)}{q(s)} \end{equation}
This gives us the self-normalized estimator:
\begin{equation} \hat{\mu} = \frac{\sum_{n} f(s_{n}) w(s_{n})}{\sum_{n} w(s_{n})} \end{equation}
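A concrete toy run mirroring the rare-event motivation above: estimating the tail probability \(P(s > 3)\) under \(p = N(0,1)\) by sampling from the shifted proposal \(q = N(3,1)\) (the choice of \(q\) here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: mu = E_p[f(s)] where f is the indicator of a rare event under p,
# mirroring the "unlikely but important" motivation above.
f = lambda s: (s > 3.0).astype(float)
p = lambda s: np.exp(-s**2 / 2) / np.sqrt(2 * np.pi)          # N(0, 1) density
q = lambda s: np.exp(-(s - 3.0)**2 / 2) / np.sqrt(2 * np.pi)  # N(3, 1) density

n = 10_000
s_mc = rng.standard_normal(n)      # plain Monte Carlo: sample from p
s_is = rng.normal(3.0, 1.0, n)     # importance sampling: sample from q
w = p(s_is) / q(s_is)              # importance weights w(s) = p(s)/q(s)

print("plain MC estimate:", f(s_mc).mean())
print("IS estimate:      ", (f(s_is) * w).sum() / w.sum())
print("true value:       ", 0.00135)   # 1 - Phi(3) for the standard normal
```

With 10,000 samples, the plain estimate rests on roughly a dozen tail hits, while every importance sample lands in the region that matters; the weights then correct for the shifted sampling.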
Theoretical guarantees
There exists an optimal importance distribution for sampling states from the belief, which minimizes the variance of the estimator:
\begin{equation} q(s) = \frac{b(s)}{w_{\pi}(s)} \end{equation}
where
\begin{equation} w_{\pi}(s) = \frac{\mathbb{E}_{b} \qty( \sqrt{[\mathbb{E}(v|s, \pi )]^{2} + Var(v|s, \pi )})}{\sqrt{[\mathbb{E}(v|s, \pi )]^{2} + Var(v|s, \pi )}} \end{equation}
which measures how important a state is, where \(v\) is the total discounted reward obtained by following policy \(\pi\). The numerator is just the belief-average of the denominator, i.e. the normalizing constant that makes \(q\) a proper distribution; states with large expected value or high value variance get small \(w_{\pi}(s)\) and are therefore sampled more often.
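Over a discrete state set, \(q\) can be computed directly from per-state value statistics. A minimal sketch, assuming you already have \(b(s)\), \(\mathbb{E}(v|s,\pi)\), and \(Var(v|s,\pi)\) stored as dicts (names hypothetical):

```python
import math

def optimal_importance_distribution(belief, mean_v, var_v):
    """q(s) = b(s) / w_pi(s) over a discrete state set (a sketch).

    belief[s] = b(s); mean_v[s] = E(v | s, pi); var_v[s] = Var(v | s, pi).
    """
    score = {s: math.sqrt(mean_v[s]**2 + var_v[s]) for s in belief}
    z = sum(belief[s] * score[s] for s in belief)  # E_b[sqrt(...)]: the normalizer
    return {s: belief[s] * score[s] / z for s in belief}
```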