We want to solve large POMDPs in the real world, but the belief space is enormous: a belief over \(n\) states is a point in an \((n-1)\)-dimensional simplex. The saving grace is that the set of beliefs actually reachable from a given initial belief is typically much smaller, lying near a low-dimensional surface of that simplex.
Why is vanilla PCA bad?
PCA can be read as a denoising procedure: it assumes the observed data are points on a low-dimensional subspace corrupted by Gaussian noise. That assumption does not hold here: beliefs are probability distributions (non-negative, summing to one, often sparse and sharply peaked), so the deviations from the low-dimensional structure are not Gaussian, and PCA reconstructions can go negative.
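To make the failure concrete, here is a minimal sketch, assuming made-up sparse beliefs drawn from a Dirichlet, showing that a rank-limited PCA reconstruction of peaked belief vectors can contain negative entries and so fails to be a valid belief:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 200 sparse, peaked "beliefs" over 50 states.
n_beliefs, n_states = 200, 50
B = rng.dirichlet(np.full(n_states, 0.05), size=n_beliefs)

# Vanilla PCA: SVD of the centered data, keep 3 components.
mean = B.mean(axis=0)
U, s, Vt = np.linalg.svd(B - mean, full_matrices=False)
k = 3
B_hat = mean + U[:, :k] * s[:k] @ Vt[:k]   # rank-k reconstruction

print("min reconstructed entry:", B_hat.min())             # typically < 0
print("fraction of negative entries:", (B_hat < 0).mean())
```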
Better PCA: E-PCA
Vanilla PCA finds a low-rank factorization of the data by minimizing the squared-error (Euclidean) loss
\begin{equation} L(U,V) = \| X - UV \|^{2} \end{equation}
where \(X\) is the matrix of sampled beliefs, \(U\) is the feature (basis) matrix, and \(V\) holds the low-dimensional coefficients.

E-PCA keeps the factorization but replaces the squared Euclidean loss with a generalized Bregman loss. Specifically, the divergence between a data value \(y\) and its reconstruction parameter \(z\) is
\begin{equation} D(y \,\|\, z) = F(z) - yz + F^{*}(y) \end{equation}
where \(F\) is any problem-specific convex function you choose and \(F^{*}\) is its convex dual. For beliefs, choosing \(F(z) = e^{z}\) makes the reconstruction \(\exp(UV)\); the exponential link forces the reconstructed beliefs to be non-negative, which the squared-error loss cannot guarantee.
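As a concrete sketch of minimizing this loss with the exponential link \(F(z) = e^{z}\): the total loss over the belief matrix is \(\sum_{ij} \exp((UV)_{ij}) - X_{ij}(UV)_{ij}\), which equals the summed Bregman divergence up to a constant in \(U, V\). The plain gradient descent below is my own choice for illustration, not the Newton-style updates used in the original E-PCA work, and all names are invented:

```python
import numpy as np

def epca_exponential(X, k, steps=3000, lr=1e-2, seed=0):
    """Fit X ~ exp(U @ V) by gradient descent on the E-PCA loss with
    F(z) = e^z. X: (n_states, n_beliefs), columns are sampled beliefs."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = 0.01 * rng.standard_normal((n, k))
    V = 0.01 * rng.standard_normal((k, m))
    for _ in range(steps):
        Z = U @ V
        G = np.exp(Z) - X             # gradient of the loss w.r.t. Z
        gU, gV = G @ V.T, U.T @ G     # chain rule through Z = U @ V
        U -= lr * gU                  # step size may need tuning
        V -= lr * gV
    return U, V

# Toy usage: compress 100 sampled beliefs over 30 states to 3 dimensions.
rng = np.random.default_rng(1)
X = rng.dirichlet(np.full(30, 0.1), size=100).T   # columns = beliefs
U, V = epca_exponential(X, k=3)
B_hat = np.exp(U @ V)                 # non-negative by construction
print("min reconstructed entry:", B_hat.min())    # always >= 0
```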
Overall Methods
- collect sample beliefs, e.g. by simulating the POMDP under a random or heuristic policy
- apply E-PCA to the sampled beliefs to get low-dimensional coordinates
- discretize the E-PCA'd beliefs into a new state space \(S\)
- recompute \(R\) (\(R(b) = b \cdot R(s)\), the expected immediate reward under the belief) and \(T\) (for each discretized \(b\) and action \(a\), we simply sample \(o\), calculate \(update(b,a,o)\), and snap the result to the nearest discretized belief) for that state space \(S\); congratulations, you are now solving an MDP
- run value iteration (a sketch of these last steps follows this list)
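A minimal end-to-end sketch of the discretize-and-solve steps, under invented assumptions: a hypothetical two-state POMDP (the \(T\), \(O\), \(R\) arrays below are made up), the belief used directly as the low-dimensional coordinate (with two states the belief is already one-dimensional, so the E-PCA step is skipped), nearest-neighbor discretization, Monte-Carlo transition estimates, and tabular value iteration:

```python
import numpy as np

rng = np.random.default_rng(2)
gamma = 0.95

# Hypothetical two-state, two-action, two-observation POMDP.
# T[a, s, s'] transitions, O[a, s', o] observations, R_s[s, a] rewards.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.5, 0.5]]])
R_s = np.array([[1.0, 0.0],
                [0.0, 1.0]])

def update(b, a, o):
    """Bayes filter: posterior belief after taking a and observing o."""
    b_new = O[a, :, o] * (b @ T[a])
    return b_new / b_new.sum()

# Discretize the belief coordinate b(s0) into a grid: this grid is the
# new state space S (standing in for discretized E-PCA coordinates).
grid = np.linspace(0.0, 1.0, 21)
beliefs = np.stack([grid, 1.0 - grid], axis=1)

def nearest(b):
    return int(np.argmin(np.abs(grid - b[0])))

n_S, n_A, n_samples = len(grid), 2, 500
R_tilde = beliefs @ R_s                       # R(b) = b . R(s)
T_tilde = np.zeros((n_S, n_A, n_S))
for i, b in enumerate(beliefs):
    for a in range(n_A):
        p_o = O[a].T @ (b @ T[a])             # P(o | b, a)
        for _ in range(n_samples):            # sample o, apply update(b, a, o)
            o = rng.choice(2, p=p_o / p_o.sum())
            T_tilde[i, a, nearest(update(b, a, o))] += 1
T_tilde /= n_samples

# Value iteration on the resulting belief-space MDP.
V = np.zeros(n_S)
for _ in range(500):
    V = (R_tilde + gamma * T_tilde @ V).max(axis=1)
print("V over the belief grid:", np.round(V, 2))
```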