a controller is a policy representation that maintains its own state.
constituents
- \(X\): a set of nodes (hidden, internal states)
- \(\Psi(a|x)\): probability of taking action \(a\) while in node \(x\)
- \(\eta(x'|x,a,o)\): probability of transitioning to node \(x'\) after taking action \(a\) in node \(x\) and observing \(o\)
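As a concrete picture of these constituents, here is a minimal sketch of a controller as a data structure, assuming discrete nodes, actions, and observations; the class and field names are illustrative, not from any particular library.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Controller:
    """Minimal finite state controller sketch (names are illustrative)."""
    nodes: List[str]                              # X: hidden internal states
    psi: Dict[Tuple[str, str], float]             # psi[(x, a)] = P(a | x)
    eta: Dict[Tuple[str, str, str, str], float]   # eta[(x, a, o, x_next)] = P(x' | x, a, o)

    def action_prob(self, a: str, x: str) -> float:
        return self.psi.get((x, a), 0.0)

    def node_transition_prob(self, x_next: str, x: str, a: str, o: str) -> float:
        return self.eta.get((x, a, o, x_next), 0.0)
```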
requirements
Controllers are nice because we:
- don't have to maintain a belief over time: we only need an initial belief to pick a starting node, and after that the controller's internal state carries the memory for us
- controllers can be more compact than conditional plans, which grow with the planning horizon
additional information
finite state controller
A finite state controller has a finite number of hidden internal states (nodes).
Consider the crying baby problem. We will declare two internal states:
\begin{equation} x_1, x_2 \end{equation}
Given our observations and our internal states, we can declare node transitions \(\eta\) and an action probability \(\Psi\). We essentially declare a policy vis-à-vis the observations, and it can encode a sequence: for instance, to get a policy whereby the baby crying twice triggers feeding, we use the nodes to remember recent cries, transition between them with \(\eta\) based on each observation, and have \(\Psi\) select feeding only from the appropriate node.
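As an illustration, here is a hedged sketch of laying out \(\Psi\) and \(\eta\) as plain tables for the crying baby problem. The node, action, and observation labels ("ignore", "feed", "cry", "quiet") are assumed names; for brevity this version feeds whenever the previous observation was a cry rather than counting two cries.

```python
nodes = ["x1", "x2"]   # x1: last observation was quiet, x2: last observation was a cry

# Psi(a | x): a deterministic action per node
Psi = {
    ("x1", "ignore"): 1.0,
    ("x2", "feed"):   1.0,
}

# eta(x' | x, a, o): move to x2 on a cry, back to x1 on quiet, regardless of action
eta = {}
for x in nodes:
    for a in ("ignore", "feed"):
        eta[(x, a, "cry",   "x2")] = 1.0
        eta[(x, a, "quiet", "x1")] = 1.0
```

Remembering a longer observation sequence (such as two consecutive cries before feeding) just means adding more nodes to carry that memory.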
finite state controller evaluation
\begin{equation} U(x, s) = \sum_{a} \Psi(a|x) \qty[R(s,a) + \gamma \qty(\sum_{s'} T(s'|s,a) \sum_{o} O(o|a, s') \sum_{x'} \eta(x'|x,a,o) U(x', s')) ] \end{equation}
which mirrors conditional plan evaluation, except that we additionally take an expectation over the controller's stochastic action choice \(\Psi\) and node transition \(\eta\).
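A minimal sketch of this evaluation as fixed-point iteration, assuming the POMDP and controller are given as dense NumPy arrays; the array layouts and iteration count are illustrative choices, not a reference implementation.

```python
import numpy as np

def evaluate_fsc(T, R, O, Psi, eta, gamma=0.9, iters=500):
    """
    T[s, a, s']       state transition probabilities
    R[s, a]           reward
    O[a, s', o]       observation probabilities
    Psi[x, a]         action distribution of each controller node
    eta[x, a, o, x']  controller node transition probabilities
    Returns U[x, s].
    """
    n_x, n_s = Psi.shape[0], T.shape[0]
    U = np.zeros((n_x, n_s))
    for _ in range(iters):
        # cont[x, s, a] = sum_{s', o, x'} T(s'|s,a) O(o|a,s') eta(x'|x,a,o) U(x', s')
        cont = np.einsum("sat,ato,xaoy,yt->xsa", T, O, eta, U)
        q = R[None, :, :] + gamma * cont          # q[x, s, a]
        # mix over the controller's stochastic action choice Psi(a | x)
        U = np.einsum("xa,xsa->xs", Psi, q)
    return U
```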
and, to construct alpha vectors:
\begin{equation} \alpha_{x} = \qty[U(x, s_1), \dots, U(x, s_{n})] \end{equation}
we just make one alpha vector per node, so the entire plan is represented as usual by \(\Gamma\), a set of alpha vectors. And yes, you can still do alpha vector pruning. The utility of node \(x\) under a belief \(b\) is then:
\begin{align} U(x,b) = b^{\top} \alpha_{x} \end{align}
the node we want to start at is the one with the highest utility under the initial belief:
\begin{equation} x^{*} = \arg\max_{x} U(x,b) \end{equation}
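Continuing the evaluation sketch above, each row \(U(x, \cdot)\) is the alpha vector of node \(x\), and the starting node is whichever maximizes \(b^{\top} \alpha_{x}\); the helper name here is just for illustration.

```python
import numpy as np

def best_start_node(U, b):
    """U[x, s] from evaluate_fsc, b[s] the initial belief."""
    values = U @ b                 # values[x] = b^T alpha_x = U(x, b)
    return int(np.argmax(values)), float(values.max())
```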
solving for \(\Psi\) and \(\eta\)
- policy iteration: incrementally add nodes to the controller and evaluate the result
- nonlinear programming: pose the search for the best \(\Psi\) and \(\eta\) as a nonlinear optimization problem
- controller gradient ascent: ascend the gradient of the utility with respect to the controller parameters (sketched below)
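As one hedged sketch of the last option, we can parameterize \(\Psi\) and \(\eta\) with softmax logits and ascend a finite-difference gradient of the value of the initial belief. This reuses evaluate_fsc from the evaluation sketch above; the step size, iteration counts, and the numerical finite-difference gradient are illustrative simplifications (an analytic or autodiff gradient would normally be used).

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def controller_value(theta_psi, theta_eta, T, R, O, b, gamma):
    Psi = softmax(theta_psi, axis=1)      # Psi[x, a]
    eta = softmax(theta_eta, axis=3)      # eta[x, a, o, x']
    U = evaluate_fsc(T, R, O, Psi, eta, gamma)
    return float((U @ b).max())           # utility of the best starting node

def controller_gradient_ascent(theta_psi, theta_eta, T, R, O, b, gamma,
                               lr=0.5, steps=50, eps=1e-4):
    for _ in range(steps):
        for theta in (theta_psi, theta_eta):
            base = controller_value(theta_psi, theta_eta, T, R, O, b, gamma)
            grad = np.zeros_like(theta)
            for idx in np.ndindex(*theta.shape):
                theta[idx] += eps
                grad[idx] = (controller_value(theta_psi, theta_eta, T, R, O, b, gamma) - base) / eps
                theta[idx] -= eps
            theta += lr * grad            # ascend the numerical gradient in place
    return softmax(theta_psi, axis=1), softmax(theta_eta, axis=3)
```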