Alignment Problem
Autonomous systems will do exactly what we tell them to do… so we need to give them good instructions. This is the Alignment Problem. Things can go wrong in three ways:
- imperfect objective: the objective we give the system is underspecified
- imperfect model: the system's understanding of the world is underspecified
- imperfect optimization: the system simply doesn't solve the optimization problem correctly
Validation Framework
High-level structure:
validation_algorithm(system, spec)
system
- environment: state of the world, \(T(s'|s,a)\)
- sensor, \(O(o|s)\)
- agent, policy \(\pi\qty(a | o)\)
example: inverted pendulum (a runnable sketch follows this list)
- state: \(\qty (\theta, \omega)\) of the pendulum
- observation: \(O(o|s) = \mathcal{N}\qty (o|s,\Sigma)\), Gaussian noise
- policy: the following proportional controller
\begin{equation} \pi \qty(a | o) = \begin{cases} 1, & \text{if } a = -15\theta - 8\omega \\ 0, & \text{otherwise} \end{cases} \end{equation}
where \(\qty(\theta, \omega)\) are the angle and angular velocity read from the observation \(o\)
- environment: a \(T(s'|s,a)\) given by physics
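A minimal Python sketch of this system, assuming Euler-integrated pendulum dynamics and the controller gains above; the time step, pendulum parameters, and noise scale are illustrative choices, not from the notes:

```python
import numpy as np

def step_environment(s, a, dt=0.05, g=9.81, length=1.0):
    """T(s'|s,a): Euler-integrated inverted-pendulum dynamics (deterministic here)."""
    theta, omega = s
    alpha = (g / length) * np.sin(theta) + a  # angular acceleration: gravity + applied torque
    return np.array([theta + dt * omega, omega + dt * alpha])

def sense(s, rng, sigma=0.05):
    """O(o|s) = N(o | s, sigma^2 I): Gaussian noise on the true state."""
    return s + rng.normal(0.0, sigma, size=len(s))

def policy(o):
    """pi(a|o): the proportional controller, applied to the *observed* state."""
    theta, omega = o
    return -15.0 * theta - 8.0 * omega

def rollout(s0, depth=100, seed=None):
    """Run the environment-sensor-agent loop and return the state trajectory."""
    rng = np.random.default_rng(seed)
    s = np.asarray(s0, dtype=float)
    trajectory = [s]
    for _ in range(depth):
        o = sense(s, rng)
        a = policy(o)
        s = step_environment(s, a)
        trajectory.append(s)
    return np.array(trajectory)
```

Calling `rollout((0.1, 0.0))` samples one trajectory of the closed loop; the validation algorithms below operate on such trajectories.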
specification \(\psi\)
Rules of the system, e.g. “do not let the pendulum tip over”. Specifications are usually written in a formal specification language such as Linear Temporal Logic (LTL) or Signal Temporal Logic (STL).
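As an illustration, the pendulum specification can be encoded as an STL-style “always” predicate over a trajectory; the tip-over threshold of \(\pi/4\) below is an assumed choice, not from the notes:

```python
import numpy as np

def always(predicate, trajectory):
    """STL/LTL 'globally' operator: the predicate must hold at every time step."""
    return all(predicate(s) for s in trajectory)

def upright(s, threshold=np.pi / 4):
    """Atomic predicate: the pendulum angle stays within the tip-over threshold."""
    theta, _ = s
    return abs(theta) < threshold

def psi(trajectory):
    """Specification: 'do not let the pendulum tip over' = always(upright)."""
    return always(upright, trajectory)
```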
validation algorithm
Given a system and a specification as input, a validation algorithm produces an output in one of the following forms…
Failure Analysis
- Falsification: search for failures of a particular system
- Failure distribution: characterize the distribution over failures
- Failure probability estimation: estimate the probability of failure (see the sketch after this list)
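A minimal direct-sampling sketch of the first and third outputs, reusing the hypothetical `rollout` and `psi` helpers from above; plain Monte Carlo is just one (inefficient) way to do either:

```python
def falsify(rollout, psi, s0=(0.3, 0.0), n_trials=1_000):
    """Falsification: search for a single trajectory that violates the spec."""
    for _ in range(n_trials):
        trajectory = rollout(s0)
        if not psi(trajectory):
            return trajectory  # counterexample found
    return None  # none found -- NOT a proof that the system is safe

def estimate_failure_probability(rollout, psi, s0=(0.3, 0.0), n_trials=1_000):
    """Monte Carlo estimate of P(failure) under the system's own noise."""
    failures = sum(not psi(rollout(s0)) for _ in range(n_trials))
    return failures / n_trials
```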
Formal Methods
“reachability”
- Linear system reachability (see the sketch after this list)
- Nonlinear system reachability
- Discrete system reachability
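To make “reachability” concrete, here is one common flavor for the linear case: over-approximating the reachable set of discrete-time dynamics \(x_{k+1} = A x_k\) by propagating an axis-aligned box with interval arithmetic. The dynamics matrix and initial box below are made-up illustrations:

```python
import numpy as np

def propagate_box(A, lower, upper):
    """Over-approximate {A x : lower <= x <= upper} with an axis-aligned box.

    The image of a box under a linear map has center A @ c and is contained
    in the box of radius |A| @ r (elementwise interval arithmetic).
    """
    center = (lower + upper) / 2.0
    radius = (upper - lower) / 2.0
    return A @ center - np.abs(A) @ radius, A @ center + np.abs(A) @ radius

def reachable_boxes(A, lower, upper, horizon=10):
    """Boxes containing all states reachable at steps 0..horizon."""
    boxes = [(lower, upper)]
    for _ in range(horizon):
        lower, upper = propagate_box(A, lower, upper)
        boxes.append((lower, upper))
    return boxes

# One box per step; verify the spec by checking none intersects the unsafe set.
A = np.array([[0.9, 0.2], [-0.2, 0.9]])
boxes = reachable_boxes(A, np.array([-0.1, -0.1]), np.array([0.1, 0.1]))
```

Because the boxes over-approximate the true reachable set, this check can raise false alarms but never misses a reachable failure state.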
Other stuff
- Explanation
- Runtime Assurances