Houjun Liu

FV-POMCPs

Main problem: the number of joint actions and joint observations grows exponentially in the number of agents.
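
To make the blow-up concrete, here is a small worked count. It assumes, purely for illustration, that every one of the \(n\) agents has the same number \(k\) of individual actions:

\[
|A| = \prod_{i \in I} |A_{i}| = k^{n}, \qquad \text{e.g. } k = 5,\ n = 6 \implies |A| = 5^{6} = 15625 .
\]

The same product form applies to joint observations, \(|Z| = \prod_{i \in I} |Z_{i}|\).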

Solution: sample-based online planning for multiagent systems. We do this with the factored-value POMCP (FV-POMCP).

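factored statistics: reduces the number of joint actions to track, by keeping action-selection statistics per factor (a small group of agents) instead of per joint action
factored trees: additionally reduces the number of histories, by keeping a separate search tree per factor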
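
A minimal sketch of what per-factor statistics could look like at a single history node. The class name `FactoredStats`, the choice to update every factor with the same sampled return, and the running-mean update are all assumptions made for illustration, not details taken from these notes:

```python
from collections import defaultdict


class FactoredStats:
    """Visit counts and mean returns stored per factor, not per joint action.

    `factors` is a list of tuples of agent indices (hypothetical layout).
    """

    def __init__(self, factors):
        self.factors = factors
        self.n = defaultdict(int)    # (factor index, local action) -> visit count
        self.q = defaultdict(float)  # (factor index, local action) -> mean return

    def update(self, joint_action, ret):
        # Assumption: every factor is updated with the same sampled return.
        for e, scope in enumerate(self.factors):
            key = (e, tuple(joint_action[i] for i in scope))
            self.n[key] += 1
            # Incremental running mean of the returns seen for this local action.
            self.q[key] += (ret - self.q[key]) / self.n[key]

    def num_entries(self):
        return len(self.q)


# Three agents, two pairwise factors: (agent 0, agent 1) and (agent 1, agent 2).
stats = FactoredStats([(0, 1), (1, 2)])
stats.update((0, 1, 1), 4.0)
stats.update((0, 1, 0), 2.0)
```

After the two updates, the local action `(0, 1)` of the first factor has been seen twice, while each local action of the second factor has been seen once. With binary actions the two factors can hold at most 4 + 4 entries, however many agents are added along the chain.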

Multiagent Definition

\(I\) set of agents
\(S\) set of states
\(A_{i}\) set of actions for each agent \(i\)
\(T\) state transitions
\(R\) reward function
\(Z_{i}\) set of observations for each agent \(i\)
\(O\) observation function (probabilities of joint observations)

Coordination Graphs

You can use sum-product elimination (variable elimination) to simplify the Bayesian network of the agents' coordination graph, which encodes how agents influence each other.
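
As a rough illustration of elimination on a coordination graph, the sketch below uses the max-sum variant of variable elimination to pick a joint action. This is an assumption about the intended use. The function name `eliminate_max`, the factor layout, and the payoff numbers are all hypothetical:

```python
from itertools import product


def eliminate_max(factors, order, domains):
    """Max-sum variable elimination over a coordination graph.

    factors: list of (scope, table); scope is a tuple of agent names and
             table maps local action tuples to local payoffs.
    order:   elimination order covering every agent.
    domains: dict from agent name to the list of its actions.
    Returns (best_value, best_joint_action).
    """
    factors = list(factors)
    decisions = []  # (agent, conditioning scope, best-response table)
    for agent in order:
        touching = [f for f in factors if agent in f[0]]
        factors = [f for f in factors if agent not in f[0]]
        # The new factor ranges over the neighbours of the eliminated agent.
        scope = tuple(sorted({v for s, _ in touching for v in s} - {agent}))
        table, best = {}, {}
        for ctx in product(*(domains[v] for v in scope)):
            assign = dict(zip(scope, ctx))

            def local_sum(a):
                assign[agent] = a
                return sum(t[tuple(assign[v] for v in s)] for s, t in touching)

            best_a = max(domains[agent], key=local_sum)
            table[ctx] = local_sum(best_a)
            best[ctx] = best_a
        factors.append((scope, table))
        decisions.append((agent, scope, best))
    # Every remaining factor now has an empty scope.
    value = sum(t[()] for _, t in factors)
    # Backward pass: recover each agent's action from its best response.
    joint = {}
    for agent, scope, best in reversed(decisions):
        joint[agent] = best[tuple(joint[v] for v in scope)]
    return value, joint


# Chain a1 - a2 - a3 with binary actions and made-up local payoffs.
domains = {"a1": [0, 1], "a2": [0, 1], "a3": [0, 1]}
q12 = (("a1", "a2"), {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 2})
q23 = (("a2", "a3"), {(0, 0): 3, (0, 1): 0, (1, 0): 0, (1, 1): 1})
value, joint = eliminate_max([q12, q23], ["a1", "a2", "a3"], domains)
```

Each elimination step only enumerates the actions of one agent and its neighbours, so the cost depends on the width of the graph rather than on the full joint action space.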

Mixture of Experts

Directly search for the best joint action, scored by the maximum-likelihood estimate (MLE) of the total value.
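
A minimal sketch of the direct search, assuming each factor acts as an "expert" whose local estimate is combined with the others through a weighted sum. The uniform default weights, the function name `moe_best_joint_action`, and the numbers are assumptions for illustration:

```python
from itertools import product


def moe_best_joint_action(experts, domains, weights=None):
    """Score every joint action by a weighted sum of per-factor estimates.

    experts: list of (scope, q_table); scope is a tuple of agent indices and
             q_table maps local action tuples to estimated values.
    domains: list of action lists, one per agent.
    weights: mixture weights; uniform if omitted (an assumption).
    """
    if weights is None:
        weights = [1.0 / len(experts)] * len(experts)

    def score(joint):
        # Local actions an expert has never seen default to 0.0.
        return sum(
            w * q.get(tuple(joint[i] for i in scope), 0.0)
            for w, (scope, q) in zip(weights, experts)
        )

    # Direct enumeration of joint actions (exponential in the number of agents).
    return max(product(*domains), key=score)


experts = [
    ((0, 1), {(0, 0): 1.0, (1, 1): 4.0}),
    ((1, 2), {(0, 0): 3.0, (1, 1): 1.0}),
]
best = moe_best_joint_action(experts, [[0, 1], [0, 1], [0, 1]])
```

The enumeration is still exponential in the number of agents; only the statistics being combined are factored.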