Goal: for a set of satellites (agents) and tasks, find the assignment that maximizes total benefit:
\begin{equation} \alpha \qty(\beta) = \text{argmax}_{x \in X}\sum_{i=1}^{n} \sum_{j=1}^{m} \beta_{ij}x_{ij} \end{equation}
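where \(\beta\) is the benefit matrix, \(\beta_{ij}\) is the benefit of assigning agent \(i\) to task \(j\), \(x_{ij} \in \{0, 1\}\) indicates whether that assignment is made, and \(X\) is the set of feasible assignments (typically each agent takes at most one task and each task at most one agent). This one-shot problem is greedy in time and can be solved with the Hungarian method. It becomes hard when the satellites move: the problem turns sequential, tasks start running out of time, and decisions made now constrain what is possible later.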
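To make the one-shot argmax concrete, here is a minimal sketch that evaluates it by brute force over one-to-one assignments. It assumes \(n \le m\) and that \(X\) means "each agent gets exactly one distinct task"; the function name `best_assignment` is hypothetical. Brute force is only for illustration on tiny inputs; the Hungarian method solves the same problem in polynomial time (in Python, for example, via `scipy.optimize.linear_sum_assignment(beta, maximize=True)`).

```python
from itertools import permutations


def best_assignment(beta):
    """Brute-force argmax of sum_ij beta[i][j] * x[i][j] over one-to-one assignments.

    beta[i][j] is the benefit of assigning agent i to task j.
    Assumption: n_agents <= n_tasks, every agent gets exactly one distinct task.
    Returns (assignment, total) where assignment[i] is the task of agent i.
    """
    n, m = len(beta), len(beta[0])
    best, best_val = None, float("-inf")
    # Each permutation of length n picks a distinct task for every agent.
    for tasks in permutations(range(m), n):
        val = sum(beta[i][j] for i, j in enumerate(tasks))
        if val > best_val:
            best, best_val = tasks, val
    return list(best), best_val


# Both agents individually prefer task 0, but the optimum gives it to agent 1.
beta = [
    [9, 2, 7],  # agent 0
    [8, 4, 3],  # agent 1
]
print(best_assignment(beta))  # ([2, 0], 15)
```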
Solution: multi-agent RL. However, a vanilla approach leads to conflicts, because the dominant strategy may be the same for every agent (for example, all agents go after the same high-benefit task).
Solution’: to fix this, use the learned \(Q\) values as the benefit matrix \(\beta\) at each time step, then apply the equation above with the Hungarian method to solve the assignment at each step.
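A minimal sketch of this idea, under stated assumptions: the learned \(Q\) values are already available as a hypothetical per-agent table `q_tables[i][state]` giving \(Q_i(s_t, j)\) for every task \(j\) (training is not shown), and the assignment is solved by brute force, standing in for the Hungarian method on tiny inputs. It contrasts independent greedy choices, which collide, with building \(\beta_{ij} = Q_i(s_t, j)\) and solving the assignment at each time step.

```python
from itertools import permutations


def solve_assignment(beta):
    # Brute-force argmax over one-to-one agent -> task maps (small n only).
    # Assumes n_agents <= n_tasks; a Hungarian solver would replace this at scale.
    n, m = len(beta), len(beta[0])
    best = max(
        permutations(range(m), n),
        key=lambda tasks: sum(beta[i][j] for i, j in enumerate(tasks)),
    )
    return list(best)


def independent_greedy(beta):
    # Vanilla MARL-style choice: each agent takes its own argmax task; may collide.
    return [max(range(len(row)), key=row.__getitem__) for row in beta]


def coordinated_step(q_tables, state):
    # Use the learned Q values at this time step as the benefit matrix beta.
    beta = [q[state] for q in q_tables]
    return solve_assignment(beta)


# Hypothetical learned Q values: q_tables[agent][state] -> Q for each of 3 tasks.
q_tables = [
    {"t0": [5.0, 1.0, 3.0], "t1": [0.2, 2.0, 1.0]},  # agent 0
    {"t0": [4.0, 3.5, 0.5], "t1": [0.1, 2.5, 2.4]},  # agent 1
]

for state in ["t0", "t1"]:
    beta = [q[state] for q in q_tables]
    print(state, independent_greedy(beta), coordinated_step(q_tables, state))
# t0 [0, 0] [0, 1]
# t1 [1, 1] [1, 2]
```

At both time steps the independent argmax sends the two agents to the same task, while the per-step assignment over \(Q\) values yields a conflict-free matching. Because the \(Q\) values depend on the state, the matching is recomputed at every step rather than fixed once.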