paper

Satellite formation control using multi-agent deep reinforcement learning

Paper number

IAC-24,C1,IPB,17,x83549

Author

Prof. Yue Wang, Beihang University, China

Coauthor

Mr. Zicen Xiong, Beihang University, China

Coauthor

Mr. Zheng Chen, Beihang University, China

Coauthor

Ms. Heng Jiang, Beihang University, China

Year

2024

Abstract

As satellite technology continues to miniaturize and become more intelligent, spacecraft formation flying has become an important direction in the future development of space systems and is attracting attention worldwide. Existing satellite formation flying relies mainly on model-based optimization methods and deterministic control, which limits autonomy and makes nonlinear environmental constraints difficult to handle. Intelligent methods such as neural networks effectively reduce the dependence of formation control on prior knowledge and global information, making them better suited to solving nonlinear programming problems quickly.

This work addresses the demands of multi-spacecraft formation tasks and presents a novel approach based on deep reinforcement learning (DRL) for satellite formation flying reconfiguration control, together with the design of a multi-agent collaborative mechanism. The off-policy Soft Actor-Critic (SAC) algorithm is used to obtain stochastic reconfiguration policies with a continuous action space for the satellites (agents). SAC maximizes an objective that adds an entropy bonus to the reward; the entropy measures the randomness of the policy, that is, the agents' ability to explore. This trade-off between exploitation and exploration encourages the agents to explore globally instead of converging prematurely to a poor local optimum, and to learn near-optimal continuous reconfiguration strategies that account for orbital and attitude constraints as well as various perturbations. Newly designed action and observation spaces tailored to the multi-satellite system are introduced for the formation reconfiguration problem. In addition, a new interpretation of the value function that relaxes the phase constraints makes the relative orbital transfer converge more easily.
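To make the entropy bonus concrete, the following is a minimal Python sketch of the standard SAC quantities, not the authors' implementation. It assumes a diagonal Gaussian policy over a continuous action (here a hypothetical 3-axis thrust command for one satellite), twin critics, and illustrative values for the temperature alpha and discount gamma; all function names are hypothetical. The per-step objective is r(s, a) + alpha * H(pi(.|s)), and the critics regress toward the soft Bellman target y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s')).

```python
import math
import random


def gaussian_log_prob(action, mean, std):
    """Log-density log pi(a|s) of a diagonal Gaussian policy, summed over action dimensions."""
    return sum(
        -0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for a, m, s in zip(action, mean, std)
    )


def gaussian_entropy(std):
    """Closed-form entropy H(pi(.|s)) of a diagonal Gaussian: sum of 0.5*log(2*pi*e*sigma^2)."""
    return sum(0.5 * math.log(2.0 * math.pi * math.e * s ** 2) for s in std)


def entropy_regularized_reward(reward, std, alpha):
    """Per-step maximum-entropy objective term: r(s, a) + alpha * H(pi(.|s))."""
    return reward + alpha * gaussian_entropy(std)


def soft_bellman_target(reward, done, gamma, alpha, q1_next, q2_next, log_prob_next):
    """Critic target with twin Q-networks:
    y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    soft_value_next = min(q1_next, q2_next) - alpha * log_prob_next
    return reward + gamma * (1.0 - done) * soft_value_next


def monte_carlo_entropy(mean, std, n_samples=20000, seed=0):
    """Estimate H as E[-log pi(a|s)] by sampling actions; should match gaussian_entropy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        action = [rng.gauss(m, s) for m, s in zip(mean, std)]
        total -= gaussian_log_prob(action, mean, std)
    return total / n_samples


if __name__ == "__main__":
    # Hypothetical 3-axis thrust command for one satellite agent.
    mean = [0.0, 0.0, 0.0]
    narrow, wide = [0.1, 0.1, 0.1], [1.0, 1.0, 1.0]
    alpha = 0.2  # assumed temperature; SAC variants often tune it automatically
    # Same task reward, but the more exploratory (wider) policy earns a larger bonus.
    print(entropy_regularized_reward(1.0, narrow, alpha))
    print(entropy_regularized_reward(1.0, wide, alpha))
    # The closed-form entropy and its sampled estimate E[-log pi] should agree closely.
    print(gaussian_entropy(wide), monte_carlo_entropy(mean, wide))
    print(soft_bellman_target(1.0, 0.0, 0.99, alpha, 5.0, 4.0, -1.0))
```

With the same task reward, the wider (more exploratory) policy receives a larger entropy-regularized reward, which is the mechanism credited above with avoiding premature convergence. Practical SAC implementations also squash actions with tanh and correct the log-density accordingly; that step is omitted here for brevity.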

Meanwhile, a higher-level control algorithm built on the SAC network is developed to customize the reconfiguration objectives. The agents are trained in various formation flying scenarios, such as astronomical and space debris observation, formation communication maintenance, and obstacle or solar radiation avoidance. Finally, the performance differences between the proposed DRL methods and traditional model-based control methods are analyzed to further refine the intelligent approaches.

Abstract document

IAC-24,C1,IPB,17,x83549.brief.pdf

Manuscript document

IAC-24,C1,IPB,17,x83549.pdf (authorized access only)

To obtain the manuscript, please contact the IAF Secretariat.