基于序贯博弈多智能体强化学习的综合模块化航空电子系统重构方法
Integrated Modular Avionics System Reconstruction Method Based on Sequential Game Multi-agent Reinforcement Learning
Abstract
|
Dynamic reconfiguration is an effective fault-tolerance approach for integrated modular avionics (IMA) systems. A reconfiguration blueprint defines the application migration and resource reconfiguration scheme to apply when the system fails, and it is the key to restoring system functions at minimum cost. The difficulty lies in generating effective reconfiguration blueprints quickly and automatically under complex, multi-level correlated failure modes. To address this problem, this paper proposes an IMA system reconfiguration method based on sequential game multi-agent reinforcement learning. The method introduces a sequential game model in which each application that must be migrated and reconfigured because of a fault is defined as an agent, and the order of play is determined by application priority. To handle competition and cooperation among the agents during the sequential game, the algorithm uses the policy gradient from reinforcement learning and optimizes the reconfiguration result by controlling the action selection probabilities during interaction with the environment. A policy gradient Monte Carlo tree search algorithm based on biased estimation is applied to update the game strategies, which addresses the oscillation, poor convergence, and long computation time of traditional policy gradient algorithms. Experimental results show that, compared with methods such as differential evolution and Q-learning, the proposed algorithm has significant advantages in both optimization performance and stability. |
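The sequential game described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it swaps in a plain REINFORCE-style policy gradient with a moving-average baseline in place of the biased-estimate policy gradient Monte Carlo tree search, and the names `apps`, `capacity`, `priority`, `demand`, and the reward definition are all assumptions made for illustration. Each application to migrate is an agent with a softmax policy over candidate modules, and agents act in descending priority order.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def play_episode(apps, capacity, logits, rng):
    """Play one sequential game: agents act in descending priority order."""
    remaining = list(capacity)
    order = sorted(range(len(apps)), key=lambda i: -apps[i]["priority"])
    actions, reward = {}, 0.0
    for i in order:
        probs = softmax(logits[i])
        a = rng.choices(range(len(remaining)), weights=probs)[0]
        actions[i] = a
        if remaining[a] >= apps[i]["demand"]:
            remaining[a] -= apps[i]["demand"]
            reward += apps[i]["priority"]  # assumed reward: function restored
        else:
            reward -= apps[i]["priority"]  # assumed penalty: placement failed
    return actions, reward


def train(apps, capacity, episodes=3000, lr=0.1, seed=0):
    """REINFORCE with a shared team reward and a moving-average baseline."""
    rng = random.Random(seed)
    logits = [[0.0] * len(capacity) for _ in apps]
    baseline = 0.0
    for _ in range(episodes):
        actions, reward = play_episode(apps, capacity, logits, rng)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for i, a in actions.items():
            probs = softmax(logits[i])
            for k in range(len(probs)):
                # Gradient of log softmax: indicator(k == a) - probs[k].
                grad = (1.0 if k == a else 0.0) - probs[k]
                logits[i][k] += lr * advantage * grad
    return logits


if __name__ == "__main__":
    # Hypothetical scenario: three faulted applications, two spare modules.
    apps = [
        {"priority": 3, "demand": 2},
        {"priority": 2, "demand": 2},
        {"priority": 1, "demand": 1},
    ]
    capacity = [3, 2]
    learned = train(apps, capacity)
    for i, row in enumerate(learned):
        print(i, [round(p, 2) for p in softmax(row)])
```

Raising or lowering an agent's logits changes its action selection probabilities, which is the control mechanism the abstract refers to. The shared reward couples the agents, so a high-priority application's choice constrains what lower-priority applications can still obtain.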
Source
|
Acta Electronica Sinica (电子学报), 2022, 50(4): 954-966 [Core Collection]
|
DOI
|
10.12263/DZXB.20211268
|
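Keywords
|
integrated modular avionics system; sequential game; policy gradient; multi-agent reinforcement learning; Monte Carlo tree search; reconfiguration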
|
Address
|
School of Software, Northwestern Polytechnical University, Xi'an, Shaanxi 710065
|
Language
|
Chinese |
Document type
|
Research article |
ISSN
|
0372-2112 |
Subject
|
Automation technology; computer technology |
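Funding
|
National Natural Science Foundation of China; Aeronautical Science Foundation of China; Shanghai Aerospace Science and Technology Innovation Fund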
|
Accession number
|
CSCD:7190621
|