Integrated Modular Avionics System Reconstruction Method Based on Sequential Game Multi-agent Reinforcement Learning

Abstract Dynamic reconfiguration is an effective fault-tolerance approach for integrated modular avionics (IMA) systems. The reconfiguration blueprint defines the application migration and resource reallocation scheme to apply when the system fails, and it is the key to restoring system functions at minimum cost. The difficulty lies in generating effective reconfiguration blueprints quickly and automatically under complex, multi-level correlated failure modes. To address this problem, this paper proposes an IMA system reconfiguration method based on sequential game multi-agent reinforcement learning. The method introduces a sequential game model in which each application that must be migrated because of a fault is defined as an agent, and the order of play is determined by application priority. To handle competition and cooperation among the agents during the sequential game, the algorithm uses the policy gradient from reinforcement learning and optimizes the reconfiguration result by controlling the action selection probabilities during interaction with the environment. A policy gradient Monte Carlo tree search algorithm based on biased estimation updates the game strategy, which addresses the oscillation, slow convergence, and long computation time of traditional policy gradient algorithms. Experimental results show that, compared with methods such as differential evolution and Q-learning, the proposed algorithm has significant advantages in optimization performance and stability.
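The core loop described in the abstract (agents acting in priority order, each choosing a migration target from a probability distribution that a policy gradient adjusts) can be illustrated with a small sketch. This is not the paper's algorithm: it uses a plain REINFORCE update with a running baseline in place of the biased-estimate policy gradient Monte Carlo tree search, and every name, priority, load, and capacity below is a hypothetical toy value chosen for illustration.

```python
import math
import random

# Hypothetical toy instance (all values invented for illustration).
# Each application: (name, priority, resource load).
APPS = [("flight_ctrl", 3, 2), ("nav", 2, 2), ("display", 1, 1)]
# Spare capacity of two healthy modules that can host migrated applications.
CAPACITY = [3, 2]


def softmax(prefs):
    """Turn action preferences into selection probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def play_episode(theta, rng=None):
    """Play one sequential game: agents move once each, highest priority first.

    With rng=None every agent takes its most preferred action (greedy);
    otherwise actions are sampled from the softmax policy.
    Returns (choices, reward), where choices maps app index -> module index.
    """
    order = sorted(range(len(APPS)), key=lambda i: -APPS[i][1])
    free = list(CAPACITY)
    choices = {}
    reward = 0.0
    for i in order:
        n = len(CAPACITY)
        if rng is None:
            a = max(range(n), key=lambda k: theta[i][k])
        else:
            a = rng.choices(range(n), weights=softmax(theta[i]))[0]
        choices[i] = a
        load = APPS[i][2]
        if free[a] >= load:
            # The application fits: it is restored and consumes capacity.
            free[a] -= load
            reward += APPS[i][1]
    return choices, reward


def train(episodes=3000, lr=0.1, seed=0):
    """REINFORCE with a running-average baseline over the shared team reward."""
    rng = random.Random(seed)
    theta = [[0.0] * len(CAPACITY) for _ in APPS]
    baseline = 0.0
    for _ in range(episodes):
        choices, reward = play_episode(theta, rng)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for i, a in choices.items():
            probs = softmax(theta[i])
            for k in range(len(CAPACITY)):
                # Gradient of log softmax: indicator(k == a) - probs[k].
                grad = (1.0 if k == a else 0.0) - probs[k]
                theta[i][k] += lr * advantage * grad
    return theta
```

Calling `play_episode(train())` without an `rng` reads out the greedy assignment learned by the agents, which plays the role of a reconfiguration blueprint in this toy setting. The shared reward makes the agents cooperative here; modelling the competition for capacity that the abstract mentions would need per-agent rewards, which this sketch omits.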
Source Acta Electronica Sinica, 2022, 50(4):954-966 [Core Collection]
DOI 10.12263/DZXB.20211268
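Keywords integrated modular avionics system; sequential game; policy gradient; multi-agent reinforcement learning; Monte Carlo tree search; reconfiguration
Address

School of Software, Northwestern Polytechnical University, Xi'an, Shaanxi, 710065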

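Language Chinese
Document type Research article
ISSN 0372-2112
Subject Automation technology, computer technology
Funding National Natural Science Foundation of China; Aeronautical Science Foundation of China; Shanghai Aerospace Science and Technology Innovation Fund
Accession number CSCD:7190621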
