
基于序贯博弈多智能体强化学习的综合模块化航空电子系统重构方法
Integrated Modular Avionics System Reconstruction Method Based on Sequential Game Multi-agent Reinforcement Learning


张涛 ; 张文涛 ; 代凌 ; 陈婧怡 ; 王丽 ; 魏倩茹

Abstract	Dynamic reconfiguration is an effective fault-tolerance approach for integrated modular avionics (IMA) systems. The reconfiguration blueprint defines the application migration and resource reconfiguration scheme under system failure, and is the key to restoring system function at minimum cost. The difficulty lies in generating effective reconfiguration blueprints quickly and automatically under complex, multi-level associated failure modes. To address this problem, this paper proposes an IMA system reconfiguration method based on sequential game multi-agent reinforcement learning. The method introduces a sequential game model in which each application that must be migrated because of a fault is defined as an agent, and the order of play is determined by application priority. To handle competition and cooperation among the agents during the sequential game, the algorithm uses the policy gradient from reinforcement learning, optimizing the reconfiguration result by controlling the action selection probabilities in interaction with the environment. A policy gradient Monte Carlo tree search algorithm based on biased estimation is applied to update the game strategy, resolving the oscillation, difficult convergence, and long computation time of traditional policy gradient algorithms. Experimental results show that, compared with differential evolution, Q-learning, and other methods, the proposed algorithm has significant advantages in optimization performance and stability.
Source	Acta Electronica Sinica (电子学报), 2022, 50(4): 954-966 [CSCD Core Collection]
DOI	10.12263/DZXB.20211268
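Keywords	integrated modular avionics system ; sequential game ; policy gradient ; multi-agent reinforcement learning ; Monte Carlo tree search ; reconfiguration
Affiliation	School of Software, Northwestern Polytechnical University, Xi'an, Shaanxi 710065
Language	Chinese
Document type	Research article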
ISSN	0372-2112
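Subject	Automation technology ; computer technology
Funding	National Natural Science Foundation of China ; Aeronautical Science Foundation of China ; Shanghai Aerospace Science and Technology Innovation Fund
Accession number	CSCD:7190621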
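
The abstract describes agents that move in priority order and a policy gradient that adjusts action selection probabilities. The following is a minimal sketch of that general idea only, not the authors' algorithm: it uses a plain REINFORCE update with a running baseline and omits the Monte Carlo tree search and the biased estimation entirely. The names `APPS`, `CAPACITY`, `COST`, and `UNPLACED_PENALTY`, the toy numbers, and the cost model are all assumptions made for illustration.

```python
import math
import random

UNPLACED_PENALTY = 100.0  # hypothetical cost when an application finds no host module

# Hypothetical toy instance: two applications to migrate, two spare modules.
APPS = [{"priority": 2, "demand": 2}, {"priority": 1, "demand": 1}]
CAPACITY = [2, 3]                 # free resource units per module
COST = [[1.0, 5.0], [1.0, 4.0]]   # COST[i][m]: assumed cost of moving app i to module m


def priority_order(apps):
    """Indices of apps, highest priority first (the order of play)."""
    return sorted(range(len(apps)), key=lambda i: -apps[i]["priority"])


def softmax(logits, mask):
    """Softmax over entries whose mask is True; masked entries get probability 0."""
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(row, probs, chosen, advantage, lr):
    """In-place step along advantage * grad log pi(chosen) for a softmax policy."""
    for k, p in enumerate(probs):
        row[k] += lr * advantage * ((1.0 if k == chosen else 0.0) - p)


def play_episode(apps, capacity, cost, logits, rng):
    """Each agent moves once, in priority order, seeing the capacity left by earlier movers."""
    free = list(capacity)
    trace, total_cost = [], 0.0
    for i in priority_order(apps):
        mask = [f >= apps[i]["demand"] for f in free]
        if not any(mask):
            total_cost += UNPLACED_PENALTY
            continue
        probs = softmax(logits[i], mask)
        m = rng.choices(range(len(free)), weights=probs)[0]
        free[m] -= apps[i]["demand"]
        total_cost += cost[i][m]
        trace.append((i, m, probs))
    return trace, -total_cost  # shared return: negative total cost


def train(apps, capacity, cost, episodes=2000, lr=0.1, seed=0):
    """REINFORCE with a running-average baseline; one logit vector per agent."""
    rng = random.Random(seed)
    logits = [[0.0] * len(capacity) for _ in apps]
    baseline = 0.0
    for _ in range(episodes):
        trace, ret = play_episode(apps, capacity, cost, logits, rng)
        advantage = ret - baseline
        baseline += 0.05 * (ret - baseline)
        for i, m, probs in trace:
            reinforce_update(logits[i], probs, m, advantage, lr)
    return logits


def greedy_blueprint(apps, capacity, logits):
    """Read out a placement: each agent, in priority order, takes its best feasible module."""
    free = list(capacity)
    plan = {}
    for i in priority_order(apps):
        feasible = [m for m in range(len(free)) if free[m] >= apps[i]["demand"]]
        if feasible:
            m = max(feasible, key=lambda k: logits[i][k])
            free[m] -= apps[i]["demand"]
            plan[i] = m
    return plan


if __name__ == "__main__":
    learned = train(APPS, CAPACITY, COST)
    print(greedy_blueprint(APPS, CAPACITY, learned))
```

In this sketch each agent keeps a single logit vector that does not depend on what earlier movers chose; the only coupling between agents is the capacity mask and the shared return. A state-dependent policy, or the tree search the abstract mentions, would be needed to capture richer interactions.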

References: 27 in total

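Citing documents: 3

1. 李腾. Self-immune dynamic attack generation method based on reinforcement learning. Acta Electronica Sinica (电子学报), 2023, 51(11): 3033-3041
Cited in CSCD: 0 times

2. 杨媛媛. Deep reinforcement learning algorithm for solving the real-time scheduling problem of dynamic flow shops. Control Theory & Applications (控制理论与应用), 2024, 41(6): 1047-1055
Cited in CSCD: 0 times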
