A Multimodal 3D Object Detection Method Based on Double-Fusion Framework (基于双融合框架的多模态3D目标检测算法)
Abstract | 3D object detection with camera-LiDAR multimodal fusion can exploit the complementary strengths of the two sensors to improve detection accuracy and robustness. However, due to the complexity of real environments and the inherent differences between multimodal data, 3D object detection still faces many challenges. This paper proposes a multimodal 3D object detection algorithm based on a double-fusion framework. A voxel-level and grid-level double-fusion framework is designed to effectively alleviate the semantic gap between data of different modalities; an ABFF (Adaptive Bird-eye-view Features Fusion) module is proposed to strengthen the algorithm's ability to perceive small-object features; and, using voxel-level global fusion information to guide grid-level local fusion, a Transformer-based multimodal grid feature encoder is proposed that extracts richer contextual information from 3D detection scenes and improves the algorithm's runtime efficiency. Experiments on the KITTI benchmark show that the proposed algorithm achieves an average detection accuracy of 78.79%, demonstrating better 3D object detection performance.
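The record gives no implementation details for the ABFF (Adaptive Bird-eye-view Features Fusion) module named in the abstract. As a rough illustration of the general idea of adaptively weighting camera and LiDAR features on a shared bird's-eye-view grid, a minimal PyTorch sketch follows; it assumes both modalities have already been projected to BEV maps of identical shape, and all names (ABFFSketch, weight_net, cam_bev, lidar_bev) are hypothetical, not taken from the authors' code.

import torch
import torch.nn as nn

class ABFFSketch(nn.Module):
    """Illustrative adaptive BEV feature fusion (not the paper's implementation).
    Predicts a per-location weight for each modality and blends the two BEV maps,
    so features of small objects visible mainly in one modality are preserved."""
    def __init__(self, channels: int):
        super().__init__()
        # Two-channel weight map (one weight per modality) predicted from the
        # concatenated camera and LiDAR BEV features.
        self.weight_net = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2, kernel_size=1),
        )

    def forward(self, cam_bev: torch.Tensor, lidar_bev: torch.Tensor) -> torch.Tensor:
        # weights: (B, 2, H, W), softmax-normalized across the two modalities.
        weights = torch.softmax(self.weight_net(torch.cat([cam_bev, lidar_bev], dim=1)), dim=1)
        # Per-location weighted sum of the two modality-specific BEV maps.
        return weights[:, 0:1] * cam_bev + weights[:, 1:2] * lidar_bev

# Usage example with dummy 64-channel BEV maps on a 200 x 176 grid.
if __name__ == "__main__":
    fuse = ABFFSketch(channels=64)
    cam = torch.randn(1, 64, 200, 176)
    lidar = torch.randn(1, 64, 200, 176)
    print(fuse(cam, lidar).shape)  # torch.Size([1, 64, 200, 176])

In the paper, such grid-level fusion is additionally guided by voxel-level global fusion information; that interaction is not reproduced in this sketch.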
Source | 电子学报 (Acta Electronica Sinica), 2023, 51(11): 3100-3110 [Core collection]
DOI | 10.12263/DZXB.20230414
Keywords | deep learning; 3D object detection; LiDAR; camera; multimodal information fusion
Affiliations | 1. School of Data Science, Qingdao University of Science and Technology, Qingdao 266000, Shandong, China; 2. School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430000, Hubei, China
Language | Chinese
Document type | Research article
ISSN | 0372-2112
Subject | Automation and computer technology
Funding | China University Industry-University-Research Innovation Fund; National Natural Science Foundation of China; Youth Innovation Technology Support Program of Shandong Province Universities
Accession number | CSCD:7641765