Optimizing Yinyang K-means algorithm on many-core CPUs
Abstract
|
The traditional Yinyang K-means algorithm is computationally expensive on large-scale clustering problems. Targeting the architectural characteristics of typical many-core CPUs, this work proposes an efficient parallel implementation of the Yinyang K-means algorithm. The implementation is built on a new in-memory data layout, uses the vector units of many-core CPUs to accelerate the distance calculations in Yinyang K-means, and applies memory-access optimizations tailored to non-uniform memory access (NUMA). Compared with the open-source multi-threaded implementation of Yinyang K-means, it achieves speedups of up to about 5.6 and 8.7 on ARMv8 and x86 many-core platforms, respectively. Experiments show that these optimizations successfully accelerate the Yinyang K-means algorithm on many-core CPUs. |
Source
|
Journal of National University of Defense Technology, 2024, 46(1): 93-102 [CSCD core collection]
|
DOI
|
10.11887/j.cn.202401010
|
Keywords
|
K-means; non-uniform memory access; vectorization; many-core processors; performance optimization
|
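Affiliations
|
1. College of Computer Science and Technology, National University of Defense Technology, Changsha, Hunan 410073
2. National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha, Hunan 410073
|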
Language
|
Chinese |
Document type
|
Research article |
ISSN
|
1001-2486 |
Subject
|
Automation technology, computer technology |
Funding
|
National Natural Science Foundation of China
|
Accession number
|
CSCD:7660746
|