Dimension Reduction and Visualization of Mixed-Type Data Based on E-t-SNE
(Original title: 基于E-t-SNE的混合属性数据降维可视化方法)

Authors: 魏世超 1,2,3,4; 李歆 1,3,4; 张宜弛 1,3,4; 周晓锋 1,3,4; 李帅 1,2,3,4
Abstract: The traditional t-distributed stochastic neighbor embedding (t-SNE) algorithm can only handle data of a single attribute type and does not handle mixed-type data well. To address this, an extended t-SNE dimensionality reduction and visualization algorithm, E-t-SNE, is proposed for mixed-type data. The method introduces information entropy to construct the distance matrix for categorical attributes, and builds the distance matrix of the mixed-type data by combining this categorical distance with the Euclidean distance over the numerical attributes. The new distance matrix is then fed into the t-SNE algorithm, which reduces the data to two dimensions for visualization. To verify the effectiveness of the algorithm, the k-nearest neighbor (kNN) algorithm is used to evaluate the mixed-type data after dimensionality reduction. Experiments on UCI datasets show that, for mixed-type data, the method not only visualizes well but also separates data of different classes into clusters after dimensionality reduction and improves the classification accuracy of subsequent classifiers.
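The distance construction outlined in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything specific here is an assumption made for illustration and should not be read as the authors' method: the function names `attribute_entropy` and `mixed_distance_matrix` are hypothetical, categorical mismatches are weighted by each attribute's normalized Shannon entropy, the categorical and Euclidean parts are simply added, and the numerical attributes are assumed to be scaled beforehand.

```python
import math
from collections import Counter


def attribute_entropy(column):
    """Shannon entropy (in bits) of one categorical column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def mixed_distance_matrix(numeric, categorical):
    """Pairwise distances between rows that have numeric and categorical parts.

    Assumption (not taken from the paper): each categorical mismatch is
    weighted by that attribute's share of the total entropy, and the
    categorical part is added to the Euclidean distance of the numeric part.
    """
    n = len(numeric)
    n_cat = len(categorical[0])
    columns = [[row[j] for row in categorical] for j in range(n_cat)]
    entropies = [attribute_entropy(col) for col in columns]
    total = sum(entropies)
    # Fall back to uniform weights if every categorical column is constant.
    weights = [h / total for h in entropies] if total > 0 else [1.0 / n_cat] * n_cat

    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            euclid = math.dist(numeric[i], numeric[k])
            mismatch = sum(
                w for w, a, b in zip(weights, categorical[i], categorical[k]) if a != b
            )
            dist[i][k] = dist[k][i] = euclid + mismatch
    return dist


# Toy data (made up for illustration): two numeric and two categorical attributes.
numeric = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
categorical = [["red", "a"], ["red", "b"], ["blue", "a"]]
D = mixed_distance_matrix(numeric, categorical)
for row in D:
    print(row)
```

A symmetric matrix of this kind could then be passed to a t-SNE implementation that accepts precomputed distances, for example scikit-learn's `TSNE(metric="precomputed", init="random")`, and a kNN classifier could be run on the resulting two-dimensional embedding to judge how well the classes separate.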
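Source: 计算机工程与应用 (Computer Engineering and Applications), 2020, 56(6): 66-72 [CSCD Extended Database]
DOI: 10.3778/j.issn.1002-8331.1903-0330
Keywords: t-SNE algorithm; mixed-type data; dimensionality reduction; visualization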
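Affiliations:

1. Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, 110016

2. University of Chinese Academy of Sciences, Beijing, 100049

3. Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang, 110016

4. Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang, 110016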

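Language: Chinese
Document type: Research article
ISSN: 1002-8331
Subject: Automation and computer technology
Funding: Science and Technology Program of Shenyang, Liaoning Province
CSCD accession number: CSCD:6707936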

References (21 in total, 2 pages)

1. Mohammed R A. Machine learning techniques for highly imbalanced credit card fraud detection: a comparative study. Pacific Rim International Conference on Artificial Intelligence, 2018: 237-246. Cited by 2
2. 胡彬. Person re-identification based on kernel learning and distance similarity measures. 信息与控制 (Information and Control), 2017, 46(5): 525-529. Cited by 3
3. Cherchi E. A Monte Carlo experiment to analyze the curse of dimensionality in estimating random coefficients models with a full variance-covariance matrix. Transportation Research Part B: Methodological, 2012, 46(2): 321-332. Cited by 2
4. Junchin A. Supervised, unsupervised, and semi-supervised feature selection: review on gene selection. Transactions on Computational Biology and Bioinformatics, 2016, 13(5): 971-989. Cited by 1
5. Danubianu M. Data dimensionality reduction for data mining: a combined filter-wrapper framework. International Journal of Computers Communications & Control, 2014, 9(3): 576-580. Cited by 1
6. Turk M. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 1991, 3(1): 71-86. Cited by 957
7. Roweis S T. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000, 290(5500): 2323-2326. Cited by 1302
8. Tenenbaum J B. A global geometric framework for nonlinear dimensionality reduction. Science, 2000, 290(5500): 2319-2323. Cited by 996
9. Zhang P. An improved local tangent space alignment method for manifold learning. Pattern Recognition Letters, 2011, 32(2): 181-189. Cited by 18
10. Van Der Maaten L. Visualizing data using t-SNE. Journal of Machine Learning Research, 2008, 9(11): 2579-2605. Cited by 391
11. Belkin M. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 2003, 15(6): 1373-1396. Cited by 558
12. Jamieson A R. Exploring nonlinear feature space dimension reduction and data representation in breast CADx with Laplacian eigenmaps and t-SNE. Medical Physics, 2010, 37(1): 339-351. Cited by 1
13. Garces E. A similarity measure for illustration style. ACM Transactions on Graphics, 2014, 33(4): 1-9. Cited by 2
14. Liu W. Application of improved locally linear embedding algorithm in dimensionality reduction of cancer gene expression data. Journal of Biomedical Engineering, 2014, 31(1): 85-90. Cited by 1
15. Hsu C C. Integrated dimensionality reduction technique for mixed-type data involving categorical values. Applied Soft Computing, 2016, 43: 199-209. Cited by 1
16. Yang Y. A re-examination of text categorization methods. International ACM SIGIR Conference on Research and Development in Information Retrieval, 1999: 42-49. Cited by 2
17. Hinton G E. Stochastic neighbor embedding. Proceedings of Advances in Neural Information Processing Systems, 2002: 833-840. Cited by 2
18. 董迎朝. Research on a t-SNE-based dimensionality reduction method for brain network state observation matrices. 计算机工程与应用 (Computer Engineering and Applications), 2018, 54(1): 42-47. Cited by 5
19. 姜智涵. A spectral clustering algorithm for mixed-type data based on information entropy. 计算机应用研究 (Application Research of Computers), 2018, 36(8). Cited by 1
20. Ding Shifei. An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood. Knowledge-Based Systems, 2017, 133: 294-313. Cited by 16
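Citing documents: 4

1. 贺许龙. Virtual sample generation method and its application in reforming data modeling. 石油炼制与化工 (Petroleum Processing and Petrochemicals), 2021, 52(6): 92-95. Cited by 0

2. 孙彦玺. Research on human action recognition based on convolutional long short-term memory networks. 计算机工程 (Computer Engineering), 2021, 47(10): 260-268. Cited by 1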

iAuthor link: 周晓锋 0000-0001-9837-1261