Dimension Reduction and Visualization of Mixed-Type Data Based on E-t-SNE
Authors: Wei Shichao (1,2,3,4); Li Xin (1,3,4); Zhang Yichi (1,3,4); Zhou Xiaofeng (1,3,4); Li Shuai (1,2,3,4)
Abstract: The traditional t-distributed stochastic neighbor embedding (t-SNE) algorithm handles only single-type attribute data and cannot deal well with mixed-type data. To address this problem, an extended t-SNE dimensionality-reduction and visualization algorithm, E-t-SNE, is proposed for mixed-type data. The method introduces the concept of information entropy to construct a distance matrix for the categorical attributes, and builds the distance matrix of the mixed-type data by combining this categorical distance with the Euclidean distance of the numerical attributes. The combined matrix is then fed into t-SNE, which reduces the dimensionality of the data and displays it in two-dimensional space. In addition, to verify the effectiveness of the algorithm, the k-nearest neighbor (kNN) algorithm is used to evaluate the mixed data after dimensionality reduction. Experiments on UCI datasets show that, on mixed-type data, the method not only visualizes the data well but also reduces its dimensionality while effectively separating data of different classes into clusters, improving the classification accuracy of subsequent classifiers.
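The abstract describes the pipeline only at a high level. The sketch below, using only the Python standard library, illustrates one possible reading of its first and last steps: an entropy-weighted mismatch distance over categorical attributes, added to the Euclidean distance of min-max-normalized numerical attributes, followed by a leave-one-out kNN check on a distance matrix. The entropy weighting (each attribute's share of the total entropy), the normalization, the additive combination, the weight `lam` and all function names are assumptions for illustration. They are not the paper's published formulas.

```python
import math
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy (base 2) of one categorical attribute column."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mixed_distance_matrix(num_rows, cat_rows, lam=1.0):
    """Pairwise distances for mixed-type records (hypothetical helper).

    num_rows: numeric tuples; cat_rows: categorical tuples, same record order.
    lam: assumed weight of the categorical part (not taken from the paper).
    """
    n = len(num_rows)
    # Min-max normalize each numeric column so no single attribute dominates.
    cols = list(zip(*num_rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    num = [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in num_rows]

    # Assumed entropy weighting: attribute weight = its entropy / total entropy.
    ents = [attribute_entropy(col) for col in zip(*cat_rows)]
    total = sum(ents) or 1.0
    weights = [e / total for e in ents]

    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d_num = math.dist(num[i], num[j])  # Euclidean distance, numeric part
            # Entropy-weighted mismatch count, categorical part.
            d_cat = sum(w for w, a, b in zip(weights, cat_rows[i], cat_rows[j]) if a != b)
            dist[i][j] = dist[j][i] = d_num + lam * d_cat
    return dist


def knn_loo_accuracy(dist, labels, k=3):
    """Leave-one-out kNN accuracy computed from a square distance matrix."""
    n = len(labels)
    correct = 0
    for i in range(n):
        # Other records, nearest first; equal distances are ordered by index.
        order = sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))
        # Majority vote; Counter keeps insertion order, so ties favor the nearer label.
        votes = Counter(labels[j] for j in order[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / n


# Toy mixed-type records: two numeric and two categorical attributes each.
num_data = [(1.0, 20.0), (1.2, 22.0), (5.0, 80.0), (5.3, 78.0)]
cat_data = [("red", "a"), ("red", "a"), ("blue", "b"), ("blue", "a")]
labels = ["x", "x", "y", "y"]

D = mixed_distance_matrix(num_data, cat_data)
print(knn_loo_accuracy(D, labels, k=1))  # 1.0
```

To complete the pipeline the abstract describes, the matrix `D` could be passed to scikit-learn's `sklearn.manifold.TSNE(n_components=2, metric="precomputed", init="random")`. A precomputed metric requires random initialization, and `perplexity` must be smaller than the number of records. `knn_loo_accuracy` could then be applied to the pairwise Euclidean distances of the resulting two-dimensional embedding.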
Source: Computer Engineering and Applications (计算机工程与应用), 2020, 56(6): 66-72 [CSCD extended database]
DOI: 10.3778/j.issn.1002-8331.1903-0330
Keywords: t-SNE algorithm; mixed-type data; dimensionality reduction; visualization
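Affiliations:
1. Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016
2. University of Chinese Academy of Sciences, Beijing 100049
3. Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016
4. Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110016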
Language: Chinese
Document type: Research article
ISSN: 1002-8331
Subject: Automation technology; computer technology
Funding: Science and Technology Plan Project of Shenyang, Liaoning Province
CSCD accession number: CSCD:6707936