Clustering method via independent components for semi-structured documents

Tong Wang; Da-Xin Liu; Xuanzuo Lin; Wei Sun

doi:10.1117/12.665427

18 April 2006

Proceedings Volume 6241, Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2006; 62410V (2006) https://doi.org/10.1117/12.665427
Event: Defense and Security Symposium, 2006, Orlando (Kissimmee), Florida, United States

Abstract

This paper presents a novel clustering method for XML documents. Much current research on document clustering is devoted to supporting the storage and retrieval of large collections of XML documents. However, traditional text clustering approaches cannot capture the structural information of semi-structured documents. Our technique first extracts relative path features to represent each document. We then transform these documents into the Vector Space Model (VSM) and propose a way to compute their similarity. Before clustering, we apply Independent Component Analysis (ICA) to reduce the dimensionality of the VSM. To the best of the authors' knowledge, ICA has not previously been used for XML clustering. We also improve the standard C-means partitioning algorithm: when a solution can no longer be improved, the algorithm applies an appropriate disturbance to the local-minimum solution and then proceeds to the next iteration. The algorithm can thus escape local minima while still reaching the whole search space. Experimental results on two real datasets and one synthetic dataset show that the proposed approach is efficient and outperforms a naive clustering method that does not apply ICA.
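The pipeline described in the abstract (path features, VSM, similarity, then a C-means variant that disturbs a local minimum and re-iterates) can be illustrated with a small stdlib-only sketch. It rests on several assumptions not taken from the paper: root-to-element tag paths stand in for the "relative path features", cosine similarity stands in for the proposed similarity computation, and the "appropriate disturbance" is modelled as Gaussian noise (the hypothetical `scale` parameter) added to the best centres found so far. The ICA reduction step is omitted; it could, for example, be applied to the document-by-path matrix with scikit-learn's `FastICA` class before clustering. All function names are hypothetical.

```python
import math
import random
import xml.etree.ElementTree as ET
from collections import Counter

def path_features(xml_text):
    """Count the tag path of every element, e.g. 'book/author/name'.
    Assumption: these stand in for the paper's "relative path features"."""
    counts = Counter()
    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        counts[path] += 1
        for child in node:
            walk(child, path)
    walk(ET.fromstring(xml_text), "")
    return counts

def to_vsm(feature_counts):
    """Map each document's path counts onto one shared, sorted vocabulary."""
    vocab = sorted(set().union(*feature_counts))
    return vocab, [[fc.get(p, 0) for p in vocab] for fc in feature_counts]

def cosine(u, v):
    """Cosine similarity, a common VSM measure used here as a stand-in."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def dist2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def assign(points, centers):
    """Index of the nearest centre for every point."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            for p in points]

def cost(points, centers):
    """C-means objective: sum of squared distances to the nearest centre."""
    return sum(min(dist2(p, c) for c in centers) for p in points)

def lloyd(points, centers, max_iter=100):
    """Standard C-means iterations until the assignments stop changing."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def perturbed_cmeans(points, k, rounds=10, scale=0.1, seed=0):
    """Re-run C-means from a disturbed copy of the best local minimum.
    Assumption: the disturbance is Gaussian noise with std `scale`."""
    rng = random.Random(seed)
    best_centers = lloyd(points, rng.sample(points, k))
    best_cost = cost(points, best_centers)
    for _ in range(rounds):
        start = [[x + rng.gauss(0, scale) for x in c] for c in best_centers]
        centers = lloyd(points, start)
        c = cost(points, centers)
        if c < best_cost:  # keep only strict improvements
            best_centers, best_cost = centers, c
    return assign(points, best_centers), best_cost

if __name__ == "__main__":
    docs = [
        "<book><title/><author><name/></author></book>",
        "<book><title/><author><name/></author><author><name/></author></book>",
        "<article><title/><section><para/><para/></section></article>",
    ]
    vocab, vectors = to_vsm([path_features(d) for d in docs])
    print(round(cosine(vectors[0], vectors[1]), 3))  # 0.949
    labels, sse = perturbed_cmeans(vectors, k=2)
    print(labels, sse)
```

In this toy run the two `book` documents share most of their paths and fall into one cluster, while the `article` document, whose paths are disjoint from theirs (cosine similarity 0), forms the other. Accepting only strict improvements means the disturbance can never make the returned solution worse than the first local minimum.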

Citation

Tong Wang, Da-Xin Liu, Xuanzuo Lin, and Wei Sun "Clustering method via independent components for semi-structured documents", Proc. SPIE 6241, Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2006, 62410V (18 April 2006); https://doi.org/10.1117/12.665427
