Cluster Tree Based Hybrid Document Similarity Measure
Keywords:
Dimensionality reduction, semantic analysis, cluster tree, hybrid similarity, term association
Abstract
This paper establishes a cluster tree based hybrid similarity measure for documents. Within a cluster tree, the hybrid similarity can be calculated for arbitrary data, even when the data do not co-occur, and the tree can generate different views. Different views of the tree can be combined, and the one that is most significant in cost is chosen. A method is proposed for combining the multiple views, each represented by a different distance measure, into a single cluster. Compared with traditional statistical methods, the cluster tree based hybrid similarity is more feasible for intelligent search. It also helps improve dimensionality reduction and semantic analysis.
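The idea of combining several distance measures (views) into one cluster tree can be illustrated with a minimal sketch. The abstract does not say which distance measures, weighting, or linkage the authors use, so everything here is an assumption made for illustration: cosine distance on term counts and Jaccard distance on term sets serve as the two views, `weight` is a hypothetical mixing parameter, and the tree is built with plain average-linkage agglomerative clustering. The function names are invented for this example.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_distance(a, b):
    """View 1 (assumed): 1 - cosine similarity of two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (norm_a * norm_b) if norm_a and norm_b else 1.0


def jaccard_distance(a, b):
    """View 2 (assumed): 1 - Jaccard overlap of the two term sets."""
    union = a.keys() | b.keys()
    return 1.0 - len(a.keys() & b.keys()) / len(union) if union else 0.0


def hybrid_distance(a, b, weight=0.5):
    """Weighted combination of the two views; `weight` is hypothetical."""
    return weight * cosine_distance(a, b) + (1.0 - weight) * jaccard_distance(a, b)


def build_cluster_tree(docs, weight=0.5):
    """Average-linkage agglomerative clustering over the hybrid distance.

    Returns a nested tuple whose leaves are document indices.
    """
    if not docs:
        raise ValueError("need at least one document")
    vectors = [Counter(d.lower().split()) for d in docs]
    # Each cluster is (subtree, member_indices).
    clusters = [(i, [i]) for i in range(len(docs))]

    def linkage(c1, c2):
        # Mean hybrid distance over all cross-cluster document pairs.
        pairs = [(i, j) for i in c1[1] for j in c2[1]]
        total = sum(hybrid_distance(vectors[i], vectors[j], weight) for i, j in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters into one subtree.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = ((clusters[x][0], clusters[y][0]), clusters[x][1] + clusters[y][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters[0][0]


docs = ["cluster tree similarity", "cluster tree measure", "poisson model retrieval"]
tree = build_cluster_tree(docs)
print(tree)
```

In this toy input the first two documents share two of their three terms, so they merge first and the third document joins at the root. Changing `weight` shifts the balance between the frequency-based view and the set-overlap view.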
Copyright (c) 2014 COMPUSOFT: An International Journal of Advanced Computer Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.