Cluster Tree Based Hybrid Document Similarity Measure

Authors

  • Devi MV, PG Student, Department of Computer Science and Engineering, V.S.B. Engineering College, Karur

Keywords:

Dimensionality reduction, semantic analysis, cluster tree, hybrid similarity, term association

Abstract

A cluster tree based hybrid similarity measure is established to measure the similarity between documents. In a cluster tree, the hybrid similarity can be calculated for random data, even when the data do not co-occur, and the tree generates different views. The different views of the tree can be combined, and the one that is significant in cost is chosen. A method is proposed to combine the multiple views, which are represented by different distance measures, into a single cluster. Compared with traditional statistical methods, the cluster tree based hybrid similarity offers better feasibility for intelligent search. It also helps improve dimensionality reduction and semantic analysis.

Published

2024-02-26

How to Cite

Devi, M. V. (2024). Cluster Tree Based Hybrid Document Similarity Measure. COMPUSOFT: An International Journal of Advanced Computer Technology, 3(01), 494–498. Retrieved from https://ijact.in/index.php/j/article/view/88

Section

Original Research Article
