Cluster Tree Based Hybrid Document Similarity Measure

M. Varshana Devi

Authors

M. Varshana Devi, PG Student, Department of Computer Science and Engineering, V.S.B. Engineering College, Karur

Keywords:

Dimensionality reduction, semantic analysis, cluster tree, hybrid similarity, term association

Abstract

A cluster tree based hybrid similarity measure is established to measure the similarity between documents. In a cluster tree, the hybrid similarity can be calculated for random data, even when the data do not co-occur, and the tree generates different views. The different views of the tree can be combined, and the one that is most significant in cost is chosen. A method is proposed to combine the multiple views, which are represented by different distance measures, into a single cluster. Compared with traditional statistical methods, the cluster tree based hybrid similarity offers better feasibility for intelligent search. It also helps improve dimensionality reduction and semantic analysis.
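The idea of representing views by different distance measures and merging them into a single cluster tree can be illustrated with a minimal sketch. The abstract does not name the distance measures, the combination rule, or the linkage, so everything here is an assumption made for illustration: cosine and Jaccard distances stand in for two views, a weighted sum (the hypothetical `weight` parameter) stands in for the combination method, and average linkage builds the tree. The cost-based selection of a view is not modelled.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_distance(a, b):
    """View 1 (assumed): 1 - cosine similarity of two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def jaccard_distance(a, b):
    """View 2 (assumed): 1 - Jaccard overlap of the two vocabularies."""
    union = a.keys() | b.keys()
    return 1.0 - len(a.keys() & b.keys()) / len(union) if union else 0.0


def hybrid_distance(a, b, weight=0.5):
    """Weighted sum of the two views; the weighting scheme is hypothetical."""
    return weight * cosine_distance(a, b) + (1.0 - weight) * jaccard_distance(a, b)


def build_cluster_tree(docs, weight=0.5):
    """Average-linkage agglomerative clustering over the hybrid distance.

    Expects at least one document. Returns a nested tuple of document
    indices, e.g. (2, (0, 1)).
    """
    vectors = [Counter(d.lower().split()) for d in docs]
    # Each cluster is a pair: (subtree, list of member document indices).
    clusters = [(i, [i]) for i in range(len(docs))]

    def linkage(c1, c2):
        pairs = [(i, j) for i in c1[1] for j in c2[1]]
        total = sum(hybrid_distance(vectors[i], vectors[j], weight) for i, j in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average hybrid distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = ((clusters[x][0], clusters[y][0]), clusters[x][1] + clusters[y][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters[0][0]


docs = [
    "cluster tree similarity",
    "cluster tree measure",
    "poisson model retrieval",
]
tree = build_cluster_tree(docs)
```

In this toy input the first two documents share two of their three terms, so they are merged first and the unrelated third document joins at the root. Changing `weight` shifts the balance between the two views, which is one simple way to explore how the combined tree responds to each measure.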

