DATA EXTRACTION AND ALIGNMENT USING TAGS AND VALUE SIMILARITY

Authors

  • Padmavathi S, Master of Philosophy in Computer Science, Marudupandiyar College
  • Tamilselvi K, Marudupandiyar College

Keywords:

Data Extraction, QRRs, HTML DOM, Value Similarity

Abstract

Web databases generate query result pages in response to a user's query. Automatically extracting the data from these pages is important for many applications, such as data integration, that need to cooperate with multiple web databases. This paper presents a data extraction and alignment method called DATVS that combines tag and value similarity. DATVS automatically extracts data from query result pages by first identifying and segmenting the query result records (QRRs) in the pages and then aligning the segmented QRRs into a table, in which the data values from the same attribute are put into the same column. Specifically, the paper proposes new techniques to handle the case when the QRRs are not contiguous, which may be due to the presence of auxiliary information such as a comment, recommendation, or advertisement, and to handle any nested structure that may exist in the QRRs. It also designs a new record alignment algorithm that aligns the attributes in a record first pairwise and then holistically, by combining tag and data value similarity information. Experimental results show that DATVS achieves high precision and outperforms existing state-of-the-art data extraction methods.
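
The pairwise alignment step described above can be illustrated with a minimal sketch. It is not the DATVS algorithm itself: the record format (a list of tag path and text pairs), the function names, the equal weighting of tag and value similarity, and the 0.5 threshold are all assumptions made for illustration, and the holistic step and nested-structure handling are omitted.

```python
from difflib import SequenceMatcher

# Hypothetical record format: a QRR is a list of (tag_path, text) fields.

def field_similarity(a, b, tag_weight=0.5):
    """Combine tag similarity and value similarity into a score in [0, 1]."""
    tag_sim = 1.0 if a[0] == b[0] else 0.0
    value_sim = SequenceMatcher(None, a[1], b[1]).ratio()
    return tag_weight * tag_sim + (1 - tag_weight) * value_sim

def align_pair(rec_a, rec_b, threshold=0.5):
    """Greedy one-to-one matching of fields between two records."""
    scored = [
        (field_similarity(fa, fb), i, j)
        for i, fa in enumerate(rec_a)
        for j, fb in enumerate(rec_b)
    ]
    scored.sort(key=lambda t: t[0], reverse=True)  # best matches first
    used_a, used_b, pairs = set(), set(), []
    for score, i, j in scored:
        if score >= threshold and i not in used_a and j not in used_b:
            pairs.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return sorted(pairs)

def to_rows(rec_a, rec_b):
    """Build a two-row table whose columns follow rec_a's fields."""
    match = dict(align_pair(rec_a, rec_b))
    row_a = [text for _, text in rec_a]
    # A field of rec_a with no counterpart in rec_b leaves an empty cell.
    row_b = [rec_b[match[i]][1] if i in match else None
             for i in range(len(rec_a))]
    return [row_a, row_b]

rec_a = [("td/a", "Data Mining Concepts"), ("td/i", "Han"), ("td/span", "$25.00")]
rec_b = [("td/a", "Web Data Mining"), ("td/span", "$30.00")]
rows = to_rows(rec_a, rec_b)
```

In this sketch, fields that share a tag path always score at least the threshold, while fields with different tag paths need near-identical text to match, so the second record's missing author field becomes an empty cell rather than shifting the price into the wrong column.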

Published

2024-02-26

How to Cite

Padmavathi, S., & Tamilselvi, K. (2024). DATA EXTRACTION AND ALIGNMENT USING TAGS AND VALUE SIMILARITY. COMPUSOFT: An International Journal of Advanced Computer Technology, 3(09), 1092–1097. Retrieved from https://ijact.in/index.php/j/article/view/193

Section

Original Research Article