Data extraction and label assignment for web databases

Authors

  • Rajesh T, UG Student, Department of CSE, Bharath University
  • Prathap T, UG Student, Department of CSE, Bharath University
  • Nambi SN, UG Student, Department of CSE, Bharath University
  • Arunachalam AR, Assistant Professor, Department of CSE, Bharath University

Keywords:

Web databases, information extraction, visual features

Abstract

Deep Web content is accessed through queries submitted to Web databases, and the returned data records are wrapped in dynamically generated Web pages (called deep Web pages in this paper). Extracting structured data from deep Web pages is a challenging problem because of the intricate underlying structure of such pages. A large number of techniques have been proposed to address this problem, but all of them have limitations because they depend on the programming language of the Web page.

Published

2024-02-26

How to Cite

Rajesh, T., Prathap, T., Nambi, S., & Arunachalam, A. (2024). Data extraction and label assignment for web databases. COMPUSOFT: An International Journal of Advanced Computer Technology, 4(04), 1628–1631. Retrieved from https://ijact.in/index.php/j/article/view/285

Section

Original Research Article