Analysis of the Temporal Behaviour of Search Engine Crawlers at Web Sites
Keywords:
Web sites, Search engine, Crawlers, Web logs, Server Load
Abstract
Web log mining is the analysis of web server logs to understand user behaviour at web sites. In addition to user information, web logs provide a great deal of information about search engine traffic and behaviour. Search engine crawlers are highly automated programs that periodically visit web sites to collect information. The behaviour of search engine crawlers can be used to analyze server load, the quality of search engines, the dynamics of crawlers and the ethics of search engines. The time spent by the various crawlers is significant for identifying server load, since search engine crawlers account for a major proportion of it. A temporal analysis of search engine crawlers was carried out to identify their behaviour, and it was found that there is a significant difference in the total time spent by the various crawlers. The hourly presence of search engine crawlers at web sites was also analyzed to identify their dynamics.
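The abstract does not state how hourly presence or time spent were computed. The Python sketch below shows one way such measures could be derived from a raw access log; it is not the authors' method. It assumes the log is in Apache Combined Log Format, recognises crawlers through a hypothetical table of user-agent substrings (`CRAWLER_SIGNATURES`), and approximates "time spent" as the summed duration of sessions separated by a hypothetical 30-minute inactivity gap (`SESSION_GAP`). User-agent strings can be spoofed, so a real study would also need to verify crawler IP addresses.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical user-agent substrings used to label crawler requests.
CRAWLER_SIGNATURES = {
    "Googlebot": "Google",
    "bingbot": "Bing",
    "Yahoo! Slurp": "Yahoo",
}

# Assumed Apache Combined Log Format:
# host ident user [time] "request" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Assumed inactivity gap that closes a crawler session.
SESSION_GAP = timedelta(minutes=30)

# Illustrative log lines made up for this example, not data from the study.
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Oct/2012:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Oct/2012:14:05:36 +0000] "GET /about.html HTTP/1.1" 200 1024 "-" "Googlebot/2.1"',
    '157.55.39.1 - - [10/Oct/2012:02:10:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "bingbot/2.0"',
    '10.0.0.5 - - [10/Oct/2012:02:11:00 +0000] "GET / HTTP/1.1" 200 500 "-" "Mozilla/5.0"',
]


def identify_crawler(agent):
    """Return the crawler label for a user-agent string, or None for other clients."""
    for signature, name in CRAWLER_SIGNATURES.items():
        if signature in agent:
            return name
    return None


def parse_crawler_hits(lines):
    """Yield (crawler, timestamp) pairs for log lines issued by a known crawler."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip malformed lines
        crawler = identify_crawler(match.group("agent"))
        if crawler is None:
            continue  # skip human visitors and unknown bots
        ts = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        yield crawler, ts


def hourly_presence(hits):
    """Count requests per crawler in each hour of the day (index 0-23)."""
    counts = defaultdict(lambda: [0] * 24)
    for crawler, ts in hits:
        counts[crawler][ts.hour] += 1
    return dict(counts)


def time_spent(hits, gap=SESSION_GAP):
    """Approximate total time per crawler as the sum of its session durations.

    Consecutive requests no more than `gap` apart belong to the same session;
    a session with a single request contributes zero time.
    """
    stamps_by_crawler = defaultdict(list)
    for crawler, ts in hits:
        stamps_by_crawler[crawler].append(ts)

    totals = {}
    for crawler, stamps in stamps_by_crawler.items():
        stamps.sort()
        total = timedelta(0)
        start = prev = stamps[0]
        for ts in stamps[1:]:
            if ts - prev > gap:  # inactivity gap: close the current session
                total += prev - start
                start = ts
            prev = ts
        total += prev - start  # close the last session
        totals[crawler] = total
    return totals


if __name__ == "__main__":
    hits = list(parse_crawler_hits(SAMPLE_LOG))  # materialize: used twice below
    for crawler, counts in hourly_presence(hits).items():
        print(crawler, {hour: n for hour, n in enumerate(counts) if n})
    for crawler, total in time_spent(hits).items():
        print(crawler, total)
```

On the made-up sample, the ordinary browser request is ignored, Google appears in hours 13 and 14 with one 10-minute session, and Bing makes a single zero-length visit at hour 2. The session-gap approach is only one possible proxy for time spent; a different threshold, or a measure based on bytes served, would give different totals.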
License
Copyright (c) 2013 COMPUSOFT: An International Journal of Advanced Computer Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.