Automatic Indexing Framework for Context Aware Personal Document Management System
Keywords:
Information Retrieval, Personal Document Management, Formal Concept Analysis (FCA), Incremental Formal Concept Analysis, Context Aware Document Retrieval, Automatic Indexing, Document Clustering
Abstract
Today, managing files on a personal computer is a problem of the same magnitude as managing the World Wide Web, owing to the dynamic nature of the file system [1]. Even searching for files over a file system is time-consuming, because finding a file on a hard disk is a long-running task: every file on the disk has to be read, and dangling pointers remain to files that no longer exist because they have been changed, moved or deleted. This frustrates the user.
Personal document management is the activity of managing a collection of digital documents by the owner of those documents. It consists of the creation, organization, finding and maintenance of documents. Information, especially digital information, is no longer a scarce resource; information exists in abundance, and human time and attention have become the scarce resource instead [2]. Information overload is now a recognized problem as people struggle to manage the increasing quantities of information they need to deal with on a daily basis [3]. Therefore, an on-demand software agent becomes necessary to manage the file system and retrieve the information that the user needs.
This research proposes a framework that manages a semi-structured file collection based on Formal Concept Analysis (FCA). FCA has been applied to document retrieval in different contexts. However, solutions such as the Conceptual Email Manager [4] and DOCCO [5] do not support dynamic insertion of documents (extents) into their concept lattice; these tools must re-run their algorithms to rebuild the index, which is costly. In this work, the concept lattice is generated incrementally over a document collection and stored in a database, which keeps the hierarchical structure compact and facilitates parallel insert and search. The proposed framework uses database queries and functions complemented by an inverted index, which facilitates fast document retrieval and reduces the downtime caused by costly updates to the master index.
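To make the idea concrete, the following is a minimal in-memory sketch of a formal context backed by an inverted index. It is not the authors' implementation: the framework described above stores the lattice in a database, whereas this sketch keeps everything in Python dictionaries, and it computes single concepts on demand rather than maintaining the full lattice. The class name `FormalContext`, its method names, and the sample file names and keywords are all hypothetical.

```python
from collections import defaultdict


class FormalContext:
    """Toy formal context: documents (objects) x keywords (attributes).

    Illustrative only; all names here are assumptions, not the paper's API.
    """

    def __init__(self):
        self.doc_terms = {}               # document -> set of keywords
        self.inverted = defaultdict(set)  # keyword -> set of documents

    def add_document(self, doc, terms):
        """Insert one document by updating only the affected postings."""
        terms = set(terms)
        self.doc_terms[doc] = terms
        for term in terms:
            self.inverted[term].add(doc)

    def extent(self, terms):
        """Return the documents that contain every keyword in `terms`."""
        terms = list(terms)
        if not terms:
            return set(self.doc_terms)
        docs = set(self.inverted.get(terms[0], set()))
        for term in terms[1:]:
            docs &= self.inverted.get(term, set())
        return docs

    def intent(self, docs):
        """Return the keywords shared by every document in `docs`."""
        docs = list(docs)
        if not docs:
            return set(self.inverted)
        shared = set(self.doc_terms[docs[0]])
        for doc in docs[1:]:
            shared &= self.doc_terms[doc]
        return shared

    def concept(self, terms):
        """Close a keyword query into a formal concept (extent, intent)."""
        ext = self.extent(terms)
        return ext, self.intent(ext)


ctx = FormalContext()
ctx.add_document("report.doc", {"budget", "2013"})
ctx.add_document("notes.txt", {"budget", "meeting"})
# A later insertion touches only the postings for its own keywords.
ctx.add_document("plan.doc", {"budget", "meeting"})
meeting_docs, meeting_terms = ctx.concept({"meeting"})
```

Closing the query `{"meeting"}` yields the two documents that contain it together with every keyword those documents share, which is how a concept groups related files. A full incremental lattice algorithm such as AddIntent would additionally maintain the ordering between concepts as documents arrive.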
References
Jayaweera, Y. D. 2012. Automatic File Indexing Framework: An Effective Approach to Resolve Dangling File Pointers. International Journal of Computer Applications 49(15):6-11, July 2012. Published by Foundation of Computer Science, New York, USA.
Simon, H. A. 1997. The future of information systems. Annals of Operations Research, 71, 3-14.
Edmunds, A., & Morris, A. 2000. The problem of information overload in business organisations: a review of the literature. Int. J. of Information Management, 17-28.
Cole, R., and Stumme, G. 2000. CEM: A Conceptual Email Manager. In B. Ganter and G. Mineau (editors), Conceptual Structures: Logical, Linguistic, and Computational Issues, number 1867 in LNAI, pages 438–452. Springer Verlag, Berlin–Heidelberg–New York.
van der Merwe, D., Obiedkov, S., & Kourie, D. 2004. AddIntent: A New Incremental Algorithm for Constructing Concept Lattices. ICFCA 2004: 372-385.
Microsoft, 2007. Introduction to Document Management. Retrieved January 21, 2014, from http://office.microsoft.com/en-001sharepoint-server-help/introduction-to-document-managementHA010241399.aspx.
KDE e.V. 2014. Nepomuk. Retrieved January 26, 2014, from http://userbase.kde.org/Nepomuk
Elsweiler, D., Ruthven, I., & Jones, C. 2005. Dealing with fragmented recollection of context in information management. Context-Based Information Retrieval (CIR-05) Workshop in CONTEXT-05.
Jones, W. 2007. Keeping Found Things Found: The Study and Practice of Personal Information Management. Retrieved January 26,
Kirchner & Bharti 2009. Mind map your way to an idea. Writer, Vol.122(3), 28.
Windows Search. Retrieved January 26, 2014, from http://windows.microsoft.com/en-US/windows7/products/features/windows-search.
Dumais, S., Cutrell, E., Cadiz, J., Jancke, G., Sarin, R., and Robbins, D. C. 2003. Stuff I've seen: a system for personal information retrieval and re-use. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '03). ACM, New York, NY, USA, 72-79.
Iturrioz, J., Diaz, O., and Anzuola, S. F. 2008. Toward the Semantic Desktop: The seMouse Approach. IEEE Intelligent Systems, vol. 23, no. 1, pp. 24-31.
Baeza-Yates, R., & Ribeiro-Neto, B. 1999. Modern information retrieval. New York: ACM Press.
Dey, A. K. 2001. Understanding and Using Context. Personal Ubiquitous Computing, 5 (1):4–7.
Spink, A., and Cole, C. 2005. New Directions in Cognitive Information Retrieval. Springer.
Ganter, B., and Wille, R. 1989. Conceptual Scaling, In: F. Roberts (ed.): Application of Combinatorics and Graph Theory to the Biological and Social Sciences, Springer, 139-167.
Becker, P., & Cole, R. 2003. Querying and analysing document collections with Formal Concept Analysis.
Ganter, B., and Wille, R. 1999. Formal Concept Analysis: mathematical foundations. Springer, Heidelberg.
Furnas, G. W., Landauer, T. K., Gomez, L. M., and Dumais, S. T. 1983. Statistical semantics: analysis of the potential performance of key-word information systems. Bell System Technical Journal, 62, 1753-1806.
Godin, R., Missaoui, R., and Alaoui, H. 1991. Learning algorithms using a Galois lattice structure. Proceedings of the Third International Conference on Tools for Artificial Intelligence, San Jose, CA: IEEE Computer Society Press, 22-29.
Kim, M., and Compton, P. 2001. Formal Concept Analysis for Domain-Specific Document Retrieval Systems. In Proceedings of the 14th Australian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence (AI '01), Markus Stumptner, Dan Corbett, and Michael J. Brooks (Eds.). Springer-Verlag, London, UK, 237-248.
Kuznetsov, S., & Obiedkov, S. 2002. Comparing Performance of Algorithms for Generating Concept Lattices. Journal of Experimental and Theoretical Artificial Intelligence, 14, pp. 189–216. Taylor & Francis, ISSN 0952–813X print/ISSN 1362–3079 online.
License
Copyright (c) 2014 COMPUSOFT: An International Journal of Advanced Computer Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.