


Swathi B.P, Anju R


Source code retrieval is a branch of text retrieval that helps developers find a piece of code in a code base by issuing a query against it. A developer who has worked on the code base for a long time generally knows how to formulate a query that returns good search results. A developer who is new to the code base, however, does not know which terms to include in the query to obtain good results. Ideally, a retrieval system should let developers issue natural language queries. This creates a need for query reformulation, which optimizes the developer's query when it does not contain terms from the code base. This work presents an extensive study of the areas where natural language queries are applied and of the various query reformulation techniques. A semantic query reformulation technique is then applied to natural language queries over the source code base. Our discussion and results show how a word that is both semantically correct and in the context of the source code can be obtained to replace a query term that is not present in the source code base.
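The abstract does not specify how the replacement word is chosen. As a minimal sketch, assuming each word has a vector representation and that the similarity score is cosine similarity (both assumptions, not necessarily the authors' method), the core idea of replacing a query term absent from the code base with its most similar in-vocabulary term could look like the Python below. The toy vectors, the `reformulate` and `cosine` functions, and the 0.5 threshold are all hypothetical.

```python
import math

# Hypothetical toy word vectors standing in for a real semantic model
# (e.g. WordNet- or embedding-based similarity). Values are illustrative only.
TOY_VECTORS = {
    "remove": [0.9, 0.1, 0.0],
    "delete": [0.85, 0.15, 0.05],
    "file": [0.1, 0.9, 0.1],
    "document": [0.15, 0.85, 0.2],
    "user": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def reformulate(query, code_vocab, vectors=TOY_VECTORS, threshold=0.5):
    """Replace each query term missing from code_vocab with its most similar
    code-base term, provided that term's similarity score exceeds threshold."""
    reformulated = []
    for term in query.lower().split():
        # Terms already in the code base, or with no known vector, are kept unchanged.
        if term in code_vocab or term not in vectors:
            reformulated.append(term)
            continue
        best, best_score = term, threshold
        for candidate in sorted(code_vocab):  # sorted for deterministic tie-breaking
            if candidate in vectors:
                score = cosine(vectors[term], vectors[candidate])
                if score > best_score:
                    best, best_score = candidate, score
        reformulated.append(best)
    return " ".join(reformulated)
```

With these toy vectors, `reformulate("remove document", {"delete", "file", "user"})` returns `"delete file"`: neither query term occurs in the code-base vocabulary, so each is swapped for the closest term that does. The threshold keeps a term unchanged when no code-base word is similar enough, which avoids replacing a query term with an unrelated identifier.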


Keywords: Natural Language Processing, Query Reformulation, Similarity Score, Query Expansion, Natural Language Queries



