Impact of transformed features in automated survey coding
Keywords:
Open-ended Survey Coding, Features, Classifiers
Abstract
Survey coding is the process of transforming respondents' answers or descriptions into codes during data analysis. It is an expensive task, which is why social scientists and other professionals who design and administer surveys tend to avoid including many open-ended questions. They rely instead on less expensive multiple-choice questions, which by definition do not require a coding phase. However, multiple-choice questions strictly limit the respondents' possible answers. This study aims to automate the survey coding process using transformed features. Five intelligent coders were developed using the k-Nearest Neighbor algorithm, Support Vector Machine with linear function, Support Vector Machine with RBF function and Support Vector Machine with polynomial function. Different response features were applied to improve coding performance. The techniques applied to the original response features were Relative Frequency, Power Transformation, Relative Frequency Power Transformation and Term Frequency Weighted by Inverse Document Frequency. Furthermore, the study proposed new features: Normalized Relative Frequency, Normalized Relative Frequency with Power Transformation and Normalized Relative Frequency with Term Frequency Weighted by Inverse Document Frequency. The micro-averaged F-measure was used to evaluate the performance of each automated coder. Among all the machine learning techniques used, Support Vector Machine with polynomial function performed best when implemented with transformed features.
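As an illustration of the kind of feature transformations and evaluation measure named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the commonly used definitions: relative frequency as a term count divided by the total count in the response, power transformation as raising each feature to an exponent v between 0 and 1 (v = 0.5 is an assumed default here), and the micro-averaged F-measure as 2TP / (2TP + FP + FN) pooled over all responses. The function names are hypothetical. The normalized variants proposed in the study are not sketched because the abstract does not define them.

```python
from typing import List, Set


def relative_frequency(counts: List[float]) -> List[float]:
    """Divide each term count by the total count of the response (assumed definition)."""
    total = sum(counts)
    if total == 0:
        # An empty response has no terms; return all zeros instead of dividing by zero.
        return [0.0 for _ in counts]
    return [c / total for c in counts]


def power_transform(values: List[float], v: float = 0.5) -> List[float]:
    """Raise each non-negative feature value to the power v (v = 0.5 is an assumption)."""
    return [x ** v for x in values]


def relative_frequency_power(counts: List[float], v: float = 0.5) -> List[float]:
    """Apply the power transformation to the relative frequencies."""
    return power_transform(relative_frequency(counts), v)


def micro_f_measure(true_codes: List[Set[str]], predicted_codes: List[Set[str]]) -> float:
    """Micro-averaged F-measure: pool TP, FP and FN over all responses, then compute F1."""
    tp = fp = fn = 0
    for truth, predicted in zip(true_codes, predicted_codes):
        tp += len(truth & predicted)   # codes assigned correctly
        fp += len(predicted - truth)   # codes assigned but not correct
        fn += len(truth - predicted)   # correct codes that were missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    term_counts = [1, 1, 2]  # hypothetical term counts for one response
    print(relative_frequency(term_counts))
    print(relative_frequency_power(term_counts))
    print(micro_f_measure([{"a"}, {"b"}], [{"a"}, {"c"}]))
```

The transformed vectors would then be passed to a classifier such as k-Nearest Neighbor or a Support Vector Machine in place of the raw counts.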
License
Copyright (c) 2014 COMPUSOFT: An International Journal of Advanced Computer Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.