A Probabilistic Approach to String Transformation

V. Vinothh; R. Thaneshwaran; K. G. S. Venkatesan

Authors

Vinothh V UG Student, Department of CSE, Bharath University, Chennai
Thaneshwaran R UG Student, Department of CSE, Bharath University, Chennai
Venkatesan KGS Asst.Professor, Department of CSE, Bharath University, Chennai

Keywords:

String Transformation, Log linear model, ALGORITHM

Abstract

The string model has been applied to a wide range of problems, including spelling correction. These models consist of two components: a source model and a channel model. Very little research has gone into improving the channel model for spelling correction. We Describes a new channel model for spelling correction, based on generic string to string edits. Using this model gives significant performance improvements compared to previously proposed models. We propose a novel and probabilistic approach to string transformation, which is both accurate and efficient. In this approach includes the use of a log linear model, a method for training the model, and an algorithm for generating the top k candidates, whether there is or is not a predefined dictionary. Log linear model is defined as a conditional probability distribution of an output string and a rule set for the transformation conditioned on an input string. The string generation algorithm based on pruning is guaranteed to generate the optimal top k candidates. The proposed method is applied to correction of spelling errors in queries as well as reformulation of queries in web search. Experimental results on large scale data show that the proposed approach is very accurate and efficient improving upon existing methods in terms of accuracy and efficiency in different settings.

References

.N. Okazaki, Y. Tsuruoka, S. Ananiadou, and J. Tsujii, “A discriminative candidate generator for string transformations,” in Proc. Conf.

Empirical Methods Natural Language Processing, Morristown, NJ, USA, 2008, pp. 447–456.

M. Dreyer, J. R. Smith, and J. Eisner, “Latentvariable modeling of string transductions with finite-state methods,” in Proc. Conf. Empirical Methods Natural Language Processing, Stroudsburg, PA, USA, 2008, pp. 1080–1089.

A. Arasu, S. Chaudhuri, and R. Kaushik, “Learning string transformations from examples,” Proc. VLDB Endow., vol. 2, no. 1, pp. 514–525, Aug. 2009.

S. Tejada, C. A. Knoblock, and S. Minton, “Learning domainindependent string transformation weights for high accuracy object identification,” in Proc. 8th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, New York, NY, USA, 2002, pp. 350–359.

M. Hadjieleftheriou and C. Li, “Efficient approximate search on string collections,” Proc. VLDB Endow., vol. 2, no. 2, pp. 1660–1661,

Aug. 2009.

C. Li, B. Wang, and X. Yang, “VGRAM: Improving performance of approximate queries on string collections using variable-length grams,” in Proc. 33rd Int. Conf. Very Large Data Bases, Vienna, Austria, 2007, pp. 303–314.

X. Yang, B. Wang, and C. Li, “Cost-based variable-length-gram selection for string collections to support approximate queries efficiently,” in Proc. 2008 ACM SIGMOD Int. Conf. Management Data, New York, NY, USA, pp. 353–364.

R. Vernica and C. Li, “Efficient top-k algorithms for fuzzy search in string collections,” in Proc. 1st Int. Workshop Keyword Search Structured Data, New York, NY, USA, 2009, pp. 9–14.

Z. Yang, J. Yu, and M. Kitsuregawa, “Fast algorithms for topk approximate string matching,” in Proc. 24th AAAI Conference Artificial Intelligence, 2010, pp. 1467–1473.

E. S. Ristad and P. N. Yianilos, “Learning string-edit distance,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 5, pp. 522–532, May

C. Li, J. Lu, and Y. Lu, “Efficient merging and filtering algorithms for approximate string searches,” in Proc. 2008 IEEE 24th Int. Conf. Data Engineering, Washington, DC, USA, pp. 257–266.

S. Ji, G. Li, C. Li, and J. Feng, “Efficient interactive fuzzy keyword search,” in Proc. 18th Int. Conf. World Wide Web, New York, NY, USA, 2009, pp. 371–380.

C. Whitelaw, B. Hutchinson, G. Y. Chung, and G. Ellis, “Using the web for language independent spellchecking and autocorrection,”

in Proc. 2009 Conf. Empirical Methods in Natural Language Processing, Morristown, NJ, USA, pp. 890–899.

A Probabilistic Approach to String Transformation

Authors

Keywords:

Abstract

References

Downloads

Published

How to Cite

Issue

Section

License

Similar Articles

Make a Submission

Download

Indexing

Information