Please use this identifier to cite or link to this item: https://repository.iimb.ac.in/handle/2074/17953
Title: Hidden Markov models in search applications
Authors: Harsh Vardhan 
Gupta, Manish 
Keywords: Search engines;Software applications;Hidden markov models
Issue Date: 2013
Publisher: Indian Institute of Management Bangalore
Series/Report no.: PGP_CCS_P13_095
Abstract: Context & Problem Statement: With the proliferation of web-based data, Search Engines have become the backbone of most, if not all, software applications. As search applications allow users to enter queries in natural language, they leave room for spelling, phonetic and phrasal errors. These errors, in turn, build the case for Query Modification, i.e. the subjective interpretation of a search query to suggest & use modified query versions. This paper tackles the problem of processing natural language queries and builds a Query Modification Algorithm. Hidden Markov Models in Search: There can and is a difference between the letters typed by a user and the letters he actually intended to type as a search query. The intended letters therefore resemble the hidden states of a Markov Model. In query modification, an HMM is critical to conjecture the most probable intended search string, given that a user has already typed a query. This paper relies heavily on HMM algorithms as the backbone of its proposal. Dynamic Autosuggest via Spell-Check & Auto-Complete: This paper breaks down the query modification problem into two phases. In the first phase, which lasts while the user is entering the search query, the search engine has to propose automatic suggestions that provide corrections as well as completions. This paper tackles the problem of spell-check by developing an HMM based algorithm that gives the top N most probable corrections. In order to derive completion suggestions, it picks & builds on the Backward HMM algorithm. Conditional Probability Based Results Aggregation & Ranking: For aggregating the results of spell-check and auto-complete results - the key to the relevance of suggestions to the user - this paper explores multiple merge options and develops an innovative conditional probability based model that performs aggregation as well as suggestion ranking. Suggestion Refinement Through Tagging: The ranked suggestions are further moderated by a one-of-its-kind tagging system to refine the conditional probabilities used for ranking. A tag is a word or phrase that describes a certain characteristic of the search query. The paper presents the concepts of Affinity between a query and a tag and the Degree of Association between a query and a suggestion and proposes a vector product scoring mechanism. This score is then used to modify the conditional probabilities and arrive at refined suggestions. The Algorithm & Dry Run: The paper presents an elaborate pseudo-code of the proposed algorithm and then conducts a dry run of the same on a set of mock-data picked & built from common e-commerce queries and suggestions. The dry run indicates the flawless execution of the algorithm that is not only able to tackle the problems of spelling & completion effortlessly, but also able to rank the suggestions accurately.
URI: https://repository.iimb.ac.in/handle/2074/17953
Appears in Collections:2013

Files in This Item:
File SizeFormat 
PGP_CCS_P13_095_E38792_QMIS.pdf1.78 MBAdobe PDFView/Open    Request a copy
Show full item record

Google ScholarTM

Check


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.