Author List: Fan, Weiguo; Gordon, Michael D.; Pathak, Praveen
Journal of Management Information Systems, 2005, Volume 21, Issue 4, Pages 37-56.
Web search engines have become an integral part of the daily life of knowledge workers, who depend on them to retrieve relevant information from the Web or from their companies' large document databases. Current search engines respond to user queries very quickly, but their retrieval performance leaves much to be desired: users typically have to sift through many nonrelevant documents to find the few that meet their information needs. Ranking functions play a central role in a search engine's retrieval performance. In this paper, we describe a methodology that uses genetic programming to discover new ranking functions for Web-based information-seeking tasks. We exploit both the content and the structural information in Web documents during the discovery process, which we carry out for both the ad hoc task and the routing task in retrieval. For each of the two tasks, the newly discovered ranking functions were found to outperform well-known ranking strategies from the information retrieval literature.
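The abstract describes evolving ranking functions with genetic programming but does not specify the representation, operators, or fitness measure. The following is a minimal Python sketch under assumptions of our own: a ranking function is an expression tree over three hypothetical features (`tf` for body term frequency as content evidence, `title` for title term frequency as structural evidence, `dl` for document length), fitness is average precision on a made-up toy collection, and selection is simple truncation with elitism. It illustrates the general technique only and is not the authors' implementation.

```python
import random

# Hypothetical features (assumptions, not from the paper): tf = query-term frequency
# in the body (content), title = frequency in the title (structure), dl = doc length.
TERMINALS = ["tf", "title", "dl"]

# Binary operators; division is "protected" so a near-zero denominator yields 1.0.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}

def random_tree(rng, max_d):
    """Grow a random tree: leaves are feature names, inner nodes are (op, left, right)."""
    if max_d == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, max_d - 1), random_tree(rng, max_d - 1))

def evaluate(tree, feats):
    """Score one document by evaluating the tree on its feature dict."""
    if isinstance(tree, str):
        return feats[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, feats), evaluate(right, feats))

def average_precision(tree, docs):
    """Fitness: rank docs by the tree's score (descending) and compute average precision."""
    ranked = sorted(docs, key=lambda d: evaluate(tree, d["feats"]), reverse=True)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc["rel"]:
            hits += 1
            total += hits / rank
    n_rel = sum(1 for d in docs if d["rel"])
    return total / n_rel if n_rel else 0.0

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if not isinstance(tree, str):
        for i in (1, 2):
            yield from subtrees(tree[i], path + (i,))

def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(node)

def depth(tree):
    return 0 if isinstance(tree, str) else 1 + max(depth(tree[1]), depth(tree[2]))

def crossover(rng, a, b, max_depth=6):
    """Graft a random subtree of b onto a random point of a (rejected if too deep)."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    child = replace_at(a, path, donor)
    return child if depth(child) <= max_depth else a

def mutate(rng, tree, max_depth=6):
    """Replace a random subtree with a freshly grown one (rejected if too deep)."""
    path, _ = rng.choice(list(subtrees(tree)))
    child = replace_at(tree, path, random_tree(rng, 2))
    return child if depth(child) <= max_depth else tree

def evolve(docs, pop_size=30, generations=15, seed=0):
    """Truncation selection with elitism; returns (best_tree, best_average_precision)."""
    rng = random.Random(seed)
    pop = [random_tree(rng, 3) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda t: average_precision(t, docs), reverse=True)
        children = ranked[: pop_size // 5]  # elites survive unchanged
        parents = ranked[: pop_size // 2]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = crossover(rng, a, b)
            if rng.random() < 0.2:
                child = mutate(rng, child)
            children.append(child)
        pop = children
    best = max(pop, key=lambda t: average_precision(t, docs))
    return best, average_precision(best, docs)

# Made-up toy collection for a single query; "rel" is the relevance judgment.
TOY_DOCS = [
    {"feats": {"tf": 5, "title": 1, "dl": 100}, "rel": True},
    {"feats": {"tf": 1, "title": 0, "dl": 50}, "rel": False},
    {"feats": {"tf": 3, "title": 2, "dl": 80}, "rel": True},
    {"feats": {"tf": 4, "title": 0, "dl": 400}, "rel": False},
    {"feats": {"tf": 2, "title": 1, "dl": 60}, "rel": True},
]

if __name__ == "__main__":
    best, score = evolve(TOY_DOCS)
    print(best, score)
```

Because the top trees are copied unchanged into each new generation, the best fitness in the population never decreases. A faithful replication would instead use the paper's own terminals, operators, fitness function, and training queries (for the ad hoc and routing tasks), none of which are given in this abstract.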
Keywords: business intelligence; genetic programming; information retrieval; machine learning; ranking function; search engine; text mining; Web mining

List of Topics

#217 0.384 search information display engine results engines displays retrieval effectiveness relevant process ranking depth searching economics create functions incorporate low terms
#37 0.156 intelligence business discovery framework text knowledge new existing visualization based analyzing mining genetic algorithms related techniques large proposed novel artificial
#295 0.102 task fit tasks performance cognitive theory using support type comprehension tools tool effects effect matching types theories modification working time
#299 0.063 office document documents retrieval automation word concept clustering text based automated created individual functions major approach operations prototype identify report
#33 0.053 web site sites content usability page status pages metrics browsing design use web-based guidelines results implications portal loyalty navigability addition
#90 0.053 development life cycle prototyping new stages routines stage design experiences traditional time sdlc suggested strategies rapid effort integrated needs techniques