Author List: Hosanagar, Kartik
Information Systems Research, 2011, Volume 22, Issue 4, Page 739-755.
Information specialists in enterprises regularly use distributed information retrieval (DIR) systems that query a large number of information retrieval (IR) systems, merge the retrieved results, and display them to users. There can be considerable heterogeneity in the quality of results returned by different IR servers. Further, because different servers handle collections of different sizes and have different processing and bandwidth capacities, there can be considerable heterogeneity in their response times. The broker in the DIR system has to decide which servers to query, how long to wait for responses, and which retrieved results to display, based on the benefits and costs imposed on users. The benefit of querying more servers and waiting longer is the ability to retrieve more documents. The costs may take the form of access fees charged by IR servers or the users' cost of waiting for the servers to respond. We formulate the broker's decision problem as a stochastic mixed-integer program and present analytical solutions for the problem. Using data gathered from FedStats, a system that queries IR engines of several U.S. federal agencies, we demonstrate that the technique can significantly increase the utility from DIR systems. Finally, simulations suggest that the technique can be applied to solve the broker's decision problem under more complex decision environments.
Keywords: distributed information retrieval (IR); optimal operational decisions; personalization; query termination; source selection; stochastic modeling; utility theory
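The trade-off described in the abstract (more servers and longer waits retrieve more documents, but incur access fees and waiting costs) can be illustrated with a toy model. The sketch below is not the paper's stochastic mixed-integer program. It assumes, purely for illustration, that each server has a fixed benefit if it responds in time, a fee paid whenever it is queried, and an exponentially distributed response time, and that waiting costs grow linearly. The names `Server`, `response_prob`, and `best_plan` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    benefit: float  # assumed value of the results if the server responds in time
    fee: float      # assumed access fee, paid whenever the server is queried
    rate: float     # assumed exponential response rate (1 / mean response time)


def response_prob(server, wait):
    """Probability that the server responds within `wait` (exponential assumption)."""
    return 1.0 - math.exp(-server.rate * wait)


def best_plan(servers, wait_cost, wait_grid):
    """Grid-search the wait time; for each wait, keep servers with positive net value.

    Returns (expected_utility, wait, selected_names). Querying no server and
    not waiting at all is the baseline, with utility 0.
    """
    best = (0.0, 0.0, [])
    for wait in wait_grid:
        # Utility is additive here, so each server can be judged on its own.
        chosen = [s for s in servers if s.benefit * response_prob(s, wait) > s.fee]
        utility = sum(s.benefit * response_prob(s, wait) - s.fee for s in chosen)
        utility -= wait_cost * wait
        if utility > best[0]:
            best = (utility, wait, [s.name for s in chosen])
    return best


if __name__ == "__main__":
    servers = [
        Server("fast", benefit=10.0, fee=1.0, rate=1.0),
        Server("pricey", benefit=1.0, fee=5.0, rate=1.0),
    ]
    print(best_plan(servers, wait_cost=1.0, wait_grid=[0.5, 1.0, 2.0, 3.0]))
```

Because the assumed utility is additive across servers, the server-selection step separates for any fixed wait time, which is why a simple filter suffices here. A formulation with result merging, display limits, or correlated response times would not decompose this way.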
Algorithm:

List of Topics

#281 0.141 database language query databases natural data queries relational processing paper using request views access use matching automated semantic based languages
#217 0.136 search information display engine results engines displays retrieval effectiveness relevant process ranking depth searching economics create functions incorporate low terms
#44 0.107 approach analysis application approaches new used paper methodology simulation traditional techniques systems process based using proposed method present provides various
#8 0.095 decision making decisions decision-making makers use quality improve performance managers process better results time managerial task significantly help indicate maker
#31 0.079 problem problems solution solving problem-solving solutions reasoning heuristic theorizing rules solve general generating complex example formulation heuristics effective given finding
#219 0.079 response responses different survey questions results research activities respond benefits certain leads two-stage interactions study address respondents question directly categories
#226 0.071 models linear heterogeneity path nonlinear forecasting unobserved alternative modeling methods different dependence paths efficient distribution probabilities demonstrate observed heterogeneous probability
#151 0.068 costs cost switching reduce transaction increase benefits time economic production transactions savings reduction impact services reduced affect expected optimal associated
#278 0.065 website users websites technostress stress time online wait delay aesthetics user model image elements longer waiting appeal attract utility internet
#71 0.051 distributed agents agent intelligent environments environment smart computational environmental scheduling human rule using does embodied provide trends computer-aided heterogeneous inventory