Author List: Arazy, Ofer; Woo, Carson
MIS Quarterly, 2007, Volume 31, Issue 3, Page 525-546.
Although the management of information assets--specifically, of text documents that make up 80 percent of these assets--can provide organizations with a competitive advantage, the ability of information retrieval (IR) systems to deliver relevant information to users is severely hampered by the difficulty of disambiguating natural language. The word ambiguity problem has been addressed with moderate success in restricted settings, but it remains the main challenge in general settings, which are characterized by large, heterogeneous document collections. In this paper, we provide preliminary evidence for the usefulness of statistical natural language processing (NLP) techniques, and specifically of collocation indexing, for IR in general settings. We investigate the effect of three key parameters on collocation indexing performance: directionality, distance, and weighting. We build on previous work in IR to (1) advance our knowledge of key design elements for collocation indexing, (2) demonstrate gains in retrieval precision from the use of statistical NLP for general-settings IR, and (3) provide practitioners with a useful cost-benefit analysis of the methods under investigation.
Keywords: Document management; information retrieval (IR); word ambiguity; natural language processing (NLP); collocations; distance; directionality; weighting; general settings
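As a rough illustration of the three parameters named in the abstract, the sketch below extracts word pairs from a token list. It is not the authors' method: the function name `collocation_index`, its parameters, and the inverse-distance weighting are assumptions made here for illustration only.

```python
from collections import defaultdict


def collocation_index(tokens, max_distance=3, directional=True, decay=True):
    """Score word pairs that co-occur within max_distance tokens.

    Hypothetical sketch; the weighting scheme is an assumption,
    not the one evaluated in the paper.
    """
    scores = defaultdict(float)
    for i, first in enumerate(tokens):
        # distance: only pair words at most max_distance tokens apart
        for d in range(1, max_distance + 1):
            j = i + d
            if j >= len(tokens):
                break
            second = tokens[j]
            # directionality: (a, b) and (b, a) are distinct only if directional
            pair = (first, second) if directional else tuple(sorted((first, second)))
            # weighting: closer pairs count more when decay is on (assumed 1/d)
            scores[pair] += 1.0 / d if decay else 1.0
    return dict(scores)


tokens = ["bank", "river", "bank"]
directed = collocation_index(tokens, max_distance=1, directional=True, decay=False)
undirected = collocation_index(tokens, max_distance=1, directional=False, decay=False)
```

With directionality on, "bank river" and "river bank" are indexed as separate entries; with it off, they collapse into a single pair whose score is the sum of both.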
Algorithm:

List of Topics

#220 0.145 research study different context findings types prior results focused studies empirical examine work previous little knowledge sources implications specifically provide
#299 0.123 office document documents retrieval automation word concept clustering text based automated created individual functions major approach operations prototype identify report
#281 0.118 database language query databases natural data queries relational processing paper using request views access use matching automated semantic based languages
#61 0.113 reuse results anchoring potential strategy assets leading reusability incentives impact bias situations effect similarity existing extraction reusable improvement necessary enhancing
#127 0.090 systems information research theory implications practice discussed findings field paper practitioners role general important key grounded researchers domain new identified
#82 0.079 case study studies paper use research analysis interpretive identify qualitative approach understanding critical development managerial elements exploring points positivist presents
#215 0.079 data classification statistical regression mining models neural methods using analysis techniques performance predictive networks accuracy method variables prediction problem measure
#36 0.058 competitive advantage strategic systems information sustainable sustainability dynamic opportunities capabilities environments environmental turbulence turbulent dynamics key quest create sustained ability