Author List: Wei, Chih-Ping; Chiang, Roger H.L.; Wu, Chia-Chen
Journal of Management Information Systems, 2006, Volume 23, Issue 2, Pages 173-201.
As electronic commerce and knowledge economy environments proliferate, both individuals and organizations increasingly generate and consume large amounts of online information, typically available as textual documents. To manage this ever-increasing volume of documents, individuals and organizations frequently organize their documents into categories that facilitate document management and subsequent access and browsing. Document clustering is an intentional act that should reflect individual preferences with regard to the semantic coherency and relevant categorization of documents. Hence, effective document clustering must consider individual preferences and needs to support personalization in document categorization. In this paper, we present an automatic document-clustering approach that incorporates an individual's partial clustering as preferential information. Combining two document representation methods, feature refinement and feature weighting, with two clustering methods, precluster-based hierarchical agglomerative clustering (HAC) and atomic-based HAC, we establish four personalized document-clustering techniques. Using a traditional content-based document-clustering technique as a performance benchmark, we find that the proposed personalized document-clustering techniques improve clustering effectiveness, as measured by cluster precision and cluster recall.
Keywords: hierarchical agglomerative clustering (HAC); personalized document clustering; supervised document clustering
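
The abstract does not give algorithmic details, so the following is only a rough sketch of the general idea of seeding hierarchical agglomerative clustering with an individual's partial clustering and then scoring the result. Everything specific here is an assumption made for illustration: the plain term-frequency vectors (no feature refinement or weighting), the average-link cosine similarity, the pair-based definitions of precision and recall, and all function names (seeded_hac, pairwise_precision_recall, and the helpers). It should not be read as the authors' precluster-based or atomic-based HAC.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def average_link(c1, c2, vectors):
    """Mean pairwise similarity between the documents of two clusters."""
    total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def seeded_hac(docs, partial_clusters, k):
    """Agglomerative clustering that starts from the user's partial clusters.

    docs: list of strings.
    partial_clusters: list of lists of document indices (hypothetical format).
    k: target number of clusters (assumed stopping rule).
    """
    vectors = [tf_vector(d) for d in docs]
    seeded = {i for c in partial_clusters for i in c}
    # User-supplied groups are kept intact; every other document starts alone.
    clusters = [list(c) for c in partial_clusters]
    clusters += [[i] for i in range(len(docs)) if i not in seeded]
    while len(clusters) > k:
        # Merge the two most similar clusters (a < b, so deleting b is safe).
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_link(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


def pair_set(clusters):
    """All unordered document pairs that share a cluster."""
    return {frozenset(p) for c in clusters for p in combinations(c, 2)}


def pairwise_precision_recall(produced, reference):
    """Pair-based precision and recall (an assumed definition)."""
    got, want = pair_set(produced), pair_set(reference)
    hit = len(got & want)
    precision = hit / len(got) if got else 0.0
    recall = hit / len(want) if want else 0.0
    return precision, recall
```

In this sketch the individual's preferences enter only through the starting configuration: documents the user has already grouped are never split, and the remaining documents are merged in by content similarity. A personalized representation step, such as reweighting terms based on the partial clustering, would presumably be applied before the similarity computation, but it is omitted here.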

List of Topics

#299 0.351 office document documents retrieval automation word concept clustering text based automated created individual functions major approach operations prototype identify report
#13 0.101 personalization content personalized willingness web pay online likelihood information consumers cues customers consumer services elaboration preference experiment framing customized timing
#60 0.087 analysis techniques structured categories protocol used evolution support methods protocols verbal improve object-oriented difficulties analyses category benchmark comparison provided recognition
#26 0.083 business large organizations using work changing rapidly make today's available designed need increasingly recent manage years activity important allow achieve
#216 0.073 conceptual model modeling object-oriented domain models entities representation understanding diagrams schema semantic attributes represented representing object relationships concepts classes entity-relationship
#37 0.051 intelligence business discovery framework text knowledge new existing visualization based analyzing mining genetic algorithms related techniques large proposed novel artificial