Author List: Menon, Syam; Sarkar, Sumit; Mukherjee, Shibnath
Information Systems Research, 2005, Volume 16, Issue 3, Pages 256-270.
The sharing of databases either within or across organizations raises the possibility of unintentionally revealing sensitive relationships contained in them. Recent advances in data-mining technology have increased the chances of such disclosure. Consequently, firms that share their databases might choose to hide these sensitive relationships prior to sharing. Ideally, the approach used to hide relationships should be impervious to as many data-mining techniques as possible, while minimizing the resulting distortion to the database. This paper focuses on frequent item sets, the identification of which forms a critical initial step in a variety of data-mining tasks. It presents an optimal approach for hiding sensitive item sets, while keeping the number of modified transactions to a minimum. The approach is particularly attractive as it easily handles databases with millions of transactions. Results from extensive tests conducted on publicly available real data and data generated using IBM's synthetic data generator indicate that the approach presented is very effective, optimally solving problems involving millions of transactions in a few seconds.
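To make the idea of hiding a frequent item set concrete, here is a minimal sketch in Python. It is not the optimal approach the paper presents, and every name in it (`support`, `hide_itemset`, `min_support`) is a hypothetical chosen for illustration. It assumes a single sensitive item set and that "hidden" means its support falls below a mining threshold. Under those assumptions, each modified transaction can lower the support by at most one, so at least support - min_support + 1 transactions must change, and the sketch modifies exactly that many. With several overlapping sensitive item sets, choosing which transactions to modify becomes an optimization problem that this sketch does not address.

```python
from typing import Iterable, List, Set, Tuple


def support(db: Iterable[Set[str]], itemset: Set[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def hide_itemset(
    db: List[Set[str]], sensitive: Set[str], min_support: int
) -> Tuple[List[Set[str]], int]:
    """Illustrative only: drop one item of `sensitive` from supporting
    transactions until its support is below `min_support`.

    Returns a sanitized copy of the database and the number of
    transactions that were modified. The input is left unchanged.
    """
    sanitized = [set(t) for t in db]
    # Number of supporting transactions that must lose the item set.
    excess = support(sanitized, sensitive) - (min_support - 1)
    victim = min(sensitive)  # arbitrary but deterministic item to remove
    modified = 0
    for t in sanitized:
        if excess <= 0:
            break
        if sensitive <= t:
            t.discard(victim)
            modified += 1
            excess -= 1
    return sanitized, modified


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "d"}, {"b", "c"}]
    sanitized, modified = hide_itemset(db, {"a", "b"}, min_support=2)
    print(modified, support(sanitized, {"a", "b"}))
```

Removing a single item rather than a whole transaction is one possible design choice for limiting distortion; the abstract itself only states that the number of modified transactions is kept to a minimum.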
Keywords: data quality; item set mining; privacy

List of Topics

#97 0.181 set approach algorithm optimal used develop results use simulation experiments algorithms demonstrate proposed optimization present analytical distribution selection number existing
#215 0.142 data classification statistical regression mining models neural methods using analysis techniques performance predictive networks accuracy method variables prediction problem measure
#236 0.136 form items item sensitive forms variety rates contexts fast coefficients meaning higher robust scores hardware providing compared single complete subgroups
#281 0.119 database language query databases natural data queries relational processing paper using request views access use matching automated semantic based languages
#225 0.114 information environment provide analysis paper overall better relationships outcomes increasingly useful valuable available increasing greater regarding levels decisions viewed relative
#86 0.069 methods information systems approach using method requirements used use developed effective develop determining research determine assessment useful series critical existing
#40 0.053 increased increase number response emergency monitoring warning study reduce messages using reduced decreased reduction decrease act sessions cost good key