Author List: Parssian, Amir; Sarkar, Sumit; Jacob, Varghese S.
Information Systems Research, 2009, Volume 20, Issue 1, Page 99-120.
Information derived from relational databases is routinely used for decision making. However, little thought is usually given to the quality of the source data, its impact on the quality of the derived information, and how this in turn affects decisions. To assess quality, one needs a framework that defines the metrics constituting the quality profile of a relation and provides mechanisms for evaluating them. We build on a quality framework proposed in prior work and develop quality profiles for the results of the primitive relational operations Difference and Union. These operations have nuances that make both the classification of the resulting records and the estimation of the sizes of the different classes difficult, and very different from the corresponding tasks for other operations. We first determine how tuples appearing in the results of these operations should be classified as accurate, inaccurate, or mismember, and when tuples that should appear in the result do not (such tuples are called incomplete). Although estimating the cardinalities of these subsets directly is difficult, we resolve this by decomposing the problem into a sequence of drawing processes, each of which follows a hyper-geometric distribution. Finally, we discuss how the resulting quality profiles would influence decisions.
Keywords: database marketing; hyper-geometric distributions; information quality framework; probability calculus; relational data model
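The abstract states that each drawing process follows a hyper-geometric distribution. As a reminder of what that means, the sketch below computes the probability of finding exactly k accurate tuples when a fixed number of tuples is drawn without replacement from a relation. It is only an illustration of the distribution itself, not the authors' estimation procedure; the function names and the numbers (100 tuples, 80 accurate, 10 drawn) are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """Probability of exactly k successes in `draws` draws without replacement.

    population: total number of tuples in the relation
    successes:  number of those tuples in the class of interest (e.g. accurate)
    draws:      number of tuples drawn
    """
    lowest = max(0, draws - (population - successes))
    highest = min(draws, successes)
    if k < lowest or k > highest:
        return 0.0
    return (
        comb(successes, k)
        * comb(population - successes, draws - k)
        / comb(population, draws)
    )


def hypergeom_mean(population, successes, draws):
    """Expected number of successes among the drawn tuples."""
    return draws * successes / population


# Hypothetical example: 100 tuples, 80 of them accurate, 10 drawn.
pmf = [hypergeom_pmf(k, 100, 80, 10) for k in range(11)]
expected_accurate = hypergeom_mean(100, 80, 10)
```

Here `expected_accurate` is the expected count of accurate tuples among the 10 drawn, and `pmf[k]` gives the probability of observing exactly k of them.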
List of Topics

#115 0.279 quality different servqual service high-quality difference used quantity importance use measure framework impact assurance better include means van dimensions assessing
#8 0.136 decision making decisions decision-making makers use quality improve performance managers process better results time managerial task significantly help indicate maker
#167 0.129 workflow tools set paper management specification command support formal implemented scenarios associated sequence large derived taxonomies called given systematic specifications
#281 0.112 database language query databases natural data queries relational processing paper using request views access use matching automated semantic based languages
#6 0.111 data used develop multiple approaches collection based research classes aspect single literature profiles means crowd collected trend accuracy databases accurate
#170 0.054 information processing needs based lead make exchange situation examined ownership analytical improved situations changes informational examine developed receive perceptions facilitates