Enterprise and Object-Orientation



Data and Information Quality: an Information-theoretic Perspective

Wei Hu and Junkang Feng

Decision making and the efficiency of business flow depend heavily on the quality of the information systems implemented. The evaluation of data quality (DQ) and information quality (IQ) has been treated as a challenging issue in the field of information systems management for the last twenty years. However, we observe that the definitions of DQ and IQ in the literature are not necessarily convincing, which seems to have hampered the development of a deep and sound understanding of the issue, and of practically applicable and effective measures and techniques for their evaluation. It would appear that this might be caused by the seeming lack of research on the definitions of data quality and information quality from an information-theoretic perspective. Through our review of relevant information systems literature, we believe that a rigorous and theoretically sound foundation is highly desirable to provide an insight into specifying and distinguishing the terms ‘data quality’ and ‘information quality’.

This paper presents a data-info quality model under an Information Source (S) – Information Bearer (B) – Information Receiver (R) framework based upon theories of semantic information, including Dretske’s semantic theory of information, Devlin’s ‘infon’ theory, Stamper’s Organizational Semiotics, and Floridi’s revised standard definition of information. We present a set of definitions, compare data quality with information quality, and outline the objective and subjective aspects involved in addressing this problem. This model forms a basis for our further research in data and information quality assessment.

1. Introduction

Information systems (IS) have played a key role in organizations for decision-making and efficient business flow for years. Issues regarding the evaluation of data quality (DQ) and information quality (IQ) have been increasingly recognized within the field of information systems management in recent years. Numerous research efforts have been made in this area from different disciplines and using different research approaches for the purpose of developing data and information quality concepts and methods (Ballou & Pazer, 85; Burgess et al., 04; Dedeke, 00; English, 99; Eppler, 01; Hill, 04; Lee et al., 02; Liu & Chi, 02; Price & Shanks, 04; Redman, 01; Wand & Wang, 96; Wang & Strong, 96; etc.). Hundreds of tools have been produced for evaluating quality in practice since 1996 (English, 99).

Research and practice indicate that data or information quality should be defined accurately and should be taken as encompassing multiple dimensions. Many data or information quality frameworks have been presented in the literature. They contain quality dimensions or categories, normally derived using some research method in a specific domain, together with a set of quality metrics, criteria, components, items, or attributes. Eppler (01) gives five future directions for information quality research. The quest for a more generic framework and the development of frameworks that show interdependencies between different quality criteria are emphasised. It appears, however, that there is still a lack of theoretical underpinning for exploring the interdependencies or inter-relationships among the quality indicators proposed. This creates difficulties for the professional who needs to decide on an appropriate framework, given a large set of criteria, for the task in hand within an organization. When investigating further, we observe that the terms ‘data quality’ and ‘information quality’ are considered synonyms by many if not all, and they are usually used interchangeably in the relevant quality literature. The very concept of IQ is somewhat nebulous (Ballou et al., 03-04), which makes the discussion of the aforementioned problem difficult and ambiguous. It seems to us that the notions of DQ and IQ are yet to be defined adequately in a grounded way.

In this paper, we wish to argue that a set of well-established theories, including Dretske’s semantic theory of information (Dretske, 81), Devlin’s ‘infon’ theory (Devlin, 91), Stamper’s Organizational Semiotics (Stamper, 97) and Floridi’s revised standard definition of information (Floridi, 05), would provide a novel insight into investigating DQ and IQ and shed light on the interdependencies among different quality indicators. The specific aim of this paper is to explore how this problem might be approached from another perspective, namely an information-theoretic perspective, so that further research might be pursued to develop a quality framework with which to analyze and rearrange existing quality categories, or derive new ones, for quality assessment in practice.

This paper is organized as follows. In Section 2, we review existing studies of DQ and IQ and the limitations that we notice in them. In Section 3, we present the basic notions of the theories referenced in this paper, and then introduce an information-centric framework for information systems and information flow. In Section 4, we propose a data-info quality model for understanding ‘data’, ‘data quality’, ‘information quality’ and their inter-relationships, and then use this model to analyze quality categories from some existing approaches. Finally, in Section 5 we give conclusions and indicate future work.

2. Literature Review

Existing studies have reached a consensus that DQ and IQ are multi-dimensional concepts. Research efforts have been made to derive quality indicators for the development of different quality frameworks. Wang and other researchers (Lee et al., 02; Wang & Strong, 96; etc.), following the methods developed in marketing research for determining the quality characteristics of products, present a framework of information quality (IQ) from the information consumer’s perspective. They group all of their IQ dimensions into four IQ categories: Intrinsic IQ, Contextual IQ, Representational IQ, and Accessibility IQ. English gives three reasons for measuring information quality and two definitions of IQ (English, 99). One is its inherent quality, and the other is its pragmatic quality. His approach to quality includes three components, namely data definition quality, data content quality, and data presentation quality. DeLone and McLean’s review of the MIS literature during the 1980s reports twenty-three IQ measures from nine previous studies (DeLone & McLean, 92; DeLone & McLean, 03). DeLone and McLean say: “understandably, most measures of information quality are from the perspective of the user of this information and are thus fairly subjective in character” (DeLone & McLean, 92).

Furthermore, we find that there are different classifications for existing approaches to DQ and IQ in terms of different perspectives. We illustrate them in Table 1.

In addition, Eppler (01) reviews twenty information quality frameworks appearing in the literature from 1989 to 1999 in sixteen different application contexts. Many of the approaches to quality problems in his findings, however, are proposed from a management, manufacturing, or technology perspective. He claims that the majority of the frameworks he studied are context-specific rather than generic and widely applicable. He evaluates the frameworks along two dimensions, namely analytic criteria and pragmatic criteria.

|Perspective |Classifications |
|Research approaches (Price & Shanks, 04) |Empirical research; practitioner-based approach; theoretical approach; literature-based approach; integrated approach |
|Communities (Lee et al., 02) |Academics’ view; practitioners’ view |
|Subject domains (Burgess et al., 04) |Software quality; data quality; information quality; Web quality |

Table 1: Classifications of existing approaches to DQ/IQ

Through reviewing the literature, it seems to us that there is a lack of overarching theoretical perspectives or approaches for classifying existing quality frameworks with respect to the quality indicators they deliver. Fundamental questions still remain as to how quality should be defined and which specific criteria should be used to evaluate information quality (Price & Shanks, 04). As mentioned in Section 1, we argue that defining and distinguishing DQ and IQ should be addressed as a priority. However, Price and Shanks (04) indicate that “due to the lack of agreement on the precise definition of information in the literature, we choose to restrict our usage of the term information to informal discussion and avoid its use in formal definitions”. It is difficult to reach agreement on the definitions of the terms ‘data’ and ‘information’. To this end, we attempt to use an information-theoretic perspective to seek a solution and provide a fresh insight, as it would seem necessary to construct a formal and theoretically sound quality framework under which quality criteria and categories can be derived.

Existing studies normally consider data or information as a type of product or output of an information system and use the analogy between data and products to develop measurement models of DQ and IQ (Kahn, 97; Lee et al., 02; Price & Shanks, 04; etc.). In the literature, the definitions of data quality and information quality are distinguished depending on whether information is considered to be a product or a service. However, the analogical approach is still limited because data are after all different from products (Liu & Chi, 02).

Theoretical approaches do appear in the literature. Wand and Wang derive quality definitions by anchoring them in ontological foundations, based on the notion that the role of an information system is to provide a representation of an application domain as perceived by the user. For the information system to function properly, both the representation and interpretation transformations, involved in the development and use of an information system, need to be performed flawlessly (Wand & Wang, 96). This results in a set of four intrinsic data quality dimensions: complete, unambiguous, meaningful, and correct. A semiotic information quality framework (Price & Shanks, 04) is presented to define information quality and corresponding quality categories in terms of three semiotic levels, namely syntactic, semantic, and pragmatic, defined by Morris (38), and in terms of definitions for data, information and meaning by Mingers (95). Hill (04) proposes an information-theoretic model based upon Shannon & Weaver’s information theory for the purpose of considering customer information quality in an organization. It provides a quantitative assessment of proposed information quality improvements. However, there seem to be few attempts to use an information-theoretic perspective to investigate both the terms ‘data’ and ‘information’ and the notions of DQ and IQ. For example, Wand and Wang (96) derived four DQ attributes, which are only a small sample of the attributes used in assessing intrinsic DQ. This might be due to the lack of an understanding of the subjective and objective nature of the domain.

3. An Information-theoretic approach to quality

Through reviewing the literature, we believe that an information-theoretic underpinning for the terms ‘data’, ‘information’, DQ, and IQ should shed light on the quest for a generic quality model for the purpose of exploring interdependencies or inter-relationships among the quality indicators proposed in various quality frameworks. In this paper, we present an overall model for such a purpose that is based upon a set of well-established theories.

Theories of Semantic Information

Information is still an ‘explicandum’ (Floridi, 05) in the academic community today. Numerous attempts have been made to define it, but many of them are ‘merry-go-round’ definitions (Stamper, 97). Shannon and Weaver’s paper (49) of over half a century ago gives a mathematical model of communication, in which they use probability to define the amount of information that is caused by ‘reduction in uncertainty’. This covers only the engineering aspect of information creation and transmission. Dretske (81) makes a profound paradigm shift from the engineering aspect to the semantic aspect of information. We take Dretske’s account of the relationship between information and knowledge to be an important insight, which we intend to use as a way of incorporating epistemological considerations into the theory of information.

Following Dretske, information will be taken as created by or associated with a state of affairs among a set of possibilities of a situation, the occurrence or realization of which reduces the uncertainty of the situation. We focus on claims of the form ‘a’s being F carries the information that b is G’. From the point of view of semiotics, which has been used in developing a science for information systems, we say that one signal, a’s being F, carries information about a state of affairs, b is G. Relevant to this, Dretske establishes the following definition of information content:

Let k be prior knowledge about a specific information source. Then r’s being F carries the information that s is G if and only if the conditional probability of s’s being G, given that r is F (and k), is 1 (but, given k alone, less than 1).
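In symbols, this definition can be restated as follows. This is only a notational restatement; writing $F(r)$ for ‘r is F’ and $G(s)$ for ‘s is G’ is a shorthand introduced here and is not part of the original definition.

```latex
\[
  F(r) \text{ carries the information that } G(s)
  \iff
  P\bigl(G(s) \mid F(r),\, k\bigr) = 1
  \ \text{ and } \
  P\bigl(G(s) \mid k\bigr) < 1 .
\]
```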

Following the above definition, we proposed our first basic notion, called ‘data bears information’ (Hu & Feng, 02; Xu & Feng, 02), which is re-illustrated in Figure 1.

Figure 1: Simplification on information level and data level

The main point of this diagram that is relevant to this paper is that a representation/signal is considered to represent/carry part of the information existing in the real world. When the source of information, namely that part of the real world, is changed or simplified, a new representation/signal could be used to replace the old one. For example, in the database area, we could use entity-relationship (ER) schemas to design a conceptual representation for a university (a part of the real world). When the information requirements of the university’s information systems are modified, the representation used to bear the information source, namely the ER schemas in this case, would be rearranged accordingly.

Information can be transmitted. A state of affairs, say r1, is a particular case or an instantiation of a general situation, say r. The reduction in uncertainty at r due to the occurrence of r1 may be accounted for by one or more events, say s1, s2, ..., sn, that occur at another general situation, say s. This gives rise to a special kind of relationship - an ‘informational relationship’ (Dretske, 81) - between these two general situations r and s. An informational relationship captures a certain degree of dependency between a state of affairs r1 of a general situation r and what takes place in another general situation s. This dependency can be demonstrated by the fact that r1’s appearance alters the distribution of probabilities of the various possibilities at s. The dependency is a type of regularity concerning different general situations based upon nomic dependencies (Dretske, 81), logic, or norms, etc. in a social setting. Due to this relationship, information created at s is transmitted to r. We will call s the ‘information source’, and r ‘the bearer of information about s’. Moreover, a state of affairs r1 at r can be seen as a signal that carries information about s. A sign/signal carries information about states of affairs in the world, that is, what it signifies, even though the sign/signal may never actually be observed by anyone. Furthermore, if it is recorded, r1 becomes a piece of data. Thus data carry information. In general, data in a database system are a collection of recorded signals or events, which bear certain information about the source within a process of information transmission. Information is carried by a sign and is objective and in analogue form.

Therefore we believe that it would be beneficial to look into problems regarding data and information from the perspectives adopted by various semantic information theories, which might help reach the root and reveal the essence of the problems. In order to introduce a theoretically sound foundation for the notions of information and our data-info quality model, we start with the ontological assumption that information is objective. In the beginning there was information. The word came later (Dretske, 81). The existence of information is independent of its interpreters or receivers (agents). We notice that Floridi defines four types of data: primary data; metadata; operational data; derivative data (Floridi, 05). He revises the ‘standard definition of information’ and adds a fourth condition to it. His work will be discussed further in Section 4.

The S-B-R Framework

To facilitate further studies of information within the context of information systems, that is, to gain insight and to be able to explain various phenomena in human communication, information creation and transformation, and the development of information systems, an overarching framework seems highly desirable, even necessary. The aforementioned theories and semiotics can be seen, among other things, to address the issue of information and information flow in different ways and to emphasize different aspects of it. We find that all these may be incorporated within a framework, which would help make sense of them, and make good use of them in understanding information and information flow. We believe that such a framework should be formulated from the point of view of how information is created, carried and finally received. Therefore we have created a framework consisting of information source, information bearer and information receiver, and the links between them. We call such an abstract model the ‘S-B-R framework’ (illustrated in Figure 2).

We use a simple example to show how this framework might work. As illustrated in Figure 2, some information is created at an information source due to a reduction in uncertainty: for example, the tree is 80 years old, where beforehand it was possible that the tree was 40 years old or 80 years old, among many other possibilities. This information can be carried by an information bearer due to an informational relationship between the source and the bearer, which may be based upon some ‘nomic dependencies’ (Dretske, 81). An information bearer provides an opportunity for an information receiver, for example a human agent, to receive information about the information source. By consulting an information bearer, an information receiver can acquire information (illustrated by the dotted line in Figure 2) if the receiver is aware of and attuned to some constraints (Devlin, 91), which formulate the dependency and therefore the informational relationship between the bearer and the source.
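The role of the informational relationship in this example can be sketched in a few lines of Python. The sketch is illustrative only: the two joint distributions, the ring-count signals, and the function names `prior`, `posterior`, and `carries_information` are assumptions introduced here, not part of the framework itself. It checks Dretske’s condition that a bearer state carries the information that a source state obtains when the conditional probability of the source state given the bearer state is 1, while its prior probability is less than 1.

```python
from fractions import Fraction

# Hypothetical joint distributions P(source state, bearer state).
# Source S: the age of the tree. Bearer B: a ring count read from the trunk.
RELIABLE = {
    ("80 years", "80 rings"): Fraction(1, 2),
    ("40 years", "40 rings"): Fraction(1, 2),
}
# A noisy bearer: an 80-ring reading sometimes occurs for a 40-year-old tree.
NOISY = {
    ("80 years", "80 rings"): Fraction(1, 2),
    ("40 years", "80 rings"): Fraction(1, 4),
    ("40 years", "40 rings"): Fraction(1, 4),
}


def prior(joint, source_state):
    """P(source_state), before any bearer state is consulted."""
    return sum((p for (s, _), p in joint.items() if s == source_state), Fraction(0))


def posterior(joint, source_state, bearer_state):
    """P(source_state | bearer_state)."""
    p_bearer = sum((p for (_, b), p in joint.items() if b == bearer_state), Fraction(0))
    if p_bearer == 0:
        raise ValueError("bearer state never occurs")
    return joint.get((source_state, bearer_state), Fraction(0)) / p_bearer


def carries_information(joint, bearer_state, source_state):
    """Dretske's condition: the posterior is 1 while the prior is below 1."""
    return (posterior(joint, source_state, bearer_state) == 1
            and prior(joint, source_state) < 1)


print(carries_information(RELIABLE, "80 rings", "80 years"))  # True
print(carries_information(NOISY, "80 rings", "80 years"))     # False
print(posterior(NOISY, "80 years", "80 rings"))               # 2/3
```

In the noisy case the reading still alters the probabilities at the source (from 1/2 to 2/3), but under this definition it does not carry the information that the tree is 80 years old.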

Information Source

Information must be created in the first place. Following Dretske, any situation may be regarded as a source of information as long as a reduction in uncertainty takes place. It could be a Universe of Discourse, a particular situation (Devlin, 91), a relation, an event with uncertain outcomes, and so on. For example, the situation ‘choosing one from eight employees to do an unpleasant job’ can be an information source S.
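The amount of information generated at such a source can be quantified in the engineering sense of ‘reduction in uncertainty’. As a minimal sketch, assuming that the eight employees are equally likely to be chosen (an assumption added here; the function name `bits_generated` is likewise hypothetical), the selection of one employee reduces eight possibilities to one and so generates log2(8) = 3 bits:

```python
import math


def bits_generated(n_equally_likely_outcomes):
    """Amount of information (in bits) generated when one of n equally
    likely possibilities is realised: log2(n)."""
    if n_equally_likely_outcomes < 1:
        raise ValueError("need at least one possible outcome")
    return math.log2(n_equally_likely_outcomes)


print(bits_generated(8))  # 3.0
print(bits_generated(1))  # 0.0 (no uncertainty is reduced, so no information)
```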

From the point of view of semiotics, an information source S can be seen as the ‘sign object’ (Falkenberg, 98) that conforms to the definition of ‘sign’ given by Charles Sanders Peirce. It is a thing that the sign alludes or refers to.

Information Bearer

Information flow requires, as a necessity, some representation of information, which we call the bearer. An Information Bearer can be a traffic light or signal, a physical sign or an IT system. Following Stamper (97), anything, say x, can function as a sign if it can stand for something else, say y, for the people in some community. Here, x is an information bearer for y. With our S-B-R framework, our ontological assumptions are that information may or may not be carried by a bearer; that information can be conveyed only through a bearer; and that information is independent of whether anyone receives it or can receive it. For example, if a book were written in ancient Chinese, we would consider that it carries certain information no matter whether we can read it or not.

In addition, we maintain that the literal meaning, if any, of a bearer is independent of the information that it bears. It is only accidental that the former is (part of) the latter.

Considering the structure of a sign given by Peirce, we agree that the ‘representamen’, which is a thing serving as the ‘carrier’ of the sign, is independent of its meaning (Falkenberg, 98). For example, an entity in an Entity Relationship data schema might refer to something that has no semantic correspondence with the meaning of the name given to that entity.

Information Receiver

Following Devlin, we maintain that, to be able to receive information borne by a bearer, an information receiver must be aware of, and actually invoke, some relevant ‘constraints’ (Devlin, 91). Different receivers may receive different information from the same bearer. The users of an information system are information receivers. In a system integration environment, an agent or a mediator can be an information receiver, which may process information further.
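The dependence of received information on the receiver’s constraints can be illustrated with a small sketch. All names here (`Constraint`, `Receiver`, `receive`, and the smoke and traffic-light examples) are hypothetical and chosen only for illustration; the sketch treats a constraint, very roughly, as a link from a bearer signal to a state of affairs at the source, which simplifies Devlin’s notion considerably.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Constraint:
    """A regularity linking a signal on the bearer to a state of affairs at the source."""
    signal: str
    state_of_affairs: str


@dataclass
class Receiver:
    name: str
    # Constraints the receiver is aware of and attuned to.
    attuned_to: frozenset = field(default_factory=frozenset)

    def receive(self, bearer_signals):
        """Return the states of affairs this receiver can obtain from the bearer."""
        return {c.state_of_affairs for c in self.attuned_to if c.signal in bearer_signals}


smoke = Constraint("smoke", "there is a fire")
red = Constraint("red light", "vehicles must stop")

bearer_signals = {"smoke", "red light"}  # what the bearer B presents
driver = Receiver("driver", frozenset({red}))
firefighter = Receiver("firefighter", frozenset({smoke, red}))
newcomer = Receiver("newcomer")  # attuned to no relevant constraint

print(driver.receive(bearer_signals))               # {'vehicles must stop'}
print(sorted(firefighter.receive(bearer_signals)))  # ['there is a fire', 'vehicles must stop']
print(newcomer.receive(bearer_signals))             # set()
```

The bearer is the same in all three cases; only the constraints invoked by each receiver differ, and so does the information received.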

4. A Data-info quality model

Information quality is critical in organizations (Ballou et al., 03-04; DeLone & McLean, 03). Early research efforts in data quality at MIT led to the development of the Total Data Quality Management (TDQM) cycle: define, measure, analyze, and improve (Wang, 04). Tu & Wang worked on ER extensions at the attribute level via modeling the data quality of the original schemas (Tu & Wang, 93). Brodie (80) places the role of data quality within the life-cycle framework with an emphasis on database constraints. We believe that data quality has a close relationship with the tasks of information systems design, and that information quality has an inter-relationship with the data quality of an information system.

In this section we put forward an observation, namely, that it might be helpful to go back to the basics of information systems development. A similar perspective has been utilized by Wand and Wang (96). In this paper, we use another perspective, namely an information-theoretic perspective, to look at information systems from the point of view of information flow from the source of information to the receiver of the information via some information bearer, for the purpose of forming a data-info quality model. This idea comes from a seemingly widely accepted opinion that an information system is designed to store data (including multi-media data) and provide information to information consumers. It is an ‘information-bearing’ medium for the purpose of serving business processes and performance within an organization. Furthermore, it appears that the literature lacks a practical, theoretically grounded, information-centric model with which to explore and analyze an inevitable phenomenon, namely information flow, in IS development and IS evaluation, in particular DQ evaluation and IQ evaluation. The motivation of our work is to make a contribution at the theoretical level through our model and to address the relevant issues mentioned in Section 1.

Definitions

Many definitions of the terms ‘data quality’ and ‘information quality’ have been proposed in the literature. Eppler lists seven definitions of information quality from his review of the literature on information quality published from 1989 to 1999 (Eppler, 01). It seems that many of them are defined from a management, manufacturing, or technology perspective. Some definitions of the two terms are ambiguous and overlap. We wish to argue that this might be caused by the lack of a sound theoretical foundation. The S-B-R framework described above might fill this gap by providing a fresh insight into the problem and helping to define ‘data’ and ‘information’ for studying ‘data quality’ and ‘information quality’. Drawing on relevant literature regarding data quality and information quality and under the S-B-R framework, we generalize a conceptual model for considering these two terms, as illustrated in Figure 3. We call it the ‘data-info quality model’.

In the diagram, S normally contains three parts in the context of Information Systems Development. They are ‘original user requirements’, ‘user expectations’, and ‘organizational needs’. The latter two change due to the dynamic nature of organizational goals, business strategies and performance. In the middle of the diagram, B is an information system that is a carrier or a mediator of information source S. It can be an ERP system, a CRM system, and so on, in the core of which lies a data engine, such as a database or a data warehouse. R, the information receiver, receives information, which is part of S, by accessing and interpreting B.

Following the notion of ‘data bears information’ discussed in Section 3 and the objectives of data quality and information quality evaluation appearing in context (Wang et al., 95), we look at the Information Bearer (B) for assessing ‘data quality’. In other words, the assessment of data quality is a task of defining the quality of an information bearer. For assessing the ‘information quality’ of an information system, we examine the linkage between the Information Source (S) and the Information Bearer (B), and the linkage between the Information Receiver (R) and the Information Bearer (B). In other words, to assess information quality, we have to take the whole chain from S through B to R into consideration. We examine how well the information bearer represents the information source, and how well the information bearer supports the information receiver. That is to say, we look at how good the bearer is at conveying information to the receiver, who would use perception and other cognitive means for this purpose. To enable such assessment, we present the information-theoretic definitions of data, information, data quality and information quality below.

Definition 1. Data is a set of values recorded in an information system, which are collected from the real world, generated from some pre-defined procedures, indicating the nature of stored values, or regarding usage of stored values themselves; or, a model for the purpose of organizing, constraining, representing those values in an information system for its consumers.

We define data here in a broad sense to cover both the values and the structures existing in an information system. Following Floridi, data of the first kind (values) can be of four types (namely primary data, metadata, operational data, and derivative data) according to their sources and purposes. Data of the second kind (models) have a direct impact on the organization of data of the first kind in terms of requirements.

Definition 2. Information, carried by non-empty, well-formed, meaningful, and truthful data (Floridi, 05), is a set of states of affairs, which are part of the real world and independent of its receivers.

We define information in an objective way following Dretske and Floridi. Floridi (05) revises the standard definition of information by adding a fourth condition, namely that information must be truthful. As explained by Floridi, ‘truthful’ is used here as synonymous with ‘true’, to mean ‘representing or conveying true contents about the referred situation or topic’.

Definition 3. Data Quality is the intrinsic quality of data (a type of information bearer) itself.

This definition reveals the objective characteristics of the task of evaluating the quality of data, such as representation, precision, etc. It is in conformity with the discussion of the ‘syntactic quality criteria’ reported in the work of Price and Shanks (04), the ‘inherent information quality characteristics’ defined by English (99), and the ‘intrinsic’ and ‘contextual’ data quality categories proposed by Wang and Strong (96).

Definition 4. Information Quality is the degree to which the information is represented and to which the information can be perceived and accessed.

The term ‘information quality’ is defined from two directions in our data-info quality model. It is not a one-array concept; rather it is the degree of some relevant correspondence between the information source and the information bearer, and between the information bearer and the information receiver respectively. From a semiotic perspective, our work on this level is also in conformity with the ‘semantic quality criteria’ and the ‘pragmatic quality criteria’ reported by Price and Shanks (04), the ‘pragmatic information quality characteristics’ defined by English (99), and the ‘representational’ and ‘accessibility’ data quality categories proposed by Wang and Strong (96).

5. Data Quality vs. Information Quality

According to Floridi (05), non-empty, well-formed and meaningful data may be of poor quality. Data that are incorrect, imprecise or inaccurate are still data and they are often recoverable, but, if they are not truthful, they can only constitute misinformation, which is not information at all. Following Floridi and considering our data-info quality model, we believe that high data quality is a necessary condition for achieving high information quality within an information system. It is not, however, a sufficient condition. For example, a well-organized database using Chinese characters that has recorded accurate and timely stock information does not have high information quality if its users include some non-Chinese speakers, even though the system has high data quality. As another example, suppose a decision-maker is provided with a stock report containing a set of complete, readable, and well-formatted data. He or she will not obtain any information if the data are untrue or do not accurately reflect the real situation. Therefore, high information quality should be based upon high data quality, and the data must be appropriately presented and accessible to the information consumer.
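The claim that high data quality is necessary but not sufficient for high information quality can be made concrete with a deliberately crude sketch. It reduces each aspect of the data-info quality model to a yes/no judgement, which is a simplification made here purely for illustration; the class `Assessment` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    data_quality_high: bool       # intrinsic quality of the bearer B
    represents_source: bool       # S-B linkage: B conveys what is the case at S
    receiver_can_interpret: bool  # B-R linkage: R can perceive and access B

    def information_quality_high(self):
        # High data quality is required, but both linkages must also hold.
        return (self.data_quality_high
                and self.represents_source
                and self.receiver_can_interpret)


# The Chinese-language stock database consulted by a non-Chinese speaker.
chinese_db_for_non_reader = Assessment(True, True, False)
# The same database consulted by a Chinese speaker.
chinese_db_for_reader = Assessment(True, True, True)

print(chinese_db_for_non_reader.information_quality_high())  # False
print(chinese_db_for_reader.information_quality_high())      # True
```

The same bearer, with the same data quality, yields different information quality for different receivers, which is the point of the first example above.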

Based upon the above discussion and our definitions of data quality and information quality, we can rearrange existing quality dimensions and criteria in the literature into a new framework, as shown in Table 2 and Table 3 respectively. The arrangement is intuitive, based upon our experience and the corresponding descriptions of the selected dimensions in the literature.

| |Data Quality |Information Quality |
|Price and Shanks (04) |Syntactic quality |Semantic quality, pragmatic quality |
|English (99) |Inherent characteristics |Pragmatic characteristics |
|Wang et al. (96) |Intrinsic, contextual |Representational, accessibility |
|Dedeke (00) |Ergonomic, accessibility |Representation |

Table 2: Some existing quality dimensions rearranged within a data-info quality framework

|Data Quality |Information Quality |
|Accuracy, format, timeliness, precision, amount of data, etc. |Relevancy, accessibility, usefulness, readability, completeness, consistency, reliability, importance, truthfulness, etc. |

Table 3: Some existing quality criteria rearranged within a data-info quality framework
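As a small illustration of how this rearrangement might be used in practice, the grouping in Table 3 can be encoded as a lookup table. The dictionary below copies the criteria listed in Table 3; the function name `group_criteria` and the treatment of unlisted criteria as ‘unclassified’ are assumptions added here.

```python
# Criteria as listed in Table 3 (the table's "etc." entries are omitted).
TABLE_3 = {
    "data quality": ["accuracy", "format", "timeliness", "precision", "amount of data"],
    "information quality": ["relevancy", "accessibility", "usefulness", "readability",
                            "completeness", "consistency", "reliability", "importance",
                            "truthfulness"],
}

# Reverse index: criterion -> category.
CATEGORY_OF = {criterion: category
               for category, criteria in TABLE_3.items()
               for criterion in criteria}


def group_criteria(criteria):
    """Group the given criteria by their Table 3 category."""
    groups = {}
    for criterion in criteria:
        category = CATEGORY_OF.get(criterion.lower(), "unclassified")
        groups.setdefault(category, []).append(criterion)
    return groups


print(group_criteria(["accuracy", "readability", "security"]))
# {'data quality': ['accuracy'], 'information quality': ['readability'], 'unclassified': ['security']}
```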

Interdependencies among quality dimensions and criteria can be further explored and studied from the point of view of the inter-relationship between data quality and information quality. The quality criteria for the former will clearly have an impact on the latter. Distinguishing information quality from data quality will help IS professionals and organizations derive required and appropriate quality criteria for the task in hand. Further analysis and validation of the aforementioned issues will be reported in our future publications.

Objectivity vs. Subjectivity

In the relevant literature, the notion of data or information quality depends on the actual use of data, and it is normally investigated from the viewpoint of information consumers. Wand and Wang (96) propose a design-oriented approach that defines data quality based upon a concept called ‘possible data deficiencies’ in a system context. Ballou and Pazer’s study focuses primarily on intrinsic dimensions that can be measured objectively (Ballou & Pazer, 85; Ballou & Pazer, 95). However, it would appear that the issue of subjectivity versus objectivity involved in data and information quality evaluation in information systems is hardly addressed adequately. We believe that addressing this issue is important: not only can an insight into the problem be gained, but it should also benefit the selection of research methods for the development of a methodology for assessing data quality and information quality.

Our preliminary thinking about this philosophical issue is that it can be looked at from the ‘S-B-R’ perspective. In Figure 3, we have shown that information quality is concerned with two separate linkages: one between S and B, and one between B and R. The first linkage embodies the objective aspect of the problem, following our ontological assumption on information. It depends on design-oriented or system-oriented factors. Therefore, theoretical techniques (e.g., SQL query design, schema transformation, etc.) and quantitative research methods will contribute to detecting and providing solutions to the problems. The second linkage should be looked at within a social setting, and is therefore predominantly inter-subjective or subjective (Mingers, 95). For example, different groups of information consumers may have different qualifications and different background knowledge, and therefore may receive different information from accessing the same information bearer. Qualitative research methods may contribute to identifying problems, reaching conclusions and obtaining solutions. Much more work should be carried out along this avenue.
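The contrast between the two linkages can be made concrete with a small sketch. It is a simplification under stated assumptions, not a measure proposed by the model: the S-B linkage is reduced to the fraction of source facts that the bearer represents correctly, and the B-R linkage is reduced to filtering the bearer by the fields a receiver is able to interpret. All names (`Receiver`, `known_fields`, `source_bearer_agreement`, `received_information`) and the tree data values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    name: str
    # Hypothetical stand-in for a receiver's qualifications and background knowledge.
    known_fields: set = field(default_factory=set)


def source_bearer_agreement(source: dict, bearer: dict) -> float:
    """Objective S-B check: fraction of source facts the bearer represents correctly."""
    if not source:
        return 1.0
    correct = sum(1 for key, value in source.items() if bearer.get(key) == value)
    return correct / len(source)


def received_information(bearer: dict, receiver: Receiver) -> dict:
    """B-R linkage: a receiver obtains only what they are able to interpret."""
    return {k: v for k, v in bearer.items() if k in receiver.known_fields}


# Illustrative data only: the bearer records the species wrongly.
source = {"age_years": 80, "species": "oak"}
bearer = {"age_years": 80, "species": "elm"}

forester = Receiver("forester", {"age_years", "species"})
visitor = Receiver("visitor", {"age_years"})
```

Here `source_bearer_agreement(source, bearer)` gives the same value whoever asks, whereas `received_information(bearer, forester)` and `received_information(bearer, visitor)` differ even though both receivers access the same bearer. This mirrors the objective and the inter-subjective aspects respectively.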

6. Summary and Future Work

In this paper, we have examined some fundamental issues concerning data and information quality evaluation from an information-theoretic perspective that is informed by a set of well-established theories. We have proposed a data-info quality model based upon an information-centric framework to provide a rigorous theoretical foundation for (1) defining and distinguishing the terms ‘data quality’ and ‘information quality’; (2) discussing the inter-relationships between the two terms; and (3) studying the subjective and objective characteristics of data quality and information quality. A more generic framework for data and information quality, and a set of quality categories and criteria with their interdependencies articulated, will be reported in future publications.

This model is being validated through a two-stage survey. First, a series of interviews will be organized with selected organizations, enterprises, and institutions in the UK and China. The goal is to elaborate the model using a qualitative research method and to generate a data-info quality framework. Then, a questionnaire will be used to test the proposed quality framework in the real world and to categorize quality criteria.

References

Ballou, D. P. and H. L. Pazer, “Modeling Data and Process Quality in Multi-input, Multi-output Information Systems”, Management Science, 31(2) 1985, pp. 150-162.

Ballou, D. P. and H. L. Pazer, “Designing Information Systems to Optimize the Accuracy-Timeliness Tradeoff”, Information Systems Research, 6(1) 1995, pp. 51-72.

Ballou, D., Madnick, S., and Wang, R. Y., “Special Section: Assuring Information Quality”, Journal of Management Information Systems, Winter 2003-4, Vol. 20, No. 3, pp. 9-11.

Brodie, M. L., “Data quality information systems, information, and management,” vol. 3, pp. 245-258, 1980.

Burgess, M., Fiddian, N. J., and Gray, W., “Quality measures and the information consumer”, ICIQ 2004.

Dedeke, A. “A Conceptual Framework for Developing Quality Measures for Information Systems”, Proceedings of the 2000 Conference on Information Quality (IQ-2000), Cambridge, MA, USA, 2000, pp.126-128.

DeLone, W. H. and McLean, E. R. “Information Systems Success: The Quest for the Dependent Variable”, Information Systems Research, Volume 3, No. 1, March 1992, pp. 60-95.

DeLone, W. H., & McLean, E. R. “The DeLone and McLean model of information systems success: A ten-year update”. Journal of Management Information Systems, 19(4), 2003, pp. 9-30.

Devlin, K. Logic and Information. Cambridge University Press, Cambridge, 1991.

Dretske, F. I. Knowledge and the Flow of Information, Basil Blackwell, Oxford, 1981.

English, L. P., Improving Data Warehouse and Business Information Quality. Wiley & Sons, New York, 1999.

Eppler, M. J., “The concept of information quality: an interdisciplinary evaluation of recent information quality frameworks”, Studies in Communication Sciences, 1 (2001), pp. 167-182.

Falkenberg, D. E., Hesse, W., Stamper, R., et al. A Framework of Information Systems Concepts – The FRISCO Report (web edition), IFIP, 1998.

Floridi, L., “Is Semantic Information Meaningful Data?”, Philosophy and Phenomenological Research, Vol. LXX, No. 2, March 2005.

Hill, G., “An information-theoretic model of customer information quality”, Proc. IFIP Int’l Conf. on Decision Support Systems, Italy, 2004.

Hu, W. and Feng, J. 2002. “Some considerations for a semantic analysis of conceptual data schemata”, In Systems Theory and Practice in the Knowledge Age, (E. Ragsdell et al.), Kluwer Academic/Plenum Publishers. New York. ISBN 0-306-47247-3.

Kahn, B. K., Strong, D. M. and Wang, R. Y., “A Model for Delivering Quality Information as Product and Service”, in Conference on Information Quality, Cambridge, MA, pp. 80-94, 1997.

Lee, Y. W., Strong, D., Kahn, B., and Wang, R., “AIMQ: a methodology for information quality assessment”, Information and Management, 40(2) pp. 133-146, 2002.

Liu, L. and Chi, L., “Evolutional Data Quality: A Theory-specific View”, ICIQ 2002.

Mingers, J., “Information and meaning: foundations for an intersubjective account”, Information Systems Journal, 5, 1995, pp. 285-306.

Morris, C., “Foundations of the Theory of Signs”, in International Encyclopedia of Unified Science, vol.1, University of Chicago Press, London, 1938.

Price, R.J., Shanks, G.A., “A semiotic information quality framework”, In R. Meredith, G. Shanks, D. Arnott and S. Carlsson (eds.) Proceedings of the 2004 IFIP International Conference on Decision Support Systems (DSS2004): Decision Support in an Uncertain and Complex World, Prato, Italy, 1-3 July: 658-672.

Redman, T., Data Quality: The Field Guide, New Jersey: Digital Press, 2001.

Shannon, C. E. and Weaver, W. The mathematical theory of communication. Urbana: University of Illinois, 1949.

Stamper, R. “Organisational Semiotics”, In Information Systems: An Emerging Discipline?, Mingers, J and Stowell, F. ed. The McGraw-Hill Companies, London, 1997.

Tu, S.Y. and Wang, R. Y., “Modeling Data Quality and Context Through Extension of the ER Model”, Massachusetts Institute of Technology (MIT) Sloan School of Management, Cambridge, MA, TDQM-93-13, 1993.

Wand, Y. and Wang, R. Y., “Anchoring Data Quality Dimensions in Ontological Foundations”, Communications of the ACM, 39(11): 86-95, 1996

Wang, R.Y., Kon, H.B., and Madnick, S.E., “Data quality requirements analysis and modeling”, Proc. Ninth Int’l Conf. on Data Engineering, pp. 670-677, Vienna, 1993.

Wang, R. Y., Storey, V. C., and Firth, C. P., “A Framework for Analysis of Data Quality Research”, IEEE Transactions on Knowledge and Data Engineering, Vol. 7, No. 4, 1995.

Wang, R. Y. and Strong, D. M., “Beyond Accuracy: What Data Quality Means to Data Consumers”, Journal of Management Information Systems, 12(4): 5-34, 1996.

Wang, R. Y., “Data Quality: Theory in Practice”, EPA 23rd Annual Conference, April 2004. 

Xu, H. and Feng, J., “Towards a Definition of the ‘Information Bearing Capability’ of a Conceptual Data Schema”, In Systems Theory and Practice in the Knowledge Age, (E. Ragsdell et al.), Kluwer Academic/Plenum Publishers. New York. ISBN 0-306-47247-3, 2002.

Figure 3: A Data-info Quality Model [diagram not reproduced; recoverable labels: S (original user requirement, new organizational needs, user expectations); B (database/data warehouse, inc. data value, structure, constraints, etc.); R (information consumer, machine); Represent; Access; DQ; IQ]

Figure 2: S-B-R Framework [diagram not reproduced; recoverable labels: Information Source (S), Information Bearer (B), Information Receiver (R); provide information; carry information; provide an opportunity for R to receive information about S; access/interpret B for receiving information about S. Example shown: a tree stump bears the age of the tree when it was felled, the type of the tree, and the animals that live in the vicinity; a human being who sees it receives information, e.g., ‘Tree is 80 years old’]
