Knowledge Discovery in Databases and Information Retrieval in Knowledge Management Systems

Anne Marie Donovan

April 22, 2003

Knowledge Management Systems, LIS 385T

The University of Texas at Austin

School of Information

Introduction

The processes of Knowledge Discovery in Databases (KDD) and Information Retrieval (IR) appear deceptively simple when viewed from the perspective of terminological definition. Fayyad, Piatetsky-Shapiro, and Smyth (1996) define KDD as "the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" (p. 30). The closely related process of IR is defined by Rocha (2001) as "the methods and processes for searching relevant information out of information systems that contain extremely large numbers of documents" (1.1). In execution, however, these processes are not simple at all, especially when executed to satisfy specific personal or organizational Knowledge Management (KM) requirements or as the core functionality of Knowledge Management Systems (KMS).

The potential validity or usefulness of an individual data element or pattern of data elements may change dramatically from individual to individual, organization to organization, or task to task. Relevance is a highly contextual and personal data characteristic, changing even as the IR process is underway and information requirements are incrementally met. Making retrieved data or a description of data patterns generally understandable is also highly problematic. Data that may appear relevant and easily understandable in one retrieval context may be completely unintelligible in another, even to the same audience. KDD and IR are, in fact, highly complex processes that are strongly affected by a wide range of factors. These factors include the needs and information seeking characteristics of system users as well as the tools and methods used to search and retrieve, the structure and size of the data set or database, and the nature of the data itself.

KDD and IR: An Historical Perspective

Origins

Information professionals often describe the KDD and IR processes in the context of specific types of Database Management Systems (DBMS). Devarakonda (2001) divides DBMS into four types: simple data without query, simple data with query, complex data without query, and complex data with query. An example of the first type, simple data without query, is a filing system, including files that may exist only in paper form. The second, third and fourth types are exemplified by Relational DBMS (RDBMS), Object-Oriented DBMS (OODBMS), and Object-Relational DBMS (ORDBMS), respectively (Devarakonda, 2001, ORDBMS). The type of database that is queried significantly affects the processes of knowledge discovery (KD) and IR.

Because an RDBMS of some type forms the core of almost all KMS, improvement of RDBMS functionality for KD and IR has been a crucial part of KMS refinement for the past three decades. The relatively recent introduction of OODBMS to KMS has created many new KD and IR problem sets for researchers. These challenges have been met, thus far, primarily through the introduction of certain features of RDBMS to OODBMS. The result has been the development of a small group of ORDBMS that combine the best KD and IR features of RDBMS and OODBMS (ORDBMS).

Information professionals familiar with traditional filing systems are acutely aware of the limitations imposed on KD and IR by their pre-set filing structure. Although technically a database, this type of DBMS does not lend itself to automated searching, but only to browsing or search by pre-designated subject categories and file descriptions (e.g., library card catalogs). The difficulties presented for KD and IR by simple filing structures were initially replicated in computer-supported file structures and were only alleviated with the introduction of the Relational Database Model (RDM) by E. F. Codd in 1970 (Devarakonda, 2001, RDBMS).

Introduction of the RDM resulted in rapid adoption of RDBMS for information organization and control across a broad range of commercial and social organizations as well as the development of increasingly effective data collection and storage technologies. RDBMS permitted much more flexibility in data organization and retrieval than traditional data filing systems, but traditional IR methods did not permit flexibility in the characterization of user needs or the delineation of search parameters (Rocha, 2001, 1.2). The result, of course, was increasing numbers of organizations that possessed very large and continually growing databases but only rudimentary tools for KD and IR. Two areas of research focus in information management developed in response to this problem: data warehousing and data mining.

Data warehousing, defined by Fayyad et al. as "collecting and 'cleaning' transactional data to make it available for online analysis and decision support" (1996, p. 30), focuses on the methodical collection and pre-processing of data for specific analytical uses. The data is subject-oriented, time-stamped, and integrated to permit interactive analysis in support of decision-making processes. A data warehouse normally integrates data from a variety of sources, "thus enriching the data and broadening the context and value of the information" (Rauber et al., 2002, Data Warehousing and OLAP).

Data mining, defined as "the application of specific algorithms to a data set for the purpose of extracting data patterns" (Fayyad et al., 1996, p. 28), focuses on improving the utility of large data sets as well as IR response. Data mining, in particular the algorithms used in data mining, has received the lion's share of attention in the development of Decision Support Systems (DSS) and RDBMS research because results are often immediately applicable in high-payoff decision-making industries such as insurance, sales, and financial and medical services.

Inspirations and Intentions for the Technology

Rocha describes the ultimate goal of IR as the production or recommendation of relevant information to users (2001, 1.2). We can ascribe the same motivation to the development of KDD systems and methods in general, particularly with regard to the refinement of DBMS. Research in data collection, storage, and retrieval has focused on issues specifically related to the improvement of KD and IR functionality. Among the topics given special attention have been data translation, change detection, integration, duplication, summarization, aggregation, and timeliness (Widom, 1995).

Research has also focused on the need to improve automation in KD and IR, especially in the areas of data selection and pre-processing, data transformation, and data interpretation and evaluation (Fayyad et al., 1996, p. 28). However, increased automation in KD and IR requires increased attention to the methods used for data collection and storage as well as the statistical foundations of the search and retrieval processes (p. 29). Despite this complication, however, it is clear that manual analysis of billions of records and hundreds of fields is impractical and that automated data handling will be even more in demand as requirements for on-the-fly analysis and more flexible presentation of search results increase (p. 28).
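The stages named above (selection, pre-processing, transformation, mining, and interpretation) can be illustrated with a very small sketch. Everything in it is an assumption made for illustration: the records, the field names, the 50.0 threshold, and the helper functions are hypothetical and do not come from Fayyad et al. The "mining" step is simple frequency counting, standing in for whatever algorithm a real system would apply.

```python
from collections import Counter

# Hypothetical transactional records; None marks a missing value.
RAW = [
    {"customer": "a", "region": "north", "amount": 120.0},
    {"customer": "b", "region": "north", "amount": None},
    {"customer": "c", "region": "south", "amount": 40.0},
    {"customer": "d", "region": "north", "amount": 95.0},
    {"customer": "e", "region": "south", "amount": 30.0},
]


def select(records, fields):
    """Selection: keep only the fields relevant to the question."""
    return [{f: r[f] for f in fields} for r in records]


def preprocess(records):
    """Pre-processing: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records, threshold=50.0):
    """Transformation: reduce each amount to a coarse 'high'/'low' label."""
    return [
        (r["region"], "high" if r["amount"] >= threshold else "low")
        for r in records
    ]


def mine(pairs):
    """Mining (stand-in): count how often each (region, label) pair occurs."""
    return Counter(pairs)


def interpret(counts, min_support=2):
    """Interpretation: report only pairs seen at least min_support times."""
    return sorted(pair for pair, n in counts.items() if n >= min_support)


patterns = interpret(mine(transform(preprocess(select(RAW, ["region", "amount"])))))
print(patterns)
```

Each stage is a separate function so that any one of them could be automated, replaced, or inspected independently, which is the practical concern the paragraph above raises.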

KDD and IR: Application to KMS

Technological Systems and Processes

Interface, interaction, and ubiquity. The relationship of KDD and IR to KMS is intimate: all KMS rely in some form on the aggregation of data for search and retrieval. Historically, improvements in the utility of KMS have depended in large part on improvements in KDD and IR functionality. Fayyad et al. describe KDD as "the overall process of knowledge discovery from data, including how the data is stored and accessed, how algorithms can be scaled to massive data sets and still run efficiently, how results can be interpreted and visualized, and how the overall human-machine interaction can be modeled and supported" (Fayyad et al., 1996, p. 29). This comprehensive list of KDD processes, which encompasses IR, also serves to describe the core functionality of most KMS (pp. 30-31). Research issues that have arisen in the development of DBMS and the study of KDD are also closely related to the development and deployment of KMS. Among these are: data collection and pre-processing; continually increasing volumes of data; increasingly complex forms of data; identifying and extracting useful knowledge from extremely large repositories; means for identifying valuable knowledge about, as well as in, the data set; and extracting knowledge from data and presenting that knowledge in usable forms (pp. 30-31).

The development of highly specialized DBMS for data warehousing and the continual refinement of data mining methods and technologies have been motivated in large part by the deployment of KMS throughout industry. Many KMS are simply elaborated RDBMS integrated with IR and communication systems. More sophisticated KMS may also add collaborative work tools. Decisions related to data mining, including model functions, model representation, and preference criteria, are an elemental part of KMS development and deployment (pp. 31-32). Data mining tasks (classification, forecasting, clustering, description, deviation detection, link analysis, and visualization) (Piatetsky-Shapiro, 1998, Slide 17) and search algorithms are fundamentally affected by the focus and purpose of an organization's KMS.
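Of the data mining tasks listed above, deviation detection is perhaps the easiest to illustrate briefly. The following is a minimal sketch, assuming a simple z-score rule; the claim amounts, the function name, and the threshold of 2.0 are hypothetical, and production systems typically use more robust statistics.

```python
import statistics


def deviations(values, z_threshold=2.0):
    """Return (index, value) pairs lying more than z_threshold
    population standard deviations away from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing deviates
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(v - mean) / stdev > z_threshold
    ]


# Hypothetical insurance claim amounts; the last one is unusually large.
claims = [102.0, 98.0, 101.0, 97.0, 100.0, 99.0, 103.0, 250.0]
flagged = deviations(claims)
print(flagged)
```

What counts as a meaningful deviation depends on the organization's purpose: a fraud-detection KMS and a marketing KMS would choose different fields, thresholds, and follow-up actions for the same data.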

System architecture. The characteristics of the underlying DBMS determine the architecture of KD and IR systems. RDBMS are composed of many relations in the form of two-dimensional tables of rows and columns containing related tuples. The rows (tuples) are called records, and the columns (fields in the record) are called attributes. Each column is accorded a specific data type. The type of data stored in an RDBMS has traditionally been constrained to ensure that there are no ambiguous tuples in the database (Devarakonda, 2001, RDBMS), although in the case of very complex data types, for example scientific data, programmers have overcome the constraints of the DBMS by employing Binary Large Objects (BLOBs) to store data in a database. This "solution" creates its own set of problems, however. BLOBs are usually much larger than a single block of storage in a database, a characteristic that undermines the efficiency of the database. In addition, because of their size, and because BLOBs in a single database may contain a variety of data types and compound data, the data content of the BLOB is not visible to the database. The opacity of data content means that a user cannot perform a high-level search across the BLOBs in a database (Wallace, Benschop, and Köhntopp, 1999).
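The opacity of BLOB content can be seen in a small sketch using Python's built-in sqlite3 module. The table, column names, and sample readings are hypothetical, and SQLite is used only because it is convenient; the source does not discuss any particular product here. The database can report how many bytes each BLOB holds, but a question about the values inside it has to be answered in application code after every BLOB has been fetched and decoded.

```python
import sqlite3
import struct

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, label TEXT, payload BLOB)"
)


def pack(values):
    """Encode a list of floats as raw little-endian bytes (8 bytes each)."""
    return struct.pack(f"<{len(values)}d", *values)


def unpack(blob):
    """Decode bytes produced by pack() back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))


con.executemany(
    "INSERT INTO readings (label, payload) VALUES (?, ?)",
    [
        ("run-1", pack([0.1, 0.2, 0.3])),
        ("run-2", pack([4.0, 9.5])),
        ("run-3", pack([1.0])),
    ],
)

# The database sees only opaque bytes: it can report their length...
sizes = con.execute(
    "SELECT label, length(payload) FROM readings ORDER BY id"
).fetchall()

# ...but "which runs contain a reading above 5.0?" must be answered
# outside SQL, by pulling and decoding every BLOB.
hits = [
    label
    for label, payload in con.execute(
        "SELECT label, payload FROM readings ORDER BY id"
    )
    if max(unpack(payload)) > 5.0
]
print(sizes, hits)
```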

RDBMS use Structured Query Language (SQL) for data definition, modification, querying and constraint specification. Queries can range from simple single-table queries to complicated multi-table queries. A commonly used RDBMS is Microsoft Access, but the existence of a standard query language allows data to be migrated easily from one RDBMS to another (Devarakonda, 2001, RDBMS). Although the structure of RDBMS renders them incapable of handling complex data types such as spatial data, images, or number arrays without the use of BLOBs, it does permit rapid data access and large storage capacities.
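The four roles of SQL mentioned above (definition, modification, querying, and constraint specification) can be sketched with Python's built-in sqlite3 module. The schema and rows are hypothetical, and SQLite stands in here for any RDBMS; this is an illustration rather than a description of a particular KMS.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Data definition: every column has a data type, and constraints are declared.
con.executescript("""
CREATE TABLE author (
    author_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE document (
    doc_id    INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES author(author_id),
    title     TEXT NOT NULL,
    year      INTEGER NOT NULL CHECK (year > 1900)
);
""")

# Data modification: insert hypothetical rows.
con.executemany("INSERT INTO author VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
con.executemany(
    "INSERT INTO document VALUES (?, ?, ?, ?)",
    [
        (10, 1, "Indexing notes", 1999),
        (11, 1, "Query logs", 2001),
        (12, 2, "Schema design", 2002),
    ],
)

# A simple single-table query.
single_table = con.execute(
    "SELECT title FROM document WHERE year >= 2000 ORDER BY title"
).fetchall()

# A multi-table query: join the two relations and aggregate.
multi_table = con.execute("""
    SELECT a.name, COUNT(*)
    FROM author AS a
    JOIN document AS d ON d.author_id = a.author_id
    GROUP BY a.name
    ORDER BY a.name
""").fetchall()

# Constraint specification: the CHECK constraint rejects an invalid row.
try:
    con.execute("INSERT INTO document VALUES (13, 1, 'Undated memo', 1800)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(single_table, multi_table, rejected)
```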

The data management limitations of RDBMS led to the development of OODBMS. In OODBMS, internal data structure is hidden so that external operations can be performed on the data as an Abstract Data Type (ADT). RDBMS and OODBMS are fundamentally different in the way they handle data relationships; OODBMS represent relationships explicitly, which improves data access performance. Nonetheless, OODBMS are plagued by poor query performance and problems of database scalability (Devarakonda, 2001, OODBMS).

ORDBMS, a relatively recent innovation, are designed to incorporate the best features of both RDBMS and OODBMS. Data is stored in tables, but some entries may have richer data structure; as in OODBMS, these entries are called ADTs. Because the data is stored in rows and columns, the ORDBMS maintains a relational data model, although it must be heavily modified to support object-oriented programming. In essence, the object-relational model adds a new object-oriented layer to support rich data types on top of the relational database model. ORDBMS support query and handle data objects; they can also be built on a massive scale. These features make ORDBMS particularly useful for the development of KMS for handling complex data types.

System configuration and deployment. A primary concern for many organizations during the configuration and deployment of KD and IR systems has been the creation of data and query context. Some efforts to create context have been retrospective. Lee and Hwang (2002) describe the process of extracting and visualizing semantic metadata from databases. This process, called relational database reverse engineering (RDRE), "extracts a conceptual model from an existing relational database by analyzing data instances as well as metadata" (Lee and Hwang, 2002, Conclusion). RDRE has been especially useful in creating shared "conceptual schema" for multiple databases (Introduction). A conceptual schema describes the database in terms of data items and relationships between data items in a form "suitable for human presentation" (Introduction), thereby enhancing KD and IR. The ability to discover and describe data relationships within and between databases allows organizations to profile and map information in their data warehouses in ways that were previously unimaginable. Mapping and profiling of data not only creates discovery and retrieval context to enhance data reuse, it can also reveal entirely new possibilities for use. A well-defined database reengineering project enables an organization to integrate the masses of transactional data that lie in its data warehouse with information collected from other enterprise systems or from outside the company.
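A rough sense of the metadata half of this process can be given with a short sketch. It is not Lee and Hwang's method: it reads only the declared schema of a SQLite database (through the `PRAGMA table_info` and `PRAGMA foreign_key_list` statements) and does not analyze data instances at all. The function name, tables, and columns are hypothetical.

```python
import sqlite3


def conceptual_schema(con):
    """Return (entities, relationships) read from declared metadata only.

    entities:      {table: [(column, declared_type), ...]}
    relationships: [(child_table, child_column, parent_table, parent_column)]
    """
    tables = [
        row[0]
        for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    entities = {}
    relationships = []
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = con.execute(f'PRAGMA table_info("{table}")').fetchall()
        entities[table] = [(c[1], c[2]) for c in columns]
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        for fk in con.execute(f'PRAGMA foreign_key_list("{table}")'):
            relationships.append((table, fk[3], fk[2], fk[4]))
    return entities, sorted(relationships)


# A hypothetical two-table database to run the sketch against.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    total       REAL
);
""")

entities, relationships = conceptual_schema(con)
print(entities)
print(relationships)
```

The output is a list of entities and the relationships between them, which is the raw material for a human-readable conceptual schema. Relationships that were never declared as foreign keys would be missed by this sketch; recovering those is where analysis of the data instances becomes necessary.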

Another common method for creating data and query context for enterprise data warehouses is the establishment of mechanisms for creating context during data creation and collection or during query construction. Many personal KMS provide robust mechanisms for data contextualization through the addition of metadata or by data structuring. KMS such as PeopleGarden (Xiong and Donath, 1999) extract social context for data during the processes of data collection and data exchange. Extending IR throughout the social network of an organization, as is done by Answer Garden (Ackerman, 1998; Ackerman & Malone, 1990; Ackerman & McDonald, 1996), is another method for providing query context for KD and IR.

Technology transition in organizations. Institutions that have pioneered the use of KDD and IR, especially in the form of data mining, have traditionally been those that rely heavily on knowledge-based decisions for their success. Because their operations have historically relied heavily on data collection, these organizations normally have a large quantity of accessible, relevant, historical and current data. They also anticipate a high payoff for making rapid, correct decisions based on their collected data and they actively seek a technological advantage in knowledge management. Financial institutions such as banking and investment firms, healthcare and insurance organizations, and businesses that rely heavily on marketing and customer relations are emblematic of sectors that have aggressively pursued technological innovations in KD and IR (Piatetsky-Shapiro, 1998, Slides 28-31).

The development of Decision Support Systems (DSS) based on electronic data processing (EDP) was an early application of database technology to KM in large enterprises. In many cases, however, technological strides in data collection (hardware and software) rapidly outpaced the enterprises' ability to understand and manage the data that was being collected and stored. Information was often plentiful without being relevant and extensive data warehouses often proved inadequate for applied decision making (Bass, 1983, p. 189).

Another difficulty faced by organizations that relied on large data bases for decision support was the danger that decisions would be made based on data that was poorly contextualized or poorly understood. Managers faced with a complex decision process might misinterpret the applicability of a data set to the problem or fail to investigate the existence of contradictory data (Calvert, 1993, p. 91). The less contextual the data, the more easily it may be misinterpreted or misapplied.

Organizational Systems and Processes

The introduction of automated KDD and IR changed the fundamental nature of knowledge work, organizational architectures, management practices, and communication flows in organizations. The introduction of Web-served data collection, query and delivery has also significantly affected these systems. In particular, the expansive application of KDD and IR technologies and techniques to information management for distributed or "flattened" organizations has resulted in KM becoming a ubiquitous "industrial" product in many business sectors.

Two aspects of knowledge work profoundly affected by the pervasive use of KDD and IR technologies have been knowledge creation and communication in the context of collaboration. The enhancement of collaborative possibilities in knowledge work created by distributed KDD and IR has had significant social effects in organizations and among individuals. The problem of creating shared context for data collection, retrieval, and delivery in distributed DBMS has already been mentioned. Equally difficult are the incitement of collaboration and the creation of networks of trust among the dispersed users of distributed DBMS.

The creation of massive, increasingly powerful DBMS and more effective KDD and IR technologies and techniques has also raised many complex social issues outside business processes. One significant social concern is the increasingly pervasive collection of detailed individual data that is enabled by sophisticated DBMS. Many individuals enjoy the convenience offered by the maintenance of personal information in commercial databases, but are unaware of the privacy implications inherent in the services these databases enable. Many individuals are faced with a daily choice: convenience and service or security and privacy?

KDD and IR: Looking to the Future

KDD and IR research problems

The demands of commercial KM markets drive the lifecycle of KD and IR systems. The creation of highly dimensional, massive data sets and the increasing sophistication of users and complexity of database uses have directed KDD research in specific directions. High priority research topics include: problems of statistical significance and missing data; the understandability of data patterns; the management of changing data and data integration; and the manipulation of non-standard, multi-media, and object-oriented data (Fayyad, Piatetsky-Shapiro, & Smyth, 1996, pp. 33-34).

Research and development in IR is equally market driven. In 1995, Croft published a "top ten" list of IR research issues based on his experiences in the area of industrial and government research priorities as a member of the National Science Foundation (NSF) Center for Intelligent Information Retrieval (CIIR) (¶ 3). These research priorities, derived from surveys of companies that use and sell IR systems, still resonate today:

1. Integrated solutions (standardized architectures and common platforms; the integration of database management and IR systems with multimedia capabilities)

2. Distributed IR (retrieval systems that can work in distributed, wide-area network environments)

3. Efficient, flexible indexing and retrieval (including ability to handle a wide variety of data formats)

4. Automatic query expansion (to overcome vocabulary mismatch between users and databases)

5. Interfaces and browsing (interfaces that support a range of functions including query formulation, presentation of retrieved information, feedback, and browsing in a conceptually simple way)

6. Routing and filtering (many companies considered data routing to be the main function required for a text-based DBMS, with IR being a secondary function)

7. Effective retrieval (companies are particularly interested in techniques that produce significant improvements in precision but still avoid occasional major retrieval mistakes)

8. Multimedia retrieval (techniques for accessing image, video and sound databases without text descriptions)

9. Information extraction (techniques to identify database entities, attributes and relationships in full text)

10. Relevance feedback (improved algorithms and models for automatic relevance feedback) (Croft, 1995)
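Item 4 in the list above, automatic query expansion, can be illustrated with a minimal sketch. The thesaurus, the three documents, and the term-overlap scoring are all hypothetical simplifications; Croft does not prescribe a particular technique, and real systems usually derive expansion terms statistically rather than from a hand-built table.

```python
# Hypothetical thesaurus mapping a query term to related terms.
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "purchase": {"buy"},
}

# Hypothetical document collection.
DOCS = {
    "d1": "how to buy a used automobile",
    "d2": "car maintenance schedule",
    "d3": "purchase order template",
}


def expand(terms):
    """Add thesaurus entries for every query term."""
    expanded = set(terms)
    for term in terms:
        expanded |= THESAURUS.get(term, set())
    return expanded


def search(terms):
    """Rank documents by the number of query terms they contain."""
    terms = set(terms)
    scored = [
        (len(terms & set(text.split())), doc_id)
        for doc_id, text in DOCS.items()
    ]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


query = ["purchase", "car"]
plain = search(query)               # misses d1: it says "buy" and "automobile"
expanded = search(expand(query))    # d1 now matches on both added terms
print(plain, expanded)
```

The most relevant document for the query is invisible to the unexpanded search because the user and the document use different vocabulary, which is the mismatch this research priority is meant to address.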

New developments

KD and IR problems for Web resources. The rapid growth of the Web and increasing reliance on the Web for the collection and delivery of data for KM has created new problems in KD and IR as well as bringing some older problems to the fore. Among the problems are: standardization of data collection and pre-processing; huge volumes of continually changing data; complex, streaming, and multi-media data; identifying and extracting useful knowledge from Web resources; a lack of consistent data models and context; a lack of available descriptive information; the problem of presenting knowledge in usable forms; and the rapid development of more time-sensitive, multi-media applications for Web resources.

Many of these problems reflect the inadequacy of current methods for Web resource KD and IR. Data collection is presently performed primarily by automated Web crawlers. Pre-processing consists of link-based ranking or human indexing and categorization. The identification and extraction of useful knowledge from Web resources is dependent on highly inefficient keyword searches on natural language text or on imprecise topical directories or topical Web sites. Retrieved knowledge can be viewed only in its native format (with a plugin) or sometimes only as derived HTML.
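The source does not say which link-based ranking method it has in mind. One widely known example is PageRank, and the sketch below assumes that choice purely for illustration. The link graph, page names, damping factor, and iteration count are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical crawl result: which pages link to which.
LINKS = {
    "home": ["docs", "blog"],
    "docs": ["home"],
    "blog": ["home", "docs"],
    "orphan": ["home"],
}

ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

Ranking of this kind uses only the link structure and says nothing about what a page means, which is one reason the paragraph above describes current Web KD and IR methods as inadequate.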

A variety of research and development projects are underway to enable more efficient, automated KD and IR for Web resources. Among the best known efforts are those that seek to apply semantic markup to Web resources to enable machine understanding, processing, and inference analysis. Related projects seek to develop intelligent search engines and agents to exploit the semantic statements created by this markup, while still others are creating ontologies to provide context for these search engines and agents (Shah et al., 2002).

Other researchers are examining improved methods for automated data and context collection (data pre-processing), the provision of value-added services such as query routing, the development of integrated query and knowledge delivery systems, and the establishment of social accounting metrics to provide context for humans (Smith, 2002, p. 52). Another major area of research focuses on leveraging historical information about individual and group Web browsing experience and patterns to enable more efficient KD and IR (Chakrabarti et al., 1998, Abstract). Rauber et al. (2002) provide an evocative description of the potential for enhanced KD and IR that is as yet untapped, "With [such] a repository of Web data, as well as the metadata associated with the documents and domains, we have a powerful source of information that goes beyond the content of Web pages …. in order for the most useful analyses to yield answers to project questions and issues, a different perspective of the Web and Web archives is needed, a perspective focusing not solely on content, but on the wealth of information automatically associated with each object on the Web" (Introduction). Capturing an understanding of how other individuals have discovered, retrieved, and used Web content provides invaluable context for users who are accessing that content for the first time.

Integration with Other Technologies

Enhanced presentation for the Web. The need for better integration of KDD and IR systems with delivery and presentation technologies has already been mentioned and it is a need that cannot be overstated. This is particularly true in the case of information presentation on the Web. Considerable research is underway in the area of reformatting data for discovery and presentation through Web-enabled devices. Another area of research focus is differentiated service for different devices that would enable variable visualization of retrieved information depending on a user's needs and device characteristics. Researchers in the field of adaptive graphics, "a unifying framework that allows visual representations of information to be customized and mixed together into new ones," have proposed content pre-viewing, interactive content, selective presentation, and customized views of Web-served content (Boier-Martin, 2003, pp. 6-9) as areas ripe for progressive research. Many of these researchers refer to the work of Turner Whitted, who in 1998 suggested the use of computer displays as "wallpaper" for interactive information exchange to enable pervasive collaboration and information retrieval (1999, p. 6).

KDD and IR for pervasive computing. Achieving what Cherniack, Franklin, and Zdonik term "ubiquitous data access" (2001, slide 7) presents several unique challenges in system integration. Many of these challenges reflect data management problems. Among these are: the resolution of context-dependent data (e.g., data push/data pull delivery issues); synchronization of data from multiple, distributed sensors and collectors; the efficient renewal of data streams; effecting profile-driven data management; dealing with location-aware, mobile devices; and the enabling of service mobility and service discovery (slides 8-27).

The next generation

Research trends and priorities suggest a number of substantial advances in next generation KDD and IR systems. We can expect them to enable the solving of business problems, not data analysis problems. They will embed knowledge discovery engines and integrate access to enterprise and external data on the back-end. Most importantly, they will integrate the knowledge discovery process with knowledge delivery tools (Piatetsky-Shapiro, 1998, Slide 7). We can also expect next generation KDD and IR systems to manage information retrieval contextually, allow contextual query/continuous query, enable KD in virtual networks of peer-to-peer databases, and interpolate or extrapolate for missing data (Cherniack et al., 2001, slides 115-138).

To enable mobile and pervasive computing applications, future KDD and IR systems will also have to be able to characterize information resources, recognize individual users, provide variable means to exchange knowledge between users and information sources (push and pull of information), adapt to the user community, and enable the reuse and recombination of information as well as its exchange (Rocha, 2001, 1.2). The most fundamental and difficult of these challenges will be information characterization.

Conclusion: On the Bleeding Edge

One might reasonably ask if the KDD and IR systems described above fall in the realm of science or science fiction. The answer is, assuredly, in the realm of science, although science fiction has often been influential in application development. This answer is supported by a brief examination of the KDD and IR research being funded by the Defense Advanced Research Projects Agency (DARPA) (the folks who brought us the Internet) under the auspices of the federal Total Information Awareness (TIA) Program. This research covers substantially new database technologies, architectures, population techniques, search algorithms, and data models.

One funded project, Genisys, has the goal of producing technology to enable ultra-large, all-source information repositories (DARPA, 2003a, Program Strategy). Unlike RDBMS in use today, Genisys-developed DBMS will require no prior data modeling; support automated restructuring and projection of data; store data in context of time and space; and develop a large, distributed system architecture for managing a huge volume of raw data input, analysis results, and feedback (DARPA, 2003b, TIA System: Program Strategy). Programs such as Genisys are building aggressively on a foundation of 30 years of research in KDD and IR technology and techniques. Although these initiatives raise new social as well as technical problems, they also suggest the possibility of substantially new applications for these technologies.

The difficulties of contextualizing and interpreting data for KM have increased many-fold in the past decade. New technologies for data collection and storage have led to ever-larger data warehouses containing hugely complex data types, a development that has greatly complicated data discovery, retrieval, visualization, and sharing within organizations. A growing need to incorporate increasingly disparate data sources from outside the organization has transformed enterprise KM from a cluster of internal management problems into a problem set that also encompasses an organization's relationships with clients and competitors, as well as its ability to participate in lucrative cooperative ventures. Enterprises now seek to use information technology to support not just individual problem solving, but entire decision-making processes.

KD and IR have become tools that not only enhance human decision-making but also compensate for inherent weaknesses in human decision-making processes. The result has been the development of powerful new EDP applications in knowledge discovery, KM, and enterprise decision making, especially in the areas of collaborative ventures, market forecasting, the management of customer relations, and fraud or crime detection. If these technologies are to progress even further, however, researchers must deal with the essential task of describing (characterizing) our growing wealth of information resources (online and offline). Only when we are able to visualize meaningfully the vast extent of our available information resources will we be able to develop new approaches to KD and IR. The fundamental problems in KM today relate to our inability to find and understand the information we already possess, not to an inability to collect and manipulate new data. It is in the development of better KD and IR tools that the future of KM and KMS lies.

References

Ackerman, M. S. (1998, July). Augmenting the organizational memory: A field study of Answer Garden. ACM Transactions on Information Systems, 16(3), 203-204. Retrieved March 28, 2003 from

Ackerman, M. S., & Malone, T. W. (1990, April). Answer Garden: A tool for growing organizational memory. ACM SIGOIS Bulletin, 11(2-3), 31-39. Retrieved March 28, 2003 from

Ackerman, M. S., & McDonald, D. W. (1996). Answer Garden 2: Merging organizational memory with collaborative help. Proceedings of the ACM Conference on Computer-Supported Cooperative Work 1996 (CSCW96 Boston, MA). Retrieved March 28, 2003 from

Bass, B. M. (1983). Organizational decision making. In L. L. Cummins, E. Kirby Warren, & J. F. Mee (Eds.), The Irwin series in management and the behavioral sciences. Homewood, IL: Richard D. Irwin.

Boier-Martin, I. M. (2003, January/February). Adaptive graphics. In T. Rhyne (Ed.) Visualization Viewpoints, IEEE Computer Graphics and Applications, 23(1), 6-10. Retrieved April 5, 2003 from

Calvert, G. (1993). Highwire management: Risk-taking tactics for leaders, innovators, and trailblazers. San Francisco, CA: Jossey-Bass Publishers.

Chakrabarti, S., Srivastava, S., Subramanyam, M., & Tiwari, M. (1998). Using Memex to archive and mine community Web browsing experience. A paper presented at the 9th International World Wide Web Conference, Amsterdam, May 15-19, 2000. Retrieved April 12, 2003 from

Croft, W. B. (1995, November). What do people want from information retrieval?: The top 10 research issues for companies that use and sell IR systems. D-Lib Magazine. Retrieved April 5, 2003 from

DARPA. (2003a). Genisys. Retrieved from the DARPA Information Awareness Office Web site at:

DARPA. (2003b). Total Information Awareness System. Retrieved from the DARPA Information Awareness Office Web site at:

Devarakonda, R. (2001, March). Object-relational database systems - The road ahead. ACM Crossroads Student Magazine. Retrieved April 12, 2003 from crossroads/xrds7-3/ordbms.html

Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996, November). The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM, 39(11), 27-34. Retrieved March 03, 2003 from

Lee, D., & Hwang, Y. (2002, March 1). Extracting semantic metadata and its visualization. ACM Crossroads Student Magazine. Retrieved March 27, 2003 from crossroads/xrds7-3/smeva.html

Piatetsky-Shapiro, G. (1998, December 4). Data mining and knowledge discovery tools: The next generation. Retrieved February 27, 2003 from

Rauber, A., Aschenbrenner, A., Witvoet, O., Bruckner, R. M., & Kaiser, M. (2002, December). Uncovering information hidden in Web archives: A glimpse at Web analysis building on data warehouses. D-Lib Magazine, 8(12). Retrieved March 28, 2003 from

Rocha, L. M. (2001). TalkMine: A soft computing approach to adaptive knowledge recommendation [Electronic version]. In V. Loia & S. Sessa (Eds.), Studies in fuzziness and soft computing: Vol. 75. Soft computing agents: New trends for designing autonomous systems. (pp. 89-116). New York: Springer. Retrieved March 28, 2003 from

Shah, U., Finin, T., Joshi, A., Cost, R. S., & Mayfield, J. (2002, November). Information retrieval on the Semantic Web. Paper presented at The ACM Conference on Information and Knowledge Management, November 2002. Retrieved March 28, 2003 from

Smith, M. (2002). Tools for navigating large social cyberspaces. Communications of the ACM, 45(4), 51-55. Retrieved March 28, 2003 from

Wallace, N., Benschop, O., & Köhntopp, K. (1999). What is a BLOB? php.faqts. Retrieved May 1, 2003 from

Whitted, T. (1999, July/August). Draw on the Wall. IEEE Computer Graphics and Applications, 19(4), 6-9. Retrieved April 8, 2003 from ieeeexplore.

Widom, J. (1995, November). Research problems in data warehousing. Proceedings of the 4th International Conference on Information and Knowledge Management (CIKM). Retrieved March 28, 2003 from

Xiong, R., & Donath, J. (1999). PeopleGarden: Creating data portraits for users. CHI Letters, 1(1), 37-44. Retrieved April 8, 2003 from
