IDM Workshop Template - Brandeis University



ITR: Data Centers - Managing Data with Profiles

Project Award Number:  IIS-0086057

Principal Investigators

Stan Zdonik

Brown University, Dept. of Computer Science, P.O. Box 1910, Providence, RI  02912

Phone:  (401) 863-7648      Fax:  (401) 863-7657    Email:  sbz@cs.brown.edu

URL: 

 

Mitch Cherniack

Brandeis University, Department of Computer Science, 415 South St., Mailstop 018, Waltham, MA, 02454

Phone: (781) 736-2738      Fax: (781) 736-2741    Email:  mfc@cs.brandeis.edu

URL:  

Mike Franklin

University of California at Berkeley, Dept. of Computer Science, 687 Soda Hall, Berkeley, CA 94720-1776

Phone:  510-642-1662      Fax:   510-642-5775    Email:  franklin@cs.berkeley.edu

Co-PI

Steve Reiss, Brown University, Dept. of Computer Science, P.O. Box 1910, Providence, RI 02912

Phone:   401-863-76xx     Fax:   401-863-7657    Email:  spr@cs.brown.edu



Collaborator

Michael Stonebraker, MIT, Lab for Computer Science, 545 Technology Square, Cambridge, MA 02139

Phone:   617-253-3538     Email:  stonebraker@lcs.mit.edu

Keywords

network data management, profiles, data streams, continuous queries, data staging

Project Summary

Networked information is often unmanaged.  In a wide-area setting like the Internet, data sources are autonomous and provide little opportunity for application-dependent data management.  In a mobile environment, data is often unavailable or out of date.  A modern DBMS provides data management tools (e.g., indices, clustering) that are used and tuned by a DBA based on an overall understanding of the needs of the application mix.  Having a DBA in the network environments mentioned above is impractical.  Thus, we introduce the notion of a profile.  A profile is a declarative specification of a user's interest as well as a specification of the utility that a specific set of objects would provide to that user.  We use profiles to drive data management decisions in a middleware service that sits between the data sources and the users.  Recently, the project has focused on using profiles as Quality of Service (QoS) specifications to manage data stream processing.  The Aurora project investigates how QoS can be used to drive scheduling and load shedding in a data stream environment in which latency of results is critical.
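As a rough illustration of the definition above, a profile can be thought of as an interest predicate paired with a utility function over sets of objects. The sketch below is hypothetical: the names `Profile`, `interest`, `utility`, `profile_value`, and the sample objects are assumptions made for illustration and do not reflect the project's actual profile language.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A data object is modeled here as a plain dictionary (an assumption).
DataObject = Dict[str, object]


@dataclass
class Profile:
    """Hypothetical profile: what the user wants and how much a set is worth."""
    interest: Callable[[DataObject], bool]        # is this object of interest?
    utility: Callable[[List[DataObject]], float]  # value of a set of objects


def profile_value(profile: Profile, objects: List[DataObject]) -> float:
    """Utility of the subset of `objects` that matches the profile's interest."""
    relevant = [obj for obj in objects if profile.interest(obj)]
    return profile.utility(relevant)


# Example: the user wants stock items, but only the first two add value.
stock_profile = Profile(
    interest=lambda obj: obj.get("topic") == "stocks",
    utility=lambda objs: float(min(len(objs), 2)),
)

sample_objects = [
    {"id": 1, "topic": "stocks"},
    {"id": 2, "topic": "weather"},
    {"id": 3, "topic": "stocks"},
    {"id": 4, "topic": "stocks"},
]
```

Because utility is defined over a set rather than per object, a middleware service could compare candidate sets (for example, alternative cache contents) by calling `profile_value` on each.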

Publications and Products 

Daniel Abadi, Don Carney, Ugur Cetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, and Stan Zdonik, "Aurora: A New Model and Architecture for Data Stream Management", VLDB Journal, August, 2003. To Appear.

S. Chandrasekaran and M. Franklin, "PSoup: A System for Streaming Queries over Streaming Data", VLDB Journal, August, 2003. To Appear.

Nesime Tatbul, Ugur Cetintemel, Stan Zdonik, Mitch Cherniack and Michael Stonebraker, "Load Shedding in a Data Stream Manager", Proceedings of the International Conference on Very Large Data Bases (VLDB), 2003. To Appear.

Don Carney, Ugur Cetintemel, Alex Rasin, Stan Zdonik, Mitch Cherniack and Michael Stonebraker, "Operator Scheduling in a Data Stream Environment", Proceedings of the International Conference on Very Large Data Bases (VLDB), 2003. To Appear.

Y. Diao and M. Franklin, "Query Processing for High-Volume XML Message Brokering", Proceedings of the International Conference on Very Large Data Bases (VLDB), 2003. To Appear.

M. Hammad, M. Franklin, W. Aref, and A. K. Elmagarmid, "Scheduling for Shared Window Joins Over Data Streams", Proceedings of the International Conference on Very Large Data Bases (VLDB), 2003. To Appear.

Stan Zdonik, Michael Stonebraker, Mitch Cherniack, Ugur Cetintemel, Magdalena Balazinska and Hari Balakrishnan, "The Aurora and Medusa Projects", IEEE Data Engineering Bulletin, p. 3-10, vol. 26, 2003.

S. Krishnamurthy, S. Chandrasekaran, O. Cooper, A. Deshpande, M. Franklin, J. Hellerstein, W. Hong, S. Madden, F. Reiss, and M. Shah, "TelegraphCQ: An Architectural Status Report", IEEE Data Engineering Bulletin, p. 11, vol. 26, 2003.

Y. Diao and M. Franklin, "High-Performance XML Filtering - An Overview of YFilter", IEEE Data Engineering Bulletin, p. 41, vol. 26, 2003.

D. Carney, S. Lee, S. Zdonik, "Profile-Driven Data Freshening", Proceedings of the International Conference on Data Engineering (ICDE), 2003.

D. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, C. Erwin, E. Galvez, M. Hatoun, J. Hwang, A. Maskey, A. Rasin, A. Singer, M. Stonebraker, N. Tatbul, Y. Xing, R. Yan and S. Zdonik, "Aurora: A Data Stream Management System (Demonstration)", Proceedings of the ACM SIGMOD International Conference on Management of Data, 2003.

Mitch Cherniack, Hari Balakrishnan, Don Carney, Ugur Cetintemel, Ying Xing and Stan Zdonik, "Scalable Distributed Stream Processing", Proceedings of the Conference on Innovative Data Systems Research (CIDR), 2003.

Mitch Cherniack, Eduardo F. Galvez, Michael J. Franklin and Stan Zdonik, "Profile-Driven Cache Management", Proceedings of the International Conference on Data Engineering (ICDE), 2003.

S. Chandrasekaran, O. Cooper, A. Deshpande, M. Franklin, J. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, and M. Shah, "TelegraphCQ: Continuous Data Flow Processing for an Uncertain World", Proceedings of the Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, January, 2003.

M. Denny, M. Franklin, P. Castro, and A. Purakayastha, "Mobiscope: A Spatial Discovery Service for Mobile Network Resources", Proceedings of the International Conference on Mobile Data Management, Melbourne, Australia, January, 2003.

S. Chandrasekaran, O. Cooper, A. Deshpande, M. Franklin, J. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, F. Reiss, and M. Shah, "TelegraphCQ: Continuous Dataflow Processing (System Demonstration)", Proceedings of the ACM SIGMOD Conference on Management of Data, 2003.

Yanlei Diao, Peter Fischer, Michael J. Franklin, Raymond To, "YFilter: Efficient and Scalable Filtering of XML Documents", Proceedings of the International Conference on Data Engineering (ICDE), 2002.

D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, S. Zdonik, "Monitoring Streams - A New Class of Data Management Applications", Proceedings of the International Conference on Very Large Data Bases (VLDB), 2002.

S. Madden, M. Franklin, "Fjording the Stream: An Architecture for Queries of Streaming Sensor Data", Proceedings of the International Conference on Data Engineering (ICDE), 2002.

S. Chandrasekaran, M. Franklin, "PSoup: Continuous Processing of Streaming Queries over Streaming Data", Proceedings of the International Conference on Very Large Data Bases (VLDB), 2002.

M. Denny, P. Castro, M. Franklin, and A. Purakayastha, "Mobiscope - A System for Mobile Resource Discovery", Proceedings of the MobiCom Conference, 2002.

Mitch Cherniack, Michael J. Franklin and Stan Zdonik, "Expressing User Profiles for Data Recharging", IEEE Personal Communications: Special Issue on Pervasive Computing, August, 2001.

M. Franklin, "Challenges in Ubiquitous Data Management", in R. Wilhelm (ed.), Lecture Notes in Computer Science, vol. 2000, Springer-Verlag, pp. 24-33, 2001.

Project Impact

Results from this research have been presented at two meetings on information management support for counter-terrorism. One meeting was organized by the CIA and held in New Jersey in October 2001. The other meeting was organized by the National Academy of Sciences and was held in Washington DC in June 2002.  In addition, the results of this research were presented at the Almaden Institute on Autonomic Computing, held in San Jose, CA in April 2002.  The three PIs also presented a tutorial on Data Management for Pervasive Computing at the 2001 VLDB Conference in Rome, Italy.  The Aurora team has done a significant amount of work in identifying industrial problems and in working to demonstrate the usefulness of the approach.  They have worked closely with the US Army Research Institute for Environmental Medicine (USARIEM), Mitre Corporation, and Fidelity Investments.  We have designed and implemented two different stream processing systems, Aurora at Brown/Brandeis (and MIT) and Telegraph at Berkeley.  Both of these projects have become very visible within the database community. Significant demos of both systems were presented at SIGMOD 2003. They have generated interest in the topic among other researchers. While the two projects have taken somewhat different approaches, the linkages through this grant have been extremely valuable in allowing us to make meaningful comparisons of the strengths and weaknesses of our design choices.

Goals, Objectives and Targeted Activities

A primary goal of the project is to understand how simple profile models can be used to effect data management decisions in a network information system.  Ultimately, we would like to design a general-purpose profile language, but at this stage, we feel that it is better to understand the implications of simple profiles on processing requirements.  To this end, we are engaged in multiple concurrent experiments, each of which should shed some light on the emerging area of profile-driven data management.  One such project focuses on the use of profiles to determine how to allocate limited bandwidth to freshening the data items in a cache.  Another project seeks to understand how to pick a set of objects that maximizes the utility of a limited cache given profiles that can express dependencies among the data values.  Yet another project is investigating how to allocate processing resources to a large network of processing elements in order to produce streams of values that maximize the utility of the output streams relative to a set of application-specific profiles (QoS specifications). This last project is intended to address the problems of large-scale stream processing.  In particular, we are very interested in being able to service the needs of sensor applications.  We are working closely with the US Army Research Institute for Environmental Medicine as well as several groups from Mitre, each of which is interested in sensor processing.  We have used the Aurora system to build several demos for each of these applications.
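To make the cache-freshening problem concrete, the sketch below shows one simple way a refresh budget could be spent: rank cached items by interest-weighted staleness and refresh the top few. This is an illustrative greedy heuristic under stated assumptions, not the algorithm from the project's publications; the function name, the `interest` weights, and the staleness measure are all hypothetical.

```python
from typing import Dict, List


def choose_items_to_refresh(
    staleness: Dict[str, float],
    interest: Dict[str, float],
    budget: int,
) -> List[str]:
    """Pick up to `budget` cached items to refresh.

    staleness: item id -> how out of date the cached copy is (assumed measure)
    interest:  item id -> weight derived from user profiles (assumed measure)
    budget:    number of refreshes the available bandwidth allows
    """
    # Benefit of refreshing an item = profile interest * current staleness.
    scored = [
        (interest.get(item, 0.0) * age, item) for item, age in staleness.items()
    ]
    # Items with no benefit are never worth spending bandwidth on.
    candidates = [(score, item) for score, item in scored if score > 0]
    # Highest benefit first; ties broken by item id for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in candidates[: max(budget, 0)]]


# Example: "c" is moderately stale but heavily requested, so it goes first.
example_staleness = {"a": 10.0, "b": 1.0, "c": 5.0, "d": 7.0}
example_interest = {"a": 0.1, "b": 0.9, "c": 0.5}
refresh_plan = choose_items_to_refresh(example_staleness, example_interest, 2)
```

A per-item ranking like this ignores dependencies among data values, which is exactly what the richer profiles described above are meant to express.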

Area Background

Database systems store information at a higher semantic level than file systems.  As such, they are capable of capturing application-level semantics and of exploiting these semantics to provide better access capabilities (e.g., high-level query languages) as well as more efficient access paths (e.g., indices) to support highly optimized query evaluation.  The area of network data management is emerging as an attempt to bring the same benefits that we saw in the traditional DBMS world to the new world of network-based information systems.  Special topics within this new area include web search engines, proxy caches, data integration services, web server technology, and wireless data management.  A very important new area in this space is that of stream data management [CCC+], which tries to produce meaningful answers to queries even when the system is presented with large numbers of high-volume, push-based data streams.

Area References

[AAB+99] M. Altinel, D. Aksoy, T. Baby, M. Franklin, W. Shapiro, S. Zdonik, "DBIS Toolkit: Adaptable Middleware for Large Scale Data Delivery", Proc. ACM SIGMOD Conf., Philadelphia, PA, June, 1999.

[CDTW00] J. Chen, D. DeWitt, F. Tian, Y. Wang, "NiagaraCQ: A Scalable Continuous Query System for Internet Databases", Proc. ACM SIGMOD Conf., Dallas, June, 2000.

[OPSS93] B. Oki, M. Pfluegl, A. Siegel, D. Skeen, "The Information Bus - An Architecture for Extensible Distributed Systems", Proc. 14th SOSP, Asheville, NC, December, 1993.

[YGM99] T. W. Yan, H. Garcia-Molina, "The SIFT Information Dissemination System", ACM Transactions on Database Systems, 24(4): 529-565, 1999.

[CCC+] D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, S. Zdonik, "Monitoring Streams - A New Class of Data Management Applications", Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), 2002.

Potential Related Projects

Stanford: Stream Project

Univ. of Wisconsin, Madison: Niagara

AT&T: Gigascope

Project Websites

   





Online Software

TelegraphCQ v0.2 (Beta Release)
