Data Modeling for the Data Warehouse - DebTech Int



Data Modeling for the Data Warehouse

Tom Haughey

April 27-29, 2005

Registration $1595

Early Registration Rate $1495 if registered by March 27, 2005

DAMA, MPO or IDMA Rate $1395 if registered by March 27, 2005

This workshop will focus on the specialized techniques used to model data warehouse data. It provides concrete guidelines and rules for modeling this data.

This workshop will stress that this is not just an intuitive process based solely on the spontaneous judgment of a skilled analyst. It can be based on sound rules and guidelines just as production data models are.

This workshop emphasizes three fundamental concepts:

□ Finding the right level of atomic data in the warehouse,

□ That the data warehouse is not a database but an integrated environment consisting of different levels of data,

□ Optimizing the data warehouse design through dimensional modeling.

Workshop Agenda

Introduction

Scope and levels of modeling

1. Kinds of data

2. The framework for data modeling

3. Challenges in data management

4. Five major characteristics of data warehouse

5. Data Models

6. Types and technologies of data warehousing

Data Warehouse Methodology

7. Explanation of methodology steps

8. Iterative nature of development

Introduction To Data Modeling

9. Definition and components

10. Levels of data models

11. Rules for each level

12. The high level data model

13. Identifying and defining subject areas

14. The detailed level data model

15. Normalization

Functional dependency

Mathematical normalization

Natural normalization

Application to the data warehouse

Building the Data Warehouse Model

16. Comparison of operational and informational data

17. Case 1: direct access to operational data

18. Case 2: using informational data bases

19. A data controlled environment

20. Progression of data in a data controlled environment

21. Two types of data changes in the data warehouse: Transformations optimizations

Levels of Data In the Enterprise

22. Four types of data and systems

23. The warehouse and decision support data model

24. Sources of warehouse and decision data

25. Definition and rules

26. The corporate model

27. The business area model

Derived Data

28. Types and different rules for handling during analysis and design

Modeling Time And History

29. Short Term And Long Term View

30. Four ways of handling time and date

31. Time-series data

32. Capturing business changes

33. Importance of representing the business time dimension

Information Gathering

34. Facilitated sessions

35. Interviews

36. Information gathering techniques

Analyzing Current Systems Data

37. Define key data elements

38. Data stewards for each data element

39. Key data element business rules

40. Define domains and valid values

41. Define valid ranges for error

42. Document key data elements on repository

43. Validate data mappings

44. Identify key data elements in source systems

45. Map relationships for repository

Data Transformations

46. Remove pure production data

47. Add time and history to the data identifier

48. Add data derivations

49. Find the right atomicity of data

50. Determine the functional dependencies in summary data

51. Create data arrays and fact tables

52. Accommodate varying levels of summarization

53. Add summary data

54. Merge like data from different tables

55. Create arrays of data

56. Separate data based on its stability

57. Embed relationships in the data

58. Add external data

59. Techniques for derived data

60. Different levels of summarization

Critical Warehouse Components

61. Definition of fact tables and dimensions

62. Creating multidimensional arrays

63. Developing fact tables and arrays

64. Corporate reference tables

65. The star database schema

66. The snowflake database schema

67. Meta-data repository and components

Optimizing the Data Warehouse Design

Data design compromises

Safe compromises to data

Merge like tables

Create arrays of data (violate first normal form)

Split data based on stability and usage (

Add indices,

Encode-decode data

Aggressive compromises to data

Store derived data

Summarize data

Add redundant data

Imbed relationship data

Add redundant relationships

Add partial dependencies (violate second normal form)

Add transitive dependencies (violate third normal form)

Critical factors in data design

Number of occurrences of each table

The ratio of one table to another

The queries that use the data

The data accesses made by each query

The load factor for each query.

The steps of optimization

Data Warehouse Technology

68. Categories of warehouse tools

69. Review of major products

Important Considerations And Issues

70. Denormalization and performance

71. Archiving and purging

72. Data distribution and replication

73. Change control

74. Copy management

75. Alternative Models For Copied Data

What you will learn

Among the most important factors you will learn at this seminar will be how to design a dimensional data warehouse. By contrast, OLTP (on-line transaction processing) data models are normalized so as to reduce update problems. In addition, the focus is on current data, which is two-dimensional. Data warehouse design usually introduces a third dimension, which is that of time. For example, to support trend analysis, it is necessary to show the values of data changing over time (such as, monthly, quarterly, annually). Usually, this requires a degree of denormalization when creating the data model. The seminar will teach how to effectively accomplish this aspect of data modeling, while ensuring a quality data warehouse design. Concepts learned are reinforced by individual and group exercises

Tom Haughey is considered one of the four founding fathers of Information Engineering in is America. He is currently President of InfoModel, Inc., a training and consulting company specializing in practical and rapid development methods. His courses on data management, data warehousing, Information Engineering and software development have been delivered to Fortune 1000 companies around the world. He has worked on the development of seven different CASE tools, over 40,000 copies of which have been sold to date. He was formerly Chief Technology Officer for the Pepsi Bottling Group and Enterprise Director of Data Warehousing for Pepsico. He was also formerly Vice President of Technology for Computer Systems Advisers, who market the CASE tools called POSE and SILVERRUN. He wrote his own CASE tool in 1984.He formerly worked for IBM for 17 years as a Senior project manager. He is the author of many articles on Data Management, Information Engineering and Data Warehousing.

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download