Data Modeling for the Data Warehouse - DebTech Int
Data Modeling for the Data Warehouse
Tom Haughey
April 27-29, 2005
Registration $1595
Early Registration Rate $1495 if registered by March 27, 2005
DAMA, MPO or IDMA Rate $1395 if registered by March 27, 2005
This workshop will focus on the specialized techniques used to model data warehouse data. It provides concrete guidelines and rules for modeling this data.
This workshop will stress that this is not just an intuitive process based solely on the spontaneous judgment of a skilled analyst. It can be based on sound rules and guidelines just as production data models are.
This workshop emphasizes three fundamental concepts:
□ Finding the right level of atomic data in the warehouse,
□ That the data warehouse is not a database but an integrated environment consisting of different levels of data,
□ Optimizing the data warehouse design through dimensional modeling.
Workshop Agenda
Introduction
Scope and levels of modeling
1. Kinds of data
2. The framework for data modeling
3. Challenges in data management
4. Five major characteristics of data warehouse
5. Data Models
6. Types and technologies of data warehousing
Data Warehouse Methodology
7. Explanation of methodology steps
8. Iterative nature of development
Introduction To Data Modeling
9. Definition and components
10. Levels of data models
11. Rules for each level
12. The high level data model
13. Identifying and defining subject areas
14. The detailed level data model
15. Normalization
Functional dependency
Mathematical normalization
Natural normalization
Application to the data warehouse
Building the Data Warehouse Model
16. Comparison of operational and informational data
17. Case 1: direct access to operational data
18. Case 2: using informational data bases
19. A data controlled environment
20. Progression of data in a data controlled environment
21. Two types of data changes in the data warehouse: Transformations optimizations
Levels of Data In the Enterprise
22. Four types of data and systems
23. The warehouse and decision support data model
24. Sources of warehouse and decision data
25. Definition and rules
26. The corporate model
27. The business area model
Derived Data
28. Types and different rules for handling during analysis and design
Modeling Time And History
29. Short Term And Long Term View
30. Four ways of handling time and date
31. Time-series data
32. Capturing business changes
33. Importance of representing the business time dimension
Information Gathering
34. Facilitated sessions
35. Interviews
36. Information gathering techniques
Analyzing Current Systems Data
37. Define key data elements
38. Data stewards for each data element
39. Key data element business rules
40. Define domains and valid values
41. Define valid ranges for error
42. Document key data elements on repository
43. Validate data mappings
44. Identify key data elements in source systems
45. Map relationships for repository
Data Transformations
46. Remove pure production data
47. Add time and history to the data identifier
48. Add data derivations
49. Find the right atomicity of data
50. Determine the functional dependencies in summary data
51. Create data arrays and fact tables
52. Accommodate varying levels of summarization
53. Add summary data
54. Merge like data from different tables
55. Create arrays of data
56. Separate data based on its stability
57. Embed relationships in the data
58. Add external data
59. Techniques for derived data
60. Different levels of summarization
Critical Warehouse Components
61. Definition of fact tables and dimensions
62. Creating multidimensional arrays
63. Developing fact tables and arrays
64. Corporate reference tables
65. The star database schema
66. The snowflake database schema
67. Meta-data repository and components
Optimizing the Data Warehouse Design
Data design compromises
Safe compromises to data
Merge like tables
Create arrays of data (violate first normal form)
Split data based on stability and usage (
Add indices,
Encode-decode data
Aggressive compromises to data
Store derived data
Summarize data
Add redundant data
Imbed relationship data
Add redundant relationships
Add partial dependencies (violate second normal form)
Add transitive dependencies (violate third normal form)
Critical factors in data design
Number of occurrences of each table
The ratio of one table to another
The queries that use the data
The data accesses made by each query
The load factor for each query.
The steps of optimization
Data Warehouse Technology
68. Categories of warehouse tools
69. Review of major products
Important Considerations And Issues
70. Denormalization and performance
71. Archiving and purging
72. Data distribution and replication
73. Change control
74. Copy management
75. Alternative Models For Copied Data
What you will learn
Among the most important factors you will learn at this seminar will be how to design a dimensional data warehouse. By contrast, OLTP (on-line transaction processing) data models are normalized so as to reduce update problems. In addition, the focus is on current data, which is two-dimensional. Data warehouse design usually introduces a third dimension, which is that of time. For example, to support trend analysis, it is necessary to show the values of data changing over time (such as, monthly, quarterly, annually). Usually, this requires a degree of denormalization when creating the data model. The seminar will teach how to effectively accomplish this aspect of data modeling, while ensuring a quality data warehouse design. Concepts learned are reinforced by individual and group exercises
Tom Haughey is considered one of the four founding fathers of Information Engineering in is America. He is currently President of InfoModel, Inc., a training and consulting company specializing in practical and rapid development methods. His courses on data management, data warehousing, Information Engineering and software development have been delivered to Fortune 1000 companies around the world. He has worked on the development of seven different CASE tools, over 40,000 copies of which have been sold to date. He was formerly Chief Technology Officer for the Pepsi Bottling Group and Enterprise Director of Data Warehousing for Pepsico. He was also formerly Vice President of Technology for Computer Systems Advisers, who market the CASE tools called POSE and SILVERRUN. He wrote his own CASE tool in 1984.He formerly worked for IBM for 17 years as a Senior project manager. He is the author of many articles on Data Management, Information Engineering and Data Warehousing.
................
................
In order to avoid copyright disputes, this page is only a partial summary.
To fulfill the demand for quickly locating and searching documents.
It is intelligent file search solution for home and business.
Related searches
- data available for use
- free data sets for research
- free data sets for students
- data sets for healthcare
- data available for download
- data analysis for research paper
- medical data sets for statistics projects
- available data sets for students
- simple data sets for statistics
- bayesian modeling for vancomycin
- mnps data warehouse website
- teacher modeling in the classroom