Database Normalization

Introduction

One of the more crucial topics in the area of database management is the process of normalizing the tables in a relational database.

The underlying ideas in normalization are simple enough. Through normalization we want to design for our relational database a set of tables that (1) contain all the data necessary for the purposes that the database is to serve, (2) have as little redundancy as possible, (3) accommodate multiple values for types of data that require them, (4) permit efficient updates of the data in the database, and (5) avoid the danger of losing data unknowingly.

The primary reason for normalizing a database to at least the level of the 3rd normal form (the levels are explained below) is that normalization is a potent weapon against the possible corruption of the database stemming from what are called “insertion anomalies”, “deletion anomalies” and “update anomalies”. These types of error can creep into databases that are insufficiently normalized.

What is normalization?

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

Normalization can be viewed as a series of steps designed, one after another, to deal with ways in which tables can be “too complicated for their own good”. The purpose of normalization is to reduce the chances for anomalies to occur in a database. The definitions of the various levels of normalization illustrate complications to be eliminated in order to reduce the chances of anomalies.

At all levels, and in every case of a table with a complication, the resolution of the problem turns out to be the establishment of two or more simpler tables which, as a group, contain the same information as the original table but, because of their simpler individual structures, lack the complication.

|Pros of normalization |Cons of normalization |

|More efficient database structure. |You can’t start building the database before you know what the user needs. |

|Better understanding of your data. | |

|More flexible database structure. | |

|Easier to maintain database structure. | |

|Few (if any) costly surprises down the road. | |

|Validates your common sense and intuition. | |

|Avoids redundant fields. | |

|Ensures that distinct tables exist when necessary. | |

1st Normal Form (1NF):

A table (relation) is in 1NF if:

1. There are no duplicated rows in the table.

2. Each cell is single-valued (no repeating groups or arrays).

3. Entries in a column (field) are of the same kind.

*The requirement that there be no duplicated rows in the table means that the table has a key (although the key might be made up of more than one column, even, possibly, of all the columns).

2nd Normal Form (2NF):

A table is in 2NF if it is in 1NF and if all non-key attributes are dependent on all of the key. Since a partial dependency occurs when a non-key attribute is dependent on only a part of the composite key, the definition of 2NF is sometimes phrased as, “A table is in 2NF if it is in 1NF and if it has no partial dependencies”.
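
A partial dependency can be spotted in sample data by testing whether part of the composite key already determines a non-key attribute. The following is a minimal Python sketch under stated assumptions: the `holds` helper and the enrolment rows are hypothetical illustrations, and a dependency that holds in sample rows is only evidence, not proof, that it holds in general.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows; the composite key is assumed to be (student, course).
enrolment = [
    {"student": "S1", "course": "C1", "student_name": "Ann", "grade": "A"},
    {"student": "S1", "course": "C2", "student_name": "Ann", "grade": "B"},
    {"student": "S2", "course": "C1", "student_name": "Bob", "grade": "B"},
]

# student_name depends on only part of the key: a partial dependency.
name_on_part_of_key = holds(enrolment, ["student"], ["student_name"])
# grade needs the whole key, so it does not violate 2NF.
grade_on_part_of_key = holds(enrolment, ["student"], ["grade"])
grade_on_whole_key = holds(enrolment, ["student", "course"], ["grade"])

print(name_on_part_of_key, grade_on_part_of_key, grade_on_whole_key)
```

In this sketch the name column would move to its own table keyed by student, leaving grade with the full composite key.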

3rd Normal Form (3NF):

A table is in 3NF if it is in 2NF and if it has no transitive dependencies.

Boyce-Codd Normal Form (BCNF):

A table is in BCNF if it is in 3NF and if every determinant is a candidate key.

4th Normal Form (4NF):

A table is in 4NF if it is in BCNF and if it has no multi-valued dependencies.

5th Normal Form (5NF):

A table is in 5NF, also called “Projection-join Normal Form” (PJNF), if it is in 4NF and if every join dependency in the table is a consequence of the candidate keys of the table.

Domain-Key Normal Form (DKNF)

A table is in DKNF if every constraint on the table is a logical consequence of the definition of keys and domains.

Insertion Anomaly

It is a failure to place information about a new database entry into all the places in the database where information about the new entry needs to be stored. In a properly normalized database, information about a new entry needs to be inserted into only one place in the database. In an inadequately normalized database, information about a new entry may need to be inserted into more than one place and, human fallibility being what it is, some of the needed additional insertions may be missed.

Deletion Anomaly

It is a failure to remove information about an existing database entry when it is time to remove that entry. In a properly normalized database, information about an old, to-be-gotten-rid-of entry needs to be deleted from only one place in the database. In an inadequately normalized database, information about that old entry may need to be deleted from more than one place.

Update Anomaly

An update of a database involves modifications that may be additions, deletions, or both. Thus “update anomalies” can be either of the kinds discussed above.

All three kinds of anomalies are highly undesirable, since their occurrence constitutes corruption of the database. Properly normalized databases are much less susceptible to corruption than are un-normalized databases.
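
To make the risk concrete, the sketch below uses Python's built-in `sqlite3` module to show how a fact stored in several rows can drift out of step, while the same fact stored once cannot. The table names, column names and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical un-normalized table: the owner's name is repeated on every row.
conn.execute(
    "CREATE TABLE rental_flat (client_no TEXT, property_no TEXT, "
    "owner_no TEXT, owner_name TEXT)"
)
conn.executemany(
    "INSERT INTO rental_flat VALUES (?, ?, ?, ?)",
    [
        ("C1", "P1", "O1", "Pat Lee"),
        ("C2", "P1", "O1", "Pat Lee"),
        ("C2", "P2", "O1", "Pat Lee"),
    ],
)

# A careless update reaches only one of the three copies of the name.
conn.execute(
    "UPDATE rental_flat SET owner_name = 'Pat Jones' "
    "WHERE client_no = 'C1' AND property_no = 'P1'"
)
flat_names = {
    row[0]
    for row in conn.execute(
        "SELECT owner_name FROM rental_flat WHERE owner_no = 'O1'"
    )
}

# Normalized alternative: the name is stored once, so one update is enough.
conn.execute("CREATE TABLE owner (owner_no TEXT PRIMARY KEY, owner_name TEXT)")
conn.execute("INSERT INTO owner VALUES ('O1', 'Pat Lee')")
conn.execute("UPDATE owner SET owner_name = 'Pat Jones' WHERE owner_no = 'O1'")
owner_names = {row[0] for row in conn.execute("SELECT owner_name FROM owner")}

print(sorted(flat_names))   # two different names for the same owner
print(sorted(owner_names))  # a single, consistent name
```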

First Normal Form (1NF):

Rule 1: Eliminate Repeating Groups. Make a separate table for each set of related attributes, and give each table a primary key.

First Normal Form is a relation in which the intersection of each row and column contains one and only one value.

There are two approaches to removing repeating groups from unnormalized tables:

1. Remove the repeating groups by entering appropriate data in the empty columns of rows containing the repeating data.

2. Remove the repeating group by placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. A primary key is identified for the new relation.
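
The two approaches can be sketched in a few lines of Python. This is an illustration only: the nested `properties` list stands in for a repeating group, the function names `flatten` and `split` are hypothetical, and the client and property numbers are borrowed from the ClientRental example used in this document.

```python
# Un-normalized records: each client carries a repeating group of properties.
unnormalized = [
    {"clientNo": "CR76", "cName": "John Kay", "properties": ["PG4", "PG16"]},
    {"clientNo": "CR56", "cName": "Aline Stewart",
     "properties": ["PG4", "PG36", "PG16"]},
]


def flatten(records):
    """Approach 1: repeat the client data on every row of the group."""
    return [
        {"clientNo": r["clientNo"], "cName": r["cName"], "propertyNo": p}
        for r in records
        for p in r["properties"]
    ]


def split(records):
    """Approach 2: move the group to its own relation with a copy of the key."""
    clients = [{"clientNo": r["clientNo"], "cName": r["cName"]} for r in records]
    rentals = [
        {"clientNo": r["clientNo"], "propertyNo": p}
        for r in records
        for p in r["properties"]
    ]
    return clients, rentals


flat = flatten(unnormalized)
clients, rentals = split(unnormalized)
print(len(flat), len(clients), len(rentals))
```

Both results are in 1NF. The first keeps one table but repeats the client name on every row; the second stores each client name once and uses (clientNo, propertyNo) as the key of the new relation.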

1NF ClientRental relation with the first approach

The ClientRental relation is defined as follows:

ClientRental (clientNo, propertyNo, cName, pAddress, rentStart, rentFinish, rent, ownerNo, oName)

With the first approach, we remove the repeating group (property rented details) by entering the appropriate client data into each row.

Client

|ClientNo |cName |

|CR76 |John Kay |

|CR56 |Aline Stewart |

Rental

|ClientNo |propertyNo |rentStart |rentFinish |

|CR76 |PG4 |1-Jul-00 |31-Aug-01 |

|CR76 |PG16 |1-Sep-02 |1-Sep-02 |

|CR56 |PG4 |1-Sep-99 |10-Jun-00 |

|CR56 |PG36 |10-Oct-00 |1-Dec-01 |

|CR56 |PG16 |1-Nov-02 |1-Aug-03 |

PropertyOwner

|propertyNo |pAddress |rent |ownerNo |oName |

|PG4 |6 Lawrence St, Glasgow |350 |CO40 |Tina Murphy |

|PG16 |5 Novar Dr, Glasgow |450 |CO93 |Tony Shaw |

|PG36 |2 Manor Rd, Glasgow |370 |CO93 |Tony Shaw |

Third Normal Form (3NF):

Rule 3: Eliminate columns not dependent on key. If attributes do not contribute to a description of the key, remove them to a separate table.

Transitive dependency

A condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).

A relation is in 3NF if it is in first and second normal form and no non-primary-key attribute is transitively dependent on the primary key.

The normalization of 2NF relations to 3NF involves the removal of transitive dependencies by placing the attribute(s) in a new relation along with a copy of the determinant.

3NF ClientRental relation

The functional dependencies for the Client, Rental and PropertyOwner relations are as follows:

Client

fd2 clientNo → cName (Primary Key)

Rental

fd1 clientNo, propertyNo → rentStart, rentFinish (Primary Key)

fd5 clientNo, rentStart → propertyNo, rentFinish (Candidate key)

fd6 propertyNo, rentStart → clientNo, rentFinish (Candidate key)

PropertyOwner

fd3 propertyNo → pAddress, rent, ownerNo, oName (Primary Key)

fd4 ownerNo → oName (Transitive Dependency)

The resulting 3NF relations have the forms:

Client (clientNo, cName)

Rental (clientNo, propertyNo, rentStart, rentFinish)

PropertyOwner (propertyNo, pAddress, rent, ownerNo)

Owner (ownerNo, oName)
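
The removal of the transitive dependency on ownerNo can be sketched in Python using the PropertyOwner rows from this document. The variable names are illustrative, and the final comparison is only a sanity check that the split loses no information for these rows.

```python
# 2NF PropertyOwner rows: (propertyNo, pAddress, rent, ownerNo, oName).
property_owner_2nf = [
    ("PG4", "6 Lawrence St, Glasgow", 350, "CO40", "Tina Murphy"),
    ("PG16", "5 Novar Dr, Glasgow", 450, "CO93", "Tony Shaw"),
    ("PG36", "2 Manor Rd, Glasgow", 370, "CO93", "Tony Shaw"),
]

# Keep the determinant ownerNo in PropertyOwner and drop the dependent oName.
property_owner = [(p, addr, rent, o) for p, addr, rent, o, _ in property_owner_2nf]

# The new Owner relation holds each (ownerNo, oName) pair exactly once.
owner = sorted({(o, name) for *_, o, name in property_owner_2nf})

# Joining the two relations on ownerNo gives back the original rows.
owner_name = dict(owner)
rejoined = [(p, addr, rent, o, owner_name[o]) for p, addr, rent, o in property_owner]

print(owner)
print(rejoined == property_owner_2nf)
```

After the split, a change to an owner's name touches one row in Owner instead of one row per property.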

Client

|ClientNo |cName |

|CR76 |John Kay |

|CR56 |Aline Stewart |

Rental

|ClientNo |propertyNo |rentStart |rentFinish |

|CR76 |PG4 |1-Jul-00 |31-Aug-01 |

|CR76 |PG16 |1-Sep-02 |1-Sep-02 |

|CR56 |PG4 |1-Sep-99 |10-Jun-00 |

|CR56 |PG36 |10-Oct-00 |1-Dec-01 |

|CR56 |PG16 |1-Nov-02 |1-Aug-03 |

PropertyOwner

|propertyNo |pAddress |rent |ownerNo |

|PG4 |6 Lawrence St, Glasgow |350 |CO40 |

|PG16 |5 Novar Dr, Glasgow |450 |CO93 |

|PG36 |2 Manor Rd, Glasgow |370 |CO93 |

Owner

|ownerNo |oName |

|CO40 |Tina Murphy |

|CO93 |Tony Shaw |

Boyce-Codd Normal Form (BCNF):

A relation is in BCNF, if and only if, every determinant is a candidate key.

The difference between 3NF and BCNF is that for a functional dependency A → B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key.

Example of BCNF

fd1 clientNo, interviewDate → interviewTime, staffNo, roomNo (Primary Key)

fd2 staffNo, interviewDate, interviewTime → clientNo (Candidate key)

fd3 roomNo, interviewDate, interviewTime → clientNo, staffNo (Candidate key)

fd4 staffNo, interviewDate → roomNo (not a candidate key)

Because the determinant of fd4 (staffNo, interviewDate) is not a candidate key, the ClientInterview relation is not in BCNF and may suffer from update anomalies.

For example, two tuples have to be updated if the roomNo needs to be changed for staffNo SG5 on 13-May-02.
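
Whether a determinant qualifies can be checked mechanically by computing its attribute closure, that is, everything it determines under the listed dependencies. The following Python sketch is illustrative: the helper names `closure` and `is_superkey` are assumptions, and it tests for superkeys (a candidate key is a minimal superkey), which is the usual formal test for BCNF.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is covered and it adds something.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


ALL_ATTRS = {"clientNo", "interviewDate", "interviewTime", "staffNo", "roomNo"}

FDS = [
    ({"clientNo", "interviewDate"}, {"interviewTime", "staffNo", "roomNo"}),  # fd1
    ({"staffNo", "interviewDate", "interviewTime"}, {"clientNo"}),            # fd2
    ({"roomNo", "interviewDate", "interviewTime"}, {"clientNo", "staffNo"}),  # fd3
    ({"staffNo", "interviewDate"}, {"roomNo"}),                               # fd4
]


def is_superkey(attrs):
    return closure(attrs, FDS) == ALL_ATTRS


# Determinants that do not reach every attribute violate BCNF.
violations = [sorted(lhs) for lhs, _ in FDS if not is_superkey(lhs)]
print(violations)
```

Only the determinant of fd4 fails the test, because staffNo and interviewDate together determine roomNo and nothing else.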

ClientInterview

|ClientNo |interviewDate |interviewTime |staffNo |roomNo |

|CR76 |13-May-02 |10.30 |SG5 |G101 |

|CR56 |13-May-02 |12.00 |SG5 |G101 |

|CR74 |13-May-02 |12.00 |SG37 |G102 |

|CR56 |1-Jul-02 |10.30 |SG5 |G102 |

Example of BCNF(2)

To transform the ClientInterview relation to BCNF, we must remove the violating functional dependency by creating two new relations called Interview and StaffRoom, as shown below:

Interview (clientNo, interviewDate, interviewTime, staffNo)

StaffRoom (staffNo, interviewDate, roomNo)

Interview

|ClientNo |interviewDate |interviewTime |staffNo |

|CR76 |13-May-02 |10.30 |SG5 |

|CR56 |13-May-02 |12.00 |SG5 |

|CR74 |13-May-02 |12.00 |SG37 |

|CR56 |1-Jul-02 |10.30 |SG5 |

StaffRoom

|staffNo |interviewDate |roomNo |

|SG5 |13-May-02 |G101 |

|SG37 |13-May-02 |G102 |

|SG5 |1-Jul-02 |G102 |
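
A short Python sketch can confirm that the decomposition loses nothing and removes the double update. The rows follow the Interview and StaffRoom tables; the second client number is taken to be CR56 so that (clientNo, interviewDate) stays unique, which is an assumption, and the replacement room G103 is hypothetical.

```python
# Interview rows: (clientNo, interviewDate, interviewTime, staffNo).
# Assumption: the second client is CR56, keeping (clientNo, interviewDate) unique.
interview = [
    ("CR76", "13-May-02", "10.30", "SG5"),
    ("CR56", "13-May-02", "12.00", "SG5"),
    ("CR74", "13-May-02", "12.00", "SG37"),
    ("CR56", "1-Jul-02", "10.30", "SG5"),
]

# StaffRoom keyed by (staffNo, interviewDate), so each room is stored once.
staff_room = {
    ("SG5", "13-May-02"): "G101",
    ("SG37", "13-May-02"): "G102",
    ("SG5", "1-Jul-02"): "G102",
}


def rejoin(interview_rows, rooms):
    """Join Interview with StaffRoom on (staffNo, interviewDate)."""
    return [(c, d, t, s, rooms[(s, d)]) for c, d, t, s in interview_rows]


before = rejoin(interview, staff_room)

# Moving SG5 to a hypothetical room G103 on 13-May-02 is now a single change.
staff_room[("SG5", "13-May-02")] = "G103"
after = rejoin(interview, staff_room)

moved = [row for row in after if row[4] == "G103"]
print(len(before), len(moved))
```

The join returns one row per interview, and both SG5 interviews on 13-May-02 pick up the new room from the single StaffRoom entry.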

Fourth Normal Form (4NF):

Rule 4: Isolate independent multiple relationships. No table may contain two or more 1:n (one-to-many) or n:m (many-to-many) relationships that are not directly related.

Multi-valued dependency (MVD) represents a dependency between attributes (for example, A, B and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multi-valued dependency can be further defined as being trivial or nontrivial. An MVD A →→ B in relation R is defined as being trivial if

B is a subset of A, or A ∪ B = R.

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.

A relation is in 4NF if it is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.
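
The independence condition can be tested directly: for each value of A, the (B, C) pairs present must be the full cross product of that value's B set and C set. The Python sketch below is illustrative only; the `mvd_holds` helper and the employee, skill and language rows are hypothetical.

```python
from itertools import product


def mvd_holds(rows, a, b, c):
    """Check the multi-valued dependency of b (and of c) on a in rows."""
    groups = {}
    for row in rows:
        groups.setdefault(row[a], set()).add((row[b], row[c]))
    for pairs in groups.values():
        b_values = {pair[0] for pair in pairs}
        c_values = {pair[1] for pair in pairs}
        # Independence means every combination of b and c must be present.
        if pairs != set(product(b_values, c_values)):
            return False
    return True


# Hypothetical relation: skills and languages of an employee are unrelated.
rows = [
    {"employee": "E1", "skill": "SQL", "language": "English"},
    {"employee": "E1", "skill": "SQL", "language": "French"},
    {"employee": "E1", "skill": "Python", "language": "English"},
    {"employee": "E1", "skill": "Python", "language": "French"},
]

independent = mvd_holds(rows, "employee", "skill", "language")
# Dropping one combination breaks the cross product, so the MVD no longer holds.
incomplete = mvd_holds(rows[:-1], "employee", "skill", "language")

# The 4NF decomposition stores each independent fact once.
employee_skill = {(r["employee"], r["skill"]) for r in rows}
employee_language = {(r["employee"], r["language"]) for r in rows}
print(independent, incomplete, len(employee_skill), len(employee_language))
```

Four rows shrink to two plus two, and adding a third language for E1 needs one new row instead of one per skill.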

Fifth Normal Form (5NF):

A relation is in 5NF if it has no join dependency that is not implied by its candidate keys.

Lossless-join dependency

A property of decomposition, which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Join dependency

Describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, …, Z, a relation R satisfies a join dependency if, and only if, every legal value of R is equal to the join of its projections on A, B, …, Z.
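
This definition translates almost word for word into code: project the relation onto each attribute subset, join the projections, and compare the result with the original. The Python sketch below is a minimal illustration; the helper names and the agent, company and product rows are hypothetical.

```python
def relation(attrs, tuples):
    """Build a relation as a set of rows, each a frozenset of (attr, value)."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def project(rows, attrs):
    """Keep only attrs from every row; duplicates collapse automatically."""
    return {frozenset((a, dict(r)[a]) for a in attrs) for r in rows}


def natural_join(left, right):
    """Combine rows that agree on all attributes they have in common."""
    joined = set()
    for l_row in left:
        for r_row in right:
            ld, rd = dict(l_row), dict(r_row)
            if all(ld[k] == rd[k] for k in ld.keys() & rd.keys()):
                joined.add(frozenset({**ld, **rd}.items()))
    return joined


def satisfies_jd(rows, components):
    """True if rows equals the join of its projections on the components."""
    result = project(rows, components[0])
    for attrs in components[1:]:
        result = natural_join(result, project(rows, attrs))
    return result == rows


ATTRS = ("agent", "company", "product")
COMPONENTS = [("agent", "company"), ("company", "product"), ("agent", "product")]

r_ok = relation(ATTRS, [
    ("Smith", "Ford", "car"), ("Smith", "Ford", "truck"),
    ("Smith", "GM", "car"), ("Smith", "GM", "truck"),
    ("Jones", "Ford", "car"),
])
r_bad = relation(ATTRS, [
    ("Smith", "Ford", "car"), ("Smith", "GM", "truck"), ("Jones", "Ford", "truck"),
])

print(satisfies_jd(r_ok, COMPONENTS), satisfies_jd(r_bad, COMPONENTS))
```

For the second relation the join of the three projections produces the spurious tuple (Smith, Ford, truck), so that decomposition would not be lossless.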

2. What is denormalization? Advantages and disadvantages.

Denormalization is the process of attempting to optimize the performance of a database by adding redundant data or by grouping data. In some cases, denormalization helps cover up the inefficiencies inherent in relational database software as of 2008. A relational normalized database imposes a heavy access load over physical storage of data even if it is well tuned for high performance.

Denormalization is usually done to decrease the time required to execute complex queries. Drawbacks of a normalized database are mostly in performance. In a normalized database, more joins are required to gather all the information from multiple entities, as data is divided and stored in multiple entities rather than in one large table. Queries that have a lot of complex joins will require more CPU usage and will adversely affect performance. Sometimes, it is good to denormalize parts of the database. Examples of design changes to denormalize the database and improve performance are:

Add a column (or columns) to the table that contains pre-aggregated data to be used only for a report.

Partition the table with many columns to multiple tables.

Add duplicate keys to tables. This will reduce the number of joins required to get complete information.
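
The first design change, a pre-aggregated column, might look like the following sketch using Python's built-in `sqlite3` module. The schema, the `order_total` column and the `add_order` helper are hypothetical; the point is that the redundant total must be maintained in the same transaction as every insert, which is the price paid for the faster report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT,
        order_total REAL DEFAULT 0   -- redundant, pre-aggregated column
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer (customer_id),
        amount      REAL
    );
    """
)
conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Ann')")


def add_order(conn, customer_id, amount):
    """Insert an order and keep the redundant total in step with it."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )
        conn.execute(
            "UPDATE customer SET order_total = order_total + ? "
            "WHERE customer_id = ?",
            (amount, customer_id),
        )


add_order(conn, 1, 100.0)
add_order(conn, 1, 50.0)

# The report reads one column instead of aggregating the orders table.
report_total = conn.execute(
    "SELECT order_total FROM customer WHERE customer_id = 1"
).fetchone()[0]
computed_total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer_id = 1"
).fetchone()[0]
print(report_total, computed_total)
```

Any code path that changes orders without going through `add_order` would leave the two totals out of step, which is exactly the kind of anomaly normalization is meant to prevent.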

The reason for denormalization

Only one valid reason exists for denormalizing a relational design: to enhance performance. However, there are several indicators which will help to identify systems and tables which are potential denormalization candidates. These are:

Many critical queries and reports exist which rely upon data from more than one table. Often these requests need to be processed in an on-line environment.

Repeating groups exist which need to be processed in a group instead of individually.

Many calculations need to be applied to one or many columns before queries can be successfully answered.

Tables need to be accessed in different ways by different users during the same timeframe.

Many large primary keys exist which are clumsy to query and consume a large amount of DASD when carried as foreign key columns in related tables.

Certain columns are queried a large percentage of the time. Consider 60% or greater to be a cautionary number flagging denormalization as an option.

Be aware that each new RDBMS release usually brings enhanced performance and improved access options that may reduce the need for denormalization. However, most of the popular RDBMS products on occasion will require denormalized data structures. There are many different types of denormalized tables which can resolve the performance problems caused when accessing fully normalized data. The following topics will detail the different types and give advice on when to implement each of the denormalization types.

Advantages of denormalization:

1. Minimizing the need for joins.

2. Precomputing aggregate values, that is, computing them at data modification time, rather than at select time.

3. Reducing the number of tables, in some cases.

Advantages of normalization:

1. Avoids data modification (INSERT/DELETE/UPDATE) anomalies, as each data item lives in one place.

2. Greater flexibility in getting the expected data at atomic granularity.

3. Normalization is conceptually cleaner and easier to maintain and change as your needs change.

4. Fewer null values and less opportunity for inconsistency.

5. Better handle on database security.

6. Increased storage efficiency.

Disadvantages of denormalization:

1. Denormalization usually speeds retrieval but can slow updates. This is not a real concern in a DSS environment.

2. Denormalization is always application-specific and needs to be re-evaluated if the application changes.

3. Denormalization can increase the size of tables. This is not a problem in Sybase IQ, because you can optimize the storage of column data. For details, see the IQ UNIQUE column constraint in CREATE TABLE statement and “MAX_QUERY_TIME option” in Reference: Statements and Options.

4. In some instances, denormalization simplifies coding; in others, it makes it more complex.

Disadvantages of normalization:

1. Requires much more CPU, memory, and I/O to process, so normalized data can give reduced database performance.

2. Requires more joins to get the desired result. A poorly written query can bring the database down.

3. Maintenance overhead: the higher the level of normalization, the greater the number of tables in the database.
