DBMS #04: Joins, Decomposition and Normalization

Checklist:

> To understand data redundancies and anomalies.

> To check a relational decomposition for generation of spurious records and preservation properties.

> To achieve a normalized decomposition of a relation.

4.1. Natural Joins:

-        To combine two relations using common attributes(s); joins only those records where the common attribute(s) match

4.2. Data Redundancy and Anomalies:

-        Functional dependencies hold in the entire database, not just single relations.

-        A bad database design has redundant storage of data and may allow spurious records to be generated.

Update anomaly: update of an attribute in one record needs to be reflected in all records with the same attribute value. Data inconsistency arises when any relevant record is not updated.

Insert anomaly: to insert a new domain value for some attribute, some record must be created.

Delete anomaly: deleting a record could also delete some value of the domain of an attribute.

The issues of data redundancy are resolved by data decomposition of the initial relation into multiple tables. A valid decomposition of a relation R into R1, R2, …, Rn is when UNION (R1, R2, …, Rn) is the relation R itself i.e., the attributes are preserved.

Observations:

1. Non-trivial FDs may create redundancies in the database.

2. Trivial FDs cannot redundancies

3. If the LHS is not a super key, it may create redundancies. Since BCNF is free from such FDs, we can say that BCNF is void of FD-induced redundancies.

4. More than one many-to-many relationships in a single table causes redundancies (multi-valued dependencies)

4.3. Lossless (Non-Additive Join) and Lossy Decompositions:

The decompositions can be lossless or lossy. A lossless decomposition never gives wrong/ spurious records. A lossy decomposition may generate spurious records for some instance. While original records are maintained, spurious records also occur.

Lossy decomposition: R is a proper subset of R1 JOIN R2

Lossless decomposition: R = R1 JOIN R2

Checking if a Binary Decomposition is Lossless or Lossy:

We determine this by looking at the schema, not the instance. A binary decomposition is lossless when the common attributes in both the relations is a super key in at least one of them.

Checking if a Non-Binary Decomposition is Lossless or Lossy:

For this, we have the Chase algorithm. [Refer other resources]

4.4. Dependency Preserving Decompositions:

When an FD is lost, it is with respect to the individual relations. If the decomposition is lossless, we can still bring back the lost FDs simply by joining the decomposed relations.

When (F1 U F2 U ... U Fn)+ covers F, then we call it dependency preserving i.e. FDs are enforced without the need of joining. Remember that joining tables is an expensive operation. The dependency preserving property is desirable, but not necessary since we can still enforce FDs by joining relations, provided the decomposition is lossless. Remember that enforcement of FDs is done before insertion or modification.

4.5. BCNF Decomposition

It is important to understand that despite a database being in BCNF, the database design might not be good. A given decomposition can be in BCNF and be lossy. When performing a BCNF decomposition, we try to also achieve a lossless and dependency preserving decomposition.

When we are not concerned about the decomposition being lossless, we can simply decompose the relation to sub relations of two attributes each i.e., a trivial decomposition.

BCNF Decomposition Algorithm

This algorithm will guarantee a lossless, BCNF decomposition of a relation.

While the relation is not in BCNF, take a non-trivial, violating functional dependency, XàY. Create two sub-relations, one consisting of attributes in X+, and the other consisting of X and (R-X+).

Observations:

It is important to note that for some relations, it is not possible to achieve an ideal decomposition that is in BCNF, lossless, and dependency preserving. However, it is possible to achieve the first two of them for all relations.

4.6. 3NF Decomposition

The 3NF Synthesis Algorithm works on the minimal cover, and can give a 3NF decomposition that is both lossless and dependency preserving. Merge any functional dependency with the same LHS. Now, the attributes involved in each of these dependencies will become a sub-relation. If none of these sub-relations have a candidate key, make a separate relation for it. Finally, to simplify the relations, we can safely delete any relation whose attributes are already present in some larger relation i.e., a proper subset.

Observations:

This algorithm is a synthesis algorithm, meaning that the algorithm is guaranteed to generate a 3NF decomposition. This is particularly faster than checking if the relation is in 3NF or not.

Questions:

Q1: Consider the relation R(A, B, C) with the functional dependencies {A --> B, B --> C, A --> C} decomposed as R1(A, B) and R2(A, C). Show that the decomposed relations MUST be joined to preserve all the functional dependencies of R. [Hint: Show that the decomposition is lossless but not dependency-preserving]

Q2: Consider the relation R(A, B, C, D) and functional dependencies{AàB, BàC}.

(a)   Show that R is not in BCNF.

(b)   Give a decomposition of R that is both in BCNF and lossless.

Q3: Consider a relation R that is in BCNF. Is it possible that the 3NF synthesis algorithm can make R in 3NF but not in BCNF? [Hint: consider the form of functional dependencies that R may have]


Popular posts from this blog

DBMS #05: Data Independence

DBMS #01: Introduction to DBMS