DBMS #04: Joins, Decomposition and Normalization
Checklist:
> To understand data redundancies and anomalies.
> To check a relational decomposition for generation of spurious records and preservation properties.
> To achieve a normalized decomposition of a relation.
4.1. Natural
Joins:
-
To
combine two relations using common attributes(s); joins only those records
where the common attribute(s) match
4.2. Data
Redundancy and Anomalies:
-
Functional
dependencies hold in the entire database, not just single relations.
-
A
bad database design has redundant storage of data and may allow spurious
records to be generated.
Update
anomaly: update of an attribute in one record needs to be reflected in all
records with the same attribute value. Data inconsistency arises when any relevant
record is not updated.
Insert
anomaly: to insert a new domain value for some attribute, some record must be
created.
Delete
anomaly: deleting a record could also delete some value of the domain of an
attribute.
The issues
of data redundancy are resolved by data decomposition of the initial relation
into multiple tables. A valid decomposition of a relation R into R1, R2, …, Rn
is when UNION (R1, R2, …, Rn) is the relation R itself i.e., the attributes are
preserved.
Observations:
1. Non-trivial FDs may create redundancies in the database.
2. Trivial FDs cannot redundancies
3. If the LHS is not a super key, it may create redundancies. Since BCNF is free from such FDs, we can say that BCNF is void of FD-induced redundancies.
4. More than one many-to-many relationships in a single table causes redundancies (multi-valued dependencies)
4.3. Lossless
(Non-Additive Join) and Lossy Decompositions:
The decompositions
can be lossless or lossy. A lossless decomposition never gives wrong/ spurious
records. A lossy decomposition may generate spurious records for some instance.
While original records are maintained, spurious records also occur.
Lossy
decomposition: R is a proper subset of R1 JOIN R2
Lossless decomposition: R = R1 JOIN R2
Checking
if a Binary Decomposition is Lossless or Lossy:
We
determine this by looking at the schema, not the instance. A binary decomposition
is lossless when the common attributes in both the relations is a super key in
at least one of them.
Checking
if a Non-Binary Decomposition is Lossless or Lossy:
For this,
we have the Chase algorithm. [Refer other resources]
4.4. Dependency
Preserving Decompositions:
When an FD
is lost, it is with respect to the individual relations. If the decomposition is lossless,
we can still bring back the lost FDs simply by joining the decomposed
relations.
When (F1 U F2 U ... U Fn)+ covers F, then we call it dependency preserving i.e. FDs are enforced without the need of joining. Remember that joining tables is an expensive operation. The dependency preserving property is desirable, but not necessary since we can still enforce FDs by joining relations, provided the decomposition is lossless. Remember that enforcement of FDs is done before insertion or modification.
4.5. BCNF Decomposition
It is important to understand that despite a database being in BCNF, the database design might not be good. A given decomposition can be in BCNF and be lossy. When performing a BCNF decomposition, we try to also achieve a lossless and dependency preserving decomposition.
When we are not concerned about the decomposition being lossless, we can simply decompose the relation to sub relations of two attributes each i.e., a trivial decomposition.
BCNF Decomposition
Algorithm
This algorithm
will guarantee a lossless, BCNF decomposition of a relation.
While the
relation is not in BCNF, take a non-trivial, violating functional dependency, XàY. Create two sub-relations, one consisting of attributes
in X+, and the other consisting of X and (R-X+).
Observations:
It is important to note that for some relations, it is not possible to achieve an ideal decomposition that is in BCNF, lossless, and dependency preserving. However, it is possible to achieve the first two of them for all relations.
4.6. 3NF Decomposition
The 3NF Synthesis Algorithm works on the minimal cover, and can give a 3NF decomposition that is both lossless and dependency preserving. Merge any functional dependency with the same LHS. Now, the attributes involved in each of these dependencies will become a sub-relation. If none of these sub-relations have a candidate key, make a separate relation for it. Finally, to simplify the relations, we can safely delete any relation whose attributes are already present in some larger relation i.e., a proper subset.
Observations:
This algorithm is a synthesis algorithm, meaning that the algorithm is guaranteed to generate a 3NF decomposition. This is particularly faster than checking if the relation is in 3NF or not.
Questions:
Q1:
Consider the relation R(A, B, C) with the functional dependencies {A --> B,
B --> C, A --> C} decomposed as R1(A, B) and R2(A, C). Show that the
decomposed relations MUST be joined to preserve all the functional dependencies
of R. [Hint: Show that the decomposition is lossless but not dependency-preserving]
Q2: Consider the relation R(A, B, C, D) and functional dependencies{AàB, BàC}.
(a)
Show
that R is not in BCNF.
(b) Give
a decomposition of R that is both in BCNF and lossless.