Functional Dependencies

last updated 24-sep-12

Objectives of Normalization

Develop a good description of the data, its relationships and constraints

Produce a stable set of relations that

Normal Forms

Each is contained within the previous form – each has stricter rules than the previous form


Limitations of E-R Designs

E-R modeling provides a set of guidelines, but does not result in a unique database schema.

Nor does it provide a way of evaluating alternative schemas.

Normalization theory provides a mechanism for analyzing and refining the schema produced by an E-R design, or any other design.

 

 

 

 


Redundancy

Dependencies between attributes within a relation cause redundancy

Ex. All addresses in the same town have the same zip code

SSN Name Town Zip
1234 Joe Huntingdon 16652
2345 Mary Huntingdon 16652
3456 Tom Huntingdon 16652
5948 Harry Alexandria 16603

There's clearly redundant information stored here. 

Consistency and integrity are harder to maintain even in this simple example, e.g., ensuring the fact that the zip code always refers the same city and the city is spelled consistently.

Note we don't have a zip code to city fact stored unless there is a person from that zipcode

 


Redundancy and Other Problems

Set-valued or multi-valued attributes in the E-R diagram result in multiple rows in corresponding table

Example: Person (SSN, Name, Address, Hobbies)


Anomalies

An anomaly is an inconsistent, incomplete, or contradictory state of the database

Redundancy leads to the following anomalies:

Update anomaly: A change in Address must be made in several places. Updating one fact may require updating multiple tuples.

Deletion anomaly: Deleting one fact may delete other information. Suppose a person gives up all hobbies. Do we:

Set Hobby attribute to null? No, since Hobby is part of key

Delete the entire row? No, since we lose other information in the row

Insertion anomaly: To record one fact may require more information than is available. Hobby value must be supplied for any inserted row since Hobby is part of key

 


Decomposition

Solution: use two relations to store Person information

Person1 (SSN, Name, Address)

Hobbies (SSN, Hobby)

The decomposition is more general: people with hobbies can now be described

No update anomalies:

Name and address stored once

A hobby can be separately supplied or deleted

Decomposition is the process of breaking a relation into two or more relations to eliminate the redundancies and corresponding anomalies.

 


Normalization Theory

The result of E-R analysis needs further refinement.

Appropriate decomposition can solve problems.  What is appropriate?

The underlying theory is referred to as normalization theory and is based on functional dependencies (and other kinds, like multivalued dependencies)

 

 


Informal Guidelines for Relation Design

Want to keep the semantics of the relation attributes clear.  The information in a tuple should represent exactly one fact or an entity.  The hidden or buried entities are what we want to discover and eliminate.

Minimize the use of null values. Nulls have multiple interpretations:

If nulls are likely (non-applicable) then consider decomposition of the relation into two or more relations that hold only the non-null valued tuples.

Too much decomposition of relations into smaller ones may also lose information or generate erroneous information

 

 


Functional Dependencies

FD's are constraints on well-formed relations and represent a formalism on the infrastructure of relation.

Definition: A functional dependency (FD) on a relation schema R is a constraint X → Y, where X and Y are subsets of attributes of R.

Definition: an FD is a relationship between an attribute "Y" and a determinant (1 or more other attributes) "X" such that for a given value of a determinant the value of the attribute is uniquely defined. 

Definition: An FD X → Y is satisfied in an instance r of R if for every pair of tuples, t and s: if t and s agree on all attributes in X then they must agree on all attributes in Y

A key constraint is a special kind of functional dependency: all attributes of relation occur on the right-hand side of the FD:

  • SSN → SSN, Name, Address

 

 

 


Example Functional Dependencies

Let R be
NewStudent(stuId, lastName, major, credits, status, socSecNo)

FDs in R include

ZipCodeAddressCity

ArtistName→BirthYear

AutobrandManufacturer, Engine type

Author, Title→PublDate

Trivial Functional Dependency

The FD XY is trivial if set {Y} is a subset of set {X}

Examples: If A and B are attributes of R,

are all trivial FDs and will not contribute to the evaluation of normalization.

 

 

 

 

 


FD Axioms

Understanding: Functional Dependencies are recognized by analysis of the real world; no automation or algorithm. Finding or recognizing them are the database designer's task.

FD manipulations:

Axiom Name Axiom Example
Reflexivity if a is set of attributes, b a, then a b SSN,Name → SSN
Augmentation if a b holds and c is a set of attributes, then cacb SSN → Name then
SSN,Phone → Name, Phone
Transitivity if a b holds and bc holds, then a c holds SSN →Zip and Zip → City then SSN →City
Union or Additivity * if a b and a c holds then a bc holds SSN→Name and SSN→Zip then SSN→Name,Zip
Decomposition or Projectivity* if a bc holds then a b and a c holds SSN→Name,Zip then SSN→Name and SSN→Zip
Pseudotransitivity* if a b and cb d hold then ac d holds Address → Project and Project,Date →Amount then Address,Date → Amount
(NOTE) ab c does NOT imply a b and b c  

*Armstrong's Axioms (basic axioms)

 

 

 


Closure

Find all FD's for attributes a in a relation R

a+ denotes the set of attributes that are functionally determined by a

IF attribute(s) a IS/ARE A SUPERKEY OF R THEN a+ SHOULD BE THE WHOLE RELATION R This is our goal.  Any attributes in a relation not part of the closure indicates a problem with the design.

Algorithm for Closure

result := a; //start with superkey a
WHILE (more changes to result) DO
     FOREACH ( FD b  c in R) DO
              IF b  result 
                  THEN result := result  c