Functional Dependencies
last updated 25-sep-15
Develop a good description of the data, its relationships and constraints
Produce a stable set of relations that
Each is contained within the previous form – each has stricter rules than the previous form
E-R modeling provides a set of guidelines, but does not result in a unique database schema.
Nor does it provide a way of evaluating alternative schemas.
Normalization theory provides a mechanism for analyzing and refining the schema produced by an E-R design, or any other design.
Dependencies between attributes within a relation cause redundancy
Ex. All addresses in the same town have the same zip code
SSN | Name | Town | Zip |
---|---|---|---|
1234 | Joe | Huntingdon | 16652 |
2345 | Mary | Huntingdon | 16652 |
3456 | Tom | Huntingdon | 16652 |
5948 | Harry | Alexandria | 16603 |
There's clearly redundant information stored here.
Consistency and integrity are harder to maintain even in this simple example, e.g., ensuring the fact that the zip code always refers the same city and the city is spelled consistently.
Note we don't have a zip code to city fact stored unless there is a person from that zipcode
Set-valued or multi-valued attributes in the E-R diagram result in multiple rows in corresponding table
Example: Person (SSN, Name, Address, Hobbies)
The relation Person can’t describe people without hobbies
but more important is the replication of what would be the key value
SSN | Name | Address | Hobbies |
---|---|---|---|
1111 | Joe | 123 Main | hiking |
1111 | Joe | 123 Main | biking |
2222 | Mary | 321 Elm | lacross |
An anomaly is an inconsistent, incomplete, or contradictory state of the database
Redundancy leads to the following anomalies:
Update anomaly: A change in Address must be made in several places. Updating one fact may require updating multiple tuples.
Deletion anomaly: Deleting one fact may delete other information. Suppose a person gives up all hobbies. Do we:
Set Hobby attribute to null? No, since Hobby is part of key
Delete the entire row? No, since we lose other information in the row
Insertion anomaly: To record one fact may require more information than is available. Hobby value must be supplied for any inserted row since Hobby is part of key
Solution: use two relations to store Person information
Person1 (SSN, Name, Address)
Hobbies (SSN, Hobby)
The decomposition is more general: people with hobbies can now be described
No update anomalies:
Name and address stored once
A hobby can be separately supplied or deleted
Decomposition is the process of breaking a relation into two or more relations to eliminate the redundancies and corresponding anomalies.
The result of E-R analysis needs further refinement.
Appropriate decomposition can solve problems. What is appropriate?
The underlying theory is referred to as normalization theory and is based on functional dependencies (and other kinds, like multivalued dependencies)
Want to keep the semantics of the relation attributes clear. The information in a tuple should represent exactly one fact or an entity. The hidden or buried entities are what we want to discover and eliminate.
Minimize the use of null values. Nulls have multiple interpretations:
If nulls are likely (non-applicable) then consider decomposition of the relation into two or more relations that hold only the non-null valued tuples.
Too much decomposition of relations into smaller ones may also lose information or generate erroneous information
Definition: A functional dependency (FD) on a relation schema R is a constraint X → Y, where X and Y are subsets of attributes of R.
Definition: an FD is a relationship between an attribute "Y" and a determinant (1 or more other attributes) "X" such that for a given value of a determinant the value of the attribute is uniquely defined.
Definition: An FD X → Y is satisfied in an instance r of R if for every pair of tuples, t and s: if t and s agree on all attributes in X then they must agree on all attributes in Y
A key constraint is a special kind of functional dependency: all attributes of relation occur on the right-hand side of the FD:
Let R be
NewStudent(stuId, lastName, major, credits, status, socSecNo)
FDs in R include
ZipCode→AddressCity
ArtistName→BirthYear
Autobrand→Manufacturer, Engine type
Author, Title→PublDate
The FD X→Y is trivial if set {Y} is a subset of set {X}
Examples: If A and B are attributes of R,
are all trivial FDs and will not contribute to the evaluation of normalization.
Understanding: Functional Dependencies are recognized by analysis of the real world; no automation or algorithm. Finding or recognizing them are the database designer's task.
FD manipulations:
Axiom Name | Axiom | Example |
---|---|---|
Reflexivity | if a is set of attributes, b ⊆ a, then a →b | SSN,Name → SSN |
Augmentation | if a→ b holds and c is a set of attributes, then ca→cb | SSN → Name then SSN,Phone → Name, Phone |
Transitivity | if a →b holds and b→c holds, then a→ c holds | SSN →Zip and Zip → City then SSN →City |
Union or Additivity * | if a → b and a → c holds then a→ bc holds | SSN→Name and SSN→Zip then SSN→Name,Zip |
Decomposition or Projectivity* | if a → bc holds then a → b and a → c holds | SSN→Name,Zip then SSN→Name and SSN→Zip |
Pseudotransitivity* | if a → b and cb → d hold then ac → d holds | Address → Project and Project,Date →Amount then Address,Date → Amount |
(NOTE) | ab→ c does NOT imply a → b and b → c |
*Armstrong's Axioms (basic axioms)
Find all FD's for attributes a in a relation R
a^{+} denotes the set of attributes that are functionally determined by a
IF attribute(s) a IS/ARE A SUPERKEY OF R THEN a^{+} SHOULD BE THE WHOLE RELATION R. This is our goal. Any attributes in a relation not part of the closure indicates a problem with the design.
Algorithm for Closure
result := a; //start with superkey a WHILE (more changes to result) DO FOREACH ( FD b → c in R) DO IF b ⊆ result THEN result := result ∪ c