# The Relational Data Model

last updated 11-sep-15

## History of Relational Model

• 1970 paper by E.F. Codd “A Relational Model of Data for Large Shared Data Banks” proposed relational model
• System R, prototype developed at IBM Research Lab at San Jose, California – late 1970s
• Peterlee Test Vehicle, IBM UK Scientific Lab
• INGRES, University of California at Berkeley, in Unix
• System R results were used in developing DB2 from IBM and also Oracle
• Early microcomputer based DBMSs were relational - dBase, R:base, Paradox

Microsoft’s Access, now most popular microcomputer-based DBMS, is relational

Oracle, DB2, Informix, Sybase, Microsoft’s SQL Server, MySQL, PostgreSQL- most popular enterprise DBMSs, all relational

• Based on the mathematical notions of a relation and of sets
• Can use power of mathematical abstraction
• Can develop body of results using theorem and proof method of mathematics – results then apply to many different applications
• Can use expressive, exact mathematical notation
• Theory provides tools for improving design
• Basic data structure are simple tables, easy to understand
• Separates logical from physical level
• Data operations easy to express, using a few powerful commands
• Operations do not require user to understand the complex storage or data structures used

## Data Structures

Relations are represented abstractly as tables

• Tables are related to one another
• Table holds information about similar types of objects or entities
• The rows are like elements of a set
• Do not confuse relations with the concept of a relationship

Rows (tuples) correspond to individual entities

• Each tuple is distinct – no duplicate tuples
• Order of tuples is immaterial
• Cardinality of relation = number of tuples
• This follows the ideas of elements of a set

Columns correspond to attributes

• Each column has a distinct name, the name of the attribute it represents
• Order of attributes in a table is not important
• Each cell contains at most one value
• and that value is drawn from one domain
• Domains consist of the set or range of atomic values
• Arity = number of attributes, sometimes called the degree of the relation

## Example: Relations

• Student table tells facts about students
• Faculty table shows facts about faculty
• Class table shows facts about classes, including what faculty member teaches each
• Enroll table relates students to classes

So a table ≠ entity.

• A table often does correspond to an entity but not always
• A table can represent a relationship (usually many-to-many)
• Not every relationship shows up as a table (one-to-many relationship often do not arise a table)

 Student stuId lastName firstName major credits S1001 Smith Tom History 90 S1002 Chin Ann Math 36 S1005 Lee Perry History 3 S1010 Burns Edward Art 63 S1013 McCarthy Owen Math 0 S1015 Jones Mary Math 42 S1020 Rivera Jane CSC 15

 Class classNumber facId schedule room ART103A F101 MWF9 H221 CSC201A F105 TuThF10 M110 CSC203A F105 MThF12 M110 HST205A F115 MWF11 H221 MTH101B F110 MTuTh9 H225 MTH103C F110 MWF11 H225

 Faculty facId name department rank F101 Adams Art Professor F105 Tanaka CSC Instructor F110 Byrne Math Assistant F115 Smith History Associate F221 Smith CSC Professor
 Enroll stuId classNumber grade S1001 ART103A A S1001 HST205A C S1002 ART103A D S1002 CSC201A F S1002 MTH103C B S1010 ART103A S1010 MTH103C S1020 CSC201A B S1020 MTH101B A

## Mathematical Relations

For two sets D1 and D2, the Cartesian product, D1 X D2 , is the set of all ordered pairs in which the first element is from D1 and the second is from D2. The domains for the two sets are abitrary.

If D1={a,b,c} and D2={x,y} then D1 X D2={(a,x), (a,y), (b,x), (b,y), (c,x), (c,y)}

A relation then is any subset of the Cartesian product, e.g., {(a,x), (a,y), (c,x)}

One can form a Cartesian product of 3 sets; a relation is any subset of the ordered triples so formed.

This can extend to n sets, using n-tuples

### Database Relations

A relation schema, named R, is a set of attributes A1, A2,…,An with their corresponding domains D1, D2,…Dn

A relation r on relation schema R is a set of mappings from the attributes to their domains,
or to say r is a set of n-tuples (A1:d1, A2:d2, …, An:dn) such that d1∈ D1, d2∈D2 , …, dn∈Dn

In a table to represent the relation, list the Ai's as column headings, and let the (d1, d2, …dn) become the n-tuples, the rows of the table

## Relation Schema

A schema defines the following

Relation name

Attribute names and domains

Integrity constraints - e.g.,:

• The values of a particular attribute in all tuples are unique
• The values of a particular attribute in all tuples are greater than 0

Default values

## Relational Database

• Finite set of relations
• Each relation consists of a schema definition and an instance of the relation
• Database schema = set of relation schemas (and other things)
• Database instance = set of (corresponding) relation instances

Example

• Student (StuId: INT, LastName: STRING, FirstName: STRING, major: STRING, credits DEC)
• Faculty (FacId: STRING, Name: STRING, Dept: DEPTS, Rank RANKS)
• Class (FacId: STRING, Schedule: STRING, Room: STRING, ClassNum: COURSES)
• Department(DeptId: DEPTS, Name: STRING)

TableName (attr1:type, attr2:type, ... ) is a simplified non-SQL description of the table.

A running example used in the course: Presidential DB

## Integrity Constraints

These constraints, determined by the database designer, are part of the schema.  They represent restrictions on the state (or sequence of states) of data base.  The constraints are enforced by the DBMS.

Intra-relational - involve only one relation -- are part of the relation definition

• e.g., all Ids are unique
• only certain values are permitted

Inter-relational - involve several relations -- are part of the relation schema or database schema

### Kinds of Integrity Constraints (IC)

Static IC is a limitation on the state of the database

• Syntactic (structural)
e.g., all values in a column must be unique
• Semantic (involve meaning of attributes)
e.g., cannot register for more than 18 credits

Dynamic IC  is a limitation on the sequence of  the database states (supported by some DBMSs, but not in current the SQL standard)

• e.g., cannot raise salary by more than 5%

## Relation Keys

Relations never have duplicate tuples, so you can always tell tuples apart; implies there is always a key (which may be a composite of all attributes, in worst case)

Superkey: set of attributes that uniquely identifies tuples

Candidate key: superkey such that no proper subset of itself is also a superkey (i.e. it has no unnecessary attributes)

Primary key: candidate key chosen for unique identification of tuples

Cannot verify a key by looking at an instance; need to consider semantic information (common sense and knowledge about the enterprise) to ensure uniqueness.

A foreign key is an attribute or combination of attributes that is the primary key of some relation (called its home relation). Usually the home relation is some other relation but there can be cases of self-referencing (recursuve relationship)

## Key Constraint

Values in a column (or columns) of a relation are unique: at most one row in a relation instance can contain a particular value(s)

Key - set of attributes satisfying key constraint

• e.g., Id in Student,
• e.g., (StudId, CrsCode, Semester) in Transcript

Minimality - no subset of a key is a key.  When you determine a key, this rule should be applied.

• (StudId, CrsCode) is not a key of Transcript

Superkey - set of attributes containing key

• (Id, Name) is a superkey of Student, but as a key, it's not minimal

Every relation has a key.  The goal is to determine the "best" key, but a relation can have several keys:

• primary key (Id in Student) – (cannot be null) -- only one is designated per relation
• candidate key ((Name, Address) in Student) is a potential key and sometimes used as information to the DBMS to set up an index for efficient lookup.

## Foreign Key Constraint

Also known as Referential integrity => Item named in one relation must correspond to tuple(s) in another that describes the item

Examples:

• Transcript (CrsCode) references Course(CrsCode )
• Professor(DeptId) references Department(DeptId)

We say "a1 is a foreign key of R1 referring to a2 in R2" meaining that "if v is the value of a1, then there is a unique tuple in R2 in which a2 has the same value v

• This is a special case of referential integrity: a2 must be a candidate key of R2 (CrsCode is a key of Course), e.g., not necessarily the primary key (often is, however)
• If no row exists in R2 then we have a violation of referential integrity
• Not all rows of R2 need to be referenced.: relationship is not symmetric (some course might not be taught)
• Value of a foreign key might not be specified (DeptId column of some professor might be null)

Example

Note the foreign key might consist of several columns:

• (CrsCode, Semester) of Transcript references (CrsCode, Sem) of Teaching

In general, when R1(a1, …an) references R2(b1, …bn):

• There exists a 1 - 1 relationship between a1,…an and b1,…bn
• ai and bi must have the same base domains (although not necessarily the same names)
• b1,…bn is a candidate key of R2

## Semantic Constraints

Domain, primary key, and foreign key are examples of structural (syntactic) constraints

Semantic constraints express rules of application:

• e.g., number of registered students <= maximum enrollment