last updated 31-oct-16

- 1970 paper by E.F. Codd “A Relational Model of Data for Large Shared Data Banks” proposed relational model
- System R, prototype developed at IBM Research Lab at San Jose, California – late 1970s
- Peterlee Test Vehicle, IBM UK Scientific Lab
- INGRES, University of California at Berkeley, in Unix
- System R results were used in developing DB2 from IBM and also Oracle
- Early microcomputer based DBMSs were relational - dBase, R:base, Paradox

Microsoft’s Access, now most popular microcomputer-based DBMS, is relational

Oracle, DB2, Informix, Sybase, Microsoft’s SQL Server, MySQL, PostgreSQL- most popular enterprise DBMSs, all relational

- Based on the mathematical notions of a
**relation**and of**sets**- Can use power of mathematical abstraction
- Can develop body of results using theorem and proof method of mathematics – results then apply to many different applications
- Can use expressive, exact mathematical notation
- Theory provides tools for improving design

- Basic data structure are simple tables, easy to understand
- Separates logical from physical level
- Data operations easy to express, using a few powerful commands
- Operations do not require user to understand the complex storage or data structures used

** Relations are represented abstractly as tables**

- Tables are related to one another
- Table holds information about similar types of objects or entities
- The rows are like elements of a set
- Do not confuse relations with the concept of a relationship

**Rows ( tuples) correspond to individual entities**

- Each tuple is distinct – no duplicate tuples
- Order of tuples is immaterial
- Cardinality of relation = number of tuples
- This follows the ideas of elements of a set

**Columns correspond to attributes**

- Each column has a distinct name, the name of the attribute it represents
- Order of attributes in a table is not important
- Each cell contains at most one value
- and that value is drawn from one domain
- Domains consist of the set or range of atomic values
**Arity**= number of attributes, sometimes called the degree of the relation

**Student**table tells facts about students-
**Faculty**table shows facts about faculty -
**Class**table shows facts about classes, including what faculty member teaches each -
**Enroll**table**relates**students to classes

So a table ≠ entity.

- A table often does correspond to an entity but not always; most entities are tables
- A table can represent a relationship (usually many-to-many)
- Not every relationship shows up as a table (one-to-many relationship often do not arise a table)

For two sets D_{1} and D_{2}, the Cartesian
product, D_{1} X
D_{2} , is the set of all ordered pairs in which the first element
is from D_{1} and the second is from D_{2}. The domains for the two sets are abitrary.

If *D _{1}={a,b,c}* and

A relation then is any subset of the Cartesian product, e.g., *{(a,x), (a,y), (c,x)}*

One can form a Cartesian product of 3 sets; a relation is any subset of the ordered triples so formed.

This can extend to n sets, using n-tuples

A relation schema, named * R*, is a set of attributes A

A relation * r *on relation schema

or to say

In a table to represent the relation, list the * A_{i}*'s as column
headings, and let the (

A schema defines the following

Relation name

Attribute namesanddomains

Integrity constraints- e.g.,:

- The values of a particular attribute in all tuples are unique
- The values of a particular attribute in all tuples are greater than 0

Default values

- Finite set of relations
- Each relation consists of a schema definition and an instance of the relation
- Database schema = set of relation schemas (and other things)
- Database instance = set of (corresponding) relation instances

Example

**Student**(StuId: INT, LastName: STRING, FirstName: STRING, major: STRING, credits DEC)**Faculty**(FacId: STRING, Name: STRING, Dept: DEPTS, Rank RANKS)**Class**(FacId: STRING, Schedule: STRING, Room: STRING, ClassNum: COURSES)**Enroll**(ClassNum: COURSES, StudId: DEC, Grade: GRADES)**Department**(DeptId: DEPTS, Name: STRING)

**TableName** (attr_{1}:type, attr_{2}:type, ... ) is a
simplified non-SQL description of the table.

A running example used in the course: Presidential DB

These constraints, determined by the database designer, are part of the schema. They represent restrictions on the state (or sequence of states) of data base. The constraints are enforced by the DBMS.

**Intra-relational **- involve only one relation -- are part of the relation
definition

- e.g., all Ids are unique
- only certain values are permitted

**Inter-relational **- involve several relations -- are part of the relation
schema or database schema

**Static IC** is a limitation on the state of the database

- Syntactic (structural)

e.g., all values in a column must be unique - Semantic (involve meaning of attributes)

e.g., cannot register for more than 18 credits

**Dynamic IC ** is a limitation on the sequence of the database
states (supported by some DBMSs, but not in current the SQL standard)

- e.g., cannot raise salary by more than 5%

Relations never have duplicate tuples, so you can always tell tuples apart; implies there is always a key (which may be a composite of all attributes, in worst case)

**Superkey:** set of attributes that uniquely identifies tuples

**Candidate key**: superkey such that no proper subset of itself is also
a superkey (i.e. it has no unnecessary attributes)

**Primary key**: candidate key chosen for unique identification of tuples

Cannot verify a key by looking at an instance; need to consider semantic information (common sense and knowledge about the enterprise) to ensure uniqueness.

A **foreign key** is an attribute or combination of attributes that is
the primary key of some relation (called its home relation). Usually the home
relation is some other relation but there can be cases of self-referencing (recursuve
relationship)

Values in a column (or columns) of a relation are unique: at most one row in a relation instance can contain a particular value(s)

**Key** - set of attributes satisfying key constraint

- e.g., Id in Student,
- e.g., (StudId, CrsCode, Semester) in Transcript

**Minimality** - no subset of a key is a key. When you determine a
key, this rule should be applied.

- (StudId, CrsCode) is not a key of Transcript

**Superkey** - set of attributes containing key

- (Id, Name) is a superkey of Student, but as a key, it's not minimal

Every relation has a key. The goal is to determine the "best" key, but a relation can have several keys:

**primary**key (Id in Student) – (cannot be null) -- only one is designated per relation**candidate**key ((Name, Address) in Student) is a potential key and sometimes used as information to the DBMS to set up an index for efficient lookup.

Also known as** Referential integrity** => Item named in one relation must correspond to
tuple(s) in another that describes the item

Examples:

*Transcript (CrsCode) references Course(CrsCode )***Professor(DeptId) references Department(DeptId)**

We say "a_{1} is a **foreign key** of R_{1} referring to a_{2}
in R_{2}" meaining that "if v is the value of a_{1}, then there
is a *unique* tuple in R_{2} in which a_{2} has the same
value v

- This is a special case of referential integrity: a
_{2}must be a candidate key of R_{2}(CrsCode is a key of Course), e.g., not necessarily the primary key (often is, however) - If no row exists in R
_{2}then we have a violation of referential integrity - Not all rows of R
_{2}need to be referenced.: relationship is not symmetric (some course might not be taught) - Value of a foreign key might not be specified (DeptId column of some
professor might be
*null*)

Example

Note the foreign key might consist of several columns:

- (CrsCode, Semester) of Transcript references (CrsCode, Sem) of Teaching

In general, when R_{1}(a_{1}, …a_{n}) references R_{2}(b_{1},
…b_{n}):

- There exists a 1 - 1 relationship between a
_{1},…a_{n}and b_{1},…b_{n} - a
_{i}and b_{i}must have the same base domains (although not necessarily the same names) - b
_{1},…b_{n}is a candidate key of R_{2}

Domain, primary key, and foreign key are examples of structural (syntactic) constraints

**Semantic constraints** express rules of application:

- e.g., number of registered students <= maximum enrollment