# Relational Algebra

last updated 13-sep-18

## Relational Query Languages

Languages for describing queries on a relational database

Structured Query Language (SQL)

• Predominant application-level query language
• Declarative

Relational Algebra

• Intermediate language used within DBMS
• Procedural

## What is an Algebra?

• A language based on operators and a domain of values
• Operators map values taken from the domain into other domain values
• Hence, an expression involving operators and arguments produces a value in the domain
• When the domain is a set of all relations (and the operators are as described later), we get the relational algebra
• We refer to the expression as a query and the value produced as the query result

## Relational Algebra

• Domain: set of relations
• Based on set theory
• Contains extensions to manipulate tables
• Functional language
• Procedural, i.e., order to operations, algorithm implicit in the functional evaluation

## Relational Algebra Operations

Below are fundamental operations that are "complete".  That is, this set of operations alone can define any retrieval.

• Select
• Project
• Rename
• Union
• Set Difference
• Cartesian Product

Convenient, natural additions to the set of operations makes

• Set Intersection
• Natural Join
• Division
• Assignment

## Select Operation

Choose a subset of tuples from a relation based on some criteria, results in another relation called a "result set"

Notation uses lower case sigma:

σcondition(relation)

condition is a boolean expression in which rows are selected/kept/included where the condition is true

• attribute <comparison> constant
• attribute1 <comparison> attribute2
• <comparison> is the usual <, >, <=, >=, =, and <> (not equal)
• attribute is null or is not null
• and, or, and not logical operations
• substring operations and pattern matching

## Examples

• Get the Democrat Presidents
σParty="Democrat"(PRESIDENT)
• Get the living Democrat Presidents
σParty="Democrat" Λ DeathDate is NULL(PRESIDENT)
1123 John 123 Main stamps
1123 John 123 Main coins
5556 Mary 5 Lake Dr. hiking
9876 Bart 10Pine St. stamps
• Get those persons whose hobby is stamps
• σId>3000 OR Hobby=‘hiking’ (Person)
• σ Id>3000 AND Id <3999 (Person)
• σ NOT(Hobby=‘stamps’)(Person)
• σHobby'hiking’ (Person)

## Project Operation

• Produce a subset of attributes from a relation
• Unselected columns are eliminated
• Duplicate rows are eliminated
• Result is a relation

Notation is lower case pi

πattribute- list(relation)

Project examples

• "List the political parties that have had Presidents"
• "List the states that have had Presidents"
• πName,Hobby(Person)

## Expressions

Compose these operations into a sequence to represent a single query.

1123 John 123 Main stamps
1123 John 123 Main coins
5556 Mary 5 Lake Dr. hiking
9876 Bart 10Pine St. stamps

πId,Name(σHobby=stamps' OR Hobby='coins’(Person))

 Result Id Name 1123 John 9876 Bart

It is common to do the select first, followed by project.

"List the states that have living Presidents"

## Set Operations

Union: r ∪ s ⇒ a row is in the result set if it is a row from r or from s; or the set of all tuples that belong to either r or s.

[Intersection: r ∩ s ⇒ a row is in the result set if it is a row from both r and from s.]

Set Difference: r - s ⇒ the set of all tuples in r that are not in s

Both operations require that the sets r and s are union compatible:

• Both r and s have same number of columns
• Domains of attributes match or are compatible in sequence in both r and s
• The names may not necessarily match up; helpful if they do, however.

Union compatible relations can be combined using union ∪, intersection, and set difference -.

### Example:

The following tables are NOT union compatible:

• Professor (Id, Name, Office, Phone)

However,  πName (Person) and πName (Professor) ARE union compatible so that

πName (Person) - πName (Professor) makes sense.

Intersection is more likely solved using AND in the select operation. AND is in the definition of set intersection.

Union is more likely solved using OR in the select operation.

Difference is actually more useful to "remove" entries.

## Cartesian Product

If R and S are two relations, R × S is the set of all concatenated tuples <x,y>, where x is a tuple in R and y is a tuple in S

• Union compatibility for R and S is irrelevant

R × S is expensive to compute--quadratic algorithm:

• The sum of the number of attributes of R and S is the number of attributes of R × S
• The product of the sizes of R and S in the number of rows of R × S
Cartesian Product
R   S   R X S
a b c d a b c d
xl
x2
y1
y2
x1
x2
y1
y2
x3
x4
y3
y4
x1
x2
y3
y4
x3
x4
y1
y2
x3
x4
y3
y4

From here we could do select and project to reduce the combination, but we consider this below.

## Renaming

Result of an expression evaluation is a relation, called the result set.

Attributes of relation must have distinct names. This is no longer guaranteed with Cartesian product. Suppose in the previous example attributes a and c were really the same name, that is, R × S would have attributes: a,b,a,d

Renaming operator tidies this up. To assign the names A1, A2,… An to the attributes of the n column relation produced by

expression expr, use the form relation-expression [A1, A2, … An]

### Example

• Transcript ( StudId, CrsCode, Semester, Grade)
• Teaching (ProfId, CrsCode, Semester)

πStudId, CrsCode (Transcript) [StudId, CrsCode1] ×  πProfId, CrsCode (Teaching) [ProfId, CrsCode2]

## Join

This is a derived operation, i.e., it is based on the basic operations of the relational algebra.  It is a convenience operation because it is done so much.

A (general or theta θ) join of R and S is the expression
R join-condition S

where join-condition is a conjunction of terms Ai oper Bi in which Ai is an attribute of R and Bi is an attribute of S and oper is one of =, <, >, ≤, ≥, ≠

The meaning is essentially:

σjoin.-condition(R X S)

where join-condition in both case are the same, except for possible renamings of attributes.

### Join and Renaming

Problem: R and S might have attributes with the same name – in which case the Cartesian product is not defined

Solution:

• Rename attributes prior to forming the product and use new names in join-condition´.
• Common attribute names are qualified with relation names in the result of the join

## Theta Join – example

Output the names of all employees that earn more than their managers

πEmployee Name(EmployeeMgrId=Id AND Salary>Salary Manager)

the join yields a table with attributes: Employee.Name, Employee.Id, Employee.Salary, Employee.MngrId, Manager.Name, Manager.Id, Manager.Salary

## Equijoin Join - Example

Equijoin: Join condition is a conjunction of equalities

πName, CrsCode (StudentId=StudId AND Grade = `A' Transcript )

 Student Transcript Id Name Address Status StudId CrsCode Sem Grade 111 John 111 CS305 F03 B 222 Mary 222 CS306 F03 A 333 Bill 333 CS304 S04 A 444 Joe
 Mary CS306 Bill CS304

## Natural Join

Special, but most useful, case of equijoin:

• the join condition equates all but only those attributes with the same name
• the condition doesn’t have to be explicitly stated!
• duplicate columns are eliminated from the result

Transcript (StudId, CrsCode, Sem, Grade )
Teaching (ProfId, CrsCode, Sem)

TranscriptTeaching =
StudId, Transcript.CrsCode Transcript.Sem, Grade, ProfId (TranscriptCrsCode=CrsCode AND Sem = Sem Teaching ))

More generally:

RS = πattr-list (σjoin.-condition RS )

where attr-list = attributes (R) ∪ attributes(S)
(duplicates are eliminated )and join-condition has the form: A1=A1 and ...and An=An where {A1 ... An} = attributes (R) ∩ attributes(S)

## Natural Join Example

List all ID's of students who took at least two different courses:

πStudId (σCrsCodeCrsCode2 (TranscriptTranscript [StudId, CrsCode2, Sem2, Grade2]) )

## Division

Goal: Produce the tuples in one relation, r, that match all tuples in another relation, s

• r (A1, …An, B1, …Bm)
• s (B1 …Bm)
• r/s, with attributes A1, …An, is the set of all tuples <a> such that for every tuple <b> in s, <a,b> is in r

Can be expressed in terms of projection, set difference, and cross-product

## Division - Example

List the Ids of students who have passed all courses that were taught in spring 2000

Numerator:

• StudId and CrsCode for every course passed by every student:
• π   StudId, CrsCode σGrade< > ‘F’ (Transcript) )

Denominator:

• CrsCode of all courses taught in spring 2000
• π CrsCode ( σ Semester=‘S2000’ (Teaching) )

Result is numerator/denominator