Database Management Systems — Relational Algebra : 05

12 min readSep 26, 2020

Relational Algebra

An algebra is a formal structure consisting of sets and operations on those sets.

Relational algebra is a formal system for manipulating relations.

 Operands of this algebra are relations.

 Operations of this algebra include the usual set operations (since relations are sets of tuples), and special operations defined for relations

 selection

 projection

 join

Set Operations on Relations

For the set operations on relations, both operands must have the same scheme, and the result has that same scheme.

 R1 U R2 (union) is the relation containing all tuples that appear in R1, R2, or both.

 R1 n R2 (intersection) is the relation containing all tuples that appear in both R1 and R2.

 R1 — R2 (set difference) is the relation containing all tuples of R1 that do not appear in R2.

Selection

Selects tuples from a relation whose attributes meet the selection criteria, which is normally expressed as a predicate.

R2 = select(R1,P)

That is, from R1 we create a new relation R2 containing those tuples from R1 that satisfy (make true) the predicate P.

A predicate is a boolean expression whose operators are the logical connectives (and, or, not) and arithmetic comparisons (LT, LE, GT, GE, EQ, NE), and whose operands are either domain names or domain constants.

select(Workstation,Room=633) =

Name Room Mem Proc Monitor
====================================
coke 633 16384 SP4 color17
bass 633 8124 SP2 color19
bashful 633 8124 SP1 b/w

select(User,Status=UG and Idle<1:00) =

Login Name Status Idle Shell Sever ========================================
jli J. Inka UG 0:00 bsh UG

Projection

Chooses a subset of the columns in a relation, and discards the rest.

R2 = project(R1,D1,D2,…Dn)

That is, from the tuples in R1 we create a new relation R2 containing only the domains D1,D2,..Dn.

project(Server,Name,Status) =

Name Status
==============
diamond up
emerald up
graphite down
ruby up
frito up

project(select(User,Status=UG),Name,Status) =

Name Status
=================
A. Cohn UG
J. Inka UG
R. Kemp UG

Join

Combines attributes of two relations into one.

R3 = join(R1,D1,R2,D2)

Given a domain from each relation, join considers all possible pairs of tuples from the two relations, and if their values for the chosen domains are equal, it adds a tuple to the result containing all the attributes of both tuples (discarding the duplicate domain D2).

Natural join: If the two relations being joined have exactly one attribute (domain) name in common, then we assume that the single attribute in common is the one being compared to see if a new tuple will be inserted in the result.

Assuming that we’ve augmented the domain names in our lab database so that we use MachineName, PrinterName, ServerName, and UserName in place of the generic domain “Name”, then

join(Workstations,Printers)

is a natural join, on the shared attribute name Room. The result is a relation of all workstation/printer attribute pairs that are in the same room.

Example Use of Project and Join

Find all workstations in a room with a printer.

 R1 = project(Workstation,Name,Room)

 R2 = project(Printer,Name,Room)

 R3 = join(R1,R2)

Implementing Set Operations

To implement R1 U R2 (while eliminating duplicates) we can

 sort R1 in O(N log N)

 sort R2 in O(M log M)

 merge R1 and R2 in O(N+M)

If we allow duplicates in union (and remove them later) we can

 copy R1 to R3 in O(N)

 insert R2 in R3 in O(M)

If we have an index and don’t want duplicates we can

 copy R1 to R3 in O(N)

 for each tuple in R2 (which is O(M))

 use index to lookup tuples in R1 with the same index value O(1)

 if R2 tuple equals some such R1 tuple, don’t add R2 tuple to R3

Intersection and set difference have corresponding implementations.

Implementing Projection

To implement projection we must

 process every tuple in the relation

 remove any duplicates that result

To avoid duplicates we can

 sort the result and remove consecutive tuples that are equal

 requires time O(N log N) where N is the size of the original relation

 implement the result as a set  set insertion guarantees no duplicates

 by using a hash table, insertion is O(1), so projection is O(N)

Implementing Selection In the absence of an index we

 apply the predicate to every tuple in the relation

 insert matches in the resulting relation

 duplicates can’t occur  take O(N) time

Given an index, and a predicate that uses the index key, we

 Lookup tuples using the key

 evaluate only those tuples with the predicate

 take O(K) time, where K tuples match the key

Implementing Join with Nested Loops

A nested loop join on relations R1 (with N domains) and R2 (with M domains), considers all |R1| x |R2| pairs of tuples.

R3= join(R1,Ai,R2,Bj)

for each tuple t in R1 do
for each tuple s in R2 do
if t.Ai = s.Bj then
insert(R3, t.A1, t.A2, …, t.AN,
s.B1, …, s.B(j-1), s.B(j+1), …, s.BM)

This implementation takes time O(|R1|*|R2|).

Index Join

An index join exploits the existence of an index for one of the domains used in the join to find matching tuples more quickly.

R3= join(R1,Ai,R2,Bj)

for each tuple t in R1 do
for each tuple s in R2 at index(t.Ai) do
insert(R3, t.A1, t.A2, …, t.AN,
s.B1, …, s.B(j-1), s.B(j+1), …, s.BM)

We could choose to use an index for R2, and reverse the order of the loops.

The decision on which index to use depends on the number of tuples in each relation.

Sort Join

If we don’t have an index for a domain in the join, we can still improve on the nested-loop join using sort join.

R3= join(R1,Ai,R2,Bj)

 Merge the tuples of both relations into a single list

 list elements must identify the original relation

 Sort the list based on the value in the join domains Ai and Bj

 all tuples on the sorted list with a common value for the join domains are consecutive

 Pair all (consecutive) tuples from the two relations with the same value in the join domains

Comparison of Join Implementations

Assumptions

 Join R1 and R2 (on domain D) producing R3

 R1 has i tuples, R2 has j tuples

 |R3| = m, 0 <= m <= i * j

 Every implementation takes at least time O(m)

Comparison

 Nested-loop join takes time O(i * j)

 Index join (using R2 index) takes time O(i+m)

 lookup is O(1) for each tuple in R1

 at most O(m) tuples match

 Sort join takes time O(m +(i+j)log(i+j))

 O(i+j) to merge the tuples in R1 and R2

 O((i+j) log (i+j)) to sort the list

 O(m) to produce the output (0 <= m <= i*j)

Expressing Queries in Relational Algebra

Relational algebra is an unambiguous notation (or formalism) for expressing queries.

Queries are simply expressions in relational algebra.

Expressions can be manipulated symbolically to produce simpler expressions according to the laws of relational algebra.

Expression simplification is an important query optimization technique, which can affect the running time of queries by an order of magnitude or more.

 early “selection” reduces the number of tuples

 early “projection” reduces the number of domains

Algebraic Laws for Join

Commutativity (assuming order of columns doesn’t matter)

join(R1, Ai, R2, Bj) = join(R2, Bj, R1, Ai)

Nonassociativity

join (join(R1, Ai, R2, Bj),Bj,R3,Ck)

is not the same as

join (R1,Ai,join(R2, Bj, R3, Ck),Bj)

Algebraic Laws for Selection

Commutativity

select(select(R1,P1),P2) = select(select(R1,P2),P1)

Selection pushing

 if P contains attributes of R

select(join(R,Ai,S,Bj),P) = join(select(R,P),Ai,S,Bj)

 if P contains attributes of S

select(join(R,Ai,S,Bj),P) = join(R,Ai,select(S,P),Bj)

Selection Splitting (where P = A and B)

select(R,P) = select(select(R,A),B)

select(R,P) = select(select(R,B),A)

Example: Selection Pushing and Splitting

Consider the following 4 relation database

 CSG: Course-StudentID-Grade

 SNAP: StudentID-Name-Address-Phone

 CDH: Course-Day-Hour

 CR: Course-Room

Implement the query “Where is Amy at Noon on Monday?”

Let P be (Name=”Amy” and Day=”Monday” and Hour=”Noon”)

We can use a brute-force approach that joins all the data in the relations into a single large relation, selects those tuples that meet the query criteria, and then isolates the answer field using projection.

 R1 = join(CSG,SNAP)

 R2 = join(R1,CDH)

 R3 = join(R2,CR)

 R4 = select(R3,P)

 R5 = project(R4,Room)

project(select(join(join(join(CSG,SNAP),CDH),CR),P),Room)

Selection Pushing and Splitting (cont)

The selection uses only Name, Day, and Hour attributes (and not Course or Room), so we can push the selection inside the outermost join.

 R1 = join(CSG,SNAP)

 R2 = join(R1,CDH)

 R3 = select(R2,P)

 R4 = join(R3,CR)

 R5 = project(R4,Room)

We cannot push selection further, because the predicate involves attributes from both operands of the next innermost join (R1,CDH).

We can split the selection into two, one based on Name, and the other based on Day-Hour.

 R1 = join(CSG,SNAP)

 R2 = join(R1,CDH)

 R3 = select(R2,Day=”Monday” and Hour=”Noon”)

 R4 = select(R3,Name=”Amy”)

 R5 = join(R4,CR)

 R6 = project(R5,Room)

Selection Pushing and Splitting (cont 2)

Now we can push the first selection inside the join, since it involves only attributes from the CDH relation.

 R1 = join(CSG,SNAP)

 R2 = select(CDH,Day=”Monday” and Hour=”Noon”)

 R3 = join(R1,R2)

 R4 = select(R3,Name=”Amy”)

\ R5 = join(R4,CR)

 R6 = project(R5,Room)

Similarly we can push the second selection inside the preceding join, since it involves only attributes from R1 (ie, Name).

 R1 = join(CSG,SNAP)

 R2 = select(CDH,Day=”Monday” and Hour=”Noon”)

 R3 = select(R1,Name=”Amy”)

 R4 = join(R2,R3)

 R5 = join(R4,CR)

 R6 = project(R5,Room)

Continuing to push the second select inside the first join

 R1 = select(SNAP,Name=”Amy”)

 R2 = join(CSG,R1)

 R3 = select(CDH,Day=”Monday” and Hour=”Noon”)

 R4 = join(R2,R3)

 R5 = join(R4,CR)

 R6 = project(R5,Room)

Algebraic Laws for Projection

Projection pushing

To push a projection operation inside a join requires that the result of the projection contain the attributes used in the join.

project(join(R,Ai,S,Bj),D1,D2,…Dn)

In this case, we know that the domains in the projection will exist in the relation that results from the join.

In performing projection first (on the two join relations)

 we should only project on those domains that exist in each of the two relations

 we must ensure that the join domains Ai and Bj exist in the resulting two relations

Let PDR = {D|D domain in R, D in {D1…Dn}} U Ai

Let PDS = {D|D domain in S, D in {D1…Dn}} U Bi

R1 = project(R,PDR)

R2 = project(S,PDS)

R3 = join(R1,Ai,R2,Bj) = project(join(R,Ai,S,Bj),D1,D2,…Dn)

Example: Projection Pushing

Implement the query “Where is Amy at Noon on Monday?”

 R1 = select(SNAP,Name=”Amy”)

 R2 = join(CSG,R1)

 R3 = select(CDH,Day=”Monday” and Hour=”Noon”)

 R4 = join(R2,R3)

 R5 = join(R4,CR)

 R6 = project(R5,Room)

This approach carries along unnecessary attributes every step of the way.

 R1 carries Address and Phone attributes

 R4 carries Grade attribute

We use projection pushing to eliminate unnecessary attributes early in the implementation.

 R1 = select(SNAP,Name=”Amy”)

 R2 = join(CSG,R1)

 R3 = select(CDH,Day=”Monday” and Hour=”Noon”)

 R4 = join(R2,R3)

 R5 = project(CR, Course, Room)

 R6 = project(R4, Course)

 R7 = join(R5,R6)

 R8 = project(R7,Room)

Note that R5 is unnecessary, since the domains in the projection are all the domains of CR.

Projection Pushing (cont)

Implement the query “Where is Amy at Noon on Monday?”

 R1 = select(SNAP,Name=”Amy”)

 R2 = join(CSG,R1)

 R3 = select(CDH,Day=”Monday” and Hour=”Noon”)

 R4 = join(R2,R3)

 R5 = project(R4, Course)

 R6 = join(CR,R5)

 R7 = project(R6,Room)

We can continue pushing the projection on Course below the join for R4.

 R1 = select(SNAP,Name=”Amy”)

 R2 = join(CSG,R1)

 R3 = select(CDH,Day=”Monday” and Hour=”Noon”)

 R4 = project(R2,Course)

 R5 = project(R3,Course)

 R6 = join(R4,R5)

 R7 = join(CR,R6)

 R8 = project(R7,Room)

Projection Pushing (cont2)

We can continue pushing the projection on Course for R4 below the join for R2.

 R1 = select(SNAP,Name=”Amy”)

 R2 = project(CSG,Course,StudentID)

 R3 = project(R1,StudentID)

 R4 = join(R2,R3)

 R5 = project(R4,Course)

 R6 = select(CDH,Day=”Monday” and Hour=”Noon”)

 R7 = project(R6,Course)

 R8 = join(R6,R7)

 R9 = join(CR,R8)

 R10 = project(R9,Room)

A query language is a language in which user requests information from the database. it can be categorized as either procedural or nonprocedural. In a procedural language the user instructs the system to do a sequence of operations on database to compute the desired result. In nonprocedural language the user describes the desired information without giving a specific procedure for obtaining that information.

The relational algebra is a procedural query language. It consists of a set of operations that take one or two relations as input and produces a new relation as output.

Fundamental Operations

 SELECT

 PROJECT

 UNION

 SET DIFFERENCE

 CARTESIAN PRODUCT

 RENAME

Select and project operations are unary operation as they operate on a single relation.Union, set difference, Cartesian product and rename operations are binary operations as they operate on pairs of relations.

Other Operations

 SET INTERSECTION

 NATURAL JOIN

 DIVISION

 ASSIGNMENT

The select operation: — to identify a set of tuples which is a part of a relation and to extract only these tuples out. The select operation selects tuples that satisfy a given predicate or condition.

 It is a unary operation defined on a single relation.

 It is denoted as σ.

The union operation: — is used when we need some attributes that appear in either or both of the two relations.

 It is denoted as U.

Example:

Borrower (customer-name, loan-number)

Depositor (customer-name, account-number)

Customer (customer-name, street-number, customer-city)

For a union operation r U s to be valid, two conditions must hold:

 The relation r and s must be of the same arity, i.e. they must have the same number of attributes.

 The domains of the ith attribute of r and the ith attribute of s must be the same for all i.

The set difference operation: — finds tuples in one relation but not in other.

 It is denoted as –

The Cartesian product operation: — allows combining information from two relations.

 It is denoted as r X s where r and s are relations.

If relation r has n1 tuples and relation s has n2 tuples then r X s has n1*n2 tuples.

Example:

Borrower (customer-name, loan-number)

Loan (loan-number, branch-name, city, amount)

The rename operation: — used to rename.

 It is denoted as ρ.

E : relational algebra expression

ρ x (E): returns the result of expression E under the name x.

ρ x (A1, A2, A3… An) (E): returns the result of expression E under the name x with attributes renamed to A1, A2, A3… An.

The set intersection operation: — finds tuples in both the relations.

 It is denoted as ∩.

Example:

Borrower (customer-name, loan-number)

Depositor (customer-name, account-number)

Customer (customer-name, street-number, customer-city)

The natural join operation: — it is a binary operation and a combination of certain selections and a Cartesian product into one operation.

 It is denoted as |X| .

 It is associative.

It forms a Cartesian product of its two arguments. Then performs a selection forcing equality on those attributes those appear in both the relations. And finally removes duplicates attributes.

r(R): r is a relation with attributes R. s(S): s is a relation with attributes S.

If R ∩ S = Ф i.e. they have no attributes in common then r |X| s = r X s

The division / quotient operation: -

 It is denoted as ÷.

Letr(R) and s(S) be relations

r ÷ s: — the result consists of the restrictions of tuples in r to the attribute names unique to R, i.e. in the Header of r but not in the Header of s, for which it holds that all their combinations with tuples in s are present in r.

Extended Relational Algebra Operations

GENERALIZED PROJECTION: — It extends the projection operation by allowing arithmetic functions to be used in projection list.

Π F1,F2 … Fn (E)

Where E: relational algebra expression

Fi: arithmetic expression

AGGREGATE FUNCTION:-It takes a collection of values and returns a single value as a result.

Limitations Of Relational Algebra

Although relational algebra seems powerful enough for most practical purposes, there are some simple and natural operators on relations which cannot be expressed by relational algebra. The transitive closure of a binary relation is one of them.

Relational Algebra Implemented In SQL

SQL (Structured query Language) is the most popular computer language used to create, modify, retrieve data from relational database management system.The basic structure of an SQL expression consists of three clauses:

SELECT: — This clause corresponds to the projection operation of the relational algebra. It is used to list the attributes of the result of a query.

FROM: -It corresponds to the Cartesian product operation of the relational algebra. It lists the relations scanned in the evaluation of an expression.

WHERE: — This clause corresponds to selection predicate of relational algebra. It consists of a predicate involving attributes of the relations that appear in the FROM clause.

SQL QUERY FORM:

Select A1, A2….An

From r1, r2…rm

Where P

Ai : attribute

Ri : relation

P : predicate

SELECT clause- specifies the table columns retrieved.

FROM clause- specifies the tables to be accessed.

WHERE clause- which rows in the FROM tables to use.

Joining Tables

The FROM clause allows more than 1 table in its list. The rows from one table must be correlated with the rows of the others. This correlation is known as joining.

Set Operations

UNION, INTERSECT and EXCEPT operations can be done in SQL corresponding to their operations U, ∩ and — in relational algebra only if the domains of the attributes of the relations match and the relations have same arity i.e same number of attributes.

Database Management Systems — Relational Algebra : 05

Relational Algebra

Set Operations on Relations

Selection

Name Room Mem Proc Monitor==================================== coke 633 16384 SP4 color17 bass 633 8124 SP2 color19 bashful 633 8124 SP1 b/w

Login Name Status Idle Shell Sever ========================================jli J. Inka UG 0:00 bsh UG

Projection

Name Status ============== diamond up emerald up graphite down ruby up frito up

Name Status =================A. Cohn UG J. Inka UG R. Kemp UG

Join

Example Use of Project and Join

Implementing Set Operations

Implementing Projection

Implementing Join with Nested Loops

R3= join(R1,Ai,R2,Bj)

for each tuple t in R1 do for each tuple s in R2 do if t.Ai = s.Bj then insert(R3, t.A1, t.A2, …, t.AN, s.B1, …, s.B(j-1), s.B(j+1), …, s.BM)

Index Join

R3= join(R1,Ai,R2,Bj)

for each tuple t in R1 do for each tuple s in R2 at index(t.Ai) do insert(R3, t.A1, t.A2, …, t.AN, s.B1, …, s.B(j-1), s.B(j+1), …, s.BM)

Sort Join

Comparison of Join Implementations

Expressing Queries in Relational Algebra

Algebraic Laws for Join

Algebraic Laws for Selection

Example: Selection Pushing and Splitting

Selection Pushing and Splitting (cont)

Selection Pushing and Splitting (cont 2)

Algebraic Laws for Projection

project(join(R,Ai,S,Bj),D1,D2,…Dn)

Example: Projection Pushing

Projection Pushing (cont)

Projection Pushing (cont2)

Fundamental Operations

Other Operations

The division / quotient operation: -

Extended Relational Algebra Operations

Relational Algebra Implemented In SQL

Joining Tables

Set Operations

Written by Effective Reader

Name Room Mem Proc Monitor
====================================
coke 633 16384 SP4 color17
bass 633 8124 SP2 color19
bashful 633 8124 SP1 b/w

Login Name Status Idle Shell Sever ========================================
jli J. Inka UG 0:00 bsh UG

Name Status
==============
diamond up
emerald up
graphite down
ruby up
frito up

Name Status
=================
A. Cohn UG
J. Inka UG
R. Kemp UG

for each tuple t in R1 do
for each tuple s in R2 do
if t.Ai = s.Bj then
insert(R3, t.A1, t.A2, …, t.AN,
s.B1, …, s.B(j-1), s.B(j+1), …, s.BM)

for each tuple t in R1 do
for each tuple s in R2 at index(t.Ai) do
insert(R3, t.A1, t.A2, …, t.AN,
s.B1, …, s.B(j-1), s.B(j+1), …, s.BM)