Relational Algebra CSE 544: Relational Operators, Sorting 1. Union and ...

嚜燎elational Algebra

? Operates on relations, i.e. sets

每 Later: we discuss how to extend this to bags

? Five operators:

CSE 544:

Relational Operators, Sorting

Wednesday, 5/12/2004











Union: ﹍

Difference: Selection: 考

Projection: 旭

Cartesian Product: ℅

? Derived or auxiliary operators:

每 Intersection, complement

每 Joins (natural,equi-join, theta join, semi-join)

每 Renaming: 老

1. Union and 2. Difference

? R1 ﹍ R2

? Example:

ActiveEmployees ﹍ RetiredEmployees

What about Intersection ?

?

?

?

?

It is a derived operator

R1 ﹎ R2 = R1 每 (R1 每 R2)

Also expressed as a join (will see later)

Example

UnionizedEmployees ﹎ RetiredEmployees

? R1 每 R2

? Example:

AllEmployees 每 RetiredEmployees

3. Selection

? Returns all tuples which satisfy a condition

? Notation: 考c(R)

? Examples

考Salary > 40000 (Employee)

考name = ※Smithh§ (Employee)

4. Projection

? Eliminates columns, then removes duplicates

? Notation: 旭 A1,#,An (R)

? Example: project social-security number and

names:

旭 SSN, Name (Employee)

Output schema: Answer(SSN, Name)

? The condition c can be =, , ≡,

1

Cartesian Product Example

5. Cartesian Product

? Each tuple in R1 with each tuple in R2

? Notation: R1 ℅ R2

? Example:

Employee ℅ Dependents

? Very rare in practice; mainly used to

express joins

Renaming

? Changes the schema, not the instance

? Notation: 老 B1,#,Bn (R)

? Example:

老LastName, SocSocNo (Employee)

Output schema: Answer(LastName, SocSocNo)

Employee

Name

John

Tony

SSN

999999999

777777777

Dependents

EmployeeSSN

999999999

777777777

Dname

Emily

Joe

Employee x Dependents

Name

SSN

EmployeeSSN

John

999999999 999999999

John

999999999 777777777

Tony

777777777 999999999

Tony

777777777 777777777

Renaming Example

Employee

Name

John

Tony

SSN

999999999

777777777

老LastName, SocSocNo (Employee)

LastName

John

Tony

Natural Join

? Notation: R1 |℅| R2

? Meaning: R1 |℅| R2 = 旭A(考C(R1 ℅ R2))

? Where:

每 The selection 考C checks equality of all common

attributes

每 The projection eliminates the duplicate common

attributes

Dname

Emily

Joe

Emily

Joe

SocSocNo

999999999

777777777

Natural Join Example

Employee

Name

John

Tony

SSN

999999999

777777777

Dependents

SSN

999999999

777777777

Dname

Emily

Joe

Employee

Dependents =

旭Name, SSN, Dname(考

考 SSN=SSN2(Employee x 老SSN2, Dname(Dependents))

Name

John

Tony

SSN

999999999

777777777

Dname

Emily

Joe

2

Natural Join

? R=

S=

Natural Join

A

B

B

C

X

Y

Z

U

X

Z

V

W

Y

Z

Z

V

Z

V

? R |℅| S=

? Given the schemas R(A, B, C, D), S(A, C, E),

what is the schema of R |℅| S ?

? Given R(A, B, C), S(D, E), what is R |℅| S ?

A

B

C

X

Z

U

X

Z

V

Y

Z

U

Y

Z

V

Z

V

W

? Given R(A, B), S(A, B), what is R |℅| S ?

Theta Join

? A join that involves a predicate

? R1 |℅| 牟 R2 = 考 牟 (R1 ℅ R2)

? Here 牟 can be any condition: =, ≡

Eq-join

? A theta join where 牟 is an equality

? R1 |℅| A=B R2 = 考 A=B (R1 ℅ R2)

? Example:

Employee |℅| SSN=SSN Dependents

? Most useful join in practice

Semijoin

? R |℅ S = 旭 A1,#,An (R |℅| S)

? Where A1, #, An are the attributes in R

? Example:

Employee |℅ Dependents

Semijoins in Distributed

Databases

? Semijoins are used in distributed databases

Dependents

Employee

SSN

Name

...

...

SSN

...

Dname

Age

...

network

Employee

(考 age>71(Dependents))

Employee |℅|

|℅| ssn=ssn

(Dependents))

ssn=ssn (考 age>71

R = Employee |℅ T

T = 旭 SSN 考 age>71 (Dependents)

Answer = R |℅| Dependents

3

Complex RA Expressions

Operations on Bags

旭 name

buyer-ssn=ssn

pid=pid

seller-ssn=ssn

旭 ssn

旭 pid

考name=fred

Person

Purchase

Person

考name=gizmo

A bag = a set with repeated elements

Relational Engines work on bags, not sets !

All operations need to be defined carefully on bags

? {a,b,b,c}﹍{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f}

? {a,b,b,b,c,c} 每 {b,c,c,c,d} = {a,b,b,d}

? 考C(R): preserve the number of occurrences

? 旭A(R): no duplicate elimination

? Cartesian product, join: no duplicate elimination

Product

Logical Operators in the Bag

Algebra

?

?

?

?

?

?

?

Union, intersection, difference

Selection 考

Projection 旭

Join

Duplicate elimination 汛

Grouping 污

Sorting 而

Relational

Algebra

(on bags)

Query Plan:

? logical tree

? implementation

choice at every

node

? scheduling of

operations.

SELECT

SELECTcity,

city,count(*)

count(*)

FROM

FROMsales

sales

GROUP

GROUPBY

BYcity

city

HAVING

HAVINGsum(price)

sum(price)>>100

100

City=&seattle*

考 p > 100

c

sales

Architecture of a Database Engine

SQL query

buyer



旭 city, c

T(city,p,c)

污 city, sum(price) p, count(*)

Physical Operators

SELECT

SELECT S.buyer

S.buyer

FROM

FROMPurchase

PurchaseP,

P,Person

PersonQQ

WHERE

P.buyer=Q.name

WHERE P.buyer=Q.nameAND

AND

Q.city=&seattle*

Q.city=&seattle*AND

AND

Q.phone

>

&5430000*

Q.phone > &5430000*

Example

Parse Query

phone>*5430000*

Buyer=name

(Simple Nested Loops)

Query

optimization

Select Logical Plan

Logical

plan

Select Physical Plan

Purchase

(Table scan)

Person

(Index scan)

Query Execution

Physical

plan

Some operators are from relational

algebra, and others (e.g., scan)

are not.

4

Cost Parameters

In database systems the data is on disks, not in main memory

The cost of an operation = total number of I/Os

Cost parameters:

? B(R) = number of blocks for relation R

? T(R) = number of tuples in relation R

? V(R, a) = number of distinct values of attribute a

Cost

Cost of an operation =

number of disk I/Os needed to:

每 read the operands

每 compute the result

Cost of writing the result to disk is not included on the following

slides

Question: the cost of sorting a table with B blocks ?

Answer:

Sorting While Scanning

? Sometimes it is useful to have the output

sorted

? Three ways to scan it sorted:

每 If there is a primary or secondary index on it,

use it during scan

每 If it fits in memory, sort there

每 If not, use multi-way merge sort

Cost Parameters

? Clustered table R:

每 Blocks consists only of records from this table

每 B(R) > T(R) / blockSize

? Unclustered table R:

每 Its records are placed on blocks with other tables

每 When R is unclustered: B(R) > T(R)

? When a is a key, V(R,a) = T(R)

? When a is not a key, V(R,a)

Scanning Tables

? The table is clustered:

每 Table-scan: if we know where the blocks are

每 Index scan: if we have a sparse index to find the

blocks

? The table is unclustered

每 May need one read for each record

Cost of the Scan Operator

? Clustered relation:

每 Table scan:

? Unsorted: B(R)

? Sorted: 3B(R)

每 Index scan

? Unsorted: B(R)

? Sorted: B(R) or 3B(R)

? Unclustered relation

每 Unsorted: T(R)

每 Sorted: T(R) + 2B(R)

5

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download