Query Optimization 2 - Stanford University

Query Optimization 2

Instructor: Matei Zaharia cs245.stanford.edu

Recap: Data Statistics

Information about tuples in a table that we can use to estimate costs

- Must be approximated for intermediate tables

We saw one way to do this for 4 statistics:

- T(R) = # of tuples in R
- S(R) = average size of tuples in R
- B(R) = # of blocks to hold R's tuples
- V(R, A) = # distinct values of attribute A in R
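As a rough illustration of the four statistics, the sketch below computes them for a small in-memory table. The function name `table_stats`, the `tuple_size` callback, and the block size are assumptions made for this example, and B(R) is approximated as the total bytes divided by the block size, which ignores per-block overhead and tuples that cannot span blocks.

```python
import math

def table_stats(rows, tuple_size, block_size=4096):
    """Compute T, S, B, V for a table given as a list of dicts.

    rows: list of dicts, one per tuple (hypothetical representation).
    tuple_size: function mapping a row to its size in bytes (assumed).
    block_size: bytes per block (assumed value).
    """
    t = len(rows)                                   # T(R): number of tuples
    s = sum(tuple_size(r) for r in rows) / t if t else 0.0  # S(R): average size
    b = math.ceil(t * s / block_size)               # B(R): blocks, densely packed
    attrs = rows[0].keys() if rows else []
    v = {a: len({r[a] for r in rows}) for a in attrs}  # V(R, A) per attribute
    return {"T": t, "S": s, "B": b, "V": v}

# Illustrative table: 50 tuples of 100 bytes each, 1000-byte blocks.
rows = [{"A": i % 3, "B": i} for i in range(50)]
stats = table_stats(rows, lambda r: 100, block_size=1000)
print(stats)
```

For a base table these numbers can be computed exactly as above; for an intermediate result they would have to be estimated from the statistics of the inputs.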

CS 245


Another Type of Data Stats: Histograms

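[Figure: histogram of attribute A in R. x-axis: value of A, with ticks at 10, 20, 30, 40. y-axis: number of tuples in R with A value in a given range; values labelled 15, 12, 10, 5.]

Question posed: how large is the result of a selection on R that compares A with a constant a?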
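One common way to use such a histogram is to estimate how many tuples fall in a range of A values by assuming values are spread uniformly inside each bucket. The sketch below shows that idea; the function name `estimate_range` and the bucket edges and counts are illustrative assumptions, not values read from the figure.

```python
def estimate_range(edges, counts, lo, hi):
    """Estimate the number of tuples with lo <= A < hi.

    edges: bucket boundaries, len(counts) + 1 increasing values.
    counts: number of tuples whose A value falls in each bucket.
    Assumes A values are uniformly distributed within each bucket.
    """
    total = 0.0
    for i, count in enumerate(counts):
        b_lo, b_hi = edges[i], edges[i + 1]
        # Width of the part of this bucket covered by [lo, hi).
        overlap = min(hi, b_hi) - max(lo, b_lo)
        if overlap > 0:
            total += count * overlap / (b_hi - b_lo)
    return total

# Hypothetical histogram: four buckets of width 10.
edges = [0, 10, 20, 30, 40]
counts = [8, 4, 6, 2]
estimate = estimate_range(edges, counts, 15, 30)
print(estimate)
```

Here the query range covers half of the second bucket and all of the third, so the estimate adds half of one count to the whole of the other. The uniformity assumption is only as good as the bucket granularity: narrower buckets give tighter estimates at the cost of storing more counts.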


Outline

What can we optimize?

Rule-based optimization

Data statistics

Cost models

Cost-based plan selection

Spark SQL


