Hadoop vs. Parallel Databases

Juliana Freire

The Debate Starts...

The Debate Continues...

• A Comparison of Approaches to Large-Scale Data Analysis. Pavlo et al., SIGMOD 2009
    o Parallel DBMS beats MapReduce by a lot!
    o Many were outraged by the comparison
• MapReduce: A Flexible Data Processing Tool. Dean and Ghemawat, CACM 2010
    o Pointed out inconsistencies and mistakes in the comparison
• MapReduce and Parallel DBMSs: Friends or Foes? Stonebraker et al., CACM 2010
    o Toned down claims...

Outline

• DB 101 - Review
• Background on Parallel Databases (for more detail, see Chapter 21 of Silberschatz et al., Database System Concepts, Fifth Edition)
• Case for Parallel Databases
• Case for MapReduce
• Voice your opinion!

Storing Data: Database vs. File System

• Once upon a time, database applications were built on top of file systems...
• But this has many drawbacks:
    o Data redundancy, inconsistency, and isolation
        ▪ Multiple file formats, duplication of information in different files
    o Difficulty in accessing data
        ▪ Need to write a new program to carry out each new task, e.g., search people by zip code or last name; update telephone number
    o Integrity problems
        ▪ Integrity constraints (e.g., num_residence = 1) become part of program code, so it is hard to add new constraints or change existing ones
• Atomicity of updates
    o Failures may leave the database in an inconsistent state with partial updates carried out, e.g., John and Mary get married, add new residence, update John's entry, and the database crashes while Mary's entry is being updated...
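The integrity and atomicity drawbacks above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The Person table, its columns, the addresses, and the helper functions are all assumptions made for illustration; only the num_residence = 1 constraint and the John and Mary scenario come from the slide. A raised exception stands in for the crash; recovery from a real crash relies on the DBMS's own logging, which this sketch does not exercise.

```python
import sqlite3

# Hypothetical schema for the slide's example; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Person (
        name          TEXT PRIMARY KEY,
        address       TEXT NOT NULL,
        num_residence INTEGER NOT NULL CHECK (num_residence = 1)
    )
    """
)
conn.executemany(
    "INSERT INTO Person (name, address, num_residence) VALUES (?, ?, 1)",
    [("John", "12 Oak St"), ("Mary", "7 Elm St")],
)
conn.commit()


def addresses(conn):
    """Return {name: address} for every person."""
    return dict(conn.execute("SELECT name, address FROM Person ORDER BY name"))


def move_couple(conn, new_address, fail_midway=False):
    """Update both entries in one transaction; undo everything on failure."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE Person SET address = ? WHERE name = ?", (new_address, "John")
            )
            if fail_midway:
                # Simulated failure after John's update, before Mary's.
                raise RuntimeError("simulated failure before Mary's update")
            conn.execute(
                "UPDATE Person SET address = ? WHERE name = ?", (new_address, "Mary")
            )
    except RuntimeError:
        return False
    return True


def insert_rejected(conn, name, address, num_residence):
    """Return True if the database itself refuses the row (constraint violated)."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Person (name, address, num_residence) VALUES (?, ?, ?)",
                (name, address, num_residence),
            )
    except sqlite3.IntegrityError:
        return True
    return False
```

Calling move_couple(conn, "1 New Rd", fail_midway=True) leaves both addresses unchanged, because John's partial update is rolled back. The CHECK clause keeps the constraint in the schema rather than in application code, so changing it does not mean hunting through every program that writes to the table.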

Why use Database Systems?

• Declarative query languages
• Data independence
• Efficient access through optimization
• Data integrity and security
    o Safeguarding data from failures and malicious access
• Concurrent access
• Reduced application development time
• Uniform data administration

Query Languages

• Query languages: allow manipulation and retrieval of data from a database
• Queries are posed with respect to a data model
    o Operations over objects defined in the data model
• The relational model supports simple, powerful query languages (QLs):
    o Strong formal foundation based on logic
    o Allows for automatic optimization
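As a small illustration of automatic optimization, the sketch below asks SQLite for its query plan before and after an index is created. The Person table, its rows, and the index name idx_person_zip are hypothetical; EXPLAIN QUERY PLAN is SQLite-specific, and other systems expose their plans differently. The query text never changes: the system decides on its own how to evaluate it.

```python
import sqlite3

# Hypothetical table and data, echoing the "search people by zip code" task.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (last_name TEXT, zip TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [
        ("Smith", "10001", "555-0100"),
        ("Jones", "10001", "555-0101"),
        ("Lee", "84112", "555-0102"),
    ],
)

QUERY = "SELECT last_name FROM Person WHERE zip = ?"


def query_plan(conn, sql, params=()):
    """Return the human-readable steps SQLite plans to use for the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


def people_in_zip(conn, zip_code):
    """Run the declarative query; it does not say how to find the rows."""
    return sorted(row[0] for row in conn.execute(QUERY, (zip_code,)))


plan_before = query_plan(conn, QUERY, ("10001",))
result_before = people_in_zip(conn, "10001")

# Adding an index changes the physical design, not the query.
conn.execute("CREATE INDEX idx_person_zip ON Person (zip)")

plan_after = query_plan(conn, QUERY, ("10001",))
result_after = people_in_zip(conn, "10001")

print("before:", plan_before)
print("after: ", plan_after)
```

The answers are identical in both runs; only the plan differs. This is also a small instance of data independence: the application code that issues QUERY is untouched when the storage layout changes.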

SQL and Relational Algebra

• Manipulate sets of tuples
• $\sigma_{C}(R)$ = select: produces a new relation with the subset of the tuples in R that match the condition C
    o $\sigma_{\text{type} = \text{"savings"}}(\text{Account})$
    o SELECT * FROM Account WHERE Account.type = 'savings'
• $\pi_{\text{AttributeList}}(R)$ = project: deletes attributes that are not in the projection list
    o $\pi_{\text{number, owner, type}}(\text{Account})$
    o SELECT number, owner, type FROM Account
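The two operators can be sketched directly in Python and checked against the SQL from the slide. The sample Account rows and the balance attribute are invented for illustration, and the select and project functions below are a toy model of the algebra, not how a DBMS evaluates queries.

```python
import sqlite3

# Hypothetical Account relation; the rows and the balance attribute are assumptions.
ACCOUNT = [
    {"number": 101, "owner": "J. Smith", "balance": 1000.0, "type": "checking"},
    {"number": 102, "owner": "W. Wei", "balance": 2000.0, "type": "checking"},
    {"number": 103, "owner": "J. Smith", "balance": 5000.0, "type": "savings"},
    {"number": 104, "owner": "M. Jones", "balance": 1000.0, "type": "savings"},
]


def select(relation, condition):
    """Selection: keep only the tuples that satisfy the condition."""
    return [t for t in relation if condition(t)]


def project(relation, attributes):
    """Projection: keep only the listed attributes, dropping duplicate tuples."""
    seen, result = set(), []
    for t in relation:
        key = tuple(t[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


# The same relation loaded into SQLite so the SQL forms can be run as written.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (number INTEGER PRIMARY KEY, owner TEXT, balance REAL, type TEXT)"
)
conn.executemany(
    "INSERT INTO Account VALUES (:number, :owner, :balance, :type)", ACCOUNT
)

savings_algebra = select(ACCOUNT, lambda t: t["type"] == "savings")
savings_sql = conn.execute(
    "SELECT * FROM Account WHERE Account.type = 'savings'"
).fetchall()

projected_algebra = project(ACCOUNT, ["number", "owner", "type"])
projected_sql = conn.execute("SELECT number, owner, type FROM Account").fetchall()
```

One difference worth noting: the relational project operator returns a set, so duplicates disappear, while a plain SQL SELECT keeps duplicate rows unless DISTINCT is specified. The two agree here because number is unique in the sample data.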
