Column Oriented Databases Vs Row Oriented Databases



|ITK - 478 |

|Column Oriented Databases Vs Row Oriented Databases |

|Special Interest Activity |

|Submitted by: Venkat. K & Rakesh. K |

|11/19/2007 Updated: 12/03/2007 |

Table of Contents

Introduction 3

MonetDB 5

Monet Architecture 5

VOC Data set: 8

Queries in MonetDB: 11

TPC-H Benchmark 13

TPC-H DDL 14

For MonetDB 14

For Oracle 17

TPC-H Data Insertion 19

MonetDB/SQL 21

Queries 21

LucidDB 26

Main features: 26

Query Optimization and Execution 29

Data insertion 32

Advantages and Disadvantages: 34

Summary: 34

References: 35

Introduction [1][2]

Database that we find extensively is a row oriented database which stores data in rows. It has high performance for the OLTP i.e. online transaction processing. This document talks about Column oriented database that stores data by column.

After leading the team that worked on the C-Stotre, an open source column oriented database Mike Stonebraker along with 6 other have come up with the market product of the column oriented database, Vertica. Mike Stonebraker is considered as one of the major contributor to the column oriented database.

The document takes you through different column oriented databases that are in the market and the installation procedures and details of two such databases.

The following example illustrates the difference between them

Consider the Project table which has three column pjno, status, pjtitle and pjno is the primary key.

|Pjno |Status |Pjtitle |

|5555 |A |Marketing |

|6666 |C |Inventory |

|7777 |A |Order entry |

The row store implementation of the table is stored in the file as following. Here values of different attributes from the same tuple are stored consecutively.

5555,A,Marketing;6666,C,Inventory;7777,A,Order Entry;

The column store implementation of the table is stored in the file as following. Here the values of same attributes are stored consecutively. The column oriented database stores data in columns and are joined with the help of the IDs. It also has the feature of compressing the data by column wise with the help of projections. It stores the repeated data in the columns as one.

5555,6666,7777; A,C,A; Marketing, Inventory, Order entry;

The row store architecture is well suited for OLTP operations where as column store architecture is suited for OLAP operations.

The following are the advantages [1]

• While row stores are extremely "write friendly", in that adding a row of data to a table requires a simple file appending I/O, column stores perform better for complex read queries. For tables with many columns and queries that use only few of them, a column store can confine its reads to the columns required, whereas a row store must read the entire table. In addition, the storage efficiency properties of column stores can greatly reduce the number of actual disk reads required to satisfy a query.

• Column data is of uniform type. Therefore it is much easier to compress than row data, and NULL values need never be stored. Row stores cannot omit columns from any row and still achieve direct random access to a table, because random access requires that the data for each row be of fixed width. In column stores, this is trivially true because of type uniformity within a single column's storage, allowing omission of NULL values and therefore efficient storage of wide, sparsely populated tables. In practice, row stores can and do implement tables with variable-width rows, but this require either some form of indirect access or giving up random access in favor of some type of fast ordered access, e.g. B-trees (used in the architecture of lucid DB). However, both storage efficiency and code complexity of such approaches generally compare unfavorably to implementations of sparse column stores.

The same above concept is used in all of the following databases and in the databases we are going to talk about.

Open source -

• C-store

• MonetDB

• LucidDB

• Metakit

Proprietary

• BigTable

• Sybase IQ

• Xplain

• KDB

• DataProbe

We performed our special interest activity on MonetDB and LucidDB.

MonetDB [4]

MonetDB is an open source high-performance database management system developed at the National Research Institute for Mathematics and Computer Science (CWI; Centrum voor Wiskunde en Informatica) in the Netherlands. It was designed to provide high performance on complex queries against large databases, e.g. combining tables with hundreds of columns and multi-million rows. MonetDB has been successfully applied in high-performance applications for data mining, OLAP, GIS, XML Query, text and multimedia retrieval. MonetDB internal data representation is memory-based, relying on the huge memory addressing ranges of contemporary CPUs, and thus departing from traditional DBMS designs involving complex management of large data stores in limited memory.

MonetDB is one of the first database systems to focus its query optimization effort on exploiting CPU caches.

The MonetDB family consists of:

• MonetDB/SQL: the relational database solution

• MonetDB/XQuery: the XML database solution

• MonetDB Server: the multi-model database server

Monet Architecture

[pic]

1. Query language parser – reads SQL (structured query language) from the user and checks for the syntax.

2. Query rewriter – rewrites the query into some normal form.

3. Query optimizer – translates the logical description of the query into a query plan.

4. Query executor – executes the physical query and produces the result.

5. Access methods – system services to access data from the tables.

6. Buffer manager – handles caching of table data stored in the table.

7. Lock manager – system services for locking the transaction.

8. Recovery manager – when the transactions are commit it makes sures that data is persistent and erases when there is not.

The architecture emphasis that the multiple front ends can connect to the back end. We can have relational, object oriented queries as front end and have MonetDB as back-end. The intermediate language between the front end and back end is the MIL (Monet Interpreter Language).

Query execution is divided into strategic and tactical phase. The strategic optimization is done in front end while the tactical phase is done at run-time in the Monet Query Executor.

Monet uses binary table model where are all the tables consists of exactly two columns. These are called as Binary Association Tables. Each front end uses mapping rules to map logical data model as seen as the end user onto binary tables in Monet. In case of relational model relational tables are vertically fragmented, by storing each column from a relational table in a separate BAT. The right column of the BATs holds the column value and left column holds the row or object identifier.

[pic]

Consider the following tables

[pic]

A relational data model can be stored in Monet by splitting each relational table by column. Each column becomes a BAT that holds the column values in the right column (tail) and object identifier (OID) in left column (head). The relational tuples can be reconstructed by taking all tail values of the column BATs with the same OID.

The tables are decomposed into following columns

[pic]

MIL is a procedural block-structured language with standard control structures like if-then-else and while loops.

The following is the MIL translation of the SQL

[pic]

Installation of MonetDB [5]

MonetDB is an open source database. It can be downloaded from this link



The following are the tools for MonetDB

• DBVisualizer

• Squirrel

• Aqua Data Studio

• iSQL

We used the DBVisualizer for our activity. The driver settings for it are

[pic]

DBVisualizer

VOC Data set:

Exploring the wealth of functionality offered by MonetDB/SQL is best started using a toy database. For this we use the VOC database which provides a peephole view into the administrative system of an early multi-national company, the Vereenigde geoctrooieerde Oostindische Compagnie (VOC for short - The (Dutch) East Indian Company) established on March 20, 1602.

We used Oracle Server which is given access as part of the course with the following details for connection using SqlDeveloper. We have also run the queries by running the oracle in our system instead of college server using following credentials.

Connection Name: Test10G (You can give anything)

Username & Password (my computer ID and password)

Oracle is default

Connection type: Basic

[pic]

Scripts –

CREATE TABLE "voyages" (

"number" integer NOT NULL,

"number_sup" char(1) NOT NULL,

"trip" integer,

"trip_sup" char(1),

"boatname" varchar(50),

"master" varchar(50),

"tonnage" integer,

"type_of_boat" varchar(30),

"built" varchar(15),

"bought" varchar(15),

"hired" varchar(15),

"yard" char(1),

"chamber" char(1),

"departure_date" date,

"departure_harbour" varchar(30),

"cape_arrival" date,

"cape_departure" date,

"cape_call" boolean,

"arrival_date" date,

"arrival_harbour" varchar(30),

"next_voyage" integer,

"particulars" varchar(530)

);

CREATE TABLE "craftsmen" (

"number" integer NOT NULL,

"number_sup" char(1) NOT NULL,

"trip" integer,

"trip_sup" char(1),

"onboard_at_departure" integer,

"death_at_cape" integer,

"left_at_cape" integer,

"onboard_at_cape" integer,

"death_during_voyage" integer,

"onboard_at_arrival" integer

);

CREATE TABLE "impotenten" (

"number" integer NOT NULL,

"number_sup" char(1) NOT NULL,

"trip" integer,

"trip_sup" char(1),

"onboard_at_departure" integer,

"death_at_cape" integer,

"left_at_cape" integer,

"onboard_at_cape" integer,

"death_during_voyage" integer,

"onboard_at_arrival" integer

);

CREATE TABLE "invoices" (

"number" integer,

"number_sup" char(1),

"trip" integer,

"trip_sup" char(1),

"invoice" integer,

"chamber" char(1)

);

CREATE TABLE "passengers" (

"number" integer NOT NULL,

"number_sup" char(1) NOT NULL,

"trip" integer,

"trip_sup" char(1),

"onboard_at_departure" integer,

"death_at_cape" integer,

"left_at_cape" integer,

"onboard_at_cape" integer,

"death_during_voyage" integer,

"onboard_at_arrival" integer);

CREATE TABLE "seafarers" (

"number" integer NOT NULL,

"number_sup" char(1) NOT NULL,

"trip" integer,

"trip_sup" char(1),

"onboard_at_departure" integer,

"death_at_cape" integer,

"left_at_cape" integer,

"onboard_at_cape" integer,

"death_during_voyage" integer,

"onboard_at_arrival" integer

);

CREATE TABLE "soldiers" (

"number" integer NOT NULL,

"number_sup" char(1) NOT NULL,

"trip" integer,

"trip_sup" char(1),

"onboard_at_departure" integer,

"death_at_cape" integer,

"left_at_cape" integer,

"onboard_at_cape" integer,

"death_during_voyage" integer,

"onboard_at_arrival" integer

);

CREATE TABLE "total" (

"number" integer NOT NULL,

"number_sup" char(1) NOT NULL,

"trip" integer,

"trip_sup" char(1),

"onboard_at_departure" integer,

"death_at_cape" integer,

"left_at_cape" integer,

"onboard_at_cape" integer,

"death_during_voyage" integer,

"onboard_at_arrival" integer

);

ALTER TABLE "voyages" ADD PRIMARY KEY ("number", "number_sup");

ALTER TABLE "craftsmen" ADD PRIMARY KEY ("number", "number_sup");

ALTER TABLE "impotenten" ADD PRIMARY KEY ("number", "number_sup");

ALTER TABLE "passengers" ADD PRIMARY KEY ("number", "number_sup");

ALTER TABLE "seafarers" ADD PRIMARY KEY ("number", "number_sup");

ALTER TABLE "soldiers" ADD PRIMARY KEY ("number", "number_sup");

ALTER TABLE "total" ADD PRIMARY KEY ("number", "number_sup");

ALTER TABLE "craftsmen" ADD FOREIGN KEY ("number", "number_sup")

REFERENCES "voyages" ("number", "number_sup");

ALTER TABLE "impotenten" ADD FOREIGN KEY ("number", "number_sup")

REFERENCES "voyages" ("number", "number_sup");

ALTER TABLE "invoices" ADD FOREIGN KEY ("number", "number_sup")

REFERENCES "voyages" ("number", "number_sup");

ALTER TABLE "passengers" ADD FOREIGN KEY ("number", "number_sup")

REFERENCES "voyages" ("number", "number_sup");

ALTER TABLE "seafarers" ADD FOREIGN KEY ("number", "number_sup")

REFERENCES "voyages" ("number", "number_sup");

ALTER TABLE "soldiers" ADD FOREIGN KEY ("number", "number_sup")

REFERENCES "voyages" ("number", "number_sup");

ALTER TABLE "total" ADD FOREIGN KEY ("number", "number_sup")

REFERENCES "voyages" ("number", "number_sup");

Queries in MonetDB:

We performed the following ran the following queries on MonetDB, Oracle (SQL Developer) in both college and one on our system. The time given in the box is the time taken from their respective GUI clients.

The data used for the queries and schema is directly taken from the MonetDB website under VOC copy.

Note:

1. Time provided for the each query varies each time we run the query.

2. We have left out the Query plan as DBvisualizer used for the MonetDB is an trail pack it doesn’t support the query plan feature.

|Query |Execution time |

|Oracle: |.819 |

| |0.505(from the system) |

|select count(*) from “passengers”; | |

|MonetDB: |.0106 |

| | |

|select count(*) from Passengers; | |

|Query |Execution time |

|Oracle: |3.513 |

| | |

|select * from “craftsmen”; |1.007(from the system) |

|MonetDB: |0.406 |

| | |

|select * from craftsmen; | |

|Query |Execution time |

|Oracle: |4.97 |

| | |

|SELECT number from “impotenten”; |0.505(from the system) |

|MonetDB: |.093 |

| | |

|SELECT "number" from impotenten; | |

|Query |Execution time |

|Oracle: |1.0041 |

| | |

|SELECT COUNT(*) FROM "voyages" | |

|WHERE "particulars" LIKE '%_recked%'; |0.153(from the system) |

|MonetDB: |0.39 |

| | |

|SELECT COUNT(*) FROM voyages | |

|WHERE particulars LIKE '%_recked%'; | |

|Query |Execution time |

|Oracle: | |

| |0.56 |

|SELECT "chamber", CAST(AVG("invoice") AS integer) AS average | |

|FROM "invoices" | |

|WHERE "invoice" IS NOT NULL | |

|GROUP BY "chamber" |0.507(from the system) |

|ORDER BY average desc; | |

|MonetDB: |0.313 |

| | |

|SELECT chamber,CAST(AVG(invoice) AS integer) AS average | |

|FROM invoices | |

|WHERE invoice IS NOT NULL | |

|GROUP BY chamber | |

|ORDER BY average DESC; | |

| | |

|Query |Execution time |

|Oracle: |0.506 |

|SELECT voyages.number FROM voyages inner join craftsmen | |

|on voyages.number = craftsmen.number and | |

|voyages.number_sup = craftsmen.number_sup |0.504(from the system) |

|WHERE voyages.particulars LIKE '%_recked%'; | |

|MonetDB: |0.26 |

|SELECT voyages.number FROM voyages inner join craftsmen | |

|on voyages.number = craftsmen.number and | |

|voyages.number_sup = craftsmen.number_sup | |

|WHERE voyages.particulars LIKE '%_recked%'; | |

TPC-H Benchmark

The TPC Benchmark (TPC-H) is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions. It is stated that the results of the TPC-H benchmark ran by the MonetDB company is 10 times faster in Monetdb when compare to MySQL and PostgreSQL.

TPC-H SCHEMA[pic]

TPC-H DDL

Because of the difference in data types support in MonetDB and Oracle we needed to define DDL separately.

For MonetDB

create table part(

p_partkey int ,

p_name varchar(55),

p_mfgr char(25),

p_brand char(10),

p_type varchar(25),

p_size int,

p_container char(10),

p_retailprice decimal(10,5),

p_comment varchar(23));

alter table part add primary key(p_partkey);

create table supplier(

s_suppkey int ,

s_name varchar(25),

s_address varchar(100),

s_nationkey int,

s_phone varchar(20),

s_acctbal decimal(10,5),

s_comment varchar2(120));

alter table supplier add primary key(s_suppkey);

ALTER TABLE supplier ADD FOREIGN KEY (s_nationkey)

REFERENCES nation(n_nationkey );

create table partsupp(

ps_partkey int,

ps_suppkey int,

ps_availqty int,

ps_supplycost decimal(10,5),

ps_comment varchar(220));

alter table partsupp add primary key(ps_partkey, ps_suppkey);

ALTER TABLE partsupp ADD FOREIGN KEY (ps_partkey)

REFERENCES part (p_partkey );

ALTER TABLE partsupp ADD FOREIGN KEY (ps_suppkey)

REFERENCES supplier (s_suppkey );

create table customer(

c_custkey int,

c_name varchar(30),

c_address varchar(100),

c_nationkey int,

c_phone varchar(30),

c_acctbal decimal(10,5),

c_mktsegment varchar(20),

c_comment varchar(150));

alter table customer add primary key(c_custkey);

ALTER TABLE customer ADD FOREIGN KEY (c_nationkey)

REFERENCES nation(N_nationkey );

Create table nation(

n_nationkey int,

n_name varchar(50),

n_regionkey int,

n_comment varchar(200));

alter table nation add primary key(n_nationkey);

ALTER TABLE nation ADD FOREIGN KEY (n_regionkey)

REFERENCES region (r_regionkey );

Create table region(

R_regionkey int,

R_name varchar(25),

R_comment varchar(180));

alter table region add primary key(R_regionkey);

create table orders(

o_orderkey int,

o_custkey int,

o_orderstatus varchar(10),

o_totalprice decimal(10,5),

o_orderdate varchar(20),

o_orderpriority varchar(20),

o_clerk varchar(20),

o_shippriority int,

o_comment varchar(100));

alter table orders add primary key(o_orderkey);

ALTER TABLE orders ADD FOREIGN KEY (o_custkey)

REFERENCES customer (c_custkey );

Create table lineitem(

L_orderkey int,

L_partkey int,

L_suppkey int,

L_linenumber int,

L_quantity decimal(10,5),

L_extendedprice decimal(10,5),

L_discount decimal(10,5),

L_tax decimal(10,5),

L_returnflag varchar(10),

L_linestatus varchar(10),

L_shipdate varchar(20),

L_commitdate varchar(20),

L_receiptdate varchar(20),

L_shipinstruct varchar(30),

L_shipmode varchar(20),

L_comment varchar(50));

alter table lineitem add primary key(l_orderkey,l_linenumber);

ALTER TABLE lineitem ADD FOREIGN KEY (L_orderkey)

REFERENCES orders (o_orderkey );

ALTER TABLE lineitem ADD FOREIGN KEY (L_partkey)

REFERENCES part (p_partkey );

ALTER TABLE lineitem ADD FOREIGN KEY (L_partkey,l_suppkey)

REFERENCES partsupp (ps_partkey,ps_suppkey );

ALTER TABLE lineitem ADD FOREIGN KEY (L_suppkey)

REFERENCES supplier (s_suppkey );

For Oracle

create table part(

p_partkey integer ,

p_name varchar(55),

p_mfgr char(25),

p_brand char(10),

p_type varchar2(25),

p_size integer,

p_container char(10),

p_retailprice decimal(10,5),

p_comment varchar2(23));

alter table part add primary key(p_partkey);

create table supplier(

s_suppkey integer ,

s_name varchar(25),

s_address varchar(100),

s_nationkey integer,

s_phone varchar(20),

s_acctbal decimal(10,5),

s_comment varchar2(120));

alter table supplier add primary key(s_suppkey);

ALTER TABLE supplier ADD FOREIGN KEY (s_nationkey)

REFERENCES nation(n_nationkey );

create table partsupp(

ps_partkey integer,

ps_suppkey integer,

ps_availqty integer,

ps_supplycost decimal(10,5),

ps_comment varchar2(220));

alter table partsupp add primary key(ps_partkey, ps_suppkey);

ALTER TABLE partsupp ADD FOREIGN KEY (ps_partkey)

REFERENCES part (p_partkey );

ALTER TABLE partsupp ADD FOREIGN KEY (ps_suppkey)

REFERENCES supplier (s_suppkey );

create table customer(

c_custkey integer,

c_name varchar2(30),

c_address varchar2(100),

c_nationkey integer,

c_phone varchar2(30),

c_acctbal decimal(10,5),

c_mktsegment varchar2(20),

c_comment varchar2(150));

alter table customer add primary key(c_custkey);

ALTER TABLE customer ADD FOREIGN KEY (c_nationkey)

REFERENCES nation(N_nationkey );

Create table nation(

n_nationkey integer,

n_name varchar2(50),

n_regionkey integer,

n_comment varchar2(200));

alter table nation add primary key(n_nationkey);

ALTER TABLE nation ADD FOREIGN KEY (n_regionkey)

REFERENCES region (r_regionkey );

Create table region(

R_regionkey integer,

R_name varchar2(25),

R_comment varchar2(180));

alter table region add primary key(R_regionkey);

create table orders(

o_orderkey integer,

o_custkey integer,

o_orderstatus varchar2(10),

o_totalprice decimal(11,5),

o_orderdate varchar2(20),

o_orderpriority varchar2(20),

o_clerk varchar2(20),

o_shippriority integer,

o_comment varchar2(100));

alter table orders add primary key(o_orderkey);

ALTER TABLE orders ADD FOREIGN KEY (o_custkey)

REFERENCES customer (c_custkey );

Create table lineitem(

L_orderkey integer,

L_partkey integer,

L_suppkey integer,

L_linenumber integer,

L_quantity decimal(10,5),

L_extendedprice decimal(10,5),

L_discount decimal(10,5),

L_tax decimal(10,5),

L_returnflag varchar2(10),

L_linestatus varchar2(10),

L_shipdate varchar2(20),

L_commitdate varchar2(20),

L_receiptdate varchar2(20),

L_shipinstruct varchar2(30),

L_shipmode varchar2(20),

L_comment varchar2(50));

alter table lineitem add primary key(l_orderkey,l_linenumber);

ALTER TABLE lineitem ADD FOREIGN KEY (L_orderkey)

REFERENCES orders (o_orderkey );

ALTER TABLE lineitem ADD FOREIGN KEY (L_partkey)

REFERENCES part (p_partkey );

ALTER TABLE lineitem ADD FOREIGN KEY (L_partkey,l_suppkey)

REFERENCES partsupp (ps_partkey,ps_suppkey );

ALTER TABLE lineitem ADD FOREIGN KEY (L_suppkey)

REFERENCES supplier (s_suppkey );

TPC-H Data Insertion

We have downloaded the reference dataset for all the tables in the schema from tpc.prg. The reference data which we got was delimited by bars. So we had to use some program which takes each token and insert in the specified column.

We used a Java Program to insert rows into the MonetDB as the tool which we are using DbVisualzer(Trial Edition) did not allow us to import data. The following is the java program to insert data into customer table

import java.util.*;

import java.io.*;

import java.sql.*;

public class customerscanner {

public static void main(String[] args) throws FileNotFoundException {

Scanner in = new Scanner(System.in);

String insertSQL = " INSERT INTO "

+ " sys.customer "

+ " ( c_custkey,c_name,c_address, c_nationkey, c_phone, c_acctbal, " +

" c_mktsegment,c_comment)"

+" VALUES ( ";

String url = "jdbc:monetdb://localhost/demo";

Connection con;

Statement stmt;

try {

Class.forName("nl.cwi.monetdb.jdbc.MonetDriver");

} catch(java.lang.ClassNotFoundException e) {

System.err.print("ClassNotFoundException: ");

System.err.println(e.getMessage());

}

while(in.hasNext())

{

String sql = "";

String line = in.nextLine();

StringTokenizer tokenizer = new StringTokenizer(line);

String token = tokenizer.nextToken("|");

sql = insertSQL + token + ",";

token = tokenizer.nextToken("|");

sql += "'"+ token + "',";

token = tokenizer.nextToken("|");

sql += "'"+ token + "',";

token = tokenizer.nextToken("|");

sql += token + ",";

token = tokenizer.nextToken("|");

sql += "'"+ token + "',";

token = tokenizer.nextToken("|");

sql += token + ",";

token = tokenizer.nextToken("|");

sql += "'"+ token + "',";

token = tokenizer.nextToken("|");

sql += "'"+ token + "');";

System.out.println(sql);

try {

con = DriverManager.getConnection(url,"monetdb", "monetdb");

// System.out.println("Connected to the DB.");

// Create a Statement

stmt = con.createStatement();

// Execute the Statement

// System.out.println("\nExecuting statement ...");

if(stmt.executeUpdate(sql) ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download