Trio Database System Management



Table of contents:

1. Understand Trio new database system…………………...............................................................1

Trio System Architecture..........................................................................................................1

Uncertainty-Lineage Database (ULID)…….................................................................................1

Trio: The Trio Query Language……............................................................................................2

2.0 System requirements of installing Trio system.......................................................................3

3.0 Source codes/packages for Trio system……………………………………….........................................3

4.0 Procedure of Trio system installation......................................................................................4

Python……...........................................................................................................................4

Readline…...........................................................................................................................4

Ctypes……............................................................................................................................4

PostgreSQL……....................................................................................................................4

Graphviz………......................................................................................................................5

Easy_install.........................................................................................................................5

PyGreSQL............................................................................................................................5

Pylons……….........................................................................................................................5

PLY…….................................................................................................................................5

PyParsing.............................................................................................................................5

PyDot..................................................................................................................................5

Trio API................................................................................................................................5

5.0 Configuration of Trio System.…………………...............................................................................6

Windows superuser authentication to access PostgreSQL.................................................6

TrioExploerer…………………….………………...............................................................................6

TrioPlus…………………….……………...........................................................................................6

6.0 Experiment Trio DBMS Using TrioExplorer and TrioPlus…………..............................................7

TrioExplorer (Home, Schema, Script, and Help).................................................................7

TrioPlus...............................................................................................................................9

7.0 Trio API Integrated Into Other Python Scripts.........................................................................9

TrioCnx……………..................................................................................................................9

TrioCursor…………………........................................................................................................9

XTuple……………………………………………………………………..........................................................9

Alternative………………….......................................................................................................9

8.0 Trio API and Translator (Python)...........................................................................................10

9.0 Trio Query Language (TriQL) Structure..................................................................................10

Current supported for TriQL.............................................................................................10

Drop index/table...............................................................................................................11

Create Trio table/index.....................................................................................................11

TriQL language (uncertainty, Compute confidences).......................................................12

Table of TriQL contents.....................................................................................................16

10.0 Advantage and Disadvantage of using Trio DBMS ............................................................17

Advantage.........................................................................................................................17

Disadvantage....................................................................................................................17

11.0 Documenting Trio system report/bugs………….....................................................................18

12.0 Written Components………………………….……….......................................................................18

Turn-in documents............................................................................................................18

Demo and presentation....................................................................................................19

13.0 References………………..…….…………………………........................................................................19

1.0 Understanding of Trio new Database System:

Trio is a new kind of database system (DBMS), which was developed by Stanford University Lab at Dec, 2006. It is based on an extended relational model called Uncertainty Lineage Database (ULDB) [1], and also supports Trio’s query language Called TriQL [1]. This new database system technology handles structured data, uncertainty of data, and data lineage together in a fully integrated manner. In order to understand deeply and explicitly about this new Trio DBMS different than others, we are going to look at its architecture/design, ULDB data model, and TriQL.

1.1 Trio System architecture:

M. Mutsuzaki, M. Theobald, etc in [1] summarized in details of Trio-1.0 system architecture, design, and data structure handling. As seem in Fig 1-System architecture, there are four primary components comprised as Trio DBMS, including command-line client (TrioPlus), TrioExplorer, Trio API and translator (Python), and standard relational DBMS (PostgreSQL). Also standard relational DBMS (PostgreSQL) contained four essential elements – Encoded data table, Lineage table, Trio metadata, and Trio Stored Procedures. Clustering these four together becomes Uncertain Lineage Database (ULDB) [1] on handling uncertainty and lineage data during execution. Trio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts TriQL queries in addition to regular SQL, and query results may be x-tuples in the ULDB model as well as regular tuples.”

|(Fig 1) System architecture [1] |

1.2 Uncertainty-Lineage Database (ULDB)

ULDB, which called Uncertainty-Lineage Database, was mentioned in [2] as “the first database formalism to integrate uncertainty and lineage”. The solution of handling uncertainty and lineage is a primary key of developing Trio DBMS. What are uncertainty and lineage? As discovered by O. Benjelloun, A. Das Sarma , etc in [2], the definition of uncertainty is “…captured by tuples that may include several alternative possible values, with optional confidence values associated with each alternative.” And lineage is as “…associates with a data item information about its derivation.” After cleared out the concepts of ULDB, the next step focuses on ULDB data model which extended the standard SQL relational model into alternatives, maybe (‘?’) annotations, numerical confidence, and lineage [2]. In the following discussion and demo, I will use Drives (person, color, car) and Saw (witness, color, car) uncertainty tables with/no with confidence as an example. A table with blue color in TrioExplor is uncertainty and orange color is with computing confidence. The source code is listed in Learning Trio Query Language (TriQL) structure section.

1.2.1 Alternatives

In ULDB, it is consisted with x-tuple (x-relation). Each x-tuple can have one or more alternatives. As defined in [2], alternatives are presenting uncertainty about the contents of a tuple. For example in Fig 3, persons called Frank and Jimmy are x-tuples which contains one or more alternatives by using annotation ‘||’. By clicking on the ‘||’, x-tuple will be split into two alternatives for both Frank and Jimmy in Fig 4. Each alternative is a regular tuple. The reason of combining regular tuples as x-tuple is because the possible instances have same values/relations in particular fields.

|(Fig 2) uncertainty tables |(Fig 3) Drives table |(Fig 4) Drives table |

1.2.2 ‘?’ (Maybe) Annotations

‘?’ annotation present the existence of a tuple on the x-tuple, also called maybe x-tuple [2]. For example in Fig 3, there is a ‘?’ on the right side of Billy’s record. In specific by the definition, Billy may or may not sure what he saw is a blue Honda. If x-tuple has a ‘?’, all of alternatives will have ‘?’.

1.2.3 Numerical Confidences

Numerical confidence also was considered as probability [2]. In Fig 5 is a confidence table. When we view the tuples of Drivers table, there is a number attached with each x-tuple indicated the confidence as in Fig 6; Frank has 1.0, Johnny/Jimmy has 1.0, and Billy has 0.9. By looking at Fig 7, there is a detailed probability for each tuple. The total confidences of x-tuple can be less or equal to 1.0 [2]. In this example, I inserted the data along with confidences into Drivers table. Optionally, TrioExplorer will compute the confidences by probability if no confidence available.

|(Fig 5) w/confidence tables |(Fig 6) Drives table |(Fig 7) Drives table |

1.2.4 Lineage

Lineage, which as defined in [2], is “recorded at the granularity of tuple alternatives: Lineage connects as x-tuple alternative to other x-tuple alternative.” Specifically, lineage is a function which is used to derive alternatives for a set of alternatives. For an example, when the SQL query ‘select person from Drives’ was executed in TrioExplorer, the returned result is in Fig 8. The blue arrow in the picture presents the lineage. By clicking on it, we can see that Billy record was derived from an entire Billy record. By inserting this function in DBMS, the users can capture derivation of SQL query for better understanding of data relationship.

|(Fig 8) Drives table |(Fig 9) Drives table |

1.3 TriQL: The Trio Query Language

TriQL was developed specifically for querying Trio DBMS. It contains two major parts: built-in functions and predicates for querying confidence values and lineage, and regular SQL syntax [2]. As mentioned in [1], TriQL, which used to query over ULDBs, are translated automatically to SQL queries over the encoded tables. Here I will not discuss about TriQL query language and structure, but will be covered in Learning Trio Query Language (TriQL) structure section by using demo.

2.0 System Requirement of Installing Trio System

To know the system requirement before the installation is a very important step. It may the installation of Trio system much easier and compatible. The requirement mainly concentrates on type of operation system, version of database server (PostgreSQL), and Python. In [3], the web page describes the operating systems in which was experimented successfully with Trio System, such as Linux, Mac OS X, and Win-32 (XP, Vista, and 32-bit Server). In PostgreSQL as database server which supported Trio system function has various versions available in , including 8.2.5, 8.1.10, 8.0.14, and 7.4.18 [4]. As mentioned in postgreSQL website, it only supports Linux and Win32. Optimal copy for Trio system is PostgreSQL 8.1.10. The last one is Python API. It can run on windows, Linux/Unix, Mac OS X, OS/2, and Amiga [5]. In the Trio system installation section, I used Windows XP professional as one of operating systems to install Trio system.

3.0 Source Codes/packages for Trio System

In page listed 12 packages for completing the installation of Trio system. Each package has its own website which contained different versions of source codes corresponding with installation instructions. This section is to list all the resources for users to make their lives easier and install properly. (Please use listed versions of packages)

3.1 Listing source codes:

1. Python 2.4 can be downloaded from .

2. Easy_install can be downloaded from and the file called ez_setup.py.

3. Readline 1.7.win32 can be downloaded from .

4. ctypes-1.0.2.win32-py2.4 can be download from .

5. PostgreSQL 8.1 can be downloaded from .

6. Graphviz 2.14 is the only version compatible with Trio API. It is available in .

7. PyGreSQL can be downloaded from .

8. Pylons 0.9.5 can be downloaded from .

9. PLY 2.2 can be downloaded from .

10. PyParsing can be downloaded from .

11. PyDot can be downloaded from .

12. Trio API 1.0 can be downloaded from .

4.0 Procedure of Trio system installation

Since Trio is deployed on top of the above packages, following order below in installing them one by one will be much easier. Some packages are depending among the others packages. For convenience, most of the required Python modules (Pylons, PyGreSQL, etc.) can be installed via ''Easy Install'' for Windows platforms. Most of the procedures below were mainly from [3], but modified as needed.

1. Python

o Download Python 2.4 windows version (python-2.4.4.msi).

o Install Python in C:/Python directory

o Set path=c:/Python24; in environment variables

2. Readline - download Readline-1.7.win32-py2.4.exe and install into Python directory

3. Ctypes – download ctypes-1.0.2.win32-py2.4.exe and install into Python directory

4. PostgreSQL

o Download PostgreSQL 8.1 windows version (postgresql-8.1.msi )

o Install PostgreSQL 8.1 as following:

▪ Language selection (Fig 10) – English

▪ Introduction screen (Fig 11) – next

▪ Welcome message and instructions (Fig 12) – next

▪ Feature selection (Fig 13) – next

▪ Service installation (Fig 14) – check install a service, input account name ‘postgres’ and password.

▪ Initdb (Fig 15) – check initialize database cluster, super username and password.

▪ Procedural languages (Fig 16) – Check PL/pgsql only.

▪ Contrib modules (Fig 17) – check Admin81 only.

▪ Next in (Fig 18, 19, 20, 21) to complete the installation.

o Set path C:\Program Files\PostgreSQL\8.1\bin; after completed the installation.

|[pic] |[pic] |[pic] |

|(Fig 10) [4] |(Fig 11) [4] |(Fig 12) [4] |

|[pic] |[pic] |[pic] |

|(Fig 13) [4] |(Fig 14) [4] |(Fig 15) [4] |

|[pic] |[pic] |[pic] |

|(Fig 16) [4] |(Fig 17) [4] |(Fig 18) [4] |

|[pic] |[pic] |[pic] |

|(Fig 19) [4] |(Fig 20) [4] |(Fig 21) [4] |

5. Graphviz - Download Graphviz 2.14 version and install to your workstation and set path C:\PROGRA~1\ATT\Graphviz\bin; in environment variables after completed the installation

6. Easy_install - Download ez_setup.py in C:/ directory

7. PyGreSQL – In command line, cd\ to c: directory, and run python ez_setup.py PyGreSQL to install components.

8. Pylons – In command line, cd\ to c: directory, and run python ez_setup.py Pylons==0.9.5 to install Pylons. Set path c:\python24\Scripts in environment variables.

9. PLY – In command line, cd\ to c: directory, and run python ez_setup.py Ply==2.2 or easy_install Ply==2.2.

10. PyParsing – In command line, cd\ to c: directory, and run python ez_setup.py PyParsing.

11. PyDot – download the source from website. Access to folder in command line, and then install manually by running ‘python setup.py install’.

12. Trio API

o Download source code in any directory

o Copy Trio-1.0\spi\triospi_win32.dll to PostgreSQL’s lib directory and renamed as triospi.dll

5.0 Configuration of Trio System

Windows superuser authentication to access PostgreSQL

2 Double click in start->all programs->PostgreSQL 8.1->pgadmin III

3 Right click on Login Roles to create new login role (see in Fig 22)

4 Role name ‘myname’ (as same as windows login account)

5 Set password (Password can be any)

6 Check all role privileges and click ok.

7 Right click on Database to create new database (see in Fig 23)

8 Database name ‘myname’ (as same as username)

9 Owner is ‘myname’ and click ok

10 Initialize Trio schema information (see in Fig 24)

11 In Trio-1.0\setup, open setup.py with notepad to comment out the last three codes and put the following. (After complete the initialization, please change back to original).

os.system("psql %s %s < setup.sql" % (pgdbname, username))

os.system("psql %s %s < setup_triospi.sql" % (pgdbname, username))

os.system("psql %s %s < trio_get_conf.sql" % (pgdbname, username))

16 Save the file, and at the command line, cd \Trio-1.0\setup, and run ‘python setup.py myname myname’ in command line.

17 Provide password to create schema and done.

|(Fig 22) |(Fig 23) |(Fig 24) |

TrioExplorer [3]

21 Make sure PostgreSQL is working.

22 Running TrioExplorer – Ensure path ‘c:\python24\Scripts;’ in environment variables. And double click ‘start_te_server.bat’ under Trio-1.0\explorer.

23 At the command line, you are now prompted for an admin user login to PostgreSQL, which should have been created along with your PostgreSQL installation and which will be used by TrioExplorer to create new user roles and database instances.

24 TrioExplorer should now be reachable from your browser using . For new users can now press ‘Create a new user’ and create their own Trio login and database instances, which are then managed by the PostgreSQL server.

TrioPlus [3] – Trioplus is to use to prepare for a new user role and database instances be installed manually which should be different from the default PostgreSQL instance, and which requires logging in onto the PostgreSQL server with admin privileges.

27 Create new PostgreSQL user role and database instance

28 Run ‘createuser demo’

29 Run ‘createdb demo’, the name must be the same as username

30 Initialize Trio schema information for new user by access as same as windows superuser authentication to access PostgreSQL. Use TrioExplorer will be easily just press ‘Create new role’ in Web.

31 Connect to new Trio database using the command line clients by running ‘python trioplus.py –u demo –d demo –p’

6.0 Experiment Trio DBMS Using TrioExplorer And TrioPlus

|(Fig 25) |6.1 TrioExplorer: |

| | |

| |Fig 25 described the interface of TrioExploer in home |

| |tab after login in using login username and password. On|

| |the left side of the web page listed all symbols which |

| |used to show the types of tables/records. In the body of|

| |the page, a text-field area uses to execute TriQL. After|

| |press ‘execute’, the information will show up under with|

| |correct result. |

|(Fig 26) |Fig 26 is the schema tab in TrioExplorer. After created |

| |the tables of database, the diagram described the |

| |relations of tables, type of tables, and attributes. On |

| |the left side also listed the symbols and meanings of |

| |the diagram. |

|(Fig 27) |Fig 27 is a sample tab. It uses to load the sample |

| |script which was provided by the application for testing|

| |purpose. After load the sample script data, the |

| |application will automatically execute the queries in |

| |DBMS. Also a copy of sample script will list on the body|

| |of the page. |

|(Fig 28) |Fig 28 is a script tab in TrioExplorer which uses to |

| |load a script from a file in your workstation to the |

| |application to be executed. After loaded successfully, |

| |you can click on the script to put a single line of |

| |script to the text-field to be executed in DBMS. |

|(Fig 29) |Fig 29 is a help tab which helps users for common |

| |problems such as how to use all tabs, how to create |

| |database, and more. |

|(Fig 30) |6.2 TrioPlus |

| | |

| |As seem in Fig 30, by accessing to Trio-1.0/ directory |

| |at the command line, just type ‘python trioplus.py –u |

| |myname –d myname –p’ to connect to Postgres server. |

| |Please aware that myname account has to be created in |

| |Trio DBMS first. After established the connection with |

| |postgreSQL, you can execute any TriQL query statement in|

| |‘>’ as same as in the text-field of TrioExplorer. |

7.0 Trio API Integrated Into Other Python Scripts

Trio API contains many methods in the classes Triodb.py and xtyple.py under Trio-1.0 directory for integrating into other Python scripts. By importing both classes to Python scripts, it provides additional capability to handle tuple alternatives, computation of confidence, and lineage. As seem closely in the classes, the Trio API methods were designed to work very similar as Python DB-API interface [3]. There are several Trio API methods described detailed below was from [3].

7.1 TrioCnx

1. TrioCnx(pgdb) – This constructor method creates a new Trio connection from a given PyGreSQL connection pgdb, the default Python DB-API.

2. cursor() – return a new TrioCursor object for the current connection.

3. commint() – commits the current transaction.

4. rollback() – performs a rollback for the current transaction.

5. close() – closes the Trio connection (and the underlying pgdb connection)

7.2 TrioCursor

1. execute(triql) - Executes a TriQL statement triql for this cursor object.

2. fetchone() - Fetches a single XTuple object from the current cursor position.

3. fetchall() - Fetches and returns a list of all XTuple objects beginning from the current cursor position.

7.3 XTuple

1. len() - Returns the number of Alternative objects contained in this XTuple object.

2. getAlternative(idx) - Returns the Alternative object at the designated index idx.

3. getConfidence() - Returns the confidence value (if any) of this XTuple object as the sum of its Alternative objects' confidence values.

4. getQuestionMark() - Returns whether this XTuple object has a question mark or not.

7.4 Alternative

1. getLineage() – returns a list of immediate lineage information as (source-table, source-aid) pairs of this alternative.

2. traceLineage() – performs a transitive lineage traversal for this alternative back to the base data

3. getConfidence() - Returns the confidence value (if any) of this alternative.

4. computeConfidence() - Computes the confidence value of this alternative based on the traceLineage() function.

8.0 Trio API And Translator (Python)

Trio API and translator (Python) is one of important components in Trio system. Every Trio query language (TriQL), entered from either TrioExplorer or TrioPlus by users, will be translated into encoded data table, Lineage table, Trio metadata, and/or Trio Stored Procedures of ULDB, and be executed in relational database. Node.py in Trio-1.0 directory and lexer.py in Trio-1.0\trioparser directory contain classes, methods, TriQL keywords initiation for supporting and validating TriQL queries through Trio API.

| | |

|TriQL built-in functions |TriQL keywords |

| Aggregate(), Alternative(), As(), Avg(), BinaryExpression(), |'all', 'and', 'any', 'as', 'avg', 'by', 'cascade', ‘compute', 'conf',|

|Brackets(), Cascade(), ColumnList(), Column(), Command(), |'confidences', 'count', 'create', 'distinct', 'drop', 'eavg', |

|ComputeConfidences(), Conf(), ConfUniform(), |'ecount', 'emax', 'emin', 'esum', 'except', 'exists', ‘flatten', |

|ConfScaled(), Count(), CreateIndex(), |'float', ‘float4', 'float8', 'from', 'group', 'groupalts', |

|CreateTableAs(), CreateTable(), CreateTempTableAs(), |'having', |

|CreateTempTable(), DataType(), Dot(), DropIndex(), DropTableList(), |'in', 'index', 'insert', 'int', 'int4', 'int8', 'intersect', |

|DropTable(), Eavg(), Ecount(), EcountStar(), Emax(), |'into', 'is', 'like', 'lineage', #'lavg', #'lcount', #'lmax', |

|Emin(), Empty(), Esum(), FromClause(), GroupAlts(), GroupbyClause(), |#'lmin', #'lsum', 'max', 'maybe', 'min', 'noconf', 'nolineage', |

|GroupByKey(), HavingClause(), HorizontalSelect(), IdentifierList(), |'nomaybe', 'not', 'null', 'on', 'or', 'order', 'scaled', |

|InsertList(), Insert(), InsertSpec(), Lineage(), Literal(), Lsum(), |'scaledbyexp', 'select', 'sum', 'table', 'temporary', 'trio', |

|Max(), Maybe(), Maybe(), Min(), OrderBy(), Question(), |'trio_aid', 'trio_xid', 'uncertain', 'uniform', 'union', 'merged', |

|SelectClause(), Select(), SelectOptions(), SetOperator(), Star(), |'values', 'varchar', 'view', 'where', 'with' |

|Sum(), UnaryExpression(), UncertainSet(), UniformTable(), | |

|ViewTable(), WhereClause(), WithConfidences() | |

9.0 Trio Query Language (TriQL) Structure

As mentioned in [3], “Trio is based on the ULDB model, an extension to the relational model that adds both uncertainty (U) and lineage (L) as first-class concepts.” Therefore, Trio Query Lanaguage (TriQL) is designed for querying and updating ULDB database using Encoded data table, Lineage table, Trio metadata, and Trio Stored Procedures. Also discussed earily, ULDB is extended over the standard SQL. So TriQL contains not only built-in functions to handle uncertainty, query/compute confidence values, derive data lineage, but also regular SQL queries.

9.1 Current supported for TriQL

| | |

|Supported DDL and Insert Commands |Supported Subset of TriQL Query Language |

|Create Table |Select-Project-Join queries |

|Create Trio Table |Create Table T as |

|Create Index |Select Distinct |

|Drop Table - |Union, Intersect, or Except of two subqueries (duplicate-eliminating) |

|Including “Cascade” option to drop derived tables |Union All of two subqueries |

|Drop Index |Merged, Flatten, and GroupAlts |

|Insert Into T Values |Horizontal subqueries in Where clause |

|With or without alternatives, confidence values, and ? |Including shortcuts and aggregation/group-by/having |

|T must be a base table |Horizontal subqueries in Select clause |

|Insert Into T |Including shortcuts and aggregation but not group-by/having |

|T must be a base table |Conf() function with any number of tables or Conf(*) |

| |except for Conf(*) in the SELECT clause in conjunction with DISTINCT or |

| |MERGED |

| |Maybe() predicate |

| |“Uniform ” in From clause |

| |“as conf” for query-defined result confidence values |

| |Including “Uniform as conf” and “Scaled as conf” |

| |“Compute Confidences” at end of query |

| |Lineage() predicate |

| |Not in horizontal subqueries |

| |“=⇒” abbreviation allowed |

| |NoLineage, NoConf, and NoMaybe |

Based on the sample of TriQL in , I created the TriQL tables/indexes and inserted the uncertainty data and confidences for demo.

9.2 Drop index/table – TriQL can use ‘drop index/table’ statemetn to delete the table as same as the standard SQL statement over the DBMS.

drop index DRIVES_INDEX;

drop table Drives;

drop index SAW_INDEX;

drop table Saw;

9.3 Create Trio table/index – To create TriQL table for uncertainty, the word ‘Trio’ must involve in the create statement ‘create trio…’ as below. Also a ‘uncertain()’ funciton in the same statement helps to define which field inside the table will be considerred as uncertainty. If the tail of the statement has ‘with confidences’, it means the table can be insertted with confidences and compute confidences during the query execution. ‘[…|…]’ means x-tuple. ‘?’ as mentioned above is alternative. ‘:number’ is the confidence.

---------------------------------------------------------------------

-- Creating tables and indexes, inserting data

---------------------------------------------------------------------

create trio table Saw(witness varchar(32), color varchar(32), car varchar(32), uncertain(color, car));

create index SAW_INDEX on Saw(color, car);

insert into Saw values [('Amy','blue','Honda') | ('Amy','red','Toyota')] ?;

insert into Saw values [('Betty','green','Mazda') | ('Betty','green','Toyota') | ('Betty','green',NULL)];

insert into Saw values ('Cathy','red','Acura') ?;

insert into Saw values [('Diane','red','Toyota') | ('Diane','blue','Toyota')];

create trio table Drives(person varchar(32), color varchar(32), car varchar(32), uncertain(person, color, car));

create index DRIVES_INDEX on Drives(person,color, car);

insert into Drives values [('Frank','red','Toyota') | ('Frank','blue','Toyota')];

insert into Drives values ('Billy','blue','Honda') ?;

insert into Drives values [('Jimmy','green','Mazda') | ('Johnny','green','Mazda')];

---------------------------------------------------------------------------------------------------------------

- With confidence value

---------------------------------------------------------------------------------------------------------------

create trio table Saw(witness varchar(32), color varchar(32), car varchar(32), uncertain(color, car)) with confidences;

create index SAW_INDEX on Saw(color, car);

insert into Saw values [('Amy','blue','Honda'):0.4 | ('Amy','red','Toyota'):0.3];

insert into Saw values [('Betty','green','Mazda'):0.5 | ('Betty','green','Toyota'):0.2 | ('Betty','green',NULL):0.3];

insert into Saw values [('Cathy','red','Acura'):0.6];

insert into Saw values [('Diane','red','Toyota'):0.2 | ('Diane','blue','Toyota'):0.8];

create trio table Drives(person varchar(32), color varchar(32), car varchar(32), uncertain(person, color, car)) with confidences;

create index DRIVES_INDEX on Drives(person,color, car);

insert into Drives values [('Frank','red','Toyota'):0.7 | ('Frank','blue','Toyota'):0.3];

insert into Drives values [('Billy','blue','Honda'):0.9];

insert into Drives values [('Jimmy','green','Mazda'):0.4 | ('Johnny','green','Mazda'):0.6];

9.4 TriQL language

|(Fig 31) |9.4.1 uncertainty |

| | |

| |To create Saw (witness, color, car) and Drives (person, color, car) in the |

| |DBMS without confidences, basically copy the first part of above script |

| |into the text-field area under home tab and press ‘execute’. The |

| |application will execute and create two tables, Drives and Saw as seem in |

| |Fig 31. All the data was successfully inserted into the table. The green |

| |color of the table means uncertainty. |

|(Fig 32) |Selection - In Fig 32 shows the results which selectted all reconrds from |

| |Saw table if its car is toyato while executing the TriQL ‘select * from Saw|

| |where car = ‘Toyato’. In the table, the blue arrow on the left side |

| |presents the lineage, and right side with ? presents the alternative. The |

| |3D in the last recod contains two tuples (x-tuple). |

|(Fig 33) |Projection – Fig 33 shows the results of executing the query ‘Select color |

| |From Saw Where car = 'Mazda' or car = 'Toyota'. this query will find the |

| |colors of sighted Mazdas and Toyotas. |

|(Fig 34) |Projection (Merged) – Fig 34 shows the results of executing the query |

| |‘Select Merged color From Saw Where car = 'Mazda' or car = 'Toyota'. This |

| |query will find the colors of sighted Mazdas and Toyotas. Merged in query |

| |statement will eliminate the duplicated tuples in horizontal. |

|(Fig 35) |Join – Fig 35 shows the results of executing the query ‘Select S.witness, |

| |D.person as suspect, D.color, D.car |

| |From Drives D, Saw S |

| |Where D.color = S.color and D.car = S.car’. the query joins Saw and Drives |

| |table based on the same color. Using merged fuction in query statement as |

| |‘Select Merged D.person as suspect, D.color |

| |From Drives D, Saw S |

| |Where D.color = S.color’ will help to eliminate the duplicated tuples. |

|(Fig 36) |Duplicate-Elimination (Distinct) – Fig 36 shows the results of executing |

| |query ‘Select Distinct color, car from Saw’. The query scan the entire Saw |

| |and return color and car for each tuple. By using distince funciton will |

| |eleminiate any vertical duplicated tuples. |

|(Fig 37) |Flatten – Fig 37 shows the results of executing query ‘Select Flatten * |

| |From Saw’. Flatten is used to turn tuples with alternative values into |

| |regular tuples. The query converts table Saw into a list of sightings [3]. |

|(Fig 38) |GroupAlts - Fig 38 shows the reuslts of executing the TriQL ‘Select |

| |GroupAlts(color) * From Saw’. The query takes a list of alternatives, and |

| |reorganizes the Saw table so the data is ‘keyed’ on color and uncertainty |

| |about car and witness [3]. The GroupAlts funciton is used to create or |

| |restructure alternative values. |

|(Fig 39) |Horizontal Subqueries: The [ ] in the Where Clause – Fig 39 Shows the |

| |results of executing the query ‘Select * From Saw |

| |Where 2 0.3’. The conf() takes a single table name |

| |or variable from the ‘From’ clause as a parameter, and returns the |

| |confidence of the current alternative being evaluated from that table. The |

| |query join Drives and Saw tables to generate supects if confidence > 0.3. |

| | |

| |Conf() for Multi-table: ‘Select S.witness, D.person as suspect, D.color, |

| |D.car |

| |From Drives D, Saw S |

| |Where D.color = S.color and D.car = S.car |

| |And Conf(S) > 0.3 And Conf(*) > 0.2’ condition for result alternatives with|

| |confidence > 0.2. |

|(Fig 43) |Built-in function Maybe() – Fig 43 shows the results of executing query |

| |‘Select S.witness, D.person as suspect, D.color, D.car From Drives D, Saw |

| |SWhere D.color = S.color and D.car = S.car And (Not Maybe(D)) And (Not |

| |Maybe(S))’. Maybe() takes a table name or variable from the ‘From’ clause |

| |as a parameter, and returns true if and only if the current alternative |

| |being evaluated from that table comes from a tuple with a ‘?’. |

|(Fig 44) |Uniform result confidence – Fig 44 shows the results of executing the query|

| |‘Select *, uniform as conf From Saw’. The uniform used with ‘as conf’ |

| |assigns confidence values to a tuple as, if the tuple has n alternatives |

| |and no ‘?’, assign confidence 1/n to each alternative, or if the tuple has |

| |n alternatives and a ‘?’, assign confidence 1/(n+1) to each alternative. So|

| |the query computes and returns the confidences. |

|(Fig 45) |Scaled result confidence – Fig 45 shows the results of executing the query |

| |‘Select *, scaled as conf From Saw’. scaled used with ‘as conf’ assigns |

| |confidence values to a tuple, If the tuple has no confidence values and n |

| |alternatives, assign confidence 1/n to each alternative, or If the tuple |

| |has confidence values that sum to s, assign confidence value c/s to each |

| |alternative, where c is the existing confidence value for that alternative.|

| |So the query computes and returns the confidences. |

9.5 Table of all possible TriQL contents – in [3] about the TriQL Query Language (website ), it listed all built-in functions for querying ULDB. For convinient, I collected all and put them in a table below.

|ULDBs |Uncertian attibutes, maybe annotations and confidence values |

|SQL over ULDBs |Selection, projection, join, subqueries, duplicate-elimination, |

| |grouping and aggregation, aggregate variants, set operators, order by |

|Flatten and GroupAlts |Flatten is used to turn tuples with alternative values into regular |

| |tuples, while GroupAlts is used to create or restructure alternative |

| |values |

|Horizontal subqeries: The [ ] construct |[ ] in the where clause, [ ] with joins, Syntactics shortcuts in [ ], |

| |[ ] in the select clause, [ ] with Self-Joins |

|Builit-in Functions Conf() and Maybe() |Multi-table conf() |

|Result confidences |Result confidence evaluation, uniform and scaled result confidences, |

| |On-Demand confidence computation |

|Built-in Predicate Lineage() |The Lineage() predicate lets queries filter joined tuples based on |

| |whether they are related via lineage |

|Options Nolineage, Noconf, and NoMaybe |Indicate lineage, confidence values, and/or?'s should be omitted from |

| |query results |

|Data modificaiton |Insert statement, delete statement, update statement |

10.0 Advantage and Disadvantage of using Trio DBMS

10.1 Advantage

10.1.1 Open source and free support for any non-benefit users to experience new Trio DBMS

Trio DBMS was created by Stanford University at Dec, 2006 and updated very often with new plug-in and Trio version. If uncounted any problems, the supporter in Stanford University, like Martin, Theobald, will provide solutions. Also there are many resources available for users to better understand about Trio DMBS in term of design/architecture, components, query structure, and more.

10.1.2 Advanced components in relational DBMS

Current relational DBMSs, like Oracle, SQL, DB2, and more, can handle well-structure data easily, but not any uncertainty data and lineage. Hence, the Trio DBMS is more advanced. In Stanford Trio project, the ULDB was successfully built on top of postgreSQL. I believe later, it can be implemented on any relational DBMS as one of plug-in.

10.1.3 Computing confidences

Computing Confidences is one of function in Trio System. It is very important for calculating the probability of each tuple in total. From the marketing perspective, the probability gives the better view of users to make decision, data analysis, and statistics.

10.1.4 Efficient, Convenient, safe, Multi-User storage of and access to, Massive, Persistent data [6]

Persistent and convenient is because all data stored in simple tables (“relations”) and queries and updates via simple but powerful declarative language (SQL). In data transactions, it is safer, and also allows multi-user access. The DBMS can storage and indexing structures which makes it massive and efficient.

10.2 Disadvantage

10.2.1 Time cost for query

In Trio DBMS query structure, the time costing to query uncertainty, lineage, and confidences, or regular SQL is more than regular relational DBMS because Trio needs to create super set of tuples which requires several extra steps. As below example, ULDB takes four steps and relational database takes only two steps. Therefore, relational database query is much faster than ULDB.

Using ‘SELECT attr-list FROM X1, X2, ..., Xn WHERE predicate’ as a query example in [6] for a comparison between relational database and ULDB.

Over standard relational database:

For each tuple in cross-product of X1, X2, ..., Xn

1. Evaluate the predicate

2. If true, project attr-list to create result tuple

Over ULDB:

For each tuple in cross-product of X1, X2, ..., Xn

1. Create “super tuple” T from all combinations of alternatives

2. Evaluate predicate on each alternative in T ; keep only the true ones

3. Project attr-list on each alternative to create result tuple

4. Details: ‘?’, lineage, confidences

10.2.2 Dependency

Trio DBMS is not a completed DBMS but depending on any relational DBMS, like postgreSQL. The TrioExplorer was built in using python and other packages. Later some packages was updated to new version, for instance, graphviz 2.16, it is not compatible with TrioExplorer which will cause problems. Therefore, in order to keep Trio DBMS running, it requires efforts to maintain, reconfigure, and update.

10.2.3 On development stage - Trio project has not been developed completely with all functionalities, such as TrioExplorer for DB admonition, DB security, and more.

11.0 Documenting Trio System report/bugs

During the experiment of Stanford Trio project, I went through the Trio instalation, TrioExplorer, TrioPlus and TriQL. There are several places with mistakes/not correct/imcomplete.

1. The install instruction in website , indicated unclearly the version of Graphviz for Trio system. In Graphviz website only has version 2.16 but not compatable except version 2.14. Graphviz version 2.14 is available for download in .

2. The windows authentication supperuser needs to be created first in the PostgreSQL in order to connect to database. After established the connection, TrioExplorer and TrioPlus can use the supperuser’s login and password as windows authentiction to access to database system. However, it doesn’t mention at all in the installation procedure on how to create this typle of new user. The only way to solve it is to use PostgreSQL->pgadmin III manually.

3. After created the supperuser, I have to modify some codes in setup.py in Trio-1.0->setup directory in order to run ‘python setup.py –u myname –d myname –p’.

4. TriQL query statements in , there are many samples queries not working properly as desired.

12.0 Written Components

12.1 Turn-in Documents

Turn in Special interesting activity task plan summary, Activity deliverables, Standard, minimal documentation of hands-on work, and Electronic copy of activity.

12.2 Demo and Presentation

12.2.1 Presentation of Trio database system architecture

Brief description of Trio database system architecture, ULDB data model, Python API, TriQL query structure, features, installation step-by-step instruction, Python Sever Installation step-by-step instruction, and more

12.2.2 Demo

Launch Python server, PostgreSQL, TrioExplorer, TrioPlus, TriQL (Create tables, indexes, insert, select, update, more query statement).

12.3 Reflective written component

After spent some many hourses experimenting Trio project, I learned about python (server, coding, and executing programs), PostgreSQL (how different than other DBMS, installation, etc), Trio (installation, architecture, TriQL, and ULDB, TrioExplorer, TrioPlus), and more. during the experiment, I found out several things are very important to complete my goal such as support, resources, environment, and scale. The support means where to get help if encouted problems. Resources can provide a bais picture of the entire system, so I know what I am working on. Environment is where to experienment the project and what the operating system is, what the database server is, and more. The last one is the scale. To make sure to complete the project on time, I have to know how big the project is in order to manage my time. I felt that this is a greate experince to establish skill and ability in term of trouble shooting, problem solving, and adopting new technology.

13.0 References

1] M. Mutsuzaki, M. Theobald, A. de Keijzer, J. Widom, P. Agrawal, O. Benjelloun, A. Das Sarma, R. Murthy, and T. Sugihara. Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS. Proceedings of the Third Biennial Conference on Innovative Data Systems Research (CIDR '07), Pacific Grove, California, January 2007. Demonstration description.

2] O. Benjelloun, A. Das Sarma, C. Hayworth, and J. Widom. An Introduction to ULDBs and the Trio System. IEEE Data Engineering Bulletin, Special Issue on Probabilistic Databases, 29(1):5-16, March 2006.

3] Trio: A System for integrated Management of Data, Uncertainty, and Lineage. Retrieved on November, 18, 2007 from .

4] PostgreSQL. Retrieved on November, 20, 2007 from .

5] Python. Retrieved on November, 15, 2007 from .

6] Trio: A System for Data, Uncertainty, and Lineage. given by Jennifer at various venues, 2006-07. Ppt.

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download