California State University, Sacramento



Homework#3 (h3) CSC204 Prof MitchellPosted: March 14, 2021 Revised!, March 15: Extra credit due date Thursday April 1, 11:59PM Regular due date Friday April 2, 11:59PMRevised!, March 21: Original Q#3 moved to Q#4 and (new) Q#3 is a first tp exerciseIntroductionh3 is in some ways a “transition” from foundations: schema-based DBs, physical (table) data representation, AMs. And the most important component of the SQL engine, a good qo.The next part of the course covers a start on advanced aspects of SQL AND fundamentals of tp.Q#0 the usual HandIn as with all home works this semesterQ#1 The SQL language has evolved from its first published standard in 1986. Many very useful and advanced such features have come since 1999 and yet a substantial number of professional DB developers either do not know about them, or if they do know, they do not take advantage of them.The rest of this course will cover a modest number of advanced SQL capabilities, ones that have more likely been in the SQL standard for some time, thus with more of a chance of being adapted in real appsOur first work with advanced SQL is looking at places in a SQL statement that permit anonymous (un-named) nested queries. Later in the semester, nesting is extended by ways in which a nested query can be named and the referenced in other places within one larger query – this permits a query result to act like a named temporary table during execution of such multi-nested queries.This question calculates some index physical storage structure info using:* some pre-calculated numeric constants* a SQL query containing multiple nested queries in the SELECT args* bind variable values (3 weeks from now we will go into detail on the importance of bind variables as an interface between SQL and external languages such as Python/Java/C#/JavaScript or even SQL*Plus.)The goal of this question is to compare the (index leaf block count) in the DD for a specific SH table to (a calculation of that same index block count using only ONE SQL query and several pre-computed bind variables).The specific 3GL used here for bind variables is an Oracle built-in 3GL named PL/SQL. Using PL/SQL eliminates the need to spend time on SQL < -- > 3GL connectivity or related 3GL-specific setups needed for 3GL < - > SQL communications.Step 1 - The index leaf block count obtained using the DD index info is easy:Retrieve the value in the column named LEAF_BLOCKS from all_indexes for the index on the “normal” type PK of table SH.CUSTOMERS.The remaining steps detail how to use some PL/SQL bind variables as terms in ONE SQL nested query calculation of the same leaf index block count as in Step 1. We have already covered enough detail on the Oracle normal PK index structure to accomplish this calculation.PL/SQL is automatically available in the kinds of Oracle sessions done so far. It is only necessary to become familiar with how PL/SQL communicates with Oracle SQL. One approach is to use a so-called anonymous block of PL/SQL statements, “anonymous” because such a block of code is not known to the DD – it exists only as s source file of PL/SQL statements of the form shown below (with annotations).Only new statements are annotated in Blue in steps below, not SQL/SQL*Plus statements already familiar to you.Implement all of Step 2 through Step 6 below in one source file named xxx.plsql (a .plsql file name is an Oracle convention name for a source text file containing a combination of PL/SQL, SQL, and SQL*Plus statements)Note - Each PL/SQL statement is terminated with “;”, but not keywords such as declare, begin (or exception covered later)Step 2 – code all statements below stopping just before the begin set echo onset serveroutput on< -- Enable output to user display (strangely, by default, it is OFF)spool h3q1s21.ans/* h3q1s21.sql */ < -- Familiar SQL*Plus comment format, simply to echo this script’s filename to spool/* Next, PL/SQL declare section of variable declarations with and without initialized values */declare< -- When bind variables are used, they must be declared in a “declare” sectionleafBC number;custRC number := 55500; < -- type number is INHERITED from (i.e. same as) Oracle native data type numberindexBlockSize number;Note that the assignment operator is “:=”blockHeaderSize number := 113+(2*24); < -- Published Oracle size (bytes) of the fixed part of a block headeravailIndexBlockSpace number;Step 3 – Code the statements from the begin until just before Step 4This part of the code calculates two bind variables needed in a later stepbegin < -- Begin statement starts the body of executable statementsselect value into indexBlockSize < -- Query DD view v$parameter to get the fixed DB block sizefrom v$parameter Note – must store query results into >=1 bind variable via select xxx into someBindVariable …where name = 'db_block_size';YOU define HERE a bind variable whose value is 90% of difference of blockHeaderSize from indexBlockSize, and name this variable availIndexBlockSpacedbms_output.put_line( 'Available index block space = ' || availIndexBlockSpace ); < -- Display/verify the value32762841772800PRELIMINARIES for next step -You must develop ONE SQL statement (refer to as S below) that stores into a bind variable (here named leafBC, but name it anything) the final index leaf block count. An Oracle query can include the above declared an initialized bind variables. In addition SQL SELECT args are not restricted to table or view column names. In fact, a SELECT arg can be an expression containing >= 1 nested query.Oracle documents an expression for a normal Oracle PK leaf index block entry size (in bytes) =2 + (rowid size) + average(PK size) , where2 is an internal constant number of overhead bytes per entry, and rowid and PK sizes are obtained by querying the DUAL table and sh.customers table using the Oracle built-in function vsize Step 4 - Combining all above preliminariesCode below ONE SQL query (refer to it here as “Q”) that stored in bind variable leafBC the leaf index block count for the PK on table sh.customers. Subexpressions Q must include:Previously computed availIndexBlockSpace valueThe constant value custRC, from the declare sectionTwo nested queries calculating of a) a rowid size and b) the average columnsize of PK values in sh.customers522568357114 dbms_output built-in package method put_line has string and variable args“||” is concatenation operator00 dbms_output built-in package method put_line has string and variable args“||” is concatenation operator (the nested queries can use the Oracle built-in function vsize to get a) and b) above292798517970500Step 5 – code a simple last summary display usingdbms_output.put_line('Number of index leaf blocks in sh.customers PK index = ' || leafBC);Step 6 – Code a PL/SQL end statement, and the following 2 lines that compile the PL/SQL source, and report errorsend;/< -- Always terminate each PL/SQL block with these short directives on 2 separate linesshow errors Doing so will a) translate the source script/file and b) display translation errors, if any(as long as serveroutput is ON)spool offset serveroutput offset echo off You should get a results leafBC value that is quite close to the Step 1 results. With Steps 2-5 you are using SQL (rather than by hand) to calculate the same quantities that Oracle used to produce its results in Step 1 More Hints on Step 4.==================Step 4 is where Steps 2 and 3 must come togetherSpecifically, understand1) the Oracle DUAL table is primarily used to see how an Oracle SQL expression is evaluatedThe simple demo is:SQL> select 2 + (39/5) from dual; 2+(39/5)---------- 9.8You need to combine dual use with SQL SELECT statement knowledge:SELECT arguments can include >= stand-alone queries, such as:SQL> select 3 + (select max(salary) from company.employee)/4 from dual;3+(SELECTMAX(SALARY)FROMCOMPANY.EMPLOYEE)/4 <= Ugly column label: SQL*Plus column command fixes that------------------------------------------- 13753and this confirms that SQL did the arithmetic correctly:SQL> select max(salary) from company.employee;MAX(SALARY)----------- 55000 Extrapolate on what the above means ==> You can develop some quite creative expressions asSELECT targets ... you can have MORE THAN ONE stand-alone query in a SELECT argument.HOWEVER, none of such "nested" queries is delimited by ";" each such parenthesized query can be treated as a "first-class" expression term.THERE IS SOMETHING DEEPER GOING ON HERE ...Namely, everything in SQL is a table - data is stored in "tables" (as an RDBMS user sees it), andthe results of each SQL query is also a table. This is significant.In Q#1 we use that fact that a query with a 1-row,1-column result is a "table", but such atable can appear as an expression term in SQL syntax; here such a term is a nested query result.= = > More on other aspects of SQL query nesting later in course.HandIn all spooled results from Steps 1 through 6Q#2 Individual transactions (tr) bring another context into ordinary SQL processing. DB developers must be knowledgeable on how a SQL statement “inside” a tr is affected when that tr runs on a DBMS and conversely, how a tr is affected by execution of its statements. Only then can DB developers build apps that are a) correct b) scalable and c) provide informative and useful error diagnosticsModern tp systems involve several language layers:1) SQL + 2) SQL < -- > SQL command interface + 3) tp + 4) SQL < -- > 3GL + 5) web/internet/distributed DBCSC204 cannot cover the basics of all of 2) through 5), but we will study core concepts and constructs of 2), 3) and 4).This question involves some basic interactions for 2) and 3) above.Review your previous DB course work (the first class you took) on the SQL commit and rollback statements. They are part of the SQL language, but are primarily used to specify at the coding level and control at execution time how a sequence of SQL statements behave as “unit of processing”.Specifically, SQL commit and rollback are each quite complicated operations related to how a tr “finishes”.Implement the following steps in a spool script:Issue a commit statement (Note - you must terminate it with “;” since it is a legal SQL statement and we are using the default Oracle SQL statement terminator “;”)Create a simple (Oracle type number) 1-column table T (you can name your table any name) with a PK constraintINSERT three rows into T having column values 1, 2, and 1Display the rows of TIssue a commit statement < == makes all effects of steps XXX permanent in the DBSteps 3 and 4 illustrate how a sequence of SQL statements is processed, namely, as individually independent DML operationsDelete all rows of T, but do not drop T – revert it back to an empty table (resets for next part)Issue another commit statementInitiate an EXPLICIT transaction start (source code, not DB defaults is declaring the tr start) set transaction name 'demo_tr'; < = tr name can be any Oracle SQL stringWHENEVER SQLERROR CONTINUE ROLLBACK(WHENEVER is a SQL*Plus statement specifying that if any kind of SQL-related error occurs during this transaction, this Oracle session continues on rather than EXITing back to OS level (CONTINUE does this), and, because of a detected SQL-related error, this tr is rolled back (ROLLBACK does this). Rollback is the SQL command to cancel all/any changes that this tr applied to the DB before it was aborted by the rollback statementContinue the tr (first 2 of its statements colored Blue above) by adding the same exact three INSERT statements done in step 3(You now have a complete 5-statement tr – because this tr is a demo, and intended to fail, there is no need to add a commit statement after the fifth statement – if demo_tr really is a tr, all of its DML effects should be cancelled by the ROLLBACK clause of the WHENEVER, and we see that in fact happened!)Finally display all rows of your table, verifying the very different result in tr context vs. successive stand-alone SQL statementsHandIn spooled results from all Steps 1-10Q#3 – A first exercise in tp that displays transaction synchronization. Such a snapshot shows why a tr is waiting and what that tr is waiting for … Create a table with schema T(c1,c2), cj Oracle type number; c1 is the PK column; you can specify any name for the table.T1, T2, and T3 represent actions of 3 concurrent Oracle sessions in your schema. Create a source x.plsql file for each Tj.Each session’s actions at logical times tj must be implemented as shown.You must have a script ready to run corresponding to the actions shown for each Tj in the Chart below.The dbms_lock built-in Oracle package simulates Tj operations interleaving: the pause in T1 progress at time t2, is when T2 starts execution at time t3. Similarly, other dbms_lock calls are used for simulating tr pausing and interleaving.Start the Tj as quickly as you can in relative order indicated. (Recall: SQL> @ x.plsql executes a PL/SQL script file)The sleep(10) (the parameter is in seconds time unit) provides more than adequate time to start T1 relative to when T2 started, and to start T3 relative to when T2 started; the sleep argument value specified gives you (more than) adequate time to manually launch T1, then T2, and then T3 so that the intended tr interleaving is simulated properly.time1072169123405T1479498123720T229160793177T3LEGEND: (ordinary) user trs Observer tr (only for time t4 and t6)t0START T1 with:/* T1 */Create your table T as specified, and create one row (1,1);commit;t1select systimestamp from dual;update T set c2=v1 where c1=1; ( -- Update existing row “r” ; v1 can be any value different from 1) t2dbms_lock.sleep(10); < --(make this the single statement in a PL/SQL anonymous block)t3START T2 (in < 5 seconds) with:/* T2 */select systimestamp from dual;update T set c2=v2 where c1=1; ( -- an update to same row “r” as accessed by T1; v2 can be any value different from 1 and v1 ) t4*** This session hangs until T1 *** does commit or rollbackSTART T3 with:/* T3 */select systimestamp from dual;Then, execute the script:display_held_locks_by_user.sql(Prefix the script’s file name with path shown in the footnote at the end of this question)At prompt, input your Oracle username)t5commit; -- Finalize T1 t6select * from T;HandIn: spooled results of sessions T1, T2 and T3 in this order. As shown in the above Chart, include a comment in the form /* Tj */ at the start of each Tj script to clearly identify each.Notes about cleanup/reset between Q#3 solution attempts/retries As P3 tp course notes indicate, an Oracle tr does not implicitly begin when (any number of) queries only are executed. This is because Oracle tp failure recovery is based on physiological tr logging. That is, Oracle queries are not captured in tr recovery logs. (Other RDBMSs may, and do DIFFER from this approach).= = > T1 and T2 are Oracle transactions, but the activities done in session T3 do NOT constitute a tr.The above line means that Q#3 involves 2 tr; however, according to the Chart of Tj actions, T1 does a commit;, but T2 is not finalized.= = > Before re-trying Q#3 after a previous failure/error result, you must finalize T2.Given that you need to repeat Q#3 after the previous failed attempt, finalize T2 with either commit;/rollback --it does not matter which is used because you will repeat your solution attempt.The above discussion is critical for understanding Oracle tp behavior.IF you finish a failed session, and T2 is not finalized, it is likely that your next attempted Q#3 execution will FAIL, with the following error when T1 starts execution, and tries to (re)-create table T:ERROR at line 1:ORA-00955: name is already used by an existing objectThe reason for this error should be clear – the previous-run’s T2 session still holds a row UPDATE lock on T’s one-and-only row. So, assuming your T1 script does a DROP table BEFORE (re)creating T, the DROP TABLE obviously will fail since another tr is still accessing T.= = > Given the above discussion, session T3 requires NO CLEANUP (commit or rollback) after a run because it is not a tr Conclusion – be sure to properly finalize a previous execution of Q#3 before attempting another run Q#4 Module P2 Slide#57 derives the cost (cost defined as total time to hash all rows by increasing PK value) of processing a large file/table in PK order for a file/table in hash physical org. (Use table terminology for rest of question). Also, for the Slide#57, change the time in phrase “ avg(random disk bucket read) = 1X10-3 sec” to 1X10-4 secReducing disk bucket read time by a factor of 10 better approximates SSD storage VS older MHD disks systems“Large-scale” sorting in the DB world is defined as a sort that cannot be completely done in memory, and that spills over and uses the DB shared temporary tablespace to save sections of sorted data as a sort progresses.How much large-scale sorting happens in typical university courses ?= > Not much, as seen below, derived from a 7-year history of cscoracle use in CSUS DB courses:SQL> select name, value from v$sysstat where name like 'sort%';NAME VALUE-------------- --------sorts (memory) i.e. sorts that did not need to use disk 48703770sorts (disk) i.e. sorts that spilled over to temp disk 9226RDBMSs internally implement various commonly-known sort algorithms, but use DB context to optimize sorting.Overview of sorting in Oracle – typical of RDBMSs – Credit Craig Shallahamer, Oracle consultant“ As the sorting-focused-super-simplified Oracle architecture figure below shows, the key players are a client process, foreground process, PGA memory, temporary segments and temporary tablespace .dbf files.13231094137311004455330603003789218547313ProcessesMemory / Cache levelsDisk00ProcessesMemory / Cache levelsDisk6858001378585304800013647303034145105993023414182979301711036715010Here is how it works. When a user process (think: sqlplus) instructs its foreground process (think: Oracle) to run a SQL statement that must sort (think: order by), the foreground process will attempt to do the entire sort in memory. More specifically, within the foreground process memory, known as the PGA memory. That's the good news.The bad news is, if the sort is too big to be completed in the allocated PGA memory, the foreground process will divide the sort into smaller pieces, commonly called, "sort runs." When a sort run is not immediately needed, foreground process can write it to disk in a temporary segment. If needed, foreground process can read it back into its PGA memory.While this divide and conquer approach works, as you can image with the additional work including IO, it can take a while. So, while sorting is part of normal data processing, at some point it can become a significant amount of database time. The trick is to correctly diagnose the situation. … “?How fast can we sort? – Answer - it depends on who/what/how the sort happens.The plain/ordinary Linux sort utility typically can sort 2 billion bytes in about 25 sec, so extrapolate our 200 million rows of data with 200 byte row size = > 40 billion bytes, so the given hash org file can be sorted in approx. 500 sec = approx. 8 minutes.Sorting the same data set in Oracle will probably take somewhat longer because Oracle uses its own memory rather than theoretically the memory Linux is willing to allocate to a sort process. A bad consequence of this is that an Oracle sort (ORDER BY) is likely to spill over to disk more quickly than Linux sort because of having less memory to work with.Even if an Oracle sort is, say 2.5 X slower (not a documented or benchmarked figure, simply providing a reasonable value needed for the problem), that is = approx. 20 minutes to create fs.USING THE ABOVE BACKGROUND, answer the following questionsPart A ) Assuming that Oracle sorted the original hash org table fh into sorted table fs , and that processing fs in PK order takes 30% of the time taken to create fs,calculate the ratio of (time taken by brute-force hashing of all rows of fh ) i.e., the ratio’s numerator to( time taken by periodic PK ordered processing = (creation time and processing time of fs)(The ratio value = how many times SLOWER brute-force hashing would be vs. using fs)Part B ) List as many fundamentally different reasons Oracle might cite for why it does not support CREATE TABLE for physical hash org on PK as hash key. Answer based on the factors covered in module P2 involved in comparing and choosing physical file org for a set of data.HandIn – Since a spool file is a text file, EDIT the spool results for Q# 0 through Q#3 by appending youranswers to Q#4 with your own additional hand-written text (a rare time in this course when a question answer is not generated via spooling) ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download