Virtualization Destructive Test Plan



Virtualization Destructive Test Plan For Vendor Compatibility testing with Oracle Database.

Document Version: 1.2

|Version |Date |Description |

|V1.0 |10/18/13 |Initial version |

|V1.1 |11/08/13 |Added stress test details. |

|V1.2 |02/02/14 |Revised for 12cR1 (12.1.0.1) |

Requestor:

Certification ID:

Vendor Compatibility Type:

|[ ] Oracle 11g Release 2 |[ ] Oracle 12c Release 1 |

Vendor Technology Stack:

|Platform(s): |Oracle VM Server |Version 6 Update 2 |

|Virtualization Technology: |Live Migration |OVM 6.2 |

Table of Contents

1. Background

Objective

Scope

Stakeholders

Vendor Technical Skills

Tasks and Schedule

2. Test Environment Specifications

Before You Begin

Workload Driver

Hardware and System Components

Host and Storage Topologies

Software Reference Configuration

3. Test Evaluation Criteria

Preconditions

PASS/FAIL Criteria

Test Results Collection Process

Defect Tracking and Result Logging

4. Pre-test check list

5. Virtualization Compatibility Software level Test Details

Virtualization – Software level Test Categories

List of Vendor-Covered Internal Software level Tests

Oracle Virtualization– Software level Tests

Appendix A: Sample Collection Logs

Appendix B: Repeat tests

Appendix C: Checking for physical and logical corruption

Appendix D: Inject SQL rows

Appendix E: Session failure

Appendix F: Table and datafile deletion

Background

Objective

This Virtualization Destructive Test Plan (VDTP) defines a set of destructive test scenarios as defined by Oracle Server Technologies and supplemented by the technology vendor. The main objective is to certify the compatibility of the components of a vendor-supplied virtualization server stack with the Oracle Database, so that customers and system integrators can be confident in deploying both vendor and Oracle technologies.

OCE has traditionally maintained a consistent set of automated, regression-style tests. When such tests PASS on a vendor's server technology stack, Oracle will effectively certify or validate the virtualization compatibility.

Scope

• Virtualization Compatibility: The virtualization destructive test scenarios, run under high system load, consist of a number of software failures and, if applicable, hardware failures injected against the Oracle database, which Oracle and/or the vendor must detect and recover from.

The following CMS IDs are covered within this document, e.g.:

- OCE Support ID: OEL 6.2 with Oracle Single Instance v12.1.0.1

Stakeholders

|Name |Organization |Role |

| |(company/team name) |(Approver, Owner, Reviewer) |

|Put your name and phone number/email here. | | |

|Vendor email address |Vendor |e.g. Contractor |

| | | |

| | | |

|CertSupp_ww@ |OCE Development & Support |Reviewer/Approver |

Vendor Technical Skills

• Oracle Database administration

• General virtualization system and network administration

• System testing / Quality assurance

Tasks and Schedule

|Seq |Task Name |Completion Date |

|0 |Virtualization Destructive Test Plan (VDTP) reviewed and customized |MM/DD/YYYY |

| |(Kick-off meeting: {Date}) | |

|1 |Virtualization hardware ready |MM/DD/YYYY |

|2 |Vendor Virtualization software installed |MM/DD/YYYY |

|3 |Oracle software installed |MM/DD/YYYY |

|4 |Basic infrastructure validation tests completed |MM/DD/YYYY |

|5 |Database workload schema and software ready |MM/DD/YYYY |

|6 |Complete test coverage |MM/DD/YYYY |

| | | |

|7 |Final deliverables: | |

| |Document Destructive Test Results | |

| |Publish/update Best Practices sheet (on Oracle Metalink and/or vendor site) – | |

| |based on results of destructive tests | |

| | | |

Test Environment Specifications

Before You Begin

▪ Review all the test cases. If particular tests cannot be accomplished, discuss them before starting; pre-approval can be obtained to waive certain conditions.

▪ Acquire the necessary tools to stress CPU and I/O. The tests require a system statistics collection utility (such as sar, vmstat, iostat, or top) running in the background while tests are executed.

▪ Review the log collection requirements to ensure that there are sufficient scripts to collect the results.

Workload Driver

• For the purpose of covering the virtual destructive tests, Oracle requires the use of OAST.

• The following table shows the list of all concurrent workloads available to vendors, to be executed against all RAC nodes, at the start of every hardware fault injection test:

|# |Workload |Workload Description |Workload Behavior |

|1 |OAST |OLTP |TPC-C workload. Available from OTN website via CertSupp_ww@. |

|2 |CPU and memory hog |This is the external, non-database workload and should not be used alone. It should only be used as a supplement if the database workload cannot achieve the desired constraints. | |

Hardware and System Components

Oracle and the vendor will work together to define the minimum hardware specifications required to validate the virtual destructive test results. The following table can be used to customize the system requirements:

|Component |Description |Specifications |

|Host |Minimum of 2 CPUs |8 CPU entitlement; what are the virtual/logical CPUs? |

|Memory and Swap |Minimum of 4 GB RAM, plus 4 GB memory swap. |32 GB memory; swap? Detail the Guest Memory size. |

|Storage hardware and topology |Storage disk pool/enclosure layout and volume management software -- vendor’s choice. |SAN / NAS Technology? |

Host and Storage Topologies

This section should be used to graphically lay out virtualization topologies to be tested.

NOTE : Outline the Test Environment layout.

Software Reference Configuration

Oracle and the vendor jointly define the minimum software versions and configuration schemes required for the RAC hosts. The following table can be used to customize the software requirements, and is based on the Oracle 11g RAC HA (high availability) Cluster Reference Configuration.

|Software Component |Description |Specifications |

|Operating system |Indicate 32 or 64 bit, platform, version number, update number |Oracle Enterprise Linux 6.2 OVM 3.0 |

|Oracle Database |Indicate the following configuration choices: |RDBMS 12.1.0.1 |

| |RDBMS Homes and trace files (on local file systems, shared NAS mount points or CFS directories). Indicate whether these file systems are on local disk, NAS, or SAN. |FAST_START_MTTR_TARGET is left at default setting, per prior agreement. |

| |Database, control and log files (on ASM and raw disks, dedicated volume groups with striping and mirroring provided externally, NAS mount points or CFS directories) | |

| |FAST_START_MTTR_TARGET=60 to expedite crash recovery | |

|Software patches |List any mandatory patch numbers (operating system, Oracle, vendor) |Oracle RDBMS 12.1.0.1 |

Test Evaluation Criteria

Preconditions

1. All hardware, host-to-storage interconnects (interface cards, cables, switches) are factory tested and provisioned for shared cluster deployments.

2. All operating system components are installed with the most recent pre-release or production releases, kernel packages, patch levels, etc.

3. Software installation should succeed 100% and adhere to standard methods used by joint customers, i.e. no incomplete or undocumented steps will be allowed. Vendor should work closely with Oracle to ensure all install-related problems are thoroughly rectified.

4. No special Oracle or vendor software configuration settings will be permitted to bypass behavioral problems. The only exceptions shall be those where such configuration changes improve overall system stability and/or high availability. The Software Reference Configuration section should reflect such changes and the resulting benefits.

*************** PRE TEST REQUIREMENT *******************

Create the DB and instance resources.

Oracle’s recommendation is to use Oracle Universal Installer (OUI) and the Database Creation Assistant (DBCA), and Grid Control to create database and database resources.

PASS/FAIL Criteria

The following is NOT an exhaustive list of test PASS/FAIL criteria, but provides an idea of what Oracle usually looks for (e.g. in system trace files or descriptions of test outcomes):

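1. Tests are marked FAIL when any of the following losses of Oracle or vendor product functionality, or unexpected outcomes, is recorded after any hardware fault injection (see the General and Storage lists under "PASS/FAIL Criteria" at the end of this document):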

2. Tests are marked PASS when none of the outcomes above is observed, or when any test outcome references high-priority (e.g. priority 2 or higher) bugs logged against Oracle or vendor software, and such bugs are fixed and subsequently verified.

3. During each destructive test, run a system activity tool such as sar, vmstat, or top in the background to show that CPU load is >= 90%. Collect the sar/top output at 30-second intervals. If programs are needed for statistics collection, please contact OCE Support.
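The CPU load criterion in item 3 can be checked after the run with a short script. The following is a minimal sketch, not an OCE-supplied tool: the function name low_cpu_samples is hypothetical, and it assumes the Linux procps layout of "vmstat 30" output, where the idle percentage ("id") is column 15. Other platforms place the idle column elsewhere, so the column number is a parameter.

```shell
#!/bin/sh
# Sketch: count vmstat samples whose CPU busy percentage is below a threshold.
# Assumption: Linux procps "vmstat 30" layout, where idle% ("id") is column 15.
# Pass a different column number as the third argument on other platforms.

low_cpu_samples() {
    # $1 = vmstat log file
    # $2 = minimum busy percentage (default 90)
    # $3 = column holding idle% (default 15)
    awk -v min="${2:-90}" -v col="${3:-15}" '
        $1 ~ /^[0-9]+$/ {              # data lines only; header lines are skipped
            total++
            if (100 - $col < min) low++
        }
        END { printf "%d of %d samples below %d%% busy\n", low, total, min }
    ' "$1"
}

# Typical use (not run here):
#   vmstat 30 > vmstat.log &           # one sample every 30 seconds
#   ... run the destructive test ...
#   low_cpu_samples vmstat.log
```

Busy time is computed as 100 minus idle, so I/O wait counts as busy in this sketch; adjust the arithmetic if the agreed definition of CPU load differs.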

Test Results Collection Process

This section documents the REQUIRED method for collecting test results, which are used for subsequent certification audits and to verify test executions. When the log collection process and log structure are followed, most tests can be verified more efficiently (analyzed by scripts). YOU WILL GET YOUR RESULTS FASTER.

|# |Description |

|1 |Clean up the following files: |

| | |

| |RDBMS background, core and user dump directories. (use sqlplus, SQL> ‘show parameters dump_dest’ then you will |

| |see the directory destination) |

| |/log/diag/rdbms//*/* |

| |/var/log/messages (or similar files) |

|2 |Run each destructive test, taking note of the test start time, test stop time and fault injection time. |

|3 |At the end of the test run, please tar up and compress the following traces as _.tar.gz  |

| |(e.g. OracleCorp_VDTP-STOR-01.tar.gz): |

| |System message files (e.g. /var/log/messages on Linux) |

| |All files under RDBMS background_dump_dest, core_dump_dest and user_dump_dest directories. |

| |Please see Appendix A for log files layout. |

|4 |[If running into Oracle-related problems]  Run diagcollection.pl script (as root): |

| |# cd /tmp |

| |# script __diagcollection.out |

| |# echo "Ensure that ORACLE_HOME and ORACLE_BASE" |

| |# echo "are set to the right locations!" |

| |# $ORACLE_HOME/bin/diagcollection.pl --collect --all |

| |# exit |

|5 |tar up _.tar.gz and ftp to the following location (as anonymous user):  |

| | |
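Steps 2 and 3 of the table above (recording the test times and bundling the traces) lend themselves to scripting. The sketch below is one possible shape, under stated assumptions: the function names mark_event and pack_traces, the event labels, and the argument order are all hypothetical, and the archive name simply joins a vendor name and a test ID in the style of the OracleCorp_VDTP-STOR-01.tar.gz example.

```shell
#!/bin/sh
# Sketch of steps 2 and 3: timestamp the test events, then bundle the traces.
# All names here (mark_event, pack_traces, the event labels) are hypothetical.

mark_event() {
    # $1 = event log file
    # $2 = event label, e.g. test_start, fault_injected, test_stop
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$2" >> "$1"
}

pack_traces() {
    # $1 = vendor name, $2 = test ID, remaining args = files/directories to collect
    vendor=$1
    testid=$2
    shift 2
    archive="${vendor}_${testid}.tar.gz"
    # Print the archive name only if tar succeeded.
    tar -czf "$archive" "$@" && printf '%s\n' "$archive"
}

# Typical use (not run here; the dump directories come from
# "show parameters dump_dest" and are placeholders below):
#   mark_event events.log test_start
#   mark_event events.log fault_injected
#   mark_event events.log test_stop
#   pack_traces OracleCorp VDTP-STOR-01 events.log /var/log/messages "$BDUMP" "$CDUMP" "$UDUMP"
```

Keeping the event log inside the archive lets a reviewer line up the fault injection time with the timestamps in the traces.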

Defect Tracking and Result Logging

• Oracle product defects may be documented and tracked through Support TARs, or directly with Oracle Server Technologies. In either case, the defect becomes an entry in the Oracle Bug Database. Test traces and reproducible evidence will be uploaded to a corporate repository such as bugftp.

• {Vendor to provide product defect tracking and reporting tool to be used.}

Pre-test check list

|[Pre-0] Check the test spec version and obtain the permission to waive the N/A tests in advance. |Send email to CertSupp_ww@ to ask for the latest version of the test specification. Briefly state the purpose of the certification. List the test cases that are not applicable to your certification. | |Purpose is to certify that the database can be reliably deployed in the virtualization environment. |

|[Pre-1] Desired processes should be running in real time and should be memory resident. |Steps: Use ‘ps –eacf | egrep ‘…..’’ to make sure desired processes are running in RT mode. Use system dependent command to check if processes are locked in memory. |Attach your output. Explain your system dependent command. |This check may not apply if the RDBMS is running with default scheduling and priority. |

|[Pre-2] Check the active software version and others. |Steps: sqlplus / as sysdba and check the DB version. |Attach your output. Make sure the active version is the one you tested. |e.g. 11.2.0.4, 12.1.0.1 |

Virtualization Compatibility Software level Test Details

This section provides a starting template for the definition of all software level Virtualization destructive test scenarios. To be considered for validation of the vendor’s virtual technology with Oracle Database, the vendor team must work jointly with Oracle OCE Team to enhance and customize the test cases defined herein.

New test cases may be driven by rationales such as:

• To augment test coverage to specific Virtualization or critical areas

• To address known virtualization-related issues in the field

The vendor test team must fill out the shaded columns for Oracle evaluation, i.e. Actual Test Outcome and PASS/FAIL checklist. Please provide as much documentation as needed, e.g. test exceptions, best practices, product defect numbers, patches or workarounds applied.

Virtualization – Software level Test Categories

Here are the proposed virtualization software level test categories defined for vendor validation with Oracle database:

[A] Vendor covered internal Software level Tests

[B] Oracle Virtualization – Software level Tests

List of Vendor-Covered Internal Software level Tests

E.g. stress tests with various workloads, including database (Oracle, DB2), web services (WebSphere), and file services (NFS, ftp).

E.g. administrative tests including Tivoli and DLGR (before and after the virtualization technology being tested, such as Live Migration).

Functional tests including regressions of Oracle VM, VIOS, and Hypervisor components, and memory update stress tests.

Oracle Virtualization– Software level Tests

[SW-VDST-1] Run “IO and CPU ERP” workload for 24 hours continuously with DEDICATED LGR CPU.

Preconditions:

• Run this test with parameters: db_block_checking=full db_block_checksum=full

• Run OAST in a dedicated processor LGR with a minimum of 2 physical CPU entitlement and 4 logical CPUs.

• Virtualization technology, looping for duration of workload.

• Enterprise Manager running prior to and during workload.

• Run OAST workload for 24 hours.

• Run artificial memory hog, non-database workload to supplement the database workload.

Expected outcome:

The LGR and Database should stay up and running. Verify:

• No database corruption occurred. See Appendix C.

• No host reboot

• No workload hang

• No ORA-600 or data corruption

• No ORA-3113 or instance death.

Attach the db logs. See Appendix A.

OUTCOME:

• To include back-to-back active virtualization technology (e.g. Live Migration), and Enterprise Manager running.

• Memory+CPU hog used to keep the LGR continuously paging and CPU consumption high.

• After test completion, test the database for any physical or logical corruption.

[SW-VDST-2] Run “IO and CPU ERP” workload for 24 hours continuously with SHARED (virtual) LGR CPU.

Preconditions:

• Run this test with parameters: db_block_checking=full db_block_checksum=full

• Run OAST in shared processor LGR with a minimum of 2 physical CPU entitlement, 4 logical CPUs and 4 virtual CPUs.

• Virtualization technology, looping for duration of workload.

• Enterprise Manager running prior to and during workload.

• Run OAST workload for 24 hours.

• Run artificial memory hog, non-database workload to supplement the database workload.

Expected outcome:

The LGR and Database should stay up and running. Verify:

• No database corruption occurred. See Appendix C.

• No host reboot

• No workload hang

• No ORA-600 or data corruption

• No ORA-3113 or instance death.

Attach the db logs. See Appendix A.

OUTCOME:

• Can be same remarks as SW-VDST-1.

[SW-VDST-3] Instance Failure.

Preconditions:

• Run this test with parameters: db_block_checking=full db_block_checksum=full

• Virtualization technology, using synchronization hooks rather than asynchronous back-to-back Virtualization technology loops, to ensure the recovery process overlaps with a suspend/resume.

• Enterprise Manager running prior to and during workload.

• Run OAST workload.

• Run artificial memory hog, non-database workload to supplement the database workload.

• Let the workloads run stable for 20 minutes.

• Inject SQL rows. See Appendix D.

• Find pid of PMON process for this instance, then ‘kill –9 ’.

• Startup database.

Expected outcome:

• Verify automatic database recovery at instance startup.

• No database corruption occurred. See Appendix C.

Attach the db logs. See Appendix A.

OUTCOME:

• Kill PMON to cause the database to fail.

• The database should be restarted, in sync with a Virtualization technology, to perform automatic recovery.

• Examine logs to verify that a Virtualization technology occurred during database recovery.

• Upon completion of recovery, the database should be verified to not contain the 1 million rows of uncommitted data.

• The database should also be verified to have no corruption.

[SW-VDST-4] Instance Failure.

Preconditions:

• As above for [SW-VDST-3] with ‘Shutdown Abort’ as opposed to kill -9.

Expected outcome:

• Verify automatic database recovery at instance startup.

• No database corruption occurred. See Appendix C.

Attach the db logs. See Appendix A.

OUTCOME:

• Could be the same remarks as SW-VDST-3.

• Shutdown abort should be used in place of killing PMON.

[SW-VDST-5] Session Failure.

Preconditions:

• Run this test with parameters: db_block_checking=full db_block_checksum=full

• Virtualization technology, using synchronization hooks rather than asynchronous back-to-back Virtualization technology loops, to ensure the recovery process overlaps with a suspend/resume.

• Enterprise Manager running prior to and during workload.

• Run OAST workload.

• Run artificial memory hog, non-database workload to supplement the database workload.

• Let the workloads run stable for 20 minutes.

• Inject SQL rows with multiple connections. See Appendix E.

• Find pid of session processes, then ‘kill –9 ’.

Expected outcome:

• Verify automatic SMON transaction rollback.

• Query v$session to see that the killed sessions are removed.

• No database corruption occurred. See Appendix C.

Attach the db logs. See Appendix A.

OUTCOME:

• Killing of the connections should be synchronized with a Virtualization technology.

• After all the row-injecting connections are killed, the sessions and locks should be verified to be cleaned up automatically.

• Rows in the table should be counted to ensure each connection inserted a multiple of 100 rows.

[SW-VDST-6] Data loss.

Preconditions:

• Run this test with parameters: db_block_checking=full db_block_checksum=full

• Virtualization technology, using synchronization hooks rather than asynchronous back-to-back Virtualization technology loops, to ensure the recovery process overlaps with a suspend/resume.

• Enterprise Manager running prior to and during workload.

• Using the existing OAST database from the previous test, drop the recently created table. See Appendix F.

Expected outcome:

• Use database flashback recovery to restore the dropped table.

• No database corruption occurred. See Appendix C.

Attach the db logs. See Appendix A.

OUTCOME:

• In Appendix F, “flashback table to before drop” can be changed to “flashback table to scn”, i.e.:

• Insert 333333 rows with id=1. Commit and record the current SCN. Insert 444444 rows with id=2. Delete all rows with id=1. Commit, and then flashback table to the previously recorded SCN.

• The flashback operation should be synchronized to overlap with a virtualization technology.

• After the flashback, the table should be checked to contain only the 333333 rows with id=1.

[SW-VDST-7] Datafile loss.

Preconditions:

• Run this test with parameters: db_block_checking=full db_block_checksum=full

• Virtualization technology, using synchronization hooks rather than asynchronous back-to-back Virtualization technology loops, to ensure the recovery process overlaps with a suspend/resume.

• Enterprise Manager running prior to and during workload.

• Using the existing OAST database from the previous test, create a cold backup of the datafile holding the recently created table of inserted rows.

• Physically remove the datafile holding the table. See Appendix F.

Expected outcome:

Recover the lost datafile and verify:

• No database corruption occurred. See Appendix C.

Attach the db logs. See Appendix A.

OUTCOME:

• Shut down the database, back up a datafile, start up, create a table in the tablespace that the datafile is a part of, and insert 1000000 rows into the table. Shut down the database, remove the datafile and put the backup file in place, then start the database to do media recovery.

• The media recovery should be synchronized to overlap with a Virtualization technology.

• After recovery, the table should be verified to contain the 1000000 rows.

Appendix A: Sample Collection Logs

The various trace data to be collected for all timing and test outcome evaluation can be found in the following directory locations:

|Trace File |11gR1 Trace Location |Description |

|Kernel syslog file|Vendor-specific, e.g. /var/log/syslog/* |OS events, including node reboot times |

|RDBMS logs |Depends on your setting; run “show parameters dump_dest” at the sqlplus prompt |RDBMS trace files, bdump/ cdump/ & udump/ |

Appendix B: Repeat tests

Please ensure the test is repeated many times. Because of the timing relationship with the suspensions (e.g. Live Migration Virtualization technology), one cannot guarantee that the kills occur at the right time to reveal problems.

Appendix C: Checking for physical and logical corruption

Checking for physical and logical corruption in the datafiles will require use of RMAN (BACKUP VALIDATE CHECK LOGICAL DATABASE) and the Database Verify Utility (dbv).

To check for Oracle block corruptions with the Database Verify utility (dbv), you can use a quick script similar to:

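#!/bin/bash
# Run dbv against every datafile (*.dbf) in a directory.
# Usage: ./dbv.sh <blocksize> <datafile directory>
BLOCKSIZE=$1
DATADIR=$2
# Stop if the directory is missing, so dbv never runs in the wrong place.
cd "$DATADIR" || exit 1
for FILE in *.dbf
do
    dbv file="$FILE" blocksize="$BLOCKSIZE"
done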

Which can be invoked as:

./dbv.sh 8192 $ORACLE_HOME/oradata/$ORACLE_SID >> dbv.log 2>&1
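Once dbv.log has been written, its per-file summaries can be reduced to a single number. The sketch below assumes the "Total Pages Marked Corrupt" summary line that dbv prints when it finishes verifying a file; the function name dbv_corrupt_total is hypothetical. A non-zero result only signals that dbv.log needs to be read in full, since dbv reports other failure counters as well.

```shell
#!/bin/sh
# Sketch: add up the "Total Pages Marked Corrupt" counters in a dbv log.
# Assumption: each verified datafile contributes one summary line of the form
#   Total Pages Marked Corrupt   : <number>

dbv_corrupt_total() {
    # $1 = log file produced by the dbv loop (e.g. dbv.log)
    awk -F: '/Total Pages Marked Corrupt/ { sum += $2 }
             END { print sum + 0 }' "$1"
}

# Typical use (not run here):
#   total=$(dbv_corrupt_total dbv.log)
#   [ "$total" -eq 0 ] || echo "dbv reported $total corrupt pages"
```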

To check the database for block and logical corruptions with RMAN, use BACKUP VALIDATE CHECK LOGICAL DATABASE.

Use 4 channels to speed up the process with the following command file:

run {
  allocate channel c1 device type disk ;
  allocate channel c2 device type disk ;
  allocate channel c3 device type disk ;
  allocate channel c4 device type disk ;
  backup validate check logical database;
}

Invoke the RMAN command file and have the rman output go to a logfile, as:

rman target / cmdfile rman_validate.cmd log rman_validate.log 2>&1 &

As RMAN proceeds, check for any corruptions via:

select count(*) from v$database_block_corruption;
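If the channel count needs to vary with the number of CPUs or disks, the RMAN command file shown above can be generated rather than edited by hand. This is a minimal sketch; the function name gen_rman_validate is hypothetical, and the output is the same run block with one "allocate channel" line per requested channel.

```shell
#!/bin/sh
# Sketch: emit the RMAN validate command file for a given number of channels.
# gen_rman_validate is a hypothetical helper, not part of RMAN.

gen_rman_validate() {
    # $1 = number of disk channels to allocate (default 4, as in the text)
    n=${1:-4}
    echo "run {"
    i=1
    while [ "$i" -le "$n" ]; do
        echo "allocate channel c$i device type disk ;"
        i=$((i + 1))
    done
    echo "backup validate check logical database;"
    echo "}"
}

# Typical use (not run here):
#   gen_rman_validate 4 > rman_validate.cmd
#   rman target / cmdfile rman_validate.cmd log rman_validate.log
```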

Appendix D: Inject SQL rows

Synchronize Instance and Session failures with the Virtualization technology being tested, e.g. Live Migration, using the "hook" mechanism, if possible, prior to the LGR suspend/quiesce phase.

For Instance failure, insert 1 million uncommitted rows; then, when informed by the hook, kill the instance.

The scenario should look similar to the following:

1. Prep time (set to 13 seconds) prior to the LGR suspending

2. LGR notification via a hook to note that the Virtualization technology is ready

3. Inject SQL rows uncommitted.

4. Crash DB, via pmon kill

5. Initiate the Virtualization technology once trigger is confirmed

6. Restart DB and verify automatic recovery

With the above method we can at least guarantee that the Virtualization technology, e.g. Live Migration, is in progress when invoking each of the tests.
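Steps 3 and 4 of the scenario above can be prepared with two small helpers. This is a sketch under stated assumptions: the table name and its columns (id, payload) are hypothetical, the helper names are invented for illustration, and the hook integration is vendor-specific and therefore left out. It relies on the Oracle convention that the PMON background process is named ora_pmon_ followed by the instance SID.

```shell
#!/bin/sh
# Sketch for steps 3 and 4: build the row-injection SQL and locate PMON.
# Helper names, table name and column names are hypothetical.

gen_uncommitted_insert() {
    # $1 = table name, $2 = number of rows (default 1000000)
    # The emitted PL/SQL block inserts the rows and never ends the transaction.
    # The session that runs it must stay connected until the instance is
    # killed, because SQL*Plus ends the transaction on a normal exit.
    cat <<EOF
BEGIN
  FOR i IN 1..${2:-1000000} LOOP
    INSERT INTO $1 (id, payload) VALUES (i, RPAD('x', 100, 'x'));
  END LOOP;
END;
/
EOF
}

find_pmon_pid() {
    # $1 = ORACLE_SID; reads "ps -eo pid,args" output on stdin and prints
    # the pid whose command is exactly ora_pmon_<SID>.
    awk -v name="ora_pmon_$1" '$2 == name { print $1 }'
}

# Typical use (not run here):
#   gen_uncommitted_insert oast_inject > inject.sql
#   ... run inject.sql in a session that stays open, wait for the hook ...
#   pid=$(ps -eo pid,args | find_pmon_pid "$ORACLE_SID")
#   kill -9 "$pid"
```

Matching the full process name avoids killing PMON for another instance whose SID merely starts with the same characters.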

Appendix E: Session failure

Synchronize Session failures with a Virtualization technology using the "hook" mechanism, if possible, prior to the LGR suspend/quiesce phase.

# Enable multiple database writer processes

db_writer_processes=10

Create multiple worker sessions, similar to the following:

Every worker loops forever doing

insert 20 records

sleep 1 second

commit once for each 100 records

Start 1000 workers at 0.1 second intervals

At start of preptime, start killing the workers 1 by 1 with no wait time.

Preptime should be set in a way that the guest suspend/quiesce phase happens when roughly 500 workers have been killed.

Loop until all entries in v$session and v$lock owned by the workers are cleaned up.

Count number of records inserted by each worker to confirm that each worker inserted a multiple of 100 records and the sequence numbers start from 1 with no skip.
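The final counting step can be automated once the per-worker totals have been queried from the database. The sketch below assumes a hypothetical result layout of one line per worker with four fields (worker id, row count, lowest sequence number, highest sequence number), for example from a GROUP BY query spooled to a file, and it assumes sequence numbers are unique within a worker. Because each worker inserts 20 records per iteration and commits once per 100 records, a surviving row count that is not a multiple of 100 means a partial transaction was left behind.

```shell
#!/bin/sh
# Sketch: verify the per-worker row counts after all workers are killed.
# Input layout (hypothetical): worker_id row_count min_seq max_seq
# Assumption: sequence numbers are unique within a worker, so
# min_seq = 1 and max_seq = row_count together imply "no skip".

check_worker_counts() {
    # Reads the per-worker lines on stdin, prints every offending worker,
    # and returns non-zero if any worker fails the check.
    awk '
        NF == 4 {
            if ($2 % 100 != 0 || $3 != 1 || $4 != $2) {
                printf "worker %s: count=%s min=%s max=%s\n", $1, $2, $3, $4
                bad++
            }
        }
        END { exit (bad > 0) }
    '
}

# Typical use (not run here; worker_counts.txt is a placeholder file name):
#   check_worker_counts < worker_counts.txt || echo "partial commits found"
```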

Appendix F: Table and datafile deletion

Synchronize table and datafile deletions with a Virtualization technology using the "hook" mechanism prior to the LGR suspend phase.

Using the existing OAST database from previous test drop the newly created table. Note that additional data may need to be added to existing table if recovery is too brief to overlap with a suspend phase.

Use database flashback recovery to restore the dropped table.

In the case of datafile recovery, recover the datafile prior to the LGR suspend. Note that additional data may need to be added to existing datafile if recovery is too brief to overlap with a suspend phase.
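For the flashback-to-SCN variant described in the remarks of [SW-VDST-6] (333333 rows with id=1, record the SCN, 444444 rows with id=2, delete id=1, flash back), the SQL*Plus script can be generated as follows. This is a sketch only: the helper name and the single-column table layout are hypothetical, the script is assumed to run as an ordinary schema owner with flashback privileges and read access to v$database, and the timing against the suspend phase still has to come from the vendor hook.

```shell
#!/bin/sh
# Sketch: emit a SQL*Plus script for the flashback-to-SCN scenario.
# gen_flashback_to_scn and the table layout (one NUMBER column "id")
# are hypothetical.

gen_flashback_to_scn() {
    # $1 = table name
    # FLASHBACK TABLE ... TO SCN requires row movement on the table.
    # TO_CHAR keeps the SCN from being shown in scientific notation
    # before it is substituted into the FLASHBACK statement.
    cat <<EOF
ALTER TABLE $1 ENABLE ROW MOVEMENT;
INSERT INTO $1 (id) SELECT 1 FROM dual CONNECT BY level <= 333333;
COMMIT;
COLUMN scn NEW_VALUE saved_scn
SELECT TO_CHAR(current_scn) AS scn FROM v\$database;
INSERT INTO $1 (id) SELECT 2 FROM dual CONNECT BY level <= 444444;
DELETE FROM $1 WHERE id = 1;
COMMIT;
FLASHBACK TABLE $1 TO SCN &saved_scn;
SELECT id, COUNT(*) FROM $1 GROUP BY id;
EOF
}

# Typical use (not run here; oast_fb is a placeholder table name):
#   gen_flashback_to_scn oast_fb > flashback_scn.sql
#   sqlplus <user> @flashback_scn.sql
```

After the flashback, the final query is expected to return a single group, id=1 with 333333 rows, which is the check the test remarks ask for.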

-----------------------

PASS/FAIL Criteria

General:

• Internal Oracle errors reported by Oracle RDBMS

• Process hangs or memory leaks

• OS kernel crashes

• Entire host deaths

Storage:

• File (logical or physical) corruptions

• Host I/O hangs


