The Oracles-Based Software Testing: problems and solutions

Ljubomir Lazić, SIEMENS d.o.o, Radoja Dakića 7, 11070 Beograd, Serbia&Montenegro,

Dušan Velašević, Electrical Engineering Faculty, Belgrade University, Bulevar kralja Aleksandra 73,11070 Beograd, Serbia&Montenegro

Nikos Mastorakis, Military Institutions of University Education, Hellenic Naval Academy, Terma Hatzikyriakou, 18539, Piraeus, Greece

Abstract: - Software testing (manual, automated or combined) is often a difficult and complex process. The most familiar aspects of test automation are organizing and running test cases and capturing and verifying test results. When we design a test we identify what needs to be verified. A set of expected result values is needed for each test in order to check the actual results. Generation of expected results is often done using a mechanism called a test oracle. For the most part, however, the effective use of test oracles has been neglected, even though they are a critical component of the software testing process. All software testing methods depend on the availability of an oracle, that is, some method for checking whether the system under test has behaved correctly on a particular execution. An ideal oracle would provide an unerring pass/fail judgment for any possible program execution, judged against a natural specification of intended behavior. Practical approaches must make compromises to balance trade-offs and provide useful capabilities. This paper is not encyclopedic, but discusses representative examples of the main approaches and tactics (a mix of theoretical and technical challenges) for solving common problems. It describes the purpose and use of oracles in automated software verification and validation. Several relevant characteristics of oracles are discussed together with their advantages, disadvantages, and implications for test automation.

Key-Words:- software testing, test oracles, test automation strategy, formal methods, validation and verification, test evaluation.

1. Introduction

The typical software testing process is a human-intensive activity and as such it is usually unproductive, error prone, and often inadequately done. Major productivity enhancements can be achieved by automating techniques through tool development and use. Errors made in testing activities can be reduced by formalizing the methods used. Defining the software testing process ensures more accurate, more complete and more consistent testing than human-intensive, ad hoc testing processes. Moreover, accurately defining the testing process renders it repeatable, measurable, and hence improvable. Test case execution of the Software/System Under Test (SUT) is by nature a measurable experiment that is decomposed into a test selection step, in which the test cases that will validate (or not) the requirement are generated; a test execution step, in which the test cases are executed and the results of the execution collected; and a test satisfaction step, in which the results obtained during the test execution phase are compared to the expected results, i.e. it is decided whether the software behaves according to its specification. This last step is commonly performed with the help of an oracle, i.e. a reference system [1]. The result of this comparison gives the verdict for the requirement: 'yes' if it is validated, 'no' if it is not, and 'undecided' if the oracle is unable to analyze the result, or if the test execution did not return a result. If the answer is 'no', then the test set has found errors in the program, which is the goal of testing. The testing process is typically systematic in test data selection and test execution. For the most part, however, the effective use of test oracles has been neglected, even though they are a critical component of an effective design-for-testability method, which must include the concurrent development of test oracles within the testing process [2,3]. Oracle development can represent a significant effort that may increase design and implementation cost; however, overall testing and maintenance costs should be reduced. Oracle development must therefore be carefully integrated into the software development/testing life cycle [2,3]. Oracles must be designed, verified and validated for unit testing, through subsystem (integration) testing, up to system testing in a systematic, disciplined and controlled manner. Test oracles prescribe acceptable behavior for test execution. Without judging test results against an oracle, i.e. a reference system, testing does not achieve its goal of revealing failures or assuring correct behavior in a practical manner; manual result checking is neither reliable nor cost-effective. Our focus in this paper is on Test Oracle (TO) issues, i.e. problems and their solutions.
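The three-step structure just described can be made concrete with a small sketch. The select_tests, sut and oracle callables below are hypothetical placeholders introduced only for illustration, not part of any referenced method:

```python
from enum import Enum

class Verdict(Enum):
    YES = "yes"               # requirement validated
    NO = "no"                 # test revealed a failure
    UNDECIDED = "undecided"   # oracle cannot analyze, or no result was returned

def run_test_process(sut, oracle, select_tests):
    """Illustrative selection / execution / satisfaction pipeline."""
    verdicts = []
    for test_case in select_tests():               # test selection step
        try:
            actual = sut(test_case)                # test execution step
        except Exception:
            verdicts.append((test_case, Verdict.UNDECIDED))
            continue
        verdicts.append((test_case, oracle(test_case, actual)))  # test satisfaction step
    return verdicts
```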

The test setup creates the conditions necessary for some errors to manifest (assuming the errors are there). The test run then exercises the SUT through the suspect code, after which we can confirm whether there are errors in the software. The test setup is often neglected, on the assumption that a test will work from whatever state the SUT, data, and system are in. Yet it is obvious that software will behave differently based on the starting state of the SUT and the system. Is the SUT running? Is the account we are accessing already in the database, or do we need to add it? Is an error dialog currently on the screen? What permissions does the current user have? Are all the necessary files on the system? What is the current directory? Testers note and correct for all such conditions when testing manually, but these are potentially major issues for automated tests. Although the test designer typically is conscious only of the values given directly to the SUT, the SUT behavior is influenced by its data, program state, and the configuration of the system environment it runs in. The SUT may behave differently based on the computer system environment: operating system version, size of memory, speed of the processors, network traffic, what other software is installed, etc. Even more difficult to analyze or manage is the physical environment – we have worked on errors due to temperature, magnetic fields, electrostatic discharge, poor quality electric grounding, and other physical environmental factors that caused software errors. These errors may be unusual, but they occur, so we need to keep them in mind when automating tests.

Capture and comparison of results is one key to successful software testing. For manual tests this often consists of viewing results to determine whether they are anything like what we might expect. It is more complicated with automated tests, as each automated test case provides a set of inputs to the SUT and compares the returned results against what is expected. The term oracle may be used to mean several things in testing: the process of generating expected results, the expected results themselves, or the answer to whether or not the actual results are what we expected. In this paper, the word oracle is used to mean an alternate program or mechanism used for generating expected results. All software testing methods depend on the availability of an oracle, that is, some method for checking whether the system under test has behaved correctly on a particular execution. In much of the research literature on software test case generation or test set adequacy, the availability of oracles is either explicitly or tacitly assumed, but applicable oracles are not described. In current industrial practice of software testing, the oracle is often a human being. A person running a manual test is also able to perceive unexpected behaviors for which an automated verification does not check. This is a powerful advantage for manual tests; a person may notice a flicker on the screen, an overly long pause before a program continues, a change in the pattern of clicks in a disk drive, or any of dozens of other clues that an automated test would miss. The authors have seen automated tests "pass" and then crash the system, a device, or the SUT immediately afterwards. Although not every person might notice these things, and any one person might miss them sometimes, an automated test only verifies those things it was originally programmed to check. If an automated test is not written to check timing, it can never report a time delay. There are at least four domains of software engineering where uncertainty is evident: uncertainty in requirements analysis, uncertainty in the transition from system requirements to design and code, uncertainty in software re-engineering, and uncertainty in software reuse [4]. Software testing, like other development activities, is human intensive and thus introduces uncertainties and obeys the Maxim of Uncertainty in Software Engineering (MUSE) [4]. In particular, many testing activities, such as test result checking, are highly routine and repetitious and thus are likely to be error-prone if done manually, which introduces additional uncertainty. Care must be taken during test planning to decide on the method of results comparison. Oracles are required for verification, and the nature of an oracle depends on several factors under the control of the test designer and automation architect. Different oracles may be used for a single automated test and a single oracle may serve many test cases. If test results are to be analyzed from the satisfaction point of view, three types of oracle exist: a) an oracle that gives the exact outcome for every test case, b) an oracle that provides a range of program outcomes, and c) an oracle that cannot provide a program outcome for some test cases. Relatively little has been written about the oracle problem. Howden suggested the term "test oracle" [5]. Weyuker discussed the limits of oracles [1,6]. Development of specification-based oracles is considered in [7]. Barbey's strategy for object-oriented testing uses a formal specification that also serves as an oracle [8].
This approach draws on similar techniques developed for verification of abstract data types (ADTs), which also use the ADT specification as an oracle [9,10]. A survey of oracles appears in [11]. Poston lists four kinds of oracles: range checking (similar to assertions), manual projection (pre-specification), simulation and reference testing [12]. Reference testing is the process of judging the actual results and then saving acceptable results to support regression testing. Hoffman's oracle taxonomy and discussion of approximation, statistical and embedded (built-in check) oracles provide practical insights into using test oracles in software test automation [13-15].
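Reference testing, as mentioned above, is easy to illustrate. The sketch below is only a minimal example; it assumes a hypothetical baselines directory and JSON-serializable results:

```python
import json
from pathlib import Path

def reference_test(test_id, actual, baseline_dir="baselines"):
    """Reference testing sketch: compare against a previously judged-acceptable
    result, or record the current result as a candidate baseline."""
    baseline = Path(baseline_dir) / f"{test_id}.json"
    if baseline.exists():
        return actual == json.loads(baseline.read_text())   # regression comparison
    baseline.parent.mkdir(parents=True, exist_ok=True)       # first run: save for judging
    baseline.write_text(json.dumps(actual))
    return None                                               # undecided until a human judges it
```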

The paper begins with an outline of formal testing theory. In section 3 the test oracle characteristics and types are presented. In section 4 some test oracle problems are described, and in section 5 adequate solutions are proposed.

2. The oracles-based software testing: formal theory

Our approach for testing software from formal specifications relies on a solid theoretical framework. It is a generalization of the approach reported by G. Bernot, M. C. Gaudel, and B. Marre, i.e. the BGM method [9], and an adaptation of the BGM method to object-oriented systems presented in [8,10]. The BGM method was developed for testing data types by using formal specifications. The formal testing method is an approach to detecting errors in a program by validating its functionality without analyzing the details of its code, but by comparing it against a specification. The goal is to answer the question:

"Does a program satisfy its formal specification?"

or, in accordance with the goal of testing, to find out if a program does not satisfy its specification. The formal testing process is usually decomposed into the following three phases:

Phase 1 Test selection: the test cases that express the properties of the specification are generated.

Phase 2 Test execution: the test cases are executed and results of the execution collected.

Phase 3 Test satisfaction: the results obtained during the test execution phase are compared to the expected results.

This last phase is performed with the help of an oracle. Our oracle is a program based on external observation of the behavior of the tested program. This section presents the whole test process of the formal testing theory, starting from the test foundation and then focusing on the test selection and test satisfaction phases i.e. oracles-based testing. Throughout this section we use the following notations:

• SPEC: class of all specifications written in the specification language considered,

• PROG: class of all programs expressed in the language used for the implementation,

• TEST: class of all test sets that can be written,

• |= : satisfaction relationship on PROG × SPEC, expressing the validity of the program with respect to the specification,

• |=O : satisfaction relationship on PROG × TEST, deciding whether the test cases are successful or not for the program under test. |=O is the oracle satisfaction relationship.

However, the oracle |=O can only be constructed for a program P and a test set TSP if:

• TSP is practicable, i.e. it has a "reasonable" finite size, which is not the case for the exhaustive test set of an arbitrary specification, and

• TSP is decidable, i.e. it is possible to decide whether the program is acceptable or not for the given specification.

• TSP,H denotes a reduced test set obtained by sampling under hypotheses H, which state under which conditions the satisfaction of the specification is ensured by assuming that the program reacts in the same way as it does for the TSP test data.

2.1 Test foundation

The strategy used to answer the question "Does a program satisfy its formal specification?" is to select from the specification the services required of the program. For each service, the specification allows the selection of a number of scenarios. A scenario is called a test case and the set of all test cases makes up what we call the test set. The idea of the test selection phase is to derive, from a specification SP ∈ SPEC, a test set TSP ⊆ TEST that allows any incorrect program P ∈ PROG to be rejected (an incorrect program contains errors with respect to its specification) and any correct program P ∈ PROG to be accepted (a correct program does not contain errors with respect to its specification).

Definition 1. Satisfaction relationship |= and |= O

Let P ∈ PROG be a program, SP ∈ SPEC a specification, TSP ⊆ TEST a test set and O a test oracle. Let TF be a system/state transition or input-domain transformation (function or formula) and let Outcome be the expected program behavior; then TF(P) = Outcome1 is a TF of the system representing the semantics of P and TF(SP) = Outcome2 is a TF of the system representing the semantics of SP. Assuming there is a one-to-one morphism between the signatures of P and SP, the satisfaction relationship |= ⊆ PROG × SPEC is such that:

(P |= SP) ⇔ ( Outcome1= Outcome2).

The satisfaction relationship |= O ⊆ PROG × TEST is such that:

(P |=O TSP,H) ⇔ (∀ 〈TF, Outcome〉 ∈ TSP,H : ((TF(P) |=O TF and Outcome = true) or (TF(P) |≠O TF and Outcome = false))). ⋄
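Read operationally, Definition 1 says that the oracle relationship |=O checks every 〈TF, Outcome〉 pair in the test set. A minimal sketch, with a hypothetical oracle_check standing in for the decision of whether TF(P) satisfies TF:

```python
def satisfies_test_set(program, test_set, oracle_check):
    """P |=_O TS_{P,H} holds iff, for every <TF, Outcome> pair, the oracle's
    judgement of TF applied to the program agrees with the expected Outcome."""
    for tf, expected in test_set:
        observed = oracle_check(program, tf)   # True if TF(P) |=_O TF, else False
        if observed != expected:
            return False                       # one disagreement falsifies the relation
    return True
```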

The rejection of any incorrect program is expressed by the following relation:

(P |≠ SP) ⇒ (P |≠O TSP) (1)

i.e. an incorrect implementation P of the specification SP implies the failure of the test set TSP executed on a program P. A test set satisfying (1) is said to be valid.

The acceptance of any correct program is expressed by the following relation:

(P |= SP) ⇒ (P |= O TSP) (2)

i.e. a correct implementation P of the specification SP implies the success of the test set TSP executed on a program P. A test set satisfying (2) is said to be unbiased.

Consequently, the aim of the test selection phase is to find a test set TSP such that:

(P |= SP)⇔(P |= O TSP) (3)

i.e. the program P satisfies its specification SP if and only if it satisfies the test set TSP.

As noted by Weyuker and Ostrand [16], this equation is similar to a proof of correctness, and it can only be correct if TSP includes a complete coverage of SP, i.e. it contains enough test cases to cover all possible behaviors expressed by SP. The only test set that will fit in this equation is the exhaustive test set, because there is no way to guarantee the validity and unbiasedness of a non-exhaustive test set.

A valid and unbiased test set TSP can be used to test a program P only if TSP has an "acceptable" finite, or even "optimal", size. Limiting the size of test sets is performed by sampling: a trade-off must be found between size and accuracy. This trade-off is formally expressed by a set of reduction hypotheses HR applicable to the program P. The hypotheses HR state under which conditions the satisfaction of the specification is ensured by the satisfaction of the test set, by making the assumption that the program reacts in the same way for some test data. These hypotheses correspond to generalizations of the behavior of the program.
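As an illustration of how such a reduction hypothesis can shrink a (near-exhaustive) test set, the sketch below keeps one representative input per equivalence class, under a uniformity-style assumption that the program reacts the same way for all inputs in a class; partition_key is a hypothetical, tester-supplied function:

```python
def reduce_by_uniformity(exhaustive_inputs, partition_key):
    """Keep a single representative per class; valid only under the hypothesis
    that the program behaves uniformly inside each class."""
    representatives = {}
    for x in exhaustive_inputs:
        representatives.setdefault(partition_key(x), x)
    return list(representatives.values())

# Example: reduce a large integer domain to one negative, one zero and one positive input.
reduced = reduce_by_uniformity(range(-1000, 1000), lambda x: (x > 0) - (x < 0))
```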

Moreover, an oracle O and its satisfaction relation |= O can only be constructed for a program P and a test set TSP if TSP is decidable, i.e. the oracle is always able to compare all the necessary elements to determine the success or failure of any test case. This problem is solved using the oracle hypotheses, HO, which state that the oracle knows whether a test case is decided or not, and that a test case is either observable, or else the oracle contains criteria to augment the test case to make it observable.

Assuming that hypotheses H = HO ∪ HR have been made about the program, the test equation (3) becomes:

(P satisfies H ) ⇒ (P |= SP)⇔ (P |= O TSP,H) (4)

The equivalence relationship ⇔ is satisfied assuming some hypotheses about the program and that the test set TSP,H is valid and unbiased. A nice property of equation (4) is that the quality of the test set TSP,H is only dependent on the quality of the hypotheses. The drawback however is that proving that a program satisfies the hypotheses is not trivial.

2.2 Test selection

Assuming that we have an oracle O that ensures the observability of the system with the oracle hypotheses HO, the first task of the test procedure consists of selecting, from the specification, a test set that allows the exhaustive validation of each service required from the system. This is theoretically achieved by selecting an exhaustive test set, which contains all the tests that are required by the specification. However, the exhaustive test set selected is generally infinite, and it is necessary to apply a number of reduction hypotheses HR to the behavior of the program to obtain a finite test set of "acceptable" size. Note that the test procedure itself is successful if the test set helps find as many errors as possible, i.e. if the test set is unsuccessful. The efficiency of a test set, i.e. its capability to detect errors, must be weighed against quality criteria that depend on time, resource and budget constraints, i.e. software testing constraints (TC). Therefore, we proceed by successive reductions of the number of tests (see Fig. 1). Thus, when the test set is successful, the program is correct on condition that it satisfies the oracle and the reduction hypotheses. The idea of the test set selection procedure is to find TSP,H such that:

(P satisfies H) ⇒ (P |=O TSP,H ⇔ P |= SP).

The equivalence relationship in this formula is satisfied when the test set TSP,H is pertinent, i.e. valid (it discards any incorrect program) and unbiased (it rejects no correct program). According to the formal testing process, the goal of the test selection phase is to find a test set TSP,H that can be submitted to an oracle and which is pertinent, i.e. valid and unbiased.

Definition 2. Pertinent test set TSP,H

Given a set of hypotheses H, P ∈ PROG and a specification SP ∈ SPEC, the test set TSP,H ⊆ TEST is pertinent if and only if:


Fig 1. Formal testing process

• TSP,H is valid: (P satisfies H) ⇒ (P |= O TSP,H ⇒ P |= SP).

• TSP,H is unbiased: (P satisfies H) ⇒ (P |= SP ⇒ P |= O TSP,H ). ⋄

The test selection phase consists of selecting, from a possibly infinite set of TF corresponding to all the specification properties required of the program, a finite set of TF which is sufficient, under some hypotheses, to establish the preservation of these properties. The infinite set of TF is called the exhaustive test set. This exhaustive test set is then reduced to a final test set by applying reduction hypotheses to the program. The required property of the final test set TSP,H is that it be practicable.

2.3 Test satisfaction

Once a test set has been selected, it is used during the execution of the program under test. The results collected from this execution must then be analyzed. For this purpose, it is necessary to have a decision procedure to verify that an implementation satisfies a test set. This process follows the oracle-based test process model depicted in Fig. 2. The oracle is an equivalent mechanism that must decide the success or failure of every test case, i.e. whether the evaluation of test cases is satisfied or whether the test cases reveal errors.


Fig 2. Testing model with Oracle

Definition 3. Oracle

The oracle O = 〈|=O, DomO〉 is a partial decision predicate of a TF in a program P ∈ PROG. For each test case t ∈ TEST belonging to the oracle domain DomO, the satisfaction relationship |=O on PROG × TEST allows the oracle to decide:

• If t is successful in P (P |= O t).

• If the answer is inconclusive ( t is non-observable). ⋄

The oracle consists of a collection of equivalence relationships that compare similar elements of the scenario derived from the specification with the program under test; these elements are said to be observable. The problem is that the oracle is not always able to compare all the necessary elements to determine the success or failure of a test; these elements are said to be non-observable. This problem is solved using the oracle hypotheses HO, which are part of the possible hypotheses and collect all power-limiting constraints imposed by the realization of the oracle:

Definition 4. Oracle Hypotheses

The oracle hypotheses HO are defined as follows:

• When a test case t∈ TEST is observable ( t∈ DomO) for a program P, the oracle knows how to decide the success or failure of t:

(P satisfies HO) ⇒ ((P |= O t) ∨ (¬ (P |= O t))).

• When a test case t is non-observable for a program P (t ∉ DomO), the oracle has a set C of criteria ci allowing t to be observed:

(P satisfies HO ∧ P |=O (ci ∈ C ∧ ci(t))) ⇒ (P |=O t). ⋄

The first hypothesis stipulates that for any observable test case, the oracle is able to determine whether the test execution yields yes or no, i.e. that no test case execution remains inconclusive. The second hypothesis stipulates that for any non-observable test case, there are criteria to transform it into an observable test case. Since the oracle cannot handle all possible TF that are proposed as test cases, the oracle hypotheses must be taken into account to limit the test selection to decidable test formulas. Thus, it seems rational to place the oracle hypotheses HO at the beginning of the test selection phase of the test process. Moreover, a method call can lead to non-deterministic behaviors. Thus, another necessary oracle hypothesis is the assumption that this non-determinism is bounded and fair. In this way, non-deterministic mechanisms can be tested by a limited number of applications of the same test case. Several observations can be made from the model. Different types of oracles are needed for different types of software and environments. The domain, range, and form of input and output data vary substantially between programs. Most software has multiple forms of inputs and results, so several oracles may be needed for a single software program. For example, a program's direct results may include computed functions, screen navigations, and asynchronous event handling. Several oracles may need to work together to model the interactions of common input values. Although an oracle may be excellent at predicting certain results, only the SUT running in the target environment will process all of the inputs and provide all of the results. As with all testing tasks, it is important to understand the trade-offs between the various risks and the costs involved in testing. It is very easy to get lost in the wonderful capabilities of automated tests and lose sight of the important goal of releasing high-quality software with an "acceptable" size of test suite, i.e. a TSP,H that satisfies the software testing constraints (TC).
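A sketch of how the two oracle hypotheses could be applied mechanically follows; observable, decide and criteria are assumed helpers supplied by a concrete oracle implementation, not part of the formal definitions above:

```python
def oracle_verdict(program, test_case, observable, decide, criteria):
    """First hypothesis: an observable test case is always decided (pass or fail).
    Second hypothesis: a non-observable test case is augmented by some criterion c_i
    until it becomes observable; otherwise the verdict stays inconclusive."""
    if observable(test_case):
        return decide(program, test_case)          # True (pass) or False (fail)
    for criterion in criteria:
        augmented = criterion(test_case)
        if observable(augmented):
            return decide(program, augmented)
    return None                                     # inconclusive: no criterion applies
```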

Definition 5. Practicable Test Context i.e. test suite

Given a specification SP ∈ SPEC, a test context (H, TSP,H )O is defined by a set of hypotheses H about a program under test P ∈ PROG, a test set TSP,H ⊆ TEST and an oracle O ∈ PROG. (H, TSP,H )O is practicable if:

• TSP,H is pertinent and has an “acceptable” finite size.

• O = 〈|=O, DomO〉 is decidable, i.e. it is defined for each element of TSP,H (TSP,H ⊆ DomO).

In a practicable test context (H, TSP,H )O, the test set TSP,H is said to be practicable. ⋄

To simplify, (H, TSP,H)O is written (H, T)O in the rest of the paper. The selection of a pertinent test set T of "acceptable" size is performed by successive refinements of an initial test context (H0, T0)O, which has a pertinent test set T0 of "unacceptable" size, until a practicable test context (H, T)O is obtained. This refinement of a context (Hi, Ti)O into (Hj, Tj)O induces a pre-order between contexts (a sketch of this refinement loop is given after the list below):

(H i, T i )O ≤ (H j, T j )O

At each step, the pre-order refinement context (H i, T i )O ≤ (H j, T j )O is such that:

• The hypotheses Hj are stronger than the hypotheses Hi: Hj ⇒ Hi.

• The test set TjSP,Hj is included in the test set TiSP,Hi: TjSP,Hj ⊆ TiSP,Hi.

• If P satisfies Hj then (Hj, Tj)O does not detect more errors than (Hi, Ti)O:

(P satisfies Hj) ⇒ (P |=O TiSP,Hi ⇒ P |=O TjSP,Hj).

• If P satisfies Hj then (Hj, Tj)O detects at least as many errors as (Hi, Ti)O:

(P satisfies Hj) ⇒ (P |=O TjSP,Hj ⇒ P |=O TiSP,Hi).
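The successive refinement of contexts can be pictured as a simple loop that strengthens the hypotheses and shrinks the test set until an "acceptable" size is reached. This is only an illustrative sketch; strengthen_hypotheses and reduce_tests are hypothetical strategy functions:

```python
def refine_until_practicable(initial_tests, strengthen_hypotheses, reduce_tests,
                             max_size, max_rounds=100):
    """Refine (H_i, T_i)_O into (H_j, T_j)_O: H_j is stronger than H_i and
    T_j is a subset of T_i, stopping once the test set is practicably small."""
    hypotheses, tests = [], list(initial_tests)
    for _ in range(max_rounds):
        if len(tests) <= max_size:                       # "acceptable" finite size reached
            break
        hypotheses = strengthen_hypotheses(hypotheses)   # H_j implies H_i
        tests = reduce_tests(tests, hypotheses)          # T_j is a subset of T_i
    return hypotheses, tests
```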

3. The test oracle characteristics and types

The oracle can be a program specification as described in the formal test theory (section 2), a table of examples, or simply the programmer/tester's knowledge of how a program should operate. It is a mechanism to evaluate (in the test satisfaction step) the actual results of a test case as pass (satisfied) or no pass (not satisfied). This evaluation requires a result generator to produce the expected results for a test case and a comparator to check the actual results against the expected results. It may be manual, automated, or partially automated. Because software engineering processes and products include elements of human participants (e.g., designers, testers), information (e.g., design diagrams, test result generation and comparison), and tasks (e.g. "design an object-oriented system model" or "execute a regression test suite"), uncertainty occurs in most if not all of these elements. Software testing, like other development activities, is human intensive and thus introduces uncertainties and obeys the Maxim of Uncertainty in Software Engineering (MUSE) [4]. In particular, many testing activities, such as test result checking, are highly routine and repetitious and thus are likely to be error-prone if done manually, which introduces additional uncertainty. Humans carry out test planning activities at an early stage of development, thereby introducing uncertainties into the resulting test plan. Also, test plans are likely to reflect uncertainties that are, as described above, inherent in software artifacts and activities.

As described in the previous section, test enactment includes test selection, test execution, and test result evaluation (satisfaction). Test enactment is inherently uncertain, since only exhaustive testing in an ideal environment guarantees absolute confidence in the testing process and its results. This ideal testing scenario is infeasible for all but the most trivial software systems. Instead, multiple factors exist, discussed next, that introduce uncertainties into test enactment activities:

▪ Test selection is the activity of choosing a finite set of elements (e.g., requirements, functions, paths, data) to be tested out of a typically infinite number of elements. The fact that only a finite subset of elements is selected inevitably introduces a degree of uncertainty regarding whether all defects in the system can be detected.

▪ Test execution involves actual execution of system code on some input data. Test execution may still include uncertainties, however, as follows: the system under test may be executing on a host environment that is different from the target execution environment, which in turn introduces uncertainty. Furthermore, observation may affect testing accuracy with respect to timing, synchronization, and other dynamic issues. Finally, test executions may not accurately reflect the operational profiles of real users or real usage scenarios.

▪ Test result evaluation is likely to be error-prone, inexact, and uncertain. Test result checking is afforded by means of a Test Oracle that is used for validating results against stated specifications.

Test oracles can be classified into five categories [7], offering different degrees of confidence. Specification-based oracles instill the highest confidence, but still include uncertainty stemming from discrepancies between the specification and the customer's informal needs and expectations. The completely trusted test oracle, i.e. one with no uncertainty, is the Perfect or Ideal Oracle (P|IO). A P|IO would be behaviorally equivalent to the SUT implementation; in effect, it would be a defect-free version of the SUT. It would accept every input specified for the SUT and would always produce a correct result. Developing a P|IO is therefore at least as difficult as solving the original design problem and imposes the additional constraint of correctness. If a P|IO were available and portable to the target environment, we could dispense with the SUT and fulfill the application requirements with the P|IO. In reality we test software under TC constraints that require the final test set TSP,H to be practicable (see section 2.3), so the test oracle must also be practicable, as given by the next definition:

Definition 6. Practicable Test Oracle

Given a specification SP ∈ SPEC and a test context (H, TSP,H)O defined by a set of hypotheses H about a program under test P ∈ PROG, a test set TSP,H ⊆ TEST and an oracle O ∈ PROG, an oracle OP ⊆ O is practicable if:

Under TC there is an OP that satisfies the practicable test context (H, TSP,H)OP ⊆ (H, TSP,H)O. ⋄

Test validity (satisfaction) is a result of the aforementioned factors. A buggy oracle can cause correct results to be evaluated as no pass (a spurious no pass). Time will probably be wasted trying to debug a nonexistent problem, and the resulting "fix" may cause further distortions. Or, a test with incorrect actual results can be evaluated as pass (a spurious pass). This kind of error typically goes undetected until a user identifies the supposedly correct output as incorrect. So test validity can be characterized as valid pass, valid no pass, spurious pass, spurious no pass, coincidental pass, and coincidental no pass [11, see examples on page 922]. The design, implementation, and output of an executable oracle should be verified. As noted earlier, the problem of oracle verification is at least as hard as application verification. From a theoretical perspective, we cannot be completely sure that a valid pass or valid no pass has been obtained without also proving the correctness of both the SUT and the oracle. From a practical perspective, the effort required for complete certainty would be prohibitively expensive even when technically feasible. The risk of spurious or coincidental results can be reduced to an acceptable level by verifying the oracle and meeting the exit criteria of the appropriate test design pattern.

3.1 The test oracle characteristics

When we think of software testing and the test results, it’s usually in a binary sense: the result is right or wrong; the test passed or failed. We generally don’t allow for “maybe it’s OK” or “possibly right” as outcomes. We consider the test outcomes to be deterministic; there is one right answer (and we know what it is). Below is a list of some examples of such deterministic verification strategies.

Deterministic Strategies

• Parallel function

– Previous version

– Competitor

– Standard function

– Custom model

• Inverse function

– Mathematical inverse

– Operational inverse (e.g. split a merged table)

• Useful mathematical rules (e.g. sin²(x) + cos²(x) = 1)

• Saved result from a previous test.

• Expected results are encoded into data.

Each example provides some means for determining whether or not a test result is correct. It is useful to note that, although the strategy allows us to pass or fail a particular result, in many cases there are ways the SUT can give us a wrong result and yet the test can pass (e.g., an undetected error in the previous version or in the competitor's product, because of the uncertainty maxim [4]). There are several interesting characteristics relating an oracle to the SUT. Hoffman [14] provides a list of some useful characteristics based on the correspondence between the oracle and the SUT. The results predicted by an oracle can range from having almost no relationship to the SUT behaviors to exact duplication of them. Completeness, for example, can range from no predictions (which may not be very useful) to exact duplication in all result categories (an expensive reimplementation of the SUT). The following patterns for test oracles are presented in Table 1, based on the test oracle design strategies in [11].
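Two of the deterministic strategies listed above (an inverse function and a useful mathematical rule) can be encoded directly as oracles. The sketch below is only illustrative: the tolerance values are assumptions, and the Python standard library is used merely as a stand-in SUT:

```python
import math

def inverse_function_oracle(sut_forward, known_inverse, x, tol=1e-9):
    """Inverse-function strategy: applying a trusted inverse to the SUT's output
    should recover the original input."""
    return abs(known_inverse(sut_forward(x)) - x) <= tol

def trig_identity_oracle(sut_sin, sut_cos, x, tol=1e-9):
    """Mathematical-rule strategy: sin²(x) + cos²(x) must equal 1."""
    return abs(sut_sin(x) ** 2 + sut_cos(x) ** 2 - 1.0) <= tol

# Stand-in usage: math.exp/math.log and math.sin/math.cos play the role of the SUT.
assert inverse_function_oracle(math.exp, math.log, 2.5)
assert trig_identity_oracle(math.sin, math.cos, 0.7)
```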

The oracle patterns can be ranked by fidelity, generality, maintainability, and cost. Fidelity is based on the following oracle characteristics: completeness of information, temporal relationships, and accuracy of information. Generality is based on usability of the oracle or of its results, and on complexity characteristics. The maintainability and cost characteristics were explained above. Characteristics such as control and early availability, visibility, extensibility, and modularity in system simulation are of great value in system/software requirements, design and test oracle solutions using the highest-ranked oracle patterns, as in the case study presented in [2,17] (a sketch of the gold-standard approach is given after Table 1).

|Oracle Patterns |

|Approach |Pattern Name |Intent |

|Judging |Judging |The tester evaluates pass/no-pass by looking at the output on a screen, a listing, using a debugger, or another suitable human interface. |

|Pre-Specification |Solved Example |Develop expected results by hand or obtain them from a reference work. |

| |Simulation |Generate exact expected results with a simpler implementation of the SUT (e.g., a spreadsheet). |

| |Approximation |Develop approximate expected results by hand or with a simpler implementation of the SUT. |

| |Parametric |Characterize expected results for a large number of items by parameters. |

|Gold Standard |Trusted System |Run new test cases against a trusted system to generate results. |

| |Parallel Testing |Run the same live inputs into the SUT and a trusted system. Compare the output. |

| |Regression Testing |Run an old test suite against a partially new system. |

| |Voting |Compare the output of several versions of the SUT. |

|Organic |Smoke Test |Use the basic operability checks of the run-time environment. |

| |Reversing |Reverse the SUT's transformation. |

| |Built-in Test |Don't develop expected results. Implement assertions that define valid and invalid results. |

| |Executable Specification |Actual input values and output values are used to instantiate the parameters of an executable specification. A specification checker will reject an inconsistent instantiation, indicating incorrect output. |

| |Built-in Check |Compare expected and actual total, checksum, or similar encoding. |

| |Generated Implementation |Generate a new implementation from a specification; compare output from the SUT and the generated implementation for the same test case. |

| |Different But Equivalent |Generate message sequences that are different but should have the same result; run them on separate objects and compare for equality. |

Table 1. Patterns for test oracles
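The Trusted System and Parallel Testing patterns in Table 1 reduce to running the same inputs through the SUT and a trusted implementation and comparing their outputs, as in this sketch (sut and trusted_system are hypothetical callables):

```python
def gold_standard_oracle(sut, trusted_system, inputs):
    """Gold-standard comparison: report every input on which the SUT and the
    trusted system (e.g. a previous release) disagree."""
    disagreements = []
    for x in inputs:
        expected, actual = trusted_system(x), sut(x)
        if actual != expected:
            disagreements.append((x, expected, actual))
    return disagreements   # an empty list means all compared test cases passed
```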

4. Test oracle problems

The oracle should be designed as integral part of a test suite and its test harness. Problems of oracle design include the following:

➢ Given the responsibilities and implementation of the SUT, how can expected results be produced? Can they be produced at all?

➢ Is there any advantage to combining (separating) generation of test inputs and expected results?

➢ Should the expected results be available before, during, or after a test run?

➢ How can expected and actual results be stored to support regression testing?

➢ Is evaluation done by direct comparison or by indirect checking?

➢ What should be the scope of checking? Just the public interface of a class, all of the objects that compose a class, or the entire address space of the process in which the SUT is running?

➢ Can evaluation be partially manual and partially automatic?

➢ What is the test run frequency? Would a fast, reliable oracle cut test cycle time, allowing many more tests to run? Would this improvement in test effectiveness offset any increased cost to develop the faster oracle?

➢ What level of precision is required of scalar variables? Is exact equality required to pass? Do you understand exactly how floating-point operations work for the SUT? (A tolerance-based comparison is sketched after this list.)

➢ What kind of probes, traces, or access methods are necessary to obtain the actual results?

➢ How much platform independence is necessary? Must the oracle work on many different computers?

➢ What are the observable output formats – GUI bitmap, GUI widgets, character strings on standard output, file system, database system, network packets, device-level bitstreams, analog (voice, data, audio, video, or other RF signals), mechanical movements, or other physical quantities?

➢ Is the observable output persistent or must provisions be made to capture and save it?

➢ Will the observable results be produced on a single system or must the test harness coordinate collection of results from nodes in a distributed system?

➢ Will localization (translation between human languages) be required?

➢ To what extent will the state of the runtime environment (apart from the SUT) need to be evaluated?

➢ How much time is available to generate and evaluate the results?

➢ Should expected results be usable by humans? By subsequent test cases? In subsequent regression test runs?
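For the precision question raised above, exact equality is rarely the right comparison for floating-point results; a tolerance-based comparator such as the sketch below (with illustrative default tolerances) is usually preferable:

```python
import math

def results_match(expected, actual, rel_tol=1e-6, abs_tol=1e-9):
    """Compare scalar results, using relative/absolute tolerances for floats
    and exact equality for everything else."""
    if isinstance(expected, float) or isinstance(actual, float):
        return math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol)
    return expected == actual
```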

5. Test oracle solutions

Having faced the oracle design problem many times in practice, and drawing on the published literature, we offer some suggestions and tips for tackling this difficult problem:

✓ If possible, review some expected results produced by your oracle and your assumptions with system users. Values that may look correct to a developer or tester may be subtly incorrect due to some special constraints.

✓ The more complex the oracle, the greater the chance of spurious test results. Try for the simplest solution.

✓ If the oracle is specification-based, do not forget to verify the specification. Scrutinize the specification for omissions, contradictions, and the like.

✓ Try the oracle for test cases that have obvious expected results – for example, all zeros in, all zeros out. Such test cases check the oracle and comparator.

✓ If practical and feasible, try using several independent sources. For example, if you are picking values from a table in a reference work, try to find several other reference works that provide the same information, and interleave values from these sources. If you are using an existing system as the oracle, try running the system in different configurations or on different platforms, varying the time of day, altering the background load, and so on.

✓ Although writing a program to generate millions of test case inputs is usually not difficult, producing their expected results is often equivalent to developing the SUT. Two of the oracle patterns can partially overcome this limitation. A Built-in Test oracle will detect some, but not all, incorrect output from any input. You may also be able to use an existing system as a gold-standard oracle: run the existing system with your test inputs and it will automatically produce some or all of your expected results (Trusted System oracle pattern).

✓ Design-for-testability tip: Consider abandoning a test strategy or test case if it requires a very difficult or costly oracle. Try to use existing code, files, or test suites as much as possible.

✓ Design-for-testability tip: Consider reworking an application specification or requirement if its oracle would be very difficult or costly to develop.

✓ Consider a partial or approximating oracle. Don't assume that your oracle must generate complete expected results for every possible input and state. Concentrate on generating outputs that must be correct or that are difficult and/or time-consuming to check by hand (a partial, property-checking oracle is sketched after this list).

✓ Consider using several kinds of oracles to offset weakness. For example, you can use an existing system to generate about half of the critical outputs for a new system. You could implement built-in test assertions to check relationships on the newer output.
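A partial, built-in-check style oracle for generated inputs can be as simple as asserting properties that any correct output must satisfy. The sketch below is illustrative only: the sorting SUT is a stand-in, and the properties are deliberately incomplete (they would, for example, miss an element being silently replaced by another value that preserves length and ordering):

```python
import random

def partial_sort_oracle(inputs, outputs):
    """Built-in check: every correct sort preserves length and yields
    non-decreasing output; it does NOT verify the exact expected result."""
    return (len(outputs) == len(inputs)
            and all(a <= b for a, b in zip(outputs, outputs[1:])))

# Generate many inputs cheaply; the property check stands in for full expected results.
sut = sorted   # stand-in SUT; in practice this is the implementation under test
for _ in range(1000):
    data = [random.randint(-100, 100) for _ in range(random.randint(0, 20))]
    assert partial_sort_oracle(data, sut(data))
```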

Oracle development can represent significant effort that may increase design and implementation cost; however, overall testing and maintenance costs should be reduced. Oracle development must therefore be carefully integrated into the software development life cycle. Oracles must be designed for unit testing, subsystem testing, and integration testing in a disciplined manner.

Oracles can be designed directly from formal design specifications [7]. The purpose of a formal specification is to describe the behavior of a software system at the highest level of abstraction without including implementation details. A specification that is executable can be tested to determine whether its behavior satisfies the informal requirements. The term "executable specification" usually refers to specifications that produce rather than check output. It is possible for incorrect output to be accepted by an incorrect oracle; both the oracle and the program can fail on the same data. Such coincidental correctness is a serious problem recognized in studies of n-version programming [18,19]. In n-version programming several implementations are developed from the same specification and the execution behavior of the different versions is compared. Coincidental correctness between a program and its test oracle is unlikely because of their different objectives: a program must produce an answer, while an oracle simply checks that answer. If an incorrect oracle is used as a specification, then its errors may be propagated to the implementation; such errors will not be discovered during oracle-based testing.

Also note that oracles and programs should not share subroutines. A common problem in software maintenance occurs when program code is modified but the documentation is not. Over time, engineers learn not to trust the documentation and, as a result, only the program code can be used to help understand the system. Test oracles are actually a formalized form of documentation, and consistency between the test oracle documentation and the program implementation can be ensured. Programs can be tested using the oracles after modifications, and the oracles are updated as required to complete this regression testing process. With such an arrangement, test oracles can provide another accurate description of a software system. In conclusion, a test oracle can be devised from a requirements specification, from design documentation, from a former version of the same program or a similar program, from a human being, and, in our experience, from computer-based simulation.

6. Conclusion

Automated test oracles are critical to any automated software testing process. Without test oracles, rigorous testing is not possible. Oracles allow the testing of large amounts of data; therefore, they are appropriate for use with the most discriminating testing criteria, random testing techniques, and reliability measurement. Effective test oracles must be designed before (or concurrently with) the implementation of the software. The main problems, and corresponding solutions, of oracle development, which should be part of the software development process, have been described. Future research should focus on reduction hypotheses HR that limit the size of the test set by reducing the number of cases that must be tested while preserving the integrity of the test set. These hypotheses are the uniformity hypotheses, which are strong and correspond to a 1:n generalization of the program behavior; the regularity hypotheses, which are weaker and correspond to an n:m generalization of the program behavior; and the incremental hypotheses, which take into account property preservation.

References:

[1] Elaine J. Weyuker: The oracle assumption of program testing. In 13th International Conference on System Sciences, 44-49, Hawaii, USA, 1980.

[2] Ljubomir Lazić, D. Velasević, N. Mastorakis: A framework of integrated and optimized software testing process, WSEAS TRANSACTIONS on COMPUTERS, Issue 1, Volume 2, January 2003.

[3] Debra J. Richardson: TAOS: Testing with Analysis and Oracle Support, In Proceedings of the 1994 International Symposium on Software Testing and Analysis (ISSTA), 138-153, Seattle, August 1994.

[4] H. Ziv and D.J. Richardson, Constructing Bayesian-network Models of Software Testing and Maintenance Uncertainties, International Conference on Software Maintenance, Bari, Italy, September 1997.

[5] William E. Howden and Peter Eichhorst: Proving properties of programs from program traces. In Tutorial: Software Testing and Validation Techniques, IEEE Computer Society Press, New York, 1978.

[6] Elaine J. Weyuker: On testing non-testable programs. Computer Journal 25(4):465-470,1982.

[7] Debra J. Richardson, S. L. Aha, and T. O. O'Malley: Specification-based test oracles for reactive systems. In Proceedings of the 14th International Conference on Software Engineering, Los Alamitos, Calif., IEEE Computer Society Press, May 1992.

[8] Stéphane Barbey: Test Selection for Specification-Based Unit Testing of Object-Oriented Software Based on Formal Specifications. Ph.D. dissertation, Lausanne, Switzerland, Swiss Federal Institute of Technology, 1997.

[9] Gilles Bernot, M. C. Gaudel, and B. Marre: Software testing based on formal specifications: a theory and a tool. Software Engineering Journal 6(6), 337-405, November 1991.

[10] P. Dauchy, M. C. Gaudel, and B. Marre: Using algebraic specifications in software testing: a case study on the software of an automatic subway. Journal of Systems and Software 21(3), 229-244, June 1993.

[11] Robert V. Binder: Testing Object-Oriented Systems: Models, Patterns, and Tools. Addison-Wesley, 2000.

[12] Robert M. Poston: Automated testing from object models. Communications of the ACM 37(9), 48-58, September 1994.

[13] James M. Bieman and Hwei Yin: Designing for software testability using automated oracles. Proc. International Test Conference, 900-907, September 1992.

[14] Douglas Hoffman: Heuristic test oracles. Software Testing & Quality Engineering 1(2), 28-32, March/April 1999.

[15] Douglas Hoffman: A taxonomy of test oracles. Quality Week 1998.

[16] Elaine J. Weyuker and Thomas J. Ostrand: Theories of program testing and the application of revealing subdomains. IEEE Transactions on Software Engineering, SE-6(3), 236-246, May 1980.

[17] Ljubomir Lazić and Dušan Velašević: Applying simulation to the embedded software testing process, to be published.

[18] J. Knight and N. Leveson: An experimental evaluation of the assumption of independence in multi-version programming. IEEE Trans. Software Engineering, SE-12(1), 96-109, January 1986.

[19] B. Littlewood and D. R. Miller: Conceptual modeling of coincident failures in multiversion software. IEEE Trans. Software Engineering, 15(12), 1596-1614, December 1989.
