Program Analysis Lecture 9



Program Analysis Lecture 9 (30/5/02)

Shape Analysis via 3-Valued Logic TVLA + Applications cont.

Mooly Sagiv

Lecture notes by Eran Globen

Additional notes are in blue

Part 1: Shape Analysis via 3-Valued Logic Cont.

As we saw last class the 3-valued logic abstract interpretation is sound but not very precise. In order to refine the abstraction and gain a more precise abstract interpretation we use the following principle.

The Instrumentation Principle

Increases precision by storing the truth-value of some designated formulae. The value of those extra predicates will allow us to differentiate between cells with different predicate values. By the embedding theorem (see previous class) we know that the value of these predicates is kept sound in the abstract world as well (this is demonstrated in Figures 1, 2)

Introduce predicate-update formulae to update the extra predicates. In order to increase efficiency the extra predicates are updated with each step as opposed to being calculated from scratch with each change.

The following is an example of such a predicate.

IS(v) – Is Shared – This predicate evaluates to 1 if the object v is pointed to by more than one object.

is[sel](v) = (v1,v2: sel(v1,v) ( sel(v2,v) ( (eq(v1, v2)

[pic]

Figure 1

In the following figure one of the cells is shared. This change is reflected in the abstraction resulting in a more precise representation of the data structure.

[pic]

Figure 2

Updating sharing x.sel:=y

Following is the predicate update formulate for is

is [sel]’(v) :=

((v1:x(v1)?

(y(v)?

(v2:sel(v2, v) ((x(v2)

:(sel(v1, v)?

(v2, v3: (is[sel](v2, v3, v) (

(x(v2) ((x(v3)

: is[sel](v))

:is[sel](v))

with the following:

(is[sel](v2, v3, v) = sel(v2, v)(sel(v3, v) ( (eq(v2, v3)

(is[sel] is true if v is pointed to by v2 and v3 and v2, v3 are different cells.

Other Instrumentation

For doubly linked lists:

c[cdr,car](v)=(v1: cdr(v, v1)(car(v1, v)

c[car,cdr](v)=(v1: car(v, v1)(cdr(v1, v)

Notice that in doubly-linked lists there are several ways to reach a node. Still when

c[cdr, car](u) and c[car, cdr](u) are both 1, we see that u is an element of a doubly linked list.

reachability:

r[sel](v1, v2) = sel*(v1, v2)

r[x, sel](v) = (v1: x(v1)(sel*(v1, v)

r[x](v) = (v1: x(v1)((car|cdr)*(v1, v)

The importance of reachability will be explained later.

For sorted lists:

inOrder[sel,dle](v) = (v1: sel(v, v1) (dle(v, v1)

inROrder[sel,dle](v) = (v1: sel(v, v1) (dle(v1, v)

Semantic Reduction

• Improves the precision of the analysis by recovering properties of the program semantics.

• Given a Galois connection (L1, (, (, L2)

An operation op:L2(L2 is a semantic reduction

if (l(L2 op(l)(l

and ((op(l)) = ((l)

• Can be applied before and after basic operations

• Preserves soundness

• The decision to apply or not apply a given semantic reduction is mostly performance based.

• In a generic system like TVLA semantic reduction is very important as it allows defining the meaning of program operations in a modular way by ignoring the interdependencies between different predicates and then using a semantic reduction to improve the precision by observing certain dependencies.

• Semantic reductions need not lead to the best (induced) abstract interpretation. That is it may be that ((concrete-op(((l)) ( ((op(((l)) for a concrete operation concrete-op and a semantic reduction (. The intuitive reason is that ( may not take into account all the information which is available on the concrete semantics and how it affects the abstraction.

Materialization

An example of a semantic reduction is the materialization operation as shown in Figure 4. In structure 3 the value of x(u) is ½. In structure 4 we see that the summary node has been split into two and the value for x(u2) is 1, yielding a more precise abstraction. Materialization “creates” the cells with a definite value for the core variable predicates so that those values are more in focus.

Notice this operation is not formally a semantic reduction since structure 2 represents some concrete stores that 4 does not represent. For example, structure 2 also concrete stores in which x points to the 3rd element. This problem is avoided in TVLA by decoupling the semantic reduction from the abstract meaning of operations

[pic]

Figure 3

The Focusing Principle

• The purpose of the focusing principle is to increase precision.

• More specifically we wish to “Bring the predicate-update formula into focus” (Force 1/2 to 0 or 1) and only then apply the predicate-update formulae.

• Generalizes materialization

• Focusing is sound in the sense that concretization of structure S and focus(S) is the same.

• This is a true a semantic reduction and it can be changed by using different formulae.

Following are a few samples of the focus operation on different formulae.

Step 1: Focus on (1(v) = ( v1: x(v1) ( cdr(v1,v)

[pic]

Figure 4

All three right hand structures have definite values for the focused formula.

The second structure actually represents a list with only 2 cells since cdr(u1, u)=1 and surely u1 can only point to one cell. The third structure is a list with u1 pointing to only part of the list.

Notice that the 3 structures on the right represent exactly the same stores as the ones represented in the left.

After focusing on a formula we can update the appropriate predicate. In this case x is a pointer traveling on a list.

Step 2: Evaluate Predicate-Update Formulae

x’(v) = ( v1: x(v1) ( cdr(v1,v)

[pic]

Figure 5

As explained before summary node u in the second figure can only represent one actual cell. In the same way we can see that summary node u.1 in the third structure can only represent one node. Also we know that if the original structure was a simple linked list then some of the cdr links between u.1 and u.0 are impossible. We will see these problems handled later when coercion is introduced.

The Focus Operation

Focus: Formula((P(3-Struct) (P(3-Struct)) Focus is a partial function. For some formulae it might return an infinite number of structures and so is undefined for those formulae. For every formula (

Focus(()(X) yields structure in which ( evaluates to definite values in all assignments.

Focus(() is a semantic reduction but Focus(()(X) may be undefined for some X

For more details about the focus operation see Tal Lev-Ami’s Master thesis at

Focus on ( v1: cdr(v1,v)

[pic]

Figure 6

Focusing on the structure in Figure 8 will result in an infinite amount of structures as shown in Figure 10. . The main problem is that the summary node u cannot be explicitly included since the formula evaluates to ½ on this structure. Thus, we have to keep around all the structures that represent the same concrete elements and have definite (0 or 1) values. TVLA detects such situation at analysis time and yields an exception. A TVLA user needs to guarantee that her/his analysis will not reach such situations.

For example, in shape analysis we only focus on the L- and R-values of pointer expressions and thus guarantee that infinite structures cannot occur.

[pic]

Figure 7

As was shown in figure 6 there might be additional information about the data structures which we can use. We can use that information to further improve our semantics so it can rule out inconsistent structures or sharpen values of instrumentation predicates.

The Coercion Principle

• Another Semantic Reduction

• Can be applied after Focus or after Update or both

• Increases precision by exploiting some structural properties possessed by all stores (Global invariants) (e.g. a variable can only point to one cell).

• Structural properties captured by constraints (e.g. a certain structure is a acyclic list).

• Apply a constraint solver

A sample result after applying a Constraint Solver:

[pic]

Figure 8

After applying the constraint solver we can easily see that variable x is now pointing at the next element in a list.

Sources of Constraints

Properties of the operational semantics [x(v1) (x(v2)(eq(v1, v2)]

• Domain specific knowledge

• Instrumentation predicates [is[sel](u.1) = 0]

• User supplied

Format of Constraints

((p(v1, v2, …, vk) i(j ( vi(vj

(((p(v1, v2, …, vk) i(j ( vi(vj

The right hand side of a constraint can only contain an atomic formula.

Interpretation: If LHS is 1 so is RHS. this is a strong definition as the RHS cannot be 1/2 if the LHS is 1.

Constraints are preserved under tight embedding

Example Constraints

x(v1) (x(v2)(eq(v1, v2) – A variable can only point to one cell.

sel(v, v1) (sel(v,v2)(eq(v1, v2) – A cell can only point to one cell.

sel(v1, v) (sel(v2,v)((eq(v1, v2)(is[sel](v) – A cell is pointed to by 2 different cells iff it is shared.

Example Constraints

x(v1) (x(v2)(eq(v1, v2) (1)

x(v1) ((eq(v1, v2) ((x(v2) (1a)

sel(v, v1) (sel(v,v2)(eq(v1, v2) (2)

sel(v, v1) ( (eq(v1, v2) ( (sel(v,v2) (2a)

sel(v1, v) (sel(v2,v)((eq(v1, v2)(is[sel](v) (3)

sel(v1, v) (sel(v2,v)((is[sel](v)( eq(v1, v2) (3a)

sel(v1, v) ((eq(v1, v2) ((is[sel](v)( (sel(v2,v) (3b)

is[sel](v) (sel(v1, v) (sel(v2,v)((eq(v1, v2) (4)

From each of the original constraints (1, 2, 3 and 4) additional constraints were created, these constraints are the logical consequences of the original one and can be created automatically.

Examples for applying a Constraint Solver

[pic]

Figure 9

Applying constraint (1) we can see that summary node u.1 is in fact just one cell.

Applying constraint (3b) we can eliminate some of the cdr edges with indefinite predicate values.

Part 2: TVLA

From “TVLA: A framework for Kleene logic based static analysis”: TVLA is a “YACC”-like framework for automatically constructing static-analysis algorithms from an operational semantics where the operational semantics is specified using logical formulae.

The TVLA System

• Input:

o SOS using Predicate Logic.

o Control Flow Graph.

o A set of 3-valued structures at the start node.

o Implementation hints.

o First two are supplied in .tvp (three value program) file. Last two in a .tvs (three value structure) file.

• Output

o The set of 3-valued structures at every control flow node (labeled directed graphs)

o Textual error messages

o Output is created in a postscript file.

Partial info on TVLA command line

tvla [-d]

–d sets debug mode. Using the –d flag shows the structure in different phases of execution.

For more information see the TVLA user manual.

Some sample analyses

Creating a list of unknown (at compilation time) size.

|/* list.h */ |/* create.c */ |

|typedef struct node { |Elements create() { |

|struct node *n; |Elements *f, *x; |

|int data |int i, size; |

|} *Elements; |for(i=0; in = x; |

| |x = f; |

| |} |

| |return x; |

| |} |

Running TVLA with p and an empty structure supplied by s.

tvla create include\empty –d (in debug mode)

tvla create include\empty

[pic]

Figure 1: Control flow graph

|[pic] | | |

| |[pic] |[pic] |

|Figure 2.b: Location n_4 |Figure 2.a: location n_5 |Figure 2.b: Location Exit |

Figure 1 shows the control flow graph as supplied to TVLA.

Figure 2 shows some of the structures reached during the execution. Dotted lines have a predicate value of ½. Double edged nodes are summary nodes.

Reversing a linked list.

|typedef struct node { |Elements * reverse(Elements *x) { |

|struct node *n; |Elements *y, *t; |

|int data; |y = NULL; |

|} *Elements; |while (x != NULL) { |

| |t = y; |

| |y = x; |

| |x = x ( n; |

| |y ( n = t; |

| |} |

| |return y; |

| |} |

Running TVLA on p with an initial structure of a list (s)

tvla reverse include\list

[pic]

Figure 3

|[pic] |[pic] |[pic] |

|Figure 4a: Location n_1 |Figure 4b: Location n_2 |Figure 4c: Location exit |

Figure 4a shows the initial structure supplied to TVLA in s.

Figure 4b shows a possible fixed point for the loop. Following the flow of the program we can see that this is not a very precise abstraction. x and y actually point at disjoint lists. We can also see that the program does not leak memory as in Figure 4c. Both these problems will be remedied later when we introduce the reachability predicate.

Some information on tvp files

For more information see the TVLA user manual.

Structure of a tvp file

tvp files are structured much like a YACC file using %% to separate sections and %x for declarations, actions, etc. A tvp file has three parts:

Predicate definitions – declare core predicates, instrumentation predicates and consistency rules (constraints) here.

Action definitions – declare actions here. Actions are used later in the CFG part.

Control flow graph (CFG) – The program to be analyzed is composed of CFG nodes with edges connecting between them and actions to be performed on these edges.

General

The tvp file goes through a C pre-processor so usual pre-processor command can be used (e.g. #include).

Comments are in C++ style. // comments until end of line. /* comments until */.

foreach – As expected foreach performs action for each predicate in the supplied predicate set.

A - (

E - (

Declarations

%s – Set declaration.

%p – Core predicate. These predicates can be declared one for each programming language and reused with each program analyzed.

%i – Instrumentation predicate. These are more program specific predicates.

%r – Consistency rule. Most consistency rules are generated automatically but additional rules can be defined manually.

Predicate properties

unique – This predicate is true for at most one node.

function – This predicate is a partial function.

box – display this predicate in a box.

First two properties imply consistency rules. Third property is used only for display.

Actions

Actions are used much like functions in other programming languages. The main part of an action is the update formula.

%action – Action declaration.

%t – Action title, used when printing the action's structures.

%f – Focus formulae. Applied before the precondition.

%p – Precondition. The precondition formula is evaluated to check whether this action should be performed.

%message – Report a message if the formula given is potentially satisfied.

%new – A mechanism for creating new nodes.

isNew() – A special unary predicate created when %new is used and set true only for the nodes created in this action.

Control Flow Graph

The CFG is built from edges and nodes. A cfg.node is declared implicitly by its usage in an edge. The syntax for a CFG edge is:

This creates and edge between the two nodes with action performed to move from first node to second node.

For nodes with no semantic meaning the action uninterpreted can be used

When translating a destructive update into a CFG node it is convenient to use two actions. First explicitly set the pointer to null then assign it the new value.

Following is a sample tvp file used for analyzing reverse.c.

|/* p */ | |

|%s PVar {x, y, t} |Predicate definitions |

|#include "include\p" | |

|%% | |

|#include "include\p" |Action definitions |

|#include "include\p" | |

|%% | |

|// y = NULL |Control Flow Graph (tvp syntax is preceded by the commented C |

|n_1 Set_Null_L(y) n_2 |code) |

|// while (x != NULL) { | |

|n_2 Is_Null_Var(x) exit | |

|n_2 Is_Not_Null_Var(x) n_3 | |

|// t = y | |

|n_3 Copy_Var_L(t, y) n_4 | |

|// y = x | |

|n_4 Copy_Var_L(y, x) n_5 | |

|// x = x->n | |

|n_5 Get_Next_L(x, x) n_6 | |

|// y->n = NULL | |

|n_6 Set_Next_Null_L(y) n_7 | |

|// y->n = t | |

|n_7 Set_Next_L(y, t) n_2 | |

|// } | |

Following is the code used to analyze create.c

/* p */

%s PVar {x, f}

#include "include\p"

%%

#include "include\p"

#include "include\p"

%%

// for(i=0; in = x;

n_3 Set_Next_Null_L(f) n_4

n_4 Set_Next_L(f,x) n_5

// x = f;

n_5 Copy_Var_L(x, f) n_1

n_2 uninterpreted() exit

// }

Predicates are defined in p

/* p */

/*************** Core Predicates *************/

foreach (z in PVar) {

%p z(v_1) unique box

}

%p n(v_1, v_2) function

/**********************************************/

/*********** Instrumentation Predicates *************/

%i is[n](v) = E(v_1, v_2) ( v_1 != v_2 & n(v_1, v) & n(v_2, v))

Conditional actions are defined in p

/* p */

%action uninterpreted() {

%t "uninterpreted"

}

%action Is_Not_Null_Var(x1) {

%t x1 + " != NULL"

%f { x1(v) }

%p E(v) x1(v)

}

%action Is_Null_Var(x1) {

%t x1 + " == NULL"

%f { x1(v) }

%p !(E(v) x1(v))

}

%action Is_Eq_Var(x1, x2) {

%t x1 + " == " + x2

%f { x1(v), x2(v) }

%p A(v) x1(v) x2(v)

}

%action Is_Not_Eq_Var(x1, x2) {

%t x1 + " != " + x2

%f { x1(v), x2(v) }

%p !A(v) x1(v) x2(v)

}

All other actions are defined in p

/* p */

%action Set_Null_L(x1) {

%t x1 + " = NULL"

{

x1(v) = 0

}

}

%action Copy_Var_L(x1, x2) {

%t x1 + " = " + x2

%f { x2(v) }

{

x1(v) = x2(v)

}

}

%action Malloc_L(x1) {

%t x1 + " = (L) malloc(sizeof(struct node)) "

%new

{

x1(v) = isNew(v)

}

}

%action Get_Next_L(x1, x2) {

%t x1 + " = " + x2 + "->" + n

%f { E(v_1) x2(v_1) & n(v_1, v) }

%message (!E(v) x2(v)) -> "an illegal dereference to\n" +

n + " component of " + x2 + "\n"

{

x1(v) = E(v_1) x2(v_1) & n(v_1, v)

}

}

%action Set_Next_Null_L(x1) {

%t x1 + "->" + n + " = null"

%f { x1(v) }

%message (!E(v) x1(v)) -> "an illegal dereference to\n" +

n + " component of " + x1 + "\n"

{

n(v_1, v_2) = n(v_1, v_2) & !x1(v_1)

is[n](v) = is[n](v) & (!(E(v_1) x1(v_1) & n(v_1, v)) |

E(v_1, v_2) v_1 != v_2 &

(n(v_1, v) & !x1(v_1)) & (n(v_2, v) & !x1(v_2)))

}

}

%action Set_Next_L(x1, x2) {

%t x1 + "->" + n + " = " + x2

%f { x1(v), x2(v) }

%message (E(v, v1) x1(v) & n(v, v1)) ->

"Internal Error! assume that " + x1 + "->" + n + "==NULL"

%message (!E(v) x1(v)) -> "an illegal dereference to\n" +

n + " component of " + x1 + "\n"

{

n(v_1, v_2) = n(v_1, v_2) | x1(v_1) & x2(v_2)

is[n](v) = is[n](v) | E(v_1) x2(v) & n(v_1, v)

}

}

Reachability

• Concrete semantics which records reachability from stack (global) variables

• Useful for:

o Compile-time Garbage Collection. If we can prove a structure is garbage at compile time we can insert the call to free it.

o Saving calls to Dynamic Garbage Collection. If we can prove there are no garbage items at a specific point we can save calls to the garbage collector.

▪ No memory leaks. We might be able to prove a program creates no garbage and so no memory leaks.

o Speeding-up static analysis. Despite of making the analysis more complex it takes less time to complete.

• Instrumentation predicate

r[n, x](v) = E(v1) x(v1) & n*(v1, v) (v is reachable from x with n steps).

Analyzing create.c again with reachability, we are now able to prove that create.c does not leak any memory.

tvla rcreate include\empty

p is the reachability enabled tvp file for create.c.

Analyzing reverse.c again with reachability.

tvla rreverse include\rlist

[pic]

Figure 5: location n_2

Figure 5 shows one of the possible structure at location n_2 (loop conditional) in the new analysis for reverse. We can now see that x and y point at disjoint lists.

Analyzing fumble

This is a faulty version of reverse which causes a memory leak.

|typedef struct node { |Elements* reverse(Elements *x) |

|struct node *n; |{ |

|int data; |Elements *y,*t; |

|} *Elements; |y = NULL; |

| |while (x!= NULL) { |

| |t = x(n; |

| |y = x; |

| |x(n = y; |

| |x = t; |

| |} |

| |return y; |

tvla rfumble include\rlist

In the next page you can see the control flow graph (Figure 6) for rfumble above. You can also one of the warning messages issued for location n_4 (y = NULL). Notice the message saying “potential memory leak”, because of the soundness of the analysis we cannot be certain a memory leak will occur in this location.

[pic]

Figure 6: CFG for rfumble

[pic]

Figure 7: Location n_4

Some of the code used for analyzing with reachability. The reachability predicate is defined for each of the program core variables. r[] uses an additional predicate c[] which checks for cycles.

/* p */

/*************** Core Predicates *************/

foreach (z in PVar) {

%p z(v_1) unique box

}

%p n(v_1, v_2) function

/**********************************************/

/*********** Instrumentation Predicates *************/

%i is[n](v) = E(v_1, v_2) ( v_1 != v_2 & n(v_1, v) & n(v_2, v))

foreach (z in PVar) {

%i r[n,z](v) = E(v_1) (z(v_1) & n*(v_1, v))

}

%i c[n](v) = n+(v, v)

/* p */

%action Set_Null_L(x1) {

%t x1 + " = NULL“

%message !(|/{A(v)(x1(v) -> r[n,z](v)) : z in PVar -{x1}}) -> "potential memory leak of element pointed-to-by\n" + x1 + "\n"

{

x1(v) = 0

r[n, x](v) = 0

}

}

%action Copy_Var_L(x1, x2) {

%t x1 + " = " + x2

%f { x2(v) }

{

x1(v) = x2(v)

r[n,x1](v) = r[n,x2](v)

}

}

%action Malloc_L(x1) {

%t x1 + " = (L) malloc(sizeof(struct node)) "

%new

{

x1(v) = isNew(v)

r[n, x](v) = isNew(v)

}

}

Benefits of Reachability

• Predict memory leaks

• Separate disjoint data structures

• More precise semantic reduction

• x = x->n

TVLA Design Mistakes

• The operational semantics is written in too low level language

o No types

o No local updates to specific predicate values

o No constants and functions

o No means for modularity

o No local variables and sequencing

o Combines UI with functionality

• TVP can be a high level language

• TVLA might have been better named 3VLA

TVLA Experience

• Quite fast on small programs

o But runs on medium programs too (Around 5k lines)

• Not a panacea

o panacea \pan-uh-SEE-uh\, noun:

A remedy for all diseases, problems, or evils; a universal medicine; a cure-all.

• More instrumentation may lead to faster (and more precise) analysis

-----------------------

is = 0

is = 0

91

71

31

x

(

is = 0

is = 0

is = 0

is = 0

x

u

u1

x

u

u1

is = 1

is = 1

is = 0

is = 0

x

u

u1

91

71

31

x

(

is = 0

is = 0

is = 0

cdr

cdr

cdr

cdr

x

u

u1

cdr

cdr

cdr

cdr

y

u2

u3

u1

x

y

x := x ( cdr

cdr

cdr

u

u1

u

u1

y

x

cdr

cdr

x

u

u1

x

u

u1

y

x := x ( cdr

u.0

x

u.1

u1

y

cdr

cdr

x

u

u1

x

cdr

u1

y

cdr

x

u

u1

x

u

u1

y

cdr

cdr

x

u

u1

cdr

cdr

cdr

cdr

y

(

(

(

(

(

(

(

(

(

(

(

u.1

u1

y

u.0

x

cdr

cdr

cdr

cdr

cdr

x

u

u1

u

u1

y

cdr

cdr

u.1

u1

u.0

y

cdr

cdr

x

u

u1

y

cdr

u

u1

u

u1

y

cdr

x

u

u1

x

u

u1

y

y

.

.

.

u1

u1

x

u1

y

x

y

u

x

cdr

cdr

cdr

cdr

u.1

u1

y

u.0

x

cdr

cdr

cdr

cdr

cdr

u.1

u1

y

u.0

x

cdr

cdr

cdr

cdr

cdr

u.1

u1

y

u.0

x

sel(v1, v) ((eq(v1, v2) ((is[sel](v)( (sel(v2,v) (3b)

cdr

cdr

cdr

u.1

u1

y

u.0

x

cdr

cdr

cdr

cdr

cdr

u.1

u1

y

u.0

x

x(v1) (x(v2)(eq(v1, v2) (1)

cdr

cdr

cdr

cdr

cdr

u.1

u1

y

u.0

x

1

2

3

4

(1(u) = 0

(1(u) = ½

(1(u) = 1

(1(u.1) = 1

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download