


Using Industrial Tools for Software Feature Location and Understanding[1]

Sharon Simmons, Dennis Edwards, Norman Wilde, Josh Homan
Department of Computer Science, University of West Florida

Michael Groble
Motorola, Inc.

Executive Summary

Software Engineers supporting a large software system often need to locate the code that performs a specific user feature. One method to solve this problem is software reconnaissance, which compares execution traces taken when the feature was active with background execution traces when it was not. Software components executed in the first set but not in the second tend to be involved in the feature of interest.

The software reconnaissance method has been tried in a number of contexts and academic software tools, such as the Recon3 toolset, are freely available. However, companies might be more willing to apply this method if they could use commercial, industrial-strength tools of known reliability.

This report describes a study performed with Motorola, Inc. to see if Metrowerks CodeTEST and Klocwork inSight could be used for feature location. Both tools are currently in use at Motorola and are known to be robust and effective. CodeTEST is a dynamic analysis tool and can produce traces of execution, while inSight is a static analysis tool which allows browsing and architectural analysis of a large system.

The two tools were combined with TraceGraph, a trace comparison tool from the Recon3 toolset, in a case study of four features in a large open-source software system. The study showed that the tool combinations were effective for feature location, though about 180 hours of effort was needed for tool adaptations to get them to work together. Tool integration was still less than optimal, with manual steps being required to get data from one tool to the next.

The typical time to locate, understand and document each feature was only about 4 hours. In most cases the software engineer only had to study a few hundred lines out of the more than 200,000 lines making up the system.

We conclude that CodeTEST and inSight can be used effectively for feature location. We plan enhancements to the TraceGraph component to improve the ease of use of the combination.

Table of Contents

1 Introduction
2 Industrial Tools and Software Reconnaissance
3 Case Study Goals and Organization
4 Tracing Long-Running Systems
5 Using CodeTEST
5.1 Initial Design
5.2 Second Design
5.3 Third Design
5.4 Performance Analysis
6 Using Recon3
7 Using TraceGraph
8 Using inSight
9 Feature Location Results
10 Conclusions
11 References
Appendix 1 - Trace Collector and Python Code for the Third Design
Appendix 2 - Sample Scenario and Feature Report
Scenario 3 - Date
Feature Report

1. Introduction

Industrial software systems are so large that they often are no longer completely understood, especially after initial development is complete. Further evolution and servicing may then become the responsibility of a team which has lost some of the knowledge generated in system development [BENN.02].

A common problem facing the members of such a team is to understand how a specific feature of the software has been implemented. Large systems do not do just one thing, but instead offer many interrelated features to their users. A word processor, for example, will offer features for editing text, changing fonts and text styles, inserting tables and images, saving in several different file formats, etc. A change request for such a system may mention a bug or an enhancement related to a particular feature, e.g. "Setting the text style within a table heading does not work".

A software engineer assigned to this change request would have to track down the code involved in table headings and text style changes. Traditionally such feature exploration is done using a combination of text search utilities such as Unix's grep, a debugger (if a suspicious variable or code segment can be identified) and lengthy perusal of the source code. Such methods obviously become more and more difficult to use as the size of the system increases, so faster feature location approaches would clearly be desirable.

Several researchers have studied dynamic analysis methods for feature location. These involve running the program using different sets of test data, some with the feature and some without. The program is instrumented to keep track of the software components that are executed in each test (modules, subroutines, or lines of code depending on the study). For example Wong et al. [WONG.99] use a metrics approach to quantify the disparity between a feature and a component, the concentration of a feature in a component, and the dedication of a component to a feature. An extended version of this approach characterizes the distance between features using both a static method and a dynamic one which takes into account a system operational profile [WONG.05]. A somewhat different method has been described by Eisenbarth et al. [EISE.02]. In this method the program is executed under several scenarios, each of which has been tagged with the features it involves. A trace of subroutine calls is taken, and these traces are analyzed to categorize the subroutines according to their degree of specificity to a given feature. The analysis also automatically produces a set of concepts for the program. The concepts are subsets of the subroutines that tend to execute together and are presented organized in a lattice.

The simplest, and perhaps the oldest, dynamic analysis approach to feature location has been called software reconnaissance [WILD.95]. In this method the program is executed for a few test cases with the feature, and for similar cases without the feature. The marker code for the feature is defined as the set difference:

Marker(f) = {components executed in tests with feature f} − {components executed in tests without feature f}
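The following minimal Python sketch (not part of the original study's tooling; the component identifiers are invented for illustration) shows this set difference concretely:

def marker_code(traces_with_feature, traces_without_feature):
    # Each trace is modeled as the set of components, e.g. (file, line) pairs, it executed.
    executed_with = set().union(*traces_with_feature)
    executed_without = set().union(*traces_without_feature)
    return executed_with - executed_without

# Hypothetical traces: one test with the feature, one without.
with_feature = [{("http_protocol.c", 512), ("util_date.c", 88)}]
without_feature = [{("http_protocol.c", 512)}]
print(marker_code(with_feature, without_feature))   # {('util_date.c', 88)}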

There have been a number of case studies of the reconnaissance method [WILD.96, WILD.98]. In general the method is usually good at feature location; it identifies a small number of "good places to start looking" to jump start the exploration of a large program. Feature understanding may be more difficult. The marker code is often a series of separated fragments that need to be understood in their context. The task of putting the pieces together ranges from trivial through very difficult, depending on the structure and clarity of the code [WILD.01].

2. Industrial Tools and Software Reconnaissance

Software reconnaissance can, in principle, locate marker code using any combination of a tool that produces a trace of execution combined with a differencing tool to compare the traces. However most of the published studies of reconnaissance have been done in an academic context. It is well known that academic tools often have difficulty dealing with real industrial software due to problems of scale and to the complexity of the heterogeneous environments found in the real world. A company may well prefer to work with industrial tools, which may be expensive but have been proven in use at the same company.

Motorola Labs was interested in knowing if tools currently in use at Motorola could be adapted for feature location in telecommunications systems. Motorola products include cellular phones, two-way radios, telecommunications switches, etc. that run on highly specialized platforms with many processors. Many of these systems are message oriented; a software component listens for incoming messages, processes them, and produces messages in response.

Motorola asked the University of West Florida and the Software Engineering Research Center (SERC) to perform the pilot case study reported in this paper to see if reconnaissance could be performed effectively on this kind of software, using tools already in use in the company. The specific tools suggested were CodeTEST from Metrowerks[2] and inSight from Klocwork[3].

CodeTEST instruments software so that, when the compiled program runs, information can be collected on performance, test coverage, and memory usage [METRO]. It can be used either in native mode, with all functions performed in software, or in a hardware-in-circuit mode in which data collection is performed in real time using specialized hardware. Motorola has used the hardware-in-circuit mode to collect code coverage and performance data of a number of different embedded systems. The hardware collection mechanism has proved invaluable when analyzing device driver code and other packet processing software routines that are called thousands of times a second and have processing times in the range of 10 to 0.1 microseconds.

The inSight tool is part of a broader Klocwork suite that provides static analysis of large systems [KLOC]. Klocwork parses the source code and builds a data repository that has several different uses, including metrics generation, identification of potential defects and security issues, and enforcement of architectural constraints. inSight provides a visual interface to view software dependencies, construct architectural models, and explore possible refactorings.

It was immediately evident that these tools provide support for important parts of the feature understanding problem, but do not cover it completely. CodeTEST instruments the target program and produces traces, but it does not currently include any mechanism for comparing executions from different tests. inSight provides very powerful code browsing and abstraction capabilities, but has no direct way of importing or using trace data.

We decided to use the TraceGraph tool from our Recon3 toolset as an intermediary to do the comparison step. The Recon3 toolset is a free academic product for tracing software [RECON]. TraceGraph, the trace comparison tool of Recon3, provides a visual image of a set of traces that lets the eye pick out differences between them [LUKO.00].

The complete case study setup is thus as shown in Figure 1. To generate traces of execution, subjects used both CodeTEST and the Recon3 tracing tool. While Recon3 is an academic tool with all the limitations that implies, it was designed specifically to support feature location, so we thought it would be interesting to compare its results with CodeTEST.


Figure 1 – Tools Used in the Case Study

Subjects in the case study ran test cases and used CodeTEST or Recon3 to produce traces, both with and without the feature being sought. Then they fed these traces to TraceGraph to visually identify differences between them, thus giving the marker code for the feature. Finally, they used inSight to explore the C functions containing marker code and build a model of the feature, consisting of a diagram of the most important code elements and supporting text describing the way the feature works.

3. Case Study Goals and Organization

The primary goal of the case study was to evaluate two tool combinations as to their usefulness for feature location in systems typical of the Motorola domain:

1. CodeTEST + TraceGraph + inSight

2. Recon3 + TraceGraph + inSight

One question was about the usability of each tool by itself for a task that was probably not contemplated by the tool's designers. A second focus was to identify and, if possible, overcome expected difficulties in getting different tools to work together. A third question concerned the role of inSight. As far as we know this study represents the first attempt to combine dynamic feature location with a powerful static analysis tool to achieve feature understanding. How well would that combination work?

A secondary goal of the case study was to explore the process of adopting feature location technology in an industrial setting. Software managers at other companies may not have the same tools used at Motorola, but could reasonably be interested in the process of technology adaptation, and especially in its cost! Thus the case study tried to track fairly accurately the steps involved in adopting this new technology, as well as the person-hours expended and the efficiency of the resulting feature location process.

The case study required a realistic target system with a range of features to explore. Unfortunately both system complexity and hardware constraints made it impossible to study an actual Motorola system. Instead Motorola representatives and the researchers selected the Apache web server as an acceptable proxy [APACHE]. This open-source system is large (about 227 KLOC raw line count), multi-process and in widespread use. Its architecture is similar to many Motorola systems in that it is based on message passing; as described in the HTTP protocol, Apache waits for an incoming message, processes it, and returns a response message. Apache is written in standard C so all of the case study tools were applicable. For the case study we ran Apache under Solaris on a dual processor (2 x 360 MHz) Sun Sparc Ultra 60 that was available at the University of West Florida.

The case study followed a three phase technology adoption process similar to that which would reasonably be expected in industry:

1. Adaptation: New tools are installed along with existing tools. If necessary some "glue" code is written to tie them together appropriately. Trials are made using small programs.

2. Trial: The toolset is tried on a production system and any kinks are worked out.

3. Use: The toolset is used "live" as part of normal software change activity. Change requests are received, the relevant code is located, and it is analyzed before making the change.

To facilitate objective collection of data, work on the "use" phase of the case study was divided. One experimenter acted as controller. He prepared four plausible scenarios derived from the HTTP/1.1 protocol description [RFC2616], each with a change request that would affect a particular feature of Apache. (See Appendix 2 for an example of a scenario.) Two other experimenters acted as subjects, playing the role of a software engineer attempting to resolve the change request. Each subject worked on all four scenarios, two with the Recon3 - TraceGraph - inSight combination and two with the CodeTEST - TraceGraph - inSight combination. Assignment and sequencing of scenarios was randomized to minimize any individual bias or learning effects.

4. Tracing Long-Running Systems

A particular issue arises in tracing systems such as those produced by Motorola that take some time to start up and shut down. For feature location it is obviously more convenient to start up, then run the tests with the feature and without, and only shut down at the end of the sequence. A feature test case thus corresponds to an interval of time (Figure 2).

The broad arrow in Figure 2 represents the target system (Apache in the case study) executing in time. Trace events occur continually as different software components are executed (small arrows). Trace event collection usually has to take place in a separate process, typically called something like a "trace manager". To run a test the software engineer: (1) informs the trace manager that a test is starting; (2) sends a message to the target to start the feature; (3) receives back the response message; (4) tells the trace manager that the feature has completed.


Figure 2 - Tracing a Long-Running System

Ideally the final trace file should contain just the events from the interval in which the feature test was executing, with as little extraneous data as possible. However a difficulty may arise in synchronizing the trace manager and the target system. Both processes can be quite machine intensive, and if either is allowed to get ahead of or behind the other the captured events may not accurately represent a true trace of the feature.

5. Using CodeTEST

5.1 Initial Design

Considerable difficulties were encountered in using CodeTEST to generate the traces, mainly due to the problem of timing described in the previous section. It should be remembered that CodeTEST is intended as a test coverage and performance evaluation tool, not a feature location tool. Thus its design is not tuned for the production of traces tightly delimited in time.


Figure 3 - CodeTEST Architecture

The architecture of CodeTEST is roughly as shown in Figure 3. The target program (Apache in this study) is compiled with CodeTEST instrumentation. The instrumentation identifies each code element with a unique integer, and creates the instrumentation data base (IDB) which maps these integers to source file, line number, etc.

As the target program executes, trace events are recorded. Approximately every 10 ms, events are sent to the CodeTEST server (ctserver) through a pipe. The ctserver component can collect traces from several target processes simultaneously. Trace events are buffered until ctmanager stops the collection process and requests the data.

The CodeTEST manager (ctmanager) communicates with ctserver via a socket network connection to start tracing, to stop tracing, and to extract trace data. While ctserver must be on the target machine running the target processes, ctmanager can be running elsewhere. ctmanager can look up the identifying integers in the IDB and display the trace, coverage data, or performance data. It also has an application programmer interface (API) for the Python language [PYTH] so that the user can write a Python program to control the whole process.

Our first design looked like Figure 3 and used a Python program that provided several commands to let the user control tracing manually. The user would:

1. instruct ctmanager to start tracing

2. run a test of the target program

3. instruct ctmanager to stop tracing

4. extract the trace data from ctmanager and save it to a file.

While this worked acceptably on a small program in the adaptation phase of our study, it ran into problems as soon as we moved to programs that generate events very rapidly. With a higher volume of events, we found that the process of extracting the trace data for a single test generally took two minutes or more to complete.

The situation shown in Figure 2 thus has widely differing time scales. The target program is working at computer time scale and may be generating events at a rate of many thousands per second. The user is running tests on a human scale and comfortably takes a few seconds to a few minutes to set up and run a test. Standing between the two we have ctmanager, with delays of more than two minutes to collect events and turn them into Python/Java objects for analysis and display.

5.2 Second Design

Using our first design, it was very hard to get consistent trace results. One run of a test would give us many more events than the next, probably because the user's commands to stop and start tracing were taking variable times to reach ctserver. It is possible that trace data was sometimes being lost in the stop and start process. Such variations can be overcome by repeating the feature many times and applying statistical methods to analyze the trace [EDWA.05]. However the extra work required by that method is considerable so it would be desirable to come up with something better for routine software maintenance work.


Figure 4 - Trace Capture Using a Tag Process

Our second design addressed these problems, as shown in Figure 4. Testing was automated using a simple test driver program to step through a sequence of tests, some with and some without the feature. To get around the ctmanager delays, its processing was deferred until the end of the test sequence by using tags inserted into the trace to separate the tests. The additional "tag" process shown in Figure 4 simply waited while listening on a socket. It was instrumented with CodeTEST just like the target program so that when it received a "ping" it generated a trace event.

The driver cycled through the following process:

1. send the test data for one test to the target program and wait for a response

2. wait for 5 seconds

3. send a "ping" to the tag program and wait for a reply

4. wait for 5 seconds

The ctserver component thus receives trace events from both the target program and the tag program, so there are known events in the trace to separate one test from the next. The ctmanager/Python code ran only after all testing was complete and generated a single long trace, which was then broken apart at the tags to get the trace segment for each test.
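A minimal sketch of such a test driver, written in Python for brevity (the host names, ports, URLs, and the tag process's ping protocol are assumptions for illustration; the study's actual driver is not reproduced here):

import socket
import time
import urllib.request

TAG_ADDR = ("target-host", 4000)                     # hypothetical port of the instrumented tag process
TESTS = ["http://target-host:8080/plain.html",       # hypothetical test requests, some exercising
         "http://target-host:8080/plain.html?x=1"]   # the feature of interest and some not

def ping_tag():
    # The tag process is CodeTEST-instrumented, so handling this ping makes it
    # emit trace events that later mark the boundary between tests.
    with socket.create_connection(TAG_ADDR) as s:
        s.sendall(b"ping")
        s.recv(16)                                   # wait for the reply before continuing

for url in TESTS:
    urllib.request.urlopen(url).read()               # 1. run one test and wait for the response
    time.sleep(5)                                    # 2. wait for 5 seconds
    ping_tag()                                       # 3. ping the tag program
    time.sleep(5)                                    # 4. wait again before the next test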

5.3 Third Design

This second design worked and was used for the case study. However it could be improved for industrial use. One problem was that the buffer between ctserver and ctmanager has a fixed size, so the user needs to estimate in advance how many trace events will be generated. A second problem is that the additional process to generate the tags may, in some systems, impact scheduling. The biggest problem, however, is that the single long trace may, for some systems, contain far more data than is really needed. Apache generates relatively few events when not processing an HTTP message, so it did not particularly matter that we were allowing a total of 10 seconds of wait time for an Apache feature that probably completed in much less than a second. High speed telecommunications systems may generate background events at a much higher rate, and all of these would go into the trace. That would make processing even more time consuming and would also increase the risk of overflowing buffers and losing events.

So we wanted a third design that would:

1. Not require arbitrary assumptions about the size of the trace.

2. Not require an additional process

3. Delimit the trace for each test as tightly as possible, thus capturing a minimum amount of extraneous background.


Figure 5 - Trace Capture Using a Specialized Collector

This third design, shown in Figure 5, refines the second approach by taking advantage of an important hook provided by CodeTEST. The instrumentation in the target program always calls a function called ctTag() to actually record each trace event. The user can provide his own ctTag() to handle special circumstances such as ours. We wrote a simple version of ctTag() that directs its pipe output to our specialized trace collector instead of ctserver. This trace collector also listens on a socket for a "start" or "stop" message from the trace driver. Rather than doing any fancy buffering, the trace collector simply writes to a file the CodeTEST identifying integer for all trace events received between a start and a stop and adds a tag at each stop. Trace events received at other times are just discarded[4].

The sequence for a single test is thus:

1. test driver sends "start" to the trace collector and waits for a response.

2. trace collector receives the start and starts saving events to its file.

3. test driver sends test data to the target program and waits for a response or sufficient time for the test to complete.

4. test driver sends "stop" to the trace collector.

5. trace collector receives the stop and waits briefly to make sure all events from the target program have reached it, and then adds a tag and stops saving events to its file.

The target process and the trace collector must be on the same machine to be able to use a pipe connection. However the test driver may be located elsewhere.
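For illustration, a test driver for this design might look roughly as follows (Python sketch; port 3020 is the default collector port in Appendix 1, but the host names, URLs, and timing values are assumptions, and the actual driver used in the study is not shown):

import socket
import time
import urllib.request

COLLECTOR_ADDR = ("target-host", 3020)          # trace collector; 3020 is DEFAULT_PORT in Appendix 1
TESTS = ["http://target-host:8080/plain.html",  # hypothetical test URLs, with and without the feature
         "http://target-host:8080/plain.html?x=1"]

with socket.create_connection(COLLECTOR_ADDR) as collector:
    for url in TESTS:
        collector.sendall(b"START")             # collector begins saving events
        time.sleep(0.1)                         # brief pause in place of an acknowledgement
        urllib.request.urlopen(url).read()      # run the test and wait for the response
        collector.sendall(b"STOP")              # collector applies its stop delay, adds a tag, stops saving
        time.sleep(0.5)                         # leave a gap so commands are not coalesced on the socket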

Processing in the Python code is now done off line after the tests have been run, so performance is less relevant. However the new simplified version ran in just a few seconds in our tests. The simplified code loads the trace file with the integer identifiers, splits the file at the tags, looks up each identifier in the IDB to format a trace record, and writes out a series of trace files ready for processing by TraceGraph.

Appendix 1 gives the trace collector and Python code for the third design. While this version was not available for the case study we used it to rerun one feature as a test. Essentially the same marker code was located, but the trace files were considerably smaller; about 1.4 MB in total for the third design compared to 2.4 MB for the second design.

We believe that this third design would probably be the most appropriate for a company wanting to do feature location using the software version of CodeTEST. If the hardware-in-circuit version is needed then it may not be possible to substitute a specialized trace collector for ctserver so something similar to our second design might be better. As always, the final choice will depend on the constraints of the hardware and software environment.

5.4 Performance Analysis

The parameters of the third design can be set to guarantee that all the events from a particular feature will, in fact, be captured. We assume that, like Apache and like much of the Motorola software, the target program processes messages, so that a test consists of sending one or more messages and waiting for an appropriate response. Figure 6 shows the interaction for a test.


Figure 6 - Timing Diagram for a Test

The interaction begins when the driver sends the start message to the trace controller and gets its response. It then sends test data to the target program. As the target program executes the feature, it generates a series of events e, and each of these is written to the pipe's buffer. We may define:

τ(e) The time at which event e occurs

TD Transfer delay, the minimum time for a message to move between the test driver and the target machine.

SD Stop delay, the time the trace collector waits before processing the stop command.

Suppose event e_r is the response to the test driver that marks the end of feature execution and tells the driver that it can send the stop message. As can be seen from Figure 6, the trace collector stops storing events at the earliest at time

τ(e_r) + TD + TD + SD

The diagram of Figure 6 also shows in more detail how events from the target program reach the trace collector. When an event e occurs in a target process, ctTag() writes to the pipe, which really means that it writes to the pipe's internal buffer. Periodically, the pipe's buffer is flushed to the trace collection process.

We may additionally define the following:

BD Buffer delay, the time to append a trace record to the pipe's buffer

BF Buffer flush, the interval between buffer flushes (currently 10 ms in CodeTEST)

TPD Trace pipe delay, the maximum time for data to move from the buffer to the trace collection process

Then from Figure 6 it is clear that the time that the trace record from event e arrives at the trace collector is, at worst:

τ(e) + BD + TPD + BF

We want to ensure that all events preceding e_r are at the trace collector before it stops collecting, that is, that

τ(e_{r-1}) + BD + BF + TPD < τ(e_r) + TD + TD + SD

which, since τ(e_{r-1}) < τ(e_r), will be true if

BD + BF + TPD ≤ TD + TD + SD (1)

We may make some reasonable assumptions about delay and processing times. Pipe communication should be no slower than socket communication, and will be considerably faster if the driver and the target are on different machines, so TPD ≤ TD. Also, the time to append a trace record to the pipe buffer should be no slower than socket communication, that is, BD ≤ TD. With these substitutions (1) becomes

BF ≤ SD (2)

Thus as long as the writer of the test driver program sets the stop delay to be at least as large as the buffer flush interval, he can be sure that he will collect all the events of the feature in the trace collection process. In CodeTEST the buffer flush interval is currently set at 10 ms.
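As a quick numerical check (the delay values below are assumptions for illustration, not measurements from the study; only the 10 ms buffer flush interval comes from CodeTEST):

# Assumed timings in seconds; BF = 10 ms is CodeTEST's buffer flush interval.
BD, BF, TPD = 0.0001, 0.010, 0.001    # buffer append, buffer flush, trace pipe delay (assumed)
TD, SD = 0.001, 0.050                 # transfer delay (assumed), chosen stop delay with SD >= BF

lhs = BD + BF + TPD                   # worst-case lag for the last feature event to arrive
rhs = TD + TD + SD                    # earliest moment the collector stops after the response
assert lhs <= rhs                     # inequality (1) holds, so no feature events are dropped
print("margin = %.1f ms" % ((rhs - lhs) * 1000))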

6. Using Recon3

The Recon3 instrumentor and trace manager were designed with the goal of producing clear traces from a long running program, so it is not surprising that they showed none of the problems encountered with CodeTEST trace collection. However there were difficulties in handling the large base of Apache code.

The Recon3 instrumentor for C does not do a complete parse, but instead simply scans the code for surface features indicating where instrumentation should be inserted. Instrumentor flags allow the user to specify function entry/return instrumentation or decision instrumentation.

Many files in the current version of Apache make very heavy use of macros and conditional compilation which are handled by the C pre-processor. These structures confused the instrumentor, especially when it was trying to identify function entry and return points. We found that we could only use decision instrumentation (of if, switch, while, for, etc.) in the study. That does not seem to have seriously hampered feature location, since all of the features we sought were large enough to contain at least one decision that was marked by TraceGraph. However inSight organizes data by C function, so the lack of function entry and return data required extra hand work by the case study subjects. They had to locate each marked decision in the code, find which C function it was in, and then navigate to that C function in inSight.

One dramatic difference between CodeTEST and Recon3 is in performance. We ran a few small performance tests to compare the two tools, with the test driver on the same node as the instrumented Apache to minimize network time delays. Table 1 shows a representative result for a series of tests in which Apache served up 100 different web pages.

Table 1 - The Performance Impact of Instrumentation - Execution Time to Serve a Web Page

| |Uninstrumented Apache |Instrumented with CodeTEST |Instrumented with Recon3 |
|Average time per page (microseconds) |1709 |2135 |173,399 |
|Relative time |100% |125% |10,146% |

As can be seen from the table, while CodeTEST was not really designed for feature location, Recon3 was not really designed to be fast. The wall-clock time to run the test increased by a factor of 100! This difference would not be significant in analyzing a conventional program in a laboratory setting using small test data sets; after all, the time to serve a single page is still less than a fifth of a second. However it could obviously be very important for a system with real-time constraints where missed deadlines could modify behavior.

7. Using TraceGraph

The TraceGraph tool, part of the free Recon3 tool suite, provides an intuitive and visual way for a software engineer to compare traces and spot the marker code for a feature.

Figure 7 - TraceGraph Screen Shots

The software engineer feeds TraceGraph first the traces without the feature, then a trace with the feature. TraceGraph displays each trace as a compact column of rectangles, each only a few pixels in size (Figure 7). Each horizontal row represents a specific trace event, such as the execution of a particular block of code. The rectangle is colored in the first trace containing the event, then black afterwards[5].

Since the last column is the first trace with the feature, colored rectangles in that column are the "marker" code that was executed only when the feature was present. The software engineer can get a quick impression of how the marker code is distributed. Are there just a few markers or many? In one source file or in many? He can click on a rectangle to get more information about the type of event, the source file and line number, etc.

It proved to be easy to convert the traces from both CodeTEST and Recon3 into a format that TraceGraph could read. However one annoying problem was that TraceGraph had no way of easily exporting the information about the marker code. The case study subjects had to click on each rectangle, copy the data from the pop-up window (Figure 7), click on the next rectangle, etc. They then opened up inSight to study each bit of marker code.

The rather tedious process of clicking on rectangles and copying information from the pop-up window took a significant fraction of the time required to study each feature. While the marker code was always a small fraction of Apache, there were often dozens of rectangles to be investigated. TraceGraph clearly needs a better way of exporting a list of marker code to another tool.

8. Using inSight

As mentioned previously, inSight is part of the Klocwork static analysis tool suite. It provides a graphic environment for browsing and analyzing large software systems (Figure 8). One application of inSight is architectural modeling of such a system, in which one or more high level diagrams are prepared to illustrate the architectural relationships between the many software components.


Figure 8 - inSight Screen Shot

Since our purpose was to understand and document a specific system feature, we decided to create an architectural model showing just those code components that are relevant for one feature. While most program comprehension studies indicate that programmers tend to combine a top-down and a bottom-up approach (e.g. [VONM.95a]), we found that the "custom diagram" facility of inSight let us use a nearly pure bottom-up method[6]. First the case study subjects created a new empty custom diagram with the right-hand panel of the display blank. The list of marker code from TraceGraph was then reduced to a list of the C functions containing the markers (since inSight works at the function level). Each of these functions was then located in the hierarchy of the left panel of inSight and dragged into the right panel. Then in the right panel inSight shows each function as a rectangle and adds arrows to show the relationships between them (Figure 8).
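A sketch of that reduction step is shown below (Python; in the study this lookup was done by hand, the marker and line-range data here are hypothetical, and only the two function names also appear in the Appendix 2 feature report):

from collections import defaultdict

# Hypothetical marker list exported from TraceGraph as (source file, line) pairs, and
# hypothetical function extents as (file, function, first line, last line), e.g. from the IDB or ctags.
markers = [("util_date.c", 88), ("util_date.c", 131), ("http_protocol.c", 512)]
functions = [("util_date.c", "apr_date_checkmask", 60, 140),
             ("http_protocol.c", "ap_meets_conditions", 480, 560)]

def functions_containing(markers, functions):
    # Reduce the marker list to the set of C functions that contain at least one marker.
    extents = defaultdict(list)
    for fname, func, first, last in functions:
        extents[fname].append((first, last, func))
    hits = set()
    for fname, line in markers:
        for first, last, func in extents.get(fname, []):
            if first <= line <= last:
                hits.add(func)
    return sorted(hits)

print(functions_containing(markers, functions))   # ['ap_meets_conditions', 'apr_date_checkmask']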

The case study subjects then used inSight to explore Apache's code until they felt they had a sufficient understanding of the feature. inSight allowed them to browse the code or a flowchart representation of it, to add functions, for example those that call the marker code, or to subtract functions if examination showed them to be irrelevant. Finally, the subject saved the resulting diagram and wrote a brief feature report containing the figure and an explanation of the feature (Appendix 2).

9. Feature Location Results

The case study simulated the process of feature location in a company as part of normal software change activity. Table 2 lists the four scenarios of software change that were used. (See Appendix 2 for a sample of a full scenario and the resulting feature report.) The scenarios are in the form of a problem report or change request related to a specific feature of Apache. The two case study subjects were not previously familiar with Apache code, though they were experienced in C programming and in networking. Each subject located the code for all four scenarios, using CodeTEST for two of them and Recon3 for the other two. Assignments were made at random and subjects alternated between tools to reduce bias due to learning effects.

Table 2 - Scenarios Used in the Case Study

Scenario 1: The “OPTIONS *” request currently provides several pieces of information about the http server. We want an additional piece of information to be added to the returned information. Specifically, we want the date of the next scheduled shutdown to be read from a file and returned.

Scenario 2: We assume that a security problem has been discovered in the code used to handle the “?” syntax in a GET request. We want to add security measures to the code that will provide additional checks on the URI before allowing processing to continue.

Scenario 3: We assume that an error has been reported in the processing of certain date formats used in the “If-Modified-Since:” request structure. The error occurs only in the RFC 850 form but does not occur with the other two forms. We are looking for the code specific to the RFC 850 date form.

Scenario 4: Currently, the http server only accepts the “100-continue” parameter to the “Expect” header as specified in RFC 2616. We want to add the ability to service an “Expect: PRIORITY = HIGH” header as well. Location of the code responsible for implementing the currently accepted header is needed in order to facilitate the addition of the new header.

The results of the study are summarized in Table 3. Subjects were asked to record approximately how much time it took to locate the feature up to the point where they would be fairly confident that they could successfully make the code change. They were also asked to estimate how much Apache code they had to scan and how much they studied in detail to complete their analysis.

Table 3 - Feature Location Results


While the amount of code studied varied, only in one case, scenario 2, did the two subjects come to a different understanding of the feature. In the other scenarios similar diagrams were constructed in inSight with pointers to consistent locations in the code.

The problem with scenario 2 was not really in the tools, but rather in the test cases. The scenario concerns a URI that specifies a cgi script and contains the query "?" character. One subject compared cgi URIs to non-cgi URIs and thus identified a large body of code related to cgi processing. The other subject compared cgi URIs containing a "?" to cgi URIs with no "?" and found a much smaller amount of code. The second approach was much more effective. This difference shows the importance of choice of test cases in using dynamic methods of feature location. When a first set of tests identifies a lot of marker code it is probably best to go back and refine the test cases to see what additional code can be subtracted.
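For illustration only (these URIs are hypothetical, not the study's actual test data), the two comparisons differ roughly as follows:

# Broad comparison: the marker code includes essentially all cgi-handling code.
broad_with = "GET /cgi-bin/lookup?name=smith HTTP/1.1"
broad_without = "GET /index.html HTTP/1.1"

# Refined comparison: both requests exercise cgi handling, so only the
# "?" query-processing code remains as marker code.
refined_with = "GET /cgi-bin/lookup?name=smith HTTP/1.1"
refined_without = "GET /cgi-bin/lookup HTTP/1.1"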

The amount of code studied for each feature was quite small considering that the Apache system is over 200 KLOC. The time required to locate and understand a feature was variable, with about 3 to 4 hours being typical.

10. Conclusions

The case study showed that both the tool combinations (CodeTEST plus TraceGraph plus inSight and Recon3 plus TraceGraph plus inSight) were effective in locating features in code typical of the Motorola domain. The main differences between the commercial CodeTEST instrumentor and the free Recon3 were in the much greater efficiency of CodeTEST instrumentation and its greater robustness in dealing with the C preprocessor. For systems in which timing behavior is important, CodeTEST is clearly the better alternative, but for other systems being tested manually the delays introduced by Recon3 may not be significant.

Table 4 - Approximate Person-Hours for the Adaptation and Trial Phases of the Case Study

|  |CodeTEST |KlocWork |Recon3 |Other |Total |

|Adaptation hours |77 |18 |8 |4 |107 |

|Trial hours |67 |0 |2 |7 |76 |

Table 4 shows the approximate time required for the technology adaptation process to get the two tool combinations working. The effort required by another company or using different tools would obviously be different since the time is strongly affected by the depth of previous experience with each tool. However an investment of around 180 person hours would seem to be reasonable to bring a new technology into a large project.

As shown in Table 3, the time required to locate and understand a feature was typically around 4 hours, which would seem to be quite fast for a software engineer dealing with an unfamiliar system of over 200 KLOC. Better tool integration could improve this time still further, since we observed informally that around half the time was spent hand transferring the TraceGraph markers into inSight. We plan to add several improvements to TraceGraph to make data export much more user friendly.

As far as we know, this is the first study to combine dynamic analysis for feature location with a high-end static analysis tool for feature understanding. The combination seems to work quite well. Dynamic analysis is a powerful tool for locating a few feature markers in the code, but it only shows what happened on the specific test cases that were run.

For a software engineer, there is obviously some danger in making a code change based only on the dynamic picture; the code to be changed may be called from other locations in other contexts, so a fix in one place might break code in another. Static analysis, using a tool such as inSight, allows the software engineer to see the context of each of these marker fragments, and also shows code relationships from all possible executions, not just those from the specific test cases.

Another advantage of a tool such as inSight is the ability to document a feature graphically. Since documentation is costly and often becomes obsolete it is frequently missing in legacy systems. One approach is progressive redocumentation, in which new documentation is written as each change is made, thus codifying knowledge about those parts of the system that change most frequently [RAJL.00]. The diagrams produced by inSight would make it easy to generate such documentation on a feature-by-feature basis.

It is interesting to speculate on what might be achieved by a tool that integrated the dynamic and static views more completely. It would be interesting to see, for example, the execution of one specific trace as an overlay on the static view, perhaps by turning on and off color highlighting of the graph. For multi-process systems such as those developed at Motorola, the trace data could also allow an overlay showing a specific process or thread.

We hope that future developers of commercial tools will give some thought to the ways that dynamic and static analysis can best work together. Meanwhile, a combination of tools seems to be a viable approach to understanding features of large software systems.

11. References

[APACHE] "The Apache Software Foundation", URL current as of June, 2005.

[BENN.02] K. H. Bennett, V. T. Rajlich, N. Wilde, "Software Evolution and the Staged Model of the Software Lifecycle", in Advances in Computers, ed. Marvin Zelkowitz, Vol. 56, pp. 1 - 54 (2002).

[EDWA.05] D. Edwards, S. Simmons, N. Wilde, "An approach to feature location in distributed systems", to appear in the Journal of Systems and Software.

[EISE.02] Eisenbarth, T., Koschke, R., Simon, D., "Incremental location of combined features for large-scale programs." In: Proceedings of the International Conference on Software Maintenance - ICSM 2002, Montreal, Canada, pp. 273–282. (2002)

[KLOC] Klocwork: Automated solutions for understanding and perfecting software, URL current as of April, 2005

[LUKO.00] Lukoit, K, Wilde, N., Stowell, S., Hennessey, T., "TraceGraph: Immediate Visual Location of Software Features." In Proceedings of the International Conference on Software Maintenance - ICSM 2000, pp. 33 - 39, October, 2000.

[METRO] Metrowerks CodeTEST Software Analysis Tools, URL current as of April, 2005.

[PYTH] "What is Python?", URL current as of April, 2005.

[RAJL.00] Rajlich, Vaclav, "Incremental Documentation Using the Web", IEEE Software, Vol. 17, No. 5, September/October 2000, pp. 102 - 106.

[RECON] Recon Tools for Software Engineers, URL current as of April, 2005.

[RFC2616] Hypertext Transfer Protocol - HTTP/1.1, URL current as of June, 2005.

[VONM.95a] Von Mayrhauser, Annelise and Vans, A. Marie, "Program Comprehension During Software Maintenance and Evolution", IEEE Computer, Vol. 28, No. 8, August, 1995, pp. 44 - 55.

[WILD.95] Wilde, N., Scully, M., "Software reconnaissance: Mapping program features to code." Journal of Software Maintenance: Research and Practice, Vol. 7, pp. 49–62 (1995)

[WILD.96] Wilde, N., Casey, C., "Early field experience with the software reconnaissance technique for program comprehension." In: Proceedings of the International Conference on Software Maintenance - ICSM-96, Monterey, California, pp. 312–318.

[WILD.98] Wilde, N., Casey, C., Vandeville, J., Trio, G., Hotz, D., "Reverse engineering of software threads: A design recovery technique for large multi-process systems." Journal of Systems and Software, Vol. 43, pp. 11–17.

[WILD.01] Wilde, N., Buckellew, M., Page, H., Rajlich, V., 2001. "A case study of feature location in unstructured legacy Fortran code." In: Proceedings of the Fifth European Conference on Software Maintenance and Reengineering - CSMR'01. IEEE Computer Society, pp. 68-76. (2001).

[WONG.99] Wong, W.E., Gokhale, S.S., Horgan, J.R., "Metrics for quantifying the disparity, concentration, and dedication between program components and features." In: Sixth IEEE International Symposium on Software Metrics, p. 189. (1999)

[WONG.05] Wong, W. Eric, and Gokhale, Swapna, "Static and dynamic distance metrics for feature-based code analysis", Journal of Systems and Software, vol. 74 (2005), pp. 283-295.

Appendix 1 - Trace Collector and Python Code for the Third Design

File libPGIFFctTag.c - Specialized version of ctTag()

#include <stdio.h>

#include <stdlib.h>

#include <fcntl.h>

#include <unistd.h>

#include <sys/types.h>

#define DEFAULT_PORT 3020

#define PORT_VAR_NAME "PGIF_PORT"

typedef unsigned int _amc_tag_t ;

//Globals

static int pipe_tested = 0;

int pipe_ptr = 0;

int init_ctTag_pipe(){

int pgif_port = 0;

char *port_var = getenv(PORT_VAR_NAME);

char pipe_name[255];

if( port_var )

pgif_port = atoi(port_var);

else

pgif_port = DEFAULT_PORT;

snprintf(pipe_name, 255, "/tmp/PGIF%d/fifo", pgif_port);

return open(pipe_name, O_NDELAY|O_WRONLY);

}

_amc_tag_t ctTag(_amc_tag_t tag){

//Check if event pipe has been created by the server

//If not, no logging will be done.

if(!pipe_tested){

pipe_tested = 1;

pipe_ptr = init_ctTag_pipe();

}

//For the sake of simplicity, event buffering has not been

//added when writing to the pipe.

//For cross platform testing, proper byte ordering should be delt with here.

if(pipe_ptr > 0)

write(pipe_ptr, &tag, sizeof(_amc_tag_t));

return tag;

}

File PGIFserver.c - The Trace Collector

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#include <strings.h>

#include <unistd.h>

#include <fcntl.h>

#include <pthread.h>

#include <sys/types.h>

#include <sys/socket.h>

#include <sys/stat.h>

#include <netinet/in.h>

#define FIFOBUFFER 512

#define DROPEVENTS 1

#define LOG_EVENTS 0

#define START_CMD "START"

#define STOP_CMD "STOP"

#define MARKER_TAG 0xffffffff

#define SLEEP_TIME 10

//GLOBALS

int readFifo;

int serverSocket;

int clientSocket;

int writeFile;

unsigned int markerTag = 0xffffffff;

int dropEvents;

int outputFileEnabled;

//Socket listening thread.

//Commands, such as start and stop logging, are received here

//and the appropriate global variables are set.

void *SocketListener(void *port){

struct sockaddr client_addr;

int addrlen = sizeof(struct sockaddr);

int listen_port = *(int *)port;

int readlen;

char buffer[255];

while(1){

listen(serverSocket, 5);

clientSocket = accept(serverSocket, &client_addr, &addrlen);

if(clientSocket < 0){

perror("Server Socket connection destroyed, exiting");

close(serverSocket);

pthread_exit(NULL);

}

if(!outputFileEnabled)

writeFile = clientSocket;

while(1){

readlen = read(clientSocket, buffer, 255);

if( readlen == 0){

printf("Socket connection closed, logging stopped\n");

dropEvents = DROPEVENTS;

break;

}

if(strncasecmp(START_CMD, buffer, strlen(START_CMD)) == 0){

printf("Logging Started\n");

if(outputFileEnabled)

write(writeFile, &markerTag, sizeof(markerTag));

dropEvents = LOG_EVENTS;

continue;

}else if(strncasecmp(STOP_CMD, buffer, strlen(STOP_CMD)) == 0){

printf("Logging Stopped\n");

//Avoid losing events that may be buffered by the OS

//by adding a few extra sleeps.

usleep(SLEEP_TIME * 3);

dropEvents = DROPEVENTS;

//Ensure the output file is completely written

if(outputFileEnabled)

fsync(writeFile);

continue;

}else

printf("Unknown cmd %s readlen: %d\n",buffer, readlen);

}

}

pthread_exit(NULL);

}

//Thread that reads events from the pipe and writes them to

//either the socket or a file.

void *fifoReader(void *port){

unsigned int fifoBuffer[FIFOBUFFER];

int readlen;

while(1){

readlen = read(readFifo, fifoBuffer,

FIFOBUFFER*sizeof(unsigned int));

//If nothing is being done, sleep for a bit.

if(readlen > 24

#Event ID, unique to each function. See IDB.

eventID = event & 0x000fffff

self.setDefaults()

#Handle Function Entry.

if (eventCat == 0x70):

curFunc = self.funcDICT[eventID]

self.ID = "E"

self.line = curFunc.firstline

self.filename = self.fileDICT[curFunc.filenum].ppath+\

self.fileDICT[curFunc.filenum].pname

self.message = curFunc.short_name

#Handle Function Return.

elif (eventCat == 0x20):

curCov = self.covDICT[eventID]

self.ID = "R"

self.line = curCov.lastline

self.filename = self.fileDICT[curCov.basefilenum].ppath+\

self.fileDICT[curCov.basefilenum].pname

self.message = self.funcDICT[curCov.covnum].short_name

#Handle control flow events.

elif (eventCat == 0x40):

curCov = self.covDICT[eventID]

self.ID = "B"

self.line = curCov.firstline

self.filename = self.fileDICT[curCov.basefilenum].ppath+\

self.fileDICT[curCov.basefilenum].pname

#Unknown event category. Should never get here.

else:

self.ID = "U"

self.filename = event

self.Trace2str()

self.eventDst.write(self.traceStr + "\n")

#Handles the output formating once all variables have been set.

def Trace2str(self):

self.traceStr = "%s %d %d %d %d %d %d %d %s %d %s %d %s" %\

(self.ID, self.count, self.time, self.processID, self.threadID,\

self.line, self.value, len(self.hostname), self.hostname,\

len(self.filename), self.filename, len(self.message), self.message)

#Handles user input.

class CMDprocessor(cmd.Cmd):

msg = ""

def do_conf(self, arg):

self.conf.printConf()

self.msg = ""

def help_conf(self):

print "conf: displays the current configuration"

def do_exit(self, arg):

self.msg = ""

def do_idbpath(self, arg):

self.conf.idbPath = arg

self.msg = ""

def help_idbpath(self):

print"idbpath sets the idb to filename"

def do_parseraw(self, arg):

self.outputFileIndex = 0

self.output_count = 0

self.RawEventFile = open(arg,'r')

self.eventWriter = EventToRT3()

self.eventWriter.loadIDB(self.conf.idbPath)

while(True):

self.event = self.RawEventFile.read(4)

if not self.event:

print "Finished: wrote", self.output_count, "events to",\

self.outputFileIndex, "files."

break

self.eventLong = struct.unpack("l", self.event)

if(self.eventLong[0] == 0xFFFFFFFF):

self.fname = "%s%05d%s" % (self.conf.filenamePrefix,\

self.outputFileIndex, self.conf.filenameSuffix)

print self.fname

self.TraceFile = open(self.fname,'w')

self.eventWriter.setEventLogFile(self.TraceFile)

self.outputFileIndex += 1

continue;

self.eventWriter.parseEvents(self.eventLong)

self.output_count += 1

def postcmd(self, stop, line):

if ( len(self.msg) > 0):

print self.msg

if line == "exit":

stop = True

return 1

def emptyline(self):

return

c = CMDprocessor()

c.prompt = "PGIF2> "

c.conf = CTConfig()

if __name__ == '__main__':c.cmdloop()

Appendix 2 - Sample Scenario and Feature Report

Scenario 3 - Date

Section 3.3.1 of RFC 2616 (HTTP 1.1 Protocol) allows several

different formats for dates:

Sun, 06 Nov 1994 08:49:37 GMT ; RFC 822, updated by RFC 1123

Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036

Sun Nov 6 08:49:37 1994 ; ANSI C's asctime() format

See

for details.

Suppose a bug has been encountered in processing the RFC 850 form,

but which does not seem to occurr with the other two forms. The

bug has specifically been seen when the If-Modified-Since header

(section 14.25 of RFC 2616) is used, but may also be occurring in other

headers. It may have something to do with the processing of two-digit

years, which occur in this date form but not in the others.

Where in Apache is the code that is specific to the RFC 850 date form?

Alternatively, if there is no code specific to this form, where is

the code for the If-Modified-Since header and how does it work?

Current test case for this feature:

Open socket to wahoo.cs.uwf.edu port 7348 (or 2548 for CodeTEST)

Then send:

GET /~dswg/simple_page1.html HTTP/1.1

Host: wahoo.cs.uwf.edu

If-Modified-Since: Sunday, 06-Nov-94 08:49:37 GMT

Current result is:

The specified page is returned.

Feature Report

Scenario: 3 (DATE)

We suppose that an error has been reported in the processing of certain date formats used in the "If-Modified-Since:" request structure. The error occurs in only one format, specifically the one described in RFC 850 (obsoleted by RFC 1036). We hypothesize that the error may be in the code that uses the two-digit year provided in RFC 850 compliant dates. We need to locate the code which handles the RFC 850 requests.

By, date:

Dennis Edwards, 03/31/2005

Times:

|(1) Task |(2) No. of Software Engs |(3) Elapsed Time (hours) |(4) Person-Hours (2) x (3) |
|Running tests and collecting traces |1 |0.75 |0.75 |
|Analyzing traces and studying the feature |1 |0.25 |0.25 |

Tools used:

RECON instrumentor, TraceGraph, and KlocWork

Test cases used:

I ran four test cases using the driver program in ~dswg/StudyDocuments/Dennis/Scenario3/Driver/driver.c as follows.

1. GET ... If-Modified-Since: Fri, 11 Mar 2005 10:24:42 GMT

2. GET ... If-Modified-Since: Fri, 11 Mar 2005 10:24:42 GMT

3. GET ... If-Modified-Since: Friday, 11-Mar-05 10:24:42 GMT

4. GET ... If-Modified-Since: Friday, 11-Mar-05 10:24:42 GMT

The execution created five partial trace files which were stored in the ~dswg/StudyDocuments/Dennis/Scenario3/Traces

directory. 

• tr00000.r3t : Fri, 11 Mar 2005

• tr00001.r3t : Fri, 11 MAR 2005

• tr00002.r3t : Friday, 11-Mar-05

• tr00003.r3t : Friday, 11-Mar-05

• tr00004.r3t : termination code after the last test case

Marker code identified:

6 decisions in 2 functions in a single source file

Summary description of the feature and the intended change:

As shown in the diagram, the default_handler() calls ap_meets_conditions() to determine if the file meets the conditions specified.  That function in turn calls apr_date_parse_http() which calls apr_date_checkmask() to obtain the answer.  The date parameter is parsed in the apr_date_parse_http() function so our investigation should begin at that point.  KlocWork identified apr_date_parse_rfc() as the only other function which calls apr_date_checkmask().  While the test cases didn't identify the apr_date_parse_rfc() function as being used in the feature, it should be examined before any changes are made that could alter the functionality of apr_date_checkmask().

Klocwork diagram(s):


Code visited:

• Estimate of LOC scanned: 200

• Estimate of LOC studied in some detail: 100

Notes:

• This function didn't use any utility functions so the simple test cases were sufficient.

• A small number of functions were identified as important and the identification turned out to be accurate.

• This would make a good example for demonstration purposes.

• It is interesting to note that one decision was identified in both of the last two test cases, but with different truth values. In the third test case it was recorded as a true decision, and in the fourth test case it was recorded as a false decision.

-----------------------

[1] Support for this study was provided in part by Motorola, Inc. through the Software Engineering Research Center (SERC).




