Software Project Cost Estimating



University of Alaska at Anchorage

Russell Frith

4/12/2011

Abstract

Software engineering cost estimation is the process of predicting the effort required to develop a software system. Cost estimation techniques involve distinctive steps, tools, algorithms, and assumptions. Many estimation models have been developed since the 1980s due to the dynamic nature of software engineering practices. Despite the evolution of new cost estimation techniques, fundamental economic principles underlie the overall structure of the software engineering life cycle and its primary refinements of prototyping, incremental development, and advancement. This paper provides a general overview of software cost estimation methods, including recent advances in the field. Many of the models rely on a software project size estimate as input, and this paper provides details for common size metrics. The primary economic driver of the software life-cycle structure is the significantly increasing cost of making software changes or fixing software problems as a function of the development phase in which the change or fix is made. Software cost estimation models are classified into two major categories: algorithmic and non-algorithmic. Each has its own strengths and weaknesses with regard to implementing modifications to software projects. A key factor in selecting a cost estimation model is the accuracy of its estimates, which can be quite problematic.

1. Introduction

In recent years, software has become the most expensive component of computer system projects. The cost of software development stems mostly from human effort, so most estimation methods focus on this aspect and give estimates in terms of person-months. If one considers economics as the study of how people make decisions in resource-limited situations, then macroeconomics is the study of how people make such decisions on a national or global scale. Macroeconomic decisions are influenced by tax rates, interest rates, foreign policy, and trade policy. Conversely, microeconomics is the study of how people make decisions in resource-limited situations on a more personal scale; it treats decisions that individuals and organizations make on such issues as how much insurance to buy, which software development systems to procure, or what prices to charge for products and services.

Software engineering is an exercise in microeconomics in that it deals with limited resources. There is never enough time or money to encompass all the features software vendors would like to put into their products. Even with cheap hardware, storage, memory, and networks, software projects must always operate within a world of limited computing and network resources. Consequently, accurate software cost estimates are critical to both developers and customers. These estimates can be used for generating requests for proposals, contract negotiations, scheduling, monitoring, and control. Underestimating software engineering costs can lead management to approve proposed systems that then exceed their budget allocations, to deliver underdeveloped functions of poor quality, or to fail to complete a project on time. Conversely, overestimating costs may result in too many resources being committed to a project or, during contract bidding, in losing a contract and the associated jobs.

Accurate cost estimation is important because:

• It can help to classify and prioritize development projects with respect to an overall business plan,

• It can be used to assess the impact of changes and support replanning,

• It can be used to determine what resources to commit to the project and how well those resources will be used,

• Projects can be easier to manage and control when resources are better matched to real needs, and

• Customers expect actual development costs to be in line with estimated costs.

Three fundamental estimates typically comprise a software cost estimate: effort in person-months, project duration, and cost. Most cost estimation models attempt to generate an effort estimate, which is then converted into a project duration timeline and a cost. Effort is measured in person-months of programmers, analysts, and project managers; an effort estimate can be converted into a dollar figure by calculating an average salary per unit time of the staff involved and multiplying that rate by the estimated effort, although the relation between effort and cost may be non-linear.
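
As a minimal sketch of this conversion (the effort figure and burdened monthly rate below are hypothetical, illustrative values, not taken from any model in this paper):

    # Convert an effort estimate (person-months) into a dollar cost.
    def effort_to_cost(effort_pm, monthly_rate_usd):
        """Return the estimated cost in dollars for a given effort in person-months."""
        return effort_pm * monthly_rate_usd

    effort = 60          # person-months (hypothetical)
    rate = 12_000        # average burdened salary per person-month (hypothetical)
    print(effort_to_cost(effort, rate))   # 720000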

In constructing a software cost engineering estimate, three basic questions arise:

• Which software cost estimation model should be used?

• Which software size measurement should be used – lines of code (LOC), function points (FP), or feature points?

• What is a good estimate?

A widely used cost estimation practice is expert judgment. Using this technique, project managers rely on experience and prevailing industry norms as a basis for developing cost estimates. Basing estimates on expert judgment is somewhat error prone, however:

• The approach is not repeatable and the means of deriving an estimate is subjective.

• The pool of experienced estimators of new software projects is very small.

• In general, the relationship between cost and system size is not linear. Costs tend to increase exponentially with size, which confines expert judgment estimates to new projects whose anticipated sizes are similar to those of past projects.

• Budget alterations by management aimed at avoiding cost overruns make experience and data from previous projects questionable.

Alternatives to expert judgment exist, some theoretical and not very useful, others of more pragmatic value; they are presented in the software engineering cost estimation section.

In the last four decades, many quantitative software cost estimation models have been developed, ranging from empirical models such as Boehm’s COCOMO models [4] to analytical models such as those in [7, 22, 23]. An empirical model uses data from previous projects to evaluate the current project and derives its basic formulae from analysis of the particular database available. Analytical models, in contrast, use formulae based on global assumptions, such as the rate at which developers solve problems and the number of problems available.

A well-constructed software cost estimate should have the following properties [24]:

• It is conceived and supported by the project manager and the development team.

• It is accepted by all stakeholders as realizable.

• It is based on a well-defined software cost model with a credible basis.

• It is based on a database of relevant project experience (similar processes, similar technologies, similar environments, similar people and similar requirements).

• It is defined in enough detail so that its key risk areas are understood and the probability of success is objectively assessed.

Hindrances to developing a reliable software engineering cost estimate include the following:

• Lack of an historical database of cost measurement,

• Software development involves many interrelated factors that affect development effort and productivity, and whose relationships are not well understood,

• Lack of trained estimators with the necessary expertise, and

• Little penalty is often associated with a poor estimate.

2. Process of Software Engineering Estimation

Throughout the software life cycle, there are many decision situations involving limited resources in which software engineering techniques provide useful assistance. See Figure II in the appendix for elements of a computer programming project cycle. To provide a feel for the nature of these economic decision issues, an example is given below for each of the major phases in the software life cycle. In addition, refer to Figure III in the appendix for the loopback nature of computer programming process steps.

• Feasibility Phase: How much should one invest in information system analyses (user questionnaires and interviews, current-system analysis, workload characterizations, simulations, scenarios, prototypes) in order to obtain convergence on an appropriate definition and concept of operation for the system to be implemented?

• Plans and Requirements Phase: How rigorously should requirements be specified? How much should be invested in requirements validation activities (automated completeness, consistency, traceability checks, analytic models, simulations, prototypes) before proceeding to design and develop a software system?

• Product Design Phase: Should developers organize software to make it possible to use a complex piece of existing software which generally but not completely meets requirements?

• Programming Phase: Given a choice between three data storage and retrieval schemes which are primarily execution time-efficient, storage-efficient, and easy-to-modify, respectively; which of these should be implemented?

• Integration and Test Phase: How much testing and formal verification should be performed on a product before releasing it to users?

• Maintenance Phase: Given an extensive list of suggested product improvements, which ones should be implemented first?

• Phaseout: Given an aging, hard-to-modify software product, should it be replaced with a new product, should it be restructured, or should it be left alone?

Software cost engineering estimation typically involves a top-down planning approach in which the cost estimate is used to derive a project plan. Typical steps in a planning process include:

1. The project manager develops a characterization of the overall functionality, size, process, environment, people, and quality required for the project.

2. A macro-level estimate of the total effort and schedule is developed using a software cost estimation model.

3. The project manager partitions the effort estimate into a top-level work breakdown structure. In addition, the schedule is partitioned into major milestone dates and a staffing profile is configured.

The actual cost estimation process involves seven steps [4]:

1. establish cost-estimating objectives;

2. generate a project plan for required data and resources;

3. pin down software requirements;

4. work out as much detail about the software system as feasible;

5. use several independent cost estimation techniques to capitalize on their combined strengths;

6. compare different estimates and iterate the estimation process; and

7. once the project has started, monitor its actual cost and progress, and feed the results back to project management.

Regardless of which estimation model is selected, consumers of the model must pay attention to the following to get the best results:

• Since some models generate effort estimates for the full software life-cycle while others exclude effort for the requirements stage, knowing exactly what the estimate covers is essential.

• Model calibration and assumptions should be decided beforehand.

• Sensitivity analysis of the estimates to different model parameters should be calculated.

The microeconomics field provides a number of techniques for dealing with software life-cycle decision issues such as the ones mentioned earlier in this section. Standard optimization techniques can be used when one can find a single quantity such as rupees or dollars to serve as a “universal solvent” into which all decision variables can be converted. Or, if nonmonetary objectives can be expressed as constraints (system availability must be 98%, throughput must be 150 transactions per second), then standard constrained optimization techniques can be used. If cash flows occur at different times, then present-value techniques can be used to normalize them to a common point in time.

Inherent in the process of software engineering estimation is the utilization of software engineering economics analysis techniques. One such technique compares cost and benefits. An example involves the provisioning of a cell phone service in which there are two options.

• Option A: Accept an available operating system that requires $80K in software costs, but will achieve a peak performance of 120 transactions per second using five $10K minicomputer processors, because of high multiprocessor overhead factors.

• Option B: Build a new operating system that would be more efficient and would support a higher peak throughput, but would require $180K in software costs.

In general, software engineering decision problems are even more complex than Options A and B, and the options will differ on several important criteria such as robustness, ease of tuning, ease of change, functional capability, and so on. If these criteria are quantifiable, then some type of figure of merit can be defined to support a comparative analysis of the preference of one option over another. If some of the criteria are unquantifiable (user goodwill, programmer morale, etc.), then techniques for comparing unquantifiable criteria need to be used.

In software engineering, decision issues are generally complex and involve analyzing risk, uncertainty, and the value of information. The main economic analysis techniques available to resolve complex decisions include the following:

1. Techniques for decision making under complete uncertainty, such as the maximax rule, the maximin rule and the Laplace rule [19]. These techniques are generally inadequate for practical software engineering decisions.

2. Expected-value techniques, in which one estimates the probability of occurrence of each outcome (e.g., successful development of a new operating system) and computes the expected payoff of each option: EV = Prob(success)*Payoff(successful OS) + Prob(failure)*Payoff(unsuccessful OS). These techniques are better than decision making under complete uncertainty, but they still involve a great deal of risk if the actual probability of failure is considerably higher than its estimate. A small numeric sketch follows this list.

3. Techniques in which one reduces uncertainty by buying information. For example, prototyping is a way of buying information to reduce uncertainty about the likely success or failure of a multiprocessor operating system; by developing a rapid prototype of its high-risk elements, one can get a clearer picture of the likelihood of successfully developing the full operating system.
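
As a small numeric sketch of the expected-value technique in item 2 above (all probabilities and payoffs are hypothetical, illustrative figures, not taken from the cell phone example):

    # Expected-value comparison of two options.
    def expected_value(p_success, payoff_success, payoff_failure):
        return p_success * payoff_success + (1.0 - p_success) * payoff_failure

    # "Build" option: risky development of a new operating system (hypothetical figures).
    ev_build = expected_value(p_success=0.7, payoff_success=500_000, payoff_failure=-180_000)
    # "Buy" option: accept the available operating system, assumed a certain but smaller payoff.
    ev_buy = 250_000
    print(ev_build, ev_buy)   # compare expected payoffs before committing to an option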

Information-buying often tends to be the most valuable aid for software engineering decisions. The question of how much information-buying is enough can be answered via statistical decision theoretic techniques using Bayes’ Law, which provides calculations for the expected payoff from a software project as a function of the level of investment in a prototype. In practice, the use of Bayes’ Law involves the estimation of a number of conditional probabilities which are not easy to estimate accurately. However, the Bayes’ Law approach can be translated into a number of value-of-information guidelines, or conditions under which it makes good sense to decide on investing in more information before committing to a particular course of action.

Condition 1: There exist attractive alternatives whose payoff varies greatly, depending on some critical states of nature. If not, engineers can commit themselves to one of the attractive alternatives with no risk of significant loss.

Condition 2: The critical states of nature have an appreciable probability of occurring. If not, engineers can again commit without major risk. For situations with extremely high variations in payoff, the appreciable probability level is lower than in situations with smaller variations in payoff.

Condition 3: The investigations have a high probability of accurately identifying the occurrence of the critical states of nature. If not, the investigations will not do much to reduce the risk of loss incurred by making the wrong decision.

Condition 4: The required cost and schedule of the investigations do not overly curtail their net value. It does one little good to obtain results which cost more than those results can save for us, or which arrive too late to help make a decision.

Condition 5: There exist significant side benefits derived from performing the investigations. Again, one may be able to justify an investigation solely on the basis of its value in training, team-building, customer relations, or design validation.

3. Software Engineering Project Sizing

During the 1950’s and the 1960’s, relatively little progress was made in software cost estimation, while the frequency and magnitude of software cost overruns was becoming critical to many large systems employing computers. In 1964, the U.S. Air Force contracted with System Development Corporation for a landmark project in software cost estimation. The project collected 104 attributes of 169 software projects and treated them to extensive statistical analysis. One result was the 1965 SDC cost model which was the best possible statistical 13-parameter linear estimation model for the sample data:

MM (man-months) = -33.63 + 9.15(Lack of Requirements) + 10.73(Stability of Design) + 0.51(Percent Math Instructions) + 0.46(Percent Storage/Retrieval Instructions) + 0.40(Number of Subprograms) + 7.28(Programming Language) – 21.45(Business Application) + 13.53(Stand-Alone Program) + 12.35(First Program on Computer) + 58.82(Concurrent Hardware Development) + 30.61(Random Access Device Used) + 29.55(Difference Host, Target Hardware) + 0.54(Number of Personnel Trips) – 25.20(Developed by Military Organization) [5]. When applied to its database of 169 projects, this model produced a mean estimate of 40 MM and a standard deviation of 62 MM; not a very accurate predictor. The model is also counterintuitive: a project with all zero values for the variables is estimated at –33 MM, and changing the language from a higher-order language to assembly adds 7 MM, independent of project size. One can conclude that there were too many nonlinear aspects of software development for a linear cost-estimation model to work.

Today, software size is the most important factor affecting software cost. Five fundamental software size metrics are used in practice, and two of the most common are the “lines of code” and “function point” metrics. The lines of code metric counts the number of lines of delivered source code for the software; it is known as LOC [9] and is programming-language dependent. Most models relate this measurement to software cost, but the exact LOC can only be obtained after the project has been completed, which makes estimating project costs in advance substantially more difficult.

One method for estimating code size is to use experts’ judgment together with a technique called PERT (Program Evaluation and Review Technique) [2]. The method is based upon three possible code sizes: Sl, the lowest possible size; Sh, the highest possible size; and Sm, the most likely size. An estimate of the code size S may be computed as S = (Sl + 4Sm + Sh) / 6. This formula can be applied to individual modular code components, and the component estimates summed to obtain a size estimate for the whole system.
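
A minimal sketch of the PERT size calculation applied per component and summed over a system (the component size figures below are hypothetical):

    # PERT (three-point) size estimate: S = (Sl + 4*Sm + Sh) / 6,
    # computed for each modular component and summed over all components.
    def pert_size(lowest, likely, highest):
        return (lowest + 4.0 * likely + highest) / 6.0

    # Hypothetical component estimates in lines of code: (Sl, Sm, Sh).
    components = [(2000, 3000, 5000), (800, 1200, 2500), (4000, 6000, 9500)]
    total = sum(pert_size(sl, sm, sh) for sl, sm, sh in components)
    print(round(total))   # aggregate size estimate in LOC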

An alternative proposed by Halstead [11] uses code length and volume metrics. Code length measures source code program length and is defined as N = N1 + N2, where N1 is the total number of operator occurrences and N2 is the total number of operand occurrences. Volume corresponds to the amount of storage space required and is defined as V = N log2(n1 + n2), where n1 is the number of distinct operators and n2 is the number of distinct operands that appear in the program. Counter-alternatives to Halstead may be found in [12, 25].
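
A small sketch computing Halstead’s length and volume from operator and operand counts (the counts below are hypothetical; in practice they would come from a language-specific parser):

    import math

    # Halstead software-science metrics.
    # N1, N2: total operator/operand occurrences; n1, n2: distinct operators/operands.
    def halstead_length(N1, N2):
        return N1 + N2

    def halstead_volume(N1, N2, n1, n2):
        N = N1 + N2              # program length
        n = n1 + n2              # vocabulary size
        return N * math.log2(n)  # volume, a proxy for storage required

    # Hypothetical counts extracted from a source module.
    print(halstead_length(120, 90))                      # 210
    print(round(halstead_volume(120, 90, 15, 25), 1))    # ~1117.6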

As mentioned previously, software size measurement may also be based on function points. This is a measurement based on the functionality of the program and was introduced by Albrecht [1]. The total number of function points depends on the counts of distinct logic types in the following five classes:

1. User-input types: data or control user-input types

2. User-output types: output data types sent to the user that leave the system

3. Inquiry types: interactive inputs requiring a response

4. Internal file types: files that are used and shared inside the system

5. External file types: files that are passed or shared between the system and other systems.

Each of these types is individually assigned one of three complexity levels {1 = simple, 2 = medium, 3 = complex} and given a weighting value that varies from three (for a simple input) to 15 (for a complex internal file). The unadjusted function-point count (UFC) is given as UFC = Σ(i=1..5) Σ(j=1..3) Nij × Wij, where Nij and Wij are, respectively, the number of types and the weight of class i at complexity level j. For instance, if the raw function-point counts of a project are two simple inputs (Wij = 3), two complex outputs (Wij = 7), and one complex internal file (Wij = 15), then the UFC value is computed as 2*3 + 2*7 + 15 = 35. This initial function-point count is either used directly for cost estimation or is further modified by factors whose values depend on the overall complexity of the project; the adjustment accounts for the degree of distributed processing, the amount of reuse, performance requirements, and so on. The advantage of the function-point measurement is that it can be obtained from the system requirements specification in the early stages of software development.
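
The worked example above can be reproduced with a short sketch (the weight table below contains only the example weights quoted in the text, not the full IFPUG weight matrix):

    # Unadjusted function-point count: sum over classes and complexity levels
    # of (number of types) * (weight).  Only the weights from the example above.
    WEIGHTS = {
        ("input", "simple"): 3,
        ("output", "complex"): 7,
        ("internal_file", "complex"): 15,
    }

    def unadjusted_fp(counts):
        """counts maps (class, complexity) -> number of that type."""
        return sum(n * WEIGHTS[key] for key, n in counts.items())

    example = {("input", "simple"): 2,
               ("output", "complex"): 2,
               ("internal_file", "complex"): 1}
    print(unadjusted_fp(example))   # 2*3 + 2*7 + 1*15 = 35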

UFC may also be used for code size estimation using a linear formula:

LOC = a*UFC + b. The parameters a and b can be obtained by linear regression on data from previously completed projects. The latest Function Point Counting Practices Manual is maintained by the IFPUG (International Function Point Users Group).
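
A minimal sketch of fitting the parameters a and b from completed-project data (the UFC/LOC pairs below are hypothetical):

    import numpy as np

    # Fit LOC = a * UFC + b by ordinary least squares on past project data.
    ufc = np.array([35, 60, 110, 150, 220], dtype=float)
    loc = np.array([3200, 5400, 9800, 13500, 20100], dtype=float)

    a, b = np.polyfit(ufc, loc, deg=1)   # slope a, intercept b
    print(a, b)
    print(a * 95 + b)                    # predicted LOC for a new project with UFC = 95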

An extension of the function point software measurement technique is the feature point measurement technique. Feature point extends function points to include algorithms as a new class [16]. An algorithm is defined as the set of rules which must be completely expressed to solve a significant computational problem. For example, a square root routine can be considered as an algorithm. Each algorithm used is given a weight ranging from one (elementary) to ten (advanced) and the feature point is the weighted sum of the algorithms plus the function points. This measurement is especially useful for systems with few inputs/outputs and high algorithmic complexity, such as mathematical software, discrete simulations, and military applications.

Real-time application development cost estimation is based on full function point (FFP) analysis, which takes into account the special hardware control requirements of such applications. FFP introduces two new control data function types and four new control transactional function types, which are described in [28].

A final consideration for software project size estimation is the use of object points. While feature point and FFP extend function point estimates, object points measure sizes from a different dimension. These measurements are based on the number and complexity of the following objects: screens, reports, and 3GL components. Each of these objects is counted and given a weight ranging from one (simple screen) to ten (3GL component) and object points are the weighted sum of all these objects.

4. Software Engineering Cost Estimation

There are two major types of cost estimation methods: algorithmic and non-algorithmic. Algorithmic methods vary widely in mathematical sophistication: some are based on simple arithmetic formulae using summary statistics [8], while others are based on regression models [30] or differential equations [23]. To improve the accuracy of algorithmic models, the model must be adjusted or calibrated to specific circumstances, and even this added work can still yield mixed accuracy. Table I in the appendix lists strengths and weaknesses of software cost-estimation methods. The first part of this comparative discussion treats non-algorithmic costing.

4.1 Non-algorithmic Methods

Analogy Costing: This method involves reasoning by analogy with one or more completed projects to relate their actual costs to an estimate of the cost of a similar new project. This protocol may be used at either the total project level or at the subsystem level. The total project level has the advantage that all cost components of the system will be considered while the subsystem level has the advantage of providing a more detailed assessment of the similarities and differences between the new project and the completed project. Success factors using this technique are outlined in [26].

Expert Judgment: This method involves consulting one or more experts, perhaps with the aid of an expert-consensus mechanism. Experts provide estimates using their own methods and experience. The PERT technique can be used to resolve inconsistencies in estimates. The Delphi technique works as follows:

1. The coordinator presents each expert with a specification and a form to record estimates.

2. Each expert fills in the form individually and is allowed to ask the coordinator questions.

3. The coordinator prepares a summary of all the experts’ estimates and distributes it on a form that requests another iteration of estimates along with the rationale behind them.

4. Steps 2-3 may be repeated several times.

A modification of the Delphi technique proposed by Boehm and Farquhar [4] has proven more effective. Before the estimation, a group meeting involving the coordinator and the experts is arranged to discuss the estimation issues. In step three, the experts do not offer any rationale for their estimates; instead, after each round of estimation, the coordinator calls a meeting at which the experts reconcile their estimates.

Parkinson: In the Parkinson principle, work expands to fill the available volume. This principle is used to equate the cost estimate to available resources [21]. For instance, if the software has to be delivered in 12 months and five people are available, then the effort is estimated to be 60 person-months. This method is hazardous to use in that it has the potential to provide unrealistic estimates and it does not promote good software engineering practices.

Price-to-win: Using this method, the cost estimate is equated to the price believed necessary to win the job, or to the schedule believed necessary to be first to market with a new product. The estimate is based on the customer’s budget rather than on the software functionality. For example, if a reasonable estimate for a project is 100 person-months but the customer can only afford 60, the estimator is commonly asked to modify the estimate to fit 60 person-months of effort in order to win the project. This is a very poor practice, but it is used all too often.

Bottom-up: Each component of the software job is separately estimated, and the results aggregated to produce an estimate for the overall job. An initial design must be in place that indicates how the system is decomposed into different components.

Top-down: An overall cost estimate for the project is derived from global properties of the software product. The total cost is then split among the various components.

The main conclusions one can draw from Table I are the following:

• None of the alternatives is better than the others from all aspects.

• The Parkinson and price-to-win methods are unacceptable and do not produce satisfactory cost estimates.

• The strengths and weaknesses of the other techniques are complementary.

• In practice, a combination of the viable techniques should be employed, their results compared, and iterations on them performed when they differ.

4.2 Algorithmic Methods

Algorithmic methods are based on mathematical models that produce cost estimates as a function of a number of variables, which are considered to be the major cost factors. Any algorithmic model has the form Effort = f(x1, x2, …, xn), where (x1, x2, …, xn) denotes the set of cost factors. The existing algorithmic methods differ in two respects: the selection of cost factors and the form of the function f.

A clearer picture of the fundamental limitations of software cost estimation techniques is emerging. Despite the seven approaches to software cost estimation, no technique can compensate for a lack of definition or understanding of the software job to be done. Until a software specification is fully defined, a software cost estimate can only represent a range of possible software development costs. Figure I in the appendix illustrates this limitation of cost estimation technology. In the figure, the accuracy of the cost estimates is shown as a function of the software life-cycle phase. The horizontal line labeled “x” is the convergent estimate for the cost of a human-machine interface for a hypothetical software project. The level of cost uncertainty is on the y-axis, and its range is between one quarter of and four times the convergent cost. This range is somewhat subjective and is intended to represent 80% confidence limits, that is, “within a factor of four on either side, 80% of the time.” At the feasibility phase of the human-machine interface component, the software engineering estimator does not know what classes of people (clerks, computer specialists, middle managers, etc.) or what classes of data (raw or pre-edited, numerical or text, digital or analog) the system will have to support. Until those uncertainties are clarified, a factor of four in either direction serves as a best guess for the range of estimates.

The uncertainty envelope contracts once the feasibility phase is completed and the operational concept is settled. At this stage, the range of estimates narrows to a factor of two on either side of the convergent estimate. Outstanding issues include the specific types of query to be supported or the specific functions to be performed within a potential client-server enterprise application. Those issues will be resolved by the time a software requirements specification has been developed, at which point the software cost estimates fall within a factor of 1.5 on either side of the convergent estimate.

Once the product design specification is completed and validated, design issues such as the internal data structure of the software product and the specific techniques for handling network input/output between the client computer and the web server will have been resolved. At this point the software estimate should be accurate to within a factor of 1.2 of the convergent estimate. The remaining discrepancies are caused by sources of uncertainty in the specific algorithms to be used for database queries, internet error handling, network failure recovery, and so on. Those issues will be resolved at the detailed design phase, but there will still be some residual uncertainty of around ten percent, based on how well programmers understand the specifications to which they are to code and on possible personnel turnover during the development and test phases.

4.3 Algorithmic Model Cost Estimation Formulation

4.3.1 Cost Factors

So far, a substantial part of this discussion has treated software cost estimation in terms of software size. There exist, however, many additional cost factors worth mentioning. Table II in the appendix summarizes a set of cost factors proposed by Boehm et al. in the COCOMO II model for software engineering cost estimation [5]. There are four types of cost factors. The first set comprises product factors: required reliability, product complexity, database size, required reusability, and documentation matched to life-cycle needs. The second set comprises computer factors: execution time constraints, storage constraints, computer turnaround constraints, and platform volatility. Personnel factors consist of analyst capability, application experience, programming capability, platform experience, language and tool experience, and personnel continuity. The final set comprises project factors: multisite development, use of software tools, and development schedule. Many of these factors are hard to quantify; in many models some are combined and others are omitted. Furthermore, some factors take on discrete values, which results in an estimation function with a piecewise form.

4.3.2 Linear Models

Linear models have the form Effort = a0 + a1x1 + a2x2 + … + anxn, where the coefficients a0, a1, …, an are chosen to best fit the completed project data, as in Nelson’s work [19]. Software development, however, is dominated by nonlinear interactions, so this model is rarely adequate.

4.3.3 Multiplicative Models

Multiplicative models have the form Effort = a0 × a1^x1 × a2^x2 × … × an^xn. This form was used by Walston and Felix [30], with each xi taking one of the three possible values -1, 0, and +1. These models have proven too restrictive to incorporate many cost factor values.

4.3.4 Power Function Models

Power function models have the general form Effort = a × S^b, where S is the code size and a and b are functions of other cost factors. This class contains two of the most popular algorithmic models, described below.

COCOMO (Constructive Cost Model)

This family of models was proposed by Boehm [3, 4] and they have been widely accepted in practice. In these models, code-size S is given in thousand LOC (KLOC) and effort is in person-months. The primary motivation for the COCOMO model has been to help people understand the cost consequences of the decisions they will make in commissioning, developing, and supporting a software product. COCOMO is a hierarchy of three increasingly detailed models which range from a single macro-estimation scaling model as a function of product size to micro-estimation model with a three-level breakdown structure and a set of phase-sensitive multipliers for each cost driver attribute. COCOMO applies to three classes of software projects:

1. Organic projects: “small” teams with “good” experience working with “less than rigid” requirements,

2. Semi-detached projects: “medium” teams with mixed experience working with a mix of rigid and less than rigid requirements, and

3. Embedded projects: developed within a set of “tight” constraints such as hardware, software, operational demands, and so on.

A) Basic COCOMO: This model uses three sets of {a, b} depending on the complexity of the software. Typical parameter values include the following:

(1) for simple, well-understood applications, a = 2.4, b = 1.05;

(2) for more complex systems, a = 3.0, b = 1.15;

(3) for embedded systems, a = 3.6, b= 1.20.

The basic COCOMO model is simple to use, but it excludes many cost factors.
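
A minimal sketch of the basic COCOMO effort equation, Effort = a × (KLOC)^b, using the parameter sets listed above and the three project classes defined earlier (the 200 KLOC project size is a hypothetical input):

    # Basic COCOMO: effort in person-months = a * (KLOC)**b,
    # with {a, b} chosen by project class as listed above.
    PARAMS = {
        "organic": (2.4, 1.05),        # simple, well-understood applications
        "semi-detached": (3.0, 1.15),  # more complex systems
        "embedded": (3.6, 1.20),       # embedded systems
    }

    def basic_cocomo_effort(kloc, mode):
        a, b = PARAMS[mode]
        return a * kloc ** b

    # Hypothetical 200 KLOC project evaluated in each mode.
    for mode in PARAMS:
        print(mode, round(basic_cocomo_effort(200, mode), 1))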

B) Intermediate COCOMO and Detailed COCOMO: In intermediate COCOMO, a nominal effort estimation is obtained using the power function with three sets of {a, b}, with coefficient a being slightly different from that of basic COCOMO:

(1) for simple, well-understood applications, a = 3.2, b = 1.05;

(2) for more complex systems, a = 3.0, b = 1.15;

(3) for embedded systems, a = 2.8, b = 1.2.

Next, the values of fifteen cost factors, ranging from 0.70 to 1.66 and drawn from Table II, are determined [4]. The overall impact factor M is obtained as the product of all the individual factors, and the estimate is obtained by multiplying the nominal estimate by M. While both basic and intermediate COCOMO estimate software costs at the system level, detailed COCOMO works on each subsystem separately and has an obvious advantage for large systems containing non-homogeneous subsystems.
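
A minimal sketch of the intermediate COCOMO calculation: the nominal effort from the power function is multiplied by the product M of the selected effort multipliers (the driver values and project size below are hypothetical placeholders, not a complete set of the fifteen drivers):

    import math

    # Intermediate COCOMO: nominal effort = a * (KLOC)**b, then multiply by the
    # product M of the cost-driver effort multipliers drawn from Table II.
    NOMINAL_PARAMS = {
        "organic": (3.2, 1.05),
        "semi-detached": (3.0, 1.15),
        "embedded": (2.8, 1.20),
    }

    def intermediate_cocomo_effort(kloc, mode, multipliers):
        a, b = NOMINAL_PARAMS[mode]
        nominal = a * kloc ** b
        m = math.prod(multipliers)      # overall impact factor M
        return nominal * m

    # Hypothetical subset of driver ratings: high reliability, low complexity, tight schedule.
    drivers = [1.15, 0.85, 1.08]
    print(round(intermediate_cocomo_effort(50, "embedded", drivers), 1))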

Intermediate COCOMO estimates the cost of a proposed software product in the following way.

1) A nominal development effort is estimated as a function of the product’s size in delivered source instructions in thousands (KDSI) and the product’s development mode, which is described in Table III.

2) A set of effort multipliers are determined from the product’s ratings on a set of 15 cost driver attributes.

3) The estimated development effort is obtained by multiplying the nominal effort estimate by all of the product’s effort multipliers.

4) Additional factors can be used to determine dollar costs, development schedules, phase and activity distributions, computer costs, annual maintenance costs, and other elements from the development effort estimate.

C) COCOMO II: In this contemporary revision, the exponent b of the earlier COCOMO models varies according to five scale factors: precedentedness, development flexibility, architecture/risk resolution, team cohesion, and process maturity. Several new cost drivers are also added.

Putnam’s Model and SLIM

The Putnam model is based on the Norden/Rayleigh manpower distribution and on Putnam’s findings from analyzing many completed projects [23]. The software equation forms the main part of the model: S = E × (Effort)^(1/3) × td^(4/3), where td is the software delivery time and E is an environment factor that reflects development capability and can be derived from historical data using the software equation. The size S is in LOC and Effort is in person-years. Another important relation found by Putnam is Effort = D0 × td^3, where D0 is a parameter called manpower build-up, which ranges from 8 for entirely new software with many interfaces to 27 for rebuilt software. Combining this equation with the software equation yields the power function forms Effort = D0^(4/7) × (S/E)^(9/7) and td = D0^(-1/7) × (S/E)^(3/7). SLIM is a software tool based on this model for cost estimation and manpower scheduling.
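
A small sketch of the combined Putnam relations above, solving for effort and delivery time given the size S, environment factor E, and manpower build-up D0 (the numeric inputs are hypothetical):

    # Putnam's model in the power-function form derived above:
    #   Effort = D0**(4/7) * (S / E)**(9/7)   (person-years)
    #   t_d    = D0**(-1/7) * (S / E)**(3/7)  (years)
    def putnam_effort_and_time(size_loc, env_factor, d0):
        ratio = size_loc / env_factor
        effort = d0 ** (4.0 / 7.0) * ratio ** (9.0 / 7.0)
        t_d = d0 ** (-1.0 / 7.0) * ratio ** (3.0 / 7.0)
        return effort, t_d

    # Hypothetical inputs: 150 KLOC system, environment factor 5000, build-up 15.
    effort, t_d = putnam_effort_and_time(150_000, 5000, 15)
    print(round(effort, 1), "person-years,", round(t_d, 2), "years")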

4.3.5 Model Calibration Using Linear Regression

A direct application of the above models does not take local circumstances into consideration, but the cost factors can be adjusted using local data and linear regression. Let the cost estimation power formula be Effort = a × S^b. Taking the logarithm of both sides transforms it into the linear equation Y = A + b·X, where Y = log(Effort), A = log(a), and X = log(S). Applying the least-squares method to a set of data points (Xi, Yi) from previous projects yields the parameters A (and hence a) and b for the power function.
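
A minimal sketch of this calibration step (the historical size/effort pairs below are hypothetical):

    import numpy as np

    # Calibrate Effort = a * S**b to local data by least squares in log space:
    #   log(Effort) = log(a) + b * log(S)
    sizes = np.array([12, 25, 40, 80, 150], dtype=float)       # KLOC, hypothetical
    efforts = np.array([30, 70, 120, 280, 600], dtype=float)   # person-months, hypothetical

    X = np.log(sizes)
    Y = np.log(efforts)
    b, A = np.polyfit(X, Y, deg=1)   # slope b, intercept A = log(a)
    a = np.exp(A)

    print(a, b)
    print(a * 60 ** b)               # calibrated effort estimate for a 60 KLOC project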

4.3.6 Discrete Models

Discrete models have a tabular form which usually relates effort, duration, difficulty, and other cost factors. This class of models contains models from [2, 3, 31]. The models have gained recent popularity due to their simplicity.

4.3.7 PRICE-S Model

PRICE-S is a proprietary software cost estimation model developed and maintained by RCA [20]. It is a macro cost-estimation model developed for embedded system applications, whose formulation has evolved from subjective complexity factors to equations based on the number of computers/servers, personnel, and project attributes that modulate complexity. The program provides a wide range of useful outputs, such as activity distribution analyses and cost-schedule-expected-progress forecasts.

In the 1980’s, PRICE-S added a software life-cycle support cost estimation capability called PRICE SL and it involved the definition of three categories of support activities.

• Growth: The estimator specifies the amount of code to be added to the product. PRICE SL then uses its standard techniques to estimate the resulting life-cycle-effort distribution.

• Enhancement: PRICE SL estimates the fraction of the existing product which will be modified.

• Maintenance: The estimator provides a parameter indicating the quality level of the developed code. PRICE SL uses this to estimate the effort required to eliminate remaining errors.

4.3.8 The Doty Model

This model is the result of extensive analysis of data collected by the Air Force in the 1960s and 1970s. A number of models of similar form were developed for different application areas. For a general application, the estimated effort in man-months is given by:

MM = 5.288 (KDSI)^1.047, for KDSI >= 10
MM = 2.060 (KDSI)^1.047 × (f1 × f2 × … × f14), for KDSI < 10

The effort multipliers fj are shown in Table V. This model has stability problems: it exhibits a discontinuity at KDSI = 10, and the f factors produce widely varying estimates. For instance, answering “yes” to “first software developed on CPU” adds 92% to the estimated cost.

4.4 Measurement of Model Performance

One common error measure for software engineering cost estimation is the mean absolute relative error (MARE): MARE = (1/n) Σ |estimate_i - actual_i| / actual_i, where estimate_i is the model’s estimated effort for the ith project, actual_i is the actual effort, and n is the number of projects; the result is often expressed as a percentage. To establish whether a model is biased, the mean relative error (MRE) can be used: MRE = (1/n) Σ (estimate_i - actual_i) / actual_i. A large positive MRE suggests that the model overestimates the effort, while a large negative value indicates the reverse.
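
A short sketch of the two error measures (the estimate/actual pairs below are hypothetical):

    # Mean Absolute Relative Error (MARE) and Mean Relative Error (MRE)
    # over n completed projects.
    def mare(estimates, actuals):
        return sum(abs(e - a) / a for e, a in zip(estimates, actuals)) / len(actuals)

    def mre(estimates, actuals):
        return sum((e - a) / a for e, a in zip(estimates, actuals)) / len(actuals)

    # Hypothetical model output versus actual effort (person-months).
    est = [100, 250, 60, 400]
    act = [80, 300, 75, 350]
    print(round(mare(est, act), 3))   # average magnitude of relative error
    print(round(mre(est, act), 3))    # positive value -> model tends to overestimate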

The following criteria can be used for evaluating cost estimation models [4]:

1. Definition – Has the model clearly defined the costs it is estimating and the costs it is excluding?

2. Fidelity – Are the estimates close to the actual costs expended on the projects?

3. Objectivity – Does the model avoid allocating most of the software cost variance to poorly calibrated subjective factors such as complexity? Is it hard to adjust the model to obtain any result the user wants?

4. Constructiveness – Can a user tell why the model gives the estimates it does? Does it help the user understand the software job to be done?

5. Detail – Does the model easily accommodate the estimation of a software system consisting of a number of subsystems and units? Does it give accurate phase and activity breakdowns?

6. Stability – Do small differences in inputs produce small differences in output cost estimates?

7. Scope – Does the model cover the class of software projects whose costs the user needs to estimate?

8. Ease of Use – Are the model inputs and options easy to understand and specify?

9. Prospectiveness – Does the model avoid the use of information that will not be well known until the project is complete?

10. Parsimony – Does the model avoid the use of highly redundant factors, or factors which make no appreciable contribution to the results?

5. Performance of Estimation Models

Many studies have attempted to evaluate cost estimation models and the results are discouraging in that many cost estimation techniques were found to be inaccurate. Some studies found in a literature search include:

1. Kemerer performed an empirical validation of four algorithmic models (SLIM, COCOMO, Estimacs, and FPA) [17]. No recalibration of the models was performed on the project data, which differed from the data used for model development. Most models showed a strong overestimation bias and large estimation errors, with MAREs ranging from 57% to 800%.

2. Vicinanza, Mukhopadhyay, and Prietula had experts estimate project effort using Kemerer’s data set without formal algorithmic techniques, and found that the experts outperformed the models in the original study [29]. The MARE, however, ranged from 32% to 1107%.

3. Ferens and Gurner evaluated three models (SPANS, Checkpoint, and COSTAR) using 22 projects from Albrecht’s database and 14 projects from Kemerer’s data set. The estimation errors were found to be large, with MAREs ranging from 46% for the Checkpoint model to 105% for the COSTAR model.

4. Jeffery and Low investigated the need for model calibration at both the industry and organization levels [15]. Without model calibration, their estimation errors were large, with MAREs ranging from 43% to 105%. They later compared the SPQR/20 model to FPA using data from 64 projects from a single organization [15]. The models were recalibrated to the local environment to remove estimation biases, and the improved estimates showed a MARE of 12%, reflecting the benefits of model calibration.

5. Sheppard and Schofield found that estimating by analogy outperformed estimation based on statistically derived algorithms [26].

6. Heemstra surveyed 364 organizations and found that only 51 used models to estimate effort and that model users made no better estimates than non-model users [12.5]. Also, use of estimation models was no better than expert judgment [12.5].

7. A survey of software development within JPL found that only 7% of estimators use algorithmic models as their primary approach to estimation [13].

6. New Approaches

Software engineering cost estimation remains a complex problem and it continues to attract considerable research and attention. Recently, Finnie and Wittig applied artificial neural networks (ANN) and case-based reasoning (CBR) to estimation of effort [10] on a data set from the Australian Software Metrics Association. ANN was able to estimate development effort within 25% of the actual effort in more than 75% of the projects, and with a MARE of less than 25%. The results from CBR were less encouraging. In 73% of the cases, the estimates were within 50% of the actual effort, and for 53% of the cases, the estimates were within 25% of the actual.

Srinivasan and Fisher used machine learning approaches based on regression trees and neural networks to estimate costs [27]. The learning approaches were found to be competitive with SLIM, COCOMO, and function points, compared to the previous study by Kemerer [17]. A primary advantage of learning systems is that they are adaptable and nonparametric.

7. Conclusion

As of today, almost no software engineering cost estimation model can predict the cost of software development with a high degree of accuracy. This state of the practice exists because:

(1) there are a large number of interrelated factors that influence the software development process of a given development team and a large number of project attributes, such as number of web pages, volatility of system requirements, and the use of reusable software components,

(2) software engineering development environments are evolving continuously, and

(3) there is a lack of measurement that truly reflects the complexity of software systems.

To produce better estimates, estimators must improve their understandings of those project attributes and their causal relationships, model the impact of the evolving environment, and develop effective ways of measuring software complexity.

At the initial stage of a project, there is high uncertainty about many project attributes. Estimates produced at early stages of development are inevitably inaccurate, because accuracy depends heavily on the amount of reliable information available to the estimator. As more project details emerge during analysis and later design stages, uncertainties are reduced and more accurate estimates can be made. Most models, however, produce a point estimate without regard to this uncertainty; they need to be enhanced to produce a range of estimates together with their probabilities.

To improve algorithmic models, there is a great need for the industry to collect project data on a wider scale. The recent effort of ISBSG is a step in the right direction [14]. This standards group has established a repository of over 790 projects, which can serve as a potential source for builders of cost estimation models.

With new types of applications, new development paradigms, and new development tools, cost estimators are facing great challenges in applying known estimation models in the 21st century. Historical data may prove to be irrelevant for future projects. The search for reliable, accurate, and low cost estimation methods must continue. Several areas are in need of immediate attention and these include the need for models based on development using formal methods or those based on iterative software processes. Also, more studies are needed to improve the accuracy of cost estimates for maintenance projects. Although a good deal of progress has been made in software cost estimation, a great deal remains to be done. Outstanding issues needing further research include:

1. Software size estimation;

2. Software size and complexity metrics;

3. Software cost driver attributes and their effects;

4. Software cost model analysis and their refinements;

5. Quantitative models of software project dynamics;

6. Quantitative models of software life-cycle evolution;

7. Software data collection.

References

1. A. J. Albrecht, and J.E. Gaffney, “Software function, source lines of codes, and development effort prediction: a software science validation,” IEEE Trans Software Eng. SE-9, 1983, pp. 639-648.

2. J.D. Aron, Estimating Resource for Large Programming Systems, NATO Science Committee, Rome, Italy, October 1969

3. R.K.D. Black, R.P. Curnow, R. Katz and M.D. Gray, BCS Software Production Data, Final Technical Report, RADC-TR-77-116, Boeing Computer Services, Inc., March 1977.

4. B.W. Boehm, Software engineering economics, Englewood Cliffs, NJ: Prentice-Hall, 1981

5. B.W. Boehm et al “The COCOMO 2.0 Software Cost Estimation Model,” American Programmer, July 1996, pp.2-17.

6. L.C. Briand, K. El Emam, F. Bomarius, “COBRA: A hybrid method for software cost estimation, benchmarking, and risk assessment,” International Conference on Software Engineering, 1998, pp. 390-399.

7. G. Cantone, A. Cimitile and U. De Carlini, “A comparison of models for software cost estimation and management of software projects,” in Computer Systems: Performance and Simulation, Elsevier Science Publishers B.V., 1986.

8. W.S. Donelson, “Project planning and control,” Datamation, June 1976, pp. 73-80.

9. N. E. Fenton and S. L. Pfleeger, Software Metrics: A Rigorous and Practical Approach, PWS Publishing Company, 1997

10. G.R. Finnie, G.E. Wittig, AI tools for software development estimation, Software Engineering and Education and Practice Conference, IEEE Computer Society Press, pp. 346-353, 1996.

11. M. H. Halstead, Elements of software science, Elsevier, New York, 1977

12. P.G. Hamer, G.D. Frewin, “M.H. Halstead’s Software Science – a critical examination,” Proceedings of the 6th International Conference on Software Engineering, Sept. 13-16, 1982, pp. 197-206.

12.5. F.J. Heemstra, “Software cost estimation,” Information and Software Technology vol. 34, no. 10, 1992, pp. 627-639.

13. J. Hihn and H. Habib-Agahi, “Cost Estimation of software intensive projects: a survey of current practices,” International Conference on Software Engineering, 1991, pp. 276-287.

14. ISBSG, International software benchmarking standards group, .

15. D.R. Jeffery, G. C. Low, “A comparison of function point counting techniques,” IEEE Trans on Soft. Eng., vol. 19, no. 5, 1993, pp. 529-532.

16. C. Jones, Applied Software Measurement, Assuring Productivity and Quality, McGraw-Hill, 1997.

17. C.F. Kemerer, “An empirical validation of software cost estimation models,” Communications of the ACM, vol. 30, no. 5, May 1987, pp. 416-429

18. R.D. Luce and H. Raiffa, Games and Decisions. New York: Wiley, 1957.

19. R. Nelson, Management Handbook for the Estimation of Computer Programming Costs, AD-A648750, Systems Development Corp., 1966.

20. R.E. Park, “PRICE S: The calculation of within and why,” Proceedings of ISPA Tenth Annual Conference, Brighton, England, July 1988.

21. G.N. Parkinson, Parkinson’s Law and Other Studies in Administration, Houghton-Miffin, Boston, 1957.

22. N. A. Parr, “An alternative to the Rayleigh curve model for software development effort,” IEEE Trans. Software Eng., May 1980.

23. L.H. Putnam, “A general empirical solution to the macro software sizing and estimating problem,” IEEE Trans. Soft. Eng. May 1980

24. W. Royce, Software project management: a unified framework, Addison Wesley, 1998

25. V. Y. Shen, S. D. Conte, H. E. Dunsmore, “Software Science revisited: a critical analysis of the theory and its empirical support,” IEEE Transactions on Software Engineering, 9,2,1983, pp. 155-165.

26. M. Shepperd and C. Schofield, “Estimating software project effort using analogy,” IEEE Trans. Soft. Eng. SE-23:12, 1997, pp. 736-743.

27. K. Srinivasan and D. Fisher, “Machine learning approaches to estimating software development effort”, IEEE Trans. Soft. Eng., vol. 21, no. 2, Feb. 1995, pp. 126-137.

28. D. St-Pierre, et al, Full Function Points: Counting Practice Manual, Technical Report 1997-04, University of Quebec at Montreal, 1997

29. S. Vicinanza, T. Mukhopadhyay, and M. J. Prietula, “Software-effort estimation: an exploratory study of expert performance,” Information Systems Research, vol. 2, no. 4, Dec. 1991, pp. 243-262.

30. C.E. Walston and C.P. Felix, “A method of programming measurement and estimation,” IBM Systems Journal, vol. 16, no. 1, 1977, pp. 54-73.

31. R.W. Wolverton, “The cost of developing large-scale software,” IEEE Trans. Computer, June 1974, pp. 615-636.

Appendix

Figure I: Software cost estimation accuracy versus phase [4]

Figure II: Computer Programming Project Cycle [19]

Figure III: Computer Programming Processing Steps [19]

Figure IV: Early Sample Cost Justification Form [19]

Figure V: Early Sample Project Description Form [19]

Figure VI: Early Software Budget Form [19]

Table I: Strengths and Weaknesses of Software Cost-Estimation Methods [4]

|Method |Strengths |Weaknesses |
|Algorithmic model |Objective, repeatable, analyzable formula; efficient, good for sensitivity analysis; objectively calibrated to experience |Subjective inputs; assessment of exceptional circumstances; calibrated to past, not future |
|Expert judgment |Assessment of representativeness, interactions, exceptional circumstances |No better than participants; biases, incomplete recall |
|Analogy |Based on representative experience |Representativeness of experience |
|Parkinson |Correlates with some experience |Reinforces poor practice; generally produces large overruns |
|Top-down |System level focus; efficient |Less detailed basis; less stable |
|Bottom-up |More detailed basis; more stable; fosters individual commitment |May overlook system level costs; requires more effort |

Table II: Cost factors and their weights in COCOMO II [4]

|Cost Factor |Description |Very Low |Low |Nominal |High |Very High |
|Product factors | | | | | | |
|RELY |Required software reliability |0.75 |0.88 |1.00 |1.15 |1.40 |
|DATA |Database size |- |0.94 |1.00 |1.08 |1.16 |
|CPLX |Product complexity |0.70 |0.85 |1.00 |1.15 |1.30 |
|Computer factors | | | | | | |
|TIME |Execution time constraint |- |- |1.00 |1.11 |1.30 |
|STOR |Main storage constraint |- |- |1.00 |1.06 |1.21 |
|VIRT |Virtual machine volatility |- |0.87 |1.00 |1.15 |1.30 |
|TURN |Computer turnaround time |- |0.87 |1.00 |1.07 |1.15 |
|Personnel factors | | | | | | |
|ACAP |Analyst capability |1.46 |1.19 |1.00 |0.86 |0.71 |
|AEXP |Application experience |1.29 |1.13 |1.00 |0.91 |0.82 |
|PCAP |Programmer capability |1.42 |1.17 |1.00 |0.86 |0.70 |
|VEXP |Virtual machine experience |1.21 |1.10 |1.00 |0.90 |- |
|LEXP |Language experience |1.14 |1.07 |1.00 |0.95 |- |
|Project factors | | | | | | |
|MODP |Modern programming practices |1.24 |1.10 |1.00 |0.91 |0.82 |
|TOOL |Software tools |1.24 |1.10 |1.00 |0.91 |0.82 |
|SCED |Development schedule |1.23 |1.08 |1.00 |1.04 |1.10 |

Table III: COCOMO Software Development Modes [4]

|Feature |Organic |Semi-detached |Embedded |
|Organization understanding of product objectives |Thorough |Considerable |General |
|Experience in working with related software systems |Extensive |Considerable |Moderate |
|Need for software conformance with pre-established requirements |Basic |Considerable |Full |
|Need for software conformance with external interface specifications |Basic |Considerable |Full |
|Concurrent development of associated new hardware and operational procedures |Some |Moderate |Extensive |
|Need for innovative data processing architectures, algorithms |Minimal |Some |Considerable |
|Premium on early completion |Low |Medium |High |
|Product size range |... |... |... |
