Using Unique Scorecard Measures in Multi-Divisional Corporations

By Prof. Kip Krumwiede and Prof. Monte Swain

Brigham Young University

Executive Summary

In this report, we present the results of a recent online experiment that tackles a perplexing challenge for multi-divisional companies trying to implement the Balanced Scorecard (BSC) approach. A previous academic study showed evidence that corporate executives evaluating division scorecards containing both common and unique measures tended to rely almost solely on the common measures, presumably because it is easier for the mind to compare “apples to apples” than “apples to oranges.” That finding is a potential blow to the usefulness of unique measures tied to a division’s strategic focus, a key aspect of the Balanced Scorecard approach. Our study suggests that unique measures are not always ignored; rather, their use depends on certain factors, including the feedback value of the unique measures over time and the similarity of the divisions being evaluated.

Join us at our free web seminar, How to Use Unique Scorecard Measures in a Multi-Division Corporation, where we will discuss our findings.

Introduction

In July 2000, two researchers, Marlys Lipe and Steven Salterio, published the results of their study on the effects of common and unique measures on evaluations of two divisions of a clothing firm.1 Although evaluators were not asked to compare or rank the divisions, the researchers found that evaluators rated the divisions almost solely on the common measures, stating that “performance on unique measures has no effect on the evaluation judgments” (p. 284). These findings are troubling for proponents of the Balanced Scorecard approach, who contend that evaluations of performance should include measures derived from an organization’s own vision and strategy.

Improved performance evaluation is only one goal of the Balanced Scorecard (BSC) framework. According to Kaplan and Norton’s book, The Strategy-Focused Organization, the real impact of the BSC approach is strategic alignment and focus within an organization. However, if a division’s management develops performance measures that are unique to its strategic focus but is evaluated solely on measures that are common to other divisions in the corporation, it will probably focus its efforts on excelling in those common measures. Therefore, evaluators who ignore the unique measures, which are often leading measures of future performance, may not evaluate the division fairly or optimally.

This study tests two factors that may affect evaluators’ usage of unique scorecard measures: similarity of divisions and multiperiod evaluations with feedback. First, as divisions become more similar in products and strategies, the cognitive load on the mind is reduced, which may allow evaluators to utilize more information—including unique measures—in their decision making. Second, asking evaluators to judge performance over multiple periods with unique measures that provide information about future performance may lead to increased reliance on the unique measures over time due to the impact of the unique measures’ feedback value. Results from this study provide support for these ideas.

Lipe and Salterio Study

Traditionally, conglomerate corporations have managed their diverse divisions using common financial benchmarks like return on investment, thanks to early successes by companies like Du Pont and General Motors. However, problems have always existed when using common financial measures to evaluate multiple divisions, such as stifled risk taking, unfair comparisons of units with unequal potential, and short-sighted decision making.2 A Balanced Scorecard approach strives to overcome these issues by combining financial measures of past performance with nonfinancial measures that communicate current efforts to pursue the organization’s unique strategies, efforts that should then lead to improved future financial performance.3 Comparing different divisions using only common measures (which typically focus on financial performance) defeats the purpose of the BSC approach.

In the Lipe and Salterio [L&S] study, participants take the role of a senior executive of a firm specializing in women’s apparel. The task is to evaluate two of the firm’s largest divisions, each using its own scorecard. The two divisions are RadWear, a retailer specializing in clothing for the “urban teenager,” and WorkWear, which sells business uniforms through direct sales to business clients. RadWear has set an aggressive strategy of growth through opening new stores and adding brands that cater to low-mobility teenage girls. WorkWear focuses on growth by adding a few basic uniforms for men and by offering a catalog to facilitate repeat orders. Although both divisions sell clothing, their customer segments, types of clothing, and sales methods differ substantially. Each division’s scorecard contains an equal mix of common and unique measures, and even though the evaluators were not asked to compare the divisions, they relied almost solely on the measures common to both divisions in making their judgments.

L&S attribute their findings to prior research showing that decision makers look for the most direct comparison between two choices when possible. Such an approach is often referred to as a “simplification strategy” or a “heuristic.”4 Using unique information would have required the evaluators to deal with questions of relative weights or “tradeoffs” between different measures.

Similarity of Divisions

However, other prior experimental research on decision making suggests that as the similarity of alternatives increases, decision makers tend to increase the number of attributes they consider.5 The theory is that when alternatives are similar, relatively few distinct dimensions must be reconciled, leaving the mind free to consider more attributes. When the alternatives are very dissimilar, the mind tends to adopt a simplifying strategy and focus on the one or two dimensions that are most easily comparable. If this theory applies to multi-division evaluators, then we might expect evaluators analyzing very similar divisions to rely on more information, including unique measures, than evaluators comparing very dissimilar divisions.

Multiperiod Decisions with Feedback

In contrast to L&S’s one-period study, more effective use of division performance data may occur after several evaluation periods, with feedback provided between evaluations. Prior research suggests that the more experience a decision maker has with a particular type of decision, the more that experience influences subsequent decisions.6 Thus, we expect evaluators to rely more heavily on unique measures once experience shows them that the unique measures indicate future performance.

Simulation

In this study, we replicate the L&S study with some modifications to test our theories. The participant takes the role of company president of AXT Corporation, a diversified company with two major business groups, and evaluates two of its larger divisions. A computer program is set up as an electronic Balanced Scorecard available to company management and decision makers. In the program’s main menu, participants are given access to background material that provides some company history and explains the Balanced Scorecard approach and the reasons for using it. To help prevent one division’s strategic information from being weighted more toward unique or common measures than another division’s, the text of each division’s information has been carefully balanced so that it links to the same number of common and unique measures for each division.7

Division Scenarios

Three division scenarios are used in this study: somewhat similar [SS] divisions, quite dissimilar [QD] divisions, and quite similar [QS] divisions. The SS program includes the same two clothing divisions as the L&S study (i.e., RadWear and WorkWear) and serves both as a replication of their study and as a benchmark for this one. Based on experimental materials provided by L&S, the background information, measures, and data in this study are almost identical, which helps control for possible confounds due to contextual differences. The QD program includes the same RadWear Division but replaces WorkWear with National Bank Online Services [NBOS], based on the National Bank Online Financial Services example cited in Kaplan and Norton’s book, The Strategy-Focused Organization.8 The QS program pairs the same RadWear Division with RealWear, a clothing division for women in their twenties based on Express, The Limited’s biggest division. Our intent is to create close similarity in products and strategies between the RadWear and RealWear divisions and extreme differences between the RadWear and NBOS divisions, providing a basis for testing the “similarity of divisions” theory. We expect participants to rely more on unique measures when evaluating QS divisions than when evaluating QD divisions.

Multiple Periods

Table 1 (Panel A) shows how the performance measurement data are manipulated. The unique measures in the four performance data sets are designed to “lead” to the common measure performance in the next year. For example, as shown in Panel B, the total excess actual performance over target of unique measures for Division 1 in Year 1 is 85.0%, which closely relates to the total excess performance of the common measures in Year 2 (85.2%). The performance measures’ outcomes in subsequent years were varied based on one of four patterns, allowing for the following scenarios: common measures favor Division 1 (i.e., RadWear), common measures favor Division 2 (i.e., RealWear, WorkWear, or NBOS), unique measures favor Division 1, and unique measures favor Division 2.

Table 1: Manipulation of Performance on Common and Unique Measures

Panel A: Manipulations for each performance set

|Performance Set |2000 |2001 |2002 |2003 |
| |C1 |U1 | | |

Panel B: Excess of actual performance over target

| |Div. 1 |Div. 2 |Div. 1 |Div. 2 |
| |Excess % better |Excess % better |Excess % better |Excess % better |
|Common measures (e.g., Return on sales for both divisions) |52.0% |85.2% |85.2% |52.0% |
|Unique measures (e.g., Mystery shopper rating for RadWear) |85.0% |52.0% |85.0% |52.0% |

Participants receive feedback by viewing the previous year’s scorecard data before making each evaluation. First, similar to L&S, participants evaluate the two divisions on a 100-point scale (“Very Poor” to “Excellent”) after receiving the scorecard data for Year 1. Next, the second year’s data is added to the first year’s, and participants evaluate the two divisions for Year 2. This process continues through the fourth year, with participants making each evaluation after receiving the previous year’s scorecard results. They are not allowed to go back and change their evaluations for a prior year, but they can access either division’s scorecard data (for previous periods), measurement definitions, and strategy descriptions at any time. We expect participants to rely more on unique measures as they gain experience using them and observe that these measures indicate future financial performance. By the fourth year, participants should have had sufficient practice and feedback to see the value of the unique measures.

Similar to L&S, the common and unique measurement groups are each set at either a “more favorable” or “less favorable” level within the two divisions, except that this study adds multiple periods. Also following L&S, all actual measurement outcomes are better than target, and the common and unique measure sets have the same total excess performance (i.e., percentage better than target) at the less favorable (52%) and more favorable (85%) levels. Although it is probably not realistic for all outcomes to be above target, this design eliminates possible overreaction to missed targets.9 The “percentage of actual better than target” is provided to make all measures comparable on the same scale.
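To make the scale concrete, the following sketch shows how a “percentage of actual better than target” might be computed and how, by design, a division’s excess on unique measures in one year anticipates its excess on common measures the next year. The example inputs and the year series are hypothetical; only the 52%/85% levels and the lead relationship come from the study’s design.

```python
# Hedged sketch: the function mirrors the "percentage of actual better than
# target" scale; the inputs (15.2 vs. a 10.0 target) are invented.

def excess_over_target(actual: float, target: float) -> float:
    """Percentage by which actual performance exceeds its target."""
    return (actual - target) / target * 100.0

# A 15.2% return on sales against a 10.0% target yields the study's
# "less favorable" 52% level:
print(round(excess_over_target(15.2, 10.0), 1))  # 52.0

# Hypothetical year series consistent with the designed lead relationship:
# total excess on unique measures in year t anticipates total excess on
# common measures in year t + 1 (e.g., 85.0% in Year 1 -> 85.2% in Year 2).
unique_excess = {1: 85.0, 2: 52.0, 3: 85.0}
common_excess = {1: 52.0, 2: 85.2, 3: 52.0, 4: 85.2}
for year, u in unique_excess.items():
    print(f"Year {year} unique {u}% -> Year {year + 1} common {common_excess[year + 1]}%")
```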

Panel of Judges

A panel of 43 researchers and managers with experience and perspective on performance evaluation each evaluated a paper version of one of the division scenarios. These “panel of judges” scores were entered into the computer program and served as the basis for providing feedback to simulation participants. In addition, all 43 judges rated each performance measure on its relevance for evaluating the division manager’s performance, using a scale of 1 (low relevance) to 10 (high relevance). No statistically significant differences were found in mean relevance ratings between common and unique measures for any of the three division scenarios. Thus, any difference in participants’ use of common and unique measures is unlikely to be due to differences in the perceived relevance of these sets of measures.

Participants

Experienced executives recently participated in this experiment via the web site of an organization devoted to providing information on various management innovations such as scorecarding. Participating executives either downloaded or accessed a web-only version of one of the three program scenarios from the site.

Participants were randomly assigned a three-digit code identifying each participant’s specific evaluation context. The first digit identifies the two divisions to be evaluated (the QS, SS, or QD scenario). The second digit identifies the degree of strategic information given (i.e., more articulated or less articulated). The third digit indicates one of five possible performance data sets for the two divisions. Individual data sets were designed for each division, each with performance data tailored to that division’s targets and unique measures.
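The condition code above can be sketched as a small assignment-and-decoding routine. The digit positions follow the description in the text, but the specific digit-to-condition mappings below are assumptions for illustration; the study does not specify them.

```python
import random

# Hypothetical mappings (the study does not say which digit value maps
# to which condition).
SCENARIOS = {1: "QS", 2: "SS", 3: "QD"}                    # digit 1: division pair
STRATEGY = {1: "more articulated", 2: "less articulated"}  # digit 2: strategic detail
DATA_SETS = (1, 2, 3, 4, 5)                                # digit 3: performance data set

def assign_code(rng: random.Random) -> str:
    """Randomly assign a participant to one cell of the design."""
    return (f"{rng.choice(list(SCENARIOS))}"
            f"{rng.choice(list(STRATEGY))}"
            f"{rng.choice(DATA_SETS)}")

def decode(code: str):
    """Recover the evaluation context from a three-digit code."""
    return SCENARIOS[int(code[0])], STRATEGY[int(code[1])], int(code[2])

rng = random.Random(7)
print(assign_code(rng))           # a random code such as "215"
print(decode("121"))              # ('QS', 'less articulated', 1)
```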

The editor of the web site included advertising and a link to our study in exchange for a web-based seminar reporting the results. The site is used for this study because its audience is knowledgeable and experienced in performance measurement and the Balanced Scorecard approach. To encourage these executives to participate, we provided instant feedback at the end of the program showing how closely their evaluations matched our panel of judges’ scores, an early copy of the research report, and a chance to win gift certificates. The participant with the smallest average difference from the panel of judges’ scores received a $75 certificate, and a randomly drawn winner received a $50 certificate.

A total of 292 responses to the simulation were received through the site. Of these, 125 were eliminated because they were incomplete, were duplicate responses from the same individual (only the first was kept), or came from a “control group” program version using a fifth data set not considered in this report. A further 25 responses were eliminated because the participant spent less than ten minutes on the simulation; an analysis of scores relative to the panel of judges shows a clear difference at around the ten-minute mark.10 Thus, 142 usable responses remained.
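The screening arithmetic above can be summarized in a short bookkeeping sketch; the counts come from the text, and the variable names are our own.

```python
# Sample-screening arithmetic from the text.
total_received = 292
dropped_first_pass = 125   # incomplete, duplicate, or control-group responses
dropped_fast = 25          # spent less than ten minutes on the simulation

usable = total_received - dropped_first_pass - dropped_fast
print(usable)  # 142
```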

Results

Table 2 shows the impact of common and unique measures on the executives’ mean evaluation scores over the four evaluation periods. Figures 1A and 1B illustrate this impact graphically across years 1 through 4.

Table 2: Impact of Common and Unique Measures on Mean Evaluation Scores for Executives

|"SS" Divisions (n=15) |
|Measures |2000 |2001 |2002 |2003 |

[Data values for this panel and the remaining scenario panels were not preserved.]

Figure 1B: Quite Similar (QS) vs. Quite Dissimilar (QD) division comparisons


Note: “Impact on score” is computed by taking the increase in mean scores for the two divisions when the common (unique) measures favor RadWear and adding it to the increase in mean scores when the measures favor RealWear (QS) or NBOS (QD).
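The note’s computation can be expressed as a worked sketch. The point values below are hypothetical; only the formula, summing the two mean-score increases, follows the note.

```python
# "Impact on score" for a measure set: the increase in mean evaluation
# scores when the measures favor Division 1 (RadWear), plus the increase
# when they favor Division 2 (RealWear for QS, NBOS for QD).

def impact_on_score(increase_favoring_div1: float,
                    increase_favoring_div2: float) -> float:
    """Total score impact attributable to one set of measures."""
    return increase_favoring_div1 + increase_favoring_div2

# Hypothetical example: favorable common measures raise RadWear's mean
# score by 6.0 points, and raise the other division's mean score by 4.5
# points when they favor that division, for a total impact of 10.5.
print(impact_on_score(6.0, 4.5))  # 10.5
```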

Multiperiod Feedback Results

Both Figures 1A and 1B show that in all scenarios the impact of unique measures was greater in year 4 than in year 1. Thus, it appears that executives made more effective use of unique division performance data as time went on, with feedback occurring between evaluations: they relied more on unique measures once experience showed that the unique measures lead to future performance. These results support the “multiperiod evaluations with feedback” theory.

Implications and Limitations

This study supports the idea that multiperiod evaluations with feedback increase the impact of unique measures. Unique measures appear to gain influence relative to common measures over time, provided they are perceived to be leading indicators of future financial results. As time goes on, evaluators can benchmark unique measures against their own history, giving the measures a more common basis for comparison. This finding supports the use of leading measures closely tied to an organization’s individual strategies, as recommended by Kaplan and Norton. It also suggests that decision making in a Balanced Scorecard context may be more effective over time than in a single period. Finally, this study extends the L&S study, which was based on only a single evaluation period.

The results of this study also suggest that as the similarity of divisions increases, evaluators increase the amount of information they consider. The theory is that it is easier for the mind to consider more information when a relatively smaller set of division characteristics must be weighed. Evaluators were strongly influenced by both common and unique measures when the divisions being evaluated were most similar. Companies with relatively similar divisions may therefore be freer to use both common and unique measures than companies with more dissimilar divisions. “Conglomerate-type” companies with dissimilar divisions should recognize that common measures may naturally be weighted more heavily by evaluators, at least in the early years of scorecard use. They may also want to consider developing a fixed weighting system for scorecard measures to make the evaluation process more objective.

There are several limitations to this study that should be noted. Experimental studies attempting to model real-world phenomena require simplifying assumptions that impair realism. The various pressures and incentives that would be present for real-world decision makers were not present in this simulation, although the study did strive to motivate participants by providing feedback on evaluation scores and token prizes for winners. Finally, this study focuses on the perspective of the evaluator rather than the performer; certainly the behavioral impact on the organization and people being evaluated is the major thrust of the Balanced Scorecard approach. Given these limitations, we hope this study adds to available knowledge on the effectiveness of the Balanced Scorecard approach.

To find out more, attend our free web seminar How to Use Unique Scorecard Measures in a Multi-Division Corporation.

Footnotes

1Lipe, M. G., and S. E. Salterio, 2000, The balanced scorecard: judgmental effects of common and unique performance measures, The Accounting Review 75 (3), pp. 283-298.

2See Johnson, H. T. and R. S. Kaplan, 1991, Relevance Lost: The Rise and Fall of Management Accounting. Boston, MA: Harvard Business School Press.

3See Kaplan, R., and D. Norton, 1996, The Balanced Scorecard. Boston, MA: Harvard Business School Press, p. 8.

4See Payne, J. W., J. R. Bettman, and E. J. Johnson, 1993, The Adaptive Decision Maker, New York, NY: Cambridge University Press, p. 2, 51.

5See Payne et al. 1993, p. 55; Bockenholt, U., Albert, D., Aschenbrenner, M., and Schmalhofer, F., 1991, The effects of attractiveness, dominance, and attribute differences on information acquisition in multiattribute binary choice, Organizational Behavior and Human Decision Processes, 49, pp. 258-281; Biggs, S. F., Bedard, J. C., Gaber, B. G., and Linsmeier, T.J., 1985, The effects of task size and similarity on the decision behavior of bank loan officers, Management Science, 31, pp. 970-987; and Shugan, S. M., 1980, The cost of thinking, Journal of Consumer Research 7, pp. 99-111.

6Payne et al. 1993, pp. 172-177.

7The RealWear and NBOS divisions both have text that can be linked to four unique measures and three common measures (the same as WorkWear, which is taken from Lipe and Salterio’s study). The RadWear Division, which is used in all case scenarios and is the same as in Lipe and Salterio’s study, can be linked to three unique measures and two common measures. The strategic objectives are designed so that each performance measure can be linked to a specific objective.

8See Kaplan, R., and D. Norton, 2001, The Strategy-Focused Organization, Boston, MA: Harvard Business School Press, pp. 107-116.

9The fact that both divisions exceed targets suggests that the economy is good, at least for their markets. Because it would also not be realistic to leave the targets unaltered during such a time period, we increase the targets for certain measures during the four time periods. The increases are uniform and balanced among common and unique measures, and the total percentage increase is the same for each division for each year and for each scenario.

10The average difference from experts’ evaluation scores for those participants spending ten minutes or more is 12.01. The average difference increases to 16.97 for those spending less than ten minutes.
