Columbia University in the City of New York



A College Fund Investment Decision[1]

Alex Ferrara has a 12-year-old daughter. He wants to make investment plans to pay the $17,500 tuition for her first year of college 6 years from now. Alex has gone through his budget and finds that he can invest $200 per month for the next 6 years. He has decided to follow a strategy in which he contributes a fixed fraction of the $200 to each of two mutual funds. The first fund follows an investment strategy designed to match the return of the S&P 500. The second fund invests in short-term Treasury bills. Both funds have very low fees, which can safely be ignored here.

Keith Bernard, an adviser from the first fund, suggests that each month he should invest 80% of the $200 in the S&P 500 fund and the other 20% in the T-bill fund. “Don’t be a wuss,” he explained. “The S&P 500 has averaged much larger returns than the T-bill fund. Even though stock returns are risky investments in the short run, the risk will be fairly minimal over the longer 6-year period”.

Emily Leonard, an adviser from the other fund, recommends just the opposite: she suggests 20% in the S&P 500 fund and 80% in T-bills, because Treasury bills are backed by the United States government and are very safe. “If you follow this allocation”, she said, “your average return will be lower, but at least you will have enough to reach your $17,500 target in 6 years”.

Not knowing which adviser to believe, Alex has come to you for help.

Your job is two develop two simulation models; a “parametric” one, and one based on the “resampling” approach. These two approaches and some Crystal Ball hints are provided as an appendix to this document. The spreadsheet COLLEGE.XLS contains 261 monthly returns of the S&P 500 and Treasury bills from January 1970 through September 1991, which you are to use as the basis for your analysis.

Develop your two spreadsheet models to simulate the two suggested investment strategies over the 6-year period. Use both of your models to answer each of these questions:

a. Simulate 10,000 iterations of the two strategies over the 6-year period. Create histograms of the final fund values for each strategy from each model. Based on your simulation results, which of the two strategies would you recommend? Why?

b. Suppose that Alex needs to have $19,500 to pay for the first year's tuition. Based on the same simulation results, which of the two strategies would you recommend now? Why?

c. Write something brief and intelligent about the differences and/or similarities in your conclusions between the two models.

Appendix: Using Historical Data in Crystal Ball

There are two basic approaches to using historical data in a simulation, which we will refer to here as the parametric approach and the resampling approach. Each has advantages and disadvantages, and the modeler will use one or the other depending on the circumstances.

The Parametric Approach

In this approach, we “fit” the data to some theoretical distribution (such as normal or exponential) and estimate the parameters appropriate to the distribution (such as mean and standard deviation for a normal distribution, or lambda for an exponential distribution). In Alex’s case, we will be able to define the returns on each mutual fund in terms of one of Crystal Ball’s built in probability distributions.

This approach has significant advantages in terms of simplicity, because a random variable can be described with a few parameters instead of all the data. The model will be relatively small, with each random variable described with a relatively modest amount of information.

The disadvantage is that we need some assurance that the theoretical distribution we choose is in fact a good “fit” to the data. This gives rise to a special kind of hypothesis test, called a goodness-of-fit test.

Simulation software frequently includes a distribution-fitting utility, and Crystal Ball is no exception.

Here we demonstrate the Crystal Ball fitting procedure on the data set shown in Table 1 below. We will need to find which theoretical distribution best fits each variable, estimate the proper parameters for each, and specify a correlation coefficient for the relationship between the two variables.

|X |Y |

|103 |71 |

|109 |60 |

|100 |76 |

|94 |92 |

|100 |79 |

|95 |90 |

|115 |49 |

|102 |73 |

|105 |68 |

|98 |82 |

|106 |66 |

|113 |52 |

|108 |62 |

|91 |105 |

|111 |55 |

|97 |84 |

|99 |81 |

|90 |77 |

|82 |95 |

|92 |101 |

|85 |45 |

|94 |98 |

|101 |74 |

|89 |112 |

|87 |88 |

|96 |86 |

|118 |38 |

|106 |64 |

|104 |69 |

|110 |58 |

Table 1: Data Set

Preliminary analysis reveals the sample means, sample standard deviations, and sample correlation coefficient as shown here:

[pic]

Now we make a place for our two assumption cells in E9:F9 (remember to enter a number first; here we have entered zeroes).

[pic]

Now we’ll use the Crystal Ball distribution-fitting procedure to find the distribution that most closely fits the X variable. We select cell E9 and click on the Crystal Ball Define Assumption button.

[pic]

Click on “Fit”.

Enter the range containing the X data and click “Next”. The default option is to use the Chi-square test to see which one out of all the continuous distributions best fits our data. The default procedure is fine, so just click “OK”.

|[pic] |[pic] |

At the end of the procedure, it ranks the various distributions in descending order of “fit” to our data. Here are the six best fits to our X variable (note the descending p-values)[2]:

|[pic] |[pic] |

|[pic] |[pic] |

|[pic] |[pic] |

In this case, all six of the distributions shown above “pass” the Chi-square test; we cannot reject the possibility that our X data came from any one of them.

For our purposes in this exercise, we’ll take the distribution that Crystal Ball ranks as #1 — the Extreme Value distribution, with Mode = 95.52 and Scale = 8.46. Click on the “Accept” button to finish defining this assumption.

[pic]

Using the same approach, we determine that the best fit to our Y data is a Logistic distribution with Mean = 75.0 and Scale parameter = 10.56.

|[pic] |[pic] |

Now both assumptions have been defined:

[pic]

The last step is to define the correlation between these two variables. You can specify the correlation between any pair of variables by selecting either assumption cell, clicking the Define Assumption button and then clicking on “Correlate”.

[pic]

Here we’ve opened the X variable. The menu lists all of the other assumption cells already defined in the model. We’ll choose Y, and enter the cell reference with our estimated correlation coefficient:

|[pic] |[pic] |

[pic]

Now Crystal Ball is ready to generate X and Y from the closest-fitting distributions to our data, and correlated with each other based on our sample. These two cells could be expanded into columns of X and Y variables through the use of the Crystal Ball copy and paste buttons.

The Resampling Approach

In this approach, we make no assumptions about any theoretical distributions that may or may not actually fit our data; we use the data themselves as the basis for our simulation.

This approach has the advantage of avoiding the problem of Type II errors in the Chi-square test[3]. It also spares us from dealing explicitly with correlation.

The disadvantage here is that our model may have to include a large set of data (as opposed to the few parameters we used in the parametric approach).

Here we use the 30 X and Y values from Table 1 to illustrate a method that uses two Excel tricks: the Custom distribution in Crystal Ball, and the VLOOKUP Excel function. We will (a) generate random integers between 1 and 30, and then (b) use these random integers to draw X, Y pairs from the historical data.

First, a basic explanation of the VLOOKUP function.

VLOOKUP

This Excel function searches for a value in the leftmost column of a table, and then returns a value in the same row from a column you specify in the table.

Syntax: VLOOKUP(lookup value, table array, column index number, range lookup)

Lookup value is the value to be found in the first column of the array. Lookup value can be a value, a reference, or a text string.

Table array is the table of information in which data is looked up. Use a reference to a range or a range name. The values in the first column of table array can be text, numbers, or logical values.

Column index number is the column number in table array from which the matching value must be returned. A column index number of 1 returns the value in the first column in table array; a column index number of 2 returns the value in the second column in table array, and so on.

Range_lookup is a logical value that specifies whether you want VLOOKUP to find an exact match or an approximate match. If TRUE or omitted, an approximate match is returned. In other words, if an exact match is not found, the next largest value that is less than lookup value is returned. If FALSE, VLOOKUP will find an exact match. If one is not found, the error value #N/A is returned. If range lookup is TRUE, the values in the first column of table array must be placed in ascending order: ..., -2, -1, 0, 1, 2, ..., A-Z, etc.; otherwise VLOOKUP may not give the correct value. If range lookup is FALSE, table array does not need to be sorted. (Juran strongly recommends using the FALSE option here, by entering a zero for the fourth argument.)

Example: Here we can type the name of a school in cell A2 and VLOOKUP will tell us how much tuition the school charges. The VLOOKUP function can be translated into English: “Look for NYU in the first column of A5:J31, and tell me what you see in the 6th column of that row.”

[pic]

Back to our example. Start the model with a spreadsheet similar to the parametric one. Note that we have added a column of integers in column A.

[pic]

Use the VLOOKUP function in E9 and F9 to “look up” the X, Y pair corresponding to the integer in E6.

[pic]

Now we need to define E6 as a Crystal Ball “Custom” assumption.

|[pic] |[pic] |

Click “Data” and enter the range containing our 31 integers. Click “OK”.

[pic]

[pic]

Click “Rescale” and “OK” to make the 30 probabilities all add up to 100%.

[pic]

Click “OK” and the cell turns green. Now the model is ready to sample X, Y pairs randomly from the data set.

[pic]

Using Crystal Ball copy and paste, we can set up a series of X, Y pairs, as shown here:

[pic]

In Alex’s problem, we can use this approach to generate random “scenarios” from the historical data.

-----------------------

[1] Adapted from the problem in Practical Management Science (2nd ed., Winston and Albright, 2001 Duxbury Press, p. 710).

[2] The details of the Chi-square procedure are not crucial here, as Crystal Ball will be doing the heavy lifting. The Chi-square test, like all hypothesis tests, is based on seeing how likely the observed data would be, if in fact some null hypothesis were true. In the first example here, the null hypothesis is that “these data came from an Extreme Value distribution with Mode = 95.52 and Scale = 8.46”. Crystal Ball cycles through all of the continuous distributions in its gallery, and performs a Chi-square test on each of them. The key statistic is the p-value, which may be interpreted as the likelihood of seeing the observed data if the null hypothesis were true. A small p-value (say, less than 0.05) indicates an unlikely occurrence (and therefore, a poor fit to our data). The large p-value observed here indicates that we cannot reject the possibility that these data came from the Extreme Value distribution. If in fact the true population of X were an Extreme Value distribution with Mode = 95.52 and Scale = 8.46, the data in Table 1 would not be surprising at all.

[3] As with all hypothesis tests, goodness-of-fit tests are susceptible to the possibility of incorrectly failing to reject a false hypothesis. In the context of goodness-of-fit, this would mean possibly using the “wrong” distribution in our simulation model.

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download