Testing For the Most Significant Statistical Categories on the PGA Tour

Ed Fritsch

Eco 328

December, 2003

Abstract

This model uses the dollars earned by a professional golfer on the PGA Tour as the dependent variable. The seven independent variables were a player’s birdie average, driving accuracy, driving distance, final round scoring average, greens in regulation percentage, putting average, and sand save percentage. The goal of this study was to see which statistical categories in golf are the most significant with regard to money earnings.

One previous econometric study that used data from professional golfers was a regression that tested whether golf tournaments had incentive effects. The conclusion was that tournaments with greater prize money gave players a greater incentive to perform well, and that an effort variable takes effect in tournaments with larger purses. Another study reached the same conclusion: the structure of tournament prize money influences effort on the LPGA tour. A third study focused on which statistical categories most influenced four-round scores. This regression concluded that driving distance, average putts, and greens in regulation percentage were significant in explaining a golfer’s performance. My study is similar to the third regression, while the final round scoring average variable can expand on the theory that an effort variable exists in golf performance.

The first run of the regression showed significance with the variables: final round scoring average, greens in regulation, and sand saves. The model showed overall significance with an adjusted R-squared of approximately 0.40. This regression had problems with autocorrelation, heteroskedasticity, and multicollinearity. To solve these problems, I combined the driving variables into one total driving variable, eliminated the average putting variable, and added a variable for number of events played for a second run of the regression.

The second model showed significance for four variables: birdie average, final round scoring average, greens in regulation, and sand saves. The model again showed overall significance with an adjusted R-squared of approximately 0.39. The multicollinearity problem was solved, but events and total driving were insignificant, and the model had problems with both autocorrelation and heteroskedasticity. These problems may be impure, that is, symptoms of a misspecified model.

In conclusion, the significance of final round scoring average is consistent with the theory from previous econometric models involving the incentive effects of tournaments and the effort variable in a golfer’s performance. Also, the variables that involved the short game showed significance, while the driving variables never did. So, improving the short game should produce the greatest increase in a player’s earnings.

Table of Contents

Title Page

Abstract

Table of Contents

Introduction

Literature Review

Model

Data and Empirical Results

Conclusion

Data, Tabular, and Graphical Appendices

Bibliography

Introduction

This econometric study uses regression analysis to identify the statistical categories that matter most to a Professional Golfers’ Association (PGA) Tour player. The purpose is to give professional golfers and fans an idea of which parts of the game contribute most to earnings. Golf is made up of driving the ball, iron play, putting, and sand play. Practice time is limited, so if empirical analysis shows that focusing on putting, for example, produces the largest increase in earnings, then a player can use practice time more efficiently with regard to results and earnings. This can carry over to the average or recreational golfer. If driving, for example, is the most important statistic to professionals with regard to earnings, then weekend players, whose practice time is even more limited, can use that finding to focus their practice time, lower their scores, and maximize their enjoyment.

The dependent variable in this study is the number of dollars earned by a PGA Tour player. The independent variables are the average number of birdies a player makes in a round, driving accuracy (the percentage of times a player drives the ball into the fairway), the average distance of a player’s drive, a player’s final round scoring average, a player’s percentage of greens hit in regulation, a player’s average number of putts per hole, and a player’s sand save percentage. The sample was the top 131 money winners for the 2002 PGA Tour season. The data was accumulated from the “stats central” section of . Part of my study particularly focuses on the final round scoring average variable. Analyzing this will show some evidence of whether or not an effort variable influences golf scores, which has been studied in previous econometric analyses.

The variables described above produced a regression in which only final round scoring average, greens in regulation percentage, and sand save percentage were significant. The model showed overall significance, but had problems with serial correlation and heteroskedasticity. There were also slight multicollinearity problems between driving distance and driving accuracy, and between putting average and birdie average. These problems caused me to run a second regression with putting average eliminated, driving distance and driving accuracy combined into total driving, and a new variable added for the number of events a player participates in. I also rearranged the order of the players in the spreadsheet to prevent autocorrelation.
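The multicollinearity check and the combined driving variable can be illustrated with a short Python sketch. It rests on several assumptions that are not in the paper: the data are synthetic, the negative link between distance and accuracy is invented, and "total driving" is taken to be the sum of a player's ranks in the two driving categories (the paper does not say how the two variables were combined). The helper name `rank_desc` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 131  # sample size used in the study

# Synthetic data (NOT real PGA Tour statistics): longer hitters are assumed
# to be somewhat less accurate, which creates correlated regressors.
distance = rng.normal(280.0, 8.0, n)                                  # yards
accuracy = 68.0 - 0.4 * (distance - 280.0) + rng.normal(0.0, 4.0, n)  # percent

# Simple correlation between the two driving variables.
r = np.corrcoef(distance, accuracy)[0, 1]

# Variance inflation factor for a model containing only these two regressors.
vif = 1.0 / (1.0 - r ** 2)


def rank_desc(x):
    """Rank 1 = largest value; ties are broken by order of appearance."""
    order = np.argsort(-x, kind="stable")
    ranks = np.empty(len(x), dtype=int)
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks


# Assumed definition: total driving = distance rank + accuracy rank,
# so a lower value means a better all-around driver.
total_driving = rank_desc(distance) + rank_desc(accuracy)

print(f"r = {r:.2f}, VIF = {vif:.2f}")
```

Replacing two correlated regressors with a single combined one removes the collinearity between them, at the cost of no longer being able to separate the effect of distance from the effect of accuracy.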

This second run showed that birdie average, final round scoring average, greens in regulation percentage, and sand save percentage were significant variables, while the two new variables of events played and total driving were not significant. The new model also showed overall significance, but again had problems with autocorrelation and heteroskedasticity.
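Serial correlation diagnostics depend on the order of the observations, which is why the ordering of players in the spreadsheet matters. The sketch below assumes the Durbin-Watson statistic as the diagnostic (the paper does not name the test used here) and uses made-up residuals with a trend across money rank, not the study's actual residuals.

```python
import numpy as np


def durbin_watson(resid):
    """Durbin-Watson statistic: near 2 suggests no first-order serial correlation."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


rng = np.random.default_rng(2)
n = 131

# Hypothetical residuals listed in money-rank order. The smooth trend mimics
# what an omitted variable related to rank could leave behind.
resid_sorted = np.linspace(1.0, -1.0, n) + rng.normal(0.0, 0.3, n)
resid_sorted -= resid_sorted.mean()

dw_sorted = durbin_watson(resid_sorted)
# Same residuals, same fit, only the row order changes.
dw_shuffled = durbin_watson(rng.permutation(resid_sorted))

print(f"sorted by rank: {dw_sorted:.2f}, shuffled: {dw_shuffled:.2f}")
```

Reordering changes the statistic without changing the coefficients or the residuals themselves, so a pattern that appears only under one ordering is better read as a hint of misspecification than as true serial correlation.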

Literature Review

Ehrenberg and Bognanno (1990) set up a regression model to study whether tournaments have incentive effects. That is, they tested whether potential money earnings in a tournament cause an increase in effort among competitors. This data can be used for much more than measuring golf results. “The results can be used to describe the pay structures of workers such as: corporate executives, professors, salespeople, and professional sports tournaments” (Ehrenberg 1308). They use golf tournaments because golf offers a very clear case of performance leading to higher pay. They use data from the 1984 men’s Professional Golfers’ Association (PGA) tour, and the sample was all full time PGA tour golfers. The assumption in this experiment was that an individual’s score is linearly related to their effort or concentration level. The regression tested whether a golfer’s final score in a four round tournament depends on the prize differential for winning, the golfer’s and his opponents’ ability, and tournament-specific factors.

The dependent variable was the final round score. The independent variables were the total prize money awarded, a vector of variables to control for the difficulty of the tournament course and weather conditions, a vector for the player’s ability, and a vector of variables to control for the quality of other players in the tournament. The course difficulty variables include the par for the tournament course, the total course yardage, the PGA tour’s evaluation of playing difficulty, and three raters’ opinions on the number of days the tournament had bad weather that would influence scoring. The player’s ability is measured by the player’s scoring average and the percentage of tournaments in which the player made the two round cut. The quality of other players is measured by how many players who made the two round cut were ranked in the top 160 in total prize money so far that year.

As predicted, more difficult courses led to higher scores, by 1.495 strokes per round on average. Bad weather also contributed to slightly higher scores. An increase in prize money of $100,000 led to scores that were 1.1 strokes lower per round on average. The player’s ability had a strong relationship with final round score: a one-stroke increase in a player’s scoring average was associated with an increase of over 4 strokes in final round score. The key result was that prize money strongly affected final round scoring (effort in this case). Ehrenberg and Bognanno also tested whether prize money influenced the second round scoring average, where players need to score at a certain level in order to make the cut and record any earnings (a missed cut results in $0 income for a player). Here, a $100,000 increase in prize money decreased scores by only 0.1 strokes. The relationship is not as strong, possibly because the incentive in the second round is to earn any money at all, rather than to maximize earnings. In conclusion, the study showed that tournaments do have incentive effects. There were many potential problems with the experiment, though. First, it only focused on golf, when other sports could be relevant. Also, all of the analyses were derived from simple two person models, where only two golfers were analyzed in a particular model. And finally, it was assumed that the prize structure of tournaments influences scores based on effort levels.

In reaction to Ehrenberg and Bognanno’s study, Peschiera, Matthews, and Sommers (2002) performed a similar incentive study on the Ladies Professional Golf Association (LPGA) tour. The dependent variable in their model was the four round score of a player in a tournament. Their independent variables included total prize money disbursed to all players in the tournament, a dummy variable for whether or not a tournament was a major, the four-round par score for the tournament, the total course yardage, the player’s four-round scoring average for the year up to that point on the tour, and the mean value of all of the players’ four-round scoring averages for the year. The sample was the 50 top money winners on the LPGA tour. They found that increasing the prize money by $100,000 actually led to higher scores. The other variables, except for a more competitive field increasing scores, held their predicted signs, with majors decreasing scores and greater course yardage leading to higher scores. This experiment went a step further than Ehrenberg and Bognanno’s model by separating prize money into two categories: the winner’s purse, and the difference between the total tournament purse and the winner’s purse. An increase in the first place prize money led to decreased scores, and the larger the difference between the purse and the first prize money, the higher the scores. This implies that if the first place prize money isn’t large enough relative to the entire purse, then effort decreases. This makes theoretical sense if we assume that lower separation in prize money for each position leads to decreased incentive, and therefore decreased effort. The decreased incentive comes from a smaller dollar penalty for finishing one spot lower in the field, for example, when second place prize money does not differ much from first place prize money.

The higher scores in relation to higher prize money can be explained by the model not taking into account course and playing difficulty as thoroughly as Ehrenberg and Bognanno did. This model did not have variables representing course difficulty or bad weather that could influence scores.

“The tentative conclusion of this model is that the structure of tournament prizes matters in the case of the LPGA tour” (Peschiera 10). The small positive coefficients on prize money can be explained in a few ways. First, financial rewards may increase effort, but among the top money winners, the increased money incentive has little effect on performance in a four-day tournament (Peschiera 10). The top money winners are not as money driven as other golfers may be. Also, the relationship between prize money and incentive for women golfers may be better studied over a lifetime or career, rather than one season (Peschiera 11). Finally, prize money available to women is much smaller than men’s potential earnings. Endorsements are more prevalent for men as well. This creates an incentive gap between the men’s tour and the women’s tour. This could lead to a decreased or negligible coefficient for women’s incentives compared to earnings (Peschiera 11). Peschiera, Matthews, and Sommers also studied the effects on scoring of having the number one ranked golfer in the field. They found some evidence of this “superstar” effect, but nothing conclusive. Their conclusion was that “the presence of the tour’s best player in a tournament appears to reduce, in some cases reverse, the responsiveness of others to financial incentives” (Peschiera 13). This could carry over to real world situations, where salespeople who are paid based on relative output could have their incentives reduced by the presence of a “superstar” salesman (Peschiera 13).

Kumar and Park (1999) analyzed which statistical factors in golf affect a player’s four-round scoring average. The dependent variable in their study was a player’s four-round score. Some of the independent variables included average driving distance, number of fairways hit, number of greens hit, and average number of putts per hole. The sample included the top 30 finishers in the 1999 U.S. Open Championship. They concluded that driver distance, average number of putts made in 4 rounds, number of greens hit, and average number of putts made per hole were significant variables in explaining the performance of a golfer. This experiment moved away from trying to explain the player’s effort factor with money as the incentive. Instead, it focused on which statistical categories are most important to a golfer trying to finish towards the top of one single tournament.

My experiment will be similar to Kumar and Park’s, but I will study the significance of each statistical category for players’ money earnings over the course of an entire season. The Ehrenberg and Bognanno study, along with the Peschiera, Matthews, and Sommers experiment, provides a strong foundation for the “X” factor involved with money earnings. That factor is increased effort due to the money incentive at the higher purse tournaments. I will consider this in my study. My statistical analysis will be used to conclude which statistical categories are most important to a golfer trying to maximize his income. For example, if I find that putts per round is the most important statistic for money earnings, this can advise players to focus their practice efforts on putting in order to maximize their earnings. The effort factor is involved, but not so much in performance as in preparation.

Model

The dependent variable in my regression model is the number of dollars earned on the PGA (Professional Golfers’ Association) Tour in the 2002 season. This money earnings variable will be tested against 7 independent variables. First, a player’s final round scoring average will be used. This is the average number of strokes for the final round of each tournament played. It will be used to measure the incentive effect described in previous golf econometric studies. The incentive effect involves a potential additional effort variable produced by players to maximize their earnings. Since players are paid after the final round, one could hypothesize that players’ effort increases in the final round, because in the final round players are thinking about earnings and will increase their effort in order to maximize their prize money. The incentive is the total purse or first place prize money. In theory, the effort variable increases a player’s performance, so that they can maximize their prize money.

The second independent variable is driving distance. “Driving distance is the average number of yards per measured drive. These drives are measured on two holes per round. Care is taken to select two holes, which face in opposite directions to counteract the effects of wind. Drives are measured to the point they come to rest regardless of whether they are in the fairway or not” (). The third independent variable is driving accuracy. Driving accuracy is the percentage of times a player is able to hit the fairway with his tee shot. The fourth independent variable is birdie average. Birdie average is the average number of birdies made per round played.

The fifth independent variable is greens in regulation percentage. “Greens in regulation is the percent of time a player was able to hit the green in regulation (greens hit in regulation/holes played). Note: A green is considered hit in regulation if any part of the ball is touching the putting surface and the number of strokes taken is 2 or less than par” (). The sixth independent variable is sand saves. “Sand saves is the percent of time a player was able to get 'up and down' once in a greenside sand bunker. Note: This up and down is computed regardless of score on the hole” (). Up and down refers to a player taking two strokes to get the ball into the hole from a greenside sand trap (one next to the green, as opposed to a fairway bunker). The final independent variable is putting average. “Putting leaders measures putting performance on those holes where the green is hit in regulation (GIR). For these holes, the total putts are divided by the total holes played. By using greens hit in regulation we are able to eliminate the effects of chipping close and one putting in the computation” ().

The mathematical representation of my hypothesized model is as follows:

$D_i = \beta_0 - \beta_1 Sa_i + \beta_2 Dd_i + \beta_3 Da_i + \beta_4 Ba_i + \beta_5 Gr_i + \beta_6 Ss_i - \beta_7 Pa_i + \varepsilon_i$

i=1…n golfers

Di: The number of dollars earned for the 2002 season

Sa: Final round scoring average

Dd: Average driving distance

Da: Driving accuracy (percentage of fairways hit)

Ba: Birdie average per round

Gr: Greens in regulation percentage

Ss: Sand save percentage

Pa: Putting average per hole of green in regulation hit
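As a rough illustration of how a linear model of this form can be estimated by ordinary least squares, here is a minimal Python sketch. Everything numeric in it is an assumption made for illustration: the regressors are synthetic draws, the "true" coefficients are invented with the hypothesized signs, and none of it reproduces the study's data or results. Note that the code estimates signed coefficients directly, whereas the equation above writes the expected negative signs explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 131  # sample size used in the study

# Synthetic stand-ins for the seven regressors (NOT real PGA Tour data).
names = ["Sa", "Dd", "Da", "Ba", "Gr", "Ss", "Pa"]
X = np.column_stack([
    rng.normal(71.0, 0.8, n),    # Sa: final round scoring average (strokes)
    rng.normal(280.0, 8.0, n),   # Dd: driving distance (yards)
    rng.normal(68.0, 5.0, n),    # Da: driving accuracy (percent)
    rng.normal(3.5, 0.3, n),     # Ba: birdies per round
    rng.normal(66.0, 3.0, n),    # Gr: greens in regulation (percent)
    rng.normal(50.0, 6.0, n),    # Ss: sand saves (percent)
    rng.normal(1.77, 0.02, n),   # Pa: putts per green hit in regulation
])

# Invented coefficients with the hypothesized signs, plus random noise.
true_beta = np.array([-3e5, 5e3, 1e4, 4e5, 5e4, 1e4, -5e6])
D = 2.0e7 + X @ true_beta + rng.normal(0.0, 3e5, n)

# OLS: prepend an intercept column and solve the least-squares problem.
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, D, rcond=None)

# Goodness of fit: R-squared and adjusted R-squared.
resid = D - A @ beta_hat
k = X.shape[1]
r2 = 1.0 - (resid @ resid) / np.sum((D - D.mean()) ** 2)
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

for name, b in zip(["const"] + names, beta_hat):
    print(f"{name:>5}: {b: .4g}")
print(f"adjusted R-squared: {adj_r2:.3f}")
```

With real data, the seven columns of `X` would be replaced by the players' season statistics and `D` by their 2002 earnings.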

The main hypothesis of this test is that all of these variables are significant. In other words, each independent variable significantly influences the number of dollars earned. My model uses a linear functional form. The main goal of the test is to find out which statistical categories in golf have the most influence over dollars earned. I have no firm expectation about which single variable will be the most significant, but I hypothesize that final round scoring average and birdie average will be the strongest variables. I expect significance from final round scoring average because of the “effort term” studied in previous golf econometric studies. I expect birdie average to be significant because, historically, players who make the most birdies in a tournament usually finish towards the top of the money earnings.

I expect positive signs from five of the coefficients, and negative signs from final round scoring average and putting average. I expect that the longer a player drives the ball on average, the better they will score, given the growing length of tournament courses. Also, the better a player’s driving accuracy, the more money a player should earn: the more often a player is in the fairway, the better they should score, and the more money they should earn. If birdie average increases, I expect dollars earned to increase as well. The more birdies a player makes per round, the lower they should score on average, and therefore they should earn more money. If greens in regulation percentage increases, then a player should score better as well, because it is easier to shoot lower scores when a player is able to two-putt for par more often, or possibly one-putt for birdie. If a player’s sand save percentage increases, then they should score better and earn more dollars, because strokes saved from the sand should translate into more dollars earned. Putting average should have a negative relationship with dollars earned. If a player has fewer putts per hole, then they should have fewer strokes per hole on average. This leads to lower overall scores and more dollars earned. Final round scoring average should also have a negative influence on dollars earned. The lower a player shoots on average in the final round, the lower their four round total should be, because the four or five rounds are summed at the end of the tournament. Also, the final round is the last chance players have to position themselves, and a low scoring average in that round usually translates into high dollars earned.
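These sign expectations correspond to one-sided t-tests on the individual coefficients. The standard decision rule is sketched below, assuming the usual OLS t-statistic; the symbols $\hat{\beta}_k$, $SE(\hat{\beta}_k)$, and $t_c$ are notation introduced here, and the paper's chosen significance level is not stated in this section.

```latex
% Coefficient expected to be positive (Dd, Da, Ba, Gr, Ss):
\[
t_k = \frac{\hat{\beta}_k}{SE(\hat{\beta}_k)}, \qquad
\text{reject } H_0\colon \beta_k \le 0 \text{ if } t_k > t_c
\]
% Coefficient expected to be negative (Sa, Pa):
\[
\text{reject } H_0\colon \beta_k \ge 0 \text{ if } t_k < -t_c
\]
% Degrees of freedom with n = 131 players and K = 7 regressors:
\[
n - K - 1 = 131 - 7 - 1 = 123
\]
```

Here $t_c$ is the one-sided critical value for 123 degrees of freedom at whatever significance level is chosen.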

The hypothesized signs can be expressed this way:

H0: Dd ≤ 0    H0: Da ≤ 0    H0: Ba ≤ 0    H0: Gr ≤ 0    H0: Ss ≤ 0

Ha: Dd > 0    Ha: Da > 0    Ha: Ba > 0    Ha: Gr > 0    Ha: Ss > 0

H0: Sa ≥ 0    H0: Pa ≥ 0

Ha: Sa ................