“Must See TV”: Predicting the Nielsen Ratings

Introduction:

The Nielsen ratings gauge how many people are watching television programs and the characteristics of those audiences, information that is used by both advertisers and television programmers. Nielsen ratings serve as the currency in the market for advertiser-paid television. When advertisers want to reach certain audiences, they place ads on television shows whose viewers display the characteristics of their target market. The larger the audience of a particular show, the more money the station can charge advertisers, since advertising rates are quoted per thousand viewers. Hence, the networks have a serious interest in increasing their Nielsen share to generate a greater revenue stream from advertising. The ratings also tell programmers which shows are being watched, so that they may discontinue shows that are not drawing in enough advertising revenue. At the same time, advertisers use the ratings to gauge the size and characteristics of the audiences they are paying to reach. The most desirable audience for advertisers is the 18-49 demographic, which in turn becomes the most desirable target audience for television programmers.

There are two major periods, called sweeps, in May and November, during which Nielsen conducts a complete diary measurement across all 210 markets in the nation. For this assessment, Nielsen sends viewers a diary in which they record which show they watched, on which channel, and who was watching. Participants are asked to make an entry for each quarter hour of viewing and then mail in their completed diaries. The sweeps periods are when the stations air their best programs and top news stories in an effort to increase viewership.

Nielsen selects its TV sample from a diverse group of people: renters and homeowners, low-income and high-income households, families with and without children, and so on. The sample contains over 5,000 households and over 13,000 people. Once this diverse sample is selected, the families must agree to participate in the research study. To ensure that its sample audience reflects the nation as a whole, Nielsen compares the sample’s characteristics to US Census Bureau data.

The stations use Nielsen because it is not easy to determine how many viewers a show has at any one time when programs are delivered by satellite or cable system. Nielsen estimates the audience by taking a sample and then counting the number of viewers in that sample. The ratings refer to the percent of the sample tuned to a particular program during the average minute. Successful programs are those that draw the larger audiences and the coveted 18-49 demographic.
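Because a rating is a percentage estimated from a sample, it carries sampling error. The following minimal Python sketch gives a rough 95% margin of error for a rating. It assumes simple random sampling of households, which Nielsen’s actual design may not follow, and the function name rating_margin_of_error is ours. The inputs are the sample size of over 5,000 households and the highest and lowest ratings quoted in this paper.

```python
import math


def rating_margin_of_error(rating_pct, n_households, z=1.96):
    """Approximate margin of error, in rating points, for a sample percentage.

    Assumption: simple random sampling. z=1.96 gives roughly 95% confidence.
    """
    p = rating_pct / 100.0
    standard_error = math.sqrt(p * (1.0 - p) / n_households)
    return 100.0 * z * standard_error


if __name__ == "__main__":
    for rating in (14.6, 1.2):
        moe = rating_margin_of_error(rating, 5000)
        print(f"rating {rating}%: +/- {moe:.2f} rating points")
```

Under these assumptions the margin is about one rating point for a show rated 14.6% and about 0.3 points for a show rated 1.2%, so the relative uncertainty is much larger for low-rated programs.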

Goals of the Analysis:

This paper will analyze the various components that could possibly predict the Nielsen ratings. We will look at network, type of show, awards nominations, day of the week and percent share within the program’s timeslot. Is there a particular station or network that is represented in the highest rated programs? Does day of the week factor into the audience size? Do awards nominations impact the amount of viewership? Does the type of show impact viewership? How important is it to win in your program’s timeslot to get better overall ratings? We anticipate that some of these variables will predict the likelihood of a program falling into the top ratings, which in turn will dictate the amount of advertising revenue a particular station can generate. As previously stated, these questions are of great relevance to both advertisers and television programmers.

Although the 18-49 demographic is not a predictor variable, since it is a simultaneous output of the Nielsen report, we will examine the correlation between overall ratings and ratings within the 18-49 demographic. A strong correlation would suggest that programmers schedule programs whose audiences reflect the characteristics desired by advertisers.

It is important to note that both advertisers and television programmers are frustrated with Nielsen’s data-gathering methods for a variety of reasons. However, they are stuck with Nielsen for lack of anything better. We will mention the sources of this frustration to indicate that Nielsen ratings are perhaps not the best judge of whether a show deserves a top rating. With that in mind, we will still try to model Nielsen ratings based on the predictor variables mentioned above.

Sample Bias:

Nielsen households must all agree to participate in the survey, which is a potential source of self-selection bias. We must ask ourselves if those households that agree to participate in the survey take on certain characteristics that other households do not. Although Nielsen compares its audience to the national population, this issue may still present a self-selection bias.

A second issue, which Nielsen readily admits, concerns whether the audience is actually watching the television program or if their televisions are merely turned on. There really is no way to determine the accuracy of this information other than relying on the diary that the audience fills out during the year. This frustrates not only the programmers but the advertisers as well. While Nielsen measures the viewership of programs, it does not measure the viewership of commercials, which is what the advertisers are interested in.

Thirdly, the diaries that the selected participants fill out during the sweeps periods have issues of their own. The diaries are administered to supplement the metered data by obtaining a more precise breakdown of who is watching what shows. However, according to some critics, the diary favors programs that air at the beginning of the diary week, which starts on Thursday. Viewers are apt to be more diligent about recording their viewing habits at the beginning of the diary week than in the latter half. This might explain the tendency, discussed in detail later, for Thursday shows to be among the top-rated programs.

Finally, because Nielsen measures in-home viewing rather than out-of-home viewing, the ratings of shows that are watched in groups at locations such as bars, dorms and health clubs could be deflated. Sports events such as “Monday Night Football”, which are typically watched in a bar or large gathering area, are among the programs that are under-measured. A related problem concerns the representation of audiences that tend to watch shows in large groups. For instance, Boston has the highest concentration of college students of any city in the country. These students, who are within the desired 18-49 age demographic, are not fully represented in the viewing data since Nielsen collects its data solely through in-home viewing.

The Data:

The data in our analysis are based on the 1998-1999 television season, starting in September of 1998 and ending in September of 1999. We have included the 200 top-rated Nielsen programs for that season. The ratings were obtained from the Nielsen Media Research report for prime-time, network-aired television programs. The response of interest is the overall Nielsen share (percent of the average audience) for each show. The share determines each show’s rank (1-200); however, share is a more useful target variable than rank because it reflects the relevant audience size, which is ultimately used to set advertising rates. The predicting variables considered in the analysis are as follows:

1. Network: The networks represented are NBC, CBS, ABC, FOX, UPN and WB. These were obtained from the Nielsen report, which matched each show to its network. Only prime-time shows on these networks were included in this Nielsen report for the top 200 shows.

2. New Show versus Recurring: A list of all new shows for the 1998 season was obtained from the Infoplease website (). This includes both new shows that started in September of 1998 and new shows that came in as replacements in January of 1999.

3. Golden Globe Nomination: All shows that had Golden Globe nominations for the following categories have been identified: Television Series – Drama; Actress in a TV Series – Drama; Actor in a TV Series – Drama; TV Series – Comedy; Actress in a TV Series – Comedy; Actor in a TV Series – Comedy. This data was obtained from the Entertainment Weekly Online website ().

4. Emmy Nomination: All shows that had Emmy nominations for the following categories have been identified: Outstanding Comedy Series; Outstanding Drama Series; Outstanding Lead Actor in a Comedy Series; Outstanding Lead Actress in a Comedy Series; Outstanding Lead Actor in a Drama Series; Outstanding Lead Actress in a Drama Series; Outstanding Supporting Actor in a Comedy Series; Outstanding Supporting Actress in a Comedy Series; Outstanding Supporting Actor in a Dramatic Series; Outstanding Supporting Actress in a Dramatic Series; Outstanding Guest Actor in a Comedy Series; Outstanding Guest Actress in a Comedy Series; Outstanding Guest Actor in a Drama Series; Outstanding Guest Actress in a Drama Series; Outstanding Variety, Music or Comedy Series; Outstanding Variety, Music or Comedy Special; Outstanding Performance in a Variety or Music Program; Outstanding Animated Program; Outstanding Writing for a Comedy Series; Outstanding Writing for a Drama Series; Outstanding Writing for a Variety or Music Program. This data was obtained from the Entertainment Weekly Online website ().

5. Day of the Week: The program schedule for the 1998-1999 season was obtained from the Infoplease website ().

6. Type of Show: The shows were categorized in the following categories: comedy; drama; news; sports.

7. Share in Program’s Timeslot: The Nielsen report provided each program’s share within its timeslot.

Descriptive Statistics:

Before modeling the regression, we looked at the general characteristics of the data. It is important to note that three of our predictors are categorical (day of week, type of show and network), so we were not able to produce descriptive statistics on these three variables. However, for the other 4 predictors and the target variable, the descriptive data is below. For a frame of reference, the highest share value achieved for the 1998-1999 season (which was the NBC television drama, E.R.) was 14.6%, which corresponds to an average audience size of 14.5 million viewers. This show had a 25% share in its Thursday night timeslot. The minimum share value was 1.2% (which was UPN’s Home Movies), which corresponds to an average audience size of 1.2 million viewers. This show had a 2% share in its Monday night timeslot.

Variable      N     Mean   Median   TrMean    StDev  SE Mean
HH US AA    200    5.576    5.950    5.441    2.896    0.205
HH US Sh    200    9.520   10.000    9.333    4.938    0.349
Premiere    200   0.2700   0.0000   0.2444   0.4451   0.0315
Globes N    200   0.0800   0.0000   0.0333   0.2720   0.0192
Emmy Nom    200   0.1200   0.0000   0.0778   0.3258   0.0230

Variable  Minimum  Maximum       Q1       Q3
HH US AA    1.200   14.600    2.725    7.500
HH US Sh    2.000   25.000    5.000   13.000
Premiere   0.0000   1.0000   0.0000   1.0000
Globes N   0.0000   1.0000   0.0000   0.0000
Emmy Nom   0.0000   1.0000   0.0000   0.0000
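
The rows for the three 0/1 indicator variables can be read as proportions: a mean of 0.27 over 200 shows implies 54 premieres, 0.08 implies 16 shows with Golden Globe nominations, and 0.12 implies 24 shows with Emmy nominations. The short Python sketch below rebuilds those columns and recomputes the trimmed mean and standard deviation. It assumes that TrMean drops the smallest 5% and largest 5% of the values, which is our reading of the output rather than something stated in the report, and the helper names are ours.

```python
from statistics import mean, stdev


def trimmed_mean(values, trim_fraction=0.05):
    """Mean after dropping trim_fraction of the values from each end (assumed rule)."""
    ordered = sorted(values)
    k = round(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


def indicator_column(n_total, n_ones):
    """Rebuild a 0/1 column with n_ones ones out of n_total rows."""
    return [1] * n_ones + [0] * (n_total - n_ones)


if __name__ == "__main__":
    counts = {"Premiere": 54, "Globes N": 16, "Emmy Nom": 24}
    for name, n_ones in counts.items():
        column = indicator_column(200, n_ones)
        print(name, round(mean(column), 4),
              round(trimmed_mean(column), 4), round(stdev(column), 4))
```

If the assumed trimming rule is right, the recomputed values should agree with the Mean, TrMean and StDev columns of the table to four decimal places.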

Network:

We first compare overall share across networks. The following graph shows side-by-side boxplots of average share for the six networks. Relative frequencies can also be determined from these boxes. The three major networks, NBC, ABC and CBS, all seem to have roughly the same median share within the top 200 programs. FOX’s median share is below those of the three major networks. These four networks show a moderate amount of variability around their median shares, whereas the UPN and WB networks show little to no variability. This is because UPN and WB do not have many top-rated shows in terms of share. The “lion’s share” of ratings is held by the three major networks.

Type of Show:

Does the type of show have any impact on the total share? Sports and news programs have similar median shares, as reflected in the side-by-side boxplots in the graph below, and represent an extremely small portion of the shows in the top 200. This suggests that the news and sports shows that are included got very high ratings when they aired. Comedy and drama programs have similar median shares and make up the bulk of the list. Their boxplots show high variability because these shows cover the entire list, from highest-rated to lowest-rated. There is one outlier based on type of show, in the drama category. This happens to be the top-rated show in the Nielsen ratings at 14.6%, E.R.

Day of Week:

There seems to be a fair representation of each day of the week’s programs throughout the entire list, except for Saturday programming. Saturday has the least variability and its programming seems to fall in the bottom half of the ratings list with little to no representation in the upper half. Most of the medians are within the same range of about 5-7%. The outlier in this graph is a highly rated (8.4%) Saturday show on CBS, which is Walker Texas Ranger.

Although there is no clear winner among day of the week, it is important to note that of the top nine rated programs, five of them fell back-to-back on the NBC Thursday night lineup – otherwise known as “Must See TV”. Unfortunately, we do not have this specific information for all of the programs included on the list, but the Thursday night lineup has been consistently present in the top of the Nielsen ratings. This indicates that from 8pm until 11pm, most viewers start watching Friends and do not change the channel until after E.R., which airs at 10pm.

Emmy and Golden Globe Nominations:

The median overall share is higher for those shows that have either Golden Globe or Emmy nominations as seen below in the side-by-side boxplots. Those programs with nominations also have less variability around their medians. This seems like it would be a fairly accurate predictor given the differences in medians between shows with and without nominations. The outlier in the Emmy nominated category is, again, E.R. This seemed surprising at first sight, but there is only one show between the outlier and the upper limit of the third quartile.

There is one caveat to mention with respect to the awards nominations. For programs such as the CBS Sunday Movie and Dateline, nominations are based upon particular movies or news stories. These types of nominations were excluded from this analysis, since the awards are not granted to the overall programs and their inclusion would have overemphasized the award.

Premiere Status:

Whether or not the show is new that season seems to have no bearing on the Nielsen ratings, which is evidenced in the boxplots below. This makes sense because old shows that were not watched were most likely canceled, so they would not be in the pool to bring the ratings down for the old shows.

Share of Timeslot:

The variable with the most predictive power seems to be the share of the timeslot. The higher a program’s share of its timeslot, the higher its overall share tends to be. This makes sense, as the highest-rated shows grab almost a quarter of the total audience at that particular time. The most popularly viewed shows (according to Nielsen) attract a large percent of viewers during that timeslot. This might also indicate that the largest audiences are drawn to the same viewing hours. The correlation between overall share and share of timeslot is extremely high at .986, with a reported p-value of 0.000, so we reject the null hypothesis that the two are uncorrelated. The scatter plot below shows the strong correlation as well.
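
To see why a correlation of .986 over 200 programs leaves essentially no room for the null hypothesis, the usual test statistic for a correlation coefficient is

t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom.

The Python sketch below evaluates it for the values quoted above. The function name is ours, and the calculation is added for illustration rather than taken from the original output.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing H0: correlation = 0, on n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    t = correlation_t_statistic(0.986, 200)
    print(f"t = {t:.1f} on {200 - 2} degrees of freedom")
```

The statistic comes out at roughly 83, far beyond any conventional critical value, so the p-value is vanishingly small rather than literally zero.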

Multiple Regression Models:

Multiple Regression Models Using 7 Predictor Variables

Our initial model uses all 7 predictor variables in the belief that some of the categorical variables could have some predictive power of the relative success of a show. The results follow:

Analysis of Variance for HH US AA, using Adjusted SS for Tests

Source     DF     Seq SS     Adj SS     Adj MS        F      P
Network     5   1101.709      0.342      0.068     0.45  0.811
HH US Sh    1    520.899    373.588    373.588  2468.53  0.000
Type        3      1.986      0.788      0.263     1.74  0.161
Premiere    1      0.610      0.298      0.298     1.97  0.162
Globes N    1      1.239      0.077      0.077     0.51  0.477
Emmy Nom    1      1.393      0.192      0.192     1.27  0.262
Day of W    6     13.976     13.976      2.329    15.39  0.000
Error     181     27.393     27.393      0.151
Total     199   1669.205

Term          Coef    StDev      T      P
Constant   -0.1813   0.1356  -1.34  0.183
HH US Sh   0.58334  0.01174  49.68  0.000
Premiere   0.09378  0.06678   1.40  0.162
Globes N    0.0978   0.1371   0.71  0.477
Emmy Nom    0.1369   0.1217   1.12  0.262

From the above output, we derived a standard error of 0.39, an R² of 98.36% and an overall F-statistic of 604.05. All of these figures indicate a good fit. However, network, type of show, premiere status, Golden Globes nomination and Emmy nomination all have high p-values, which indicates that they are not strong predictors in this model and should therefore be removed from the analysis. We subsequently reran the analysis without these variables.
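
The summary figures for this model can be recovered from the Error and Total rows of the analysis of variance table. The Python sketch below shows the arithmetic; the function name is ours, and the inputs are the printed (rounded) sums of squares, so small differences from the reported values are to be expected.

```python
import math


def fit_statistics(total_ss, error_ss, total_df, error_df):
    """Return (R-squared, standard error of the estimate, overall F)."""
    model_df = total_df - error_df
    mse = error_ss / error_df
    r_squared = 1.0 - error_ss / total_ss
    std_error = math.sqrt(mse)
    f_stat = ((total_ss - error_ss) / model_df) / mse
    return r_squared, std_error, f_stat


if __name__ == "__main__":
    # Error and Total rows of the seven-predictor model
    r2, s, f = fit_statistics(total_ss=1669.205, error_ss=27.393,
                              total_df=199, error_df=181)
    print(f"R-squared = {100 * r2:.2f}%, s = {s:.2f}, F = {f:.1f}")
```

With these inputs the R² and standard error match the values quoted above, and the recomputed F lands near 603, slightly below the reported 604.05, presumably because of rounding in the printed table.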

Multiple Regression Model Using 2 Predictor Variables

Analysis of Variance for HH US AA, using Adjusted SS for Tests

Source     DF    Seq SS    Adj SS    Adj MS        F      P
HH US Sh    1   1622.32   1600.78   1600.78  1.0E+04  0.000
Day of W    6     16.92     16.92      2.82    18.07  0.000
Error     192     29.96     29.96      0.16
Total     199   1669.20

Term           Coef     StDev       T      P
Constant   -0.09756   0.06457   -1.51  0.132
HH US Sh   0.585795  0.005784  101.28  0.000
Day of W
  F        -0.45486   0.07626   -5.96  0.000
  M         0.42188   0.06163    6.85  0.000
  S         0.02782   0.06431    0.43  0.666
  SA        -0.5933    0.1102   -5.38  0.000
  T         0.14330   0.06402    2.24  0.026
  TH        0.21767   0.07018    3.10  0.002

Based upon the above output, we derived a standard error of the estimate of 0.40, an R² of 98.21% and an overall F-statistic of 1463.61. These statistics indicate an extremely strong fit. Both variables have p-values small enough to reject the null hypothesis that there is no relationship between the predictor and the response variable. As such, each predictor variable in this model is considered statistically significant.

In addition, we note that the general linear model does not provide us with VIF statistics, so it is difficult for us to determine directly whether any multi-collinearity is present. However, since all variables have low individual p-values alongside a high overall F-statistic, we see none of the usual symptoms of multi-collinearity. The lack of VIF statistics remains a limitation in using this model.

Assumptions

We then checked the assumptions behind the above model using 2 predictors. Based on the normal probability plot of the residuals, it appears that the residuals are normally distributed. In addition, the plot of residuals versus fitted values shows no pattern, which is consistent with constant variance. The plot of residuals versus the one continuous variable in the model, share of timeslot, also shows no pattern. Therefore, our regression assumptions appear to be valid.

Implications of the Model

Based upon the above model, it appears that the only two relevant factors are the program’s share of its timeslot and the day of the week. However, considering the day of the week to be a strong predictor seems counterintuitive since the side-by-side boxplots showed that there wasn’t a significant difference in the medians of the different days of the week and that the only real difference was in the variability. We have previously shown that there is a very strong correlation between the average audience and the share of the timeslot, so it seems reasonable that it is a very strong predictor in our model. This model therefore answers the question of how important it is for a program to win its timeslot.

However, timeslot share and average audience are actually derived from the same data and may be so closely correlated that the timeslot share obscures the importance of the other variables and makes all but the day of the week seem irrelevant. This can be seen by the apparent importance given to the day of the week by this model, which had seemed to be one of the weakest predictors when we had simply observed the data.

Multiple Regression Model Using 6 Predictor Variables (excluding Share of Timeslot)

We then performed a regression on all the predictor variables, except share of timeslot, to determine whether other variables are significant in determining the Nielsen ratings. The results are as follows:

Analysis of Variance for HH US AA, using Adjusted SS for Tests

Source     DF     Seq SS     Adj SS     Adj MS      F      P
Network     5   1101.709    712.070    142.414  64.64  0.000
Type        3     45.813     55.330     18.443   8.37  0.000
Premiere    1     11.684      3.662      3.662   1.66  0.199
Globes N    1     68.949     12.843     12.843   5.83  0.017
Emmy Nom    1     22.789     21.624     21.624   9.82  0.002
Day of W    6     17.280     17.280      2.880   1.31  0.256
Error     182    400.980    400.980      2.203
Total     199   1669.205

Term          Coef   StDev      T      P
Constant    5.9713  0.2107  28.34  0.000
Premiere   -0.3259  0.2528  -1.29  0.199
Globes N    1.2452  0.5157   2.41  0.017
Emmy Nom    1.4215  0.4537   3.13  0.002

Based upon the above regression, we calculated a standard error of the estimate of 1.48, an R² of 75.98% and an overall F-statistic of 33.86. However, two variables, premiere status and day of the week, have high tail probabilities and low test statistics and should be removed. This is more in line with what we would expect based upon the boxplots we looked at earlier. Therefore, we reran the regression without these variables.

Multiple Regression Model Using 4 Predictor Variables (excluding Share of Timeslot)

We obtained the following results when we reran the regression:

Analysis of Variance for HH US AA, using Adjusted SS for Tests

Source     DF    Seq SS    Adj SS    Adj MS      F      P
Network     5   1101.71    772.18    154.44  69.34  0.000
Type        3     45.81     71.82     23.94  10.75  0.000
Globes N    1     75.08     13.79     13.79   6.19  0.014
Emmy Nom    1     25.65     25.65     25.65  11.52  0.001
Error     189    420.95    420.95      2.23
Total     199   1669.20

Term          Coef   StDev       T      P
Constant    5.9623  0.1910   31.21  0.000
Network
  ABC       1.4305  0.2322    6.16  0.000
  CBS       2.4050  0.2373   10.13  0.000
  FOX      -0.1917  0.2453   -0.78  0.436
  NBC       1.9114  0.2312    8.27  0.000
  UPN      -3.1593  0.2509  -12.59  0.000
Type
  Comedy   -1.0941  0.2235   -4.90  0.000
  Drama    -1.1076  0.2294   -4.83  0.000
  News      0.1340  0.3193    0.42  0.675
Globes N    1.2835  0.5159    2.49  0.014
Emmy Nom    1.4890  0.4387    3.39  0.001

As we can see above, all of the predictor variables are statistically significant at the 98% level (every p-value is below 0.02). In addition, we calculated the standard error of the estimate to be 1.49, the overall F-statistic to be 55.98 and the R² to be 74.78%. Again, these figures indicate that our model is statistically significant. Since the overall F-statistic increased from the previous model and the R² decreased only minimally, this appears to be a more appropriate model. The R² indicates that the model explains about 75% of the variability of the Nielsen ratings. The relevant variables seem to be the network, the type of show, and whether or not the show was nominated for a Golden Globe or an Emmy award. For the categorical variables, all categories seem to add some element of predictability to our model, with the exception of the FOX network and the News type of program, as indicated by their high p-values. Therefore, we can be comfortable using this model for all categories except these two.
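
One way to check the decision to drop premiere status and day of the week is a partial F-test on the two nested models, together with a comparison of adjusted R². This check is our addition and not part of the original output; the Python sketch below uses only the Error and Total rows of the two analysis of variance tables, and the function names are ours.

```python
def partial_f(sse_reduced, sse_full, df_dropped, df_error_full):
    """F statistic for testing whether the dropped terms add explanatory power."""
    return ((sse_reduced - sse_full) / df_dropped) / (sse_full / df_error_full)


def adjusted_r_squared(sse, df_error, sst, df_total):
    """R-squared penalized for the number of fitted terms."""
    return 1.0 - (sse / df_error) / (sst / df_total)


if __name__ == "__main__":
    # Six-predictor model: SSE = 400.980 on 182 df
    # Four-predictor model: SSE = 420.95 on 189 df (7 fewer model df)
    f = partial_f(sse_reduced=420.95, sse_full=400.980,
                  df_dropped=7, df_error_full=182)
    adj_full = adjusted_r_squared(400.980, 182, 1669.205, 199)
    adj_reduced = adjusted_r_squared(420.95, 189, 1669.20, 199)
    print(f"partial F = {f:.2f} on (7, 182) df")
    print(f"adjusted R-squared: full {adj_full:.3f}, reduced {adj_reduced:.3f}")
```

The partial F comes out at about 1.3, well under the 5% critical value for these degrees of freedom (a little above 2), and the adjusted R² barely moves, which supports removing the two variables.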

Again, we are not able to obtain VIF statistics for these variables since many are categorical. Therefore, we cannot determine if multi-collinearity is inflating the overall F-statistic. Since, however, individual F-statistics and the overall F-statistic are both sufficiently high, there is no indication that this is a problem. This is yet another limitation to the general linear regression model.

Assumptions:

We must now check the assumptions for this model, as we did for the previous model. The normal probability plot and the histogram of the residuals indicate that the residuals are normally distributed. The plot of residuals versus fitted values, however, seems to violate the constant variance assumption and exhibits heteroscedasticity. This violation indicates that the results may not be fully trusted. The non-constant variance does not necessarily reflect more variability in the data when the overall rating is higher; rather, it indicates that these predictors play less of a role for the higher-rated programs. This is another limitation of this model.
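
A residual plot is read by eye, but the same idea can be expressed numerically by comparing the spread of the residuals in the lower and upper halves of the fitted values. The Python sketch below is a rough, informal check that we add for illustration. It is not a formal test, the function name is ours, and the numbers are synthetic rather than the residuals from our model.

```python
from statistics import median, variance


def residual_variance_ratio(fitted, residuals):
    """Residual variance above the median fitted value divided by the variance below it."""
    cut = median(fitted)
    low = [e for f, e in zip(fitted, residuals) if f <= cut]
    high = [e for f, e in zip(fitted, residuals) if f > cut]
    return variance(high) / variance(low)


if __name__ == "__main__":
    # Synthetic example: residual spread grows with the fitted value
    fitted = [float(i) for i in range(1, 21)]
    residuals = [(1 if i % 2 == 0 else -1) * 0.1 * f
                 for i, f in enumerate(fitted)]
    print(round(residual_variance_ratio(fitted, residuals), 2))
```

A ratio near 1 is what constant variance would produce; a ratio well above or below 1 points to the kind of pattern described above.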

Conclusion:

Removing the variable for share of timeslot seems to provide us with a less accurate but more intuitive model. If included in the model, the share of timeslot would be the only variable needed to predict the Nielsen ratings (since day of the week only adds 1% to our R²). The following model accounts for 98% of the variability in the Nielsen audience percentage.

Average Audience Percentage = -0.09756 + (0.585795 * HH Share) + Coefficient for Day of the Week

Therefore, winning your timeslot appears to be critical to achieving high Nielsen ratings.
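
As an illustration of how this equation is applied, the Python sketch below plugs in the coefficients from the two-predictor output. Two things are assumptions on our part: that the day codes stand for Friday, Monday, Sunday, Saturday, Tuesday and Thursday, and that the day effects are coded to sum to zero, so that the coefficient for the unlisted day (Wednesday) is minus the sum of the six printed ones. The function name is ours.

```python
INTERCEPT = -0.09756
SHARE_SLOPE = 0.585795

# Printed day-of-week coefficients from the two-predictor model
DAY_EFFECTS = {
    "F": -0.45486, "M": 0.42188, "S": 0.02782,
    "SA": -0.5933, "T": 0.14330, "TH": 0.21767,
}
# Assumption: effects sum to zero, so Wednesday is minus the sum of the others
DAY_EFFECTS["W"] = -sum(DAY_EFFECTS.values())


def predict_from_share_and_day(timeslot_share, day):
    """Predicted average audience percentage from timeslot share and day code."""
    return INTERCEPT + SHARE_SLOPE * timeslot_share + DAY_EFFECTS[day]


if __name__ == "__main__":
    # E.R.: 25% timeslot share on Thursday (observed 14.6%)
    print(round(predict_from_share_and_day(25, "TH"), 2))
    # Home Movies: 2% timeslot share on Monday (observed 1.2%)
    print(round(predict_from_share_and_day(2, "M"), 2))
```

Under these assumptions the model predicts about 14.8% for E.R. and about 1.5% for Home Movies, against observed values of 14.6% and 1.2%.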

Once the share of timeslot variable is removed from the model, other variables, which were previously obscured, come to the forefront as significant. Without using the timeslot share variable, our model can be represented as follows:

Average Audience Percentage = 5.9623 + (1.2835 * Globes Nomination) + (1.4890 * Emmy Nomination) + Coefficient for Network + Coefficient for Type

This model explains an estimated 75% of the variability. We therefore conclude that awards nominations, network and type of show are the most appropriate predictors (of those within our analysis) of a program’s success (as defined by high ratings).
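
The Python sketch below shows how a prediction is assembled from the four-predictor coefficients. As an assumption on our part, the network and type effects are taken to sum to zero, so the unlisted categories (WB and Sports) receive minus the sum of the printed coefficients. The two example shows are hypothetical and the function name is ours.

```python
INTERCEPT = 5.9623
GLOBES_COEF = 1.2835
EMMY_COEF = 1.4890

# Printed coefficients from the four-predictor model
NETWORK_EFFECTS = {"ABC": 1.4305, "CBS": 2.4050, "FOX": -0.1917,
                   "NBC": 1.9114, "UPN": -3.1593}
TYPE_EFFECTS = {"Comedy": -1.0941, "Drama": -1.1076, "News": 0.1340}
# Assumption: effects sum to zero within each factor
NETWORK_EFFECTS["WB"] = -sum(NETWORK_EFFECTS.values())
TYPE_EFFECTS["Sports"] = -sum(TYPE_EFFECTS.values())


def predict_from_show_attributes(network, show_type, globes_nominated, emmy_nominated):
    """Predicted average audience percentage without using timeslot share."""
    return (INTERCEPT
            + NETWORK_EFFECTS[network]
            + TYPE_EFFECTS[show_type]
            + GLOBES_COEF * int(globes_nominated)
            + EMMY_COEF * int(emmy_nominated))


if __name__ == "__main__":
    # Hypothetical NBC drama with both nominations
    print(round(predict_from_show_attributes("NBC", "Drama", True, True), 2))
    # Hypothetical UPN comedy with no nominations
    print(round(predict_from_show_attributes("UPN", "Comedy", False, False), 2))
```

Under these assumptions a nominated NBC drama is predicted at about 9.5% and an un-nominated UPN comedy at about 1.7%. Given the standard error of about 1.5 points, individual predictions from this model are rough.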

As stated earlier, this model has certain limitations. The first rests in the initial collection of the data by Nielsen Media Research, which is inherently biased. The second is that we do not know the true VIFs for the categorical variables in our model and cannot determine the presence of multi-collinearity. The third is the violation of the constant variance assumption in this model.

Other variables would very likely have an impact on the success of a particular show. One that we believe would be particularly helpful is the effect that one show has on the success or failure of surrounding shows. We have previously seen this phenomenon in the Thursday night NBC lineup, which contains both Friends and E.R. and which, no doubt, has an impact on the success of other programs like Jesse and Veronica’s Closet, two shows which consistently get terrible reviews.

Critics’ reviews could have been included in the model as a possible predictor variable; however, in our humble opinion, they would not hold much predictive power. Furthermore, we believe that there would be a high degree of non-constant variance, as some of the top-rated shows (Veronica’s Closet and Jesse) were consistently panned by the critics and have even been cancelled for the 1999-2000 season. Meanwhile, some of the lower-rated shows, such as X-Files, 3rd Rock from the Sun, and The Practice, are critically acclaimed yet do not receive a large audience.

Another important point to note is the high correlation between the overall household ratings and the ratings for the 18-49 demographic. The correlation is almost 90%, which is not surprising given that the networks gear their shows toward this audience that is the most desired by advertisers.

Finally, the most important thing to keep in mind is that people are fickle, writers get hired and fired, stars come and go and particularly long-running shows lose their creative edge. While this model can use several variables to predict ratings, it cannot predict such subjective factors. We, therefore, suggest that one use caution when predicting the ratings based on the variables included in this model.
