ANNOUNCEMENTS updated on 1/19/2002 12:54 PM



ANNOUNCEMENTS

by Prof. Vinod for students of Statistical Decision Making or Stat II

Welcome to Stats 2: Statistical Decision Making. Let us have fun with statistics this semester!

The due dates for lessons are found at the fordhamstat website

for ANY software problems contact support@

The software you buy is called statistics (Discovering Statistics) NOT business statistics



In step 4, for CourseID use Fordhamstat



discusses the sampling distribution of xbar=mean of x, Central Limit Theorem



has Mean Squared Error, unbiasedness, confidence intervals



has confidence intervals when σ is unknown (using t distribution)



has a better discussion of confidence intervals, discussion of point and interval estimation is useful for lessons 8.1, 8.2 and 8.3

We often need quantiles of the normal distribution in these lessons (find z from the tail area). If the right tail is 0.025 or 2.5%, then the R command qnorm(0.025, lower.tail=F) gives 1.96 for a 95% confidence interval; qnorm(0.025) by itself gives the left-tail value -1.96. For a 99% interval, we have 0.005 in the tail and the accurate quantile is 2.5758. Hawkes software uses 2.575, which is less accurate than R. You will have to use the Hawkes values in webtests.

> qnorm(0.025, lower.tail=F)

[1] 1.959964

> qnorm(0.005, lower.tail=F)

[1] 2.575829
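As a minimal sketch of how these quantiles enter a confidence interval for a mean when σ is known (lessons 8.2 and 8.3), consider the following. The values of xbar, sigma and n are made-up numbers for illustration only, not course data.

```r
# Hypothetical sample summary (illustrative numbers only)
xbar  <- 50    # sample mean
sigma <- 10    # known population standard deviation
n     <- 25    # sample size

# Upper-tail quantiles of the standard normal distribution
z95 <- qnorm(0.025, lower.tail = FALSE)
z99 <- qnorm(0.005, lower.tail = FALSE)

se <- sigma / sqrt(n)                          # standard error of xbar

ci95 <- c(xbar - z95 * se, xbar + z95 * se)    # 95% interval
ci99 <- c(xbar - z99 * se, xbar + z99 * se)    # 99% interval, accurate quantile
ci99.hawkes <- c(xbar - 2.575 * se, xbar + 2.575 * se)  # rounded value used by Hawkes

print(round(ci95, 3))
print(round(ci99, 3))
print(round(ci99.hawkes, 3))
```

The last two intervals differ only in the third decimal place, which is why the webtests expect the Hawkes value.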



has hypothesis testing intro Lessons 9.1 to 9.5.



has one parameter hyp testing discussion



has two parameter hyp testing discussion

February 23, 2012 Midterm. Final Exams are May 10 or May 14 according to Univ. Calendar. For the final exam, we expect to do the WEBtest part of the final on a day closest to the last day of classes. (May 3, 2012)

What is on the Midterm?

There will be a webtest on the day of the midterm:

For the midterm, we begin with computer lessons 8.1 (Estimating proportions), 8.2 and 8.3 (Estimating means with σ known), 8.4 (Estimating means with σ unknown),

8.5 (Estimating variance is included for midterm),



9.1 to 9.6 (hypothesis testing for means and proportions)

Chapter 5 of the textbook on correlations. For the term paper, you can refer to the 94 slides of a PowerPoint presentation on my web page.



Also,



has over 80 slides for regression and multiple regression, forecasting, F distribution, testing examples, interpretation, etc. These are also useful for the term paper

Also useful for the term paper: Lesson 11.1 (correlations) is worth doing, even though the formulas used by the software are not numerically reliable in large data sets. They are OK for the small data sets used, however.



has Intro to Linear regression and correlation

Note that you must use R software for your term paper.

One of the best examples of a term paper is at



It has the R program, term paper itself, checklist

Note that the filename is lastname.doc

All abstracts of last year's term papers are at



These might help you choose your topic

older examples are:





See the checklist below.

The first part of your term paper is due before spring break (March 8, 2012) in the form of one piece of paper (hard copy). Please provide your name,

possible Title of the term paper,

Your e-mail address,

Name of your dependent variable y (indicate your data source, e.g., URL of the website)

Name of your first regressor variable x (data source)

Name of your second regressor variable z (data source)

An Abstract of at least 40 words (and no more than 120 words) stating why the choice might be interesting to any potential reader. You will be allowed to completely change this part of the term paper when you submit the final version. I just want to make sure that you have started thinking about it.

y= effect or response variable

x= first causal variable

z=second causal variable

Some Hints:

When you go to various data sources, first find what is available. Is it enough (25 or so data points)? Are the variables numbers or categorical? We need only ratio type numerical variables. Now first pick some 6-7 variables, then think which might be the cause and which the effect. The data can be annual, monthly, weekly etc. The data need not be for the same year. Data can be for different states in USA in a particular time period. Data can be for different countries in a particular time period. For example, choose x=income, z=inflation, y=personal consumption, using 1950 to 1999 data.

When you discuss why your topic is interesting, state why it might be interesting to the reader, not to you personally.



Is a good source for Bureau of Labor stats data



The right column of the undergrad resources page at my website has several data sources.



Within R try packages called Ecdat, PASWR and AER. They have all kinds of data.

Also type library(help="datasets")
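The hints above (about 25 data points, numerical variables only) can be checked in R before you commit to a topic. The following is only a sketch: mtcars is a data set that ships with R and is used here purely as an illustration, not as a suggested term paper topic.

```r
# mtcars comes with R in the datasets package (illustration only)
dat <- mtcars

n.obs       <- nrow(dat)                      # number of data points
all.numeric <- all(sapply(dat, is.numeric))   # TRUE if every column is numeric
enough.data <- n.obs >= 25                    # the "25 or so data points" hint

print(n.obs)
print(all.numeric)
print(enough.data)
print(names(dat))     # candidate variables to choose y, x and z from
```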

The page should be in the form of an MS-Word file with ‘doc’ as extension.

The file name should be your last name followed by number 1 (no spaces) dot doc. For example, if your last name is smith the one page submission should be a file named smith1.doc. If two of you have the same last name, please insert enough initials (without spaces) to make it unique. For example if the names are John Johnson and Dick Johnson the file names should be johnsonj1.doc and johnsond1.doc respectively. Filenames should have no spaces.



has everything I know about R, including how to get it. As I learn stuff, I regularly update this file. It may be too long for your purposes, but since it is an MS-Word file you can search it with CTRL+F for key words of interest.

All students should visit:



and

It will guide you to everything in R in a systematic manner.

and then get your own R by visiting



click on CRAN under download

Then pick a mirror in USA

Click on Download and Install R for windows

Then click on base and you get R to your computer

Please get some packages (car, fBasics) also from the Internet, as we need them.

Use the Packages menu of R-GUI or console: first set the CRAN mirror to one in the USA, then choose Install Packages and pick car and fBasics from the alphabetic list. …
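The same can be done by typing commands. This is a minimal sketch that only checks which of the two packages are not yet on your computer; the install line is left commented out because it needs an Internet connection.

```r
needed <- c("car", "fBasics")

# Packages from the list that are not yet installed on this computer
not.installed <- setdiff(needed, rownames(installed.packages()))
print(not.installed)

# Uncomment the next line to download them from a CRAN mirror:
# if (length(not.installed) > 0) install.packages(not.installed)
```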

Lesson 11.2 and 11.3. Following is a hint on how to do it in R

library(car) # you need to install the package called car first; cor.test() itself comes with base R

x=c(1,3,5,6)

y=c(3,6,9,11)

cor.test(x,y)
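Lessons 11.2 and 11.3 are about fitting a linear model, so a natural next step after cor.test is the least squares line. A sketch with the same x and y, using only lm() from base R:

```r
x <- c(1, 3, 5, 6)
y <- c(3, 6, 9, 11)

fit <- lm(y ~ x)        # least squares line y = a + b*x
print(coef(fit))        # intercept a and slope b
print(summary(fit))     # standard errors, t values, R-squared

r <- cor(x, y)          # sample correlation, the estimate reported by cor.test(x, y)
print(r^2)              # equals the R-squared of this simple regression
```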



This has R commands for doing most of the stuff in ch13ppln.ppt

Lessons 11.2 and 11.3 are due soon

MULTIPLE REGRESSION: (lesson 11.4)



has Multiple regression 2 or more regressors and model building

The numerical example of pie-sales is done in R at:



Midterm on February 25 (Thursday, worth 20% of your grade) will be a webtest (worth 15%) + jargon type test of your understanding of the material in PowerPoints. For example:

(1) Describe Type I and Type II errors and probabilities in a Table, (2) plots of regression errors, (3) show confidence intervals for line of regression in a plot, (4) show the plot having analysis of variance (SSE, SSR and SST), etc.

The jargon test worth 8% will be on any day, un-announced (surprise) always covering the material till that day and may not even take place.

What is the test statistic for individual regression coefficients? (Hint: t test)

What is the test statistic for all regression slopes as a set? (Hint: F test)

Note that the format of the chi-square test statistic for variance and F statistic for all slopes is different from most others studied so far.
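To see the three formats side by side, here is a sketch that computes each statistic by hand. The data set mtcars (y = mpg, x = wt, z = hp) ships with R and is used only as an illustration, and the null value sigma0.sq of the variance is a made-up number.

```r
# Illustrative data only: mtcars ships with R
fit <- lm(mpg ~ wt + hp, data = mtcars)
s   <- summary(fit)

# t statistic for one coefficient: (estimate - 0) / standard error
est <- coef(s)[, "Estimate"]
se  <- coef(s)[, "Std. Error"]
t.manual <- est / se

# F statistic for all slopes as a set: a ratio of two mean squares
n   <- nrow(mtcars)
k   <- 2                                        # number of regressors
sse <- sum(residuals(fit)^2)                    # error sum of squares
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total sum of squares
ssr <- sst - sse                                # regression sum of squares
F.manual <- (ssr / k) / (sse / (n - k - 1))

# Chi-square statistic for a variance: (n - 1) * s^2 / sigma0^2
sigma0.sq  <- 30                                # hypothetical null value
chi.manual <- (n - 1) * var(mtcars$mpg) / sigma0.sq

print(t.manual)
print(F.manual)
print(chi.manual)
```

The t statistic has the familiar "estimate divided by standard error" form, while the F and chi-square statistics are ratios involving variances.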

The Midterm Webtest will be on Feb. 25, will cover computer lessons 8.1, 8.2, 8.3, 9.1, 9.2, 9.3, 9.4, and 9.5 only. Lessons 11.1 to 11.3 are not covered in the webtest, even though they were assigned. I will eventually test your knowledge of that material (regression analysis) through the term paper assignment.



has estimation and hyp testing for two population parameters

useful for lessons:

10.1 (comparing two proportions, large independent samples; filenames: lesson84c.doc, lesson84.doc),

10.2 (comparing two means, large independent samples),

10.3 (comparing two means for small samples; we use the pooled variance formula only if we know that they have a common variance),

10.4 (comparing two means, dependent samples, e.g., same person at two different times).

ch09.ppt also has expected values of cell frequencies in a contingency table, Chi-sq test for it, etc.

x=c(1,3,5,6); y=c(3,6,9,11)

t.test(x,y)# does testing of the difference of two means by using the t distribution
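The pooled variance case (lesson 10.3) and the dependent samples case (lesson 10.4) use the same function with one extra argument. A sketch with the same x and y:

```r
x <- c(1, 3, 5, 6)
y <- c(3, 6, 9, 11)

welch  <- t.test(x, y)                    # default: variances not assumed equal
pooled <- t.test(x, y, var.equal = TRUE)  # lesson 10.3: pooled variance formula
paired <- t.test(x, y, paired = TRUE)     # lesson 10.4: dependent samples

print(pooled$statistic)
print(paired$statistic)

# The paired test is a one-sample t test on the differences
d <- x - y
t.by.hand <- mean(d) / (sd(d) / sqrt(length(d)))
print(t.by.hand)
```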

The package named PASWR (Probability and Statistics with R) has tsum.test and zsum.test, which do the t and z tests when summary stats are the arguments to the function

library(fBasics)

help(TwoSampleTests) # Two Sample Tests in fBasics package

The complete term paper is due before the Easter holidays. There is a lateness penalty of one point for each day after that date.

Please submit your paper in 2 forms (1) hard copy on paper (2) e-mail it to

vinod@storm.cis.fordham.edu

I have opened this e-mail for this purpose only. Do not use it to send me any other e-mail by using the storm account. I do not monitor this e-mail except during the next month or so for the term papers only.

The paper should be in the form of a MS-Word file with ‘doc’ as extension.

1) The file name should be your last name dot doc. For example, if your last name is smith the submission should be a file named smith.doc. If two of you have the same last name, please insert enough initials (without spaces) to make it unique. For example if the names are John Johnson and Dick Johnson the file names should be johnsonj.doc and johnsond.doc respectively. Filenames should have no spaces.

2) In the electronic version do please include all the data and computer outputs omitted from the hard copy version (to save trees).

3) Do please attach the checklist to both versions and indicate the points you claim. You are grading yourself, subject to audit by me. False claims will result in a penalty (like the way the IRS works).

Checklist to be attached at the front of your hard copy of your paper

1) Is the title sufficiently descriptive? Is it typed centered, bold, properly capitalized, in large font and along the top line?

0.5 points. I claim full credit for this 

2) Is there your Name, affiliation and E-mail along second and third line?

0.5 points. I claim full credit for this 

3) Is there the (*) Footnote with date, the phrase “partial fulfillment,” etc.

0.5 points. I claim full credit for this 

4) Is the Abstract LESS THAN 125 words and descriptive of the model? (Please do give data sources in the initial version but delete it from the final version of the abstract as attached to the paper)

2.5 points. I claim full credit for this 

5) Are there exactly four numbered sections with bold titles present? (Cut half a point if raw data and R outputs are missing from the electronic version. These are Appendices to the paper).

0.5 points. I claim full credit for this section 

Points earned in PART 1 so far are: Out of 4.5 .

PART 2

6a) Are the data sources mentioned?

0.5 points I claim full credit for this 

6b)Are descriptive stats computed and explicitly mentioned?

0.5 points I claim full credit for this 

6c) Is outlier detection done for all three Y, X and Z variables? Is the number of outliers on each side stated in the text of the paper?

1 point I claim full credit for this 

7) Are 4 figures present? Fig 1 has Y vertical &X horizontal (Y vs. X), Fig 2 has Y vs. Z,

Fig 3 has residuals vs X and Fig.4 has residuals vs. Z.

0.5 points I claim full credit for this 

Note that if the residuals do not show any particular pattern, the model as a whole is good. Are similar observations about the residuals mentioned in the text of the paper?

0.5 points I claim full credit for this 

Are the four figures labeled correctly with titles and axes having labels?

0.5 points I claim full credit for this 

Are the Correct variables represented on the vertical axis ?

0.5 points I claim full credit for this 

Are they explicitly discussed in the text?

0.5 points I claim full credit for this 

Points earned in PART 2 so far are: Out of 4.5

PART 3

8) Is regression equation with equation numbers reported? e.g.:

Y = 0.0123 + 2.3345 X − 0.3567 Z + residual     (1)

(coefficient numbers might be different for you)

Or Ŷ = 0.0123 + 2.3345 X − 0.3567 Z

Is the equation centered? It should be centered, if not cut 0.25 points.

0.5 points I claim full credit for this 

9) Are there two t tests ? Did the student correctly look up critical t values from the t table in the textbook?

The NULL and ALTERNATIVE hypotheses should be correctly stated for each test.

1.5 points I claim full credit for this 

Is there one F test? Did the student correctly look up critical values for F from the F table in the textbook?

1 point I claim full credit for this 

Are they discussed somewhere in the text?

0.5 points I claim full credit for this 

10) Is there a punch line for conclusion?

1 point I claim full credit for this 

Points earned in PART 3 are: Out of 4.5

If you claim more points than you deserve after I audit your term paper, you will lose at least twice as many points as the task is worth. This is an honor system. Total points claimed out of 13.5 (three parts worth 4.5 each) are:

If the coefficient of determination R² is less than 0.2, you have chosen a very bad model and I may deduct points for the bad choice. But in general, I can understand if your model turns out to be not very good. As long as R² > 0.2 you will not be penalized for a bad model. In fact you must have text in the paper acknowledging that the model needs revision if your R² < 0.4 or if individual slopes are statistically insignificant.

COMPUTER FILE BACKUPS: I want you to develop a habit of keeping backup copies of all your work (especially the term paper). I will not accept any excuse that the computer broke down, had a virus, or the printer broke down, etc. The deadlines are firm. I suggest beating the deadline by a week to be safe and to have the time to do last-minute embellishments. E-mail your term paper versions to yourself to avoid possible breakdowns, loss of data or other mishaps.

All needed R commands are found at:



What are the R commands for each of the following?

1) Reading data

2) Running regressions including multiple regressions

3) Descriptive stats (which package do you use and how?)

4) Outlier detection

5) Plots and plot headings

6) Confidence intervals

7) Analysis of variance

8) Finding the residuals of a regression

9) Printing the results of regression to the screen.

10) Computing the portion of test stats which comes from z, t and F tables.

11) Computing the portion of p-value statistics which comes from z, t and F tables.
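The eleven items above can be sketched in base R as follows. This is only an illustration under stated assumptions: the file name mydata.csv is hypothetical, the built-in mtcars data (y = mpg, x = wt, z = hp) stands in for your own data, and draw.figure1 is a helper name invented here.

```r
# 1) Reading data. With your own file: dat <- read.csv("mydata.csv")  (hypothetical name)
#    Here built-in data are used so that the sketch runs on its own.
dat <- data.frame(y = mtcars$mpg, x = mtcars$wt, z = mtcars$hp)

# 2) Multiple regression of y on x and z
fit <- lm(y ~ x + z, data = dat)

# 3) Descriptive stats (base R; basicStats() in the fBasics package is an alternative)
print(summary(dat))

# 4) Outlier detection by the boxplot rule (beyond 1.5 * IQR from the quartiles)
outliers <- lapply(dat, function(v) boxplot.stats(v)$out)
print(outliers)

# 5) Plots and plot headings: a function, so nothing is drawn until it is called
draw.figure1 <- function(d) {
  plot(d$x, d$y, main = "Fig. 1: Y vs. X", xlab = "X", ylab = "Y")
}

# 6) Confidence intervals for the regression coefficients
ci <- confint(fit, level = 0.95)
print(ci)

# 7) Analysis of variance table
print(anova(fit))

# 8) Residuals of the regression
res <- residuals(fit)

# 9) Printing the results of the regression to the screen
print(summary(fit))

# 10) Critical values that would otherwise come from z, t and F tables
df.err <- df.residual(fit)                   # n - k - 1
z.crit <- qnorm(0.975)                       # two-sided 5% z value
t.crit <- qt(0.975, df = df.err)             # two-sided 5% t value
F.crit <- qf(0.95, df1 = 2, df2 = df.err)    # upper 5% F value

# 11) p-values from the same distributions
t.obs <- coef(summary(fit))["x", "t value"]
p.t   <- 2 * pt(abs(t.obs), df = df.err, lower.tail = FALSE)
F.obs <- summary(fit)$fstatistic
p.F   <- pf(F.obs["value"], F.obs["numdf"], F.obs["dendf"], lower.tail = FALSE)

print(c(z.crit = z.crit, t.crit = t.crit, F.crit = F.crit))
print(c(p.t = p.t, p.F = unname(p.F)))
```

Calling draw.figure1(dat) draws the first of the four figures; the residual plots follow the same pattern with res on the vertical axis.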

9.1 Hypothesis Testing Proportions: P Value
9.2 Hypothesis Testing Proportions: z Value
9.3 Hypothesis Testing Means: P Value
9.4 Hypothesis Testing Means: z Value
9.5 Hypothesis Testing Means: t Value
9.6 Direct Mail
9.7 Type II Errors
9.8 Hypothesis Testing about a Population Variance
9.9 Chi-Square Test for Association
9.10 Chi-Square Test for Goodness of Fit

10.1 Hypothesis Testing - Two Proportions (Large Independent Samples)
10.2 Hypothesis Testing - Two Means (Sigma Known)
10.3 Hypothesis Testing - Two Means (Sigma Unknown)
10.4 Hypothesis Testing - Two Means (Dependent Samples)
10.5 Hypothesis Testing - Two Population Variances

11.1 Scatterplots and Correlations
11.2 Fitting a Linear Model
11.3 Regression Analysis I
11.4 Multiple Regression
11.5 ANOVA Regression

12.1 ANOVA



has Chi-square hypothesis testing for one or two population variances

Lesson 9.8 population variances

(Lesson 9.6 is on direct mail and 9.7 does a Type II error simulation. Lesson A.6 does two population variances and A.7 has overall hypothesis testing. These are not formally assigned, but you are encouraged to try them.)



has tests for goodness of fit and contingency analysis

Useful for Lessons 9.9 & 9.10

Lesson 9.9 Chi-sq test for contingency tables is a bit complicated.

Lesson 9.10 goodness of fit is assigned before assigning lesson 9.9.
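Both chi-square tests are available in R through chisq.test. The sketch below uses made-up counts (obs and counts are hypothetical numbers) and also computes the contingency table statistic by hand, with expected cell frequency = row total times column total divided by the grand total.

```r
# Hypothetical 2 x 3 contingency table of observed counts (made-up numbers)
obs <- matrix(c(20, 30, 50,
                30, 30, 40), nrow = 2, byrow = TRUE)

# Lesson 9.9: expected cell frequencies and the chi-square statistic by hand
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
chi.sq   <- sum((obs - expected)^2 / expected)
df       <- (nrow(obs) - 1) * (ncol(obs) - 1)
p.value  <- pchisq(chi.sq, df = df, lower.tail = FALSE)

test <- chisq.test(obs)          # the same test done by R
print(test$expected)
print(c(chi.sq = chi.sq, df = df, p.value = p.value))

# Lesson 9.10: goodness of fit of counts to hypothesized proportions
counts <- c(18, 22, 20, 40)      # made-up observed counts
gof <- chisq.test(counts, p = c(0.25, 0.25, 0.25, 0.25))
print(gof$statistic)
```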

Lesson 10.5 uses the F test = ratio of s1 squared to s2 squared; see slide 17 of ch10ppln.ppt
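In R the ratio of the two sample variances and its F test are given by var.test. A sketch with two made-up samples a and b (hypothetical numbers, for illustration only):

```r
# Two hypothetical samples (made-up numbers)
a <- c(12, 15, 11, 18, 14, 16)
b <- c(13, 14, 13, 15, 14, 13)

F.by.hand <- var(a) / var(b)      # s1 squared divided by s2 squared
vt <- var.test(a, b)              # F test for equality of two variances

print(F.by.hand)
print(vt$statistic)
print(vt$parameter)               # numerator and denominator degrees of freedom
print(vt$p.value)
```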

First part of Final exam will be WEBtest on April 30, 2012 or May 3, 2012 (the last day of classes). It will be done right on the computer as a webtest. You will have only one hour and 10 minutes. You will have to work within Hawkes’ software. Be sure to bring your calculator along and your access code.

Bring an extra calculator if you have one to cover the possibility that yours does not work for some reason.



has forecasting and time series analysis for business



has nonparametric stats



has Intro to Decision Analysis

A second example with student choosing among jobs with R solution

|prob |0.3 |0.2 |0.2 |0.3 |

| |Boom |Growth |Slowdown |Recession |

|Large factory | 100 | 50 |-120 |-150 |

|Medium | 90 | 120 |-30 | -40 |

|Small | 40 | 30 | 20 | 10 |

maximax=120 (Medium factory). The regret matrix (rows: Large, Medium, Small) is:

[1,] 0 70 140 160

[2,] 10 0 50 50

[3,] 60 90 0 0

-29, 33, 25 are the expected values for Large, Medium and Small
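The four decision criteria for the factory table can be computed in a few lines of R. This is a sketch, not the only way to do it; the variable names are chosen here for illustration.

```r
prob <- c(0.3, 0.2, 0.2, 0.3)
payoff <- matrix(c(100,  50, -120, -150,
                    90, 120,  -30,  -40,
                    40,  30,   20,   10), nrow = 3, byrow = TRUE,
                 dimnames = list(c("Large", "Medium", "Small"),
                                 c("Boom", "Growth", "Slowdown", "Recession")))

# Maximax: the action with the best of the row maxima (optimistic)
maximax <- names(which.max(apply(payoff, 1, max)))

# Maximin: the action with the best of the row minima (pessimistic)
maximin <- names(which.max(apply(payoff, 1, min)))

# Regret = best payoff in each column minus the payoff actually obtained
regret <- sweep(-payoff, 2, apply(payoff, 2, max), "+")
minimax.regret <- names(which.min(apply(regret, 1, max)))

# Expected value of each action, weighting the columns by their probabilities
ev <- drop(payoff %*% prob)
best.ev <- names(which.max(ev))

print(regret)
print(ev)
print(c(maximax = maximax, maximin = maximin,
        minimax.regret = minimax.regret, expected.value = best.ev))
```

The same steps apply to any payoff matrix with actions in rows and states of nature in columns.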

Decision Problems

Q 1) Indicate whether the following behavior represents maximax or maximin criterion.

A trucker is driving in an area where the police radars are present. He still drives 15 miles above the speed limit. (Answer: Maximax)

A firm has trouble meeting its payroll. It hires a PR firm and advertises in the hope of turning things around when the managers know that if the ads fail, there is no money to pay the PR firm either. (Answer: Maximax)

An owner of trucking company buys liability insurance. (Answer: Maximin)

Q 2) A shipper wants to send a million dollars worth of goods. Insurance costs $50k. What is the decision if he uses maximin criterion (Answer: buy insurance)

if he uses maximax ? (Answer: Do not buy insurance)

Minimax regret? (Answer: buy insurance)

Q 3) Sal manages Stadium food. Sal must decide how many hot dogs to bring for sale. The number of hot dogs sold depends on the game’s attendance. Sal pays $1 for each hot dog and bun. Hot dogs are sold to fans during the game for $2. Any leftover hot dogs are sold to the men’s dorms for $0.25 apiece. Sal estimates that demand for hot dogs for next week’s game will be either 15,000, 20,000, 30,000 or 35,000. Set up the payoff table and compute the maximax, maximin, minimax-regret and expected value solutions from the payoff table.

#There are 4 demand levels so 4 columns for unknown game attendance

#and same action items (hotdogs brought to the game)

prob=c(.3,.2, .2, .3)

gross.rev=matrix(c(30000, 30000, 30000, 30000, 30000, rep(40000,3), 30000, 40000, rep(60000,2), 30000, 40000, 60000, 70000),4,4,byrow=T)

gross.rev #prints gross revenue matrix

gross.rev2=matrix(c(rep(0,4), 5000*.25, rep(0,3), 15000*.25, 10000*.25, 0,0, 20000*.25, 15000*.25, 5000*.25,0), 4,4, byrow=T)

gross.rev2 #secondary revenue from selling leftover hotdogs

cost=matrix(c(rep(15000,4), rep(20000,4), rep(30000,4), rep(35000,4)),4,4,byrow=T)

cost #prints the cost matrix

payoff=gross.rev+gross.rev2-cost

payoff

#[1,] 15000 15000 15000 15000

#[2,] 11250 20000 20000 20000

#[3,] 3750 12500 30000 30000

#[4,] 0 8750 26250 35000

nact=nrow(payoff) #number of actions

nj=ncol(payoff) #number of states of nature

maxact=rep(0,nact) #initialize

i=1

while (i ................
................
