Simple Linear Regression – Assignment #7 ( points)



STAT 602 – Simple & Multiple Linear Regression (50 pts.)
Spring 2017

1 - Mercury Contamination of Walleyes in Island Lake Reservoir
Data Set: Walleyes Island Lake
Goal: Develop a regression model to predict/explain the mercury level found in the tissues of a walleye (ppm) using length (in.).

This assignment is similar to your simple linear regression handout; however, I want you to investigate mercury contamination levels found in walleyes (ppm) versus the length of the walleye (in.). The primary interest is in developing a walleye consumption advisory based on length for walleyes in Island Lake Reservoir near Duluth, so let Y = HGPPM and X = LGTHIN.

Main Items to address:

a.) Obtain a linear correlation measurement to initially investigate the linear relationship between these two variables. Are these variables linearly related to each other? Explain. (2 pts.)

b.) Perform the overall regression usefulness test (i.e. HO: Regression is not useful vs HA: Regression is useful) to formalize your initial investigation of these variables. What is your decision for this test? Write a conclusion in everyday language for this test. (2 pts.)

c.) Perform the test to ensure that the slope of our regression line is not zero (i.e. HO: βLGTHIN = 0 vs HA: βLGTHIN ≠ 0). What is your decision for this test? Write a conclusion using everyday language for this test. (2 pts.)

d.) What is the RSquare value for this analysis? In the context of this problem, carefully explain what this number is measuring. (3 pts.)

e.) Create a scatter plot of the data with the estimated regression line added. In the context of this problem, carefully interpret the y-intercept and slope of your estimated regression line. Again, carefully explain what these numbers are measuring. (You need to do more than say they are the y-intercept and slope of the line.) (4 pts.)

f.) Discuss whether or not the assumptions for this procedure are being met. Also, identify any outliers in the data set.
If there are problems, I do not expect you to try to fix them; just identify them, and for the purposes of the rest of the problem we will pretend they are not there. (4 pts.)

Checking the assumptions:
> Model Appropriate: Make sure no existing trends remain in the residual plot.
> Constant Variance: Make sure there are no megaphone patterns in the residual plot.
> Independence: Don’t really need to check this as these data are not collected over time.
> Normality: Make a histogram of the residuals and make sure they follow a normal distribution.
> Outliers: Any observations whose residuals fall outside ±2*RMSE are considered possible outliers.

g.) It is recommended that humans should not consume more than one fish per month with mercury levels in its tissues greater than 0.5 ppm. Because your average walleye angler does not carry a gas spectrometer in their fishing boat, actually measuring the Hg level found in a walleye they have caught is a problem. However, it is very easy for an angler to measure the length of their walleye in inches. Using your regression model, what length of walleye would you recommend for the “do not eat more than one walleye exceeding _______ inches per month” advisory? (2 pts.)

It is also recommended that humans should never consume fish with mercury levels exceeding 1 ppm in their tissues. Complete the following: “we recommend that you do not eat any walleyes exceeding __________ inches from Island Lake”. (2 pts.)

h.) Using your regression analysis, estimate the mean mercury level found in the population of walleyes in Island Lake that are the lengths below. Give both a single value estimate and a 95% confidence interval for the mean. Also give the correct interpretation of the confidence interval for each case.
   a) 21.2 inches in length (Note: this is the actual length of one of the walleyes in the data) (3 pts.)
   b) 11 inches in length (Note: this is the actual length of one of the walleyes in the data) (3 pts.)

i.) Suppose you just caught a whopper walleye measuring 25.1 inches from Island Lake. What do you predict the mercury level would be in this particular fish? Give both a single value estimate and an interval estimate, giving the correct interpretation of the interval estimate. (3 pts.)

j.) Would you recommend using your model to predict the mercury level for a walleye that is 8 inches in length? How about 29 inches? Explain your reasoning. (1 pt.)

k.) Would you recommend using this model to predict the mercury levels for walleyes in the Mississippi River? Explain. (1 pt.)

l.) The Island Lake walleye data file also contains the weight (lbs.) for each of the fish sampled. Do you think using weight as opposed to length to establish consumption advisories is a good idea? Justify your answer. (2 pts.)

2 - Waist Circumference and Deep Abdominal AT
Data File: Waist Circumference
Goal: Develop a regression model to predict/explain deep abdominal AT (Y) using waist circumference (cm) as the predictor (X).

Despres et al. in “Estimate of Deep Abdominal Adipose-Tissue Accumulation from Simple Anthropometric Measurements in Men”, American Journal of Clinical Nutrition, (1991), point out that the topography of adipose tissue (AT) is associated with metabolic complications considered as risk factors for cardiovascular disease. It is important, they state, to measure the amount of intraabdominal AT as part of the evaluation of the cardiovascular-disease risk of an individual. Computed tomography (CT), the only available technique that precisely and reliably measures the amount of deep abdominal AT, however, is costly and requires irradiation of the subject. In addition, the technique is not available to many physicians. Despres and his colleagues conducted a study to develop equations to predict the amount of deep abdominal AT from simple anthropometric measurements. Their subjects were men between the ages of 18 and 42 years who were free from metabolic disease that would require treatment.
Among the measurements taken on each subject were deep abdominal AT obtained by CT and waist circumference (cm). The question of interest is how well one can predict and estimate deep abdominal AT from a knowledge of waist circumference.

Main Items to address:

a.) Create a scatter plot of the data and compute the correlation between waist circumference and deep abdominal AT. Comment on what you see in this plot in terms of the relationship between deep abdominal AT and waist circumference. (2 pts.)

b.) In the context of this problem, carefully interpret the y-intercept and slope of your estimated regression line, i.e. carefully explain what these numbers are measuring. (You need to do more than say they are the y-intercept and slope of the line.) Also explain to a colleague how they would use this model to predict deep abdominal AT. (3 pts.)

c.) What is the R-Square value for this analysis? In the context of this problem, carefully explain what this number is measuring. (2 pts.)

d.) Discuss whether or not the regression assumptions are being met. Also, identify any outliers in the data set. (4 pts.)

Checklist for checking the regression model assumptions:
> Model Appropriate: Make sure no existing trends remain in the residual plot.
> Constant Variance: Make sure there are no megaphone patterns in the residual plot.
> Independence: Don’t really need to check this as these data are not collected over time.
> Normality: Make a histogram of the residuals and make sure they follow a normal distribution.
> Outliers: Any observations whose residuals fall outside ±2*RMSE are considered possible outliers.

e.) Give a 95% prediction interval for the deep abdominal AT for a 25-year-old man with a waist circumference of 105 cm. Interpret this interval. (2 pts.)
Note: There is an individual with a waist circumference of 105 cm in these data.

f.) Can we use the results of this study to predict the deep abdominal AT of an individual with a waist circumference of 135 cm? Explain. (1 pt.)

g.) Can we use the results of this study to predict the deep abdominal AT of a 50-year-old male with a waist circumference of 100 cm? Explain. (1 pt.)

h.) Can we use the results of this study to predict the deep abdominal AT of a 24-year-old female with a waist circumference of 70 cm? Explain. (1 pt.)

3 – Selling Price of Homes in Polk County, IA from 2012-2013
(Datafile: Polk County 2012.JMP)

The variable descriptions for this analysis are given below. The goal of the analysis is to identify key factors/variables that are related to the selling price of the homes. All sales represent arm's-length situations, i.e. where both the buyer and seller are on equal footing.

Variable            Description
price               This is the response for the prediction problem and is the price the home sold for in U.S. dollars ($).
juris               This is a categorical variable which denotes the region/city within Polk County the home was in. The codes are: AL = Altoona, ANK = Ankeny, BO = Bondurant, CL = Clive, DM = Des Moines, GR = Grimes, JO = Johnston, O = Other, PC = Polk City, PH = Pleasant Hill, UR = Urbandale, WDM = West Des Moines
month               Month the home sale occurred (1 = January, …, 12 = December)
instrument          Deed or Contract
ZIP*                Zip Code (FOR MAPPING PURPOSES ONLY!!!)
bldg.full           Assessed value of building(s) on the lot ($)
total.full          Assessed value of entire property (land and structures) ($)
land.acres          Lot size in acres.
residence.type      Residence Type - 1.5 Stories, 2 Stories, 1 Story, Over 2 Stories, Split
bldg.style          Split, Ranch, Other, Early 20s, Conv (conventional), Bungalow
ext.wall            Material used to construct exterior walls – Wood, Vinyl, Metal, Hardboard, Conc.Board, Brick, Other
percent.brick       Percentage of house that is brick, ranges from 0 – 100.
roof.type           Gable, Hip, Other
main.living.area    Square foot main living area
upper.living.area   Square foot upper living area
fin.attic.area      Square foot of finished attic area
total.living.area   Total square foot living area
unfin.attic.area    Square foot of unfinished attic area
foundation          Material used to construct foundation – Poured Concrete, Concrete Block, Brick, Other
basement.area       Square foot of basement area
fin.bsmt.area.tot   Square foot of finished basement area
bsmt.walkout        Lineal feet of exposed wall
bsmt.gar.capacity   Number of cars (capacity) that fit in basement garage
att.garage.area     Square foot of attached garage
open.porch.area     Square foot of all attached open porches
enclose.porch.area  Square foot of all attached enclosed porches
patio.area          Square foot of all patio areas
deck.area           Square foot of all deck areas
canopy.area         Square foot of all canopies
veneer.area         Lineal feet of brick veneer on house
carport             Is there a carport? (1 = yes, 0 = no)
bathrooms           Number of full or ? baths
toilet.rooms        Number of ? baths
extra.fixtures      Number of extra fixtures
whirlpools          Number of whirlpool tubs
fireplaces          Number of fireplaces
bedrooms            Number of bedrooms
rooms               Total number of rooms
year.built          Year home was built
gasair              Home has gas furnace with forced air heat (1 = yes, 0 = no)
air.conditioning    Percent of central air conditioning (ranges 0 = none to 100 = full)
detached            Are there detached structures? (1 = yes, 0 = no)
bsmt.qual           Basement quality – None (i.e. no basement), Low, Average, Average Plus, Living Quarters
Condition           The condition of the house for its age and type of construction: 1 = Very poor, 2 = Poor, 3 = Below Normal, 4 = Normal, 5 = Above Normal, 6 = Very Good, 7 = Excellent
Grade               The quality of original construction ranging from 1 to 6, with 1 being the best (notice it uses the opposite ordering from Condition)
Grade Adjusted      An adjustment made between different grades above – adjusts up or down on the grade scale above. The smaller the adjusted grade value, the better the quality of the original construction.

Use multiple regression to develop a regression model for Y = selling price (price) using the other variables in the table above as predictors. Summarize your final model and assess model adequacy by examining the residuals. (25 pts.)
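
The ±2*RMSE outlier rule in the assumption checklists for Problems 1 and 2 can be sketched in a few lines of plain Python. This is only an illustration of the rule, not the software workflow the course uses. The function names fit_line and flag_outliers are made up for this example, and the numbers are invented, not the walleye or waist circumference data. RMSE is assumed here to be sqrt(SSE / (n - 2)), the usual root mean square error for a simple linear regression.

```python
import math

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def flag_outliers(x, y):
    """Return RMSE, residuals, and indices whose residual exceeds 2 * RMSE in size."""
    b0, b1 = fit_line(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Assumption: RMSE uses n - 2 degrees of freedom (one slope, one intercept).
    rmse = math.sqrt(sum(e ** 2 for e in residuals) / (len(x) - 2))
    flagged = [i for i, e in enumerate(residuals) if abs(e) > 2 * rmse]
    return rmse, residuals, flagged

# Invented illustration data (NOT the assignment data): one point sits far off the line.
lengths = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
hg = [0.1, 0.2, 0.3, 0.4, 1.5, 0.6, 0.7, 0.8, 0.9, 1.0]

rmse, residuals, flagged = flag_outliers(lengths, hg)
print("RMSE:", round(rmse, 3))
print("Possible outliers at positions:", flagged)
```

The same residuals list can be plotted against the predictor to look for leftover trends or megaphone patterns, and drawn as a histogram to judge normality.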
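
For the consumption advisories in item g of Problem 1, one simple approach is to solve the fitted line for the length at which the predicted mercury level reaches the limit: if HGPPM = b0 + b1 * LGTHIN, then LGTHIN = (limit - b0) / b1. The sketch below assumes that approach. The function name advisory_length and the coefficient values are hypothetical placeholders, not estimates from the walleye data.

```python
def advisory_length(b0, b1, hg_limit):
    """Length at which the fitted line b0 + b1 * length equals hg_limit."""
    if b1 <= 0:
        # The inversion only makes sense when mercury increases with length.
        raise ValueError("slope must be positive to invert the fitted line")
    return (hg_limit - b0) / b1

# Hypothetical coefficients for illustration only; substitute your own estimates.
b0, b1 = -0.4, 0.05

monthly = advisory_length(b0, b1, 0.5)  # limit for "one fish per month"
never = advisory_length(b0, b1, 1.0)    # limit for "do not eat"
print("Monthly advisory length (in.):", round(monthly, 1))
print("Never-eat advisory length (in.):", round(never, 1))
```

Inverting the fitted line targets the mean mercury level at a given length and ignores fish-to-fish variation, so a more cautious advisory might instead look at where the upper end of a prediction interval crosses the limit.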
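
Items h and i of Problem 1 and item e of Problem 2 turn on the difference between a confidence interval for the mean response and a prediction interval for one individual. The sketch below uses the standard simple linear regression formulas, with standard error RMSE * sqrt(1/n + (x0 - x̄)² / Sxx) for the mean and RMSE * sqrt(1 + 1/n + (x0 - x̄)² / Sxx) for a single new observation. The function name intervals_at and the data are invented for illustration. The t critical value is passed in by hand, since the Python standard library has no t quantile function; 2.306 is the two-sided 95% value for 8 degrees of freedom.

```python
import math

def intervals_at(x, y, x0, t_crit):
    """Point estimate, CI for the mean, and prediction interval at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    rmse = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x0
    # Uncertainty in the estimated mean response at x0.
    se_mean = rmse * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    # The extra "1 +" term adds the scatter of a single new observation.
    se_pred = rmse * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return y_hat, ci, pi

# Invented illustration data (NOT the assignment data); n = 10, so df = 8.
lengths = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
hg = [0.12, 0.18, 0.33, 0.37, 0.52, 0.58, 0.71, 0.83, 0.88, 1.02]

y_hat, ci, pi = intervals_at(lengths, hg, 21.2, 2.306)
print("Estimate:", round(y_hat, 3))
print("95% CI for the mean:", tuple(round(v, 3) for v in ci))
print("95% prediction interval:", tuple(round(v, 3) for v in pi))
```

The prediction interval is always wider than the confidence interval at the same x0, and both widen as x0 moves away from the mean of the observed predictor values, which is one reason extrapolation questions such as items j and k deserve caution.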