Lab 2 Statistics and Graphing_summer18



LAB 2: DATA ANALYSIS: STATISTICS AND GRAPHING

Please note: There will be laptops available for you to perform the activities. However, if you have a personal computer (Mac or PC) or tablet, the concepts and skills in this lab will be much more meaningful if you learn them on your own device. Please take the time BEFORE LAB to familiarize yourself with your device and the copy of Microsoft Excel installed on it. If you do not have Excel please go to: follow the instructions and download and familiarize yourself with at least Microsoft Excel BEFORE coming to lab this week so that you can be ready to learn!

ACTIVITY 1: Calculating the Mean, Median, Mode, Range, and Standard Deviation

To determine cause-and-effect relationships, scientists record observations about how changes in one variable cause another variable to change. A variable is the factor or characteristic being measured; it can differ or vary in amount or type. Examples of variables include height, weight, femur length, eye color, attitude, and health status.

A list of data collected about variables is often not useful by itself. Descriptive statistics are therefore used to summarize the data and give scientists a clearer view of it, so that trends or patterns can be identified. Examples of descriptive statistics with their definitions are listed below. The mean, median, and mode are used to determine the “center” of the values in a data set. The range and standard deviation indicate the spread of the data set values around the “center”.

Mean: The mean is often called the average. It is calculated by adding up all the values in a data set and then dividing by the total number of values. The mean is very sensitive to unusual values (outliers) and is easily distorted. For example, consider the data set with the following values: 4, 7, 3, 8, 10, 8. The sum is 40, so the mean is 40/6 = 6.7. But what if the last value was mistakenly entered as 80 instead of 8? Now the sum is 112 and the mean is 112/6 = 18.7!
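The effect of a single outlier on the mean can be checked with a few lines of Python. This is only a sketch for readers who want to verify the arithmetic outside Excel; the variable names are illustrative and are not part of the lab procedure.

```python
from statistics import mean

# Data set from the mean example
values = [4, 7, 3, 8, 10, 8]
# Same data set with the last value mistyped as 80
values_with_typo = [4, 7, 3, 8, 10, 80]

mean_ok = mean(values)
mean_typo = mean(values_with_typo)

print(round(mean_ok, 1))    # mean of the correct data, rounded to one decimal
print(round(mean_typo, 1))  # mean after the data-entry error
```

One mistyped value out of six nearly triples the mean, which is why data should be checked for entry errors before any statistics are calculated.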
Median: The median is the middle value of the data set. Half of the values are lower than the median and half are higher. To determine the median, the values must first be listed in numerical order. If there is an odd number of values, the median is the ((n + 1)/2)th value (n = total number of values). If there is an even number of values, the median is the average of the two middlemost values, the (n/2)th and [(n/2) + 1]th values.

For example, the data set 5, 10, 6, 9, 12 contains an odd number of values. First, we must arrange the values in numerical order: 5, 6, 9, 10, 12. The median is the ((5 + 1)/2) or 3rd value, which is 9. It is easy to see by looking at this data set that the value 9 is in the middle.

Now calculate the median of the data set from the mean example above. The data set listed in numerical order is 3, 4, 7, 8, 8, 10 and contains an even number of values. The median is the average of the 6/2 or 3rd value and the [(6/2) + 1] or 4th value. The median is (7 + 8)/2, which equals 7.5. Note that if we used the data set from the mean example with 80 instead of 8, the median is still 7.5! The median is not as sensitive to outliers as the mean. Remember, median = middle value.

Mode: The mode is the value that occurs most frequently in a data set. If every value in a data set occurs only once, then there is no mode. Remember, mode = most often. The mode for the data set 4, 8, 3, 8, 10, 7 is 8 because that value occurs the most often. What is the mode for the data set 5, 10, 6, 9, 12? In this case, there is no mode.

Range: The range is the difference between the highest and lowest values. It indicates the spread of the values in the data set. Since the range only considers the extreme values in the data set, it is also highly influenced by outliers. For the data set 3, 4, 7, 8, 8, 10, the range is 10 - 3 = 7. If one of the 8s was mistakenly recorded as an 80 in the data set above, then the range is 80 - 3 = 77!
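The median, mode, and range examples above can be reproduced with the short Python sketch below. The helper names `mode_or_none` and `value_range` are made up for this illustration. The mode helper follows this handout's convention that a data set has no mode when every value occurs only once; if several values tie for most frequent, it simply returns the one that appears first.

```python
from collections import Counter
from statistics import median

def mode_or_none(values):
    """Return the most frequent value, or None if every value occurs once."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > 1 else None

def value_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)

even_set = [4, 7, 3, 8, 10, 8]    # data set from the mean example
typo_set = [4, 7, 3, 8, 10, 80]   # same set with 8 mistyped as 80
odd_set = [5, 10, 6, 9, 12]       # data set from the median example

print(median(odd_set), median(even_set), median(typo_set))
print(mode_or_none(even_set), mode_or_none(odd_set))
print(value_range(even_set), value_range(typo_set))
```

The median is the same for `even_set` and `typo_set`, while the range jumps from 7 to 77, matching the point that the range is far more sensitive to outliers than the median.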
Standard deviation: The standard deviation is a measure of the variability or spread of the data around the mean. When comparing two data sets collected on the same variable (e.g., respiratory rate), the data set with the higher standard deviation has a greater amount of variability among the values; the values are more spread out. In contrast, the data set with the smaller standard deviation has values that are more similar and closer to the mean. For example, respiratory rates in breaths per minute (bpm) were recorded for two different groups of students.

Group 1: 16, 15, 14, 17, 18    Mean = 16 bpm    Standard deviation = 1.6 bpm
Group 2: 10, 10, 20, 25, 15    Mean = 16 bpm    Standard deviation = 6.5 bpm

Even though the two groups have the same mean, group 2 has a higher standard deviation, indicating that its values are more variable and spread out.

Many important physiological variables, such as blood pressure, height, and weight, have a pattern of data value distribution called a normal distribution. For example, if you randomly sampled a large group of people and plotted their blood pressure values, you would obtain a curve that is symmetrical about the mean blood pressure. This curve is called a normal curve or bell-shaped curve (see Figure 1). If the data are distributed in the form of a normal or bell-shaped curve, then the following conclusions can be made:

- approximately 68% of the values lie within +/- 1 standard deviation of the mean
- approximately 95% of the values lie within +/- 2 standard deviations of the mean
- approximately 99.7% of the values lie within +/- 3 standard deviations of the mean

Figure 1. Normal or bell-shaped curve with standard deviations

The standard deviation is defined as the square root of the sum of the squared deviations from the mean divided by one less than the sample size. This can be represented by the formula:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$

where $x_1, x_2, \ldots, x_n$ are the observed values of the sample, $\bar{x}$ is the sample mean, and $n$ is the sample size.
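As a check on the definition above, the following Python sketch computes the sample standard deviation step by step (mean, deviations, squared deviations, division by n - 1, square root) for the two respiratory-rate groups. The function name `sample_sd` is illustrative. The result is also compared against `statistics.stdev` from the Python standard library, which likewise divides by n - 1.

```python
from math import sqrt
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation, following the definition step by step."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    # Divide by one less than the sample size, then take the square root
    return sqrt(sum(squared_deviations) / (n - 1))

group1 = [16, 15, 14, 17, 18]   # respiratory rates, breaths per minute
group2 = [10, 10, 20, 25, 15]

sd1 = sample_sd(group1)
sd2 = sample_sd(group2)

print(round(sd1, 1), round(sd2, 1))
# The standard-library function should agree with the manual calculation
print(abs(sd1 - stdev(group1)) < 1e-9, abs(sd2 - stdev(group2)) < 1e-9)
```

Both groups have a mean of 16 bpm, but the second group's standard deviation is roughly four times larger, which is the difference in spread described above.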
The following example of how to calculate standard deviation will help you understand the above definition and formula. Please follow along with the example so that you will be able to calculate the standard deviation for other data sets.

Researchers recorded the number of precancerous skin lesions found on six patients who had a skin cancer mass removed 10 years ago. The data set is as follows: 1, 3, 4, 6, 9, and 19 precancerous skin lesions.

1. Calculate the mean for the data set.
   Mean = (1 + 3 + 4 + 6 + 9 + 19) / 6 = 42 / 6 = 7
   Mean = 7 precancerous skin lesions
2. Subtract the mean from every number to get the list of deviations. It is OK to get negative numbers here.
   list of deviations: -6, -4, -3, -1, 2, 12
3. Next, square the resulting deviations.
   squares of deviations: 36, 16, 9, 1, 4, 144
4. Add up all of the resulting squares to get their total sum.
   sum of squared deviations: 36 + 16 + 9 + 1 + 4 + 144 = 210

Data Value (xi)                     Data Value - Mean (xi - x̄)    Deviation Squared
(# of precancerous skin lesions)    (= deviation)
1                                   -6                             36
3                                   -4                             16
4                                   -3                             9
6                                   -1                             1
9                                   2                              4
19                                  12                             144
Total                               0                              210

Table 1: Calculation of the standard deviation from numbers of precancerous skin lesions.

5. Divide the sum of squared deviations by one less than the number of values.
   210 / 5 = 42
6. Then take the square root of this number:
   √42 = 6.48 precancerous skin lesions = standard deviation

If these data formed a normal curve (which they do not), we could conclude that about 68% of the values fall within +/- 1 standard deviation of the mean. That is, about 68% of the values in the data set would lie within the range of 7 - (1 x 6.48) to 7 + (1 x 6.48), or about 0.5 to 13.5 precancerous skin lesions.

ACTIVITY 2: Applying Descriptive Statistics and Graphing using Microsoft Excel Spreadsheets

Graphing is a technique that allows visual examination of data and, in addition to descriptive statistics, is another useful way to reveal trends in lists of data.
You will be using graphs to examine data during the remainder of this laboratory exercise. Here are some guidelines for presentation of data in graphs or tables:

1. In a paper, each graph or table must be identified by a figure or table reference number. Beside this number should be a short but clear description of what the table/graph/figure illustrates. This figure number should be referenced in the prose of the paper itself.
2. The graph or table must have a brief, clear, descriptive title.
3. In tables, the column and row headings should clearly identify the variable and units of measurement.
4. In graphs, the axes should be clearly labeled and the units of measurement denoted. NOTE: always graph the independent variable (for example, time) on the horizontal (x) axis. Note: look back at Lab 1 and review the differences between the independent and dependent variables.
5. A graph or table, along with its descriptive title, should be able to stand alone without the reader having to read additional text to understand it (hence numbers 1-4).

One of the most commonly used graphs is the scatterplot (also called a scatter graph or scatter chart). A scatterplot graphically represents two variables on a Cartesian coordinate system. The pattern of data dots can be analyzed for types of correlations. Often straight or curved lines are drawn between the data points. To use this function in Excel, you will need to create a spreadsheet with the two variables (both the dependent and the independent) listed in adjacent columns. Select the data, go to Insert > Charts > Scatter, and then choose the type of scatterplot that best illustrates your data type.

In Figure 2, below, age is the independent variable and weight is the dependent variable. Individual data points are indicated with dots (you can also have the program draw a line or curve between the data points to highlight any continuity in the data). A trendline is drawn as a dashed line.
What kind of correlation does this graph suggest between these two different variables?

Figure 2: Age versus weight for 25 Physiology Lab students at Salt Lake Community College.

Another very common graph type is the histogram. Histograms visually display the frequency distribution of data. The absolute frequency is the number of times a value occurs in a data set. It is plotted on the y-axis, while the dependent variable (the variable you are studying) is plotted on the x-axis. For example, researchers collected serum cholesterol values from a large sample of men, ages 25-34. The serum cholesterol values are divided into intervals of equal widths, which allows for comparisons among the intervals. The frequencies for each interval are presented in Table 2 and the corresponding histogram is seen in Figure 3.

                       Ages 25-34                        Ages 55-64
Cholesterol Level      Number     Relative               Number     Relative
(mg/100 ml)            of Men     Frequency (%)          of Men     Frequency (%)
80-119                 13         1.2                    5          0.4
120-159                150        14.1                   48         3.9
160-199                442        41.4                   265        21.6
200-239                299        28.0                   458        37.3
240-279                115        10.8                   281        22.9
280-319                34         3.2                    128        10.4
320-359                9          0.8                    35         2.9
360-399                5          0.5                    7          0.6
Total                  1067       100.0                  1227       100.0

Table 2: Absolute and relative frequencies of serum cholesterol levels for 2294 U.S. males, 1976-1980
Data from National Center for Health Statistics, Fulwood R, Kalsbeek W, Rifkind B, Russell-Briefel R, Muesing R, LaRosa J, and Lippel K. Total serum cholesterol levels of adults 20-74 years of age: United States, 1976-1980. Vital and Health Statistics, Series 11, Number 236, May 1986.

The relative frequency for an interval is calculated by dividing the absolute frequency by the total number of values. The resulting number is then multiplied by 100%. This gives the proportion of values that fall into an interval rather than the absolute number. For example, the relative frequency in the 160-199 mg/100 ml interval for men ages 25-34 is (442/1067) x 100% = 41.4%.
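The relative frequency calculation can be applied to every interval in Table 2 at once. The Python sketch below uses the absolute frequencies from Table 2; the function and variable names are illustrative only.

```python
intervals = ["80-119", "120-159", "160-199", "200-239",
             "240-279", "280-319", "320-359", "360-399"]

# Absolute frequencies (number of men) copied from Table 2
men_25_34 = [13, 150, 442, 299, 115, 34, 9, 5]
men_55_64 = [5, 48, 265, 458, 281, 128, 35, 7]

def relative_frequencies(counts):
    """Percent of all values that fall in each interval, to one decimal."""
    total = sum(counts)
    return [round(100 * count / total, 1) for count in counts]

rel_25_34 = relative_frequencies(men_25_34)
rel_55_64 = relative_frequencies(men_55_64)

for interval, younger, older in zip(intervals, rel_25_34, rel_55_64):
    print(f"{interval:>8}  {younger:5.1f}%  {older:5.1f}%")
```

Because each column is divided by its own group total (1067 or 1227), the two columns of percentages can be compared directly even though the groups differ in size.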
The relative frequency is used to compare two or more data sets where the number of values obtained is not equal. For example, serum cholesterol values were also collected for 1227 men aged 55-64 years. Because the group sizes are not the same, one cannot directly compare the absolute frequencies of the two groups. However, the relative frequencies can be compared, and one can see that the older men tend to have higher serum cholesterol levels than the younger men.

[Figure 3: histogram with serum cholesterol level (mg/100 ml) on the x-axis, in the intervals 80-119 through 360-399, and number of men on the y-axis, scaled from 0 to 500]

Figure 3: Absolute frequencies of serum cholesterol levels for 1067 U.S. males, aged 25-34 years, 1976-1980

As seen in the previous examples, management of data often requires sorting the data, constructing a well-organized table, and calculating statistical parameters. Computerized spreadsheets can be invaluable when performing these tasks. One of the most popular spreadsheets available is Microsoft Excel. The following describes how to enter data and formulas into an Excel spreadsheet to determine some of the parameters that are discussed in this lab. Computers are available in the lab to allow you to practice creating and using a spreadsheet.

The data presented in Figure 4 represent measurements obtained from ten BIOL2425 students. Each student is represented by a number in column B. This number maintains anonymity and also allows one to quickly determine the number of students measured (sample size). Each student will enter his/her age, height (in inches), and weight (in pounds).
The formulas that have been entered into the spreadsheet will convert the height and weight from English units (inches and pounds) to SI units (m and kg). The formulas used to do this conversion are displayed in Figure 5. Your lab instructor will describe how to type formulas into an Excel spreadsheet and how to use the copy/paste and fill down features to quickly design a very useful spreadsheet.

The example spreadsheets in Figures 4 and 5 have also been used to calculate the mean and standard deviation for age, height, and weight of the students that have been entered. Notice that in columns F and G, this calculation is inaccurate because of the zeros that the entered formulas produce in the rows where no height or weight has been entered yet. One must always make sure that unwanted zeros are not found in any data column.

The age data found in column C were copied and pasted into column J. These data were then sorted from largest to smallest to allow easy determination of the age range and the median and mode values.

Finally, a formula has been entered into the cells of column H to calculate the BMI (body mass index) for each of the students for which data have been entered. The formula to calculate one’s BMI is weight in kilograms (kg) divided by height in meters squared (m²). Notice that in column F of Figure 4, the height in inches is first converted to centimeters by multiplying by 2.54. This value is then converted into meters by dividing by 100 before being used in the BMI calculation (see the formula in column F, Figure 5).

Note: there are 39.37 inches per meter (or, stated another way, 0.3937 inches per centimeter, which is the same as 2.54 cm per inch) and there are 2.204 pounds (lbs) per kilogram.
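The conversions and the unwanted-zero problem described above can be reproduced outside Excel. The Python sketch below mirrors the formulas shown in Figure 5 (height x 2.54 / 100, weight / 2.2, and kg / m²) for the three students in Figure 4 who have height and weight entered. The function names are illustrative, and the seven zeros are added by hand to imitate what the spreadsheet formulas produce for the empty rows.

```python
from statistics import mean

CM_PER_INCH = 2.54
LB_PER_KG = 2.2  # rounded value used in the Figure 5 formulas

# (height in inches, weight in pounds) for students 1-3 in Figure 4
entered = [(72, 200), (58, 130), (62, 160)]

def to_meters(inches):
    return inches * CM_PER_INCH / 100

def to_kilograms(pounds):
    return pounds / LB_PER_KG

def bmi(height_m, weight_kg):
    return weight_kg / height_m ** 2

heights_m = [to_meters(h) for h, _ in entered]
weights_kg = [to_kilograms(w) for _, w in entered]
bmis = [bmi(h, w) for h, w in zip(heights_m, weights_kg)]
print([round(b, 2) for b in bmis])

# Imitate the seven rows whose formulas return 0 because no data were entered
heights_with_zeros = heights_m + [0.0] * 7
mean_with_zeros = mean(heights_with_zeros)   # distorted mean, as in Figure 4
mean_entered_only = mean(heights_m)          # mean of the real measurements
print(round(mean_with_zeros, 2), round(mean_entered_only, 2))
```

The mean height drops from about 1.63 m to 0.49 m when the zeros are included, which is why blank rows should be cleared or excluded before summary statistics are calculated.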
Row  B               C      D           E           F          G           H        J
2    Student Number  Age    Height(in)  Weight(lb)  Height(m)  Weight(Kg)  BMI      Sorted Data
3    1               22     72          200         1.83       90.91       27.18    47
4    2               30     58          130         1.47       59.09       27.23    38
5    3               18     62          160         1.57       72.73       29.33    30
6    4               47                             0.00       0.00        #DIV/0!  30
7    5               19                             0.00       0.00        #DIV/0!  22
8    6               22                             0.00       0.00        #DIV/0!  22
9    7               30                             0.00       0.00        #DIV/0!  20
10   8               16                             0.00       0.00        #DIV/0!  19
11   9               38                             0.00       0.00        #DIV/0!  18
12   10              20                             0.00       0.00        #DIV/0!  16
13
14   Mean            26.2   64          163.33      0.49       22.27       #DIV/0!
15   Mode            22
16   Median          22
17   Standard D      9.99   7.21        35.12       0.79       36.64       #DIV/0!
20   Notes: 1 in = 2.54 cm; 2.2 lb. = 1 Kg; BMI = Kg/M2

Figure 4: Excel spreadsheet used to calculate mean, mode, median, standard deviation, and range for three variables measured on BIOL2425 students. This spreadsheet also converts measurement units from English to SI units and then calculates the BMI for each student.

Row  B               C     D           E           F               G           H                J
2    Student Number  Age   Height(in)  Weight(lb)  Height(m)       Weight(Kg)  BMI              Sorted Data
3    1               22    72          200         =D3*2.54/100    =E3/2.2     =G3/(F3*F3)      47
4    2               30    58          130         =D4*2.54/100    =E4/2.2     =G4/(F4*F4)      38
5    3               18    62          160         =D5*2.54/100    =E5/2.2     =G5/(F5*F5)      30
6    4               47                            =D6*2.54/100    =E6/2.2     =G6/(F6*F6)      30
7    5               19                            =D7*2.54/100    =E7/2.2     =G7/(F7*F7)      22
8    6               22                            =D8*2.54/100    =E8/2.2     =G8/(F8*F8)      22
9    7               30                            =D9*2.54/100    =E9/2.2     =G9/(F9*F9)      20
10   8               16                            =D10*2.54/100   =E10/2.2    =G10/(F10*F10)   19
11   9               38                            =D11*2.54/100   =E11/2.2    =G11/(F11*F11)   18
12   10              20                            =D12*2.54/100   =E12/2.2    =G12/(F12*F12)   16

Row 14, Mean (the formulas in columns D through G are cut off by the cell width in the figure):
   C: =SUM(C3:C12)/COUNT(C3:C12)
   D: =SUM(D3:D12)/CO
   E: =SUM(E3:E12)/COU
   F: =SUM(F3:F12)/CO
   G: =SUM(G3:G12)/CO
   H: =SUM(H3:H12)/COUNT(H3:H12)
Row 15, Mode: 22
Row 16, Median: 22
Row 17, Standard Deviation:
   C: =STDEV(C3:C12)   D: =STDEV(D3:D12)   E: =STDEV(E3:E12)
   F: =STDEV(F3:F12)   G: =STDEV(G3:G12)   H: =STDEV(H3:H12)
Row 20, Notes: 1 in = 2.54 cm; 2.2 lb. = 1 Kg; BMI = Kg/M2

Figure 5: The underlying formulas used to perform calculations in the above Excel spreadsheet

The final topic to discuss in terms of applying descriptive statistics and graphing is the reliability of the measurements upon which the data are based. Reliability is affected by both accuracy and precision. Accuracy refers to how close to the actual or true value a measurement is. Precision refers to the repeatability of the measurement.
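One way to make the distinction concrete is to treat accuracy as the distance between the average reading and the true value, and precision as the spread (standard deviation) of repeated readings. The Python sketch below does this for two sets of repeated measurements. All of the numbers, including the "true" value, are invented purely for illustration and are not data from this lab.

```python
from statistics import mean, stdev

TRUE_VALUE = 20.0  # hypothetical true percent body fat

# Hypothetical repeated readings from two measurement methods
method_a = [24.1, 23.9, 24.0, 24.2, 23.8]   # tightly grouped, but off target
method_b = [17.0, 23.5, 19.0, 22.0, 18.5]   # centered on target, but scattered

def accuracy_error(readings, true_value):
    """Distance between the average reading and the true value."""
    return abs(mean(readings) - true_value)

def precision_spread(readings):
    """Spread of repeated readings; smaller means more repeatable."""
    return stdev(readings)

error_a = accuracy_error(method_a, TRUE_VALUE)
error_b = accuracy_error(method_b, TRUE_VALUE)
spread_a = precision_spread(method_a)
spread_b = precision_spread(method_b)

print(round(error_a, 2), round(spread_a, 2))  # method A: precise, not accurate
print(round(error_b, 2), round(spread_b, 2))  # method B: accurate, not precise
```

In this made-up example, method A is precise but not accurate, and method B is accurate on average but not precise. A reliable measurement needs both a small error and a small spread.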
We have chosen to demonstrate the meaning of accuracy and precision by comparing three different methods used to track a person’s progress in a fitness program. One goal of a health-promoting fitness program should be to decrease a person’s percent body fat. Percent body fat is the weight of a person’s fat divided by the person’s total body weight. Obviously, a researcher cannot pull out a person’s fat from the rest of the body to weigh it. Thus, other methods must be used to estimate the fat content of an individual. It is beyond the scope of this lab to discuss all of the methods used to estimate percent body fat; however, we will describe three that are commonly used in fitness centers.

The first method uses body mass index (BMI), which is a ratio of weight to height (see actual formula later in this lab), to estimate percent body fat. This formula was first published in the British Journal of Nutrition in 1991. This method is quick, cheap, and can be very precise; however, it does not differentiate between fit individuals, who would presumably have a high protein:fat ratio, and unfit individuals, who would presumably have a lower protein:fat ratio. Because protein-rich tissue such as muscle weighs more than the same volume of fat, a fit person might have a higher BMI than an unfit person. Thus, BMI is not very accurate when comparing different types of people. It is useful, however, when looking at trends in a population over time. The precision of the BMI calculation depends on the tools used to measure the height and weight and the skill of the person taking the measurements.

The second method used to estimate percent body fat relies on measurements of the thickness of the subcutaneous fat layer. The thickness of this layer is determined by using skinfold calipers and formulas to convert the sum of several skinfold measurements to the percent body fat.
When performed by an experienced examiner who knows which conversion formula to use for the specific individual being measured, this method can be quite accurate; however, the precision is less than desirable. The percent body fat estimated by this method is expressed as the percentage plus or minus 4 percent. Due to the measurements and calculations used, the value cannot be any more precise than about 4 percent. In addition, the accuracy can be low when an untrained person does the measurements.

The third method estimates the percent body fat using the density of the body. Density is the mass of an object divided by its volume. Fat is less dense than the other components of the body; thus, a decrease in body density should indicate an increase in percent body fat. Values based on density measurements can be both very accurate and very precise; however, they are limited by the formulas used to convert density to percent body fat. One very accurate and precise way to determine a person’s density is to find the person’s volume by measuring the amount of water he or she displaces. This entails weighing the person in water. Although quite accurate and precise, the water displacement method is not a very convenient method for a medical practice (or a science lab).

A more convenient way to estimate a person’s density is through bioelectrical impedance analysis (BIA). This analysis relies on the fact that fat, being hydrophobic, conducts electricity more slowly than the rest of the body, which is composed mainly of water. A very small current enters the body at one electrode and leaves the body at a second electrode. The time required for the electric current to travel through the body is then converted into a percent body fat value. The accuracy and precision of this method can be affected by the subject’s hydration level and body temperature, the formula used, and the precision of the measuring device.
Although the actual value obtained may not be completely accurate, if one controls for body temperature and hydration level, any changes in percent body fat can be monitored quite accurately with this method.

Please note: Pregnant women should not use this BIA scale. Also, if you have a pacemaker or other internal electronic medical device, you should not use the body fat reading feature of the BIA scale.

The categories assigned by the World Health Organization to the ranges of BMI and percent body fat are seen in Table 3 and Figure 6. Because the accuracy of percent body fat estimations may be affected by a variety of factors that we will not be controlling for, you should not consider the measurements obtained in this lab as clinical data.

Category                BMI range, kg/m²
Severely underweight    less than 16.0
Underweight             16.0 to 18.5
Normal                  18.5 to 25
Overweight              25 to 30
Obese Class I           30 to 35
Obese Class II          35 to 40
Obese Class III         over 40

Table 3: Categorizations of BMI used by the World Health Organization

Figure 6: WHO percent body fat ranges for standard adults.

One way to assess a person’s physical fitness is to calculate his or her BMI (body mass index). This value can be used to estimate the person’s percent body fat (weight of a person’s fat divided by the person’s total body weight). The formula used to calculate BMI is

BMI = weight (kg) / [height (m)]²

Because last week’s measurements were recorded in inches and pounds, they must be converted to SI units before being used in the above equation. The conversions from English units are as follows:

1 kilogram (kg) = 2.2 pounds (lbs)
1 meter (m) = 39.37 inches (with 12 inches per foot)
2.54 cm = 1 inch
100 cm = 1 meter

Once a person’s BMI is calculated, it can be used to estimate his or her percent body fat.
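The BMI calculation and the Table 3 categories can be combined in a short Python sketch. The function names are illustrative. Table 3 lists shared boundary values (for example, 18.5 ends one range and starts the next), so this sketch assumes that each boundary value belongs to the higher category; that choice is an assumption of the example, not something stated in the table.

```python
def bmi_from_english(weight_lb, height_in):
    """BMI in kg/m^2 from pounds and inches, using the conversions above."""
    weight_kg = weight_lb / 2.2
    height_m = height_in * 2.54 / 100
    return weight_kg / height_m ** 2

def bmi_category(bmi):
    """Table 3 category; boundary values are assigned to the higher category
    (an assumption made for this sketch)."""
    if bmi < 16.0:
        return "Severely underweight"
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Normal"
    if bmi < 30:
        return "Overweight"
    if bmi < 35:
        return "Obese Class I"
    if bmi < 40:
        return "Obese Class II"
    return "Obese Class III"

# Example: a person who is 72 inches tall and weighs 200 pounds
example_bmi = bmi_from_english(200, 72)
print(round(example_bmi, 2), bmi_category(example_bmi))
```

Remember that, as discussed earlier in this lab, BMI does not distinguish between fit and unfit individuals, so a category assigned this way is only a rough screening value.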
Note that the formula to estimate a person’s percent body fat from BMI is different for males and females:

For females: percent body fat = (1.2)(BMI) + (0.23)(age in years) - 5.4
For males: percent body fat = (1.2)(BMI) + (0.23)(age in years) - 16.2

Figure 7: Locations for obtaining various skinfold measurements

Figure 8: Nomogram for calculating percent body fat from the sum of three skinfold measurements
Taken from Baun WD, Baun MR, and Raven PB. A nomogram for the estimate of percent body fat from generalized equations. Research Quarterly for Exercise and Sport, 52:380-384, 1981.

To use this nomogram:
1. Locate the sum of the three skinfold measurements in the right column and mark it.
2. Locate the person’s age in years on the far left column and mark it.
3. Connect the two marks with a ruler or straightedge.
4. Read the percent body fat where the line intersects the middle column.

Table 3 is from: Ogden CL, Fryar CD, Carroll MD, Flegal KM. Mean body weight, height, and body mass index, United States 1960-2002. Advance data from Vital and Health Statistics; No 347. Hyattsville, Maryland: National Center for Health Statistics. 2004.

Figure 6 is from the advertisement for the Tanita percent body fat scale used in this lab.
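As a closing sketch, the BMI-based percent body fat formulas given above for females and males can be written as one small Python function. The function name and the "female"/"male" labels are illustrative, and the BMI and age in the example are hypothetical values chosen only to show the arithmetic.

```python
def percent_body_fat(bmi, age_years, sex):
    """Estimate percent body fat from BMI and age using the handout's formulas."""
    base = 1.2 * bmi + 0.23 * age_years
    if sex == "female":
        return base - 5.4
    if sex == "male":
        return base - 16.2
    raise ValueError("sex must be 'female' or 'male'")

# Hypothetical example: BMI of 27.18 at age 22
female_estimate = percent_body_fat(27.18, 22, "female")
male_estimate = percent_body_fat(27.18, 22, "male")
print(round(female_estimate, 1), round(male_estimate, 1))
```

For the same BMI and age, the two formulas always differ by 10.8 percentage points (16.2 - 5.4). As noted in this lab, estimates like these should not be treated as clinical data.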