Stat 203 assignment #1 - Simon Fraser University




Solution

1. In EmployeeData there are 9 variables (columns). Give the level of the data for each variable (nominal, ordinal, interval).

1) gender: nominal

2) birthdate: interval

3) education: interval (ordinal)

4) job category: nominal (ordinal)

5) Salary: interval

6) Salbegin: interval

7) Jobtime: interval

8) Minority: nominal

Note: for 3) and 4), both answers are reasonable, depending on how the variable is interpreted. Education could be regarded as a grade level, and job category could be interpreted as the rank of the position.

2. For the SeniorsData, create a grouped frequency table and include columns for the grouped relative frequency, the grouped percentage relative frequency and the cumulative grouped relative frequency.

Explain your choice of group boundaries.

|Group |Frequency |Percent |Valid Percent |Cumulative Percent |

|6-7.9 |8 |20.0 |20.0 |20.0 |

|8-9.9 |10 |25.0 |25.0 |45.0 |

|10-11.9 |14 |35.0 |35.0 |80.0 |

|12-13.9 |7 |17.5 |17.5 |97.5 |

|14-15.9 |0 |0.0 |0.0 |97.5 |

|16-17.9 |1 |2.5 |2.5 |100.0 |

|Total |40 |100.0 |100.0 | |

There are 40 observations and the range of the data is 10.3, so we can choose 6 intervals of size 1.9 or 7 intervals of size 1.6. The real lower and upper limits are obtained by subtracting 0.05 from the lower end and adding 0.05 to the upper end of each class interval. Other choices of interval size are also acceptable. What matters is that the table is concise and readable and that the intervals have equal width, so that they represent the distribution well.

Note: the cumulative percentage can never decrease from one group interval to the next.
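The percentage and cumulative percentage columns can be checked from the frequencies alone. The following is a minimal sketch, assuming the frequencies shown in the table for question 2; the function name `relative_frequency_table` is a hypothetical helper, not part of the assignment.

```python
# Frequencies copied from the grouped table for question 2 (N = 40).
groups = ["6-7.9", "8-9.9", "10-11.9", "12-13.9", "14-15.9", "16-17.9"]
freqs = [8, 10, 14, 7, 0, 1]


def relative_frequency_table(labels, counts):
    """Return rows of (label, count, relative freq, percent, cumulative percent)."""
    total = sum(counts)
    rows = []
    running = 0  # cumulative frequency up to and including the current group
    for label, count in zip(labels, counts):
        running += count
        rows.append((label, count, count / total,
                     100 * count / total, 100 * running / total))
    return rows


table = relative_frequency_table(groups, freqs)
for label, count, rel, pct, cum in table:
    print(f"{label:>8} {count:>3} {rel:6.3f} {pct:6.1f} {cum:6.1f}")
```

Because the running total only ever has non-negative counts added to it, the last column cannot decrease, which is the property the note describes.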

3/4. For the previous question, draw the corresponding cumulative grouped frequency histogram, a grouped relative frequency histogram and a grouped percentage relative frequency histogram.

[pic]

[pic]

[pic]

[pic]

6. Repeat questions 2-4 for the "sterntot" variable in Relative Data using a computer. You may use the default group boundaries. That is, let the computer choose the group boundaries.

grouped

|Group |Frequency |Percent |Valid Percent |Cumulative Percent |

|30-39 |3 |1.6 |1.6 |1.6 |

|40-49 |2 |1.1 |1.1 |2.6 |

|50-59 |8 |4.2 |4.2 |6.8 |

|60-69 |32 |16.8 |16.8 |23.7 |

|70-79 |69 |36.3 |36.3 |60.0 |

|80-89 |76 |40.0 |40.0 |100.0 |

|Total |190 |100.0 |100.0 | |

sterntot

|N |Valid |190 |

| |Missing |0 |

|Mean |74.48 |

|Median |76.50 |

[pic]

[pic]

[pic]

7. In a psychological experiment, the time on task was recorded for ten subjects under a 5-minute time constraint. The measurements are in seconds: 175 190 250 230 240 200 185 190 225 265

(a) Find the mean time on task.

Mean: $\bar{x} = \frac{\sum x}{N} = \frac{2150}{10} = 215$ seconds

(b) Find the median time on task.

Order the data:

175 185 190 190 200 225 230 240 250 265

Position of median: $\frac{N+1}{2} = \frac{10+1}{2} = 5.5$

Median $= \frac{200 + 225}{2} = 212.5$ seconds
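The calculations in parts (a) and (b) can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Time on task in seconds for the ten subjects in question 7.
times = [175, 190, 250, 230, 240, 200, 185, 190, 225, 265]

ordered = sorted(times)
position = (len(times) + 1) / 2  # position of the median in the ordered data

mean_time = mean(times)
median_time = median(times)  # averages the two middle values when N is even

print("ordered data:", ordered)
print("mean:", mean_time)
print("median position:", position)
print("median:", median_time)
```

Since the position falls halfway between the 5th and 6th ordered values, the median is the average of those two values rather than an observed value.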

(c) If you were writing a report to describe these data, which measure of central tendency would you use? Explain.

There do not seem to be any extreme values in the data set, so the mean is the best measure of central tendency. It is the average of all the values, so it uses all the information in the data. The median, by contrast, only represents the value at the middle position of the ordered data.

8. In an article entitled "You Aren't Paranoid if You Think Someone Eyes Your Every Move" (1985), the Wall Street Journal noted that big business collects detailed statistics on your behaviour. Jockey International knows how many undershorts you own; Frito-Lay, Inc. knows which you eat first: the broken pretzels in a pack or the whole ones; and, to get even more specific, Coca-Cola knows you put 3.2 ice cubes in a glass. Have you ever put 3.2 ice cubes in a glass? What did the Wall Street Journal article mean by that statement?

The journal was reporting that, on average, 3.2 ice cubes are put in a glass of Coke. This does not imply that anyone puts 3 ice cubes plus 0.2 of an ice cube in a glass. The article is simply reporting a measure of central tendency, which serves as a guide to what is most likely to happen.

9. Calculate the mean, $\bar{x}$, and the median for the SeniorsData. Comment on the skewness of the data.

Mean[pic]

Position of median: $\frac{N+1}{2} = \frac{40+1}{2} = 20.5$

Median=[pic]

The data are fairly evenly distributed, with a slight skew to the right due to one outlier (17.2) relative to the rest of the data. However, this value is not extremely large, so it does not pull the mean up to a large degree.

Note: the position of the median is not the value of the median.

10. Obtain the mean and median for the grouped frequency histogram given in question # 6. Comment on the shape of the distribution. Which measure would you use?

|Group |Frequency (f) |m |fm |cf |

|30-39 |3 |34.5 |103.5 |3 |

|40-49 |2 |44.5 |89 |5 |

|50-59 |8 |54.5 |436 |13 |

|60-69 |32 |64.5 |2064 |45 |

|70-79 |69 |74.5 |5140.5 |114 |

|80-89 |76 |84.5 |6422 |190 |

Mean from the frequency table:

$\bar{x} = \frac{\sum fm}{N} = \frac{14255}{190} = 75.0263$

$\bar{x}$=mean

m=midpoint of the class interval

f=frequency of a class interval

N=total number of scores

Median from the frequency table:

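Median $= L + \frac{N/2 - cf_b}{f} \times i$

$L = 69.5$, $N = 190$, $cf_b = 45$, $f = 69$, $i = 10$

Median $= 69.5 + \frac{95 - 45}{69} \times 10 = 76.746$

N=number of cases in the distribution (190)

$cf_b$=cumulative frequency below the lower limit of the critical interval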

L=lower limit of the critical interval

f=frequency within the critical interval

i=class-interval size

Hint: We look for the 190/2=95th case. Moving up from the lowest interval, we see that 45 cases are below 69.5 and 114 cases are below 79.5. Thus, the middle case must lie in the 70-79 interval. This is the critical interval. The lower limit of the critical interval is therefore 69.5, and the cumulative frequency below 69.5 is 45. The frequency within the critical interval 70-79 is 69.

Note: we can also use the approach of question 9 to find the mean and median for this problem.
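The grouped mean and grouped median can also be reproduced from the class intervals and frequencies. The sketch below assumes scores are recorded as whole numbers, so the real limits lie 0.5 outside the stated limits; `grouped_mean`, `grouped_median` and the `gap` parameter are hypothetical names introduced for illustration.

```python
# (stated lower limit, stated upper limit, frequency) from the table in question 10.
classes = [(30, 39, 3), (40, 49, 2), (50, 59, 8),
           (60, 69, 32), (70, 79, 69), (80, 89, 76)]


def grouped_mean(classes):
    """Sum of f * m over N, where m is the midpoint of each class interval."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n


def grouped_median(classes, gap=1):
    """L + ((N/2 - cfb) / f) * i, interpolating within the critical interval.

    gap is the assumed distance between one stated upper limit and the next
    stated lower limit (1 for whole-number scores).
    """
    n = sum(f for _, _, f in classes)
    half = n / 2
    cum_below = 0  # cfb: cumulative frequency below the current interval
    for lo, hi, f in classes:
        if f > 0 and cum_below + f >= half:
            lower_real = lo - gap / 2   # L, e.g. 69.5 for the 70-79 class
            width = hi - lo + gap       # i, e.g. 10
            return lower_real + (half - cum_below) / f * width
        cum_below += f
    raise ValueError("frequency table has no observations")


print(round(grouped_mean(classes), 4))
print(round(grouped_median(classes), 3))
```

The loop stops at the first interval whose cumulative frequency reaches N/2, which is the critical interval described in the hint.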
