ExcelTM and Statistical Analysis



Excel and Statistical Analysis

Jonathan Hill

Florida International University

In this document, basic principles for analyzing data with Microsoft’s Excel software will be detailed. Many assignments with require the information provided in the document. It is assumed that you know how to boot-up Excel from your personal computer’s Start Menu, as well as open existing Excel files and save files.

CONTENTS

Topic Page

1. Data Arrangement: Rows, Columns, Observations 2

1.1 Rows and Columns 2

1.2 Observations 2

2. Creating News Variables as Functions of Existing Variables 3

1. Creating One New Variable for One Observation 3

2. Creating One New Variable for All Observations 4

3. Examples

3. Statistical Analysis: Mean, Variance, Standard Deviation 5

3.1 Sample Mean, Mode and Median 5

3.2 Sample Variance and Standard Deviation 7

3.3 Descriptive Statistics: Tool Package 8

4. Relative Frequency and Histograms 10

4.1 Frequency 10

4.2 Relative Frequency 11

4.3 Histogram 12

5. Probability Distribution Functions 14

5.1 Discreet Distribution Functions 14

Binomial and Poisson

5.2 Continuous Distribution Functions 17

Normal and Student's t

6. Bivariate Analysis: Sample Covariance and Correlation 20

6.1 Sample Covariance 20

6.2 Sample Correlation 21

7. Interval Estimation 23

8. Graphics: Charts, Tables and Figures 25

8.1 Scatter Plots 25

8.2 Time-Series: Line Plots 27

8.3 Editing Graphs and Figures 29

1. Data Arrangement: Rows, Columns, Observations

1.1 Rows and Columns

When you boot-up Excel, a spread sheet of “cells” appears with column labels in letters (e.g. A, B, etc..) and row labels in Arabic numerals (e.g. 1, 2, etc..). For example, the upper-left corned of the spread-sheet will appear as follows:

| |A |B |C |

|1 | | | |

|2 | | | |

|3 | | | |

|4 | | | |

A vertical stack of cells is a “column”. For example, column “A” is the stack

|A |

| |

| |

| |

| |

A horizontal layer of cells is a “row”. For example, the 3rd row is the layer

| |A |B |C |

|1 | | | |

|2 | | | |

|3 | | | |

|4 | | | |

1.2 Observations [Individual People/Time Period in a Sample of Data]

Your data assignments will include Excel data files. For example, consider data on work-hours, hourly wage, education and age of 4 people:

| |A |B |C |D |

|1 |hours |wage |ed |age |

|2 |34 |7.5 |12 |20 |

|3 |50 |20.8 |18 |37 |

|4 |40 |5.75 |16 |50 |

|5 |30 |12.90 |14 |42 |

The data files will always contain column headers with variable names. The data is read as follows. Clearly, the columns contain individual variables. For example, columns C contains information on the years of education the 4 people have. Moreover, in the 2nd row is the first person in our sample of data. This person works 34 hours/week, earns $7.50/hour, has 12 years of education and is 20 years old. In this manner, the rows contain information for one person. We call one such set of information for one person an “observation”. Thus, the fourth observation comprises 30 work hours, a wage of $12.90, 14 years of education, and an age of 42. Our sample, therefore, has n = 4 observations.

Similarly, we could have data that changes over time, like GNP, the CPI and inflation for 1988, 1989 and 1990

| |A |B |C |D |

|1 |YEAR |GNP |CPI |INFL |

|2 |1988 | | | |

|3 |1989 | | | |

|4 |1990 | | | |

Thus, one year, say 1988, and its GNP, CPI and inflation rate, represents one observation.

2. Creating News Variables as Functions of Existing Variables

2.1 Creating One New Variable for One Observation

Often, we want to create new columns of data based on the existing data in our data file. For example, if our file contains information on one firm’s revenue [REV] and cost-per-unit [COST] of production for 4 separate years (1998, 1999, 2000, 2001), we can use an empty column to create a new profit variable, PROFIT= REV – COST.

| |A |B |C |D |

|1 |YEAR |REV |COST | |

|2 | 1998 |100 |87 | |

|3 | 1999 |76 |75 | |

|4 | 2000 |490 |500 | |

|5 | 2001 |80 |40 | |

2.2.1 First, we want to create a new column label, in column D, for the PROFIT variable. Simply place the cursor over the cell in row 1, column D (denoted as cell "D1"), click, and type “PROFIT”:

| |A |B |C |D |

|1 |YEAR |REV |COST |PROFIT |

|2 | 1998 |100 |87 | |

|3 | 1999 |76 |75 | |

|4 | 2000 |490 |500 | |

|5 | 2001 |80 |40 | |

2.2.2 Next, we want to use the data in the 2nd row (the first year in our sample) to create the new profit variable for the year 1998. Point and click-on column D, row 2 (i.e. cell D2), just below the new label, PROFIT. We will use Excel’s ability to perform mathematical operations. Type “=”, then type in the cell coordinates containing the first year’s revenues, “B2” (always letters/columns first), then type “-”, then the coordinates containing cost, “C2”:

| |A |B |C |D |

|1 |YEAR |REV |COST |PROFIT |

|2 | 1998 |100 |87 |=B2-C2 |

|3 | 1999 |76 |75 | |

|4 | 2000 |490 |500 | |

|5 | 2001 |80 |40 | |

Finally, hit Enter. Excel will perform the operation for you:

| |A |B |C |D |

|1 |YEAR |REV |COST |PROFIT |

|2 | 1998 |100 |87 |13 |

|3 | 1999 |76 |75 | |

|4 | 2000 |490 |500 | |

|5 | 2001 |80 |40 | |

2.2 Creating One New Variable for All Observations

Excel beautifully allows us to perform the same operation for all other years in the sample simply by clicking and dragging the cursor. Click-on the cell containing the first year’s profit. The cell will be surrounded by black with a small box in the lower-right corner:

| |A |B |C |D |

|1 |YEAR |REV |COST |PROFIT |

|2 | 1998 |100 |87 |13 |

|3 | 1999 |76 |75 | |

|4 | 2000 |490 |500 | |

|5 | 2001 |80 |40 | |

Point the cursor over small blacken box in the lower right corner of the cell: Excel will change the shape of the cursor to a small cross. Then, click, hold, and drag down: Excel will automatically perform the same arithmetic operation for the first cell in all subsequent cells. Once you reach the 5th row (i.e. cell D5), stop dragging, and release the mouse-button. The result is

| |A |B |C |D |

|1 |YEAR |REV |COST |PROFIT |

|2 | 1998 |100 |87 |13 |

|3 | 1999 |76 |75 |1 |

|4 | 2000 |490 |500 |-10 |

|5 | 2001 |80 |40 |40 |

2.3 Examples

We want to the use labor data from above to create a yearly wage income variable. Mathematically, yearly wage income is

[pic] (thus, the units are $'s)

| |A |B |C |D |E |

|1 |hours |wage |ed |age | |

|2 |34 |7.5 |12 |20 | |

|3 |50 |20.8 |18 |37 | |

|4 |40 |5.75 |16 |50 | |

|5 |30 |12.90 |14 |42 | |

In column E, row 1 (cell E1), type “income_wage”, or whatever variable name you prefer, as long as it logically describes the content of the data:

| |A |B |C |D |E |

|1 |hours |wage |ed |age |income_wage |

|2 |34 |7.5 |12 |20 | |

|3 |50 |20.8 |18 |37 | |

|4 |40 |5.75 |16 |50 | |

|5 |30 |12.90 |14 |42 | |

Click-on column E, row 2 (cell E2), type “=A2*B2*52”, then enter. Notice that multiplication is performed with the asterisk “*”. The result is a yearly wage income of $13260/year:

| |A |B |C |D |E |

|1 |hours |wage |ed |age |income_wage |

|2 |34 |7.5 |12 |20 |13260 |

|3 |50 |20.8 |18 |37 | |

|4 |40 |5.75 |16 |50 | |

|5 |30 |12.90 |14 |42 | |

Place the cursor above the small black box in the lower right corned of E2, click, hold, and drag down to the 5th row (cell E5). The outcome is

| |A |B |C |D |E |

|1 |hours |wage |ed |age |income_wage |

|2 |34 |7.5 |12 |20 |13260 |

|3 |50 |20.8 |18 |37 |54080 |

|4 |40 |5.75 |16 |50 |11960 |

|5 |30 |12.90 |14 |42 |20124 |

3. Statistical Analysis: Mean, Variance, Standard Deviation

In the present sub-section, we will study how to use Excel for basic statistical analysis of data. The theory required for the subsequent discourse is contained in Newbold, Chapter 2.

The data that we inspect in any situation or environment is necessarily a sample taken from a larger population: we cannot (typically) have the entire population of data. Thus, all data analysis is of the sample statistic form.

3.1 Sample Mean, Mode and Median

Consider the sample of data on work-hours, wage, education, age, and newly created variable, yearly wage income = income_wage:

| |A |B |C |D |E |F |

|1 | |hours |wage |ed |age |income_wage |

|2 | |34 |7.5 |12 |20 |13260 |

|3 | |50 |20.8 |18 |37 |54080 |

|4 | |40 |5.75 |16 |50 |11960 |

|5 | |30 |12.90 |14 |42 |20124 |

|6 | | | | | | |

|7 | | | | | | |

For ease of reference, we can use the 7th row to contain a label for the "mean", as well as the sample means of each variable. Thus, type in cell A7 the title "MEAN".

| |A |B |C |D |E |F |

|1 | |hours |wage |ed |age |income_wage |

|2 | |34 |7.5 |12 |20 |13260 |

|3 | |50 |20.8 |18 |37 |54080 |

|4 | |40 |5.75 |16 |50 |11960 |

|5 | |30 |12.90 |14 |42 |20124 |

|6 | | | | | | |

|7 |MEAN | | | | | |

3.1.1 Sample Mean for one Variable (e.g. work-hours)

For the sample mean (i.e. average) of work-hours, click-on cell B7, and type the command

=average(B2:B5)

| |A |B |C |D |E |F |

|1 | |hours |wage |ed |age |income_wage |

|2 | |34 |7.5 |12 |20 |13260 |

|3 | |50 |20.8 |18 |37 |54080 |

|4 | |40 |5.75 |16 |50 |11960 |

|5 | |30 |12.90 |14 |42 |20124 |

|6 | | | | | | |

|7 |MEAN |=average(B2:B5) | | | | |

This command will tell Excel to derive the sample mean of the numbers contained in cells B2 - through - B5, which identically represents the work hours data. Once you type the above command in, hit ENTER, and Excel will fill the cell with the sample average:

| |A |B |C |D |E |F |

|1 | |hours |wage |ed |age |income_wage |

|2 | |34 |7.5 |12 |20 |13260 |

|3 | |50 |20.8 |18 |37 |54080 |

|4 | |40 |5.75 |16 |50 |11960 |

|5 | |30 |12.90 |14 |42 |20124 |

|6 | | | | | | |

|7 |MEAN | 38.5 | | | | |

3.1.2 Sample Mean for All Variables

Recall the point, click and drag method for easily creating observations of a newly created variable. We will employ the same technique. Click-on cell B7, in which the sample mean of "hours" is contained. The cell will be highlighted with black around the edges, and a small box in the lower right corner:

| |A |B |C |D |E |F |

|1 | |hours |wage |ed |age |income_wage |

|2 | |34 |7.5 |12 |20 |13260 |

|3 | |50 |20.8 |18 |37 |54080 |

|4 | |40 |5.75 |16 |50 |11960 |

|5 | |30 |12.90 |14 |42 |20124 |

|6 | | | | | | |

|7 |MEAN | 38.5 | | | | |

Point the cursor over the small black box until it changes into a small black cross, click and drag to the right over cells C7 - F7. Excel will not copy the numerical value "38.5" into the subsequent cells. Rather, Excel will copy the formula for the average in those cells, updating the column of data used as you move right. The result is sample averages for all variables:

| |A |B |C |D |E |F |

|1 | |hours |wage |ed |age |income_wage |

|2 | |34 |7.5 |12 |20 |13260 |

|3 | |50 |20.8 |18 |37 |54080 |

|4 | |40 |5.75 |16 |50 |11960 |

|5 | |30 |12.90 |14 |42 |20124 |

|6 | | | | | | |

|7 |MEAN | 38.5 |11.7375 |15 |37.25 |24856 |

3.1.3 Formatting Cells for Decimal Places

The above results are somewhat messy in the sense that the sample mean of hours has one decimal place, the sample mean of wages has 4 decimal places, etc. We would like to standardize the representation by displaying all numbers with only two decimal places. Click-on cell B7. Place the cursor above the center of the cell, and NOT the small box in the corner. Click and drag over cells B7 - F7. The cells with become highlighted.

| |A |B |C |D |E |F |

|1 | |hours |wage |ed |age |income_wage |

|2 | |34 |7.5 |12 |20 |13260 |

|3 | |50 |20.8 |18 |37 |54080 |

|4 | |40 |5.75 |16 |50 |11960 |

|5 | |30 |12.90 |14 |42 |20124 |

|6 | | | | | | |

|7 |MEAN | 38.5 |11.7375 |15 |37.25 |24856 |

Go to the task-bar on the top of the screen, click-on FORMAT, CELL, NUMBER. Once done, Excel will have displayed a pop-up box with a choice of the number of decimal places. Choose "2". Click-on "ok" on the bottom of the pop-up box.

| |A |B |C |D |E |F |

|1 | |hours |wage |ed |age |income_wage |

|2 | |34 |7.5 |12 |20 |13260 |

|3 | |50 |20.8 |18 |37 |54080 |

|4 | |40 |5.75 |16 |50 |11960 |

|5 | |30 |12.90 |14 |42 |20124 |

|6 | | | | | | |

|7 |MEAN | 38.50 |11.74 |15.00 |37.25 |24856.00 |

Excel will round values in the case when the true numerical value has more than the prescribed number of decimal places (in this care, two).

3.1.4 Sample Mode and Median

For sample modes and medians, use the command “=mode(…)”and “=median(…)”. As a separate example, consider two columns of data labeled simply X and Y. Using the methods describe above, we can calculate the samples of the two variables:

| |A |B |C |

|1 | |X |Y |

|2 | |1 |2 |

|3 | |1 |7 |

|4 | |0 |0 |

|5 | |-1 |0 |

|6 | | | |

|7 |MEAN |.25 |2.25 |

|8 | | | |

|9 | | | |

For clarity, type “MODE” and “MEDIAN” in cells A8 and A9, respectively. In cell B8, type

=mode(B2:B5)

Then, hit ENTER. The mode of variable X is 1.00, the most frequently occurring value in the sample of 4 observations. For the sake of speed, click on cell B8, place the cursor above the small black box in the lower right corner, click, hold, and drag over cell C8. The mode of variable Y is 0.00. Formatting the cells so numerical values have 2 decimal places, we now have

| |A |B |C |

|1 | |X |Y |

|2 | |1 |2 |

|3 | |1 |7 |

|4 | |0 |0 |

|5 | |-1 |0 |

|6 | | | |

|7 |MEAN |.25 |2.25 |

|8 |MODE |1.00 |0.00 |

|9 |MEDIAN | | |

Next, in cell B9, type “=median(B2:B5)”, then ENTER. Click and drag for variable Y. The result, after formatting, is

| |A |B |C |

|1 | |X |Y |

|2 | |1 |2 |

|3 | |1 |7 |

|4 | |0 |0 |

|5 | |-1 |0 |

|6 | | | |

|7 |MEAN |.25 |2.25 |

|8 |MODE |1.00 |0.00 |

|9 |MEDIAN |.50 |1.00 |

3.2 Sample Variance and Standard Deviation

Any subsequent statistical operation in Excel requires only knowledge of the proper command. All further steps are identical to those outlined for the sample mean.

For the sample variance, type "VARIANCE" in cell A8, and in cell B8 use the command

=var(B2:B5)

Then type ENTER, and click-and-drag over the remaining cells, C8 - F8.

For the sample standard deviation, type "STAND. DEV." in cell A9, then type in cell B9

=stdev(B2:B5)

Then type ENTER, and click-and-drag over the remaining cells, C9 - F9. After the numerical values in the cells have been standardized to two decimal places, we obtain

| |A |B |C |D |E |F |

|1 | |hours |wage |ed |age |income_wage |

|2 | |34 |7.5 |12 |20 |13260 |

|3 | |50 |20.8 |18 |37 |54080 |

|4 | |40 |5.75 |16 |50 |11960 |

|5 | |30 |12.90 |14 |42 |20124 |

|6 | | | | | | |

|7 |MEAN | 38.50 |11.74 |15.00 |37.25 |24856.00 |

|8 |VARIANCE |75.67 |45.76 |6.67 |160.92 |392402677.33 |

|9 |STAND. DEV. |8.70 |6.76 |2.58 |12.69 |19809.16 |

3.3 “Descriptive Statistics” Excel Package

3.3.1 Deriving All Descriptive Statistics

Excel provides a very simple mechanism for obtaining an array of sample descriptive statistics of your data. Consider the spread-sheet

| |A |B |C |D |E |F |

|1 | |hours |wage |ed |age |income_wage |

|2 | |34 |7.5 |12 |20 |13260 |

|3 | |50 |20.8 |18 |37 |54080 |

|4 | |40 |5.75 |16 |50 |11960 |

|5 | |30 |12.90 |14 |42 |20124 |

In the main Excel tool-bar, click-on TOOLS, DATA ANALYSIS, DESCRIPTIVE STATISTCS, then OK.

In the area to the right of Input Range, type in the cell range in which your data is stored. In this case, we type

B1: F5

Notice that we include the 1st row with the variable labels. Click-on Labels in First Row. Click-on Summary Statistics for standard sample statistics. Click-on Confidence Level for Mean in order to generate the CI-length. Excel sends the output to another “sheet” which can be accessed by clicking-on the sheet with the highest number at the bottom of the screen (e.g. Sheet 4). The result is

|hours |  |wage |  |ed |

|1 |divorces | | | |

|2 |0 | | | |

|3 |0 | | | |

|4 |1 | | | |

|5 |4 | | | |

|6 |2 | | | |

|7 |1 | | | |

|8 |1 | | | |

|9 |0 | | | |

|10 |0 | | | |

|11 |2 | | | |

4.1 Frequency

We would like to derive the frequency for each divorce value. Recall that we define frequency as the number of times a value occurs in a sample. Notice that the only values that appear in the sample are (0,1,2,4), thus the relative frequencies of values 3,5,6,… will be zero.

In cell B2, type FREQUENCY. In column D, we need to create the possible values which we want to include for the frequencies: unfortunately, Excel does not understand that we simply want the values between, say 0 and 5; rather, we need to type the numerical values to be counted in a separate column. Create the following:

| |A |B |C |D |

|1 |divorces |FREQUENCY | | |

|2 |0 | | |0 |

|3 |0 | | |1 |

|4 |1 | | |2 |

|5 |4 | | |3 |

|6 |2 | | |4 |

|7 |1 | | |5 |

|8 |1 | | | |

|9 |0 | | | |

|10 |0 | | | |

|11 |2 | | | |

Now, click on cell B2, hold the mouse-key down, and drag down to cell B7. This will highlight cells B2 - B7, exactly parallel to cells D2-D7, which contain the possible values we want to count in our sample. Next, in the task-bar at the top of the screen, click on the fx key: this brings up a pop-up box within which we can employ of Excel's statistical and mathematical formulas. On the left-side of the pop-up box, click-on "statistical", on the right-side, scroll down to "frequency", and click. A frequency pop-up box appears. In the Data-Array, type the cells in which the column of data is contained, in this case A2:A11 (be sure to use a colon between cell values). In the Bins-Array, type the cells which contain the possible values we want to count, in this case cells D2:D7. Again, be sure to use a colon. DO NOT TYPE ENTER. In order to tell Excel that you want this operation to occur for all possible values in D2-D7, relative to the sample of data A2-A11, we need to hit simultaneously CTRL and SHIFT and ENTER. Once done, Excel will display the number of times each value in cells D2-D7 occurs in the sample contained in A2-A11.

In summary:

i. Highlight cells

| |A |B |C |D |

|1 |divorces |FREQUENCY | | |

|2 |0 | | |0 |

|3 |0 | | |1 |

|4 |1 | | |2 |

|5 |4 | | |3 |

|6 |2 | | |4 |

|7 |1 | | |5 |

|8 |1 | | | |

|9 |0 | | | |

|10 |0 | | | |

|11 |2 | | | |

ii. Click on the fx key, click on statistical, then frequency. Type in A2:A11 in the Data-Array, and type in D2:D7 in the Bins-Array.

iii. Hit simultaneously CTRL and SHIFT and ENTER. The result:

| |A |B |C |D |

|1 |divorces |FREQUENCY | | |

|2 |0 |4 | |0 |

|3 |0 |3 | |1 |

|4 |1 |2 | |2 |

|5 |4 |0 | |3 |

|6 |2 |1 | |4 |

|7 |1 |0 | |5 |

|8 |1 | | | |

|9 |0 | | | |

|10 |0 | | | |

|11 |2 | | | |

Thus, the value 0 occurs 4 times, 1 occurs 2 times, 2 occurs twice, 3 never occurs, 4 occurs onces, zero 5's, and of course, zero 6's, etc.

4.2 Relative Frequency

Recall that the relative frequency is the frequency relative to the sample size. Thus, for example, the relative frequency of the value of 0 divorces represents the percent of the sample which has this value of divorces. In this case, the relative frequency of 0 is 4/10 = .4 0, or 40% of the people in our sample have a value of 0 divorces. To derive the relative frequency, simply use column C, and perform a basic new-variable derivation, as detailed in Section 1.

Type "REL. FREQ." in cell C1. Click on cell C2, type

=B2/10

then hit ENTER. Excel will derive 4/10 for you, and present .40. To derive the rest of the relative frequencies, click on cell C2, hold the cursor over the small black box in the lower right corner, click, hold and drag down over cells C3 - C7. The result, after formatting the numerical values to have 2 decimal places, is

| |A |B |C |D |

|1 |divorces |FREQUENCY |REL. FREQ. | |

|2 |0 |4 |0.40 |0 |

|3 |0 |3 |0.30 |1 |

|4 |1 |2 |0.20 |2 |

|5 |4 |0 |0.00 |3 |

|6 |2 |1 |0.10 |4 |

|7 |1 |0 |0.00 |5 |

|8 |1 | | | |

|9 |0 | | | |

|10 |0 | | | |

|11 |2 | | | |

Clearly, the relative frequencies must sum to 1.00 (one).

4.3 Histogram

A simple method of graphically representing an array of relative frequencies if the histogram. We will use Excel's ability to create graphs to build a histogram: further details on graphs and figures is presented in Section 8.

4.3.1 Create the Rough Figure Shape

First, highlight the cells that contain the relative frequency information, as well as the label "REL. FREQ.":

| |A |B |C |D |

|1 |divorces |FREQUENCY |REL. FREQ. | |

|2 |0 |4 |0.40 |0 |

|3 |0 |3 |0.30 |1 |

|4 |1 |2 |0.20 |2 |

|5 |4 |0 |0.00 |3 |

|6 |2 |1 |0.10 |4 |

|7 |1 |0 |0.00 |5 |

|8 |1 | | | |

|9 |0 | | | |

|10 |0 | | | |

|11 |2 | | | |

Next, go to the task bar at the top of the screen, click on the little colored bar-chart that is two buttons to the left of the screen percent size [e.g. 100%]. A pop-up box appears. Click on "Standard Types", then "Column". Then, click on "Next": Excel will show you the basic set-up of the resulting chart. Click on "Next" again: a pop-up box allowing for substantial alterations of the basic outlook of the bar-chart appears. Type in the Title as "Relative Frequency of Divorces", type is Category X as "Number of Divorces", and type in Category Y as "f(x)/n". If you want to remove the grid-lines, click-on Gridlines, and click-on any box with a check-mark in it: this will remove the mark, and remove the associated grid-lines. Because the legend (the graph box that describes the variable) is redundant, click-on Legend, then click off the mark on "Show Legend". Finally, click on Finish at the bottom of the pop-up box. The result is the following histogram:

4.3.2 Refine the X-axis

Observe that unless we tell Excel otherwise, it will automatically display X-axis values starting at 1: For our histogram purposes, however, this is in error: the smallest number of divorces possible is, of course, zero.

In order to specify to Excel the correct X-axis array of values, in our case 0 – 5, perform the following task. First, recall that column D contains the possible values, 0 – 5. Now, point the cursor over the white-area of the chart, and click once: this highlight the chart with small black dots around the border. Go to the task-bar at the top of the screen, click on CHART, SOURCE DATA, then SERIES. At the bottom of the pop-up box, there is a white-area next to “Category (X) Axis”. Click once in this area: this places Excel’s cursor in the box. Now, move the mouse cursor over the cell that contains the first possible value, 0: in our case, the cell is D2. Click once and hold, dragging the cursor down over all cells that contain possible, cells D2 – D7. Once you reach cell D7, let go of the mouse button: Excel will automatically fill in the cell range D2 – D7 into the white area to the right of “Category (X) Axis”. Once this is done, click on OK at the bottom of the pop-up box.

The result is as follows:

See Section 8 for further information about graph/chart/figure formatting.

5. Probability Distribution Functions

Excel provides a simple means for deriving the probability distribution functions or cumulative distribution functions for a variety of discreet and continuous random variables. In this section, we will study Excel’s built in mathematical operations relating to the pdf’s of the binomial, the poission, normal and Student’s t distributions.

5.1 Discreet Distribution Functions

5.1.1 Binomial Distribution

Consider the following set-up. We have 5 people drawn from a population in which p = 35% are smokers. We want to tabulate the probability distribution function of the number of smokers X in our group of n = 5 people. The easiest way to derive the probability distribution function is by first creating a column containing all possible values of X, in this case (0,1,2,3,4,5):

| |A |B |C |

|1 |X | | |

|2 |0 | | |

|3 |1 | | |

|4 |2 | | |

|5 |3 | | |

|6 |4 | | |

|7 |5 | | |

Now, in cell B1 type P(X). In cell B2, type the command

=binomdist(A2, 5, .35, false)

Then hit ENTER. The command “false” indicates that we want a probability distribution function, and NOT a cumulative distribution function. The result is

| |A |B |C |

|1 |x |PX(X=x) | |

|2 |0 |0.116029 | |

|3 |1 |0.312386 | |

|4 |2 |0.336416 | |

|5 |3 |0.181147 | |

|6 |4 |0.048770 | |

|7 |5 |0.005252 | |

Consider, next, deriving the cumulative distribution function, F(X < x). Type FXX < x) in cell C1, click on cell C2, and type the command

=binomdist(A2, 5, .35, true)

Next, click on cell C2, the hold the cursor above the small black box, click and drag over cells C3-C7. The result, after formatting the cells to show 4 decimal places, is the complete cdf for x = (0,1,2,3,4,5):

| |A |B |C |

|1 |x |PX(X=x) |FX( x) |

|2 |0 |0.1160 |0.1160 |

|3 |1 |0.3124 |0.4284 |

|4 |2 |0.3364 |0.7648 |

|5 |3 |0.1811 |0.9460 |

|6 |4 |0.0488 |0.9947 |

|7 |5 |0.0053 |1.0000 |

Graphically, we can employ the techniques discussed in Sections 4 and 8 for creating a bar-chart of the pdf and cdf. Consider the pdf contained in cells C2-C7. Highlight the cells C2-C7, and create a bar-chart using said techniques. Remember to change the X-axis values: our random variable’s possible values start at 0, and the cells that contain the possible values, in our case, are A2-A7. The result is:

5.1.2 Poisson

The poisson distribution is helpful for modeling the probability of discreet occurrences of some event, x. In particular, the random variable must be zero or positive and discreet. For example, the number of times a volcano erupts in a year can be assumed to be a poisson random variable.

Recall the probability distribution function of a poisson random variable of x occurrences:

[pic]

where it can be shown the parameter ( satisfies

[pic]

Excel provides a means for deriving P(x) based on x and the mean, (. In this case, you can the values of x and ( contained in cells, or you can simply use Excel to derive a probability by typing the values in directly. Consider the latter quick method, and consider a blank grid spread:

| |A |B |

|1 | | |

|2 | | |

|3 | | |

|4 | | |

Suppose you want to derive P(x = 5) for a poisson random variable x with mean ( = 2. Click on any cell, say A1, and type

=poisson(5,2,false)

where "false" dictates that we want the probability at the point x = 5, and NOT the cumulative distribution up to x = 5. Hit ENTER, and Excel displays the odds of such an event:

| |A |B |

|1 |.036089 | |

|2 | | |

|3 | | |

|4 | | |

If you have an array of values x from the same poisson distribution (i.e. with the same mean and variance, (), then type in cell values instead of numerical values. For example, consider the following array of x's from a poisson distribution where ( = 2:

| |A |B |

|1 |x | |

|2 |0 | |

|3 |1 | |

|4 |2 | |

|5 |3 | |

Click on cell B2, and type the command

=poisson(A2,2,false)

Then hit ENTER. Subsequently, point and click on cell B2, aim the cursor above the small black box, click and drag down over the remaining cells, B3-B5. The result is P(x) for x = 0,1,2,…,9:

| |A |B |

|1 |x | |

|2 |0 |0.135335283 |

|3 |1 |0.270670566 |

|4 |2 |0.270670566 |

|5 |3 |0.180447044 |

|6 |4 |0.090223522 |

|7 |5 |0.036089409 |

|8 |6 |0.012029803 |

|9 |7 |0.003437087 |

|10 |8 |0.000859272 |

|11 |9 |0.000190949 |

We can use the graphical methods described in Section 4 on histograms to create a figure for the above probability distribution. Highlight the cells containing the probabilities, B2-B11, and using the methods described in Section 4 (for the title, and x-axis and y-axis labels), create a bar graph. We obtain:

Intuitively, for a poisson random variable x with a mean of 2, there will be great odds that the actual value of x will be near 1 (i.e. 0,1,2,3,4). Hence, the pdf is stacked near 2 (high probability values at 0,1,2,3,4).

For a cumulative distribution value, simply use "true" instead of "false":

=poisson(A2,2,true)

The result is:

| |A |B |

|1 |x | |

|2 |0 |0.135335 |

|3 |1 |0.406006 |

|4 |2 |0.676676 |

|5 |3 |0.857123 |

|6 |4 |0.947347 |

|7 |5 |0.983436 |

|8 |6 |0.995466 |

|9 |7 |0.998903 |

|10 |8 |0.999763 |

|11 |9 |0.999954 |

A simple bar-chart plot of the CDF, above, follows:

5.2 Continuous Distribution Functions

For all subsequent probability derivations, the only changes from the above methods in the particular command required. We can derive continuous probabilities P(x ( a) for normally or t-distributed random variables x by typing in the value a directly, or by using cells that contain various a's.

5.2.1 Normal Distribution

5.2.1.1 Probabilities from any Normal Distribution

Let

[pic]

and suppose we want to derive

[pic]

For example, let ( = 2 ,(2 = 16 and a = 3. For direct computation, directly type in any cell

=normdist(3,2,16,true)

where "true" denotes that we want the cumulative distribution up to the value a. The result is .524918.

Using the same procedure as with Binomial or Poisson random variables, we can instead use cell values that contain a rather than directly typing in a.

5.2.1.2 Probabilities from The Standard Normal

Let

[pic]

and suppose we want to derive

[pic]

For example, let a = 1.23. For direct computation, directly type in any cell

=normsdist(1.23)

The result is .89065.

5.2.1.3 Cut-off Values of the Standard Normal

Suppose instead of a probability, for given a, we want some a for a given probability p. For example, suppose we want some a such that

[pic]

In any cell in Excel, type

=normsinv(.95)

The result is 1.644853.

5.2.2 Student's t Distribution

5.2.2.1 Probabilities from the t-Distribution

Excel provides a built-in operation for deriving "tail" probabilities of t-distributed random variables, x. We define a "one-tail" probability as, for example,

[pic]

or

[pic]

and a "two-tailed" probability as

[pic]

Suppose we want to derive the two-tailed probability

[pic]

which is identical to

[pic]

for a t distributed random variable, x, with 5 degrees of freedom:

[pic]

For a cumulative probability derivation of a t distributed random variable, x, use the following command:

=tdist(1.75,5,2)

where the final "2" denotes that we want a two-tailed derivation rather than a one-tailed probability. The result is .140522.

For a one-tailed probability,

[pic]

use

=tdist(1.75,5,1)

and we obtain half of the two-tailed probability, .070261.

5.2.2.2 Two-Tailed Cut-off Values from the t-Distribution

As in the normal case, we have an inverse function for t-random variables that returns some a for a given probability p. For example, suppose we want some a such that

[pic]

In Excel, type

=tinv(.01, 15)

The result is 2.787438.

6. Sample Bivariate Analysis: Sample Covariance and Correlation

A primary concern in the analysis of data is not measures of central tendency and dispersion, per se. Rather, we are interested in how economic phenomena relate to each other. For example, we might ask whether income tax increases will diminish net tax revenues collected and therefore threaten the solvency of certain welfare programs. This topic is ultimately left for Econ. 120B and 120C, Econometrics 1 and 2, however the initial step entails the analysis of simple linear relationships by measures of covariance and correlation. Details on the theory of covariance and correlation can be found in Newbold, Chapter 4.

6.1 Sample Covariance

Consider the labor data again, without means, variances and standard deviations present:

| |A |B |C |D |E |F |

|1 | |hours |wage |ed |age |income_wage |

|2 | |34 |7.5 |12 |20 |13260 |

|3 | |50 |20.8 |18 |37 |54080 |

|4 | |40 |5.75 |16 |50 |11960 |

|5 | |30 |12.90 |14 |42 |20124 |

|6 | | | | | | |

|7 |MEAN | 38.50 |11.74 |15.00 |37.25 |24856.00 |

|8 |VARIANCE |75.67 |45.76 |6.67 |160.92 |392402677.33 |

|9 |STAND. DEV. |8.70 |6.76 |2.58 |12.69 |19809.16 |

We would like to measure the degree of linear of dependence between work hours, wages education and age.

For the sample covariance between the relevant variables, we need to set up an area in the spread-sheet that can contain this information. Here, we will place the covariance information below the mean and variance information. Type in cell A11 the title "COVARIANCE", and re-type the variable names in cells C11 - F11 and B11 - B14:

| |A |B |C |D |E |F |

|1 | |hours |wage |ed |age |income_wage |

|2 | |34 |7.5 |12 |20 |13260 |

|3 | |50 |20.8 |18 |37 |54080 |

|4 | |40 |5.75 |16 |50 |11960 |

|5 | |30 |12.90 |14 |42 |20124 |

|6 | | | | | | |

|7 |MEAN | 38.50 |11.74 |15.00 |37.25 |24856.00 |

|8 |VARIANCE |75.67 |45.76 |6.67 |160.92 |392402677.33 |

|9 |STAND. DEV. |8.70 |6.76 |2.58 |12.69 |19809.16 |

|10 | | | | | | |

|11 |COVARIANCE | |hours |wage |ed |age |

|12 | |hours | | | | |

|13 | |wage | | | | |

|14 | |ed | | | | |

|15 | |age | | | | |

We will fill out the table completely in order to make a point below. Click-on cell C12 in order to derive the sample covariance between hours and hours. Type the command

=covar(B2:B5, B2:B5)

Excel will use the data in cells B2 - B5 (i.e. work hours) to derive the covariance between hours and itself. Type ENTER to complete the task. Note that the sample covariance between a variable and it self is simply a sample estimate of the variance. However, it is not an unbiased sample estimate of the variance. Specifically, Excel uses the following formula for sample covariance:

[pic]

Thus, the sample between, say, x and x is

[pic]

which is a biased estimator of the variance. The unbiased variance estimator that Excel uses is

[pic]

Next, perform the task for the covariance between hours and wages. Click-on cell D12, and type

=covar(B2:B5, C2:C5)

Then type ENTER. Excel will use the data for work hours (cells B2 - B5) and the data for wages (cells C2 - C5) to derive the sample covariance. Repeat this task for all other combinations of variables (hours, ed), (hours,age), (wage,wage), (wage,ed), (wage,age), (ed,ed), (ed,age) and (age,age). We find, after formatting the numbers to have four decimal places

| |A |B |C |D |E |F |

|1 | |hours |wage |ed |age |income_wage |

|2 | |34 |7.5 |12 |20 |13260 |

|3 | |50 |20.8 |18 |37 |54080 |

|4 | |40 |5.75 |16 |50 |11960 |

|5 | |30 |12.90 |14 |42 |20124 |

|6 | | | | | | |

|7 |MEAN | 38.50 |11.74 |15.00 |37.25 |24856.00 |

|8 |VARIANCE |75.67 |45.76 |6.67 |160.92 |392402677.33 |

|9 |STAND. DEV. |8.70 |6.76 |2.58 |12.69 |19809.16 |

|10 | | | | | | |

|11 |COVARIANCE | |hours |wage |ed |age |

|12 | |hours |56.7500 |26.1063 |14.5000 |13.3850 |

|13 | |wage | |34.3217 |8.1875 |0.0031 |

|14 | |ed | | |5.0000 |14.7500 |

|15 | |age | | | |120.6875 |

Now, recall that the covariance is symmetric, hence

[pic]

which means we can ignore the rest of the cells. For example, the sample covariance between hours and wages in cell D12 is 26.1063. Likewise, the sample covariance between wages and hours would be 26.1063, which could redundantly be placed in cell C13. Thus, the above covariance "table" is considered complete.

6.2 Sample Correlation

Consider the correlation between hours and wages. Deriving the sample correlation is performed in a manner identical to the derivation of the covariance, except we use the command in cell D18 (see the presentation, below)

=correl(B2:B5, C2:C5)

Like covariance, the sample correlation is symmetric, thus we do need to find the correlation between wages and hours. Recall that the correlation between a variable and itself is simple 1, thus you only need to perform the operation for couples of different variables. Consider using the cells beneath the covariance cells. We can type "CORRELATION" in cell A17, and use the cells to the right to fill in the correlation derivations. After formatting the cells to allow for 5 decimal places, we obtain the following:

| |A |B |C |D |E |F |

|1 | |hours |wage |ed |age |income_wage |

|2 | |34 |7.5 |12 |20 |13260 |

|3 | |50 |20.8 |18 |37 |54080 |

|4 | |40 |5.75 |16 |50 |11960 |

|5 | |30 |12.90 |14 |42 |20124 |

|6 | | | | | | |

|7 |MEAN | 38.50 |11.74 |15.00 |37.25 |24856.00 |

|8 |VARIANCE |75.67 |45.76 |6.67 |160.92 |392402677.33 |

|9 |STAND. DEV. |8.70 |6.76 |2.58 |12.69 |19809.16 |

|10 | | | | | | |

|11 |COVARIANCE | |hours |wage |ed |age |

|12 | |hours |56.7500 |26.1063 |14.5000 |13.3850 |

|13 | |wage | |34.3217 |8.1875 |0.0031 |

|14 | |ed | | |5.0000 |14.7500 |

|15 | |age | | | |120.6875 |

|16 | | | | | | |

|17 |CORRELATION | |hours |wage |ed |age |

|18 | |hours | |.59153 |.86080 |.16161 |

|19 | |wage | | |.62500 |.00005 |

|20 | |ed | | | |.60045 |

|21 | |age | | | | |

7. Interval Estimation

Consider the task of estimating a 95% Confidence Interval for the true mean ( of some variable. Recall that the confidence interval itself is

[pic]

where K satisfies

[pic]

Excel assumes the variable in question is either normally distributed, or the sample size is large enough so that, by the Central Limit Theorem (see Newbold, Chapter 6), the sample mean will be nearly normally distributed .

After some simplification (see Newbold, Chapter 8), we can solve directly for K based on Excel's distributional assumptions:

[pic]

Of course, we do not know the true standard deviation, thus we must give Excel a sample estimate of (.

In order to calculate a 95% confidence interval length, K, perform the following. Suppose the sample size is 4, as it is in the previous section, and suppose an estimate of the standard deviation is contained in cell B9, as it is for the variable hours, above. Click-on an empty cell (say, cell B24 for the variable hours), the type

=confidence(.05, B9, 4)

then hit ENTER.

Notice that the above command is specifically contoured to our needs. The general command format is

=confidence((, (, n)

where ( satisfies the (1 - ()% confidence interval size, ( denotes the true standard deviation, and n denotes the number of individuals in our group.

Excel will derive K and place it in the chosen cell. For clarity, we can type "95% CI: K" in cell A24, “95% Lower” in A25 and “95% Upper” in A26. Notice that we re-typed the variable names for ease of reference. The result is

| |A |B |C |D |E |F |

|1 | |hours |wage |ed |age |income_wage |

|2 | |34 |7.5 |12 |20 |13260 |

|3 | |50 |20.8 |18 |37 |54080 |

|4 | |40 |5.75 |16 |50 |11960 |

|5 | |30 |12.90 |14 |42 |20124 |

|6 | | | | | | |

|7 |MEAN |38.50 |11.74 |15.00 |37.25 |24856.00 |

|8 |VARIANCE |75.67 |45.76 |6.67 |160.92 |392402677.33 |

|9 |STAND. DEV. |8.70 |6.76 |2.58 |12.69 |19809.16 |

|10 | | | | | | |

|… | | | | | | |

|22 | | | | | | |

|23 | |hours |wage |ed |age |income_wage |

|24 |95% CI: K |8.524516 | | | | |

|25 |95% Lower | | | | | |

|26 |95% Upper | | | | | |

Thus, the 95% CI for the sample of hours is

[pic]

We can directly create the lower bound by typing in cell B25

=B7 – B24

and we can directly create the upper bound by typing in cell B26

=B7 + A24

We can now proceed as usual to perform the same CI operation on the rest of the cells, C24 - F24. As usual, click-on cell B24, place the cursor above the small black-box in the lower right corner, click, hold and drag over C24 - F24. The result, after formatting the cells to 4 decimal places, is

| |A |B |C |D |E |F |

|1 | |hours |wage |ed |age |income_wage |

|2 | |34 |7.5 |12 |20 |13260 |

|3 | |50 |20.8 |18 |37 |54080 |

|4 | |40 |5.75 |16 |50 |11960 |

|5 | |30 |12.90 |14 |42 |20124 |

|6 | | | | | | |

|7 |MEAN |38.50 |11.74 |15.00 |37.25 |24856.00 |

|8 |VARIANCE |75.67 |45.76 |6.67 |160.92 |392402677.33 |

|9 |STAND. DEV. |8.70 |6.76 |2.58 |12.69 |19809.16 |

|10 | | | | | | |

|… | | | | | | |

|23 | |hours |wage |ed |age |income_wage |

|24 |95% CI: K |8.5245 |6.6294 |2.5303 |12.4313 |19412.5878 |

|25 |95% Lower |29.9755 |5.1106 |12.4697 |24.8187 |5443.4122 |

|26 |95% Upper |47.0245 |18.3694 |17.5303 |49.6813 |44268.5878 |

|27 | | | | | | |

Comment:

For a 99% CI, replace the ".05" with ".01".

For a 90% CI, replace the ".05" with ".10".

8. Graphics: Charts, Tables and Figures

8.1 Scatter Plots: Two Variable Relationships

A scatter-plot present points (X,Y) in two-dimensional Euclidean space. For example, in the (WAGE,ED) space, a point would represent an individuals hourly wages and level of education:

ED

15

14

13

12

11

10

9

8

4 5 6 7 8 9 10 11 12 13 14 15 16 WAGE

Consider a sample of n = 10 years of aggregate U.S. health care expenditure, EXP_HEALTH, and real income, INCOME.

| |A |B |

|1 |INCOME |EXP_HEALTH |

|2 |9.3 |0.998 |

|3 |11.2 |1.499 |

|4 |17.1 |4.285 |

|5 |13.8 |1.573 |

|6 |10.9 |2.021 |

|7 |15.3 |2.26 |

|8 |12.8 |1.953 |

|9 |14.6 |2.103 |

|10 |21.2 |3.428 |

|11 |19.3 |2.277 |

We want to create a scatter-plot with income on the X-axis and health care expenditures on the income on the X-axis and health care expenditures on the Y-axis.

1. Click on cell A1 (the label INCOME), hold and drag over cells A1-A11 and B1-B11. That is, highlight the labels and data for expenditure and income:

| |A |B |

|1 |INCOME |EXP_HEALTH |

|2 |9.3 |0.998 |

|3 |11.2 |1.499 |

|4 |17.1 |4.285 |

|5 |13.8 |1.573 |

|6 |10.9 |2.021 |

|7 |15.3 |2.26 |

|8 |12.8 |1.953 |

|9 |14.6 |2.103 |

|10 |21.2 |3.428 |

|11 |19.3 |2.277 |

2. Next, go to the task bar at the top of the screen, click on INSERT, then CHART. A pop-up bax appears: for a scatter-plot, on the left side of the pop-up box, click on “standard types”, then “XY (Scatter)”. Click on NEXT at the bottom of the box: Excel will show you the graph with its default style. We will add a title, axes labels and remove the grid-lines, next.

3. Click on NEXT again. Click on TITLE, and type “Income versus Health Expenditure”. Under Category (X) axis, type “Income”; under Value (Y) axis, type “Health Expenditure”. Click on GRIDLINES, and remove any check-marks by clicking on them: this will turn off all gridlines. Finally, click on LEGEND, and remove the check-mark in the legend box by clicking on it.

The result is the following polished scatter plot:

Comment

1. Notice that a very clear linear relationship seems to exist: as real income increases, consumers can afford more health care and subsequently purchase more health care. Indeed, the sample correlation between the two economic variates is .74, a strong positive value.

2. For editing (point colors, X-axis, background color, etc.), see the discourse under Time Series, below).

8.2 Time-Series: Line Plots: Multiple Variable Representations, PDF's, CFD's

Often, or data exists over time. Examples include GNP, the CPI, production indices, inflation rate, exchange rates, asset returns and stock market indices, etc. It is often usual to present such information in a concise manner on the same graph.

Consider data U.S. GDP (in billions $), the money supply MS (in billions $) and gross private domestic investment GPDI:

| |A |B |C |D |

|1 |Year |GDP |MS |GPDI |

|2 |1970 |1010.7 |628.1 |150.3 |

|3 |1971 |1097.2 |717.2 |175.5 |

|4 |1972 |1207 |805.2 |205.6 |

|5 |1973 |1349.6 |861 |243.1 |

|6 |1974 |1458.6 |908.6 |245.8 |

|7 |1975 |1585.9 |1023.3 |226 |

|8 |1976 |1768.4 |1163.7 |286.4 |

|9 |1977 |1974.1 |1286.6 |358.3 |

|10 |1978 |2232.7 |1388.7 |434 |

|11 |1979 |2488.6 |1496.7 |480.2 |

|12 |1980 |2708 |1629.5 |467.6 |

|13 |1981 |3030.6 |1792.9 |558 |

|14 |1982 |3149.6 |1951.9 |503.4 |

|15 |1983 |3405 |2186.1 |546.7 |

|16 |1984 |3777.2 |2374.3 |718.9 |

|17 |1985 |4038.7 |2569.4 |714.5 |

|18 |1986 |4268.6 |2811.1 |717.6 |

|19 |1987 |4539.9 |2910.8 |749.3 |

|20 |1988 |4900 |3071.1 |793.6 |

|21 |1989 |5250.8 |3227.3 |832.3 |

|22 |1990 |5522.2 |3339 |799.5 |

|23 |1991 |5677.5 |3439.8 |721.1 |

In the following discourse, we will look at various graphical presentations of the above data.

8.2.1 Smooth Lines

Creating the Figure

Highlight the data and labels by clicking on cell A1, hold and drag over cells A1-D23. The "year" variable has already been formatted to be understood by Excel to be, in fact, a year. When we include this variable in a graph, Excel will not plot it: rather, Excel will use the year-formatted data for a time-line on the X-axis.

Next, go to the task bar, click-on INSERT, then CHART, then COSTUM TYPES, scroll down, and finally click-on Smooth Lines.

Click on NEXT: Excel shows you the default style of the figure. Click on NEXT again: a pop-up box appears for title and label editing. Click on TITLE, and type under Chart Title, "U.S. Macro Data: 1970-1991", or whatever you deem appropriate. Under Value (Y) Axis, type "Billions of $": this will provide a Y-axis label explaining that the data is in billions of U.S. dollars (i.e. GDP = 1010.7 denotes 1010.7(billion = 1.0107 trillion dollars).

Finally, click on FINISH. Excel will present you the following figure:

8.3 Editing Graphs and Figures

Consider the figure we created for Gross Domestic Product and Investment data:

The year orientation is difficult to read, and we might want to change the yellow color of the line representing GDPI over time to some darker hue. Also, for style (according to your personal interests as well as readability for the viewer of your figure), we may want to remove the dark background color, or eved remove the outline boundary line of the entire figure, as well as re-locate the legend, and change the font of the test used.

Line Color

In order to change a line's color, double click on the line itself, and a pop-up box appears. Click on PATTERN, then COLOR, and chose a darker hue: here, we will choose a shade of green.

Background Area

In order to remove the dark background, double click on the dark background itself, go to AREA and click on NONE.

Borders

I prefer figures without outer boundaries. To create this effect, double click-on the white-area between the outer box and the inner box. A pop-up box appears: go to BORDER, and click on NONE. Also, we can remove the border around the legend: double-click on the legend, in the pop-up box, under BORDER, and click on NONE,

X-Axis Label and Orientation

Finally, to orient the year denominations, double click on the X-axis itself, then click on ALIGNMENT, and notice that a little line appears sprouting from the word Text . Click on this line, hold and drag the line to an upward orientation. If you want the numerical values on the Y-axis to display dollar signs, double click on the Y-axis, click on Number, Currency, go to Symbol, and click on $. Next to Decimal Places, because we like round numbers on the axes, type 0. We obtain:

Fonts

If you prefer a different than the default font (usually 12-point), click ONCE on the white-area between the outer and inner boxes. Excel will "highlight" the figure by placing small black boxes in the outer corners. Next, at the top of the screen, in the task bar, change the font size to 10, or 8, or what-have-you. I prefer 10-point font.

Using the Left and Right Y-Axes

Finally, some data is difficult to see because we represent it on the same Y-axis as data with a very different size structure. For example, above, clearly GDP is much larger than GDPI, and therefore some of the small year-to-year nuances of GDPI is lost in the large Y-axis used for GDP. There are, however, two Y-axes, the left and standard Y-axis, and the right Y-axis. Consider placing GDPI on its own Y-axis, to the right. Click on the green, GDPI, line: a pop-up box appears. Click on Axis, then click on Secondary. Excel will orient the GDPI data on the right-side Y-axis, using a more convenient range of numerical values that befits the smaller values of the GDPI. Now, we can see specific year-to-year changes that were somewhat over-shadowed, above. As before, we can double click on the right Y-axis, click on Number, Currency, go to Symbol, and click on $.

Figure Title

If you want to enhance the figure title (increase font size, use italics, bold, etc.), click on the title once, place the cursor anywhere above the title characters (the cursor should change in shape to a little line), and click. Excel will now allow you to edit the title. You can highlight the title, and using the task bar at the top of the screen, change the characters to italics, bold, change the font, etc. We will change the font to 12-point and use bold:

-----------------------

(

(

[pic]

[pic]

[pic]

[pic]

[pic]

( This point denotes an individual's (WAGE, ED) = ($5.25/hour, 13 years)

[pic]

[pic]

[pic]

[pic]

[pic]

[pic]

[pic]

[pic]

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download