Amalgamation: Aggregating Data to New Boundaries



Amalgamation: Aggregating Data to New Boundaries

Over the past several years, many provinces have amalgamated municipalities, forming new, larger Census Subdivisions (CSDs). As a result, data from previous censuses at the CSD level may not match data from the 2001 census. In this workshop we will use 1996 aggregate census data to determine if and how amalgamation has changed the look of our Ontario municipalities. In particular, we will examine the percentages of the population with advanced degrees (MAs and above), the percentage of immigrants and the average value of homes in the old and new CSDs. Because the 2001 Census data have yet to be released, we will simply compare the old CSDs with the new.

In future workshops, we will compare across censuses and CSDs.

The objective of this workshop is to review previous concepts such as working with aggregate data, computing new variables and matching files and introducing you to the theory and practice of aggregating data using SPSS.

To do this, you will be using data from the 1996 Census, Basic Summary Tables (BSTs) and Census Profiles which have been converted from Beyond 20/20 to SPSS. [1]

Note: These exercises use drive and path names for Carleton University Library’s laptop. You will be given the correct drive and path names at the workshop. Please do not assume that, if you use the drive and path names below, the program will work. By the same token, if you use this as a guide back at your institution, you will have to modify the drive and path names to suit your setup.

We will be using two separate SPSS files for these exercises. One contains data from the 1996 Census BSTs and Profiles; the second is a file with concordances between amalgamated and dissolved CSDs for Ontario.[2]

Merging Files

In last year’s workshop, you learned how to merge split files. This is essentially the same procedure.

To begin, double-click on the SPSS icon, then click on File, Open and Data.

Choose ed_im_house.sav. [3]

Hold your cursor over the first column of numbers, oldcode. This is the old CSD code and will provide the variable on which the files will be matched. To do this, go to Data, Merge Files and Add Variables.

Highlight dissolveont.sav.

This should be becoming familiar to some of you. Tick Match cases on key variable in sorted files and Working file is keyed table. Finally, highlight oldcode[+] and using the bottom arrow button, movie it into the Key Variables box.

Click OK.

Note: both files must be sorted so that oldcode is in ascending order. If this is not the case, you will receive an error message stating that the file(s) are out of order, whereupon, you must sort and save both files. You will receive the following warning whether or not the files are sorted.

Click OK.

The matched file will contain all the variables originally in both ed_im_house.sav and dissolveont.sav. However, only those cases for which there is a match will appear in the new file. To be safe, check to see if you have all the variables in your new file and save it with a different name. Then, if you make an error in the next bit, you will not have to begin from scratch.

Beginning to Answer the Research Questions

Now we can finally get around to answering the research questions. We want to know if the percentage of people with advanced degrees, the percentage of immigrants and the average value of a home went up or down in three amalgamated municipalities—Hamilton, Ottawa and Toronto.

The first thing to do is to compute new variables to calculate the percentages based on the old CSD codes (oldcode). Remember that percentages are calculated by dividing the number of those in the target group (advanced degrees and immigrants) by the total number in the group and multiplying the result by 100.

Click on Transform and Compute.

You will need to put the formula into the dialogue box. To do that, first give the variable a name in the Target Variable box. Then, highlight the variables and move them over, or simply type the following into the box:

(v18+v19+v20+v21+v22+v23) / (v2+v3)*100

Where v18, v20 and v22 =males with advanced degrees; v19, v21 and v23=females with advanced degrees, v2=total males over 15 and v3=total females over 15. The brackets ensure that the addition is done before the multiplication and division.

Click OK and move your cursor over to the end of the file. You will see that the percentages are relatively small. Look for our three CSDs and fill in the following table:

CSD name CSD number Percent Advanced Degree

Ottawa ___________ ____________

Toronto ___________ ____________

Hamilton ___________ ____________

One municipality (not one of those above) has the highest concentration of advanced degrees in the province—more than 25%. Hint, it’s in the first 50 cases. What is it? _____________________

The second part of the question was to discover the percentage of immigrant population in each municipality. To do that, we repeat the exercise above. Because immigration is not broken down by any other variables, it’s a less-complicated equation.

Again, go to Transform and Compute.

Click OK and again move the cursor to the end of the file to make sure your computation has worked. The percentages here will, of course, be much higher.

Fill in the following table.

CSD name CSD number Percent Immigrant

Ottawa ___________ ____________

Toronto ___________ ____________

Hamilton ___________ ____________

There are three municipalities where the majority of the population are immigrants. These are:

_________________ _______________ ________________

Hint: These are all near Toronto both geographically and in the file.

Finally, we want to look at the average value of a home. This is the simplest of the three exercises, as we are already beginning with the average. All we need to do here is to report it.

But, just so life isn’t altogether easy, we will list the average house price of all the old CSDs making up the new municipalities.

Ottawa (New CSD code 3506000)

Previous CSD Average Dwelling Value

Osgoode _________________

Cumberland _________________

Gloucester _________________

Vanier _________________

Rockcliffe Park _________________

Nepean _________________

Ottawa _________________

Rideau _________________

Goulbourn _________________

Kanata _________________

West Carleton _________________

Toronto (New CSD code 3520000)

Previous CSD Average Dwelling Value

Scarborough _________________

Toronto _________________

East York _________________

North York _________________

York _________________

Etobicoke _________________

Hamilton (New CSD code 3525000)

Previous CSD Average Dwelling Value

Stoney Creek _________________

Glanbrook _________________

Ancaster _________________

Hamilton _________________

Dundas _________________

Flamborough _________________

As you can see, there is a fairly wide range of values here, none more so than between Vanier and Rockcliffe Park in the new municipality of Ottawa. This will have to be taken into account when we aggregate the file.

Aggregating Data

By now you are likely wondering if we’re ever going to get around to the topic of this workshop. Rest assured that you have already completed the difficult part. The remainder should be fairly straightforward. But we will need to compute one more variable before we begin the process.

Because we only have the average value of a dwelling, we can’t simply aggregate that. As we saw with Ottawa, Rockcliffe Park is far above the average, and if we simply averaged the averages, the results would be misleading. Thus, we need to go back to the totals. To do that, we simply multiply the average value of dwellings by the number of dwelling units. Once again, we need to compute a new variable to determine the total value of all dwellings in the old CSDs.

To do that, go to Transform and Compute.

Here, you will need to compute tot_val (or whatever you wish to call the variable) using the variables num_own (the number of owned dwellings in the CSD) and av$house (the average value of a dwelling in the CSD).

Here the equation is tot_val = num_own * av$house

Fill in the variable names in the appropriate spots and click OK.

Once this has been done successfully, save the file (use the same name).

A Brief Digression

Next, it is useful to make a quick digression into SPSS help.

If you are confronted with a new situation or procedure in SPSS, simply click on Help and Topics and type the name of the procedure in the dialogue box.

Click on Display and you will be presented with the syntax for the command.

Don’t despair if you don’t understand this. There’s more help, but it’s buried. Click on See Also….

Highlight Aggregate Data (you’ve already seen the syntax) and click Display. You will receive a short explanation of the procedure.

From here, click How To, and you will be led through the SPSS for Windows procedure, step-by-step. This help works the same way for most procedures.

Following the instructions above, go to Data and then Aggregate.

The variable newcode is the new CSD code for the amalgamated municipalities. The variables to be aggregated are those you used in the previous exercise. Move v2, v3, v18 – v23, total, immig, num_own and tot_val into the bottom box.

You will want to Create a new data file rather than writing over the old one (just in case). You can use the file name they suggest or change it. You must, however, select a Function. In this case, since we want to add the populations together, it will be the Sum of values.

Click Continue. You will be returned to this screen.

Click OK.

Since you created a new data file, you will need to close (make sure you save your changes!) the file you are working on and open the new file.

Go to File, Open, Data and you’ll see:

If you used the file name supplied by SPSS, you’ll see AGGR.SAV in the dialogue box. Open it. It should look like this:

You will notice that we have brought only those variables needed to recreate the needed percentages (both numerators and denominators).

You will have to remember what the variable names mean, as the labelling is lost in the conversion. The names are, however similar—they simply have a 1 added to the end. Because of the eight-character limitation on SPSS variable names, tot_val and num_own have been truncated to tot_va_1 and num_ow_1.

To calculate the percentages, repeat the steps as before, remembering to add a _1 to the end of each variable name. For example, the first equation would be:

(v18_1+v19_1+v20_1+v21_1+v22_1+v23_1)/(v2_1+v3_1)*100

Once you have calculated the percentages fill in the following tables:

New CSD % adv % adv

CSD name number before after % change

Ottawa 3506000 ________ ________ _______

Toronto 3520000 ________ _________ _______

Hamilton 3525000 _________ _________ _______

New CSD % Imm % Imm

CSD name number before after % change

Ottawa 3506000 ________ ________ ________

Toronto 3520000 ________ _________ ________

Hamilton 3525000 _________ _________ ________

To calculate the average house price in the new cities, you will have to compute one final variable, using num_ow_1 and tot_va_1. If we simply used the average function, the few, very expensive Rockcliffe dwellings would have the same weight as the many, many less-expensive dwellings in the rest of the new Ottawa (as an example). That is why you will need to return to the now overly-familiar Transform/Compute partners and compute av$house.

Click OK.

Now complete the following table:

New CSD Dwelling value Dwelling value

CSD name number before after

Ottawa 3506000 _____________ _____________

Toronto 3520000 _____________ _____________

Hamilton 3525000 _____________ _____________

Had you not recalculated the average house price, the Ottawa value would have been $ 205,827.50—far above the actual 1996 average. Differences in the other two new cities would not have been as dramatic, but would have existed. In short, the numbers would have been incorrect.

Looking at the old and new CSDs, has there been a marked change in any of the three new cities? _____________________ If yes, in which one(s)?

______________________ In which of the three characteristics?

________________________

The percentage of immigrants went up in Toronto, but down in Ottawa and Hamilton. Why do you think this was? __________________________

In a Nutshell

• Use aggregation to:

publish analytic results

• anonymize data

• change the unit of analysis

• summarize over time (create time series)

• summarize over geography (create geo-referenced data)

• Break variables are those which will form the unit of analysis.

• Select functions carefully:

• use sum when you need to find a total number

• use means only when you have n’s for both cells and totals

• always recreate (compute) totals if using rates, percents or averages

-----------------------

[1] While we will not go through the steps in converting the file, the procedure will be outlined in a step-by-step fashion, on the website housing the DLI Ontario training materials: and, if there is interest, may by the topic of a workshop in 2003.

[2] This file, for all of Canada, will be available in Excel and SPSS formats on the DLI ftp site.

[3] This is a combination of 1996 Census BSTs, 95F0226XDB96001 and 95F0223XDB96001, and the CSD profiles, all of which have been pared down considerably for the purposes of this exercise.

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download