Lab 7: Spatial Analysis I: Classification



Note: This laboratory covers material in chapters 6 and 7.

Cartographic classification. Cartographic features are frequently classified so that patterns in the data can be visualized. Classification methods are statistical techniques for placing individual cases into groups called classes. Maps of polygons that are classified are called choropleth maps. Choropleth maps are frequently found in atlases and newspapers where they are used to portray information about areas and their subsets, e.g., countries and states. You will explore several automated classification techniques in ArcView. During this lab, we will consider ways to select the best classification variable for identifying spatial patterns. We will also look at some data error problems.

Census data and spatial analysis. The U.S. Census is a major source of demographic data in the United States. Demographic data describe the distribution and characteristics of people. The census has created nested spatial units that contain increasingly smaller areas. The data that we will work with are organized by census tract, a relatively low-resolution unit for expressing demographic information. The highest resolution units publicly available are block groups, which nest hierarchically inside the tracts. Tract areas are defined by population numbers. Areas of high population density have small tracts, while areas of low population density tend to have large tracts. You will see these patterns as you look at the tracts surrounding the Atlanta metropolitan area.

Data Requirements for Classification

In this laboratory we will explore the data requirements for classification. Because we are classifying polygons of varying size based on their data values, we must factor out the effect of size on the classified variable. For example, at the same population density, a very large polygon will contain many more people than a very small one. To examine differences in population structure, we therefore look at population density, the ratio of population to area. It is very important to remember that only area-normalized data or other ratio data can be used in a choropleth map.

Start ArcMap. Copy the data folder for this lab into your own working directory. Add the atlanta.shp layer from your working directory. This layer is composed of demographic data collected during the 1990 census. Open the table for the Atlanta layer and examine the fields that are available in this data.

Question: List all of the fields (attributes) that are available in the atlanta.shp table.

Look at the Fields labeled Pop_90, Pop_93, Pop_98, and Pop_growth.

Question: Is the Pop_growth field related to Pop_90, Pop_93, and Pop_98? What do the values in the Pop_growth field mean? How were they calculated?

Open the Properties window for the Atlanta layer and select the Symbology tab. Show the data as Graduated Colors (under Quantities -> Graduated Colors). Set the Classification Field to Pop_90 and Normalize by Sq_miles. Select a color ramp of your choice. Apply the changes, but do not close the window. Please do not use the Classify button yet; for now we will accept the default classification.

[pic]

Question: What attribute are you mapping when you normalize population with area? What do you think might be wrong about mapping raw population values without normalization?

Question: Why do you suppose the lowest class has a negative value for Population / Area?

There appears to be a negative value somewhere in the Pop_90 field, and it is skewing the classification, so we will need to do some data manipulation. Click on the Classify button. The Classification window will open. This window provides information on the data being classified and has several options for modifying the classification. For instance, you can change the classification method, the number of classes, and the breaks between classes, and you can exclude certain data from classification.

[pic]

We will now remove the negative values from your classification. Click on the Exclusion button. The Data Exclusion Properties window will open. This window allows you to write logical queries that select particular records from your data set; the selected records are left out of the classification. Because these are logical expressions, it is very important to have all of the parameters in the correct locations. Incorrect expressions cannot be processed. The query builder window contains a scrolling list of Fields on the left and a scrolling list of Values on the right. Between the lists are the logical operators that are available to you. In this window, write a query to select all of the records containing a value of -99 in the Pop_90 field. To write a query, double click on a field, select a logical operator, and then double click on a value. A sample query is shown below.

[pic]

Click OK and look at the new statistics in the Classification window. When you are done, click OK and Apply the changes.
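The effect of the exclusion can also be sketched outside ArcMap. The short Python example below is only an illustration under stated assumptions: the tract records are made-up values, not rows from atlanta.shp, and the function names are hypothetical. It compares summary statistics of Pop_90 / Sq_miles before and after records coded -99 are dropped.

```python
from statistics import mean

NO_DATA = -99  # missing-data code found in the Pop_90 field

# Hypothetical tract records for illustration; not values from atlanta.shp.
tracts = [
    {"Pop_90": 4200, "Sq_miles": 1.5},
    {"Pop_90": NO_DATA, "Sq_miles": 2.0},
    {"Pop_90": 3100, "Sq_miles": 12.4},
    {"Pop_90": 5600, "Sq_miles": 0.8},
]


def exclude_no_data(records, field="Pop_90", no_data=NO_DATA):
    """Keep only the records whose field does not hold the no-data code."""
    return [r for r in records if r[field] != no_data]


def density_stats(records):
    """Summarize population per square mile for a list of records."""
    densities = [r["Pop_90"] / r["Sq_miles"] for r in records]
    return {
        "count": len(densities),
        "min": min(densities),
        "max": max(densities),
        "mean": mean(densities),
    }


before = density_stats(tracts)
after = density_stats(exclude_no_data(tracts))
print("before exclusion:", before)
print("after exclusion: ", after)
```

With the -99 record included, the minimum density is negative, which is the same symptom seen in the lowest class of the default classification.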

Question: Describe what effect this query has had on your classification.

Switch to layout view. Copy and paste your existing data frame so that you have a second copy (recall that we did this in lab 2). You will want to re-size the data frames so that you can see both of them at the same time.

Change the classification on the second data frame so that it is showing the Pop_90 field but is not normalized.

Question: Compare your classified population map, normalized by area, to your map that is not normalized. Which is more meaningful? Write a short analysis of the population patterns in the Atlanta urban area.

The variable that you created through classification was a population density variable. Another way to make such a variable is to create a new field and calculate the density value as a ratio of population to area. In order to do this you will need to edit the data.

Open the attribute table for the atlanta.shp layer in your new data frame (the one that is not normalized).

Click on the Options -> Add Field button.

[pic]

Add a new field called Popden. It should be type double, with precision of 13 and scale of 6.

[pic]

Now start editing. Right click on the field header for your new Popden field. Select “Calculate Values.”

The Field Calculator will open. Here you will write a mathematical expression in which attribute names stand for the field values. Double click on the field name Pop_90. Click on the / (divide by) operator. Double click on the field name Sq_miles. Your Field Calculator should look like this:

[pic]

Select OK. Save your edits and stop editing. Now classify your data using the Symbology tab in the Properties window. Let the Classification Field be Popden and select None for the Normalization field, since Popden is already normalized by area.
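The Field Calculator step amounts to dividing one field by another for every record. A minimal Python sketch of that calculation follows. The rows are hypothetical, the function name add_popden is made up for illustration, and the guards against the -99 code and zero area are an added assumption; the Field Calculator expression in the lab does not include them.

```python
def add_popden(rows, pop_field="Pop_90", area_field="Sq_miles",
               out_field="Popden"):
    """Add a population density field to each row (people per square mile).

    Rows with a negative population (such as the -99 code) or a
    non-positive area get None instead of a misleading density. This
    guard is an assumption added for the sketch.
    """
    for row in rows:
        pop = row[pop_field]
        area = row[area_field]
        if pop >= 0 and area > 0:
            # Scale of 6 in the lab's field definition: keep 6 decimals.
            row[out_field] = round(pop / area, 6)
        else:
            row[out_field] = None
    return rows


# Hypothetical rows for illustration; not values from atlanta.shp.
rows = [
    {"Pop_90": 4200, "Sq_miles": 1.5},
    {"Pop_90": -99, "Sq_miles": 2.0},
    {"Pop_90": 1000, "Sq_miles": 3.0},
]

for row in add_popden(rows):
    print(row)
```

Classifying a stored density field and normalizing Pop_90 by Sq_miles at display time both rest on the same ratio; the stored field simply makes the calculation explicit.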

Question: Is there a difference between the two approaches to normalizing population data by area? What would have happened if you had used a Pop_98 / Sq_miles ratio?

Cartographic Classification

We will now explore different types of classification available in ArcView. We will use four of the approaches that ArcView provides: Natural Breaks (which is the default), Equal Interval, Quantile, and Standard Deviation. All of these classification approaches are based on the structure of a data variable. A Natural Breaks approach is based upon the histogram of a data variable, with counts on the y-axis and values on the x-axis. A simple histogram appears below.

[pic]

In the histogram we can see a natural break about halfway along the x-axis. This classification approach assumes that humans can intuitively find such breaks in a histogram and design classes accordingly. ArcView uses an algorithm to statistically optimize the Natural Breaks approach. This is a nice approach because it is intuitively easy to understand. It can be inappropriate when some classes in a histogram have many counts and others have few. For explanations of the other three classification approaches, use the ArcGIS Desktop help index and type in “classification.” The page on standard classification schemes is brief, but should be sufficient.
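To make the idea of an optimized natural break concrete, here is a small Python sketch. It is not the algorithm ArcView uses internally. It assumes the usual natural-breaks criterion (minimize the summed squared deviations of values from their class means) and finds the best grouping by brute force, which is only practical for a handful of values. The function names and the sample densities are hypothetical.

```python
from itertools import combinations


def sum_sq_dev(values):
    """Sum of squared deviations of the values from their own mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def natural_breaks(values, n_classes):
    """Group sorted values into n_classes by exhaustive search.

    Every way of cutting the sorted list into contiguous groups is
    tried, and the grouping with the smallest total within-class
    squared deviation is returned.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")
    best_cost, best_groups = None, None
    # Each cut is the index at which a new class starts.
    for cuts in combinations(range(1, n), n_classes - 1):
        bounds = (0, *cuts, n)
        groups = [data[bounds[i]:bounds[i + 1]] for i in range(n_classes)]
        cost = sum(sum_sq_dev(g) for g in groups)
        if best_cost is None or cost < best_cost:
            best_cost, best_groups = cost, groups
    return best_groups


# Hypothetical density values with three visible clusters.
densities = [120, 150, 180, 900, 950, 1000, 4800, 5200]
groups = natural_breaks(densities, 3)
for g in groups:
    print(g)
```

The classes follow the gaps in the data rather than fixed widths or fixed counts, which is what the histogram view is meant to show.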

At the beginning of Exercise 1 we said that choropleth data must be ratio data. Look now at your data table for Atlanta.

Question: List all of the attributes for your data table. State whether you think each attribute is a ratio, is not a ratio, or you are not sure.

Decide on a ratio variable field from your table which you will use in this exercise. You may not use the popden field that you created.

Question: What is the name of the attribute you have chosen? How do you know that it is a ratio variable?

Now you will copy your existing data frame so that you have a total of four data frames with the same atlanta.shp data.

[pic]

Now you will classify your variable using each of the classification methods. You will print one map with the four different classifications to turn in with your lab. Make certain that you use layout, or add text to your view so that your TA or instructor can tell which classification method you used for each map.

Open the Properties window for the Atlanta layer in the first data frame. Select the Symbology tab. Click on the Classify button. The Classification window will pop up.

[pic]

Pull down the Classification Method menu, and set the method to Equal interval. Notice what has happened to the histogram in the classification window. Click OK to close the Classification window. In the Properties window, click OK to apply the changes and close the window. Label this data layer using the text tool. Be sure to include the variable being classified.
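As a rough illustration of what Equal Interval does, the Python sketch below divides the data range into classes of equal width and counts how many values fall in each. The sample values and function names are hypothetical, and the treatment of values that sit exactly on a break (assigned to the lower class) is an assumption of this sketch.

```python
from bisect import bisect_left


def equal_interval_breaks(values, n_classes):
    """Return the upper bound of each class for equal-width classes."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    breaks = [lo + width * i for i in range(1, n_classes)]
    breaks.append(hi)  # use the exact maximum as the last upper bound
    return breaks


def class_counts(values, breaks):
    """Count the values in each class; a value equal to a break
    falls in the lower class."""
    counts = [0] * len(breaks)
    for v in values:
        counts[bisect_left(breaks, v)] += 1
    return counts


# Hypothetical skewed values: most are small, one is very large.
densities = [0, 10, 20, 30, 100]
breaks = equal_interval_breaks(densities, 4)
counts = class_counts(densities, breaks)
print("upper bounds:", breaks)
print("counts:      ", counts)
```

With skewed data such as these, most values land in the first class and one class is empty, which is worth comparing against the histogram in the Classification window.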

Repeat this technique in your remaining data frames for the Quantile, Natural Breaks, and Standard Deviation classification methods. Print your map to turn in.
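The Quantile and Standard Deviation methods can be sketched in the same spirit. In the Python example below, the quantile breaks use a simple rank-based rule that puts roughly the same number of values in each class, and the standard deviation breaks are placed at whole multiples of the standard deviation on either side of the mean. Both rules are assumptions made for illustration; the exact conventions used by ArcGIS may differ, and the sample values are hypothetical.

```python
from math import ceil, floor
from statistics import mean, pstdev


def quantile_breaks(values, n_classes):
    """Upper bound of each class when classes hold equal counts."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")
    # Index of the last value in class i, using integer ceiling division.
    return [data[(n * i + n_classes - 1) // n_classes - 1]
            for i in range(1, n_classes + 1)]


def std_dev_breaks(values, interval=1.0):
    """Breaks at mean + k * interval * sd that fall inside the data range."""
    m = mean(values)
    sd = pstdev(values)  # population standard deviation
    if sd == 0:
        return []
    step = interval * sd
    lo, hi = min(values), max(values)
    k_min = ceil((lo - m) / step)
    k_max = floor((hi - m) / step)
    candidates = [m + k * step for k in range(k_min, k_max + 1)]
    return [b for b in candidates if lo < b < hi]


skewed = [1, 2, 3, 4, 10, 20, 30, 400]
q_breaks = quantile_breaks(skewed, 4)
print("quantile upper bounds:", q_breaks)

symmetric = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, standard deviation 2
s_breaks = std_dev_breaks(symmetric)
print("standard deviation breaks:", s_breaks)
```

Quantile classes always hold similar counts, so their widths vary widely on skewed data, while standard deviation classes describe how far each value lies from the mean.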

Question: What do the histograms in the Classification window show? How do they change with the different classifications?

Question: Write a discussion of the differences and similarities between your maps. Do different approaches create different patterns? Do you see a spatial pattern in your variable? If so, what does the pattern mean? Which classification method do you like best for your variable?

Conclusion

In this lab we have explored some of the database issues related to spatial classification. We have seen that when small errors or irregularities occur in a data table, extreme errors can be produced through mathematical manipulations. In order to use mathematical or statistical methods it is very important to develop the ability to spot problems in the data, diagnose, and solve them. In addition, we experimented with creating some simple choropleth maps. Classification is a major area of cartography, and has been covered here very briefly. Further experimentation with the classification methods in ArcView can help to develop a deeper understanding of the effects of classification.
