American Community Survey Data (ACS)

The Florida House of Representatives provides Census American Community Survey (ACS) data within its online redistricting tool, MyDistrictBuilder, found at floridaredistricting.. MyDistrictBuilder allows anyone with an internet-connected computer to draw new districts and submit them to the legislature. The ACS data is the result of five years of surveys collected by the U.S. Census Bureau between January 1, 2005 and December 31, 2009 using its long form. The Census Bureau used the long form to obtain detailed social and economic information from households.

The ACS data on the House’s MyDistrictBuilder website has over 14,000 demographic values that will enable the House to identify compact communities of interest. The goal in providing the ACS data is to give the public, and eventually the legislature, as much information as possible while drawing districts. When districts are drawn with this information, the needs of the communities being represented are more likely to be met. With this data we will be able to identify such things as language minorities, poverty, and transportation, education, veteran and senior citizen needs.

The ACS data is not as detailed as the census redistricting Public Law (PL) data. The PL data is reported down to the census block level (often a city block), whereas the ACS data is reported at the block group and tract levels. To give a sense of the level of aggregation involved, consider that the average population of a block group and of a tract in the 2009 Florida census data was 2,000 and 5,800, respectively. Still, with this new data the public as well as the legislature will be able to identify many compact groups with similar needs as new legislative districts are drawn.

How did we process this ACS data?

To start, we downloaded the data that can be found on the census website at . The House staff then assembled the data into a number of Microsoft Access databases, using the Census 2005-2009_SummaryFileXLS from the same census site to get our data structure. Next, the migration tool that comes with MySQL was used to load this data into a MySQL database as over 100 flat-file-style tables, with the structure and column names supplied by the Census ‘SEQ’ files.

Next, we had to fit this data into the current structure of the MyDistrictBuilder application. MyDistrictBuilder uses a dynamic data model for all data. This model is made up of three tables: the geography (Regions) table, the Data table, and the Dictionary table. A brief description of the purpose and structure of each table follows:

• Geography: The geography table has a unique identifier which we call a RegionID number. This number is unique across all of the levels of census geography (Block, Block Group, Tract, County and State). The other column in this file is the geography description. Basically this is a string of the latitude and longitude of the points that make up each census shape.

• Data: This table has three fields. The first is the RegionID field, which points to the Geography on the map. The second is the DictionaryID field, which points to the description of the data. The third is a text field that allows us to store any kind of data; it is basically unlimited in size. Together, the RegionID and DictionaryID make a unique identifier for each record in the data file.

• Dictionary: This is the table that each “DictionaryID” in the “Data” table points to. It tells us what kind of data the DictionaryID refers to. It also gives the name to use in column headings or ‘mouse overs’ to describe the data in detail. Anything we need to know about the data is stored in this file. A typical file will look something like the example below.

[pic]
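
As an illustration only, the three-table model described above can be sketched in Python with an in-memory SQLite database. The table names, column names and sample values below are assumptions made for the sketch; they are not the names used inside MyDistrictBuilder.

```python
import sqlite3

# Hypothetical schema: the report describes the role of each table,
# not its actual column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE regions (
    region_id INTEGER PRIMARY KEY,  -- unique across Block, Block Group, Tract, County, State
    shape     TEXT NOT NULL         -- string of latitude/longitude points
);
CREATE TABLE dictionary (
    dictionary_id INTEGER PRIMARY KEY,
    header        TEXT NOT NULL,    -- key used to match Census column names
    description   TEXT NOT NULL     -- text for column headings or mouse-overs
);
CREATE TABLE data (
    region_id     INTEGER NOT NULL REFERENCES regions(region_id),
    dictionary_id INTEGER NOT NULL REFERENCES dictionary(dictionary_id),
    value         TEXT,             -- free-form text so any kind of data fits
    PRIMARY KEY (region_id, dictionary_id)
);
""")

# Made-up sample rows, for illustration only.
conn.execute("INSERT INTO regions VALUES (1, '30.44,-84.28 30.45,-84.28 30.45,-84.27')")
conn.execute("INSERT INTO dictionary VALUES (150000, 'B01003_001', 'Total population')")
conn.execute("INSERT INTO data VALUES (1, 150000, '2000')")

# Look up a value together with its description.
row = conn.execute("""
    SELECT d.description, x.value
    FROM data AS x JOIN dictionary AS d USING (dictionary_id)
    WHERE x.region_id = 1
""").fetchone()
print(row)
```

Because every value is stored as one (RegionID, DictionaryID, value) row, a new data source only adds Dictionary and Data rows rather than new columns.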

The next stage was to take the Census ‘ACS2009_5-Year_TableShells’ file and import it into our dictionary file. We then created a new column we called header. This was a combination of the ‘Unique ID’ and an underscore character to separate it from the ‘line’ column. You can see this row in the example above. As we processed the data we discovered that we needed to identify the heading and subheading fields. To tie them to the data items they belong with, we appended ‘00A’ and ‘00B’ to mark a heading and a subheading field in the final report. The census also supplied a category field to group the major sections together. We should have allowed a record for this value in our list of IDs as well. We then gave each line a unique dictionary ID number starting at 150,000. We started at 150,000 because in our system we try to keep different data sources in different number ranges.
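
A minimal sketch of this keying scheme, assuming a simplified table-shell layout: the field names `unique_id`, `line`, `kind` and `stub`, and the sample rows, are hypothetical and only stand in for the Census file.

```python
from itertools import count

# Made-up rows standing in for the table-shell file.
shell_rows = [
    {"unique_id": "B01003", "line": "", "kind": "heading", "stub": "TOTAL POPULATION"},
    {"unique_id": "B01003", "line": "", "kind": "subheading", "stub": "Universe: Total population"},
    {"unique_id": "B01003", "line": "001", "kind": "item", "stub": "Total"},
]

# Suffixes used in place of a line number for heading and subheading rows.
SUFFIX = {"heading": "00A", "subheading": "00B"}
FIRST_ACS_ID = 150_000  # ACS entries get their own number range


def build_dictionary(rows, first_id=FIRST_ACS_ID):
    """Return {header: {"dictionary_id": ..., "label": ...}} for the given rows."""
    ids = count(first_id)
    entries = {}
    for row in rows:
        line = SUFFIX.get(row["kind"], row["line"])
        header = f'{row["unique_id"]}_{line}'
        entries[header] = {"dictionary_id": next(ids), "label": row["stub"]}
    return entries


acs_dictionary = build_dictionary(shell_rows)
for header, entry in acs_dictionary.items():
    print(header, entry["dictionary_id"], entry["label"])
```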

The next task was to create the Geography look-up table. To do this we went to ‘’ and downloaded the Florida geography file. This file contains the columns LOGRECNO and GEOID. Using the GEOID column we were able to look up the matching RegionID number in our system. We put this information together so we could tie the census data to our own GIS data. We then added the resulting file to our MySQL database and called it ‘acsgeo’.

We then wrote a PHP program to take the 100-plus Census flat files and put the data into our data structure. The PHP program essentially opens each of the 100+ ‘SEQ’ files and uses the PHP function mysql_fetch_field to get the names of the column headings. These headings are then matched against our dictionary file to get the corresponding dictionary identification number. Next, the PHP program takes the LOGRECNO on each record and matches it against the ‘acsgeo’ file to get the RegionID number. These two numbers and the data value in each cell are then added to the acs5yr2009 table. This table had almost 57 million records when we were done, even after we excluded the ACS imputation data and values that were averages. These were taken out because they do not lend themselves to creating new numbers as we build new districts.
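
The matching step can be sketched as follows. This is a Python illustration under stated assumptions, not the PHP program itself: the function name `to_triples`, the in-memory lookups and the sample values are all hypothetical.

```python
def to_triples(columns, rows, header_to_dict_id, logrecno_to_region):
    """Yield (region_id, dictionary_id, value) for every matched cell.

    columns            -- column names of one 'SEQ' table
    rows               -- the table's records, as tuples in column order
    header_to_dict_id  -- dictionary lookup: column heading -> dictionary ID
    logrecno_to_region -- 'acsgeo' lookup: LOGRECNO -> RegionID
    """
    for row in rows:
        record = dict(zip(columns, row))
        region_id = logrecno_to_region.get(record["LOGRECNO"])
        if region_id is None:
            continue  # geography is not in the lookup table
        for name, value in record.items():
            dict_id = header_to_dict_id.get(name)
            if dict_id is None or value in (None, ""):
                continue  # not a dictionary column, or an empty cell
            yield region_id, dict_id, value


# Made-up sample data, for illustration only.
columns = ["LOGRECNO", "B01003_001", "B01001_002"]
rows = [("0000001", "2000", "980"), ("0000002", "", "450")]
header_to_dict_id = {"B01003_001": 150002, "B01001_002": 150010}
logrecno_to_region = {"0000001": 11, "0000002": 12}

triples = list(to_triples(columns, rows, header_to_dict_id, logrecno_to_region))
print(triples)
```

In this sketch, leaving a column out of the dictionary lookup is enough to exclude it from the output, which is one simple way to drop columns that should not be loaded.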

Next we had to work out how to carry data reported only at the tract level down to the block group level, so that our program had only block group values to work with. To do this we divided the ACS total population count of each block group by the total population of its tract. Each tract value was then multiplied by this percentage to get the block group’s portion of that number.
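
As a worked example of this apportionment, here is a short Python sketch. The function name and the numbers are made up for illustration; only the rule (block group population divided by tract population, times the tract value) comes from the description above.

```python
def apportion_tract_value(tract_value, tract_pop, block_group_pops):
    """Split one tract-level value among its block groups by population share."""
    if tract_pop <= 0:
        # No population to apportion by; give every block group zero.
        return {bg: 0.0 for bg in block_group_pops}
    return {
        bg: tract_value * pop / tract_pop
        for bg, pop in block_group_pops.items()
    }


# A tract of 5,800 people made up of three block groups (made-up figures).
block_group_pops = {"BG1": 2000, "BG2": 1800, "BG3": 2000}
shares = apportion_tract_value(tract_value=290, tract_pop=5800,
                               block_group_pops=block_group_pops)
print(shares)
```

Here BG1 holds 2,000 of the tract's 5,800 people, so it receives 2000 / 5800 of the tract value of 290, which is 100. The block group portions add back up to the tract value.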

The next step was to develop an interpolation file telling us what percentage of each geography to include when totaling the different categories. For this we used a process known as areal interpolation, which compares the area of the source geographies with the area of the target geographies. County, tract and block group were set to 100%, as they are always as big as or bigger than the smallest level at which the data was aggregated. For census blocks whose total population count did not match at both the block and block group levels, we divided the block’s area in acres by the area of its block group and interpolated that percentage of each block group value to the block. We ended up with a table like the one you see to the left.
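
A minimal sketch of the area-based weights, assuming the block and block group areas are already known in acres. The function name, identifiers and acreages are hypothetical.

```python
def block_weights(block_acres, block_group_acres):
    """Return each block's share of its block group, by area."""
    if block_group_acres <= 0:
        raise ValueError("block group area must be positive")
    return {block: acres / block_group_acres
            for block, acres in block_acres.items()}


# A 40-acre block group split into two blocks (made-up figures).
weights = block_weights({"block_a": 10.0, "block_b": 30.0}, block_group_acres=40.0)

# Counties, tracts and block groups are always taken whole.
weights["block_group_1"] = 1.0
print(weights)
```

With these figures, a block group value of 2,000 would contribute 500 to block_a and 1,500 to block_b.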

To create a report for any district, we can then take the RegionIDs, or the list of census counties, tracts, block groups and blocks, and multiply each value by the percentage in the interpolation file to get the number to show for each data value in the ACS table. Because a district can be made up of around 15,000 UOGID RegionID parts, and these are summed over more than 14,000 ACS categories, this query can take almost a minute for a single district.
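
The aggregation can be sketched in Python as below. The layout of the interpolation file is an assumption here: each district part is mapped to the region that holds its data plus a percentage. The names and numbers are illustrative only.

```python
from collections import defaultdict


def district_report(district_regions, interpolate, data):
    """Sum weighted values per dictionary ID over the parts of one district.

    district_regions -- RegionIDs that make up the district
    interpolate      -- RegionID -> (source RegionID holding the data, weight)
    data             -- (RegionID, DictionaryID) -> numeric value
    """
    # Group the data by region once so each district part is a quick lookup.
    by_region = defaultdict(list)
    for (region_id, dict_id), value in data.items():
        by_region[region_id].append((dict_id, value))

    totals = defaultdict(float)
    for region_id in district_regions:
        source_id, weight = interpolate[region_id]
        for dict_id, value in by_region[source_id]:
            totals[dict_id] += weight * value
    return dict(totals)


# Block group 100 is taken whole; block 201 is a quarter of block group 200.
data = {(100, 150002): 2000.0, (200, 150002): 1000.0}
interpolate = {100: (100, 1.0), 201: (200, 0.25)}
report = district_report([100, 201], interpolate, data)
print(report)
```

The work grows with the number of district parts times the number of categories, which is consistent with the long query time noted above.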

After this is done we get a report that allows anyone with an internet connection to select an area as a district and then print a report that gives very detailed demographics for that area. Go to MyDistrictBuilder and build a district, save the result and then launch the report function to see the demographics of your new district.

APPENDIX I – PHP program to process Census ACS data
