


Scripting a data export workflow using Collector

Craig Mueller

Summary

Over the last half year I have migrated my work team at the Department of Conservation Abandoned Mine Lands Program from Trimble Terrasync to ESRI's Collector app for iOS for field data collection. Our program regularly gets contracts to inventory abandoned mines around the state for hazards, which makes for a lot of field work. We still use a legacy Access database for data storage and analysis, so the new Collector field data must be converted and processed to be compatible with the previously established import routine in the database. For this project I created a series of six Python scripts, along with an operating manual for them, to be run as tools within ArcMap by my team to shepherd data from our field devices into Access while ensuring data integrity along the way. These scripts and the move to Collector are saving many hours of data manipulation and entry compared to our previous workflow.

Purpose

This spring I have been working on finalizing the AMLP field-data-collection-to-database workflow using ESRI Collector. During the fall and winter I created templates and documentation to facilitate the use of the app for our field data collection, designing the template to closely match our previous Trimble Terrasync data dictionary. An IT associate helped modify a previous script to export the field data and prepare it for import into our Access database; however, after she left our department I was left to manage the script and the data conversion. I started by largely reworking her script to fix several errors (not syntax errors, but data integrity issues) and kept modifying the script as I modified the Collector template. Taking over as the steward of our process gave me the freedom to get creative and solve the problems and inefficiencies I saw in the workflow myself (my favorite thing to do at work is problem solving). The purpose of this project was to create all of the Python scripts needed to simplify and speed up the data processing required to get our data ready for input into our database, while increasing the quality of the data at the same time.

In the end this meant six scripts: two to convert and merge the runtime geodatabases pulled from the devices when syncing fails, a QA analysis script, a site accounting script that automates the daily counts we keep in our field collection tracking spreadsheet, a script that builds the export feature class for the database, and a script that writes the final Excel spreadsheet for import. Each is described in detail below.

Description

In this section I will first overview the workflow again, and then describe each script individually.

The data export procedure for our Collector workflow became more complicated due to regular sync issues with ArcGIS Online, where our feature services are hosted. Our field work is largely in remote areas of the state where data connectivity ranges from highly limited to unavailable. We opted to purchase devices without data plans, given that we would largely not be able to use the connection anyway, and instead rely on Collector's offline capability. At the end of a field trip we have frequently come home with 2-3 gigabytes of data per device to be synced, which has been very hit or miss. Because of this I had to create several scripts: first to convert the runtime geodatabases that can be exported from the devices using iTunes, and second to merge the file geodatabases created by the previous step into one data file that matches what would be exported from ArcGIS Online if that method were to work.

I then realized that we could automatically do a great deal of QA on the data before it enters the database using Python, so I created a QA analysis script. We also have a field collection tracking spreadsheet where we record the number of new sites, mine features, and TOMS (topographically occurring mine symbols) we visit each day; I determined I could largely automate these counts using Python and GIS where we had been determining them manually before. I also decided to create a script to simplify the export of the data into an Excel spreadsheet for import into our database (I know it all sounds very archaic; I'm now working on migrating us to SQL Server). All of these scripts are run as tools in ArcMap.

The finalized workflow is diagrammed below:

[pic]

Script 1 Runtime GDB Converter:

The first tool in my workflow converts the *.geodatabase files pulled from our devices in the event of an unsuccessful sync. This tool is only necessary when the sync fails, and is outlined in 2.2 on the diagram (2.2 is the corresponding section in the processing manual I wrote). Our office is still running ArcGIS 10.2.2, apart from my machine and one in a common area running 10.4, so the tool can only be run on those two machines because the "Copy Runtime Geodatabase to File Geodatabase" tool it requires was not made available until ArcGIS 10.3.

This tool requires as input only the *.geodatabase files from each phone; the output location is derived from the location of the input files, since I have a set directory structure and the runtime geodatabase files are located in the Backups folder (I accomplished this with a function I use elsewhere called lvl_down, which backs a path out one directory level using os.path.split). The tool accepts as many files as exist.
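As an illustration, a helper along these lines might look like the following sketch; the lvl_down name comes from the description above, but the exact signature and the example path are assumptions.

    import os

    def lvl_down(path, levels=1):
        # Strip trailing path components with os.path.split, e.g. turning the
        # path of a *.geodatabase file into the path of its Backups folder.
        # (Hypothetical signature; the real helper may differ.)
        for _ in range(levels):
            path = os.path.split(path)[0]
        return path

    # e.g. lvl_down(r"...\GIS\Backups\device1.geodatabase") -> r"...\GIS\Backups"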

[pic]

The tool runs through the inputs in a for loop. It first moves each runtime GDB to a temp folder created on the desktop using shutil.copy and getpass.getuser (this works around a problem with the conversion tool not running when accessing data on a network share). It then runs CopyRuntimeGdbToFileGdb_conversion, outputting a new file GDB that is simply numbered incrementally from 1 (e.g. 1.gdb, 2.gdb); since these are intermediates for the next tool they don't need specific names. The file GDBs are written back into the Backups folder using the lvl_down function. Once the for loop is done, the tool outputs a message saying how many geodatabases have been created.
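In outline, that loop looks something like the sketch below; the temp-folder location and parameter handling are illustrative rather than the exact production code.

    import os
    import shutil
    import getpass
    import arcpy

    # Multivalue parameter of *.geodatabase files pulled from the devices.
    runtime_gdbs = arcpy.GetParameterAsText(0).split(";")

    # Stage copies on the local desktop; the conversion tool had trouble
    # reading runtime geodatabases directly from a network share.
    temp_dir = os.path.join(r"C:\Users", getpass.getuser(), "Desktop", "temp")
    if not os.path.exists(temp_dir):
        os.makedirs(temp_dir)

    for i, runtime_gdb in enumerate(runtime_gdbs, start=1):
        local_copy = os.path.join(temp_dir, os.path.basename(runtime_gdb))
        shutil.copy(runtime_gdb, local_copy)

        # Write the converted file GDB back into the Backups folder the
        # runtime GDB came from, numbered incrementally (1.gdb, 2.gdb, ...).
        backups_folder = os.path.split(runtime_gdb)[0]
        out_gdb = os.path.join(backups_folder, "{0}.gdb".format(i))
        arcpy.CopyRuntimeGdbToFileGdb_conversion(local_copy, out_gdb)

    arcpy.AddMessage("{0} file geodatabase(s) created.".format(len(runtime_gdbs)))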

Script 2 GDB Merger:

The second script merges all of the previously created 1.gdb, 2.gdb, etc. into a single GDB and brings the workflow back onto the same path it would follow if Collector had synced the data properly. The output GDB is automatically named after the overarching project folder, which describes the field work (e.g. 2016.05.01.DPT.CBM.PintoMountains), using a new last_lvl function that also uses os.path.split. Because periods are not allowed in geodatabase names, I use .replace to swap periods for underscores. The tool requires only the temporary GDBs as input and again relies on the set directory structure, backing out of the Backups folder and into the GIS folder using the lvl_down function. The tool relies heavily on code written by Ben Nadler that I found online.
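The naming logic amounts to something like this sketch; the project path is hypothetical, and last_lvl is reconstructed from the description above.

    import os

    def last_lvl(path):
        # Return the last component of a path using os.path.split.
        return os.path.split(path)[1]

    # Hypothetical project folder; periods are swapped for underscores because
    # they are not allowed in geodatabase names.
    project_folder = r"X:\Fieldwork\2016.05.01.DPT.CBM.PintoMountains"
    out_gdb_name = last_lvl(project_folder).replace(".", "_") + ".gdb"
    # -> "2016_05_01_DPT_CBM_PintoMountains.gdb"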

[pic]

First the tool uses GetParameterAsText; the parameter is defined as a file with multiple inputs allowed. That input is split using .split(";") to create a list, and the first GDB is arbitrarily chosen as the destination for all of the other GDBs to be inserted into. Lastly, in preparation, editor tracking is turned off on the destination so the tracking fields are not affected by the merge. All of the functions required for Ben's code are then defined (I promise I understand them; I'm skipping their workings for brevity's sake, but the bulk of what they do is append the data from one GDB into another while preserving GlobalIDs so that the attachment table and relationship don't break). Using a for loop I run Ben's merge routine on each GDB in the list after the first, merging it into the first and writing a message in the script window saying how many have been merged.
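In outline, the surrounding loop looks roughly like the sketch below; merge_gdb is a placeholder for Ben Nadler's append-with-GlobalIDs routine, which is not reproduced here, and the editor-tracking loop assumes standalone feature classes.

    import arcpy

    # Multivalue input of the temporary 1.gdb, 2.gdb, ... geodatabases.
    gdb_list = arcpy.GetParameterAsText(0).split(";")
    destination = gdb_list[0]  # first GDB is arbitrarily chosen as the merge target

    def merge_gdb(source_gdb, destination_gdb):
        # Placeholder: Ben Nadler's routine appends the feature classes and
        # attachment tables from source_gdb into destination_gdb while
        # preserving GlobalIDs so the attachment relationships survive.
        raise NotImplementedError("see Ben Nadler's merge code")

    # Turn off editor tracking in the destination so the merge does not
    # overwrite the created/edited fields captured in the field.
    arcpy.env.workspace = destination
    for fc in arcpy.ListFeatureClasses():
        arcpy.DisableEditorTracking_management(
            fc, "DISABLE_CREATOR", "DISABLE_CREATION_DATE",
            "DISABLE_LAST_EDITOR", "DISABLE_LAST_EDIT_DATE")

    # Merge everything after the first GDB into the first, reporting progress.
    for n, source_gdb in enumerate(gdb_list[1:], start=1):
        merge_gdb(source_gdb, destination)
        arcpy.AddMessage("{0} of {1} geodatabases merged.".format(n, len(gdb_list) - 1))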

I found the resulting GDB incredibly slow to work with, and running Compact on it didn't help, but after some experimentation I discovered that exporting the data and schema to XML and importing that into an empty GDB fixed the speed issue. So once all GDBs have been merged into the first of the inputs, I create an empty GDB in the GIS folder, named with the project-folder name built earlier, export the merged GDB to XML, and import the XML into the new blank GDB. This tool takes a while to run, but the time and rage saved by doing the export/import makes it worth it, as does the fact that it only takes a few seconds to get running (I'm loving automation).
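The speed fix boils down to three geoprocessing calls along these lines (the paths and names here are illustrative):

    import os
    import arcpy

    merged_gdb = r"C:\Fieldwork\Project\GIS\Backups\1.gdb"   # slow merged GDB (example path)
    gis_folder = r"C:\Fieldwork\Project\GIS"                  # example output folder
    final_name = "2016_05_01_DPT_CBM_PintoMountains.gdb"
    xml_doc = os.path.join(gis_folder, "transfer.xml")

    # Create the empty destination GDB, export the merged GDB (schema and data)
    # to an XML workspace document, then import it into the clean GDB.
    arcpy.CreateFileGDB_management(gis_folder, final_name)
    arcpy.ExportXMLWorkspaceDocument_management(merged_gdb, xml_doc, "DATA")
    arcpy.ImportXMLWorkspaceDocument_management(
        os.path.join(gis_folder, final_name), xml_doc, "DATA")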

Script 3 Site Accounting:

As discussed before, we used to manually count the number of new mine features inventoried and TOMS visited each day, which required cumbersome queries and led to a lot of errors, so I opted to automate the process. I created another script and tool in ArcMap that requires only the Collector feature class in the new GDB made by the last tool, plus the start and end date of the trip to constrain the edit dates on the TOMS layer (editor tracking is enabled on it).

[pic]

Besides the user-defined parameters, the script also uses a template layer file I created to symbolize the data by date, which helps visualize the work done each day. The script first creates a temporary in_memory feature class from the Collector data input and converts the created_date field from UTC to PST (this is needed because Collector must collect in UTC for universality). A layer is then created using the template symbology and added to the map. This process is repeated for both TOMS and another dataset called ExtraFeatures, which is also used to guide field workers to features (it is a point feature class of features interpreted from aerial imagery). Lastly the script writes a report in the process window of the number of new features collected by date and the number of TOMS visited by date. This is accomplished by using a search cursor to build a list of dates in the date range, using set() to keep only the unique values, and sorting them with sorted(). It then uses .count() to count all values in the original cursor-defined list that fall on each date and prints that number in the message (with my goofy extra if statement to change the plurality of words when the count is not 1). This saves a lot of time and error.
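A stripped-down sketch of the counting logic is below; the parameter date format and the fixed -8 hour UTC-to-PST offset are simplifying assumptions, not the exact production code.

    import datetime
    import arcpy

    features = arcpy.GetParameterAsText(0)   # Collector feature class
    start = arcpy.GetParameterAsText(1)      # trip start date, e.g. "2016-05-01"
    end = arcpy.GetParameterAsText(2)        # trip end date

    start_date = datetime.datetime.strptime(start, "%Y-%m-%d").date()
    end_date = datetime.datetime.strptime(end, "%Y-%m-%d").date()
    pst_offset = datetime.timedelta(hours=-8)   # simple UTC -> PST shift for illustration

    # Build a list of collection dates (shifted to PST) inside the trip window.
    dates = []
    with arcpy.da.SearchCursor(features, ["created_date"]) as cursor:
        for (created,) in cursor:
            if created is None:
                continue
            local_day = (created + pst_offset).date()
            if start_date <= local_day <= end_date:
                dates.append(local_day)

    # Report the count of new features per unique day, with simple pluralization.
    for day in sorted(set(dates)):
        n = dates.count(day)
        arcpy.AddMessage("{0}: {1} new feature{2}".format(day, n, "" if n == 1 else "s"))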

Script 4 Collector QA generator:

The next script in the workflow was created because I got the idea to check for common errors in bulk before moving data to the database. After attempting to find several common errors by reading through a whole data table, I determined it was far easier to make definition queries related to the errors and clean the data up from there. This script takes the Collector feature class as its only user input and creates a layer for each error condition, but only if the data actually contains that error; if not, no layer is created. It also limits the fields in each layer to only those needed to fix the error. I also create a layer symbolized on the field Access, which we fill in according to how easy it is to access the feature. This allows an easy final sanity check on the results, because the values should change in bands of distance from roads. The QA analyses are as follows: checks for null hazard, null access, null GPSPerson/NoteTaker, openings with no aspect, openings without a bat rank, waste piles with no color, reversed x/y dimensions, features with missing dimensions, sites without site points, features outside of mine boundaries, site points outside of mine boundaries, features without attachments, and features that are too close together.

Mine site boundary polygons and previously inventoried features from our database are used as fixed inputs for several analyses, and several template layers are also defined. After the access-visualization QA, the majority of the checks are set up to be as modular as possible. The pattern is a defined test_lyr name, a defined query, and defined field visibility for the MakeFeatureLayer_management tool. Each QA then uses SelectLayerByAttribute_management with the defined query to select erroneous points; if the count of that selection is greater than zero, a layer is created using the previously defined name and field visibility and a template symbology. A variable "total_errors" increases with every error found, for the sake of harassing the user at the end, and another variable "error_count" counts the errors present in that analysis; a unique message is printed with the number of errors using .format(). I set this up to be largely copy-and-paste for all of the attribute-based QA analyses, but several spatial ones required customization beyond the norm, and one required using AddJoin_management to join the attachment table to the feature class and IS NULL in the query to find records that don't have any attachments (hard to fix back in the office, but I wanted to call attention to it when field users were not taking pictures, which is a requirement). The spatial QA analyses required SelectLayerByLocation_management for several checks and GenerateNearTable_analysis to find features that are too close together. The script window produces a report for each analysis, and at the bottom a total count of errors and an encouraging message chosen at random from a list using randint.
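A minimal sketch of the copy-and-paste pattern is below; the layer names and queries are illustrative rather than the production values, and the real tool also limits field visibility and applies the template symbology to any layer it keeps.

    import arcpy

    collector_fc = arcpy.GetParameterAsText(0)   # Collector feature class (the only user input)
    total_errors = 0

    # Two example attribute checks in the modular pattern described above.
    checks = [
        ("QA_Null_Hazard", "Hazard IS NULL"),
        ("QA_Null_Access", "Access IS NULL"),
    ]

    for test_lyr, query in checks:
        # Make a layer, select the erroneous records, and count the selection.
        lyr = arcpy.MakeFeatureLayer_management(collector_fc, test_lyr)
        arcpy.SelectLayerByAttribute_management(lyr, "NEW_SELECTION", query)
        error_count = int(arcpy.GetCount_management(lyr).getOutput(0))

        if error_count > 0:
            # Production tool keeps this layer, restricts its visible fields,
            # and applies template symbology so the errors can be fixed.
            total_errors += error_count
        else:
            arcpy.Delete_management(lyr)   # no errors: discard the empty layer

        arcpy.AddMessage("{0}: {1} error{2} found.".format(
            test_lyr, error_count, "" if error_count == 1 else "s"))

    arcpy.AddMessage("Total errors found: {0}".format(total_errors))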

[pic]

Script 5 Collector PExport generator:

This script creates what our program called the PExport under the previous workflow: a feature class with all of the extra fields needed for database import. Since Access is not a spatial database, we hard-code the X and Y of every point into separate fields at this stage and also record the topo quad location, PLSS information (township, range, section, subsection, etc.), and county. Much of this process will be deprecated when we move our database to SQL Server. This tool also creates a unique ID for the database that we call GIS_ID, converts the time to PST, and lastly exports all of the photos in the attachments table into JPG files with unique names based on the GIS_ID of their respective feature. Some of the script was created by Kit Lai, a previous IT associate of our department, but I rewrote large amounts of it (though not the general structure yet, which is why it's a little different from my others).

This tool checks out the Spatial Analyst extension for use of the ExtractValuesToPoints tool (described below) and requires the Collector feature class as its input parameter. The tool first uses if statements to check whether several fields are fully populated; if they are not, they will cause errors soon after. Next, the GlobalID and ObjectID are made permanent using CalculateField_management, and the created_date field is converted to PST and made permanent in an ActualDate field. Then the elevation of each point is determined by running ExtractValuesToPoints against a DEM input, since Collector only collects X and Y, and the elevation is converted to feet.
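A sketch of the elevation step, assuming a DEM in meters and an illustrative output field name:

    import os
    import arcpy
    from arcpy.sa import ExtractValuesToPoints

    arcpy.CheckOutExtension("Spatial")

    collector_fc = arcpy.GetParameterAsText(0)             # Collector feature class
    dem = r"C:\GIS\DEM\statewide_dem_m"                    # assumed DEM raster (meters)
    points_elev = os.path.join(arcpy.env.scratchGDB, "points_elev")

    # Sample the DEM at each point; the sampled value lands in a RASTERVALU field.
    ExtractValuesToPoints(collector_fc, dem, points_elev)

    # Convert the sampled elevation from meters to feet into a new field
    # (the field name Elevation_ft is an assumption).
    arcpy.AddField_management(points_elev, "Elevation_ft", "DOUBLE")
    arcpy.CalculateField_management(points_elev, "Elevation_ft",
                                    "!RASTERVALU! * 3.28084", "PYTHON_9.3")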

The rest of the script sits behind another if statement verifying that there are features in the input. A series of SpatialJoin_analysis calls adds the location attribute data and creates the final PExport feature class (which will then have other fields added to it). Next, the GIS_ID is created from GPS_Person and components of the date using several if statements (simplifying a lot). X and Y fields, as well as Lat and Long, are then added and populated using CalculateField_management.
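Hard-coding the coordinates can be done with geometry tokens in CalculateField_management, roughly as in this sketch (the field names here are assumptions):

    import arcpy

    pexport = arcpy.GetParameterAsText(0)   # final PExport feature class (example input)

    # Write each point's coordinates into permanent attribute fields so the
    # values survive the trip into the non-spatial Access database.
    for field, expression in [("X_Coord", "!SHAPE!.firstPoint.X"),
                              ("Y_Coord", "!SHAPE!.firstPoint.Y")]:
        arcpy.AddField_management(pexport, field, "DOUBLE")
        arcpy.CalculateField_management(pexport, field, expression, "PYTHON_9.3")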

The next step is to output the photos in the attachments table. A Boolean parameter allows this part of the script to be skipped if the user has already output the photos (for example, if the tool must be re-run). If the box is not checked, the script checks for photos; if there are any, it joins the attachments table to the feature class and then uses a search cursor to write each photo to the Photos folder (found using lvl_down) with open() and .write(attachment.tobytes()). The script uses the GIS_ID and attachment name (photo_1, etc.) as the final file naming scheme.
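The photo export follows the standard geodatabase-attachments pattern of reading the blob from the DATA field and writing it to disk; in this sketch the joined GIS_ID field and the photo folder path are assumptions.

    import os
    import arcpy

    attachments = arcpy.GetParameterAsText(0)   # attachments table joined to the feature class
    photo_dir = r"C:\Fieldwork\Project\Photos"  # example path; the real tool finds it with lvl_down

    # DATA holds the photo blob, ATT_NAME the original name (photo_1, ...);
    # GIS_ID comes across through the join and drives the file naming scheme.
    with arcpy.da.SearchCursor(attachments, ["DATA", "ATT_NAME", "GIS_ID"]) as cursor:
        for data, att_name, gis_id in cursor:
            out_name = "{0}_{1}.jpg".format(gis_id, os.path.splitext(att_name)[0])
            with open(os.path.join(photo_dir, out_name), "wb") as f:
                f.write(data.tobytes())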

Script 6 Excel Exporter:

The last script in my workflow is the Excel Exporter. I created this as an easier way to output the table for a given site for import into the Access database. Previously we selected all features of a given site, exported that selection as a DBF, and opened it in Excel to convert it to XLS. This was inefficient, and the fact that we had been using shapefiles and DBFs meant that no field could be longer than 256 characters (a hindrance, because the field notes we now write in Collector are frequently 500 or more characters, which previously necessitated several redundant fields). Now the tool exports straight to XLS, prints the GIS_ID of the site in the script window (which is used for entry into the database), and opens the spreadsheet in Excel so that spellcheck can be run on all of the descriptions at once before they are imported.

This script has three parameters: the PExport created previously, a feature set with the schema of a polygon layer so the user can quickly draw a polygon around the site, and the desired name of the spreadsheet.

[pic]

The tool first clips the points using Clip_analysis and the polygon drawn by the user. It then exports the clipped features to a spreadsheet using TableToExcel_conversion, with an output name built from the input text in a previously defined variable. Lastly the tool uses a search cursor with the query Feature_Ty = 'Site' to find the GIS_ID of the site point, prints it in the script window, and uses os.system() to open the spreadsheet.
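Condensed, the core of the tool is roughly the following (the output folder is illustrative):

    import os
    import arcpy

    pexport = arcpy.GetParameterAsText(0)     # PExport feature class
    site_poly = arcpy.GetParameterAsText(1)   # polygon drawn around the site by the user
    xls_name = arcpy.GetParameterAsText(2)    # desired spreadsheet name

    out_dir = r"C:\Fieldwork\Project\Exports"  # example output location
    clipped = os.path.join(arcpy.env.scratchGDB, "site_clip")
    out_xls = os.path.join(out_dir, xls_name + ".xls")

    # Clip the PExport points to the drawn polygon and push the table to Excel.
    arcpy.Clip_analysis(pexport, site_poly, clipped)
    arcpy.TableToExcel_conversion(clipped, out_xls)

    # Report the site point's GIS_ID for database entry, then open the
    # spreadsheet so spellcheck can be run on the descriptions.
    with arcpy.da.SearchCursor(clipped, ["GIS_ID"], "Feature_Ty = 'Site'") as cursor:
        for (gis_id,) in cursor:
            arcpy.AddMessage("Site GIS_ID: {0}".format(gis_id))
    os.system('start "" "{0}"'.format(out_xls))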

Challenges:

This project took a lot of learning and googling along the way. The simplest thing I battled (and continue to battle) is properly escaping "\". Some tools would end up with doubled "\\" in their output paths, but I have eventually gotten everything to run. I have also run into issues with not being able to delete intermediates and haven't found a resolution: ArcMap seems to hold on to the .geodatabase files and file GDBs, and Delete_management can't get rid of them because of the locks. I used PyScripter, since I'm writing these for work and need to be as efficient as I can be, and it significantly reduced indentation and spelling errors. My abilities came a long way this semester, considering I only started learning Python with Codecademy just before the semester began.
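For what it's worth, most of the escaping battles disappear when paths are built with raw strings and os.path.join; this is a general illustration rather than code from the tools.

    import os

    # Raw strings keep "\" literal, and os.path.join/normpath handle the
    # separators, so doubled "\\" stops sneaking into output paths.
    backups = r"X:\Fieldwork\2016.05.01.DPT.CBM.PintoMountains\GIS\Backups"  # hypothetical path
    out_gdb = os.path.normpath(os.path.join(backups, "1.gdb"))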
