
Quality Assured/Quality Control Meteorological Data Processing and Analysis

By: Nicholas M. Geyer

Intern of Arctic Region Supercomputing Center, Summer 2009

1. Abstract

The purpose of this internship was to implement quality assured/quality controlled (QA/QC) processing of data from five weather observing station networks covering 1978 to 2008, and to analyze the climate trends of the North Slope of Alaska by examining the effects of the coast and the Brooks Range on the wind fields. This was accomplished with three programs, run in order: a quality control and formatting program, an error analysis program for wind speed and direction, and a climatological analysis program. We then used the QA/QC’ed data for WRF model verification. Finally, we used Microsoft Excel to analyze the all-time monthly mean wind speeds and the 10th, 50th, and 95th percentile wind speeds, and to diagnose how regional climate relates to topographic differences. When the QA/QC’ed data replaced the original data in the WRF model verification, the model bias decreased by 0.05-0.065 m/s and the correlation between the model simulations and observations increased by 0.02-0.05, across a total of 47 model physics sensitivity tests conducted for September 2004. From our climatological analysis, we saw that the inland plains consistently recorded significantly greater wind speeds than the valleys on the far side of the Brooks Range. We also found that the Brooks Range causes a significantly greater decrease in wind speed from the windward side to the lee side of the range in winter than in summer. This information should be considered when siting settlements, pipelines, and shipping routes along the North Slope of Alaska. Future studies should compare our data against WRF simulations over longer periods, include the NARR reanalysis, and add more stations to verify our results.

2. Introduction (Tables, figures and acronyms are appended at the end of the paper)

The North Slope of Alaska and its nearby offshore region of the Arctic form an area that is now ecologically significant as well as economically important for the world’s natural resources. The decline of the Arctic’s natural ice cover is of significant importance in this area, particularly for the development of marine shipping and the extraction of oil (ACCAP, 2009). Prudhoe Bay, located on Alaska’s Beaufort Sea coast, is one of the largest oil fields in the world. New development opportunities exist along the Beaufort Sea coast and in the Chukchi Sea, as evidenced by the 2008 Oil and Gas lease sale that generated more than $2.6 billion in revenue to the U.S. A group of scientists at the University of Alaska Fairbanks (David Atkinson, Jeremy Krieger, Martha Shulski, Jing Zhang, and Xiangdong Zhang) has undertaken a significant contract to experiment with the WRF model in order to understand the area’s mesoscale meteorological features and the long-term climatological trends that have become so important to local economic development. Using the WRF model and the NARR reanalysis, we have been able to describe the climatic trends of the North Slope. The collected observational data, together with a buoy newly deployed in the Beaufort Sea, record the real-world changes and provide a foundation for model verification. However, because of the remote location and harsh climate experienced by the weather stations on the North Slope and its nearby offshore region, the data collected thus far have been skewed by weather-related problems such as anemometer icing.

The purpose of this particular internship is to aid the current work being done, through the implementation of QA/QC routines on the data collected from five major weather observing station networks: NCDC, MMS, C-MAN, RAWS, and WERC. The specific tasks of this internship include the following:

1. Implementation of QA/QC routines on vast amounts of meteorological data

2. Spatial and graphical display of the data

3. Climatological analysis of the data

Furthermore, the data checked by the QC programs will be used in the WRF model verification to test whether the model bias and the correlation between model simulations and observations improve when the QA/QC’ed data replace the original data. We expect the WRF model verification to show a better correlation and a reduced bias once errors and missing data have been removed or accounted for by the QC programs.

We do not fully understand the effects that the coast and the Brooks Range have on the North Slope climate. Using the WRF model output and the yearly trends from the station data, we will be able to describe the climatological, seasonal, annual, and monthly differences in wind speed along the North Slope as a function of regional or topographical location. The North Slope is composed of the Arctic coastline, which experiences seasonal ice cover, the Brooks Mountain Range, and an expansive tundra plain (Shulski and Wendler, 2007). We predict that the North Slope’s topography reduces wind speed to some extent, but which of these features contributes to the decrease is yet to be determined.

3. Methods

3.1 Quality Control

We began the QA/QC process by looking at the data archived over the past 30 years, approximately 1978 to 2008. As previously mentioned, our study involves five major weather observing station networks: NCDC, WERC, MMS, RAWS, and C-MAN. Their station map can be seen in Figure 1. Since each station network has its own format, the first step was to convert the comma separated files into a form that is easy to read. Using a script named mac2unix.sh, we produced a data set for each station free of the forward slashes and commas that separated variables.

From here, we had to write tailored FORTRAN 90 codes for each of the five network types. Each code read a station data log line by line and standardized the formatting into metric units for further analysis. The final portion of the processing in this code performed the first stage of the quality control. Quality control parameters were set for the following variables: temperature, dew point temperature, relative humidity, wind speed, wind direction, station pressure, and sea level pressure. Each variable needed to fall within a physically realistic range of values. If any variable did not satisfy its quality control check, that variable was replaced with the arbitrary value -999 and the program continued. Once a line of data had been standardized and initially quality controlled, it was written into a new station file that provided the basis for our second set of codes.
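The range check described above can be sketched in a few lines. The original programs were written in FORTRAN 90; the Python below is only an illustration. The function name, the field names, and every limit except the wind speed range of 0 to 100 m/s (the range quoted in the Discussion) are hypothetical placeholders, not the values of Table 1, and both limits are treated as inclusive for simplicity.

```python
MISSING = -999.0  # sentinel written in place of a value that fails its check

# Hypothetical limits for illustration only; just the wind speed range
# (0 to 100 m/s) is taken from the report's discussion.
LIMITS = {
    "temperature": (-100.0, 60.0),       # degrees C, assumed upper limit
    "wind_speed": (0.0, 100.0),          # m/s
    "wind_direction": (0.0, 360.0),      # degrees, assumed upper limit
    "relative_humidity": (0.0, 100.0),   # percent, assumed upper limit
}


def quality_control(record):
    """Return a copy of one observation with out-of-range values set to MISSING."""
    checked = {}
    for name, value in record.items():
        limits = LIMITS.get(name)
        if limits is None:
            # No check defined for this field: pass it through unchanged.
            checked[name] = value
            continue
        low, high = limits
        checked[name] = value if low <= value <= high else MISSING
    return checked


if __name__ == "__main__":
    print(quality_control({"temperature": -20.0, "wind_speed": 120.0}))
```

Only the failing variable is replaced, so the rest of the line stays usable, which matches the behaviour described in the paragraph above.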

The second set of FORTRAN 90 codes provided a primitive form of quality analysis: it read the quality controlled data and output running monthly and all-time totals of botched data, showing which stations had abnormally high counts. The focus here was on wind direction and wind speed, especially wind speed that stayed calm for an extended period. We wrote a flag of 1 or 0 into each quality controlled line of data, with 1 marking bad data, when the sequence of records met the following conditions:

1. Continuous calm wind speeds of 0 m/s

2. Temperature is below 0 °C

From this program a third statistical program was created to produce the detailed quality analysis report.

The third FORTRAN 90 program was written to perform climatological analysis of each station’s newly flagged data. This program sorted the data in ascending order with a quick sort and calculated the monthly and annual mean wind speeds, the all-time maximum wind speed, and the 5th, 10th, 50th, 95th, and 99th percentiles of wind speed for each month and for all time at each station. We used equation 1 below to find the percentiles.

X=(N/100)*(n+1) (1)

Where “X” is the position of the percentile value in the array sorted in ascending order, “N” is the percent value, and “n” is the number of values in the array.
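As a minimal sketch of equation 1 in use, the Python below computes the position X = (N/100)*(n+1) in the ascending array and reads off the value there. The report does not say how a fractional position was handled, so the linear interpolation between the two neighbouring elements, the clamping of X to the range 1 to n, and the function names are all assumptions made for illustration.

```python
def percentile_position(percent, n):
    """Equation 1: 1-based position of the percentile in a sorted array of n values."""
    return (percent / 100.0) * (n + 1)


def percentile(values, percent):
    """Value at the given percentile, using equation 1 on the ascending data.

    Assumption: a fractional position is resolved by linear interpolation,
    and positions outside 1..n are clamped to the ends of the array.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("no data")
    position = min(max(percentile_position(percent, n), 1.0), float(n))
    lower = int(position)        # 1-based index of the element at or below X
    fraction = position - lower
    if lower >= n:
        return data[n - 1]
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


if __name__ == "__main__":
    speeds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    print(percentile(speeds, 50))
```

For example, with n = 9 values and N = 50, equation 1 gives X = 0.5 * 10 = 5, so the 50th percentile is the fifth value of the sorted array.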

3.2 WRF model verification

Once finished with the third, statistical program, we used the flagged data to run the model verification on the WRF simulations, which comprise 47 different model physics sensitivity tests for September 2004. By comparing the results with the verification performed against the original observational data, we could see whether the QA/QC’ed data reduced the model bias and increased the correlation between the model simulations and observations.
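The two verification scores can be illustrated as follows. The report does not give their formulas, so this sketch assumes the common definitions: bias as the mean of the model values minus the mean of the observations, and correlation as the Pearson coefficient. Skipping pairs whose observation is the -999 sentinel, and the function name, are likewise assumptions.

```python
import math

MISSING = -999.0  # sentinel for observations rejected by quality control


def bias_and_correlation(model, observed):
    """Mean bias (model minus observed) and Pearson correlation.

    Pairs whose observation equals MISSING are left out of both scores.
    """
    pairs = [(m, o) for m, o in zip(model, observed) if o != MISSING]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid pairs")
    mean_m = sum(m for m, _ in pairs) / n
    mean_o = sum(o for _, o in pairs) / n
    covariance = sum((m - mean_m) * (o - mean_o) for m, o in pairs)
    var_m = sum((m - mean_m) ** 2 for m, _ in pairs)
    var_o = sum((o - mean_o) ** 2 for _, o in pairs)
    if var_m == 0.0 or var_o == 0.0:
        raise ValueError("correlation undefined for constant data")
    return mean_m - mean_o, covariance / math.sqrt(var_m * var_o)


if __name__ == "__main__":
    model_speeds = [2.0, 4.0, 6.0, 8.0]
    station_speeds = [1.0, 3.0, MISSING, 7.0]
    print(bias_and_correlation(model_speeds, station_speeds))
```

Running the same function once on the original observations and once on the QA/QC’ed observations gives the kind of before and after comparison described above.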

3.3 Climate and location comparisons

Finally, using the created statistical data we could then analyze the various areas around the North Slope climatologically. We chose random sites from the northern coast of Alaska, northwest coast of Alaska, and on both sides of the Brooks Mountain Range. We used Microsoft Excel to produce graphs of the monthly mean wind speeds for each of the regions. To understand the differences in the climate regions we used the following graphical comparisons:

1. Northern Coast v. Northwestern Coast

2. Northern Coast v. Mountainous

3. Northwestern Coast v. Mountainous

Next, we used Excel to produce graphs of seasonal wind speed percentile values for stations in contrasting locations. To understand the differences we used the following comparisons:

1. Inland v. Valley

2. Windward mountainside v. Leeward mountainside

3. Coast v. Inland

From these illustrations, we deduced simple climate wind speed patterns and major differences between the regions and landforms.

4. Results

4.1 Quality Control

Table 1 shows our list of quality control checks to be used in our data quality control FORTRAN code. Using these parameters and FORTRAN 90, we produced our data quality control program that created identically formatted outputs as seen in Table 2 for all station types.

Using the formatted output from the data quality control program, we read the data into our second FORTRAN 90 source code for data error counting. Table 3 presents the source code for the flagging protocol that was written for extended periods of calm wind speeds. If there were six or more consecutive hours of error-marked data, missing data, or calm wind speeds, the program added one count to the monthly and all-time counters. The second program produced the example output text file seen in Table 4.

An additional program was then written to consolidate the wind speed, wind direction, and calm wind events for all stations in the networks. This program produced a large list from which we could diagnose whether a station was producing quality data. The RAWS, WERC, MMS, and C-MAN stations showed no abnormally high numbers of botched wind speed data points, but five NCDC network stations contained an abnormal amount of botched wind speed data. As seen in Figure 2, the numbers of errors at these five stations were much higher than at other stations in the NCDC network.

Using equation 1 and the flagged, quality controlled data output by the second program, we created the third FORTRAN 90 code to calculate statistical data. The source code produces monthly and annual wind speed averages, monthly and all-time wind speed maxima, and percentile data for monthly and all-time wind speeds. An example of the statistical output is shown in Table 5.
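A small Python sketch of the monthly averages and maxima is given below. It is not the FORTRAN 90 source. The record layout (month, wind speed, flag), the function name, and the decision to leave out both -999 values and flagged records are assumptions made for illustration, since the report does not state how flagged lines entered the statistics.

```python
from collections import defaultdict

MISSING = -999.0  # sentinel for values rejected by quality control


def monthly_wind_stats(records):
    """Mean and maximum wind speed per month.

    Each record is a (month, wind_speed, flag) tuple. Records holding the
    MISSING sentinel or a flag of 1 are skipped (an assumption).
    """
    by_month = defaultdict(list)
    for month, speed, flag in records:
        if speed == MISSING or flag == 1:
            continue
        by_month[month].append(speed)
    return {
        month: {"mean": sum(speeds) / len(speeds), "max": max(speeds)}
        for month, speeds in sorted(by_month.items())
    }


if __name__ == "__main__":
    sample = [(1, 4.0, 0), (1, 6.0, 0), (1, MISSING, 0), (1, 30.0, 1), (7, 3.0, 0)]
    print(monthly_wind_stats(sample))
```

Because bad values are dropped before the maximum is taken, a single unphysical reading cannot dominate the monthly or all-time maxima.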

The QA/QC’ed data was then used for the WRF model verification (Figures 3 and 4). As seen in Figure 3, when the 47 different WRF simulations were verified against the QA/QC’ed data, the correlation between model simulations and observations was higher than in the verification against the original data. Figure 4 shows that the model bias was lower than in the verification against the original data.

4.2 Climate Analysis

With the output created from the statistical data program, we used Microsoft Excel to perform comparisons between stations along the Northern Coast, Northwest Coast, and Mountainous Regions. Table 6 references the stations that we used. With this in mind, our first graphical representations are the all-time monthly mean wind speed for the three regions as seen in Figures 5, 6, and 7.

Figure 5 represents the Northern Coast. As can be seen, the wind speed variability tends to be rather low, ranging from 4.3 m/s to 6.7 m/s across the region. The Prudhoe Bay PADR 2 curve has its particular shape because it was drawn from only two years of data, whereas all the other stations had five or more years of data.

Figure 6 represents the Northwestern Coast. This set of stations follows a trend in which wind speeds decrease during the summer months and increase during the winter months. This is representative of the effects of sea ice on the region.

Figure 7 represents the Mountainous Land. The great variation in monthly mean wind speed from station to station is caused by either the roughness of the topography or the distance to the shoreline. Unlike the Northern and Northwestern coasts, the mountain stations showed a decrease in average wind speeds during the winter and a rise during the summer, except for Ambler, which is close to the west coast.

Overlaying these graphs revealed several trends. Figure 8 compares stations from the Northwestern coast and the Mountainous regions. It shows that wind speeds are significantly faster on the Northwestern coast than in the Mountainous regions year round. Figure 9 reaffirms this finding: the entire Northern coast recorded significantly greater wind speeds year round than the Mountainous regions. When comparing the Northern coast to the Northwestern coast, as seen in Figure 10, we found no clear distinction between the two regions, although the Northwestern region displayed more seasonal variability than the Northern coast, probably due to the larger variation in sea ice cover in the Chukchi Sea.

4.3 Climatic seasonal variation between land/sea and mountains

Using the percentile data from our third FORTRAN program and the same stations used for the regional analysis, we found that the wind climate varied greatly with location. In Figure 11, the inland station reported greater wind speeds than the valley station at every percentile except the 10th. From this we can say that the long distance and the mountains between the lowlands significantly slow the winds as they move across the land.

In Figure 12, we see that the leeward side, Arctic Village, showed consistently slower wind speeds than the windward side, Galbraith Lake, at all percentiles. The effect of the mountain range is evident in the fact that the difference in wind speed between the two sides is much greater in the winter and very small in the summer.

In Figure 13, we compared wind speeds at a coastal station, Prudhoe Bay, with those at an inland station, Umiat. The coastal station displayed greater wind speeds at all percentiles than the inland station, which indicates that the surface roughness between the coast and the inland plays a significant role in slowing the wind as it blows across the tundra plains. The larger difference between Umiat’s and Prudhoe Bay’s 95th percentile wind speeds during the winter shows that the winds blowing in from the coast are slowed more during the winter and fall seasons.

5. Discussion

5.1 Quality Assured/Quality Control Errors and Blunders

Although the QA/QC’ed observational data improved the model verification, reducing the model bias and increasing the correlation between model simulations and observations, we cannot claim that the QA/QC analysis is perfect. The first major flaw lies in the QC parameter choices. Using general values and large ranges for variables such as temperature, dew point, and wind speed can have a major effect on the statistical error counts and abnormal event counts. None of these choices was more significant than the wind speed limit. We originally required the wind speed to be greater than or equal to 0 m/s and less than 100 m/s. This is a very large range for wind speed: although it keeps the values real, the maximum of 100 m/s is too high. We discovered this after our percentile program revealed wind speeds in excess of 90 m/s at some stations while other stations in the same region reported nothing nearly that great. To correct the situation we lowered our maximum to 50 m/s, a more feasible value for the surface of Earth. With this change our findings made much more sense for the climate. Not only the unreal negative wind speeds but also the unphysically fast winds should be quality checked.

Our second flaw occurred during the flagging process. The original task was to design a flagging protocol that counted the number of consecutive calm wind events and, when the count reached six or greater, placed a flag on the data lines. The task was misinterpreted and the program was written as seen in Table 3: it counts to six or greater and places a flag on the sixth data point and all others from then on, but not on the first five. To correct this, an algorithm was added that checks whether a data point carries a flag and, if so, also flags the preceding five points.
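The intended behaviour, in which every point of a long calm run is flagged rather than only the sixth point onward, can be sketched in Python as follows. This is a single-pass alternative written for illustration, not the authors' two-step correction. It assumes hourly records, treats a record as calm when the wind speed is 0 m/s and the temperature is below 0 °C, and uses hypothetical names throughout.

```python
def flag_calm_runs(speeds, temperatures, min_run=6):
    """Flag (1) every record in a run of at least min_run consecutive calm records.

    A record counts as calm when its wind speed is 0 m/s and its temperature
    is below 0 degrees C. Records in shorter runs keep a flag of 0.
    """
    flags = [0] * len(speeds)
    run_start = None
    # Iterate one step past the end so a run that reaches the last record is closed.
    for i in range(len(speeds) + 1):
        calm = i < len(speeds) and speeds[i] == 0.0 and temperatures[i] < 0.0
        if calm:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                for j in range(run_start, i):
                    flags[j] = 1  # includes the first five points of the run
            run_start = None
    return flags


if __name__ == "__main__":
    speeds = [2.0] + [0.0] * 7 + [3.0, 0.0, 0.0]
    temperatures = [-5.0] * len(speeds)
    print(flag_calm_runs(speeds, temperatures))
```

Deciding the flags only when a run ends removes the need to go back and mark the preceding five points afterwards.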

The last error in our QC programming may have occurred in the percentile calculation. With such large amounts of data, the arrays used became very large and very full. The original percentile program kept track of all of these points through a separate monthly counting array. When the program finished looping through a station’s data, it should have cleared the entire data point array, as well as all other data arrays, providing a clean slate for the next station. However, our original percentile results were skewed because the counts held in the monthly counting arrays fell short of the data actually stored, usually by 1 or 2 elements, so the arrays were not completely cleared. This did not affect the annual averages or all-time monthly averages, but it did have a significant effect on the monthly maxima and all-time maxima as well as on the percentile calculation. To correct this, we increased each monthly count by 10 when clearing so that all elements of the array were cleared, and we increased the monthly count by 1 during the percentile calculation. The addition of 1 does not make a difference in the percentile calculation, since the location shifts by only 1 or 2 elements: the large amount of data in an ascending array contains many elements of the same value, so the result is essentially the same. This adjustment corrected the percentile problem for all stations, but a couple of the NCDC stations still displayed the maxima errors that should have been cleared.

5.2 Climate Analysis and Topographical Variations

After the correction of our QA/QC programming, the climatological analysis of the North Slope depended entirely on visual analysis by our team. The Northern coast pattern seen in Figure 5 makes sense. Minor decreases and increases in an otherwise steadily oscillating monthly pattern are common for coastal regions such as this. The influence of sea ice causes the rise in the winter, while the roughness of the open ocean water should be the cause of the decrease in the summer. The same pattern was observed for the Northwestern coast, which is in nearly the same climatic situation, so this makes sense from a climate perspective.

Turning to the mountains and Figure 12, the increase in wind speed during the summer makes sense as well. There is a low capping inversion during the winter on the North Slope, especially in the mountains. Coupled with temperatures well below freezing, this keeps the temperature gradient low, so wind speed decreases. In the summer, however, the climate around the mountains shifts to a warmer and much more variable pattern, meaning faster winds. Additionally, the prevailing southern winds off the coast in the summer and northern winds in the winter allow for greater wind speeds on either side of the mountains, depending on the season.

Our findings on the coast-inland winds also make sense, but they may not hold true for the entire area. The two trends seen for the coast and the mountains still apply to the coast and the inland, respectively. Figure 13 shows those patterns, but only for the wind between the northern coast and the inland plains. We would need to take a more comprehensive look at the entire area, including coast-inland breezes just off the northwest coast and the coast-inland breeze effects between the northeastern corner of the Brooks Range and the Beaufort Sea.

6. Conclusion

The QA/QC analyses of our station network data have presented us with significant findings about the climate of the North Slope of Alaska. If this area is to be made habitable as commercial traffic increases, the climate must be taken into account. The data has given us a look at the wind field features from the northern coast to the inland plain of the North Slope. From this we can say that winds over the sea ice can be quite a factor if civil life along the coastline is to expand beyond oil extraction.

From our data, we have seen that the lee side of a mountain range tends to have slower winds than the windward side. From this we can suggest that the new natural gas pipeline be built on the lee side of the mountains. This may not be possible because of permafrost, snow depth, and wildlife impact, but from an engineering perspective the more stability the better, and slower winds make a difference for a large above-ground pipeline.

Oceanic shipping is also affected by our findings. Coastal winds tend to be lower during the summer, when the sea ice is also much farther north, which provides the best opportunity to transport goods and supplies across the newly navigable ocean. However, once the sea ice is gone due to anticipated rises in global temperature, the results of this climatic study may no longer hold true.

Future study of the climate of the North Slope should continue with additional model verifications. First, we should use the WRF model to simulate real-time physical situations and compare the output with the actual station data. The model bias and correlation are good for the current WRF outputs, but additional QA/QC experiments and additional WRF tuning could yield better correlations and lower biases. This will give a better understanding of the North Slope climate. In addition, we need to compare our station data with the NARR reanalysis, which can provide a detailed look at the progression of the climate over the last 30 years.

7. References

ACCAP, (2009). Alaska Weather and Climate Highlights. Retrieved August 14, 2009, from Alaska Center for Climate Assessment and Policy Web site:

Google, (2009, August 13). Google Earth. Retrieved August 14, 2009, from Google Earth Web site:

Hunter, M. (Ed.). (1971). Climate of the North Slope Alaska. In NOAA Technical Memorandum (1971 ed., Anchorage, AK:

Shulski, M., & Wendler, G. (2007). The Climate of Alaska. Fairbanks, AK: University of Alaska Press.

TABLES

Table 1. Weather data quality control check parameters

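|Variable |Lower limit |Upper limit |
|Temperature |>= -100 °C |[...] |
|Dew Point |>= -100 °C |[...] |
|Wind Speed |>= 0 m/s |[...] |
|Wind Direction |>= 0° |[...] |
|Station Pressure |>= 990 hPa or mb |[...] |
|Relative Humidity |>= 0% |[...] |

[...]

      [...] >= 6) then
         ! count has reached six or more: flag this line as bad
         flag=1
      else
         flag=0
      endif
   else
      ! run broken: reset the counters and clear the flag
      spdoc=0
      set5=0
      flag=0
   endif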

Table 4. Example output from station data statistical analysis program. (C-MAN: Prudhoe Bay)

| |Janu[...] |

[...]

|NORTHWESTERN COAST |Point Lay, Point Lay LIZ, Point Hope, Wainwright |
|MOUNTAIN LANDS |Ambler, Umiat, Galbraith Lake, Arctic Village, Central |

FIGURES

Figure 1. Map of Study Area. Shown are NCDC=Red, WERC=Blue, RAWS=Green, C-MAN=Light Blue

[pic]

Figure 2. Severely errored NCDC stations v. Moderate and Low errored NCDC stations.

[pic]

Figure 3. WRF model verification: correlation comparisons between QA/QC’ed and original observational data.

[pic]

Figure 4. WRF model verification: model bias comparisons between QA/QC’ed and original observational data.

[pic]

Figure 5. All-time monthly wind speed averages for the Northern Coast.

[pic]

Figure 6. All-time monthly wind speed averages for the Northwestern Coast.

[pic]

Figure 7. All-time monthly wind speed averages for the Mountainous Regions.

[pic]

Figure 8. Northwestern coast all-time average wind speeds v. Mountainous all-time average wind speeds.

[pic]

Figure 9. Northern coast all-time average wind speeds v. Mountainous all-time average wind speeds.

[pic]

Figure 10. Northwestern coast all-time average wind speeds v. Northern coast all-time average wind speeds.

[pic]

Figure 11. Seasonal inland wind speeds v. valley recorded wind speeds. Inland winds represented by Umiat. Valley winds represented by Central.

[pic]

Figure 12. Seasonal “windward” mountain side wind speeds v. “Lee” mountain side wind speeds. Windward station represented by Galbraith Lake. Leeward side represented by Arctic Village.

[pic]

Figure 13. Seasonal coastal wind speeds v. inland wind speeds. Coastal is represented by Prudhoe Bay. Inland is represented by Umiat.

[pic]

Acronyms

(In order of appearance)

WRF- Weather Research and Forecasting Model

NARR- North American Regional Reanalysis

QA/QC- Quality Assured/Quality Control

NCDC- National Climatic Data Center

MMS- Minerals Management Service

C-MAN- Coastal-Marine Automated Network

RAWS- Remote Automatic Weather Station

WERC- Water and Environmental Research Center

ACCAP – Alaska Center for Climate Assessment and Policy
