Introduction

During the course of my lifetime I have collected a large amount of anecdotal information about wave heights in the Atlantic Ocean off Long Island. This experience has led me to believe that ocean wave heights at this location are driven by two types of factors. The first is local weather conditions. The second is weather systems elsewhere in the Atlantic Ocean, geographically removed from the site. It is unclear how local conditions and remote weather interact. In some cases the two forces cancel one another out; in others, local weather can amplify the effect of distant systems. I am interested in constructing a model that can explain how these two general factors interact to produce waves off Long Island.

To construct this model I selected two types of variables, both gathered from buoys deployed by the National Oceanic and Atmospheric Administration (NOAA). The first represents local weather readings for the Long Island area, taken at a buoy 33 nautical miles south of Islip, NY.[1] They were:

1. Wave height readings at the Long Island NOAA Buoy (#44025) for 1997

2. Wind Speed readings at the Long Island NOAA Buoy for 1997

3. Barometer readings at the Long Island NOAA Buoy for 1997

The second type of data represents the influence of distant weather systems. These data are:

4. Wave height readings at the Hatteras NOAA Buoy (#41002) for 1997

5. Wave height readings at the George’s Bank NOAA Buoy (#44011) for 1997

I chose this second group of data for two reasons. First, wave height aggregates many variables into one, which simplifies the analysis. Second, these data can provide information about weather patterns in distant parts of the Atlantic, because most waves created by remote disturbances must pass through these points on their way to the Long Island buoy. To adjust for the time it takes waves to travel, I allowed a lag of two days between these readings and the readings from the Long Island buoy. (For example, the Hatteras value paired with the Long Island reading at 6:00am on January 3 was actually recorded at 6:00am on January 1.)
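Because readings are six hours apart, the two-day lag is a shift of eight observations. The sketch below shows one way to build the lagged remote series; it assumes each buoy's readings have already been parsed into a Python dict keyed by `datetime`, and the function names `lag_remote_series` and `align` are hypothetical, not part of any NOAA tool.

```python
from datetime import datetime, timedelta

LAG = timedelta(days=2)  # two days = 8 readings at 6-hour spacing

def lag_remote_series(remote, lag=LAG):
    """Re-key a remote buoy's readings so that the value recorded at time t
    is paired with the Long Island reading at time t + lag.

    `remote` is a hypothetical dict {datetime: wave_height_m}.
    """
    return {t + lag: value for t, value in remote.items()}

def align(local, remote_lagged):
    """Keep only timestamps present in both series; missing readings drop out."""
    common = sorted(set(local) & set(remote_lagged))
    return [(t, local[t], remote_lagged[t]) for t in common]
```

For example, a Hatteras value stored under `datetime(1997, 1, 1, 6)` ends up under `datetime(1997, 1, 3, 6)` after `lag_remote_series`, matching the January 1 to January 3 pairing described above.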

Two elements may introduce some inaccuracy into these data. First, because I was unable to include a buoy station due east of the Long Island buoy, the influence of some remote weather patterns may be misinterpreted or overlooked. Second, the way many variables are aggregated into the single factor “wave height” is not well understood, so it is not clear how well these data actually represent distant weather systems in the model. Nevertheless, I felt that these choices provide the most predictive power possible given the available resources.

For all of the data, I chose readings taken at 6-hour intervals (12:00am, 6:00am, 12:00pm, 6:00pm) over the course of every day in 1997. These data are archived for each buoy and available via the station’s web site.[2]
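A minimal sketch of the selection step, assuming the archive has already been parsed into a list of `(datetime, value)` pairs and that readings are time-stamped exactly on the hour (both are assumptions; the archive file format is not described here). `six_hourly` and `expected_slots` are illustrative helpers.

```python
from datetime import datetime

SYNOPTIC_HOURS = {0, 6, 12, 18}  # 12:00am, 6:00am, 12:00pm, 6:00pm

def six_hourly(records):
    """Keep readings taken on the hour at 00, 06, 12 or 18.

    `records` is a hypothetical list of (datetime, value) pairs; the
    minute == 0 check assumes on-the-hour time stamps.
    """
    return [(t, v) for t, v in records
            if t.hour in SYNOPTIC_HOURS and t.minute == 0]

def expected_slots(year):
    """Number of 6-hourly slots in a calendar year (4 per day)."""
    days = (datetime(year + 1, 1, 1) - datetime(year, 1, 1)).days
    return 4 * days
```

`expected_slots(1997)` returns 1,460, which equals the total of the N and N* counts (1,336 + 8 + 56 + 0 + 60) in the descriptive statistics table for Long Island wave heights, so every 6-hour slot of the year is accounted for either as a reading or as a missing value.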

Preliminary Analysis

Local Predictors

1) Wave Height at the Long Island Buoy (LIWVHT)

The first variable that I examine is wave height at the Long Island buoy, the response variable for which I am trying to construct a model. The distribution of wave heights at the Long Island buoy in 1997 was unimodal with a center slightly below 1 meter, and the majority of the readings were below 2 meters. The data are skewed, however, with a long right tail extending out to the maximum reading of 5.22m on 11/25/97 at 12:00am. This skewness may be due to the fact that a reading below 0 is impossible; in addition, readings near zero are unlikely, because there is always some measurable wave action in the ocean. The minimum reading was 0.22m, taken at 6:00am on September 4.

It is interesting to note that the unusually high readings occur in clusters. This is consistent with what one would expect from severe storms, which tend to last for relatively short periods but are associated with high seas. These clusters may also be responsible for the skewness of the data. There also seems to be less variance in the late spring and early summer months.

The skewness of the histogram suggests a logarithmic transformation. The mean of the logged (base 10) wave height variable is 0.04536, which translates into a geometric mean height of 10^0.04536 ≈ 1.11m. The distribution of the logarithm of the wave heights appears to be very close to Gaussian.
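The reported means are consistent with base-10 logarithms: 10^0.04536 ≈ 1.11, whereas a natural log would give e^0.04536 ≈ 1.05. Because the mean of the logs is the log of the geometric mean, back-transforming recovers the heights quoted in this paper. A small sketch (the helper names are illustrative):

```python
import math

def log10_mean(heights_m):
    """Mean of log10(height); heights must be strictly positive."""
    return sum(math.log10(h) for h in heights_m) / len(heights_m)

def geometric_mean_from_log10(mean_log10):
    """Back-transform a mean of base-10 logs to a height in metres."""
    return 10 ** mean_log10

# Mean logged heights reported in the text for each buoy
for site, m in [("Long Island", 0.04536), ("Hatteras", 0.20480),
                ("George's Bank", 0.24521)]:
    print(f"{site}: {geometric_mean_from_log10(m):.2f} m")
# Long Island: 1.11 m
# Hatteras: 1.60 m
# George's Bank: 1.76 m
```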

2) Barometric Pressure at Long Island Buoy (Barometer)

The readings for barometric pressure at the buoy were unimodal with a center around 1015 millibars. The data are skewed, with a left tail that extends to the minimum reading of 981.7mb on 10/2/97 at 6:00pm. There is a slight inverse relationship between barometric pressure and wave height at the buoy, and the pattern is even more apparent when barometric pressure is compared with the logged wave heights. This is what I expected, since low barometer readings signify stormy local weather.

A probability plot comparing the distribution of barometric pressure to the Normal suggests that readings below 1000mb merit further investigation. I therefore grouped the readings with an indicator variable, LOWPRESS, equal to 1 when the barometer reads below 1000mb and 0 otherwise. The wave heights associated with the low-pressure readings are markedly higher than those at other times, as the descriptive statistics below show.
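The LOWPRESS grouping can be reproduced with the standard library. This is a sketch, assuming paired `(barometer_mb, wave_height_m)` readings with `None` marking missing values; `lowpress_flag` and `summarize_by_flag` are hypothetical names. Quartile conventions differ between packages, so the IQRs it produces may differ slightly from Minitab's.

```python
import statistics

def lowpress_flag(baro_mb, threshold=1000.0):
    """1 if pressure is below the threshold, 0 otherwise, None if missing."""
    if baro_mb is None:
        return None
    return 1 if baro_mb < threshold else 0

def summarize_by_flag(pairs):
    """Group (barometer, wave_height) pairs by LOWPRESS and summarize heights.

    Readings with a missing barometer or wave height are skipped, and a
    group needs at least two heights to compute quartiles.
    """
    groups = {0: [], 1: []}
    for baro, wvht in pairs:
        flag = lowpress_flag(baro)
        if flag is None or wvht is None:
            continue
        groups[flag].append(wvht)
    out = {}
    for flag, heights in groups.items():
        if len(heights) < 2:
            continue
        q1, _, q3 = statistics.quantiles(heights, n=4)
        out[flag] = {"n": len(heights), "mean": statistics.mean(heights),
                     "median": statistics.median(heights), "iqr": q3 - q1}
    return out
```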

Descriptive Statistics: LI WVHT by LOWPRESS (1 = barometer below 1000mb, 0 = not below, * = LOWPRESS missing)

Variable  LOWPRESS     N   N*    Mean  Median  TrMean   StDev  SE Mean  Minimum  Maximum      Q1      Q3
LI WVHT   0         1336    8  1.2267  1.0650  1.1709  0.6604   0.0181   0.2200   4.4900  0.7600  1.5200
          1           56    0   2.470   2.295   2.439   1.029    0.137    0.490    5.220   1.823   2.967
          *            0   60       *       *       *       *        *        *        *       *       *

The mean wave height for readings taken when the barometer read below 1000mb was 2.47m, and the median was 2.295m. When the barometer did not read below 1000mb, the mean wave height was 1.2267m and the median was 1.0650m. In addition, the IQR of wave heights associated with the low-pressure readings was 1.14m, larger than the IQR of 0.76m associated with other barometric readings. The overall ranges of the two groups also differ: the maximum height associated with low-pressure readings was approximately 16% greater than the maximum for readings at or above 1000mb. However, a number of wave heights recorded when the barometer was above 1000mb were as high as those recorded under the lower readings.
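The IQR and maximum comparisons in the previous paragraph follow directly from the quartiles and maxima in the descriptive statistics table; a quick arithmetic check:

```python
# Values copied from the descriptive statistics table (LI WVHT by LOWPRESS)
not_low = {"q1": 0.7600, "q3": 1.5200, "max": 4.4900}  # LOWPRESS = 0
low = {"q1": 1.823, "q3": 2.967, "max": 5.220}          # LOWPRESS = 1

iqr_low = low["q3"] - low["q1"]              # about 1.14 m
iqr_not_low = not_low["q3"] - not_low["q1"]  # about 0.76 m
max_ratio = low["max"] / not_low["max"]      # about 1.16, i.e. 16% higher

print(f"IQR low: {iqr_low:.2f} m, IQR other: {iqr_not_low:.2f} m, "
      f"max ratio: {max_ratio:.3f}")
# IQR low: 1.14 m, IQR other: 0.76 m, max ratio: 1.163
```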

Comparing the two groups on the logged Long Island wave heights makes these differences even more apparent.

3) Wind Speed (WSP)

The wind speed data from the Long Island buoy are unimodal with a center around 7mph. The data exhibit a slight right tail, which may be due to the fact that no reading below zero is possible. A logarithmic transformation does not seem to help in this instance. There seems to be no relationship between wind speed and wave height at identical reading times. In addition, plots of wave height against wind speed from previous days suggest that there is no lagged relationship between these data either.
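The lagged comparison can also be summarized numerically by correlating wind speed at earlier readings with the current wave height. A sketch, assuming `wind` and `wave` are aligned lists of 6-hourly Long Island readings with `None` for missing values; this is an illustration rather than the procedure used here, which relied on plots.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences
    (undefined if either sequence is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(wind, wave, lag):
    """Correlate wind[t - lag] with wave[t] for readings 6 hours apart.

    lag=0 compares simultaneous readings; lag=4 compares wind speed one day
    earlier with the current wave height. Pairs containing None are dropped.
    """
    pairs = [(wind[t - lag], wave[t]) for t in range(lag, len(wave))
             if wind[t - lag] is not None and wave[t] is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)
```

Looping `lag` from 0 to 8 (up to two days earlier) gives a numerical counterpart to the lagged plots described above.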

Remote Predictors

4) Wave Height at the Hatteras Buoy @ “T-2 Days”

The wave height data at the Hatteras buoy are unimodal with a long right tail extending out to the maximum reading of 6.66m on 4/3/97 at 6:00am. As with the Long Island wave heights, a logarithmic transformation is helpful. The logged data appear roughly normal with a mean of 0.20480, which corresponds to a geometric mean wave height of 1.6m.

There does not seem to be any noticeable relationship between the wave heights at the Hatteras buoy and those at the Long Island buoy, even when both are examined on the log scale. However, the pattern of unusually high readings occurring in clusters holds for these data too. In fact, there seem to be more groups of high readings at the Hatteras buoy, which could suggest that the Hatteras area is more susceptible to storms than the Long Island area. The variance of these data also changes over the year, but the lull occurs later in the year than at Long Island.

5) Wave Height at the George’s Bank Buoy @ “T-2 Days”

These data are unimodal and have a long right tail extending to the maximum reading of 8.17m, taken at 12:00am on March 2. A logarithmic transformation yields a roughly normally distributed data set with a mean of 0.24521, which corresponds to a geometric mean wave height of 1.76m. As with the other wave height data, clusters of abnormally high readings appear to be responsible for the skewness of the data set. George’s Bank also seems to be more susceptible to severe weather than Long Island. Once again there is a lull in variance, but in this case it comes slightly earlier in the year than the one observed at either Long Island or Hatteras.

There seems to be little relation between the wave heights at the George’s Bank Buoy and the wave heights at the Long Island Buoy.

Regression Analysis

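The initial model I ran had five predictors: barometric pressure (BAROMETE), wind speed (WSP), the low-pressure indicator (LOWPRESS), and the two-day-lagged logged wave heights at Hatteras (LogHAT-2) and George’s Bank (LogGRGE-2, shown as “LogGRGE-” in the output). The response is the logged Long Island wave height. The results were as follows: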

General Linear Model

Factor    Type    Levels  Values
LOWPRESS  random       2  0 1

Analysis of Variance for LOG LIWV, using Adjusted SS for Tests

Source      DF   Seq SS   Adj SS   Adj MS       F      P
LogGRGE-     1   0.6007   1.0807   1.0807   26.85  0.000
LogHAT-2     1   0.7718   0.5007   0.5007   12.44  0.000
BAROMETE     1  12.7305   8.3531   8.3531  207.50  0.000
WSP          1   0.4810   0.4409   0.4409   10.95  0.001
LOWPRESS     1   0.1463   0.1463   0.1463    3.63  0.057
Error     1208  48.6304  48.6304   0.0403
Total     1213  63.3607

Term           Coef     StDev       T      P
Constant    12.0910    0.8312   14.55  0.000
LogGRGE-    0.12052   0.02326    5.18  0.000
LogHAT-2    0.10692   0.03032    3.53  0.000
BAROMETE  -0.011924  0.000828  -14.40  0.000
WSP        0.005799  0.001752    3.31  0.001
LOWPRESS
  0        -0.03312   0.01738   -1.91  0.057

The F statistic is 73.1, which is very large and statistically significant on (5, 1208) degrees of freedom. The R² is approximately 23.25%, so the model explains less than a quarter of the variation in logged wave height and is not especially useful. However, a plot of residuals versus order shows a clear pattern in the variance: the middle of the graph, which roughly corresponds to the summer, has noticeably less variance than the two ends. This is most likely because there are relatively few storms, either remote or local, during the summer months. This suggests that the time of year (i.e., the month) may have significant predictive power.
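The R² and F values quoted for each model follow from the Total and Error rows of the corresponding ANOVA table. A minimal sketch of that arithmetic:

```python
def r_squared_and_f(ss_total, ss_error, df_model, df_error):
    """R^2 and overall F statistic from ANOVA sums of squares."""
    ss_model = ss_total - ss_error
    r2 = ss_model / ss_total
    f = (ss_model / df_model) / (ss_error / df_error)
    return r2, f

# Initial five-predictor model: Total 63.3607, Error 48.6304 on 1208 DF
r2, f = r_squared_and_f(63.3607, 48.6304, df_model=5, df_error=1208)
print(f"R^2 = {r2:.4f}, F(5, 1208) = {f:.1f}")
```

For the first model this gives R² ≈ 0.2325 and F ≈ 73.2. Using the later ANOVA tables (Total 72.5458; Error 48.1865 on 13 and 1378 DF, and Error 48.1938 on 12 and 1379 DF) gives R² ≈ 0.336 for both and F ≈ 53.6 and 58.1. The small gaps from the reported 53.537 and 58.147 arise because those values divide by the rounded error mean squares (0.0350 and 0.0349).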

When a variable for month is added to the model, the p-values for “WSP,” “LogHAT-2,” and “LogGRGE-2” rise to 0.430, 0.473 and 0.697, respectively. This suggests that, once month is accounted for, these variables add little explanatory power to the model, which is consistent with my preliminary findings.

I re-ran the analysis without these three factors. The resulting analysis follows:

General Linear Model

Factor    Type    Levels  Values
MM        random      12  1 2 3 4 5 6 7 8 9 10 11 12
LOWPRESS  random       2  LOW NOT LOW

Analysis of Variance for logliwvh, using Adjusted SS for Tests

Source      DF   Seq SS   Adj SS   Adj MS       F      P
MM          11   8.7143  10.8145   0.9831   28.11  0.000
LOWPRESS     1   4.7215   0.0073   0.0073    0.21  0.649
BAROMETE     1  10.9236  10.9236  10.9236  312.38  0.000
Error     1378  48.1865  48.1865   0.0350
Total     1391  72.5458

Term           Coef     StDev       T      P
Constant    13.5598    0.7569   17.91  0.000
MM
  1         0.10683   0.01643    6.50  0.000
  2         0.15186   0.01715    8.86  0.000
  3        -0.04639   0.01640   -2.83  0.005
  4        -0.02157   0.01672   -1.29  0.197
  5        -0.12299   0.01627   -7.56  0.000
  6        -0.12418   0.01644   -7.56  0.000
  7        -0.11580   0.01616   -7.17  0.000
  8        -0.00947   0.01624   -0.58  0.560
  9         0.00260   0.01645    0.16  0.875
  10        0.06175   0.01628    3.79  0.000
  11        0.09534   0.01647    5.79  0.000
LOWPRESS
  LOW       0.00693   0.01522    0.46  0.649
BAROMETE  -0.013303  0.000753  -17.67  0.000

The F statistic for this model is 53.537 on (13, 1378) degrees of freedom, still very large. The R² is 33.6%, which is higher than before but still does not lead me to believe that I have constructed the best model possible.

Conclusion

In the model above, the LOWPRESS variable had a p-value of 0.649, which led me to remove this predictor. Removing LOWPRESS from the model yielded the following results:

General Linear Model

Factor  Type    Levels  Values
MM      random      12  1 2 3 4 5 6 7 8 9 10 11 12

Analysis of Variance for LOG LIWV, using Adjusted SS for Tests

Source      DF   Seq SS   Adj SS   Adj MS       F      P
BAROMETE     1  13.2509  15.6378  15.6378  447.45  0.000
MM          11  11.1012  11.1012   1.0092   28.88  0.000
Error     1379  48.1938  48.1938   0.0349
Total     1391  72.5458

Term           Coef     StDev       T      P
Constant    13.7384    0.6473   21.22  0.000
BAROMETE  -0.013486  0.000638  -21.15  0.000
MM
  1         0.10729   0.01639    6.54  0.000
  2         0.15219   0.01713    8.89  0.000
  3        -0.04641   0.01639   -2.83  0.005
  4        -0.02259   0.01656   -1.36  0.173
  5        -0.12361   0.01621   -7.63  0.000
  6        -0.12470   0.01639   -7.61  0.000
  7        -0.11595   0.01615   -7.18  0.000
  8        -0.00908   0.01621   -0.56  0.576
  9         0.00228   0.01643    0.14  0.890
  10        0.06246   0.01620    3.85  0.000
  11        0.09551   0.01646    5.80  0.000

The R² value is still low, about 33.6%, essentially unchanged from the previous model. The F statistic is about 58.147, which is significant on (12, 1379) degrees of freedom.
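On the log scale the final model reads log10(height) = 13.7384 − 0.013486 × BAROMETE + (month effect). Only eleven month coefficients are printed; the sketch below assumes the month factor uses sum-to-zero (effects) coding, which is consistent with every level but the last being listed, so December's effect is the negative of the sum of the other eleven, about +0.0226. Back-transforming a prediction gives a typical (geometric-mean-style) height rather than an arithmetic mean. `predicted_height_m` is a hypothetical helper.

```python
# Coefficients from the final model; the response is log10 wave height (m)
CONSTANT = 13.7384
BARO_COEF = -0.013486
MONTH_EFFECT = {1: 0.10729, 2: 0.15219, 3: -0.04641, 4: -0.02259,
                5: -0.12361, 6: -0.12470, 7: -0.11595, 8: -0.00908,
                9: 0.00228, 10: 0.06246, 11: 0.09551}
# Assumption: sum-to-zero coding, so December is minus the sum of the others
MONTH_EFFECT[12] = -sum(MONTH_EFFECT.values())

def predicted_height_m(baro_mb, month):
    """Back-transformed prediction from barometer (mb) and month (1-12)."""
    log_h = CONSTANT + BARO_COEF * baro_mb + MONTH_EFFECT[month]
    return 10 ** log_h
```

For example, `predicted_height_m(1013, 1)` is about 1.53m, and lower pressure or a winter month raises the prediction.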

This model satisfies only one of the assumptions of least squares regression: the residuals are close to Normally distributed.

All of the Cook’s distances are below one, so none of the points seems to have excessive influence on the regression. Although 14 observations have leverage values above the critical value [pic], with 1,392 observations these few points are unlikely to distort the fit.

However, there is strong reason to believe that the errors are heteroscedastic. First, there is a clear time-series pattern in the data, as the plot of residuals versus order shows: the residuals vary much less during the period associated with the summer months. Plots of the residuals versus the predictors reinforce this conclusion. Second, there is a noticeable clustering of residuals between -0.01 and 0.01, and the plot of residuals versus barometer exhibits a “funnel” shape. This lends further support to my conclusion about heteroscedasticity.
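One crude way to quantify the seasonal pattern in the residuals is to compare their variance month by month. The sketch assumes the residuals and month labels are available as parallel lists; it is a descriptive check, not a formal test such as Levene's, and the helper names are hypothetical.

```python
from collections import defaultdict
import statistics

def residual_variance_by_month(residuals, months):
    """Sample variance of the residuals within each month (needs >= 2 per month)."""
    by_month = defaultdict(list)
    for r, m in zip(residuals, months):
        by_month[m].append(r)
    return {m: statistics.variance(rs)
            for m, rs in sorted(by_month.items()) if len(rs) >= 2}

def max_min_variance_ratio(variances):
    """Largest monthly variance divided by the smallest; ratios well above 1
    are consistent with unequal error variance across months."""
    vals = list(variances.values())
    return max(vals) / min(vals)
```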

It seems that the local weather predictor “barometer” is the only weather-specific variable with any predictive power. However, the variable “month” is clearly related to the weather: the summer months, which have the mildest weather, clearly have different wave patterns than the rest of the year. This makes intuitive sense.

The explicit time-series nature of the data leads me to believe that weighted least squares regression would be necessary to produce an accurate model for wave heights at the Long Island buoy. Perhaps within this framework a clearer distinction could be drawn between the effects that local and remote weather systems have on waves.
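Weighted least squares gives each observation a weight, typically the inverse of its error variance, so that the noisier winter readings count for less than the calmer summer ones. A minimal sketch for a single predictor (for example, barometer) with weights taken from month-by-month residual variances; this is an illustration under that assumption, not a fitted model, and fitting the full model with month terms would need a matrix solver. Weighting addresses unequal variance but does not by itself remove serial correlation in the residuals.

```python
def wls_simple(x, y, w):
    """Weighted least squares fit of y = a + b*x with weights w.

    Minimizes sum(w_i * (y_i - a - b*x_i)**2) and returns (a, b).
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

def month_weights(months, var_by_month):
    """Weight each observation by the inverse of its month's residual variance."""
    return [1.0 / var_by_month[m] for m in months]
```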

Appendix A

Buoy #41002 - S HATTERAS - 250 NM East of Charleston, SC
6-meter NOMAD buoy
DACT payload
Latitude: 32.28 N Longitude: 75.20 W (32°16'N 75°12'W)
Site elevation: sea level
Anemometer height: 5 m above site elevation
Barometer elevation: sea level
Sea temp depth: 1 m below site elevation
Water depth: 3,785.6 m
Watch circle radius: 3,100 yards

Buoy #44011 - GEORGES BANK - 170 NM East of Hyannis, MA
6-meter NOMAD buoy
DACT payload
Latitude: 41.08 N Longitude: 66.58 W (41°04'N 66°34'W)
Site elevation: sea level
Anemometer height: 5 m above site elevation
Barometer elevation: sea level
Sea temp depth: 1 m below site elevation
Water depth: 88.4 m
Watch circle radius: 230 yards

Buoy #44025 - LONG ISLAND - 33 NM South of Islip, NY
3-meter discus buoy
DACT payload
Latitude: 40.25 N Longitude: 73.17 W (40°15'N 73°10'W)
Site elevation: sea level
Anemometer height: 5 m above site elevation
Barometer elevation: sea level
Sea temp depth: 0.6 m below site elevation
Water depth: 40.0 m
Watch circle radius: 84 yards
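As a rough check on the two-day lag used for the remote buoys, the great-circle distances between the positions above can be computed and turned into the average speed a disturbance would need to reach Long Island in 48 hours. This sketch treats the Earth as a sphere of radius 6,371 km and is an illustration only; it was not part of the original analysis.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Decimal-degree positions from Appendix A (west longitudes as negative)
BUOYS = {
    "44025 Long Island": (40.25, -73.17),
    "41002 S Hatteras": (32.28, -75.20),
    "44011 Georges Bank": (41.08, -66.58),
}

li = BUOYS["44025 Long Island"]
for name in ("41002 S Hatteras", "44011 Georges Bank"):
    d = haversine_km(*BUOYS[name], *li)
    speed_ms = d * 1000 / (48 * 3600)  # average speed implied by a 48-hour lag
    print(f"{name}: {d:.0f} km from Long Island, {speed_ms:.1f} m/s over 2 days")
```

With these positions, Hatteras comes out roughly 900 km from the Long Island buoy and Georges Bank roughly 560 km away, i.e. average speeds of about 5 m/s and 3 m/s over two days.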

-----------------------

[1] See Appendix A for coordinates of buoys.

[2] Long Island Buoy Web Site: ?$station=44025

South Hatteras Buoy Web Site: ?$station=41001

Georges Bank Buoy Web Site: ?$station=44011
