Satellite Applications Motivated by the Development of a Silver-Zinc Battery
A Battery Performance Analysis Using Linear Regression
By: Leslie Gillespie-Marthaler
EMSE 271
December 18, 2009


Introduction: Satellite manufacturers recently proposed replacing existing battery technology with a silver-zinc technology. Because satellite applications require reliable and long-lasting batteries, the manufacturing association requested the following analysis:

1. Develop a linear regression model from the battery performance data, using the Log of (Cycles to Failure) as the response; the model should be based on the best predictors available to characterize the behavior of the battery throughout its lifecycle;
2. Perform diagnostic analysis of the fitted model; and
3. Forecast the Cycles to Failure, with a 95% confidence interval, using the model for the following independent variables: X1 = 1.5, X2 = 4.5, X3 = 50, X4 = 25, X5 = 2.

The table below shows the original battery performance data provided by the manufacturing association.

The Dependent Variable is:

- Cycles to Failure is the dependent variable (Y)
- The Log of (Cycles to Failure) is represented as Log(Y)

The Independent Variables are:

- Charge Rate (X1)
- Discharge Rate (X2)
- Depth of Discharge (X3)
- Temperature (X4)
- End of Charge (X5)

Table 1: Original Performance Data

Y = Cycles to Failure; Log(Y) = Log Cycles to Failure; X1 = Charge Rate (Amps); X2 = Discharge Rate (Amps); X3 = Depth of Discharge (% of rated ampere-hours); X4 = Temperature (Celsius); X5 = End of Charge (Volts)

Data   Y         Log(Y)   X1      X2      X3        X4       X5
 1     101.000   2.004    0.375   3.130    60.000   40.000   2.000
 2     141.000   2.149    1.000   3.130    76.800   30.000   1.990
 3      96.000   1.982    1.000   3.130    60.000   20.000   2.000
 4     125.000   2.097    1.000   3.130    60.000   20.000   1.980
 5      43.000   1.633    1.625   3.130    43.200   10.000   2.010
 6      16.000   1.204    1.625   3.130    60.000   20.000   2.000
 7     188.000   2.274    1.625   3.130    60.000   20.000   2.020
 8      10.000   1.000    0.375   5.000    76.800   10.000   2.010
 9       3.000   0.477    1.000   5.000    43.200   10.000   1.990
10     386.000   2.587    1.000   5.000    43.200   30.000   2.010
11      45.000   1.653    1.000   5.000   100.000   20.000   2.000
12       2.000   0.301    1.625   5.000    76.800   10.000   1.990
13      76.000   1.881    0.375   1.250    76.800   10.000   2.010
14      78.000   1.892    1.000   1.250    43.200   10.000   1.990
15     160.000   2.204    1.000   1.250    76.800   30.000   2.000
16       3.000   0.477    1.000   1.250    60.000    0.000   2.000
17     216.000   2.334    1.625   1.250    43.200   30.000   1.990
18      73.000   1.863    1.625   1.250    60.000   20.000   2.000
19     314.000   2.497    0.375   3.130    76.800   30.000   1.990
20     170.000   2.230    0.375   3.130    60.000   20.000   2.000


When initially analyzing the performance data, we made the following observations concerning the Dependent Variable (Y) and its relationship with the Independent Variables (X1-5):

- There is large variability in the original cycles to failure (Y) data. In the histogram of the dependent variable (Y), most observations sit at the left (low) end, with a long tail toward high cycle counts. This skew could be problematic in conducting the regression analysis.
- The probability plot for this data also reports a very large standard deviation (104.7, against a mean of 112.3).

These observations are displayed in the histogram and probability plot generated by Minitab below:

Figure 1: Histogram of Cycles to Failure (Y)

[Histogram of Cycles to Failure with fitted normal curve. x-axis: Cycles to Failure; y-axis: Frequency. Mean = 112.3, StDev = 104.7, N = 20.]

Figure 2: Probability Plot of Cycles to Failure (Y)

[Normal probability plot of Cycles to Failure with 95% CI. x-axis: Cycles to Failure; y-axis: Percent. Mean = 112.3, StDev = 104.7, N = 20, AD = 0.668, P-Value = 0.069.]
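As a quick check on the summary statistics shown in Figures 1 and 2, the sketch below recomputes the mean, standard deviation, median, and skewness of Y from the values in Table 1. It uses only the Python standard library. The variable names and the choice of the adjusted Fisher-Pearson skewness formula are illustrative assumptions and are not part of the original Minitab analysis.

```python
import statistics

# Cycles to failure (Y) for the 20 observations in Table 1.
cycles = [101, 141, 96, 125, 43, 16, 188, 10, 3, 386,
          45, 2, 76, 78, 160, 3, 216, 73, 314, 170]

n = len(cycles)
mean_y = statistics.mean(cycles)
stdev_y = statistics.stdev(cycles)    # sample standard deviation (divides by n - 1)
median_y = statistics.median(cycles)

# Adjusted Fisher-Pearson sample skewness (an assumed choice of estimator).
# A positive value means the data bunch at the low end with a long right tail.
skew_y = (n / ((n - 1) * (n - 2))) * sum(
    ((y - mean_y) / stdev_y) ** 3 for y in cycles
)

print(f"N      = {n}")
print(f"mean   = {mean_y:.1f}")
print(f"stdev  = {stdev_y:.1f}")
print(f"median = {median_y:.1f}")
print(f"skew   = {skew_y:.2f}")
```

A mean well above the median, together with a positive skewness, is consistent with a histogram whose mass sits at the low end of the Cycles to Failure axis.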


We would prefer a distribution closer to normal for the dependent variable. When comparing the original dependent variable (Y) to Log(Y), we do see some improvement in the distribution, suggesting increased normality. The following observations were made when analyzing Log(Y):

- The standard deviation for Log cycles to failure is much smaller (0.6875), but the P-value of the normality test has decreased (from 0.069 to 0.007).
- In general, we would prefer to have a larger P-value in order to indicate greater normality of the distribution.
- At this point, it is difficult to discern whether Log(Y) is in fact closer to normal.
- For the purposes of this project (and to meet the client's request), we will choose Log(Y) (Log cycles to failure) as the dependent variable for the regression model. Choosing Log(Y) allows for clear interpretation, in that constant changes in Log(Y) translate to constant percentage changes in Y.

These observations are displayed in the histogram and probability plot generated by Minitab below:

Figure 3: Histogram of Log Cycles to Failure (Log(Y))

[Histogram of Log Cycles to Failure with fitted normal curve. x-axis: Log Cycles to Failure; y-axis: Frequency. Mean = 1.737, StDev = 0.6875, N = 20.]

Figure 4: Probability Plot of Log Cycles to Failure (Log(Y))

[Normal probability plot of Log Cycles to Failure with 95% CI. x-axis: Log Cycles to Failure; y-axis: Percent. Mean = 1.737, StDev = 0.6875, N = 20, AD = 1.046, P-Value = 0.007.]
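The Log(Y) values in Table 1 are base-10 logarithms (for example, Log(101) = 2.004). The sketch below, which assumes only the Y values from Table 1 and uses illustrative variable names, recomputes the summary statistics of Log(Y) and shows why a fixed step in Log(Y) corresponds to a fixed percentage change in Y.

```python
import math
import statistics

# Cycles to failure (Y) for the 20 observations in Table 1.
cycles = [101, 141, 96, 125, 43, 16, 188, 10, 3, 386,
          45, 2, 76, 78, 160, 3, 216, 73, 314, 170]

# Base-10 logarithm, matching the Log(Y) column of Table 1.
log_cycles = [math.log10(y) for y in cycles]

mean_log = statistics.mean(log_cycles)
stdev_log = statistics.stdev(log_cycles)   # sample standard deviation
print(f"mean of Log(Y)  = {mean_log:.3f}")
print(f"stdev of Log(Y) = {stdev_log:.4f}")

# Adding a constant step to Log(Y) multiplies Y by the same factor,
# whatever the starting value of Y. The step size here is arbitrary.
step = 0.1
factor = 10 ** step
percent_change = (factor - 1) * 100
for start in (10, 100, 300):
    after = 10 ** (math.log10(start) + step)
    print(f"Y = {start:>3} -> {after:6.1f}  ({percent_change:.1f}% increase)")
```

In this sketch each starting value grows by the same percentage, which is the interpretation the report relies on when it models Log(Y) instead of Y.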


Correlation Analysis: In order to determine the best predictors for the regression model, we completed a correlation analysis of the dependent variable Log(Y) and the independent variables (X1-5). The figure below displays the correlation strengths between the dependent and independent variables.

Figure 5: Correlation between Log(Y) and X1-5

                                 Log(Y)         X1         X2         X3         X4        X5
Log Cycles to Failure (Log(Y))    1
Charge Rate (X1)                 -0.175377126   1
Discharge Rate (X2)              -0.291453599  -0.08686    1
Depth of Discharge (X3)          -0.068901748  -0.31402    0.191942   1
Temperature (X4)                  0.718930287  -0.13537   -0.00283    0.066934   1
End of Charge (X5)                0.101140168   0.007163   0.064439   0.019973   0.11434   1

The threshold chosen to indicate significant correlation is (0.19). The highlighted values represent significant correlation. Based on these findings, we should keep the following independent variables as best predictors for the regression model: (X2) Discharge Rate, (X3) Depth of Discharge, and (X4) Temperature.
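The correlations between Log(Y) and each predictor can be recomputed from Table 1 with a few lines of code. The sketch below is a plain Pearson correlation written with the Python standard library; the helper name `pearson` and the dictionary layout are illustrative assumptions, and the data are transcribed from Table 1.

```python
import math

# Data transcribed from Table 1 (observations 1-20).
cycles = [101, 141, 96, 125, 43, 16, 188, 10, 3, 386,
          45, 2, 76, 78, 160, 3, 216, 73, 314, 170]
predictors = {
    "X1": [0.375, 1.0, 1.0, 1.0, 1.625, 1.625, 1.625, 0.375, 1.0, 1.0,
           1.0, 1.625, 0.375, 1.0, 1.0, 1.0, 1.625, 1.625, 0.375, 0.375],
    "X2": [3.13, 3.13, 3.13, 3.13, 3.13, 3.13, 3.13, 5.0, 5.0, 5.0,
           5.0, 5.0, 1.25, 1.25, 1.25, 1.25, 1.25, 1.25, 3.13, 3.13],
    "X3": [60.0, 76.8, 60.0, 60.0, 43.2, 60.0, 60.0, 76.8, 43.2, 43.2,
           100.0, 76.8, 76.8, 43.2, 76.8, 60.0, 43.2, 60.0, 76.8, 60.0],
    "X4": [40, 30, 20, 20, 10, 20, 20, 10, 10, 30,
           20, 10, 10, 10, 30, 0, 30, 20, 30, 20],
    "X5": [2.00, 1.99, 2.00, 1.98, 2.01, 2.00, 2.02, 2.01, 1.99, 2.01,
           2.00, 1.99, 2.01, 1.99, 2.00, 2.00, 1.99, 2.00, 1.99, 2.00],
}


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


log_y = [math.log10(y) for y in cycles]
corr = {name: pearson(log_y, xs) for name, xs in predictors.items()}

# List the predictors from strongest to weakest absolute correlation.
for name, r in sorted(corr.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: r = {r:+.3f}")
```

Sorting by absolute value makes it easy to compare each predictor against a chosen cutoff such as the 0.19 threshold used above.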

Initial Regression Analysis: Based on this decision, we move forward with a regression of Log(Y) on the predictors selected in the correlation analysis (X2, X3, and X4). The results of the initial regression analysis are displayed below.


Figure 6: Initial Regression Analysis for Log(Y) and X2, X3, X4

Regression Statistics
Multiple R            0.778
R Square              0.605
Adjusted R Square     0.530
Standard Error        0.471
Observations          20

ANOVA
              df    SS      MS      F       Significance F
Regression     3    5.429   1.810   8.154   0.002
Residual      16    3.551   0.222
Total         19    8.980

Notes on the ANOVA table: the F-value is moderately high; the P-value (Significance F) is moderately low.

Log(Y)                     Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   VIF (from Minitab)
Intercept                   1.352          0.510            2.651    0.017     0.271       2.434
(X2) Discharge rate        -0.134          0.077           -1.730    0.103    -0.298       0.030      1.039
(X3) Depth of discharge    -0.003          0.007           -0.399    0.695    -0.018       0.012      1.043
(X4) Temperature            0.050          0.011            4.584    0.000     0.027       0.073      1.005

The P-value for depth of discharge is high, which indicates that we may want to discard it.

The P-value for discharge rate is also high, which indicates that we may want to discard it.

The regression equation is:

Log Cycles to Failure = 1.35 - 0.134 Discharge Rate - 0.00285 Depth of Discharge + 0.0497 Temperature
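The fitted equation can be reproduced, at least approximately, by ordinary least squares on the Table 1 data. The sketch below solves the normal equations with a small Gaussian elimination routine from the Python standard library; it is not the Minitab or spreadsheet procedure used in the report, and the helper names `solve` and `ols` are hypothetical. As an illustration only, it also evaluates this initial three-predictor model at the requested settings X2 = 4.5, X3 = 50, X4 = 25. It produces a point forecast and does not compute the 95% interval.

```python
import math

# Data transcribed from Table 1 (observations 1-20).
cycles = [101, 141, 96, 125, 43, 16, 188, 10, 3, 386,
          45, 2, 76, 78, 160, 3, 216, 73, 314, 170]
x2 = [3.13, 3.13, 3.13, 3.13, 3.13, 3.13, 3.13, 5.0, 5.0, 5.0,
      5.0, 5.0, 1.25, 1.25, 1.25, 1.25, 1.25, 1.25, 3.13, 3.13]
x3 = [60.0, 76.8, 60.0, 60.0, 43.2, 60.0, 60.0, 76.8, 43.2, 43.2,
      100.0, 76.8, 76.8, 43.2, 76.8, 60.0, 43.2, 60.0, 76.8, 60.0]
x4 = [40, 30, 20, 20, 10, 20, 20, 10, 10, 30,
      20, 10, 10, 10, 30, 0, 30, 20, 30, 20]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


log_y = [math.log10(v) for v in cycles]
beta = ols(list(zip(x2, x3, x4)), log_y)

# Goodness of fit: R-squared = 1 - SS_residual / SS_total.
fitted = [beta[0] + beta[1] * a + beta[2] * b + beta[3] * c
          for a, b, c in zip(x2, x3, x4)]
mean_log = sum(log_y) / len(log_y)
ss_res = sum((o - f) ** 2 for o, f in zip(log_y, fitted))
ss_tot = sum((o - mean_log) ** 2 for o in log_y)
r_squared = 1 - ss_res / ss_tot

# Point forecast at the requested settings, converted back from log10 to cycles.
point_log = beta[0] + beta[1] * 4.5 + beta[2] * 50 + beta[3] * 25
point_cycles = 10 ** point_log

print("coefficients (intercept, X2, X3, X4):", [round(b, 4) for b in beta])
print(f"R-squared = {r_squared:.3f}")
print(f"forecast: Log(Y) = {point_log:.3f}, Y = {point_cycles:.0f} cycles")
```

If the data are transcribed correctly, the coefficients and R-squared should land close to the values in Figure 6. Because the model is fitted on Log(Y), the forecast must be raised to the power of 10 to return to cycles.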


The observations resulting from the initial regression results above are as follows:

- The variance inflation factor (VIF) values obtained from Minitab for each independent variable are all close to 1, so there is little to no collinearity among the independent variables and the coefficient estimates are considered stable.
- The R-Squared = 60.5%, which is moderately high. Ultimately, we would like a higher R-Squared value, indicating increased "goodness of fit" for the model.
- The Durbin-Watson statistic = 2.02425, indicating little to no autocorrelation among observations.
- The F-value (8.154) is moderately high, but not significantly high. Ultimately, we would prefer a higher F-value.
- The statistical significance, or P-value (0.002), is low, but not extremely low. Ultimately, we would prefer a lower P-value that is closer to zero.
- When looking at the individual P-values for the independent variables, we see that X2 and X3 have high P-values. In particular, the P-value for X3 is very high. This indicates that we may want to consider discarding X3 from the model.
- The residual analysis appears to support the assumption of normality for residuals.
- The normal probability plot of the residuals shows some deviation from normality. However, these deviations do not invalidate the assumption of normality for the residuals.
- We do see a high P-value for the residuals probability plot (0.243), which indicates goodness of fit for the normality test.
- There is 1 influential observation (outlier) identified within the probability plot for the residuals. This observation may require review or possible removal.
- There is no apparent heteroscedasticity in the plot of the residuals versus fitted values for Log(Y). So, there is evidence to support constant variance in residuals.

The following figures support the observations listed above:

Figure 7: Residual Plots for Log (Y)

[Four-panel Minitab residual plot for Log Cycles to Failure: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), and Versus Order (Residual vs. Observation Order).]


Figure 8: Probability Plot of Residuals

[Normal probability plot of the residuals (RESI5) with 95% CI. x-axis: RESI5; y-axis: Percent. Mean = -4.99600E-16, StDev = 0.471, N = 20, AD = 0.654, P-Value = 0.243.]

Figure 9: Plot of Residuals versus Fitted Values for Log(Y)

[Scatter plot of residuals (RESI5) against fitted values (FITS5).]
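Two of the diagnostics quoted in the observations, the variance inflation factors and the Durbin-Watson statistic, follow from simple formulas and can be sketched without a statistics package. The code below assumes the Table 1 data in their listed order (the Durbin-Watson statistic depends on observation order) and uses hypothetical helper names `solve` and `fit`; it is an illustration of the formulas and not the Minitab computation itself.

```python
import math

# Data transcribed from Table 1 (observations 1-20, in table order).
cycles = [101, 141, 96, 125, 43, 16, 188, 10, 3, 386,
          45, 2, 76, 78, 160, 3, 216, 73, 314, 170]
cols = {
    "X2": [3.13, 3.13, 3.13, 3.13, 3.13, 3.13, 3.13, 5.0, 5.0, 5.0,
           5.0, 5.0, 1.25, 1.25, 1.25, 1.25, 1.25, 1.25, 3.13, 3.13],
    "X3": [60.0, 76.8, 60.0, 60.0, 43.2, 60.0, 60.0, 76.8, 43.2, 43.2,
           100.0, 76.8, 76.8, 43.2, 76.8, 60.0, 43.2, 60.0, 76.8, 60.0],
    "X4": [40, 30, 20, 20, 10, 20, 20, 10, 10, 30,
           20, 10, 10, 10, 30, 0, 30, 20, 30, 20],
}


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(rows, y):
    """Return (R-squared, residuals) for an OLS fit of y on rows plus intercept."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * d for b, d in zip(beta, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((o - f) ** 2 for o, f in zip(y, fitted))
    ss_tot = sum((o - mean_y) ** 2 for o in y)
    return 1 - ss_res / ss_tot, [o - f for o, f in zip(y, fitted)]


# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the others.
vif = {}
for name, target in cols.items():
    others = [values for key, values in cols.items() if key != name]
    r2_j, _ = fit(list(zip(*others)), target)
    vif[name] = 1 / (1 - r2_j)

# Durbin-Watson: sum of squared successive residual differences over the
# residual sum of squares. Values near 2 suggest little autocorrelation.
log_y = [math.log10(v) for v in cycles]
_, resid = fit(list(zip(*cols.values())), log_y)
dw = (sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
      / sum(e * e for e in resid))

print({name: round(v, 3) for name, v in vif.items()})
print(f"Durbin-Watson = {dw:.3f}")
```

VIF values close to 1 and a Durbin-Watson statistic close to 2 would agree with the conclusions drawn in the observations above.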
