Daglindgren.upsc.se



Selection intensity and order statistics for breeding

Dag Lindgren, ?????

Associated with this document are an EXCEL workbook (.XLS) and a Mathcad file (.CAD). When working with this draft, try to keep these in the same directory.

Introduction

Response to selection can be expressed as ΔG = h²R = ih²σ_P = ihσ_A, where R is the selection differential, that is the difference in mean before and after selection, h² the heritability, σ_P² the phenotypic variance, σ_A² the variance of breeding values (the additive genetic variance) and i = R/σ_P the selection intensity. The selection intensity can be regarded as a standardized value of the difference before and after selection (Falconer and Mackay 1996). Geneticists and breeders routinely use the selection intensity for predictions of genetic gain. To predict selection intensity, the underlying density function must be known; usually a normally distributed variable is assumed. Methods to deal with such numeric calculations are needed, and for more advanced considerations a collection of formulas and methods related to the concept is needed. This paper tries to meet the need to compile such formulas and methods.
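As a minimal numeric illustration of the prediction ΔG = ih²σ_P, the sketch below uses hypothetical parameter values chosen only for the example:

```python
def predicted_gain(i, h2, sigma_p):
    """Predicted genetic gain: selection intensity i times heritability h2
    times the phenotypic standard deviation sigma_p."""
    return i * h2 * sigma_p

# Hypothetical values: i = 1.4, h2 = 0.3, sigma_P = 10 trait units.
print(predicted_gain(1.4, 0.3, 10.0))
```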

The subject was earlier compiled by Lindgren and Nilsson (1985), and many of their formulations and presentations still remain. For more citations of earlier efforts, and for extended tables with numerical values, that paper can be consulted.

Model and assumptions

Let X be a continuous random variable with probability density function f and distribution function F. A standardized probability density function with mean 0 and variance 1 will be assumed when applicable. Unless the contrary is indicated, a normal distribution will be assumed.

A sample of size n is obtained by unrestricted random sampling from an infinite population of values with distribution function F. j values are selected and the others rejected. The expected mean of the selected values is the selection intensity. The sample may be finite or infinite. The most frequent application of selection intensity concerns truncation selection (it may also be characterized as directional selection or censorship). This means that the j largest values of the sample are selected or, in the infinite case, the corresponding proportion p of the values above a truncation point t. The concepts for the infinite case are illustrated in Figure 1.


Figure 1. Truncation selection in an infinite population. It is possible to measure x. All individuals with values of x above t (the truncation point) are selected, and those below the truncation point rejected. The probability density function of x is f(x) and the distribution function is F(x), which can be expressed F(x) = ∫_{−∞}^{x} f(y) dy. The proportion selected is denoted p, which can be expressed p = 1 − F(t) = ∫_{t}^{∞} f(x) dx.

Value formula for selection intensity following truncation selection, infinite case

The selection intensity (i_t) following truncation selection at X = t in a sample of infinite size is

i_t = (1/p) ∫_{t}^{∞} x f(x) dx,  where p = 1 − F(t).

i_t is the mean value of the selected values in standardized terms.

In the standardized normal distribution case

i_t = φ(t)/(1 − Φ(t)) = φ(t)/p,

where φ is the standard normal density and Φ the corresponding distribution function. Note that for the normal case i_t can be expressed more simply than for most distributions, as ∫_{t}^{∞} xφ(x) dx can be evaluated in closed form (it equals φ(t)). Usually the selection intensity with the selected proportion as entry, i(p), is required rather than with the truncation point as entry. For the normal case there is no simple formula for that, as there is no simple formula for the inverse distribution function Φ⁻¹; some iterative method has to be used in which Φ(t) is evaluated. It can be calculated with arbitrary accuracy. Methods compiled by Zelen and Severo (1964) can be used; some methods are developed in more detail by Lindgren and Bondesson (1987).
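The two formulas above can be sketched with standard-library code only. The bisection search below is one possible iterative method for the proportion-as-entry case, not necessarily the one used in the cited references:

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def intensity_from_truncation(t):
    """i(t) = phi(t) / (1 - Phi(t)) for the standard normal case."""
    return phi(t) / (1.0 - Phi(t))

def intensity_from_proportion(p):
    """i(p): locate the truncation point with 1 - Phi(t) = p by bisection
    (one possible iterative method), then apply intensity_from_truncation."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - Phi(mid) > p:   # tail probability still too large: t is higher
            lo = mid
        else:
            hi = mid
    return intensity_from_truncation(0.5 * (lo + hi))

print(round(intensity_from_truncation(0.0), 4))  # 0.7979
```

At t = 0 half the population is selected and the intensity is φ(0)/0.5 ≈ 0.798, the familiar value for p = 0.5.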

Many commercial program packages include cumulative normal distribution functions which can be used for selection intensity calculations. E.g. MS Excel 7 functions gave the selection intensity (with truncation point as entry) with at least four correct decimals below 2.5, but at around 3.5 there could be an error in the third decimal. Mathcad 7 has an inverse cumulative normal distribution function and can thus find t_p. Using the default precision, the selection intensity could be calculated with at least four correct digits for a selected proportion above 10⁻¹⁰, but it seems possible to make the accuracy arbitrarily high with Mathcad. Standard programs will continue to become more powerful.

If you program the calculation yourself, the following rational approximation for the truncation point corresponding to a given proportion, t_p (Bratley et al 1983), can be used rather than methods which require iterative loops and branches,

t_p ≈ y − (a + by + cy² + dy³ + ey⁴) / (f + gy + hy² + iy³ + jy⁴),  y = √(−2 ln p),  for 0 < p ≤ 0.5

(for p > 0.5, use the symmetry t_p = −t_{1−p}), where

a = 0.322232431088       f = 0.099348462606
b = 1                    g = 0.588581570495
c = 0.342242088547       h = 0.531103462366
d = 0.0204231210245      i = 0.10353775285
e = 0.0000453642210148   j = 0.0038560700634

The relative accuracy is 6 decimal digits. Selection intensities derived from the formula had at least four correct decimals for selection intensities below 3. For fast and rough calculations, simpler rational approximations can be used.
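A sketch of this routine in code. The coefficient values are those of Odeh and Evans (1974), on which the Bratley et al (1983) approximation is based (taken here as an assumption about the intended constants):

```python
import math

# Coefficients of the rational approximation (Odeh and Evans 1974).
A, B, C, D, E = 0.322232431088, 1.0, 0.342242088547, 0.0204231210245, 0.0000453642210148
F, G, H, I, J = 0.099348462606, 0.588581570495, 0.531103462366, 0.10353775285, 0.0038560700634

def truncation_point(p):
    """Approximate truncation point t_p with upper-tail probability p."""
    if p > 0.5:                       # symmetry handles the lower half
        return -truncation_point(1.0 - p)
    y = math.sqrt(-2.0 * math.log(p))
    num = A + y * (B + y * (C + y * (D + y * E)))   # Horner evaluation
    den = F + y * (G + y * (H + y * (I + y * J)))
    return y - num / den

def intensity(p):
    """Selection intensity i(p) = phi(t_p) / p for the normal case."""
    t = truncation_point(p)
    return math.exp(-0.5 * t * t) / (math.sqrt(2.0 * math.pi) * p)

print(round(truncation_point(0.025), 4))  # 1.96
```

No loops or branches beyond the single symmetry test are needed, which is the point of the rational-approximation approach.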

A direct approximation of selection intensity was made by Saxton (1988),

[pic]

cited from Walsh (1999), untested.

The appearance of the selection intensity is demonstrated in Fig A2.


Figure A2. The selection intensity as a function of the selected proportion, assuming a normal distribution. The curve starts at 0 when all are selected, increases monotonically (starting straight upwards), passes an inflexion point at p = 0.71, and approaches infinity as the selected proportion becomes small.

The selection intensity derived for a selected proportion in an infinite population will, however, constitute a poor approximation of the selection intensity for the same selected proportion in a finite sample (cf Figure A3).


Figure A3. The selection intensity as a function of the selected proportion of a truncated normal distribution. Two cases are demonstrated: either the best value from a sample is selected (lower curve) or the best proportion from an infinite population. The scale of the X-axis is logarithmic; the expressions approach the parabola i = √(−2 ln p) when p becomes small in this scale.

If the selected number is small, the error can be considerable.

Order statistics and their use in breeding theory studies

The expected value of the j-th largest observation from a sample of size n, designated ξ(j,n), can be calculated as

ξ(j,n) = [n! / ((n−j)! (j−1)!)] ∫_{−∞}^{∞} x f(x) [1 − F(x)]^{j−1} [F(x)]^{n−j} dx

The coefficient of this formula may be realised by noting that n ranked objects can be arranged in n! ways, and that (n−j)!(j−1)! of these permutations are equivalent concerning which element holds rank j. The remaining factors of the integrand can be interpreted as the probability density that the j-th largest value lies at x: j−1 values fall above x and n−j values below it.
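The integral can be evaluated numerically. A minimal sketch for the standard normal case, using a plain midpoint rule over a truncated range (the range and step count are arbitrary choices for this example):

```python
import math

def Phi(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def xi(j, n, a=-10.0, b=10.0, steps=20000):
    """Expected value of the j-th largest of n standard normal
    observations, by midpoint-rule quadrature of the order-statistic
    integral. Accuracy depends on the integration range and step count."""
    coef = math.factorial(n) / (math.factorial(n - j) * math.factorial(j - 1))
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        fx = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += x * fx * (1.0 - Phi(x)) ** (j - 1) * Phi(x) ** (n - j)
    return coef * total * h

# The largest of two standard normals has expectation 1/sqrt(pi):
print(round(xi(1, 2), 4))  # 0.5642
```

By symmetry ξ(2,3) should be 0 (the median of three values from a symmetric distribution), which provides a cheap sanity check on an implementation.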

These values can be seen as the selection intensity for an individual selected on its rank. Thus, in theoretical breeding studies it is a useful technique to use expected order statistics as individual values. It has the advantage that the values are typical and the underlying parameters are known. It is also possible to deal with the corresponding situations by Monte Carlo simulation of values, but that is technically more complicated and computing time is sometimes limiting, although the simulations are more reliable, as they catch the variance around the typical and not just the typical. The technique of using expected order statistic values has been used e.g. by Lindgren et al (1989), Hodge and White (1993), Ruotsalainen and Lindgren (1998) and Wei (199?).

An algorithm for the calculation of exact expected order statistic values was given by Royston (1982). Values are tabulated by Harter (1970). When a numeric evaluation of expression AX is made, accuracy can cause considerable problems and large numeric errors even if the program code is correct; alertness against this is strongly recommended.

Value formula for selection intensity following truncation selection, finite case

If n is finite but large and j not close to 1 or n, the formulas for the infinite case with p = j/n will be reasonable approximations, but if high accuracy is desired the demands on sample size are considerable.

The expected mean value i(j,n) of the j largest observations from a sample of size n can be calculated as

i(j,n) = (1/j) Σ_{k=1}^{j} ξ(k,n)

A useful approximation was constructed by Burrows (1972) by expanding the mean of a truncated distribution in a Taylor series. It has the following form

i(j,n) ≈ i_p − (1 − p) / (2 p i_p (n + 1)),  where p = j/n

and i_p is the infinite-case selection intensity for selected proportion p.
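The finite-case quantities can be compared numerically. In the sketch below the exact i(j,n) is computed as the average of expected order statistics obtained by quadrature; the Burrows-type correction uses the form i_p − (1 − p)/(2 i_p p (n + 1)) with p = j/n, which is an assumption reconstructed from the breeding literature, since the equation image is missing from this draft:

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def xi(j, n, a=-10.0, b=10.0, steps=20000):
    """Expected j-th largest of n standard normal values (midpoint rule)."""
    coef = math.factorial(n) / (math.factorial(n - j) * math.factorial(j - 1))
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += x * phi(x) * (1.0 - Phi(x)) ** (j - 1) * Phi(x) ** (n - j)
    return coef * total * h

def i_exact(j, n):
    """i(j, n) as the average of the j largest expected order statistics."""
    return sum(xi(k, n) for k in range(1, j + 1)) / j

def i_infinite(p):
    """Infinite-population i(p), via bisection for 1 - Phi(t) = p."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - Phi(mid) > p:
            lo = mid
        else:
            hi = mid
    return phi(0.5 * (lo + hi)) / p

def i_burrows(j, n):
    """Burrows-type finite correction; the form here is an ASSUMED
    reconstruction: i_p - (1 - p)/(2 i_p p (n + 1)), p = j/n, for j < n."""
    p = j / n
    ip = i_infinite(p)
    return ip - (1.0 - p) / (2.0 * ip * p * (n + 1))

for j, n in [(1, 10), (5, 10)]:
    print(j, n, round(i_exact(j, n), 4), round(i_burrows(j, n), 4))
```

With these settings the deviation between the two values is about 0.017 for j = 1, n = 10 and about 0.002 for j = 5, n = 10, of the order of the error magnitudes quoted from Lindgren and Nilsson (1985).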

For the normal probability density case Lindgren and Nilsson (1985) studied the accuracy. The error is rather constant, independent of n, for n > 10 and decreases with j roughly as 1/j². The error is less than 0.0005 when j > 7 but is of magnitude 0.02 when j = 1. Some examples of the size of the error are given in Table AX. It is suggested that Burrows approximation is used for all j if an error of 0.025 (less than 5% of the selection intensity) is acceptable, for j > 2 if an error of about 0.005 is acceptable, and for j > 6 if an error of about 0.0005 is acceptable.

Table A9. Values for a truncated normal distribution with selected proportion as entry

|p |t_p |φ(t_p) |i(p) | | | |
|p -> 1 |t -> -inf |-> 0 |-> 0 | | | |
|.999 | | | | | | |
|.99 | | | | | | |
|.95 | | | | | | |
|.9 | | | | | | |
|.7 | | | | | | |
|.5 | | | | | | |
|.3 | | | | | | |
|.2 | | | | | | |
|.1 | | | | | | |
|.05 | | | | | | |
|.02 | | | | | | |
|.01 | | | | | | |
|.001 | | | | | | |
|.0001 | | | | | | |
|.00001 | | | | | | |
|.000001 | | | | | | |

Table A10. Values for a truncated normal distribution with truncation point as entry

|Truncation point |Selected proportion |φ(t) |Selection intensity | | |Vt |
|t |p_t = 1 − Φ(t) | |i(t) | | | |
|t -> -inf |-> 1 |-> 0 |-> 0 | | | |
|-3 |.998650 |.004432 |.0044 | | | |
|-2 | | | | | | |
|-1 | | | | | | |
|0 | | | | | | |
|1 | | | | | | |
|2 | | | | | | |
|3 | | | | | | |
|4 | | | | | | |
|5 | | | | | | |
| | | | | | | |
|t -> inf |-> 0 |-> 0 |-> inf | | | |

Table A11

Possible further tasks

If we set the goal to have an algorithm which gives an error ................
................
