Allele frequency estimation in the human ABO blood group system


Pedro J.N. Silva Faculdade de Ciencias da Universidade de Lisboa

Campo Grande, C2, 4o. piso P-1700 LISBOA PORTUGAL

Pedro.Silva@fc.ul.pt

2002

Table of Contents

OVERVIEW
THEORY
Population genetics
Genetics nomenclature
The ABO system
Hardy-Weinberg frequencies
ABO allele frequency estimators
Bernstein (1925)
Bernstein (1930)
Wiener (1929)
Maximum Likelihood and the EM algorithm
Statistics
Maximum Likelihood
The EM algorithm
Log-likelihood ratio test
Pearson's χ² test
S2 ABOESTIMATOR
Description
How to get the latest version
RECOMMENDED READING

Overview

We deal here with the estimation of allele frequencies of the human ABO blood group system.

It is assumed that

• the ABO system is determined by three alleles of a single gene, call them A, B and O;
• A and B are codominant, and both are dominant over O;
• this gene is in Hardy-Weinberg frequencies in the population;
• the data are a random sample from the population.

You should be familiar with classical population genetics, maximum likelihood estimation and the EM algorithm, as well as statistical testing in general and goodness-of-fit tests in particular.

Towards the end, there is a plug for a computer program that you may find useful for the actual calculations.

A brief summary of the relevant theory follows; suggested reading is listed at the end.


Theory

Population genetics

Genetics nomenclature

A gene is a unit of hereditary transmission (or, as some would say, a gene is whatever geneticists study...).

Different forms of the same gene are known as alleles (e.g., A and a; A, B and O).

Alleles may be combined in genotypes (e.g., AB, or OO), which may or may not have distinct phenotypes (e.g., white or red flowers; different blood groups), depending on dominance relationships. For example, since AA and AO have the same phenotype (blood group A), different from that of OO, we say A is dominant over O; on the other hand, AA, AB and BB all have distinct phenotypes (blood groups A, AB and B, respectively), so we say A and B are codominant.

The relative proportion of each allele in a population is called its allele frequency; similarly, the relative proportion of each genotype is its genotypic frequency and, as you can guess, the relative proportion of each phenotype is the phenotypic frequency. As long as there is no dominance, the frequency of an allele can be estimated from the genotypic frequencies by adding the frequency of the corresponding homozygote to half the frequencies of the heterozygotes that carry that allele. For example, for two alleles A and a,

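$$p_A = \frac{2N_{AA} + N_{Aa}}{2N} = n_{AA} + \tfrac{1}{2}\, n_{Aa},$$

where $N_{AA}$ and $N_{Aa}$ are the numbers of AA and Aa individuals in a sample of $N$ individuals (hence $2N$ genes), and $n_{AA} = N_{AA}/N$ and $n_{Aa} = N_{Aa}/N$ are the corresponding genotypic frequencies.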
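The gene-counting formula can be checked numerically. Here is a minimal Python sketch. The function name and the genotype counts are made up for illustration. As the text notes, the procedure only applies when every genotype can be distinguished, that is, when there is no dominance.

```python
def allele_frequency_A(N_AA: int, N_Aa: int, N_aa: int) -> float:
    """Gene counting for two codominant alleles A and a (hypothetical helper)."""
    N = N_AA + N_Aa + N_aa              # number of individuals (2N genes)
    # each AA individual carries two copies of A, each Aa individual one
    return (2 * N_AA + N_Aa) / (2 * N)

# Made-up sample: 30 AA, 50 Aa and 20 aa individuals
N_AA, N_Aa, N_aa = 30, 50, 20
N = N_AA + N_Aa + N_aa
p_A = allele_frequency_A(N_AA, N_Aa, N_aa)

# Same result from genotypic frequencies: n_AA + n_Aa / 2
n_AA, n_Aa = N_AA / N, N_Aa / N
print(p_A, n_AA + n_Aa / 2)   # both 0.55, up to floating-point rounding
```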

However, if there is dominance we cannot distinguish (some of) the homozygotes from (some of) the heterozygotes. The genotype counts that this simple procedure needs are then unknown, so it cannot be used.

The ABO system

ABO is a blood group system notorious for causing blood transfusion accidents. It was among the first human traits proven to be Mendelian. It was often used in forensic (identification and paternity) studies, but has been superseded in this role by other genetic markers. It remains clinically important, and a great system for teaching.

We assume that

• the ABO system is determined by three alleles of a single gene, call them A, B and O;
• A and B are codominant, and both are dominant over O;
• this gene is in Hardy-Weinberg frequencies in the population.

Note that these assumptions are not necessarily true. Why the Hardy-Weinberg assumption? Because without it, allele frequencies cannot be estimated in this case: with dominance, AA cannot be told apart from AO, nor BB from BO. Wanna try?... :-) Because of its importance in the estimation procedure, this assumption should always be tested.

These assumptions, and some of their consequences, are summarized in the following table, where p, q and r are the frequencies of alleles A, B and O, respectively:


Phenotype      Genotype   Phenotypic   Genotypic      Expected
(blood group)             frequency    frequency      frequency
----------------------------------------------------------------
A              AA + AO    n_A          n_AA + n_AO    p² + 2pr
B              BB + BO    n_B          n_BB + n_BO    q² + 2qr
AB             AB         n_AB         n_AB           2pq
O              OO         n_O          n_OO           r²
----------------------------------------------------------------
Total                     n            n              1
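The expected-frequency column can be generated mechanically. Each genotype gets its Hardy-Weinberg frequency, and dominance then pools genotypes into phenotypes. Below is a minimal Python sketch. The mapping name, the helper function, the allele frequencies and the sample size are all hypothetical, chosen only for illustration.

```python
# Genotype -> blood group under the assumptions above:
# A and B codominant, both dominant over O.
GENOTYPE_TO_PHENOTYPE = {
    "AA": "A", "AO": "A",
    "BB": "B", "BO": "B",
    "AB": "AB",
    "OO": "O",
}

def expected_phenotype_freqs(p: float, q: float, r: float) -> dict:
    """Hardy-Weinberg expected blood-group frequencies for p (A), q (B), r (O)."""
    allele = {"A": p, "B": q, "O": r}
    expected = {"A": 0.0, "B": 0.0, "AB": 0.0, "O": 0.0}
    for genotype, phenotype in GENOTYPE_TO_PHENOTYPE.items():
        x, y = genotype                        # the two alleles of the genotype
        if x == y:
            freq = allele[x] ** 2              # homozygote: x^2
        else:
            freq = 2 * allele[x] * allele[y]   # heterozygote: 2xy
        expected[phenotype] += freq            # dominance pools genotypes
    return expected

# Hypothetical allele frequencies, for illustration only
freqs = expected_phenotype_freqs(p=0.3, q=0.1, r=0.6)
n = 1000                                       # hypothetical sample size
expected_counts = {g: n * f for g, f in freqs.items()}
print(freqs)                # A ~0.45, B ~0.13, AB ~0.06, O ~0.36
print(sum(freqs.values()))  # ~1
```

Multiplying by the sample size gives expected counts. A goodness-of-fit test compares these expected counts with the observed $n_A$, $n_B$, $n_{AB}$ and $n_O$.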

It is interesting to note that the genetic basis of the ABO system was not determined by family investigations, as might be expected, but by testing the predictions of the competing genetic hypotheses (two genes with two alleles each vs. the above model) against actual population data, using the Hardy-Weinberg law.

Hardy-Weinberg frequencies

While the complete set of genotypic frequencies always determines the allelic frequencies, the reverse is not necessarily true: we cannot always calculate the genotypic frequencies from the allelic ones.

Under certain assumptions, however, the genotypic frequencies eventually take a form that depends only on the allele frequencies. These assumptions include random union of gametes (with or without random mating), a very large (in theory, infinite) population size, and absence of selection, migration, etc. For example, an autosomal gene with just two alleles (A and a), with respective frequencies p and q, has three genotypes (AA, Aa and aa), whose frequencies are $p^2$, $2pq$ and $q^2$.

These genotypic frequencies can be thought of as the terms in the expansion of the square of the sum of the allele frequencies (writing $p_A = p$ and $q_a = q$):

$$(p_A + q_a)^2 = p_A^2 + 2p_Aq_a + q_a^2.$$

This result was published independently by the British mathematician G.H. Hardy and the German physician W. Weinberg in 1908.
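Random union of gametes should produce the proportions $p^2$, $2pq$ and $q^2$, and a simulation can illustrate this. The Python sketch below works under the idealized assumptions listed above, and its helper name is hypothetical. It draws two gametes independently from a gene pool in which A has frequency $p$, then compares the resulting genotype proportions with the Hardy-Weinberg values.

```python
import random
from collections import Counter

def simulate_random_union(p: float, n_individuals: int, seed: int = 1) -> dict:
    """Unite pairs of gametes drawn independently from a pool with freq(A) = p."""
    rng = random.Random(seed)                # seeded for reproducibility
    counts = Counter()
    for _ in range(n_individuals):
        gamete1 = "A" if rng.random() < p else "a"
        gamete2 = "A" if rng.random() < p else "a"
        genotype = "".join(sorted(gamete1 + gamete2))  # "AA", "Aa" or "aa"
        counts[genotype] += 1
    return {g: counts[g] / n_individuals for g in ("AA", "Aa", "aa")}

p = 0.7
q = 1 - p
observed = simulate_random_union(p, n_individuals=100_000)
expected = {"AA": p**2, "Aa": 2 * p * q, "aa": q**2}
for g in ("AA", "Aa", "aa"):
    print(g, round(observed[g], 3), round(expected[g], 3))
```

With 100 000 simulated individuals, the observed proportions should agree with $p^2$, $2pq$ and $q^2$ to about two decimal places.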

For more than two alleles we have

$$(p_1 + p_2 + \dots + p_n)^2 = p_1^2 + 2p_1p_2 + \dots + 2p_1p_n + p_2^2 + \dots + p_n^2 + \dots + 2p_{n-1}p_n = \sum_{i=1}^{n} p_i^2 + 2\sum_{i<j} p_i p_j.$$
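For $n$ alleles, this expansion yields $n$ homozygote terms $p_i^2$ and $n(n-1)/2$ heterozygote terms $2p_ip_j$. The minimal Python sketch below enumerates them. The helper name and the allele frequencies are hypothetical.

```python
from itertools import combinations_with_replacement

def hardy_weinberg_genotypes(allele_freqs: dict) -> dict:
    """Terms of (p1 + ... + pn)^2: p_i^2 for homozygotes, 2 p_i p_j for heterozygotes."""
    genotypes = {}
    # each unordered pair of alleles (including a pair with itself) is one genotype
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        pa, pb = allele_freqs[a], allele_freqs[b]
        genotypes[a + b] = pa * pa if a == b else 2 * pa * pb
    return genotypes

# ABO with hypothetical frequencies p = 0.3 (A), q = 0.1 (B), r = 0.6 (O)
abo = hardy_weinberg_genotypes({"A": 0.3, "B": 0.1, "O": 0.6})
print(abo)                  # six genotypes: AA, AB, AO, BB, BO, OO
print(sum(abo.values()))    # ~1, since (p + q + r)^2 = 1
```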

ABO allele frequency estimators

Bernstein (1925)

Let us agree to name the three allele frequencies of the ABO system p (of allele A), q (of B) and r (of... you guessed it!).

The oldest estimator of the ABO allele frequencies is due to Bernstein (1925), who had determined the genetic basis of this blood group just the year before using Hardy-Weinberg frequencies.

Since the expected (Hardy-Weinberg) frequency of individuals with blood group O is $r^2$, a fairly obvious estimate of $r$ is


................
................
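The estimate of $r$ introduced above follows from the table: if group O has expected frequency $r^2$, the square root of its observed proportion estimates $r$. The same reasoning applies to groups B and O together, whose expected frequency is $q^2 + 2qr + r^2 = (q + r)^2 = (1 - p)^2$; this gives an estimate of $p$. Symmetrically, groups A and O give an estimate of $q$. This is the form in which Bernstein's 1925 estimator is commonly presented. The Python sketch below illustrates it under the assumptions listed in the Overview, using a hypothetical function name and made-up counts.

```python
from math import sqrt

def bernstein_1925(n_A: int, n_B: int, n_AB: int, n_O: int) -> tuple:
    """Estimate (p, q, r) from blood-group counts (hypothetical helper)."""
    n = n_A + n_B + n_AB + n_O
    f_A, f_B, f_O = n_A / n, n_B / n, n_O / n   # observed phenotype proportions
    r = sqrt(f_O)               # O expected:     r^2
    p = 1 - sqrt(f_B + f_O)     # B + O expected: (q + r)^2 = (1 - p)^2
    q = 1 - sqrt(f_A + f_O)     # A + O expected: (p + r)^2 = (1 - q)^2
    return p, q, r

# Counts exactly matching p = 0.3, q = 0.1, r = 0.6 in 10 000 people
est = bernstein_1925(4500, 1300, 600, 3600)
print(est)                      # approximately (0.3, 0.1, 0.6)

# With sampled data the three estimates need not sum to 1
p, q, r = bernstein_1925(410, 140, 60, 390)   # made-up counts
D = 1 - p - q - r
print(p, q, r, D)
```

Because each frequency is estimated separately, the discrepancy $D = 1 - p - q - r$ is generally not zero. This is a known shortcoming of this simple estimator.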

