Mapping genes with the LOD score method

Genetics Extra Credit Problem: March 18, 1997

©1997 Mitrick A. Johns

This problem walks you through the "lod score" method of estimating genetic distances in situations other than simple testcrosses. It is one of the basic methods of human genetics in use today. Although it is possible to do parts of this assignment with a calculator, most of it will be much easier with a spreadsheet program such as Lotus 1-2-3 or Microsoft Excel, because there are many repetitive steps.

The fundamental aim of this problem is to determine R, the recombinant fraction (the fraction of gametes that are recombinant), using data from relatively small families. R can vary from 0 (two genes completely linked) to 0.50 (two genes unlinked). There are four basic steps in the process: (1) determine the expected frequencies of F2 phenotypes for every value of R from 0.01 to 0.50; (2) determine the "likelihood" (L) that the observed family data resulted from a given R value, where the R value with the maximum likelihood is the best estimate of R for those data; (3) determine the Odds Ratio and the logarithm of the odds ratio (lod score) by comparing the likelihood for each value of R to the likelihood for unlinked genes (R = 0.50); (4) add lod scores from different families until the total is high enough that a specific most likely R can be assigned. We are going to do this problem using two genes that show complete dominance: the heterozygote is indistinguishable from the dominant homozygote. It is more common to use at least one co-dominant gene, but this adds complication without changing the basic method.

Step 1: Calculate the expected frequency of offspring for values of R from 0 to 0.50.

The expected offspring frequencies are calculated as follows:

  a. Determine the frequency of each gamete produced by the F1s, which are assumed here to be in coupling phase (A B/a b), so that A B and a b are the parental gametes. For example, if R = 0.20, then 20% of the gametes produced by either parent will be recombinant. Since there are two types of recombinant gamete, A b and a B, the frequency of each will be 0.10. Since 80% of the gametes will be parental, the frequency of each parental type, A B and a b, will be 0.40.
  b. Use a Punnett square to determine the offspring formed from the union of the gametes, multiplying the two gamete frequencies to get each offspring frequency. For instance, one cell of the Punnett square has the A B gamete from the father combining with the A b gamete from the mother. The frequency of the A B gamete is 0.40 and that of the A b gamete is 0.10, so the frequency of the offspring in this cell is 0.40 x 0.10 = 0.04.
  c. Determine the phenotype for each cell in the Punnett square and add up the frequencies to get the total frequency of each offspring phenotype.
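Punnett square for R = 0.20 (gamete frequencies in parentheses; each cell gives the offspring genotype and its frequency):

gametes         A B (0.40)      A b (0.10)      a B (0.10)      a b (0.40)
A B (0.40)      A B/A B  0.16   A b/A B  0.04   a B/A B  0.04   a b/A B  0.16
A b (0.10)      A B/A b  0.04   A b/A b  0.01   a B/A b  0.01   a b/A b  0.04
a B (0.10)      A B/a B  0.04   A b/a B  0.01   a B/a B  0.01   a b/a B  0.04
a b (0.40)      A B/a b  0.16   A b/a b  0.04   a B/a b  0.04   a b/a b  0.16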

F2 phenotype   expected freq   cell sums
A_ B_          0.66            0.16 + 0.04 + 0.04 + 0.16 + 0.04 + 0.01 + 0.04 + 0.01 + 0.16
A_ bb          0.09            0.01 + 0.04 + 0.04
aa B_          0.09            0.01 + 0.04 + 0.04
aa bb          0.16            0.16

In short, the expected frequencies are found by using a Punnett square to determine the genotypes, multiplying the frequencies of the two gametes that produce each type of offspring, and then adding up the offspring that have the same phenotype.
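Steps a through c can be sketched in Python as below. This is a minimal sketch, assuming both F1 parents are in coupling phase (A B/a b); the function names `gamete_frequencies`, `phenotype`, and `phenotype_frequencies` are hypothetical, chosen for illustration only.

```python
from itertools import product

# Sketch only: assumes both F1 parents are in coupling phase (A B/a b).
# Function names are hypothetical, not part of the original assignment.

def gamete_frequencies(R):
    """Step a: gamete frequencies for recombinant fraction R."""
    return {
        ("A", "B"): (1 - R) / 2,  # parental
        ("A", "b"): R / 2,        # recombinant
        ("a", "B"): R / 2,        # recombinant
        ("a", "b"): (1 - R) / 2,  # parental
    }

def phenotype(g1, g2):
    """Step c: with complete dominance, one capital allele gives the dominant phenotype."""
    a_part = "A_" if "A" in (g1[0], g2[0]) else "aa"
    b_part = "B_" if "B" in (g1[1], g2[1]) else "bb"
    return f"{a_part} {b_part}"

def phenotype_frequencies(R):
    """Steps b and c: fill the 4 x 4 Punnett square, then sum the cells by phenotype."""
    totals = {"A_ B_": 0.0, "A_ bb": 0.0, "aa B_": 0.0, "aa bb": 0.0}
    gametes = gamete_frequencies(R)
    for (g1, f1), (g2, f2) in product(gametes.items(), repeat=2):
        totals[phenotype(g1, g2)] += f1 * f2  # cell frequency = product of gamete frequencies
    return totals

print({k: round(v, 4) for k, v in phenotype_frequencies(0.20).items()})
# {'A_ B_': 0.66, 'A_ bb': 0.09, 'aa B_': 0.09, 'aa bb': 0.16}
```

Building the full square, rather than using a shortcut formula, mirrors the hand method one cell at a time, so each spreadsheet cell can be checked against it.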

Assignment part a.

Calculate the expected frequencies of the 4 types of offspring for all R values from 0.00 to 0.50.
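Summing the Punnett-square cells symbolically gives a closed form that can be typed into a single spreadsheet row per R value. This is a derivation sketch under the same coupling-phase assumption: aa bb requires two a b gametes, each with frequency (1 - R)/2, while the aa class as a whole is always 1/4 because the A locus by itself segregates 3:1 (and likewise for bb).

```latex
\begin{aligned}
P(\text{aa bb}) &= \left(\tfrac{1-R}{2}\right)^2 = \tfrac{(1-R)^2}{4} \\
P(\text{A\_ bb}) = P(\text{aa B\_}) &= \tfrac{1}{4} - \tfrac{(1-R)^2}{4} \\
P(\text{A\_ B\_}) &= 1 - 2\left[\tfrac{1}{4} - \tfrac{(1-R)^2}{4}\right] - \tfrac{(1-R)^2}{4} = \tfrac{1}{2} + \tfrac{(1-R)^2}{4}
\end{aligned}
```

At R = 0.20 these give 0.16, 0.09, and 0.66, matching the Punnett square; at R = 0.50 they give 1/16, 3/16, and 9/16, the 9:3:3:1 ratio expected for unlinked genes.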

Step 2: Examine the observed family data in light of the expected distribution of offspring for each R value.

This is done by determining the likelihood (L) of the observed family for each value of R. The likelihood is simply the probability of the observed family, as determined using the multinomial theorem, an extension of the binomial theorem we studied in class.

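First we define our terms for the observed family: n = total number of offspring; a = number of A_ B_ offspring; b = number of A_ bb; c = number of aa B_; d = number of aa bb (so n = a + b + c + d).

Then we define terms for the expected family proportions (obtained from step 1 above): p = expected frequency of A_ B_; q = A_ bb; r = aa B_; s = aa bb (so p + q + r + s = 1).

The term of the multinomial expansion that describes the actual family is $p^a q^b r^c s^d$ multiplied by a coefficient.

The coefficient is $n!/(a!\,b!\,c!\,d!)$, where ! means "factorial".

This is the four-category version of the binomial coefficient $n!/(k!\,(n-k)!)$.

Thus, the likelihood equation is:

$$L = \frac{n!}{a!\,b!\,c!\,d!}\; p^a\, q^b\, r^c\, s^d$$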
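The likelihood formula translates directly into Python. This sketch assumes counts and proportions are both listed in the order (A_ B_, A_ bb, aa B_, aa bb); the names `multinomial_coefficient` and `likelihood` are hypothetical, and Python 3.8+ is assumed for `math.prod`.

```python
from math import factorial, prod

# Sketch only: counts = (a, b, c, d) and expected = (p, q, r, s), both ordered
# (A_ B_, A_ bb, aa B_, aa bb). Names are hypothetical.

def multinomial_coefficient(counts):
    """n! / (a! b! c! d!), computed exactly with integers."""
    n = sum(counts)
    return factorial(n) // prod(factorial(k) for k in counts)

def likelihood(counts, expected):
    """L = [n! / (a! b! c! d!)] * p^a * q^b * r^c * s^d."""
    return multinomial_coefficient(counts) * prod(p ** k for p, k in zip(expected, counts))

# The five-child example family, scored against the R = 0.20 proportions
family = (2, 0, 1, 2)
expected_020 = (0.66, 0.09, 0.09, 0.16)
print(multinomial_coefficient(family))             # 30
print(round(likelihood(family, expected_020), 4))  # 0.0301
```

A zero count contributes a factor of 1 (since 0! = 1 and x^0 = 1), so families missing a phenotype class need no special handling.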

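Example: Above we calculated the expected phenotype proportions for R = 0.20 (20 map units between A and B). They are: A_ B_ = 0.66; A_ bb = 0.09; aa B_ = 0.09; aa bb = 0.16. A family of 5 children has 2 with the A_ B_ phenotype, 1 with aa B_, and 2 with aa bb (none are A_ bb). What is the likelihood of this family, given R = 0.20? Here n = 5, a = 2, b = 0, c = 1, and d = 2, so the coefficient is 5!/(2! 0! 1! 2!) = 120/4 = 30 (recall that 0! = 1), and $L_{0.20} = 30 \times 0.66^2 \times 0.09^0 \times 0.09^1 \times 0.16^2 \approx 0.0301$.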

The likelihood (L) needs to be calculated for all values of R between 0.01 and 0.50. Note that the coefficient will be the same for all values of R; the coefficient only depends on the observed data. When this is done, the value of R with the highest likelihood is the best estimate of R that can be obtained with data from this particular family.
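The scan over R can be sketched as a grid search from 0.01 to 0.50 in steps of 0.01. This sketch assumes the closed-form proportions for a coupling-phase intercross (equal to the summed Punnett-square cells) and uses hypothetical helper names; it is run here on the five-child example family, not the assignment family.

```python
from math import factorial, prod

# Sketch only: coupling-phase intercross assumed; helper names are hypothetical.
# Counts are ordered (A_ B_, A_ bb, aa B_, aa bb).

def expected_freqs(R):
    """Closed-form F2 phenotype proportions for recombinant fraction R."""
    x = (1 - R) ** 2 / 4  # expected aa bb frequency
    return (0.5 + x, 0.25 - x, 0.25 - x, x)

def likelihood(counts, R):
    n = sum(counts)
    coeff = factorial(n) // prod(factorial(k) for k in counts)  # same for every R
    return coeff * prod(p ** k for p, k in zip(expected_freqs(R), counts))

def best_R(counts):
    """Return (R, L) with the highest likelihood on the grid R = 0.01 ... 0.50."""
    grid = [i / 100 for i in range(1, 51)]
    return max(((R, likelihood(counts, R)) for R in grid), key=lambda t: t[1])

R_hat, L_hat = best_R((2, 0, 1, 2))
print(R_hat)  # 0.15
```

For this family the grid maximum falls at R = 0.15, slightly below the R = 0.20 used in the worked example, so the single-R calculation above is only one point on the likelihood curve.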

Assignment part b.

The family you observe has 7 A_ B_ offspring, 1 A_ bb, 1 aa B_, and 3 aa bb. Calculate the likelihood for all R values between 0.01 and 0.50. What is the most likely value of R?

Steps 3 and 4: Combining data from several families.

We want to be able to compare (and add) data from several different families to get a good estimate of R. To do this, the L values must be standardized by calculating the Odds Ratio (OR): the L for each R value divided by the L for R = 0.50 (unlinked). Because the coefficient is the same for every R, it cancels in this ratio. Then the base 10 logarithm of the Odds Ratio is taken; this is the lod score. Lod scores from different families can be added. (This is equivalent to multiplying the Odds Ratios, as in the AND rule for two independent events, family 1 AND family 2, both occurring.) A total lod score of 3.0 or more at some R value, meaning odds of 1000:1 in favor of linkage at that R over no linkage, is conventionally accepted as proof of linkage between the two genes.
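The Odds Ratio and lod score can be sketched as follows. This is a minimal sketch, assuming the same coupling-phase proportions and count ordering as above; `lod` and `total_lod` are hypothetical names. R = 0 is excluded because any recombinant-class offspring make L = 0 there, and `log10(0)` is undefined.

```python
from math import factorial, log10, prod

# Sketch only: coupling-phase intercross assumed; names are hypothetical.
# Counts are ordered (A_ B_, A_ bb, aa B_, aa bb).

def expected_freqs(R):
    x = (1 - R) ** 2 / 4  # expected aa bb frequency
    return (0.5 + x, 0.25 - x, 0.25 - x, x)

def likelihood(counts, R):
    n = sum(counts)
    coeff = factorial(n) // prod(factorial(k) for k in counts)
    return coeff * prod(p ** k for p, k in zip(expected_freqs(R), counts))

def lod(counts, R):
    """log10 of the Odds Ratio L(R) / L(0.50); requires 0 < R <= 0.50."""
    return log10(likelihood(counts, R) / likelihood(counts, 0.50))

def total_lod(families, R):
    """Lod scores add across families (equivalently, Odds Ratios multiply)."""
    return sum(lod(f, R) for f in families)

example = (2, 0, 1, 2)
print(round(likelihood(example, 0.20) / likelihood(example, 0.50), 3))  # 4.331
print(round(lod(example, 0.20), 3))                                     # 0.637
```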

Example: For R = 0.20, the Odds Ratio is $L_{0.20}/L_{0.50}$. We calculated $L_{0.20} = 0.0301$ above; $L_{0.50} = 0.00695$. The Odds Ratio is thus 4.331, and the lod score is its base 10 logarithm, 0.637. Clearly it would take several families of this size to reach a lod score of 3.0.
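As a check on these figures, the unlinked likelihood uses the R = 0.50 proportions 9/16, 3/16, 3/16, and 1/16 (the familiar 9:3:3:1 ratio) with the same coefficient of 30. The arithmetic below is a worked sketch of the example, not additional data:

```latex
\begin{aligned}
L_{0.50} &= 30 \times \left(\tfrac{9}{16}\right)^2 \left(\tfrac{3}{16}\right)^0 \left(\tfrac{3}{16}\right)^1 \left(\tfrac{1}{16}\right)^2 = \tfrac{7290}{1048576} \approx 0.00695 \\
\mathrm{OR} &= \frac{L_{0.20}}{L_{0.50}} \approx \frac{0.0301}{0.00695} \approx 4.331 \\
\text{lod} &= \log_{10} 4.331 \approx 0.637
\end{aligned}
```

Because lod scores add, families that each contributed 0.637 at R = 0.20 would need a combined 3.0/0.637 ≈ 4.7, or about five such families, to reach the threshold.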

Assignment part c.

Calculate the Odds Ratio and lod score for all R values from 0.01 to 0.50, using the family from part b above.

Assignment part d.

Once you have developed the method to calculate lod scores for one family, it is fairly easy to repeat the process for further families. Here are data from 3 more families. Calculate the lod score for each family, then combine the data from all 4 families to find the most likely value of R.
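Combining families can be sketched by summing each family's lod score at every R and taking the R with the largest total. This sketch treats the part b family as family 1, assumes coupling phase in every family, and uses hypothetical names; it works in logarithms so that the many small probabilities are never multiplied together, and it omits the multinomial coefficient because it cancels in the Odds Ratio.

```python
from math import log10

# Sketch only: coupling phase assumed in all families; names are hypothetical.
# Counts are ordered (A_ B_, A_ bb, aa B_, aa bb).

def expected_freqs(R):
    x = (1 - R) ** 2 / 4  # expected aa bb frequency
    return (0.5 + x, 0.25 - x, 0.25 - x, x)

def lod(counts, R):
    """log10[L(R) / L(0.50)] as a sum of per-class log terms (coefficient cancels)."""
    return sum(k * (log10(p) - log10(q))
               for k, p, q in zip(counts, expected_freqs(R), expected_freqs(0.50)))

families = {
    "fam 1": (7, 1, 1, 3),  # the family from part b
    "fam 2": (4, 0, 2, 1),
    "fam 3": (2, 1, 0, 1),
    "fam 4": (7, 0, 1, 1),
}

grid = [i / 100 for i in range(1, 51)]  # R = 0.01 ... 0.50; R = 0 would need log10(0)
combined = {R: sum(lod(c, R) for c in families.values()) for R in grid}
R_best = max(combined, key=combined.get)

for name, counts in families.items():
    print(name, round(lod(counts, R_best), 3))
print(f"most likely R = {R_best:.2f}, total lod = {combined[R_best]:.3f}")
```

The same `combined` table, printed for every R, is the column of summed lod scores that the spreadsheet version of part d would produce.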

phenotype   fam 2   fam 3   fam 4
A_ B_         4       2       7
A_ bb         0       1       0
aa B_         2       0       1
aa bb         1       1       1

Further refinements are possible as well, for calculating unknown linkage phase (coupling vs. repulsion), for different cross types (other than this simple intercross), and for co-dominant markers.