©1997 Mitrick A. Johns
This problem will work you through the "lod score" method of estimating genetic distances in situations other than simple testcrosses. It is one of the basic human genetics methods used today. Although it is possible to do parts of this assignment with a calculator, most of it will be much easier if you use a spreadsheet program such as Lotus 1-2-3 or Microsoft Excel, because there are many repetitive steps.
The fundamental aim of this problem is to determine R, the recombinant fraction (fraction of gametes that are recombinant), using data from relatively small families. R can vary from 0 (2 genes completely linked) to 0.50 (2 genes unlinked). There are 4 basic steps in the process: (1) determine the expected frequencies of F2 phenotypes for every value of R from 0.01 to 0.50; (2) determine the "likelihood" (L) that the family data observed resulted from a given R value: the R value with the maximum likelihood is the best estimate of R for the given data; (3) determine the Odds Ratio and the logarithm of the Odds Ratio (lod score) by comparing the likelihood for each value of R to the likelihood for unlinked genes (R = 0.50); (4) add lod scores from different families to achieve an acceptably high lod score so a specific most likely R can be assigned. We are going to do this problem using 2 genes that show complete dominance: the heterozygote is indistinguishable from the dominant homozygote. It is more common to use at least one co-dominant gene, but this simply adds complication.

The expected offspring frequencies are calculated using a Punnett square to determine the genotypes, multiplying the frequencies of the two gametes that go into each type of offspring, then adding up offspring that have the same phenotype. For R = 0.20, the gamete frequencies are A B = 0.40, A b = 0.10, a B = 0.10, and a b = 0.40:
| | A B (0.40) | A b (0.10) | a B (0.10) | a b (0.40) |
| A B (0.40) | A B/A B 0.16 | A b/A B 0.04 | a B/A B 0.04 | a b/A B 0.16 |
| A b (0.10) | A B/A b 0.04 | A b/A b 0.01 | a B/A b 0.01 | a b/A b 0.04 |
| a B (0.10) | A B/a B 0.04 | A b/a B 0.01 | a B/a B 0.01 | a b/a B 0.04 |
| a b (0.40) | A B/a b 0.16 | A b/a b 0.04 | a B/a b 0.04 | a b/a b 0.16 |

| F2 phenotype | cell sums | expected freq |
| A_ B_ | 0.16 + 0.04 + 0.04 + 0.16 + 0.04 + 0.01 + 0.04 + 0.01 + 0.16 | 0.66 |
| A_ bb | 0.01 + 0.04 + 0.04 | 0.09 |
| aa B_ | 0.01 + 0.04 + 0.04 | 0.09 |
| aa bb | 0.16 | 0.16 |
Calculate the expected frequencies of the 4 types of offspring for all R values from 0.00 to 0.50.
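For readers working outside a spreadsheet, the Punnett-square step might be sketched in Python as below. This is only an illustration: it assumes both parents are A B / a b double heterozygotes in coupling phase (which matches the 0.40 / 0.10 / 0.10 / 0.40 gamete frequencies used for R = 0.20), and the function names `gamete_freqs`, `phenotype`, and `expected_f2` are made up for this example.

```python
from itertools import product


def gamete_freqs(r):
    """Gamete frequencies from an A B / a b parent (coupling phase is an assumption)."""
    return {"AB": (1 - r) / 2, "Ab": r / 2, "aB": r / 2, "ab": (1 - r) / 2}


def phenotype(g1, g2):
    """Phenotype class of the offspring formed by gametes g1 and g2 (complete dominance)."""
    a_part = "A_" if "A" in (g1[0], g2[0]) else "aa"
    b_part = "B_" if "B" in (g1[1], g2[1]) else "bb"
    return f"{a_part} {b_part}"


def expected_f2(r):
    """Fill in the 4 x 4 Punnett square and add up cells with the same phenotype."""
    gametes = gamete_freqs(r)
    freqs = {"A_ B_": 0.0, "A_ bb": 0.0, "aa B_": 0.0, "aa bb": 0.0}
    for (g1, f1), (g2, f2) in product(gametes.items(), repeat=2):
        freqs[phenotype(g1, g2)] += f1 * f2
    return freqs


# Tabulate the expected frequencies for R = 0.00, 0.01, ..., 0.50.
for i in range(0, 51):
    r = i / 100
    row = expected_f2(r)
    print(f"{r:.2f}  " + "  ".join(f"{name}: {freq:.4f}" for name, freq in row.items()))
```

The row printed for R = 0.20 can be checked against the phenotype table above.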
This is done by determining the likelihood (L) of the observed family for each value of R. The likelihood is simply the probability of the observed family, as determined using the multinomial theorem, an extension of the binomial theorem we studied in class.
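First we define our terms for the observed family: n is the total number of offspring, and a, b, c, and d are the observed numbers of A_ B_, A_ bb, aa B_, and aa bb offspring, respectively.
Then we define terms for the expected family proportions (obtained from step 1 above): p, q, r, and s are the expected frequencies of A_ B_, A_ bb, aa B_, and aa bb offspring, respectively.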
The term of the multinomial equation that describes the actual family is: $p^a q^b r^c s^d$ multiplied by a coefficient.
The coefficient is: $n! / (a!\, b!\, c!\, d!)$, where ! means "factorial".
This is very similar to the coefficient for the binomial.
Thus, the likelihood equation is: $L = \dfrac{n!}{a!\, b!\, c!\, d!}\, p^a q^b r^c s^d$
Example: Above we calculated the expected phenotype proportions for R = 0.20 (20 map units between A and B). They are: A_ B_ = 0.66; A_ bb = 0.09; aa B_ = 0.09; aa bb = 0.16. A family of 5 children has 2 with the A_ B_ phenotype, 1 with aa B_, and 2 with aa bb. What is the likelihood of this family, given an R of 0.20?
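The arithmetic for this example can be worked through as follows. This assumes a, b, c, and d count the A_ B_, A_ bb, aa B_, and aa bb children in that order (so a = 2, b = 0, c = 1, d = 2, n = 5), with p, q, r, and s as the matching expected frequencies:

```latex
\begin{aligned}
L_{0.20} &= \frac{5!}{2!\,0!\,1!\,2!}\,(0.66)^{2}\,(0.09)^{0}\,(0.09)^{1}\,(0.16)^{2} \\
         &= 30 \times 0.4356 \times 1 \times 0.09 \times 0.0256 \\
         &\approx 0.0301
\end{aligned}
```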
The likelihood (L) needs to be calculated for all values of R between 0.01 and 0.50. Note that the coefficient will be the same for all values of R; the coefficient only depends on the observed data. When this is done, the value of R with the highest likelihood is the best estimate of R that can be obtained with data from this particular family.
The family you observe has 7 A_ B_ offspring, 1 A_ bb, 1 aa B_, and 3 aa bb. Calculate the likelihood for all R values between 0.01 and 0.50. What is the most likely value of R?
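A scan over R for this family might be sketched in Python as follows. The sketch assumes both parents are A B / a b double heterozygotes in coupling phase, and uses the shortcut that the aa bb frequency is x = ((1 - R)/2)^2, so the four phenotype frequencies are 0.5 + x, 0.25 - x, 0.25 - x, and x (this gives the same numbers as summing the Punnett square). The names `expected_f2`, `likelihood`, and `family` are illustrative only.

```python
from math import factorial, prod


def expected_f2(r):
    """Expected frequencies of (A_ B_, A_ bb, aa B_, aa bb); coupling phase assumed."""
    x = ((1 - r) / 2) ** 2  # aa bb needs an a b gamete from each parent
    return (0.5 + x, 0.25 - x, 0.25 - x, x)


def likelihood(counts, r):
    """Multinomial probability of the observed phenotype counts at recombinant fraction r."""
    n = sum(counts)
    coefficient = factorial(n) // prod(factorial(k) for k in counts)
    return coefficient * prod(f ** k for f, k in zip(expected_f2(r), counts))


family = (7, 1, 1, 3)  # A_ B_, A_ bb, aa B_, aa bb
grid = [i / 100 for i in range(1, 51)]  # R = 0.01, 0.02, ..., 0.50

for r in grid:
    print(f"R = {r:.2f}  L = {likelihood(family, r):.6f}")

# The grid value with the highest likelihood is the best estimate of R.
best_r = max(grid, key=lambda r: likelihood(family, r))
print(f"most likely R on this grid: {best_r:.2f}")
```

Because the coefficient depends only on the observed counts, it could be computed once outside the loop; it is kept inside `likelihood` here to mirror the equation.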
We want to be able to compare (and add) data from several different families, to get a good estimate of R. To do this, the L values must be standardized by calculating the Odds Ratio (OR), which is the L for each R value divided by the L for R = 0.50 (unlinked). Then, the logarithm of the Odds Ratio is taken; this is the lod score. Lod scores from different families can be added. (This is equivalent to multiplying the Odds Ratios, as in the AND rule for two events--family 1 AND family 2--both occurring.) A total lod score of 3.0 or more at some value of R is conventionally considered proof of linkage between the two genes.
Example: For R = 0.20, the Odds Ratio = $L_{0.20} / L_{0.50}$. We calculated $L_{0.20} = 0.0301$ above; $L_{0.50} = 0.00695$. The Odds Ratio is thus 4.331 and the lod score is the base 10 logarithm of this, 0.637. Clearly it would take several families of this size to reach a lod score of 3.0.
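The same example can be checked with a short Python sketch. As before, it assumes coupling-phase A B / a b parents, and the helper names (`expected_f2`, `likelihood`, `lod_score`) are hypothetical ones chosen for this illustration.

```python
from math import factorial, log10, prod


def expected_f2(r):
    """Expected frequencies of (A_ B_, A_ bb, aa B_, aa bb); coupling phase assumed."""
    x = ((1 - r) / 2) ** 2
    return (0.5 + x, 0.25 - x, 0.25 - x, x)


def likelihood(counts, r):
    """Multinomial probability of the observed phenotype counts at recombinant fraction r."""
    coefficient = factorial(sum(counts)) // prod(factorial(k) for k in counts)
    return coefficient * prod(f ** k for f, k in zip(expected_f2(r), counts))


def lod_score(counts, r):
    """Base 10 logarithm of the Odds Ratio L(r) / L(0.50)."""
    return log10(likelihood(counts, r) / likelihood(counts, 0.50))


example_family = (2, 0, 1, 2)  # 2 A_ B_, 0 A_ bb, 1 aa B_, 2 aa bb
odds_ratio = likelihood(example_family, 0.20) / likelihood(example_family, 0.50)
print(f"Odds Ratio = {odds_ratio:.3f}, lod = {lod_score(example_family, 0.20):.3f}")
```

Note that the multinomial coefficient appears in both the numerator and the denominator, so it cancels in the Odds Ratio.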
Calculate the Odds Ratio and lod score for all R values from 0.01 to 0.50, from part b above.
Once you have developed the method to calculate lod scores for one family, it is fairly easy to repeat the process for further families. Here are data from 3 more families. Calculate the lod score for each family, then combine the data from all 4 families to find the most likely value of R.
| pheno: | fam 2 | fam 3 | fam 4 |
| A_ B_ | 4 | 2 | 7 |
| A_ bb | 0 | 1 | 0 |
| aa B_ | 2 | 0 | 1 |
| aa bb | 1 | 1 | 1 |
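Combining families could be sketched in Python as below. The sketch again assumes coupling-phase A B / a b parents in every family, and it computes each lod score directly from the frequency ratios, since the multinomial coefficient cancels in the Odds Ratio. The names `expected_f2`, `lod`, and `families` are illustrative; family 1 is the 7 / 1 / 1 / 3 family given earlier.

```python
from math import log10


def expected_f2(r):
    """Expected frequencies of (A_ B_, A_ bb, aa B_, aa bb); coupling phase assumed."""
    x = ((1 - r) / 2) ** 2
    return (0.5 + x, 0.25 - x, 0.25 - x, x)


def lod(counts, r):
    """lod score of one family at r, relative to unlinked genes (R = 0.50)."""
    unlinked = expected_f2(0.50)
    return sum(k * log10(f / f0) for k, f, f0 in zip(counts, expected_f2(r), unlinked))


# Counts are listed as (A_ B_, A_ bb, aa B_, aa bb).
families = {
    "fam 1": (7, 1, 1, 3),
    "fam 2": (4, 0, 2, 1),
    "fam 3": (2, 1, 0, 1),
    "fam 4": (7, 0, 1, 1),
}

grid = [i / 100 for i in range(1, 51)]  # R = 0.01, 0.02, ..., 0.50

# Lod scores from different families are added at each value of R.
total_lod = {r: sum(lod(counts, r) for counts in families.values()) for r in grid}

best_r = max(total_lod, key=total_lod.get)
print(f"most likely R = {best_r:.2f}, total lod = {total_lod[best_r]:.3f}")
```

Adding lod scores gives the same total as computing one lod score from the pooled phenotype counts, which is a convenient check on a spreadsheet version.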
Further refinements are possible as well, for calculating unknown linkage phase (coupling vs. repulsion), for different cross types (other than this simple intercross), and for co-dominant markers.