The Hardy-Weinberg Equation IRL (in real life!)
Testing allele frequencies in survivor populations of red clover using the Chi-Square Goodness of Fit test, and demonstrating concepts from AP Biology Investigation #2
Background
The Hardy-Weinberg equation is a relatively simple mathematical equation that describes a very important principle of population genetics: the frequencies of alleles (genetic variants) and genotypes in a population will remain the same from generation to generation unless some factor drives them to change. In reality, there are almost always such factors at play in any population of organisms. They include natural selection, mutation, nonrandom mating, random genetic drift, and gene flow from mating with nearby populations. Despite the presence of such factors in almost every real population, the Hardy-Weinberg equation remains important to biology because it establishes the null hypothesis against which biologists test for those factors. In other words, we can use the allele frequencies predicted by the Hardy-Weinberg equation to test whether factors are driving evolution at a particular location within the genome.
Hardy-Weinberg Equation
$$p^2 + 2pq + q^2 = 1.0$$
The equation assumes a single gene with two alleles, one dominant and one recessive. Here p is the frequency (expressed as a decimal) of the dominant allele and q is the frequency of the recessive allele; because these are the only two alleles, p + q = 1.0. Thus p squared is the expected frequency of homozygous dominant genotypes, q squared is the expected frequency of homozygous recessive genotypes, and 2pq is the expected frequency of heterozygous genotypes. Together the three frequencies account for every individual in the population, so by definition they sum to 1.0.
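To make the arithmetic concrete, here is a minimal Python sketch that turns a dominant-allele frequency p into the three expected genotype frequencies. The function name `hardy_weinberg_genotypes` and the example value p = 0.6 are illustrative choices, not part of the original activity.

```python
def hardy_weinberg_genotypes(p: float) -> dict[str, float]:
    """Expected genotype frequencies at equilibrium for one gene with two alleles.

    p is the frequency of the dominant allele; q = 1 - p is the recessive allele.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p
    return {
        "homozygous_dominant": p ** 2,   # p^2
        "heterozygous": 2 * p * q,       # 2pq
        "homozygous_recessive": q ** 2,  # q^2
    }


freqs = hardy_weinberg_genotypes(0.6)
print({k: round(v, 4) for k, v in freqs.items()})
# {'homozygous_dominant': 0.36, 'heterozygous': 0.48, 'homozygous_recessive': 0.16}
print(round(sum(freqs.values()), 10))  # 1.0
```

The rounding in the printout only hides floating-point noise; the three frequencies always sum to 1.0 because (p + q)^2 = 1.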
Though the implications of Godfrey Harold Hardy and Wilhelm Weinberg’s equation seem intuitive today, when each arrived at the concept independently in 1908 it contradicted a commonly held belief in evolutionary biology: that a dominant allele will tend to increase in frequency over time. What made their contribution to the field of biology even more controversial was their respective professions: Hardy was a mathematician and Weinberg an obstetrician-gynecologist. Using simulations such as this one, you can see that a dominant allele does not increase in frequency in the absence of any disturbance or selective pressure, and how violating the conditions for equilibrium causes genetic variation to change.
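The core of the Hardy-Weinberg argument can also be checked with a small deterministic model. The sketch below is not the simulation referred to in the activity; it assumes an infinitely large, randomly mating population with no mutation, migration, or drift. The hypothetical parameter `fitness_recessive` (relative survival of homozygous recessive individuals) is added only to contrast "no disturbance" with simple selection.

```python
def next_generation_p(p: float, fitness_recessive: float = 1.0) -> float:
    """Dominant-allele frequency after one generation of random mating.

    Assumes an infinite population with no mutation, migration, or drift.
    fitness_recessive is a hypothetical relative survival of homozygous
    recessive individuals (1.0 means no selection); dominant phenotypes have fitness 1.0.
    """
    q = 1.0 - p
    hom_dom, het, hom_rec = p * p, 2 * p * q, q * q
    # Mean fitness renormalizes the genotype frequencies among survivors
    mean_fitness = hom_dom + het + hom_rec * fitness_recessive
    # Homozygous dominants carry two dominant alleles and heterozygotes carry one,
    # so the dominant allele's share among survivors is (p^2 + pq) / mean fitness
    return (hom_dom + het / 2) / mean_fitness


def simulate(p0: float, generations: int, fitness_recessive: float = 1.0) -> list[float]:
    """Track the dominant-allele frequency over several generations."""
    history = [p0]
    for _ in range(generations):
        history.append(next_generation_p(history[-1], fitness_recessive))
    return history


# A rare dominant allele with no selection stays where it started
print([round(p, 4) for p in simulate(0.1, 5)])  # [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
# Selection against homozygous recessives (fitness 0.5) pushes p upward
print([round(p, 4) for p in simulate(0.1, 5, fitness_recessive=0.5)])
```

With no selection, p^2 + pq simplifies to p(p + q) = p, which is why dominance alone does not raise an allele's frequency.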
Dataset
As we mentioned above, some common “disturbances” to a population that change its genetic variation include natural selection on expressed alleles, migration, and random mutations. Each of these factors affects the likelihood that an allele will be passed to a successive generation and thus pushes allele frequencies away from Hardy-Weinberg equilibrium. Studying the ways in which alleles “survive” from one generation to the next can offer important insights into the evolutionary pressures that lead to change in a population. The data below are based on the results of a real scientific investigation into allele frequencies in red clover populations, a species whose nitrogen-fixing capabilities are important for maintaining the productivity of agricultural fields. Scientists first measured the frequencies of a few minor alleles in a parent generation of red clover, and then recorded their frequencies in a successive survivor generation. The relationship between the allele frequencies in the two generations can tell us whether selective pressures affected the likelihood that each allele was passed to the next generation. In other words, without selection we would expect allele frequencies to be unchanged from one generation to the next.
For your analysis, it will be important to consider the observed Minor Allele Frequency in the original population, listed below. Minor Allele Frequency refers to the frequency of the less common of the two alleles present. The more common allele is known as the major allele, and its frequency is the Major Allele Frequency. For simplicity’s sake, we’ve represented three of the alleles from the study using genotype notation in which the major allele is represented by an upper-case letter and the minor allele by a lower-case letter. So, instead of using the term “allele,” we’ve simply indicated the genotype at a specific location within the genome, or locus, for each trait. For more information on our reasoning, please see the teacher’s note below.
Frequency in Original Parent Population:
Hint- Use these to set your expected counts when running a Chi-Square Goodness of Fit Test
Genotype frequency of minor allele x @ locus 1 (X/x): 0.46
Genotype frequency of minor allele y @ locus 2 (Y/y): 0.35
Genotype frequency of minor allele z @ locus 3 (Z/z): 0.35
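Setting expected counts is just multiplying each frequency by the survivor sample size (n = 50). A minimal Python sketch, assuming each survivor is scored as carrying either the major or the minor genotype at each locus, as the variable list describes; the names `MINOR_FREQUENCIES` and `expected_counts` are illustrative.

```python
N_SURVIVORS = 50  # size of the survivor generation used in the activity

# Minor-allele genotype frequencies in the original parent population
MINOR_FREQUENCIES = {"X/x": 0.46, "Y/y": 0.35, "Z/z": 0.35}


def expected_counts(minor_frequency: float, n: int) -> tuple[float, float]:
    """Return (major, minor) expected counts if frequencies did not change."""
    minor = minor_frequency * n
    return n - minor, minor


for locus, freq in MINOR_FREQUENCIES.items():
    major, minor = expected_counts(freq, N_SURVIVORS)
    print(f"{locus}: major {major:.1f}, minor {minor:.1f}")
# X/x: major 27.0, minor 23.0
# Y/y: major 32.5, minor 17.5
# Z/z: major 32.5, minor 17.5
```

Expected counts do not need to be whole numbers; a value such as 17.5 is fine for a Chi-Square Goodness of Fit test.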
Variables in the dataset:
Individual Clover ID# - Each row in the dataset is an individual plant measured in the survivor generation that descended from the original parent generation.
Genotype @ locus 1- This is a categorical variable that is either the major allele (X) or the minor allele (x)
Genotype @ locus 2- This is a categorical variable that is either the major allele (Y) or the minor allele (y)
Genotype @ locus 3- This is a categorical variable that is either the major allele (Z) or the minor allele (z)
Activity
1. In carrying out the investigation, researchers were careful to select minor alleles that were present at a frequency of at least 5% in the original population. Why do you think this was a requirement? Think about what the simulations demonstrated about alleles remaining in the population from one generation to the next.
2. Use the genotype frequencies in the original population (listed above) to determine the expected count of each genotype in the next generation (n = 50) at each locus (X/x, Y/y, Z/z), as if there were no selective pressures present.
3. Make a graph with Genotype X/x on the x axis. Run a Chi-square Goodness of Fit (Interactive Analysis) on the frequency of Genotype X/x in the surviving population. Is there evidence of a disturbance in equilibrium based on the change in frequency from the original population to the surviving population? Be sure to use the expected counts you calculated in step 2.
4. Run the graph-driven Chi-square Goodness of Fit test twice more, this time on the frequencies of Genotype Y/y and Genotype Z/z. For these graphs, you’ll only need to place one variable at a time on the x axis. What might explain the different frequencies of the alleles in the surviving population? Could we observe this result in an undisturbed population?
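If you want to check the interactive tool's arithmetic by hand, the Chi-Square Goodness of Fit statistic is the sum of (observed − expected)^2 / expected over the categories. The Python sketch below is an illustration, not the tool's implementation. With two categories (major and minor genotype) there is 2 − 1 = 1 degree of freedom, and for 1 degree of freedom the upper-tail p-value equals erfc(sqrt(statistic / 2)). The observed counts are made up for illustration and are NOT from the study's dataset.

```python
import math


def chi_square_statistic(observed: list[float], expected: list[float]) -> float:
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_one_df(statistic: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    A chi-square(1) variable is a squared standard normal, so
    P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical survivor counts for illustration only (NOT the study's data)
observed = [30, 20]      # [major X, minor x]
expected = [27.0, 23.0]  # 50 * (1 - 0.46), 50 * 0.46
stat = chi_square_statistic(observed, expected)
print(round(stat, 3), round(p_value_one_df(stat), 3))
```

A p-value below 0.05 (equivalently, a statistic above about 3.84 for 1 degree of freedom) would be evidence against the null hypothesis that genotype frequencies were unchanged from the parent population.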
Teacher’s Note:
One of the great unsolved problems in genetics is the relationship between the actual genetic code and the expression of genes (genotype to phenotype). Although we are increasingly able to examine the chemical composition of genetic material, there is still a fuzzy area between alleles and observable, expressed traits. An allele can refer to stretches of genetic code of varying length, and the extent of a location, or locus, on a chromosome can vary just as much. We chose a notation that is easier for students to understand without distorting the concepts at hand. Furthermore, the study analyzed single nucleotide polymorphisms (SNPs, or “snips”), which usually have two possible alleles at a single location.