How many babies do you need to find the average baby?
A DataClassroom Simulation Activity
Background
Complex traits that are controlled by many factors tend to follow a normal (or Gaussian) distribution in a population. This is one reason the Central Limit Theorem is so widely used in statistics. One of the most important takeaways from the theorem is that whenever many independent additive factors are summed, the result is approximately normally distributed. This allows us to make powerful inferences from a sample of a population if we assume the underlying distribution is normal. It enables us to take a relatively small sample of a very large population and infer the actual population mean with a reliable degree of certainty. But exactly how big a sample do we need in practice to reliably infer that population mean?
This activity will allow you to take samples from a simulated population of human babies in order to infer the population mean for human birthweight.
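The Background claim that many additive factors sum to a roughly normal distribution can be checked with a short simulation. The sketch below is only an illustration and is not how the DataClassroom simulator works: the number of factors (`N_FACTORS`), the uniform distribution of each factor, and the population size are all assumptions chosen for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

N_FACTORS = 50          # hypothetical number of small additive factors per individual
N_INDIVIDUALS = 10_000  # hypothetical number of simulated individuals

# Each individual's trait value is the sum of many small, independent contributions.
traits = [
    sum(random.uniform(0, 1) for _ in range(N_FACTORS))
    for _ in range(N_INDIVIDUALS)
]

mean = statistics.fmean(traits)
sd = statistics.pstdev(traits)

# In a normal distribution, about 68% of values lie within one SD of the mean.
within_one_sd = sum(abs(t - mean) <= sd for t in traits) / N_INDIVIDUALS
print(f"mean = {mean:.2f}, sd = {sd:.2f}, share within 1 SD = {within_one_sd:.3f}")
```

Even though each individual factor is uniformly distributed (not bell-shaped), the share of summed values within one standard deviation of the mean should land close to the 68% expected for a normal distribution.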
Dataset
The datasets you create in this activity are generated by sampling from a simulated population of human birthweights. The underlying population mean and variance in this model are based on data collected in the US from 1990 to 2013.
Variable
Birthweight (g) - This numeric variable represents the weight in grams of a single baby sampled from an infinitely large population.
Activity
Get to know the data
In this activity you will repeatedly sample the simulated population and revise your estimate of the mean each time you add a sample or group of samples.
Use a) the Generate #samples field to set the desired number of samples for each simulation run. Press b) the red Go button to start each run of the simulator. Hover over c) the information icon to see the mean for any given run.
1. Run nine simulations of increasing size and record the sample mean from each one in the table.
Total # of data points | Sample Mean |
---|---|
1 | |
3 | |
10 | |
20 | |
50 | |
100 | |
1000 | |
2000 | |
10000 | |
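For readers who want to reproduce the table outside the simulator, the following is a minimal sketch of the same procedure. The population mean and standard deviation used here (`ASSUMED_MEAN_G`, `ASSUMED_SD_G`) are placeholder assumptions, not the values built into the DataClassroom model, so the numbers it prints will not match the simulator.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Placeholder population parameters (assumptions, NOT the simulator's actual values).
ASSUMED_MEAN_G = 3300.0
ASSUMED_SD_G = 500.0

# The nine sample sizes from the table above.
SAMPLE_SIZES = [1, 3, 10, 20, 50, 100, 1000, 2000, 10000]


def sample_mean(n):
    """Draw n simulated birthweights (g) and return their mean."""
    return statistics.fmean(
        random.gauss(ASSUMED_MEAN_G, ASSUMED_SD_G) for _ in range(n)
    )


sample_means = {n: sample_mean(n) for n in SAMPLE_SIZES}

print("Total # of data points | Sample Mean (g)")
for n, m in sample_means.items():
    print(f"{n:>22} | {m:.1f}")
```

Changing the seed and rerunning gives a sense of how much the small-sample means move from run to run compared with the large-sample means.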
2. At which sample size are you reasonably confident that your sample mean is within 10 grams of the actual population mean birth weight? Refer to your recorded sample means as evidence supporting your response.
3. An ideal normal distribution is often illustrated by a smooth curve that looks like this:
What is the minimum number of samples you need to run in your simulation before the histogram is completely smooth and closely resembles this curve?
Paste in your graph and list your sample size to illustrate a smooth histogram following a normal distribution.
4. After working through these questions with simulated data, how would you answer the original question: how many babies do you need to sample to reliably estimate the mean birth weight in a human population?
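One way to compare your simulation results with theory is the standard error of the mean, which is the population standard deviation divided by the square root of the sample size. The sketch below uses it to estimate how large a sample is needed for a given margin of error. The standard deviation (`ASSUMED_SD_G`) and the 95% confidence level are assumptions made for illustration; the activity does not state the simulator's actual values or define "reasonably confident" numerically.

```python
import math
from statistics import NormalDist

ASSUMED_SD_G = 500.0  # hypothetical population SD in grams (not from the activity)
MARGIN_G = 10.0       # target margin of error from question 2
CONFIDENCE = 0.95     # assumed meaning of "reasonably confident"

# Two-sided critical value of the standard normal distribution (about 1.96 for 95%).
z = NormalDist().inv_cdf(0.5 + CONFIDENCE / 2)


def standard_error(n):
    """Standard error of the mean for a sample of size n."""
    return ASSUMED_SD_G / math.sqrt(n)


# Smallest n for which z * SE(n) is no larger than the margin of error.
required_n = math.ceil((z * ASSUMED_SD_G / MARGIN_G) ** 2)

for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}: 95% margin of about +/- {z * standard_error(n):.1f} g")
print(f"Sample size for +/- {MARGIN_G:.0f} g under these assumptions: {required_n}")
```

Because the margin shrinks with the square root of the sample size, halving the margin of error requires roughly four times as many babies. A different assumed standard deviation or confidence level will change the answer accordingly.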