Understanding Randomness

I’ve always thought there was something inherently attractive about the normal distribution, aka the “bell curve”.

It occurs in nature a lot, because properties of organisms (like, say, the height of a plant) are the result of very many small random influences. If you roll several dice, sum them, and repeat that many times, the distribution of the sums will tend towards this nice, organic curve, clearly centered on the population mean.
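If you'd like to try the dice idea away from the lab bench, here is a minimal Python sketch of it. It is only an illustration and has nothing to do with how the DataClassroom Simulator works internally; the function names, the choice of 5 dice and the 10,000 repeats are all assumptions made for the example.

```python
import random
from collections import Counter


def sum_of_dice(num_dice, rng):
    """Roll `num_dice` fair six-sided dice and return their total."""
    return sum(rng.randint(1, 6) for _ in range(num_dice))


def simulate_sums(num_dice, num_rolls, seed=0):
    """Repeat the dice-sum experiment `num_rolls` times (seeded so it is repeatable)."""
    rng = random.Random(seed)
    return [sum_of_dice(num_dice, rng) for _ in range(num_rolls)]


def text_histogram(values, width=50):
    """Draw a rough histogram with one row of '#' characters per possible total."""
    counts = Counter(values)
    peak = max(counts.values())
    lines = []
    for total in range(min(counts), max(counts) + 1):
        bar = "#" * round(counts[total] / peak * width)
        lines.append(f"{total:3d} {bar}")
    return "\n".join(lines)


# Assumed settings for illustration: 5 dice, repeated 10,000 times.
sums = simulate_sums(num_dice=5, num_rolls=10_000)
print(text_histogram(sums))
print("sample mean:", sum(sums) / len(sums))
```

The population mean for the sum of 5 fair dice is 17.5 (5 × 3.5), so the bars should bulge around 17 and 18 and tail off on either side. Try lowering `num_rolls` to 50 and see how much rougher the shape gets.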

But you need a lot of samples!

What is counter-intuitive is how many samples you need to get a reasonably accurate idea of the mean. A teacher told me a student once said they reckoned the experiment must require so many samples “just to make it require more work by the students”. Hmm.
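One way to see why so many samples are needed is to repeat the whole experiment many times and look at how much the sample mean itself wobbles from one run to the next. The sketch below does that in plain Python. It assumes a made-up model with a mean of 50 and a standard deviation of 10 (these numbers, and the function name, are illustrative assumptions, not values from DataClassroom), and compares the wobble it observes with the textbook standard error, the standard deviation divided by the square root of the sample size.

```python
import math
import random
import statistics


def spread_of_sample_means(sample_size, trials=500, mean=50.0, sd=10.0, seed=1):
    """Run `trials` experiments of `sample_size` normal samples each and
    return the standard deviation of the resulting sample means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mean, sd) for _ in range(sample_size))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


# Compare the observed wobble in the mean with the prediction sd / sqrt(n).
for n in (5, 20, 200, 1000):
    observed = spread_of_sample_means(n)
    predicted = 10.0 / math.sqrt(n)
    print(f"n={n:5d}  observed spread={observed:.2f}  predicted={predicted:.2f}")
```

The thing to notice is the square root: to halve the uncertainty in the mean you need four times as many samples, which is why a handful of measurements rarely pins it down.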

That’s where DataClassroom Simulations can help. Using these, students can get an idea of what random variation looks like, and importantly, gain an intuitive understanding of why this is key to evaluating the significance of a result.

Here’s a histogram of 200 samples, randomly generated around a mean by the DataClassroom Simulator:

 

Histogram of 200 randomly-generated samples

 

200 would be quite a lot for a typical lab experiment, right? With the simulator, students can play with and explore large sample sets quickly and easily.

And without exhausting the students’ enthusiasm with hours of repetitive work!

A simple exercise to try

You can quickly have a play with the simulator to see what it can do. A super-simple introductory simulation like the above can be found at this link:

Simulation: What is the mean?

When you click the Play button (see here for more detailed instructions), a number of sample values will be generated and plotted on a histogram. See how many you need to generate before you can make a good guess at the mean value in the model (yes, it’s a secret!).

 
 

Where’s the mean now?

Of course, the simulator can do a lot more than this, including:

  • Simulating with categorical variables (like dice throws)

  • Multiple variables, with relationships like correlation / causation

  • Creation of datasets, so you can use the generated data to experiment with statistical tests

  • Generating dummy data, to assist in planning experiments or research

Much more in a later post! In the meantime, have a play and check out the User Guide here for what it can do.

Dan Temple