DataClassroom


How much does this dog weigh?

Wisdom of the crowd for estimating averages


Background

Crowds are sometimes said to have wisdom. This curious, well-studied phenomenon often appears when large numbers of individuals are asked to estimate some quantity: the average (or mean) of their estimates has often been shown to be closer to the actual value than any single estimate. This is known as the wisdom of the crowd.

This phenomenon was first described by Francis Galton* in 1906. He was attending a farmer’s fair and was fascinated by a weight guessing game. The object of the game was to guess the weight of an ox. People who played the game wrote their guesses on paper tickets that they used as their entry. The closest guess would win a prize.

After the game was played and the winner received the prize, Galton asked if he could have the nearly 800 tickets that people had entered in the contest. He used these to conduct a statistical analysis, looking at the accuracy of the average guess as well as the variation of all the guesses around that average. What he found was remarkable. The average guess of the large crowd was actually better than the individual winning guess: it was within 1 lb of the ox's actual weight of 1,198 lbs.

Since that time, the wisdom of the crowd has been studied many times and has been shown to be well supported. People have noted that the estimates of individuals tend to vary around the true value of the quantity, but that the average guess tends to be very close to the true value when the crowd is sufficiently large. This has some important implications for group decision making. Do we come to better conclusions when we collect a diversity of opinions?
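A small simulation can make this concrete. The sketch below is only an illustration, not Galton's data: it assumes each guess is the true weight plus independent, unbiased random error (a normal distribution with a hypothetical spread of 75 lbs), and it compares the error of the crowd's average with the error of a typical individual for crowds of different sizes. The function names are made up for this example.

```python
import random
import statistics

TRUE_WEIGHT = 1198   # lbs, the ox's weight from Galton's account
GUESS_SPREAD = 75    # lbs, hypothetical standard deviation of individual guesses


def simulate_crowd(size, rng):
    """Return `size` independent, unbiased guesses scattered around the true weight."""
    return [rng.gauss(TRUE_WEIGHT, GUESS_SPREAD) for _ in range(size)]


def crowd_error(guesses):
    """Absolute error of the crowd's average guess."""
    return abs(statistics.mean(guesses) - TRUE_WEIGHT)


def typical_individual_error(guesses):
    """Median absolute error of the individual guesses."""
    return statistics.median(abs(g - TRUE_WEIGHT) for g in guesses)


if __name__ == "__main__":
    rng = random.Random(1906)  # fixed seed so the run is repeatable
    for size in (10, 100, 800):
        guesses = simulate_crowd(size, rng)
        print(f"crowd of {size:>3}: average is off by {crowd_error(guesses):5.1f} lbs, "
              f"typical individual is off by {typical_individual_error(guesses):5.1f} lbs")
```

Under these assumptions, the individual errors stay about the same size no matter how large the crowd is, while the error of the average tends to shrink as more guesses are added.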

As the wisdom of the crowd phenomenon has been studied, researchers have identified a set of conditions that are needed for a crowd to make a strong collective guess or estimate.

1. Each member of the crowd must have their own independent source of information.
2. Individuals must make estimates that are not influenced by those around them.
3. A mechanism must exist to collect the individual estimates without violating #2.

However, certain conditions can lead the crowd to make a poor estimate. These include:

1. The crowd is defining its own question.
2. The answer cannot be evaluated by a simple result (like a single number).
3. The information that informs the crowd is biased in some way.
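The effect of biased information can be sketched with a short simulation. In the example below, every guess shares a common offset, a hypothetical stand-in for something like a misleading photo. The true weight is the dog's weight given in the Dataset section; the 10 lb spread and the 15 lb shared bias are assumptions chosen only for illustration, and `crowd_average` is a made-up helper name.

```python
import random
import statistics

TRUE_WEIGHT = 41.1   # lbs, the dog's weight given in the Dataset section
GUESS_SPREAD = 10.0  # lbs, hypothetical scatter of individual guesses
SHARED_BIAS = 15.0   # lbs, hypothetical error common to every guess


def crowd_average(size, bias, rng):
    """Average of `size` guesses that share the same bias plus independent noise."""
    guesses = [TRUE_WEIGHT + bias + rng.gauss(0, GUESS_SPREAD) for _ in range(size)]
    return statistics.mean(guesses)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    for size in (10, 100, 1000):
        fair = crowd_average(size, 0.0, rng)
        skewed = crowd_average(size, SHARED_BIAS, rng)
        print(f"crowd of {size:>4}: unbiased average {fair:5.1f} lbs, "
              f"biased average {skewed:5.1f} lbs (true weight {TRUE_WEIGHT} lbs)")
```

In this sketch, adding more guesses shrinks the random scatter but leaves the shared offset untouched, so a larger crowd becomes more confident without becoming more accurate.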



Dataset

We built a dataset by asking individuals to guess the weight of a specific dog. The data were collected in two different ways. Some observations were collected over social media (Twitter, Facebook, Instagram), with participants asked to estimate the weight of the dog from a picture with a skewed perspective: a small girl (~50 lbs) holding the dog in front of her and close to the camera. (This is the picture at right.) The other observations were collected over live video (Zoom), where the dog was displayed for the camera and held by a man (~200 lbs).

The actual weight of the dog was 41.1 lbs.

Variables

Estimated Weight (lbs) - This numeric variable is the estimated weight of the dog in pounds.

Person for Scale - This categorical variable records which type of person was included in the view that the crowd saw. It has two values: child or adult.

Person for scale: Child

Person for scale: Adult

Activity

1. Make a graph showing the raw data with a jittered dot plot. Show estimated weight on Y with no variable on X. Paste the graph below:

2. What was the average guess of the crowd? Add descriptive stats with the check box to show the mean. Paste your new graph below:

3. How much variation do you see among the guesses of the dog’s weight?


4. How do you think the crowd did estimating the dog’s weight? Do you think this dataset supports the wisdom of the crowd idea?

5. Make a graph that splits the data by the variable Person for Scale, which records the information source each participant saw. Show estimated weight on Y and Person for Scale on X. Paste your graph below:

6. Did the source of information (seeing the dog on Zoom or in the photo) seem to affect the average guess? Refer to the data or the graph as evidence for your answer.

7. Were these data collected with conditions that allow a crowd to make a strong collective guess? Explain your answer.

8. What would you change about the way these data were collected in order to create better conditions for an accurate crowd guess of the dog’s weight?
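For readers who want to check the numbers behind steps 2, 3, and 5 outside of DataClassroom, the calculations can be sketched in Python. This is only an illustration: the function names are hypothetical, and the sample rows are made-up placeholders, not values from the actual dataset. Each row pairs an estimated weight with a Person for Scale value, following the Variables section.

```python
import statistics
from collections import defaultdict

TRUE_WEIGHT = 41.1  # lbs, from the Dataset section


def summarize(weights):
    """Count, mean, sample standard deviation, and error of the mean for a list of guesses."""
    mean = statistics.mean(weights)
    return {
        "n": len(weights),
        "mean": mean,
        "stdev": statistics.stdev(weights) if len(weights) > 1 else 0.0,
        "error": mean - TRUE_WEIGHT,  # positive means the crowd guessed too high
    }


def summarize_by_group(rows):
    """Group (estimated_weight, person_for_scale) rows and summarize each group."""
    groups = defaultdict(list)
    for weight, person in rows:
        groups[person].append(weight)
    return {person: summarize(weights) for person, weights in groups.items()}


if __name__ == "__main__":
    # Placeholder rows for illustration only; replace them with the real dataset.
    rows = [(30.0, "child"), (50.0, "child"), (40.0, "adult"), (44.0, "adult")]
    print(summarize([weight for weight, _ in rows]))
    for person, stats in summarize_by_group(rows).items():
        print(person, stats)
```

The overall summary corresponds to steps 2 and 3 (mean and variation), and the per-group summary corresponds to step 5, where comparing each group's error against the true weight gives evidence for step 6.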


