The Datasaurus Dozen

A classroom activity guide for
exploring the value of summary statistics and graph visualization


These graphs, while vastly different in appearance, all share the same mean, standard deviation, regression line slope, and r^2 value.

These data make for a surprising lesson: statistical summary values do not tell the whole story. In this activity, students work out the summary values for a dataset and then compare them with the visual story the same data tell.


Teachers can run this activity in one of two ways:

1) Break students into up to 13 groups (one for each unique dataset), and facilitate class discussion around each of the datasets, or…

2) Have students complete the assignment independently, each investigating 3 of the 13 datasets.

Below, find a guide for how to walk groups of students through class discussions, as well as links for both types of assignments.


Background

When exploring a new dataset, it’s common to start by calculating what are often called descriptive or summary statistics. Commonly used summary statistics include the mean, median, range, a regression line, and the r-squared value. These statistics give a quick summary of how the data are distributed for one or more numeric variables. However, sometimes those summaries don’t tell the full story. In fact, there are often surprises hiding in the data that can’t be seen with summary statistics.
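For classes that also work in code, the summary statistics named above can be sketched in a few lines of Python. This is a minimal illustration, not part of the DataClassroom activity: the function name `summarize` and the five data points are made up for the example, and the regression line is an ordinary least-squares fit computed by hand.

```python
from statistics import mean, median

def summarize(xs, ys):
    """Return common summary statistics for paired x/y values."""
    mx, my = mean(xs), mean(ys)
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx  # least-squares regression slope
    return {
        "mean_x": mx,
        "mean_y": my,
        "median_y": median(ys),
        "range_y": max(ys) - min(ys),
        "slope": slope,
        "intercept": my - slope * mx,
        "r_squared": sxy ** 2 / (sxx * syy),
    }

# Hypothetical measurements, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

stats = summarize(xs, ys)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Every value in the result is a single number, which is exactly why these statistics are quick to compare and also why they can hide the shape of the data.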

Graphs are one of the most powerful tools for finding stories that are hiding in the data. Graphing the data can help us see trends, clusters, patterns, or differences between groups that would be undetectable with just summary statistics. Furthermore, two different datasets can have very similar summary statistics and still tell very different stories.
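A small Python sketch can make this concrete. The two lists below are made-up values (they are not Datasaurus data), chosen so that both have the same mean and the same population standard deviation. The helper `text_histogram` is a hypothetical name used only for this example; it draws a rough text plot so the difference in shape is visible.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical values, made up for illustration only.
spread_out = [2, 4, 4, 4, 5, 5, 7, 9]
two_clusters = [3, 3, 3, 3, 7, 7, 7, 7]

def text_histogram(values):
    """Return one line per whole number in range, with a * per occurrence."""
    counts = Counter(values)
    return "\n".join(
        f"{v:>2} | {'*' * counts[v]}"
        for v in range(min(values), max(values) + 1)
    )

for name, values in [("spread_out", spread_out), ("two_clusters", two_clusters)]:
    # pstdev is the population standard deviation (divides by n, not n - 1).
    print(f"{name}: mean={mean(values)}, sd={pstdev(values)}")
    print(text_histogram(values))
```

The printed means and standard deviations match, while the text plots show one list bunched around the middle and the other split into two separate clusters.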

In this activity, you will explore a group of 13 datasets collectively known as the Datasaurus Dozen, which was created by scientists to be used by students like you who are learning to work with data. Either as a group or individually, you will explore these datasets through summary statistics AND through graphing. As you work, keep in mind the question: what can a visualization show me that a few numbers cannot?


Overview

This Teacher Guide is a bit different from the usual DataClassroom (DC) format. Instead of being a direct answer key to a student handout, this guide supports teachers through an activity designed to demonstrate that summary statistics alone are not enough to understand data. Graphical visualizations are equally important.

Student handouts follow the usual format and are attached to each dataset.


There are two ways this lesson can go:

  1. As a partner and class activity, where the class is broken into groups that each look at one of the 13 datasets and then share out in class discussions.

  2. As independent assignments, where each student views and compares values for 3 of the 13 datasets.

By the end of the lesson…

  • Students will complete a statistical summary of a dataset.

  • Students will predict the distribution of data based on x and y graphs.

  • Students will compare different visualizations to their statistical summaries.

  • Students will reflect on how data visualization and summary statistics work together to tell the story of data.

Big Takeaway…

Infinitely many datasets and configurations can produce the exact same summary statistics. By looking at stats alone, we can’t determine the whole story of the data. Understanding the visual representation of the data is an essential part of data analysis.
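One way to see why so many datasets can share the same statistics is to notice that any list of numbers can be shifted and stretched to hit a chosen mean and standard deviation without changing its shape. The Python sketch below illustrates that idea only; it is not how the Datasaurus Dozen was built. The function name `rescale`, the three starting shapes, and the targets (mean 50, SD 10) are all made up for this example.

```python
from statistics import mean, pstdev

def rescale(values, target_mean, target_sd):
    """Shift and stretch values so they have the requested mean and SD."""
    m, s = mean(values), pstdev(values)
    return [target_mean + target_sd * (v - m) / s for v in values]

# Three hypothetical shapes: evenly spaced, skewed, and two clusters.
shapes = {
    "even": [1, 2, 3, 4, 5, 6],
    "skewed": [1, 1, 1, 2, 3, 10],
    "clustered": [0, 0, 1, 9, 10, 10],
}

# Give every shape the same (arbitrary) mean of 50 and SD of 10.
rescaled = {name: rescale(vals, 50, 10) for name, vals in shapes.items()}

for name, vals in rescaled.items():
    print(f"{name}: mean={mean(vals):.2f}, sd={pstdev(vals):.2f}")
    print("   ", [round(v, 1) for v in vals])
```

All three rescaled lists report the same mean and standard deviation, yet the values themselves are still evenly spaced, skewed, or clustered, which only a graph would reveal.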



Setup:

Partner Activity with Group Discussions

  • Group students into 13 groups of 2-3 students each

  • Share a class-wide Google Sheet where groups can enter their values for comparison. We have made one for you to use here!

  • Assign each group one of the 13 datasets in DataClassroom. 

    • Each dataset is titled A through M and already has the student activity attached within DC.

    • Students can access their assigned dataset by clicking the link in their assigned row.

Search the Resource Library for “Datasaurus”, or click below to find links to each of the 13 datasets.

Group: A. B. C. D. E. F. G. H. I. J. K. L. M.

Individual Assignment

There is no further instruction for this type of assignment - go ahead and get going!

See the student-facing individual assignment.
 

Lesson Progression (Class Discussions)

1. Students make their way to their assigned dataset. They can do so by:

  • opening the Resource Library and searching “Datasaurus”, then selecting Open Dataset for the group they have been assigned, or

  • opening the shared Google Sheet (we have provided a copy!) and clicking the dataset link in the column titled “Link to Dataset”.

2. Students open their assigned dataset in DataClassroom. 

  • Remind them to keep their dataset image as secret as possible from other groups.

3. Students complete Part 1 of the attached assignment. 

  • The end of Part 1 directs students to fill in the Google Sheet with their information. Be sure to tell students where to find that sheet, using whatever method you normally use to distribute digital materials.


4. Bring students back together as a group, and discuss the values they see in the class sheet.


5. Students complete Part 2, where they see another group’s separate graphs of the X values and the Y values, and try to predict what the X vs. Y graph may look like.

  • Each student group is given new graphs of X values and Y values that come from a different Datasaurus dataset. These have already been assigned to each group.

  • Students create an electronic prediction by double-clicking the (already embedded) Google Drawing and moving the data points to give a general, qualitative idea of their prediction.

  • Students now have an electronic prediction graph that they can share with you in whatever way makes the most sense in your classroom.

6. Group by group, students share their original X vs. Y graph from Part 1. Before the reveal, the group that made a prediction about that graph shares its prediction from Part 2.

This can happen through students sharing their screens, pasting their graphs into a shared Google Doc, or however else the class usually shares out group work.

If you want to switch from one dataset to another yourself within DC, note that every Datasaurus dataset actually includes the data for all 13 groups. To change which group you are viewing:

  • Open graph view within any of the Datasaurus datasets (the group does not matter).

  • Click the gear icon beneath the variable Sample Group.

  • Put a checkmark next to every group you wish to exclude, and leave unchecked the dataset you wish to view.

7. Students complete Part 3 (a reflection) and turn in the assignment.




Want an Answer Key? Fill out the form below.