9 Great datasets to get your students familiar with different graph types!


Interpreting graphs can be a challenge for students at all levels. Students often struggle to choose the right type of graph for communicating data that they have collected themselves. On standardized tests, seeing and being asked to interpret an unexpected graph type can be a stumbling block even when the material is otherwise familiar. Educators everywhere tell us that their students often struggle to successfully decipher all the graphs and data figures that appear in their curriculum.


But how to help? 

While we don’t think there is any magic solution for helping students learn to interpret graphs, we do think there are strategies that work. Namely, practice makes improvement when it comes to students' ability to interpret graphs. As with any skill-based task, the tried-and-true method for getting better is meaningful practice that targets the skills that need improvement. Repetition works.

The following activities were selected as general datasets that any science or math course in grades 6-12 could pull from and use. These and many more subject-specific examples can be found in our Resource Library. The links in the table below take you straight to the student-facing datasets and any associated activities:


Type of Graph | Dataset(s)
Frequency Plots (aka Bar Graphs*) | Snapshot of Biodiversity; Lency's Scrunchies Data Set
Pie Chart | Harry Potter House Demographics
Graphs that show mean/median and shape of data (dot plots, histograms, error bars) | NBA Salary; Cicada Sex Education
Line Graph | Flatten the Curve; Does thanksgiving dinner get more expensive every year?
Scatter Plot with line of best fit | Resist! Exploring resistance of a resistor equation; Early Spring in Kyoto?
 

Frequency Plots (*Bar Graph)

Need to compare counts of different categories? Frequency plots have your back. With the X axis showing a set of categories and the Y axis showing how many samples fell into each category, students can easily compare one categorical value to the next. If you have two categorical variables, you can make a grouped frequency plot with multiple bars per category shown on X. DataClassroom allows students to quickly and easily manipulate the appearance of their frequency plots, so they can spend their cognitive energy evaluating the data rather than struggling to represent it.
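For readers who want to see the counting step behind a frequency plot spelled out, here is a minimal Python sketch. The list of sightings is made up for illustration and is not taken from any DataClassroom dataset; the variable names are our own assumptions.

```python
from collections import Counter

# Hypothetical observations (illustrative only): each entry is one
# sighting, labeled with its category.
sightings = ["ocelot", "tapir", "ocelot", "jaguar", "tapir", "ocelot"]

# Count how many observations fall into each category.
counts = Counter(sightings)

# Print a simple text frequency plot: one row per category,
# with the bar length equal to the count.
for species, n in counts.most_common():
    print(f"{species:<8}{'#' * n} ({n})")
```

The categories play the role of the X axis and the counts play the role of the Y axis, which is all a frequency plot encodes.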

Two good activities for making frequency plots are Snapshot of Biodiversity Part 1: Camera trap data from the Peruvian Amazon and Lency's Scrunchies Data Set. The former looks at camera trap data that records sightings of large mammal species in the Amazon, and the latter uses student-collected data in an introductory activity that turns common items into data. The Lency’s Scrunchies activity also has some slides that can be used as a Slow Axis Reveal warm-up activity for graph interpretation.

*Note: While this graph is often called a "bar graph", that name can create confusion, because some types of graphs that display summaries of a numeric variable are also called “bar graphs”. For example, histograms show the distribution of a numeric variable and are also made with bars. In other graphs, the height of a bar shows the mean of the data. All of these might generically be called a bar graph, but they are very different from the frequency plot that we are referring to in these activities. At DataClassroom, we try to avoid the term bar graph altogether and use the terms rectangle-based graphs and dot-and-line-based graphs as general terms. When we are talking about a graph where the height of the bar indicates how many observations are from a given category, we call it a frequency plot.

 

Pie Chart

Slicing the data up one meaningful category at a time, pie charts are a valuable way to show how data are divided into parts of a whole. In most situations where you can use a frequency plot, a pie chart will also work well. Pie charts are especially good if you want to show the count of one category in comparison to the total of all other categories.
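The arithmetic behind a pie chart is simple enough to sketch in a few lines of Python. The category counts below are invented for illustration (they are not from the Harry Potter dataset), and the names are hypothetical.

```python
# Hypothetical category counts (illustrative only).
category_counts = {"red": 12, "blue": 8, "green": 6, "yellow": 4}

# The "whole" of the pie is the total number of observations.
total = sum(category_counts.values())

# Each slice is the category's share of the total; the slice angle
# is that share of a full 360-degree circle.
slices = {
    name: (n / total, 360 * n / total)
    for name, n in category_counts.items()
}

for name, (share, angle) in slices.items():
    print(f"{name}: {share:.1%} of the whole, {angle:.0f} degree slice")
```

Because every slice is a fraction of the same total, the shares always add up to 100%, which is what makes a pie chart a parts-of-a-whole display.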

A good dataset to practice with pie charts is Harry Potter House Demographics. This dataset is part of the DataClassroom Raw Collection of datasets so there is not a formal activity with the dataset, but students should be able to generate and interpret numerous pie charts like the one shown above using data collected from the Harry Potter books.

 

Line Graph

A line graph is really just a visual aid that is laid on top of an XY scatter plot. Line graphs are good at highlighting the point-to-point fluctuations in a numeric variable over some period of time or range of values. Often the line implies the passage of time moving from left to right. Note that there is no mathematical function for fitting a line graph to a scatter plot. It is simply an exercise in connecting the dots.

A good dataset and activity for a simple line graph is Does thanksgiving dinner get more expensive every year? This activity looks at the change over time in the cost of a traditional Thanksgiving meal. It allows students to look at the data as actual cost as well as cost adjusted for inflation.

Sometimes students will see line graphs that show more than one series of data. In DataClassroom this is achieved by plotting X and Y variables and then grouping by a categorical variable that we call Z. A good example of this is Flatten the Curve, a dataset and activity that compares the number of cases over time in two different American cities during the 1918 influenza pandemic.



Graphs that show mean/median and shape of data

Graphs that show central tendency can include histograms, dot plots, and box-and-whisker plots. These graphs are great for displaying the mean or median value of a quantitative (numeric) variable. With DataClassroom, students don’t have to choose just one view to display: they can visualize the dot plot right alongside (or on top of!) the box and whiskers or a bar where the height represents the mean.

NBA Salary Data 2019: Mean or Median? allows students to play with graphs showing measures of central tendency through an approachable context. 
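To see why the choice between mean and median matters, consider this small Python sketch. The salary figures are invented for illustration and are not real NBA data; the point is only that one very large value pulls the mean upward while the median barely moves.

```python
from statistics import mean, median

# Hypothetical salaries in millions of dollars (not real NBA figures).
# One very large value stands in for a superstar contract.
salaries = [1, 2, 2, 3, 4, 5, 40]

salary_mean = mean(salaries)      # sensitive to the extreme value
salary_median = median(salaries)  # the middle value of the sorted list

print(f"mean:   {salary_mean:.2f}")
print(f"median: {salary_median}")
```

In a skewed dataset like this one, the mean ends up larger than most of the individual values, which is a good prompt for asking students which measure better describes a "typical" salary.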

A more complex dataset, Cicada Sex Education, also allows students to compare population-level data on cicada head width across sexes and species groups. This dataset is small enough to be approachable for older students, but it offers a lot of different ways to visualize the data. As an added bonus, this activity comes in both a regular assignment version and a special tutorial version meant to teach students how to use many of the tools within DataClassroom.

 

Scatter plot with line of best fit

A line of best fit is often the first time a student sees a mathematical model fitted to a dataset. If both the x- and y-axes show numeric variables, you’ve got yourself a scatter plot on DataClassroom. The regression line choices within DataClassroom are quick to add and adjust, starting with the checkbox to the right of the graph, allowing students to think about which line of best fit truly is the…best fit.

Resist! Exploring resistance of a resistor equation is a physics activity that allows students to fit a power function to measures of resistance as the properties of a resistor within a circuit are manipulated. This is a great example of model fitting that very closely fits the data with very little variation left unexplained by the model. 
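For the curious, here is one common way a power function can be fitted: take the logarithm of both variables and fit a straight line by least squares. This is a sketch under our own assumptions, not a description of how DataClassroom computes its fits. The data values are made up, the function name fit_power_law is hypothetical, and the method requires all x and y values to be positive.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on log-transformed data.

    Taking logs turns y = a * x**b into log(y) = log(a) + b * log(x),
    which is a straight line in log(x) with slope b and intercept log(a).
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Ordinary least-squares slope and intercept for the log-log line.
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Hypothetical measurements that follow an inverse relationship exactly.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [6.0, 3.0, 1.5, 0.75]

a, b = fit_power_law(xs, ys)
print(f"y = {a:.2f} * x^{b:.2f}")
```

With real measurements the points would not fall exactly on the curve, and the leftover scatter around the fitted function is the unexplained variation.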

Oftentimes in biological examples there is a lot of residual or unexplained variation around a line of best fit. A good example of this that has real-world implications is Early Spring in Kyoto? Cherry Tree Blossom Dates- Historical Data. This fascinating dataset, with over 1,200 years' worth of data collected in Japan, can be used to look at the biological effects of climate change while examining a clear trend that has a lot of variation around the lines of best fit.