DataClassroom

Multiple line graphs

It’s not often we write a “how-to” blog post, but the humble multiple line graph illustrates some key points about data formatting and Tidy Data. Both are central to Data Literacy and useful in all kinds of data analysis. So here goes.

I’ll use the Ready-to-Teach activity Reaction Rate Lab (you can read the instructions here) and its associated dataset, which you can open in DataClassroom with this link.

Lines or dots?

The first thing to notice is that the graph type to use is called Dot / line based graph. That’s because in DataClassroom, dot graphs and line graphs are really the same thing: a plot of a number of points, which may or may not be:

  • displayed as dots (the default, unless you select hide dots)

  • connected by lines

Progressive X-axis

Next, for a sensible line graph the X axis needs a progressive sequence. If the X variable is numeric, this is no problem: the axis is automatically ordered from lowest to highest. But if the X variable is categorical with an implied sequence, such as January, February, March…, the categorical values need to be in the right order. If they are not, you can reorder them as described here.
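Outside DataClassroom, the same idea (ordering categorical X values by their implied sequence rather than alphabetically) can be sketched in a few lines of Python. This is only an illustration, not how DataClassroom reorders values; the `MONTHS` list, the `order_by_sequence` helper and the `Count` numbers are hypothetical.

```python
# Hypothetical helper: sort rows so a categorical X column follows an
# explicit sequence (e.g. calendar order) instead of alphabetical order.
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def order_by_sequence(rows, key, sequence):
    """Return rows sorted by the position of row[key] in `sequence`.

    Raises ValueError if a value is missing from the sequence, so an
    unexpected category is not silently placed at the wrong end.
    """
    position = {value: i for i, value in enumerate(sequence)}
    missing = {row[key] for row in rows} - set(position)
    if missing:
        raise ValueError(f"values not in sequence: {sorted(missing)}")
    return sorted(rows, key=lambda row: position[row[key]])


# Made-up rows for illustration only.
rows = [
    {"Month": "March", "Count": 3},
    {"Month": "January", "Count": 1},
    {"Month": "February", "Count": 2},
]

# Alphabetical order: February, January, March (not what a line graph needs).
print([r["Month"] for r in sorted(rows, key=lambda r: r["Month"])])
# Sequence order: January, February, March.
print([r["Month"] for r in order_by_sequence(rows, "Month", MONTHS)])
```

Raising an error for values outside the sequence is a deliberate choice: a misspelled month would otherwise end up plotted in an arbitrary position.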

Multiple lines

In DataClassroom, you separate points into multiple lines using a categorical variable. In the Reaction Rate Lab dataset (here), this is the Compound variable, which has three values: N2O5, NO2 and O2.

You add this third variable to any graph as a ‘Z axis’ variable, which can be used both to:

  • color points, and

  • group points

As you may have guessed, it’s the group function that splits the points into groups, so a separate line can be drawn through each group. Just check Group by Z once you’ve added the Z variable.
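Conceptually, grouping by Z turns one list of points into one series per value of the Z variable, and each series gets its own line. The Python sketch below illustrates that idea; it is not DataClassroom’s code. Only `Compound` and its three values come from the dataset; the column names `Time` and `Concentration`, the numbers, and the `group_by_z` helper are hypothetical.

```python
from collections import defaultdict

# Made-up rows for illustration; "Time" and "Concentration" are
# hypothetical column names, "Compound" is the Z (grouping) variable.
rows = [
    {"Time": 0, "Compound": "N2O5", "Concentration": 1.0},
    {"Time": 0, "Compound": "NO2", "Concentration": 0.0},
    {"Time": 0, "Compound": "O2", "Concentration": 0.0},
    {"Time": 1, "Compound": "N2O5", "Concentration": 0.5},
    {"Time": 1, "Compound": "NO2", "Concentration": 1.0},
    {"Time": 1, "Compound": "O2", "Concentration": 0.25},
]


def group_by_z(rows, x, y, z):
    """Split rows into one list of (x, y) points per value of z.

    Groups keep the order in which each z value first appears, and the
    points in each group are sorted by x so a line can join them in turn.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[z]].append((row[x], row[y]))
    return {value: sorted(points) for value, points in groups.items()}


for compound, points in group_by_z(rows, "Time", "Concentration", "Compound").items():
    print(compound, points)  # one line would be drawn through each group
```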

This divides the points into groups, which you can connect with lines as you wish. You also get a legend showing each value and its color. Colors are assigned automatically from a color scale; you can choose a different scale (here’s how) or, if you feel inspired, assign colors directly to each value (here’s how).
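Automatic color assignment with optional per-value overrides can be pictured as a simple lookup: walk the distinct values in order, take the next color from a scale, and let any color you chose explicitly win. This is a minimal sketch under that assumption; DataClassroom’s own color scales may work differently, and the `assign_colors` name and hex codes are hypothetical.

```python
from itertools import cycle

# Hypothetical color scale (hex codes chosen for illustration only).
DEFAULT_SCALE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def assign_colors(values, scale=DEFAULT_SCALE, overrides=None):
    """Map each distinct value (in first-seen order) to a color.

    Colors come from `scale`, repeating if there are more values than
    colors; any value listed in `overrides` gets that color instead.
    """
    overrides = overrides or {}
    distinct = list(dict.fromkeys(values))  # de-duplicate, keep order
    palette = cycle(scale)
    legend = {}
    for value in distinct:
        color = next(palette)  # advance the scale for every value
        legend[value] = overrides.get(value, color)
    return legend


compounds = ["N2O5", "NO2", "O2", "N2O5", "NO2", "O2"]
print(assign_colors(compounds))
print(assign_colors(compounds, overrides={"O2": "#000000"}))
```

The scale advances even for overridden values, so picking a custom color for one value does not shift the colors of the others in the legend.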

What if you don’t have the extra variable?

This could be an issue if your data is in a “non-tidy” format, where data from different groups has been placed in different columns: for example, a table with a Day column followed by separate height columns for Plant A and Plant B.

On the face of it, there’s nothing wrong with this data; it’s just that whether a measurement comes from Plant A or Plant B is recorded by its column position rather than by the value of a variable. The table simply needs converting to ‘Tidy’ format, which is fortunately quite simple, and our User Guide has a step-by-step explanation here. Remember, two of the main rules of Tidy Data are that each variable gets its own column and each observation gets its own row. In this example the variables are Day (numeric), Plant (categorical) and Height (numeric), so those should be the headings of your three columns. The dataset will then have 8 rows, one for each of the 8 observations made (4 per plant).
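The conversion described above (often called “melting” or “pivoting longer”) can be sketched in plain Python. The output columns follow the example (Day, Plant, Height), but the wide-table headers `Plant A`/`Plant B`, the heights, and the `wide_to_tidy` helper are made up for illustration; in DataClassroom itself, follow the User Guide steps.

```python
def wide_to_tidy(wide_rows, id_column, group_columns, group_name, value_name):
    """Convert a wide table to tidy (long) format.

    Each wide row holds one id value plus one measurement per group
    column; the output has one row per (id, group) observation, with the
    group label stored as the value of a new categorical variable.
    """
    tidy = []
    for group in group_columns:
        for row in wide_rows:
            tidy.append({
                id_column: row[id_column],
                group_name: group,
                value_name: row[group],
            })
    return tidy


# Made-up heights in a non-tidy layout: one column per plant.
wide = [
    {"Day": 1, "Plant A": 2.0, "Plant B": 1.5},
    {"Day": 2, "Plant A": 3.1, "Plant B": 2.2},
    {"Day": 3, "Plant A": 4.0, "Plant B": 3.0},
    {"Day": 4, "Plant A": 5.2, "Plant B": 3.9},
]

tidy = wide_to_tidy(wide, "Day", ["Plant A", "Plant B"], "Plant", "Height")
print(len(tidy))  # 8 rows: 4 days x 2 plants
for row in tidy:
    print(row)
```

Once the data is in this shape, Plant is an ordinary categorical variable, so it can serve as the Z variable with Group by Z.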

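Why bother? Tidy Data format (i.e. having an explicit grouping variable) is standard practice in scientific data analysis and a key aspect of Data Literacy. It also means the grouping variable can be used directly, for example as the Z variable with Group by Z.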

Other ways to use Group by Z

The Group by Z option is quite powerful and can be used in other situations too: for example, to group the bars in a histogram, or to separate data into groups for analysis, such as adding a separate regression line for each group. For instance, the same data as above can be shown with a quadratic regression line through each group instead of lines joining the points.
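A quadratic regression line per group amounts to fitting y = a + bx + cx² by least squares separately within each group. The sketch below solves the 3×3 normal equations with Gaussian elimination using only the standard library. It illustrates the general technique and is not necessarily how DataClassroom computes its fits; the `fit_quadratic` and `fit_per_group` names and the sample points are hypothetical.

```python
def fit_quadratic(points):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c).

    Builds the normal equations (X^T X) beta = X^T y and solves them by
    Gaussian elimination with partial pivoting. Needs at least three
    distinct x values, otherwise the system is singular.
    """
    # Sums of powers of x, and of x**k * y, used in the normal equations.
    s = [sum(x ** k for x, _ in points) for k in range(5)]
    t = [sum((x ** k) * y for x, y in points) for k in range(3)]
    # Augmented matrix rows: [s_i, s_(i+1), s_(i+2) | t_i]
    m = [[s[i], s[i + 1], s[i + 2], t[i]] for i in range(3)]

    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: need three distinct x values")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]

    # Back substitution, from the last unknown (c) up to the first (a).
    coeffs = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        rest = sum(m[i][j] * coeffs[j] for j in range(i + 1, 3))
        coeffs[i] = (m[i][3] - rest) / m[i][i]
    return tuple(coeffs)


def fit_per_group(groups):
    """Fit a separate quadratic to each group's list of (x, y) points."""
    return {name: fit_quadratic(points) for name, points in groups.items()}


# Made-up points lying exactly on known quadratics, so the fit can be checked.
groups = {
    "Plant A": [(x, 1 + 2 * x + 0.5 * x ** 2) for x in range(1, 5)],
    "Plant B": [(x, 3 - x + 0.25 * x ** 2) for x in range(1, 5)],
}
for name, (a, b, c) in fit_per_group(groups).items():
    print(name, [round(v, 3) for v in (a, b, c)])
```

Because each group is fitted on its own points only, the curves can differ freely between groups, which is exactly what splitting the data with a grouping variable is for.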

We hope this gave a good overview of how grouping data works, and that you get to make some nice, colorful graphs today!