Melting your data
Knowing how to set up a data table so that it is easy to analyze is an important part of data literacy.
Tidy Data (aka Long format) has become the standard format for science and business because it easily allows people to turn a data table into graphs, analysis and insight.
So what is Tidy data?
In a Tidy, or Long format, data table, all the values of the same variable are in the same column, even if they were measured on different subjects or under different conditions.
Each row therefore represents a single observation of all variables.
The other columns (other variables) give the details of each observation, such as which subject was measured and under which conditions.
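To see why this layout makes analysis easy, here is a minimal Python sketch. The column names (Day, Plant, Height) and all the numbers are made up for illustration, not real measurements. Because every height sits in the same column, summarizing by plant takes a single loop.

```python
from collections import defaultdict

# Made-up values for illustration only: one row per observation.
tidy_rows = [
    {"Day": 1, "Plant": "A", "Height": 2.0},
    {"Day": 1, "Plant": "B", "Height": 1.5},
    {"Day": 2, "Plant": "A", "Height": 3.0},
    {"Day": 2, "Plant": "B", "Height": 2.5},
]

# Every height is in the same column, so grouping by any other
# column (here, Plant) is one pass over the rows.
heights_by_plant = defaultdict(list)
for row in tidy_rows:
    heights_by_plant[row["Plant"]].append(row["Height"])

# Average height per plant.
mean_height = {plant: sum(h) / len(h) for plant, h in heights_by_plant.items()}
print(mean_height)
```

Swapping "Plant" for "Day" in the loop would group by day instead, with no change to the table itself.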
Wide format
The way that many students would instinctively lay out a table for this data is in what is referred to as “wide” format, in which observations of the same variable for different subjects are separated into different columns, like this:
Wide format has the advantage of being short. But because it does not separate the data into the columns that become variables in an analysis (‘Day’, ‘Plant’ and ‘Height’), it can be difficult for a student to identify the independent and dependent variables in a dataset. It’s always a good idea to learn how to convert into Tidy/Long format for analysis.
How to “melt”, or convert from Wide to Tidy
This conversion is known as “melting” in both Python and R. You can do it manually, and we have explanations in our User Guide and an older blog post.
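If you are curious what melting looks like in code, here is a rough sketch in plain Python. The `melt` helper below is a hypothetical function written for this example (it is not DataClassroom's implementation), and the table values are made up. In practice, Python users would more likely reach for the `melt` function in the pandas library, which does the same job on a DataFrame.

```python
def melt(wide_rows, id_col, var_name, value_name):
    """Hypothetical helper: turn each non-id column of a wide row into its own long row."""
    long_rows = []
    for row in wide_rows:
        for column, value in row.items():
            if column == id_col:
                continue  # the id column is copied into every new row, not melted
            long_rows.append({id_col: row[id_col], var_name: column, value_name: value})
    return long_rows


# Made-up wide table: one column of heights per plant.
wide_rows = [
    {"Day": 1, "Plant A": 2.0, "Plant B": 1.5},
    {"Day": 2, "Plant A": 3.0, "Plant B": 2.5},
]

# The old column headers become values in a new "Plant" column,
# and all the measurements end up in a single "Height" column.
tidy_rows = melt(wide_rows, id_col="Day", var_name="Plant", value_name="Height")
for row in tidy_rows:
    print(row)
```

Two wide rows with two plant columns each become four tidy rows, one per measurement.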
DataClassroom also has a ‘Melt’ function, which can do all the data moving for you.
However you do it, the important part is understanding what you are doing! As Hadley Wickham, who coined the term Tidy Data in a widely cited 2014 paper, put it:
“I think the organization of data is just as important as how we organize words into sentences and sentences into paragraphs. Sureyoucanreadasentencewithnospaces, but proper punctuation make things so much clearer! The same is even more true with data since most of the time, you'll be using a computer to work with it, and computers have a much smaller capability to "read between the lines" and understand what you're really trying to say.”
Have fun melting your data!