DataClassroom


Scaffolding for learning to prepare your own data

How can a digital tool support a student as they begin to work with their own data? It’s all very well to just provide the features they’ll need, but what exactly can a tool assist with at the data import and preparation stages?

Data import and formatting

Typically, a student will be entering their data in one of the following ways:

  • Uploading or linking to a spreadsheet

  • Copy / pasting it from a spreadsheet

  • Typing it in manually

Students need to develop good habits in how they format data for analysis. One of the most common mistakes is having extraneous content in the spreadsheet that does not fit a simple rows/columns format, for example descriptive text filling the rows above the actual data. Computers expect data to be organized in rows and columns, with nothing else on the spreadsheet. It can seem logical for students to use a spreadsheet like a piece of paper, with notes and descriptions written at the top, but what is a computer going to make of that when it reads the data?

A second very common problem is not realizing that each column needs a header: a meaningful name for the variable that column contains. Below the header, the column is filled with that variable's values, whether they are numeric or categorical.
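To make these two problems concrete, here is a minimal Python sketch of how a tool might spot them. It is not DataClassroom's implementation; the function `find_layout_problems` and its heuristics are hypothetical. A leading row with at most one filled cell, in a sheet whose widest row has several, is treated as a note above the data, and the first remaining row is treated as the header, with blank or numeric header cells flagged. Heuristics like these misfire on unusual layouts, which is one reason to report problems rather than silently fix them.

```python
import csv
import io


def _filled(row: list[str]) -> list[str]:
    """Return the non-empty cells of a row."""
    return [cell for cell in row if cell]


def _is_number(text: str) -> bool:
    """Return True if the text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def find_layout_problems(rows: list[list[str]]) -> list[str]:
    """Return plain-language warnings about a sheet's layout (hypothetical heuristics)."""
    problems = []
    cleaned = [[cell.strip() for cell in row] for row in rows]
    # The widest row gives a rough idea of how many columns the data has.
    width = max((len(_filled(row)) for row in cleaned), default=0)

    # Leading rows with at most one filled cell are treated as notes, not data.
    top = 0
    while top < len(cleaned) and width > 1 and len(_filled(cleaned[top])) <= 1:
        note = _filled(cleaned[top])
        if note:
            problems.append(f"Row {top + 1} looks like a note, not data: {note[0]!r}")
        top += 1

    if top == len(cleaned):
        return problems + ["No header row was found."]

    # The first remaining row should hold one meaningful name per column.
    for col, name in enumerate(cleaned[top], start=1):
        if not name:
            problems.append(f"Column {col} has no name.")
        elif _is_number(name):
            problems.append(f"Column {col} is headed by a number ({name!r}); is the header row missing?")
    return problems


sheet = "Plant growth experiment (week 3),,\n,,\nplant,height_cm,watered\nA,12.5,yes\nB,9.0,no\n"
for problem in find_layout_problems(list(csv.reader(io.StringIO(sheet)))):
    print(problem)
# Row 1 looks like a note, not data: 'Plant growth experiment (week 3)'
```

Each message refers to a specific row or column of the student's own sheet rather than giving a generic error.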

To help students with these common data formatting issues, we built an interactive data formatting explainer that students can check before importing their data.

But people still make mistakes, right?

What could we do about that?

We experimented with various heuristics to try to infer how the input data was meant to be laid out, but we concluded that there were simply too many ways a student could have structured their data. And, perhaps more importantly, if the computer just rearranged the data for them, students would miss out on some important learning.

So we settled on a design principle: let students make the mistake, and then explain what’s wrong.

We now have a series of helpful warning signs that pop up when the tool detects something that could indicate a problem. Each warning includes detailed text explaining what the tool thinks is wrong (using the student's real data) and links to a short explanation of what we recommend as best practice, and why.

(We also added an Ignore button, just in case there is some good reason for things looking the way they do.)
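A warning of this kind can be modeled as a small record: a message written in terms of the student's data, a link to the best-practice explanation, and an ignored flag set by the Ignore button. The Python sketch below is illustrative only; the `DataWarning` class, the helper functions and the example.com URLs are hypothetical placeholders, not DataClassroom's code.

```python
from dataclasses import dataclass


@dataclass
class DataWarning:
    """One detected problem, described in terms of the student's own data."""
    message: str          # e.g. "Column 2 has no name."
    help_url: str         # link to a best-practice explanation (placeholder URL)
    ignored: bool = False

    def ignore(self) -> None:
        """The Ignore button: the student says this layout is intentional."""
        self.ignored = True


def open_warnings(warnings: list[DataWarning]) -> list[DataWarning]:
    """Warnings that still need attention."""
    return [w for w in warnings if not w.ignored]


def passes_initial_checks(warnings: list[DataWarning]) -> bool:
    """The "at a glance" status: no warning is left open."""
    return not open_warnings(warnings)


warnings = [
    DataWarning("Row 1 looks like a note, not data: 'Plant growth experiment'",
                "https://example.com/help/one-table-per-sheet"),
    DataWarning("Column 2 has no name.", "https://example.com/help/column-names"),
]
print(passes_initial_checks(warnings))  # False
warnings[0].ignore()
print(len(open_warnings(warnings)))     # 1
```

In this sketch a fixed problem simply stops being reported the next time the checks run, while an ignored one stays in the list but no longer counts as open.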

These warning signs are used consistently across the app, wherever the tool detects something that probably indicates a problem that should be sorted out. This gives the student, and the teacher, an “at a glance” overview of whether the data has passed our initial checks, and encourages a methodical approach.

Data preparation

Once the data has been imported, the next step is to tell the tool what “type” of data we have. Is it numeric? Categorical? Something else? Again, this is a deliberately manual process, because learning about data types is very important. And there are plenty of special cases where the type of some data might not be obvious. Should a ZIP code count as numeric data?
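A short Python example shows why the ZIP code question matters. The values below are made up for illustration and are assumed to arrive as text. Treated as numbers, ZIP codes lose their leading zeros and invite meaningless arithmetic; treated as categories, they are kept exactly and can be counted.

```python
from collections import Counter

# Illustrative ZIP codes, read in as text.
zip_codes = ["02134", "10001", "02134", "94103"]

# Typed as numeric: the leading zero of "02134" disappears...
as_numeric = [int(z) for z in zip_codes]
print(as_numeric)  # [2134, 10001, 2134, 94103]

# ...and arithmetic produces an "average ZIP code" with no real meaning.
mean_zip = sum(as_numeric) / len(as_numeric)
print(mean_zip)    # 27093.0

# Typed as categorical: values are preserved and can be counted per group.
counts = Counter(zip_codes)
print(counts["02134"])  # 2
```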

We insist that the user assigns a type to every column of their data (OK, there is still an Ignore button!), and once they have, they can see that everything is ready for them to move on and have fun with visualization and analysis.
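The rule that every column must be typed before analysis can be expressed as a simple readiness check. In this hypothetical Python sketch, `None` means the student has not yet chosen a type, and `IGNORED` stands in for the Ignore option. The enum values and function names are assumptions made for the example, and the tool deliberately does not guess types on the student's behalf.

```python
from enum import Enum
from typing import Optional


class ColumnType(Enum):
    NUMERIC = "numeric"
    CATEGORICAL = "categorical"
    IGNORED = "ignored"  # hypothetical stand-in for the Ignore option


def untyped_columns(types: dict[str, Optional[ColumnType]]) -> list[str]:
    """Columns the student has not assigned a type to yet."""
    return [name for name, t in types.items() if t is None]


def ready_for_analysis(types: dict[str, Optional[ColumnType]]) -> bool:
    """Ready only once every column has a type (or has been explicitly ignored)."""
    return not untyped_columns(types)


column_types = {"plant": ColumnType.CATEGORICAL, "height_cm": ColumnType.NUMERIC, "zip": None}
print(untyped_columns(column_types))     # ['zip']
column_types["zip"] = ColumnType.CATEGORICAL
print(ready_for_analysis(column_types))  # True
```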

Isn’t this a lot of extra work?

Well, that’s now a bunch of hoops that the student has to jump through before they can get up and running with their data. Is that a good thing? We think so, for two main reasons:

  1. Students are going to develop good practices and get familiar with the concept of data types. Once you know what you are doing, ticking the boxes really takes only seconds, so it’s a good investment.

  2. Should they have any problems with the tool and need to show a teacher what is happening (in the classroom or online), there is now a simple “sanity check” they can perform before calling for help: are all the warnings fixed?

What do you think?

If you have any opinions, good ideas or feature requests, let us know at info@dataclassroom.com, on Facebook or on Twitter.