DataClassroom

Data Science or Data Literacy?

What do the terms Data Science and Data Literacy mean? Are they interchangeable? Here’s my take. We want to enable students to:

Do Data Science, and achieve Data Literacy

So Data Science is what you do, and Data Literacy is what you know?

But it’s not quite that simple. Just as it’s possible to understand a language without being able to write it or speak it fluently, it’s possible to read data graphs without being fluent enough to frame your own questions, make visualizations, or communicate with data.

The two areas are interdependent. Technical skills enable you to apply and obtain knowledge, while knowledge guides you in how and when to apply your technical skills, and what to learn next.

Data Literacy: your base knowledge and wisdom

If we were to list the various key elements of Data Literacy in an order suitable for learning, it might look something like this:

To get started, some concepts:

  • Some understanding of probability

  • Ditto, randomness

  • The fact you can draw insights from data you have collected

Then aspects of describing reality through data:

  • Visualizations of data, and their advantages and disadvantages

  • Descriptive statistics like mean and median, and what they illustrate

  • That you can frame a hypothesis about reality and judge the level of support for it

And then some more analytical / statistical concepts:

  • The concept of statistical significance and understanding that anecdotes are not necessarily representative

  • That there are mathematical techniques called hypothesis tests

  • The importance of how data is collected, concepts like control groups

Are we there yet? Well, this is a pretty good start. Anyone with a good grasp of the above is well ahead of the game, and is already armed with many concepts they can use to think critically about the information they are presented with in everyday life, especially in the media and advertising. A more capable and data-literate citizen, in other words.

“A new study shows….” - does it really? How many people did they ask? What was the actual question?

And that’s without being able to do a single actual mathematical operation or name a single statistical test. In theory.

Of course, you could in principle acquire all of the above just by reading about it. But it is hard to imagine a learning process that builds real intuition and confidence without actually doing it. Doing Data Science. So let’s look at that.

Data Science: the toolkit

If we say Data Science is the process and techniques of working with data, then (in a roughly increasing order of complexity) here’s my take on the basic toolkit the student should acquire:

  • Calculating means and medians

  • Drawing basic graphs

  • Formatting data, using variables to represent data

  • Performing a scientific study, framing questions or designing observation techniques

  • Measuring variation using standard deviation or similar

  • Drawing regression lines

  • Graphing with several variables

  • Performing basic hypothesis tests like the t-test or chi-square test

  • Evaluating significance with p-values
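To make the first few items on this list concrete, here is a minimal sketch in Python using only the standard library. The sunlight and plant-height numbers are invented for illustration, and the helper `fit_line` is a hypothetical name, not part of any tool mentioned in this post. It covers means, medians, standard deviation and a least-squares regression line, and leaves hypothesis tests aside.

```python
import statistics

# Hypothetical classroom data (invented for illustration):
# hours of sunlight per day and the resulting plant height in cm.
sunlight_hours = [2, 4, 6, 8, 10]
height_cm = [5.0, 7.5, 9.0, 12.5, 14.0]

# Descriptive statistics: centre and spread of the heights.
mean_height = statistics.mean(height_cm)
median_height = statistics.median(height_cm)
spread = statistics.stdev(height_cm)  # sample standard deviation


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sunlight_hours, height_cm)

print(f"mean = {mean_height:.2f} cm, median = {median_height:.2f} cm")
print(f"standard deviation = {spread:.2f} cm")
print(f"regression line: height = {slope:.2f} * hours + {intercept:.2f}")
```

In a classroom the same steps would more likely be done in a data analysis tool or a spreadsheet. The code is only there to show that each item on the list is a small, well-defined operation.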

I could go on, and I’m sure there can be plenty of opinions about the ordering. But this is already a pretty good level to get to. Anyone who can do the above is well placed to be able to extend their skills to more advanced areas.

Considering these together with the Data Literacy concepts above, it also becomes apparent that these skills cannot stand alone. It’s one thing to be able to calculate a standard deviation, and another to be able to relate it to its meaning in the real world. Or to know when or why you might want to calculate it in the first place.
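As a small illustration of that point, the sketch below compares two invented sets of test scores that share the same mean but have very different standard deviations. The class names and numbers are hypothetical. The calculation is identical in both cases, and what differs is what it tells you about the classroom.

```python
import statistics

# Hypothetical test scores (invented): both classes average 70.
class_a = [70, 72, 68, 71, 69]    # everyone scores close to the average
class_b = [50, 90, 70, 40, 100]   # a mix of struggling and excelling students

mean_a = statistics.mean(class_a)
mean_b = statistics.mean(class_b)
stdev_a = statistics.stdev(class_a)
stdev_b = statistics.stdev(class_b)

print(f"Class A: mean = {mean_a:.1f}, standard deviation = {stdev_a:.1f}")
print(f"Class B: mean = {mean_b:.1f}, standard deviation = {stdev_b:.1f}")
```

The means alone would suggest the two classes are the same. The standard deviations show that a teacher would probably need quite different approaches for each, and knowing to ask that question is the Data Literacy part.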

Confidence and communication

I would also like to emphasize the importance of these “softer” aspects, which I’d place firmly under Data Literacy. And they go hand in hand, in that confidence - built through overcoming challenges during the learning process - is a key base on which to start communicating the insights that can be achieved from data.

  • With confidence, a student will be ready to present their results, and prepared to discuss and defend conclusions.

  • With confidence, a student will be ready to think critically, and challenge and discuss the conclusions of others.

This is truly the basis of the field of scientific enquiry - the search for new knowledge through experiment and observation, where openness to new information and readiness to change opinions based on data are the virtues we all aspire to.

What about coding and computing skills?

Yes indeed, another hot topic! I see coding as orthogonal to the above - the ability to think algorithmically, and to write code to do actual useful tasks, is another great life skill to have. And you can enhance your ability to perform some operations within the sphere of Data Science by writing your own code in (say) Python or R.

But real scientists can (and do) do most of their analysis by using data analysis tools, much as an accountant uses a spreadsheet or book-keeping program.

So I see the ability to code as very valuable, but not a prerequisite for being fully data literate and having advanced data analysis skills.

Especially if you want to work with very large datasets, other aspects of Computer Science can also become relevant, for example use of SQL (and other) databases, and the ability to use distributed or cloud computing services.

To summarize

Data Literacy and Data Science are important life skills, both for jobs in fields like science, business and engineering, and for navigating life as a citizen of an increasingly data-driven world.

Do support initiatives to prioritize these in both teaching and the training of future teachers, like the Data Science and Literacy Act. Sign the Letter of Support here.

Check out our professional development (PD) offerings - DataClassroom offers free workshops for educators where you can learn how to enhance Data Science and Literacy in the classroom, as well as more dedicated training courses.

And if you’d like a demo of the DataClassroom tool, which is designed with pedagogy in mind to support this learning in real environments, just get in touch!