Witches, Spider-Man, and Stranger Things in data
Making sense of cosplay trends in Google search data
Make sense of data trends by digging into the events and history behind the numbers. Students interpret graphs to suggest why certain Google searches (and likely purchases) spike around Halloween and at other times. This is an accessible data science activity for grades 9-12+.
Background
Each Halloween season, retailers try to predict which costumes will be most popular so that they can stock their shelves with what consumers want to buy. Predicting the future requires understanding the past, which scientists do by looking at data. If we can understand which costumes become popular and why, the chances of making money by selling the costumes people want increase!
In this dataset, explore two different costume trends (Witch, Stranger Things) in order to understand the trend of a third (Spider-Man). Then create a purchase suggestion for a company about what this year’s Halloween needs may be!
Dataset
The data come from Google Trends, which makes Google search data available to the public. Google Trends data are available from 2004 to the present, but for this activity we created a subset covering 2007-2023. The dataset shows Google search rates for “Witch”, “Spider-Man”, and “Stranger Things” over that time period.
Variables
Month/Year - This variable has been coded as info, so it cannot be used in graphing. It gives the month and year over which searches for the term were totaled.
Days since 1/1/2007 - This numeric variable counts the number of days from the beginning of the dataset (January 1, 2007) to the start of the month listed. Measured in days.
Costume - This categorical variable describes the search term (or costume) entered into Google. Its value is Witch, Spider-Man, or Stranger Things.
Search Rate (per million) - This numeric variable describes the number of Google searches for that costume during the observation month, per million total searches.
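To make the two numeric variables concrete, here is a minimal Python sketch of how values like these could be calculated. The function names and the example counts are hypothetical and are not taken from the dataset; the exact steps used to build the dataset from Google Trends are not described in this activity.

```python
from datetime import date

DATASET_START = date(2007, 1, 1)  # first day covered by the dataset


def days_since_start(year: int, month: int) -> int:
    """Days from January 1, 2007 to the first day of the given month."""
    return (date(year, month, 1) - DATASET_START).days


def rate_per_million(term_searches: int, total_searches: int) -> float:
    """Searches for one term per million total searches (hypothetical inputs)."""
    return term_searches * 1_000_000 / total_searches


print(days_since_start(2007, 1))   # 0: the dataset starts here
print(days_since_start(2008, 10))  # start of October 2008
print(rate_per_million(250, 10_000_000))  # made-up counts, for illustration only
```

Because every month is converted to a day count, months can be placed on a numeric x-axis in order, and dividing by total searches lets months with different overall search volume be compared fairly.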
Activity
Get to know the data
1. Take a look at the searches for all costume types over time. Set Days since 1/1/2007 on the x-axis and Search Rate (per million) on the y-axis. Differentiate between costume types by setting Costume as the z-variable. Screenshot your graph below:
2. To get some clarity on what we’re looking at, create a subset of the data by filtering for one costume at a time. Filter the data so that we are only looking at witch data (set filter to costume:equals:witch).
Your graph should now show the number of searches (y-axis) over time (x-axis) for witch costumes only (z-axis). Screenshot your graph below:
3. Hover over various peaks on your graph to note the date (month/year) on which those peaks fall. What patterns do you notice in the surges and the months/years in which they fall? What inference can you make about the reason?
4. Let’s pivot to looking at the Stranger Things searches. Go back and filter the costume data for Stranger Things costumes. Create a graph that shows the number of searches (y-axis) over time (x-axis) for Stranger Things costumes only (z-axis). Screenshot your graph below:
Hover with your pointer over the main peaks on this graph. Below, record the month/year of each peak. Additionally, write your reflection on what was happening at the time to create this pattern. Note: below is a blurb from Wikipedia about the release dates of each season of Stranger Things.
(Stuck? Some things you can talk about: Does the Stranger Things data look the same as the witch data? On what date do the peaks start, and why might that be? Why might the peaks decrease and then increase again?)
5. Is there a sign in this data of which season was the most popular and which was the least popular? What data do you use as evidence for this inference?
6. Let’s take a look at our final costume search, Spider-Man! Filter the costume data for Spider-Man costumes. Create a graph that shows the number of searches (y-axis) over time (x-axis) for Spider-Man costumes only (z-axis). Screenshot your graph below:
7. Using the previous trends from the witch and Stranger Things searches, what might you predict is the reason behind the peaks or patterns for Spider-Man costume searches? Note: Below is a summary of Spider-Man films from 2007-2023 to help you with your inferences. (The list below does not include movies with Spider-Man cameos, like Captain America: Civil War.)
8. Lastly, look at all the costume data at once again: remove all costume filters, and once more screenshot the searches-over-time graph for all costumes (z-axis) below:
9. What does this graph tell you about costume popularity?
10. What was the benefit of subsetting the data and looking at the data for each search term by itself?
11. What costumes should Halloween stores invest in each Halloween (because they are the most popular), and which should they have available all year round? Write your answer as a recommendation to the company, and use data as support for your recommendation.
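For classes that want to repeat the subsetting and peak-finding steps in code, here is a minimal Python sketch. It rests on stated assumptions: the rows are invented placeholder values (not real Google Trends data), and the helper names `filter_costume` and `peak_month` are hypothetical.

```python
# Each row: (month/year label, costume, search rate per million).
# These values are invented placeholders for illustration only.
rows = [
    ("Sep 2016", "Witch", 40.0),
    ("Oct 2016", "Witch", 95.0),
    ("Nov 2016", "Witch", 30.0),
    ("Sep 2016", "Spider-Man", 50.0),
    ("Oct 2016", "Spider-Man", 70.0),
    ("Nov 2016", "Spider-Man", 45.0),
]


def filter_costume(data, costume):
    """Keep only the rows for one costume (the subsetting step)."""
    return [row for row in data if row[1] == costume]


def peak_month(data):
    """Return the month/year label of the row with the highest search rate."""
    return max(data, key=lambda row: row[2])[0]


witch_rows = filter_costume(rows, "Witch")
print(len(witch_rows))         # number of rows left after filtering
print(peak_month(witch_rows))  # month/year with the highest witch search rate
```

Filtering first and then looking for the maximum mirrors what the graph filter and hovering do by hand: each costume's peaks are read on their own, without the other costumes' lines in the way.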