Home Field Advantage in the World Cup

Does the home team score more goals?

Investigate how South American teams perform.



Background

The idea that teams play better at home is one of the most common sports fan axioms out there - but how do the observations of fans hold up to the data?

South America is home to some of the greatest football players of all time, and some of the most passionate fan bases. There is a prevailing belief, though, that teams from South America tend to struggle when playing in Europe. To see if there’s any truth to this, let’s take a look at how Brazil, Argentina, and Uruguay have fared in all of the FIFA World Cups since 1930, when FIFA held the first World Cup.

The short answer is that they have done incredibly well, accounting for 9 of the 21 FIFA World Cup Championships. Even more interesting, two of those victories came while playing as the host country, and four of the five times the World Cup has been played in South America, a team from South America won. Is the difference between how these teams play at home and abroad “statistically significant”? Let’s take a closer look, and consider exactly what “statistically significant” means along the way.

Dataset

Each row in this dataset is an observation of a single country playing in a specific World Cup tournament in a given year. Data was collected for Argentina, Brazil, and Uruguay. The dataset includes 21 World Cup tournaments from 1930 to 2018. The data was sourced from Wikipedia and verified with The Soccer World Cups.

Variables

  • Team - This categorical variable indicates which country the observation is for. In this dataset it can have the value of Brazil, Argentina, or Uruguay.

  • World Cup Location (Country) - This categorical variable indicates which country hosted the world cup for that observation.

  • World Cup Location (Continent) - This categorical variable indicates on which continent the world cup was hosted for a given year.

  • World Cup Location (Hemisphere) - This categorical variable indicates in which hemisphere the world cup was hosted in a given year. It can have the value of Eastern or Western.

  • Home Continent? - This categorical variable indicates whether or not the team was playing on their home continent or away from their home continent. It has the values of either Yes or No.

  • Year - This numeric variable indicates in which year a given observation of that team occurred.

  • Goals Scored - This numeric variable measures the number of goals scored by a team in a given world cup tournament.

  • Games Played - This numeric variable measures the total number of games played by a team in a given world cup tournament.

  • Goals per Game - This numeric variable measures the mean number of goals scored per game by a team during a specific world cup tournament.
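The last variable is derived from the two before it. As a minimal sketch, assuming a row is stored as a Python dictionary keyed by the variable names above, Goals per Game could be computed like this. The helper name `goals_per_game` and the sample row are made up for illustration and are not values from the dataset.

```python
def goals_per_game(goals_scored: int, games_played: int) -> float:
    """Mean goals per game for one team in one tournament."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return goals_scored / games_played


# Hypothetical row for illustration only, NOT taken from the dataset.
row = {"Team": "Brazil", "Goals Scored": 10, "Games Played": 5}
row["Goals per Game"] = goals_per_game(row["Goals Scored"], row["Games Played"])
print(row["Goals per Game"])  # 2.0
```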

Activity

1. Let’s start big-picture: Make a graph with the Number of Goals Scored on the y-axis and World Cup Location (Hemisphere) on the x-axis. Does it look like there might be a difference by hemisphere?

2. Now that you’ve made your graph, let’s add a measure of central tendency by checking the “descriptive statistics” box in the control panel to the right of your graph. Add your graph below by clicking the camera icon in the top right of the graph to copy it so you can paste it here.

3. Run a graph-driven hypothesis test by clicking on the button next to the Appearance button. What are the results of running a t-test on this dataset?

So… what does that p-value mean? Well, p-value is just short for probability value. It is the probability of observing a difference in means at least as big as the one we see here if the two groups really come from the same population. The lower the p-value, the harder it is to explain the observed difference as random chance alone. When the p-value is very small, we can reject the null hypothesis that the two groups are the same. With that in mind, explain in your own words what the p-value above tells us about the data.
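One way to make that definition concrete is to simulate it. The sketch below is a permutation test, not the t-test the graphing tool runs, but it estimates the same kind of probability: it shuffles the group labels many times and counts how often the shuffled difference in means is at least as big as the observed one. The function name `permutation_p_value` and the goal totals are hypothetical and are not values from the dataset.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    If both groups come from the same population, the group labels are
    interchangeable, so we reshuffle the pooled values and count how often
    the shuffled difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles


# Hypothetical goal totals for illustration only, NOT values from the dataset.
western = [14, 10, 22, 9, 15, 11]
eastern = [5, 8, 4, 10, 7, 6]

p = permutation_p_value(western, eastern)
print(f"estimated p-value: {p:.4f}")
```

A small result means that very few random relabelings produce a gap as large as the one observed, which is exactly the situation in which the null hypothesis is rejected.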

Typically, scientists reject the assumption that the groups being studied are the same (the null hypothesis) when the p-value is below 0.05, and describe the difference between the groups as “statistically significant.” With this threshold, a difference that large would show up by chance less than 5% of the time if the groups really were the same. For now, let’s get back to the data.

4. Let’s take a closer look at how the location is related to Number of Goals Scored. Replace World Cup Location (Hemisphere) with (Continent) and place your graph below. Are there any differences between pairs of continents that stand out?

5. Focus on the two locations with the biggest difference by clicking on the “Values” button. Now, exclude all but the two continents with the biggest difference and run a t-test the same way you did before by clicking on the button next to the Appearance button. What is the p-value this time?

6. How would a scientist interpret these results? What do you think football fans have to say about the results? If you were a fan of these teams, how would that influence where you want the next World Cup to be played?
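For readers who want to see the arithmetic behind the continent comparison, here is a rough sketch. It picks the two continents whose mean goals differ the most and computes Welch's t statistic with its approximate degrees of freedom. Whether the graphing tool uses Welch's version or the equal-variance version of the t-test is an assumption here, and the dictionary `goals_by_continent` holds made-up numbers, not values from the dataset.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean, variance


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (fmean(a) - fmean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical goal totals by host continent, NOT values from the dataset.
goals_by_continent = {
    "South America": [14, 22, 15, 11],
    "Europe": [5, 8, 4, 10, 7],
    "North America": [10, 9, 12],
}


def mean_gap(names):
    """Absolute difference in mean goals between two continents."""
    first, second = names
    return abs(fmean(goals_by_continent[first]) - fmean(goals_by_continent[second]))


# Keep only the pair of continents with the biggest difference in means.
pair = max(combinations(goals_by_continent, 2), key=mean_gap)
t, df = welch_t(goals_by_continent[pair[0]], goals_by_continent[pair[1]])
print(pair, round(t, 2), round(df, 2))
```

Turning `t` and `df` into a p-value requires the t distribution, which a statistics library or the graphing tool supplies. Keep in mind that choosing the pair after looking at the data makes a small p-value easier to find by chance.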

 
 

*Teachers can request an answer key through the form below.

Blake Blaze