Is Simone Biles the GOAT?

A look at gymnastics greatness in data


Background

Many recent articles have called Simone Biles, a gymnast who made her Olympic debut in 2016, the “Greatest of All Time” (or GOAT). Her own webpage displays the claim, and it is hard to find any sources that really refute it. But how can so many make such a bold statement with such confidence? Well, she certainly has an impressive gymnastics resume that is piled high with accolades. She is the only American gymnast in history to win eight national titles. Her thirty-seven medals at World Championship or Olympic events are the most of any gymnast in history, male or female. She also has five skills named after her in the official Code of Points, meaning she was the first to ever perform those skills in major international competition. Despite all those impressive achievements, at DataClassroom we were left wondering: how would her performances stack up if we explored data across the modern era of women’s gymnastics? Would the data alone show evidence for or against the claim that Simone Biles is the greatest of all time?

The greatest-of-all-time conversation revolves around winners of the all-around competition. In the all-around, the top gymnasts from all over the world compete in every event and have their scores combined. This total gives us a look at which gymnast is the best across all gymnastic skills.

With all the possible variables, directly comparing gymnastic performances across years can get tricky, especially when you factor in that the entire scoring system changed in 2006 from each event maxing out at 10 points to an open-ended scoring system based on the difficulty of the routine. Additionally, the judges differ from competition to competition. Check out this graph showing the total scores for the Women’s Individual All-Around competition from 2008 to 2023:

We know for a fact that gymnasts are executing much more difficult routines every year, yet their raw overall scores have been decreasing since the new scoring was implemented. This is one indication that comparing raw scores across years may not be the best way to compare gymnastic performances. Consider that many variables change each year: the judges, the combination of women, and the routines themselves. A less biased method of comparing scores across years requires that we standardize the data so that scores from each year can be better compared. In other words, we want to be able to see how good a gymnast was in relation to her competition each year, rather than focusing on the raw scores themselves.

To prepare you to look for evidence for or against the claim of Simone Biles as the GOAT, we have laid out some background. First we have 1) a quick explainer on how women’s gymnastics is scored in general terms. Next, we discuss 2) a good way to standardize this data (by converting to units of standard deviation) and why. From there, the activity begins.

Check it out, or skip ahead to the dataset. You do not necessarily need to dive into these explainers to do the activity, but we think you may find them helpful.


Women’s Gymnastics Explainer:

(Based on Olympic competition rules, referenced from nbcolympics.com, June 2024)

Each gymnast can compete under three different competitions:

  1. Individual - highest score on any one event

  2. All-Around (must compete in all four events - Vault, Uneven Bars, Balance Beam, and Floor Exercise) for a total score. Gymnasts earn individual medals for their total scores. Only two gymnasts from each country can qualify for the top 24 Individual All-Around Finals slots.

  3. Team - Three gymnasts compete in each event for a total score which goes towards their team (in this case, country) winning a medal. If the team wins a medal, all gymnasts win a medal even if they only contributed to a single event. 


For this activity, we pulled all data from the “All-Around” final results from each year, as that gave us a score in each event for every athlete. This choice has a limitation. Since only two women from each country are allowed to compete, some athletes who scored high enough to compete in the all-around are not included because they were the third or fourth best from their country. This also means that the mean score for each event may be higher in the individual event competitions than in our all-around data.

The scores are determined by both the difficulty and the execution. Difficulty scores start at 0 and increase as the elements are completed.  Execution scores start at 10, and decrease based on penalty.  

Final Score = [Difficulty + Execution] - Any neutral deductions
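To make the formula concrete, here is a minimal Python sketch. The function name and the sample numbers are invented for illustration; they are not taken from any real routine or from the official rules.

```python
def final_score(difficulty, execution, neutral_deductions=0.0):
    """Add the difficulty and execution scores, then subtract neutral deductions."""
    return difficulty + execution - neutral_deductions


# Hypothetical routine: 6.4 difficulty, 9.1 execution, 0.1 in neutral deductions.
score = final_score(6.4, 9.1, 0.1)
print(round(score, 3))  # prints 15.4
```

Notice that a harder routine (higher difficulty) can outscore a cleaner one (higher execution), which is part of why raw totals are hard to compare across years.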

For more nitty-gritty on scoring specifics, check out this page at nbcolympics.com


Standardization-to-Mean Explainer:

As we saw in the graph above, the raw data isn’t the best way to compare scores across years. We need to standardize the data from year to year. To do this, we standardized the scores within each year so that every year has a mean of zero!

Important math moment - how it’s calculated:



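Z = (x - x̄) / s

Z is the standardized score, x is a gymnast’s raw score, x̄ is the mean score within a given year, and s is the standard deviation of the scores within that year.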




By taking the raw score, subtracting the mean for that year, and then dividing by the standard deviation of scores for that year, we get a value that represents how far above or below the average a particular score is within the year that it happened. After standardization, the score has been converted into units of standard deviation. With this kind of standardization, 0.0 is exactly the score for the average performance in the all-around competition in that year. This means that a standardized score of 0 is average, 1 is good, and 2 is really amazing. Anything above 3 is more than three standard deviations above the average - so good it’s almost unheard of. If scores follow a normal distribution, we would only expect that to occur about 0.1% of the time.
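The standardization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gymnast labels and scores are made up, the function name is hypothetical, and we use the sample standard deviation (the activity does not say whether the sample or population version was used).

```python
from statistics import mean, stdev


def standardize_by_year(rows):
    """Return (year, name, z) tuples, with z computed within each year."""
    # Group the raw scores by year.
    scores_by_year = {}
    for year, name, score in rows:
        scores_by_year.setdefault(year, []).append(score)

    # Mean and sample standard deviation for each year (assumption: sample SD).
    stats = {y: (mean(s), stdev(s)) for y, s in scores_by_year.items()}

    # z = (raw score - year mean) / year standard deviation
    return [
        (year, name, (score - stats[year][0]) / stats[year][1])
        for year, name, score in rows
    ]


# Made-up all-around totals for two years with different scoring levels.
rows = [
    (2008, "Gymnast A", 60.0), (2008, "Gymnast B", 58.0), (2008, "Gymnast C", 56.0),
    (2023, "Gymnast D", 57.0), (2023, "Gymnast E", 55.0), (2023, "Gymnast F", 53.0),
]
standardized = standardize_by_year(rows)
for year, name, z in standardized:
    print(year, name, round(z, 2))
```

In this toy data, Gymnast A (60.0 in 2008) and Gymnast D (57.0 in 2023) have different raw scores but the same standardized score of 1.0, because each finished one standard deviation above the average for her own year.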

Image by M. W. Toews, “A plot of normal distribution”. CC BY 2.5
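To see where a figure like “0.1% of the time” comes from, the short Python sketch below computes the chance of landing above a given standardized score. It assumes the scores follow a standard normal distribution, which real competition scores only approximate; the function name is our own.

```python
import math


def upper_tail_probability(z):
    """Probability that a standard normal value is greater than z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Chance of scoring more than 1, 2, or 3 standard deviations above the mean.
for z in (1, 2, 3):
    print(z, f"{upper_tail_probability(z):.2%}")
```

Under this assumption, roughly 16% of performances would land above 1, about 2% above 2, and a little over 0.1% above 3.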

This strategy was first applied to our exact scenario by Andrew Doss in his interesting article “What makes Simone Biles the GOAT?”. We have modified these data to exclude the previous scoring system (prior to 2008) and to include the most recent data we have, through 2023.


Dataset

The data were gathered from the “All-Around” scores found on gymnasticsresults.com for the years 2008 - 2023. Any years not featured on their homepage can be found in their “archive” section, by clicking on the year and then the event (Worlds or Olympics), and scrolling down to the All-Around results for women’s gymnastics. Each year represents data from either the World Championships or the Olympics. Note that the 2020 Olympics took place in the year 2021. In our dataset, we have it listed as 2020 to avoid confusion. There is no data in the dataset for the 2021 World Championships because it did not include Olympic competitors.

Variables


Gymnast Name - this info variable lists the name of each gymnast, as listed on the final results sheet.

Country - this categorical variable indicates the country represented by that gymnast. Each country is listed by its three-letter abbreviation. 

Year - this categorical variable represents the year the competition was completed. Values include all years spanning from 2008 - 2023 (except 2021).

Olympics or World Champ - this categorical variable describes the type of event. Values include Olympics and World Championship.

Std. Vault Score - this numeric variable lists the final totaled score for the vault event, in units of standard deviation from the mean.  

Std. Uneven Bars  - this numeric variable lists the final totaled score for the Uneven Bars event, in units of standard deviation from the mean.  

Std. Balance Beam  - this numeric variable lists the final totaled score for the balance beam event, in units of standard deviation from the mean.  

Std. Floor Exercise - this numeric variable lists the final totaled score for the floor exercise event, in units of standard deviation from the mean.  

Std. Total Score - this numeric variable lists the final totaled score for all four events, in units of standard deviation from the mean.  


Activity

Part 1 - Find the top performers in the Individual events.

  1. Use the Make a Graph tool to investigate the top performer on the Uneven Bars for all years in the dataset.  Set Uneven Bars as your y-variable for a dot-plot.  Screenshot your graph below.

2. Move your cursor over the top dots to see the names and countries for any datapoint. Which gymnasts gave the top five performances in the dataset? What value tells you that? Note: if you’re having trouble reading any data points, you can increase the jitter to separate them out a bit.

3. Use the Make a Graph tool to investigate the top performer on the Vault for all years in the dataset.  Set Vault as your y-variable for a dot-plot.  Screenshot your graph below.

4. Move your cursor over the top dots to see the names and countries for any datapoint. Which gymnasts gave the top five performances for Vault in the dataset?

5. Use the Make a Graph tool to investigate the top performer on the Balance Beam for all years in the dataset.  Set Balance Beam as your y-variable for a dot-plot.  Screenshot your graph below.

6. Move your cursor over the top dots to see the names and countries for any datapoint. Which gymnasts gave the top five performances for Balance Beam in the dataset?

7. Use the Make a Graph tool to investigate the top performer on the Floor Exercise for all years in the dataset.  Set Floor as your y-variable for a dot-plot.  Screenshot your graph below.

8. Move your cursor over the top dots to see the names and countries for any datapoint. Which gymnasts gave the top five performances for Floor in the dataset?

Part 2 - Does a strong performance in one event predict a strong performance in the other events? 


9.  Make a correlation matrix  with these variables:

  • Vault (SD)

  • Uneven Bars (SD)

  • Balance Beam (SD)

  • Floor Exercise (SD)

  • Total Score (SD)

…and screenshot the matrix below:

10. Which two events are the least correlated with each other? 

11. Which two events are the strongest predictors of a gymnast’s total score?

12. Make a scatter plot graph with one of those variables (your answer to #11) on X and one on Y. Add a line of best fit, and screenshot your graph below: 

13. Describe the correlation between these two events you just graphed. Be sure to include the r squared or r value in your description. Tell us what this means in plain language.
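If you are curious how an r value like the ones in your correlation matrix is calculated, here is a minimal Python sketch of the Pearson correlation coefficient. The numbers are made up and the variable names are our own; this is not how the Make a Graph tool is implemented, just an illustration of the idea.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together, and how each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)


# Made-up standardized scores for one event and for the total score.
event_scores = [-1.0, -0.5, 0.0, 0.5, 1.0]
total_scores = [-0.8, -0.6, 0.1, 0.4, 0.9]

r = pearson_r(event_scores, total_scores)
print("r =", round(r, 3), "r squared =", round(r * r, 3))
```

An r close to 1 means gymnasts who score high in one variable tend to score high in the other. The r squared value can be read as the share of the variation in one variable that is accounted for by a straight-line relationship with the other.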


Part 3 - Just how well has Simone Biles scored? 


14.  Make a scatter plot with Floor on X and Beam on Y. Add Total Score on Z. The redder the data point, the better the overall performance. We would expect the reddest points to be in the upper right of the graph.  Screenshot your graph below:

15.  Hover on the points in the upper right to see which gymnasts they represent. What pattern do you notice? 

Which gymnast(s) own the top five overall standardized scores in the dataset?

16. Now plot Total Score on Y with nothing on X. Show descriptive stats, including the mean and standard deviation. Screenshot your graph below:

17.  Hover on the highest point in the graph. How many standard deviations above the mean is Simone Biles’s best performance to date? 

 
 

*Teachers can request an answer key through the form below.