Cape Epic 2025 Analysis

Categories: Blog, Sport, Cycling
Published: March 28, 2025

Here is my take on the Cape Epic 2025 results, through the eyes of an epidemiologist-competitor, if you will.

Intention

As both a participant and an epidemiologist, I was interested in a new analysis challenge. It is one of those things: you are chatting in the car on the way back from Stage 7, saying “flip, I wonder how many guys got chicked on that stage” or “who had the most consistent race?”, and instead of just forgetting about it, I took it a bit further. Some of these questions can be answered just by watching what happened to the top teams on TV, but I also wanted to place us mere mortals into context.

In summary, I wanted to know:

- the “real” mortality rate of riders
- how many men got chicked
- the average time of each category
- how your position over the stages compares with another team (via a tracking tool)
- how well prologue time predicts overall performance
- the number of riders per capita from each country.

More in-depth analyses are possible, but I am limited by the data available online (and by my real job).

Data Collection

Data was scraped from the results page on the Cape Epic website using Selenium. Numbers may vary slightly from official ones reported by the Cape Epic. At the time of scraping, some data on riders was not available.

The real mortality rate

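The Epic organisers usually report the dropout rate of teams. Although the Epic is a team event, a team is out of the race as soon as either rider abandons, while the remaining partner may keep riding (the Individual Finishers category), so the dropout rate of individual riders is much lower. Below I present the survival rates of both teams and individual riders.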

Table 1: Comparison of the survival rate of teams (as usually reported by the Epic) with the survival rate of individual riders
(a) Mortality rate by team

| Stage | Number of Teams | Survival from Start (%) | Survival from Last Stage (%) |
|---|---|---|---|
| Prologue | 738 | 100.0 | |
| Stage 1 | 707 | 95.8 | 95.8 |
| Stage 2 | 696 | 94.3 | 98.4 |
| Stage 3 | 636 | 86.2 | 91.4 |
| Stage 4 | 586 | 79.4 | 92.1 |
| Stage 5 | 567 | 76.8 | 96.8 |
| Stage 6 | 545 | 73.8 | 96.1 |
| Stage 7 | 544 | 73.7 | 99.8 |

(b) Mortality rate by rider

| Stage | Number of Riders | Survival from Start (%) | Survival from Last Stage (%) |
|---|---|---|---|
| Prologue | 1,483 | 100.0 | |
| Stage 1 | 1,446 | 97.5 | 97.5 |
| Stage 2 | 1,437 | 96.9 | 99.4 |
| Stage 3 | 1,367 | 92.2 | 95.1 |
| Stage 4 | 1,297 | 87.5 | 94.9 |
| Stage 5 | 1,273 | 85.8 | 98.1 |
| Stage 6 | 1,245 | 84.0 | 97.8 |
| Stage 7 | 1,241 | 83.7 | 99.7 |
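Both survival columns in Table 1 follow directly from the stage counts. The base R sketch below recomputes them from the team counts in Table 1(a): survival from start divides each stage's count by the prologue count, and survival from the last stage divides it by the previous stage's count. Only the counts come from the table; the variable names are illustrative.

```r
# Teams recorded at each stage, copied from Table 1(a)
stage <- c("Prologue", paste("Stage", 1:7))
teams <- c(738, 707, 696, 636, 586, 567, 545, 544)

# Survival from start: share of prologue teams still racing (%)
survival_from_start <- round(100 * teams / teams[1], 1)

# Survival from last stage: share of the previous stage's teams still racing (%)
# The prologue has no previous stage, so its value is NA
survival_from_last <- round(100 * teams / c(NA, head(teams, -1)), 1)

print(data.frame(stage, teams, survival_from_start, survival_from_last))
```

Replacing `teams` with the rider counts from Table 1(b) gives the rider columns in the same way.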

How many riders get chicked?

Getting chicked is, colloquially, when a male rider finishes behind a female rider. The Cape Epic is one of the few events where men and women ride exactly the same course (and at fairly similar times), unlike the Cape Town Cycle Tour or the Tour de France Femmes. There is also emerging evidence that women may handle fatigue better than men.

Table 2: Comparing how many riders get chicked by the fastest and the slowest UCI Women team
(a) Proportion of riders who got chicked by the fastest UCI Women team

| Category | Prologue | Stage 1 | Stage 2 | Stage 3 | Stage 4 | Stage 5 | Stage 6 | Stage 7 | Overall |
|---|---|---|---|---|---|---|---|---|---|
| UCI Men | 4.5% | 6.8% | 6.8% | 9.1% | 6.8% | 9.1% | 18.2% | 9.1% | 9.1% |
| Individual Finishers | 92.2% | 92.2% | 90.9% | 90.9% | 87.0% | 87.0% | 88.3% | 88.3% | 88.3% |
| Open Men | 93.2% | 95.6% | 93.2% | 95.7% | 92.3% | 91.6% | 96.1% | 86.9% | 86.9% |
| UCI Women | 94.1% | 94.1% | 94.1% | 94.1% | 94.1% | 94.1% | 94.1% | 94.1% | 94.1% |
| Masters Men | 97.3% | 96.2% | 97.3% | 97.2% | 96.6% | 96.6% | 97.7% | 96.0% | 96.0% |
| Grand Masters Men | 99.0% | 98.1% | 97.1% | 99.0% | 97.0% | 97.0% | 99.0% | 96.0% | 96.0% |
| Great Grand Masters Men | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
| Masters Women | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
| Mixed | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 98.4% | 98.4% |
| Open Women | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |

(b) Proportion of riders who got chicked by the slowest UCI Women team

| Category | Prologue | Stage 1 | Stage 2 | Stage 3 | Stage 4 | Stage 5 | Stage 6 | Stage 7 | Overall |
|---|---|---|---|---|---|---|---|---|---|
| UCI Men | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| UCI Women | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| Open Men | 25.5% | 44.4% | 42.2% | 52.2% | 40.4% | 43.5% | 49.0% | 37.9% | 37.9% |
| Individual Finishers | 39.0% | 58.4% | 54.5% | 74.0% | 51.9% | 55.8% | 59.7% | 50.6% | 50.6% |
| Masters Men | 49.2% | 67.6% | 64.3% | 77.9% | 66.7% | 69.5% | 74.1% | 60.9% | 60.9% |
| Grand Masters Men | 50.5% | 69.5% | 69.5% | 77.1% | 72.3% | 73.0% | 73.7% | 65.7% | 65.7% |
| Great Grand Masters Men | 60.9% | 86.4% | 77.3% | 90.5% | 84.2% | 78.9% | 89.5% | 84.2% | 84.2% |
| Mixed | 71.4% | 83.6% | 83.8% | 84.6% | 84.1% | 84.1% | 85.5% | 80.6% | 80.6% |
| Open Women | 73.3% | 80.0% | 80.0% | 80.0% | 86.7% | 85.7% | 93.3% | 93.3% | 93.3% |
| Masters Women | 80.0% | 100.0% | 93.3% | 100.0% | 85.7% | 92.9% | 100.0% | 92.9% | 92.9% |
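One way to compute proportions like those in Table 2 is to take, on each stage, the time of the reference UCI Women team (the fastest or the slowest) and work out the share of entries in each category that were strictly slower. The base R sketch below does this on a small hypothetical data set. The column names (`category`, `stage`, `time_min`), the helper `prop_chicked`, one row per team, and the tie rule (equal times do not count as chicked) are all assumptions for illustration, not necessarily how the table was built.

```r
# Hypothetical stage times in minutes (illustrative only, NOT real Cape Epic results)
results <- data.frame(
  category = c("UCI Women", "UCI Women", "Open Men", "Open Men", "Open Men", "Masters Men"),
  stage    = "Stage 1",
  time_min = c(310, 330, 300, 320, 340, 335)
)

# Hypothetical helper: percentage of rows in each category slower than the
# fastest or slowest UCI Women team, computed separately for every stage
prop_chicked <- function(results, reference = c("fastest", "slowest")) {
  reference <- match.arg(reference)
  per_stage <- lapply(split(results, results$stage), function(d) {
    women <- d$time_min[d$category == "UCI Women"]
    ref_time <- if (reference == "fastest") min(women) else max(women)
    # Strictly slower than the reference team counts as "chicked"
    chicked <- tapply(d$time_min > ref_time, d$category, mean)
    data.frame(stage = d$stage[1], category = names(chicked),
               pct = 100 * as.numeric(chicked))
  })
  out <- do.call(rbind, per_stage)
  rownames(out) <- NULL
  out
}

print(prop_chicked(results, "fastest"))
print(prop_chicked(results, "slowest"))
```

Because the reference team stays in its own category's denominator, UCI Women come out below 100% against the fastest team and at 0% against the slowest, which is consistent with the UCI Women rows in Table 2.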

Track the performance of your team over the stages

If you are interested in seeing how you paced compared with another team, or with the pros, you can use the app below.

#| standalone: true
#| echo: false
#| messages: false
#| viewerHeight: 1200

library(shiny)
library(tidyverse)
library(DT)
# ggrepel must also be installed (it is called as ggrepel::geom_text_repel below)

# Load the scraped results from GitHub
data_url <- "https://raw.githubusercontent.com/bridaybrummer/study_stats_site/main/data/df_full.csv"
df_full <- read_csv(data_url)

# Stage order for the x-axis (stage is stored as a factor with these levels)
stage_levels <- c(
    "Prologue", "Stage 1", "Stage 2", "Stage 3", "Stage 4",
    "Stage 5", "Stage 6", "Stage 7", "Overall"
)

category_colors <- c(
    "UCI Men" = "#1f77b4", # Blue
    "Individual Finishers" = "#7f7f7f", # Grey
    "Masters Men" = "#2ca02c", # Green
    "Open Men" = "#ff7f0e", # Orange
    "Grand Masters Men" = "#9467bd", # Purple
    "UCI Women" = "#e377c2", # Pink
    "Mixed" = "#bcbd22", # Olive
    "Great Grand Masters Men" = "#8c564b", # Brown
    "Open Women" = "#17becf", # Teal
    "Masters Women" = "#d62728" # Red
)

df_full <- df_full %>%
    mutate(stage = factor(stage, levels = stage_levels))

ui <- fluidPage(
    titlePanel("Team Position by Stage"),
    tags$head(
        tags$style(HTML("
            div.top-left {
            float: left;
            }
            div.dataTables_filter {
            text-align: left !important;
            }
        "))
    ),
    tags$script(HTML("
    setTimeout(function() {
      document.querySelector('h4').innerText = 'Search for teams and riders';
    }, 4000);
  ")),
    sidebarLayout(
        sidebarPanel(
            h4("Welcome!"),
            helpText(
                "This app visualises stage-by-stage team positions.",
                "Select your team and compare it with another, e.g. the Ladies African Jersey winners (Team 62).",
                "If you can't remember your team number, search the table by rider name or team name.",
                "Only the categories of the selected teams are coloured (all others are greyed out)."
            ),
            numericInput("selected_team", "Enter a team number (dashed):",
                value = 188,
                min = min(df_full$team_number, na.rm = TRUE),
                max = max(df_full$team_number, na.rm = TRUE)
            ),
            numericInput("selected_team_2", "Compare with team number (dotted):",
                value = 62,
                min = min(df_full$team_number, na.rm = TRUE),
                max = max(df_full$team_number, na.rm = TRUE)
            ),
            verbatimTextOutput("category_text")
        ),
        mainPanel(
            h5("Search for a rider or team:"),
            helpText("E.g., search 'HoneyComb' or your team name."),
            DTOutput("team_table"),
            plotOutput("team_plot", height = "500px")
        )
    )
)

# Server
server <- function(input, output) {
    # Searchable lookup of team numbers, team names, riders and categories
    output$team_table <- renderDT({
        df_full %>%
            select(team_number, team, rider, category) %>%
            distinct() %>%
            datatable(
                options = list(
                    pageLength = 5,
                    lengthChange = FALSE,
                    dom = '<"top-left"f>tip', # 'f' = search box, wrapped in a class so it can be left-aligned
                    scrollY = "200px",
                    scrollCollapse = TRUE
                ),
                rownames = FALSE,
                class = "compact stripe"
            )
    })

    # Category (or categories) that a team number appears in
    team_categories <- function(team_no) {
        df_full %>%
            filter(team_number == team_no) %>%
            pull(category) %>%
            unique()
    }

    # Defined once, outside renderPlot(), so it is not re-registered on every redraw
    output$category_text <- renderText({
        req(input$selected_team, input$selected_team_2)
        paste0(
            "Category of Team 1: ", paste(team_categories(input$selected_team), collapse = ", "),
            "\nCategory of Team 2: ", paste(team_categories(input$selected_team_2), collapse = ", ")
        )
    })

    output$team_plot <- renderPlot({
        req(input$selected_team, input$selected_team_2)

        team1 <- input$selected_team
        team2 <- input$selected_team_2
        selected_cats <- unique(c(team_categories(team1), team_categories(team2)))

        df_plot <- df_full %>%
            mutate(
                focus = case_when(
                    team_number == team1 ~ "Team 1",
                    team_number == team2 ~ "Team 2",
                    TRUE ~ "Other"
                ),
                # Colour only the selected teams' categories; everything else is light grey
                color_val = if_else(
                    category %in% selected_cats,
                    unname(category_colors[category]),
                    "grey80"
                ),
                alpha_val = case_when(
                    focus != "Other" ~ 1,
                    category %in% selected_cats ~ 0.3,
                    TRUE ~ 0.1
                ),
                width_val = if_else(focus != "Other", 1.2, 0.4),
                linetype_val = case_when(
                    focus == "Team 1" ~ "dashed",
                    focus == "Team 2" ~ "dotted",
                    TRUE ~ "solid"
                )
            )

        # Label each selected team at the last stage it has a result for
        labels_df <- df_plot %>%
            filter(team_number %in% c(team1, team2)) %>%
            group_by(team_number) %>%
            filter(as.integer(stage) == max(as.integer(stage), na.rm = TRUE)) %>%
            ungroup()

        # Group by team number (not name) so that teams sharing a name are not joined
        ggplot(df_plot, aes(x = stage, y = -position, group = team_number)) +
            # linewidth (ggplot2 >= 3.4.0) replaces the deprecated size aesthetic for lines
            geom_line(aes(
                color = color_val, alpha = alpha_val,
                linetype = linetype_val, linewidth = width_val
            )) +
            geom_point(aes(color = color_val, alpha = alpha_val), size = 0.5) +
            ggrepel::geom_text_repel(
                data = labels_df,
                aes(label = paste0("Team ", team_number)),
                nudge_x = 0.2,
                direction = "y",
                size = 4,
                hjust = 0,
                segment.color = "grey50"
            ) +
            # Identity scales: use the precomputed colour, alpha, line type and width as given
            scale_linewidth_identity() +
            scale_alpha_identity() +
            scale_linetype_identity() +
            scale_color_identity() +
            scale_x_discrete(
                # create more space on the right side of the plot for the labels
                expand = expansion(mult = c(0.1, 0.2))
            ) +
            labs(
                x = "Stage",
                y = "Position (higher is better)",
                title = "Team Position by Stage",
                subtitle = "The selected teams are shown as bold dashed (Team 1) and dotted (Team 2) lines. Other teams in their categories are shown as thin solid lines in the same colour. All other categories are shown as faint grey lines."
            ) +
            theme_minimal() +
            theme(legend.position = "bottom")
    })
}

# Run app
shinyApp(ui, server)

Average time of categories

The average time of each category is shown below. Unfortunately, averages do not capture the other races that were happening within these categories.

Figure 1: Probability density of the cumulative time of each category. The x-axis is the cumulative time in seconds and the y-axis is the probability density. The curves are Gaussian kernel density estimates.
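As a minimal base R sketch of the kind of curve in Figure 1, `stats::density()` with `kernel = "gaussian"` gives one kernel density estimate per category. The cumulative times below are simulated, hypothetical values, and R's default bandwidth rule (`bw = "nrd0"`) is an assumption about the bandwidth used for the figure. In ggplot2, `geom_density()` uses the same Gaussian kernel and `nrd0` bandwidth by default.

```r
set.seed(2025)
# Simulated cumulative times in seconds (hypothetical, NOT real results)
times <- data.frame(
  category = rep(c("UCI Men", "Open Men"), each = 50),
  cumulative_s = c(rnorm(50, mean = 93000, sd = 5000),
                   rnorm(50, mean = 130000, sd = 18000))
)

# One Gaussian kernel density estimate per category (default bandwidth rule "nrd0")
densities <- lapply(split(times$cumulative_s, times$category),
                    density, kernel = "gaussian")

# Each estimate is a probability density, so the area under each curve is close to 1
areas <- sapply(densities, function(d) sum(diff(d$x) * head(d$y, -1)))
print(round(areas, 3))
```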
Table 3: Time of each category on each stage, in minutes.

| Category | Prologue (N = 441) | Stage 1 (N = 441) | Stage 2 (N = 441) | Stage 3 (N = 441) | Stage 4 (N = 441) | Stage 5 (N = 441) | Stage 6 (N = 441) | Stage 7 (N = 441) |
|---|---|---|---|---|---|---|---|---|
| UCI Men | 66 (4) | 255 (19) | 146 (9) | 227 (18) | 204 (14) | 285 (28) | 260 (26) | 103 (8) |
| UCI Women | 81 (6) | 313 (23) | 179 (12) | 288 (23) | 255 (20) | 354 (24) | 323 (26) | 126 (9) |
| Open Men | 89 (14) | 359 (55) | 197 (28) | 358 (80) | 288 (45) | 401 (64) | 371 (62) | 137 (21) |
| Individual Finishers | 92 (15) | 378 (67) | 208 (35) | 390 (88) | 293 (45) | 405 (60) | 370 (58) | 139 (21) |
| Grand Masters Men | 95 (11) | 385 (46) | 212 (27) | 413 (89) | 312 (41) | 435 (58) | 404 (59) | 149 (20) |
| Masters Men | 98 (15) | 397 (63) | 216 (32) | 410 (89) | 319 (47) | 444 (67) | 414 (66) | 151 (22) |
| Great Grand Masters Men | 101 (11) | 406 (42) | 225 (27) | 452 (84) | 334 (41) | 462 (59) | 439 (58) | 160 (19) |
| Open Women | 105 (11) | 419 (53) | 229 (24) | 455 (87) | 343 (40) | 477 (65) | 452 (60) | 169 (21) |
| Mixed | 106 (16) | 421 (59) | 230 (31) | 444 (93) | 337 (43) | 467 (63) | 436 (61) | 160 (20) |
| Masters Women | 110 (13) | 427 (46) | 234 (26) | 452 (79) | 353 (36) | 484 (49) | 456 (47) | 167 (16) |

Values are mean (SD).
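A category-by-stage table of mean (SD), like Table 3, can be built in base R with `tapply()`, which returns a matrix with one row per category and one column per stage. The sketch below uses a handful of hypothetical times and rounds to whole minutes to match the table's format; the helper `mean_sd` is illustrative and this is not the code behind Table 3.

```r
# Hypothetical stage times in minutes (illustrative only, NOT real results)
times <- data.frame(
  category = c("UCI Men", "UCI Men", "UCI Women", "UCI Women",
               "UCI Men", "UCI Men", "UCI Women", "UCI Women"),
  stage    = rep(c("Prologue", "Stage 1"), each = 4),
  time_min = c(62, 70, 75, 87, 250, 260, 305, 321)
)

# Format a vector of times as "mean (SD)", rounded to whole minutes
mean_sd <- function(x) sprintf("%.0f (%.0f)", mean(x), sd(x))

# Rows = categories, columns = stages
table3 <- tapply(times$time_min, list(times$category, times$stage), mean_sd)
print(table3, quote = FALSE)
```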

Who, what and where

Which country had the most Epic riders per capita?

Table 4: Number of riders per 10 million population, by country

| Country | Number of Riders | Population | Riders per 10 million population |
|---|---|---|---|
| Andorra | 4 | 80,856 | 494.71 |
| Isle of Man | 1 | 84,165 | 118.81 |
| South Africa | 666 | 63,212,384 | 105.36 |
| Switzerland | 69 | 8,888,093 | 77.63 |
| Malta | 4 | 552,747 | 72.37 |
| Costa Rica | 23 | 5,105,525 | 45.05 |
| Namibia | 13 | 2,963,095 | 43.87 |
| New Zealand | 19 | 5,223,100 | 36.38 |
| Belgium | 37 | 11,787,423 | 31.39 |
| Mauritius | 3 | 1,261,041 | 23.79 |
| Slovenia | 4 | 2,120,461 | 18.86 |
| Spain | 83 | 48,347,910 | 17.17 |
| Netherlands | 30 | 17,877,117 | 16.78 |
| Portugal | 16 | 10,578,174 | 15.13 |
| Czechia | 16 | 10,864,042 | 14.73 |
| Estonia | 2 | 1,370,286 | 14.60 |
| Austria | 12 | 9,131,761 | 13.14 |
| Latvia | 2 | 1,877,445 | 10.65 |
| Norway | 5 | 5,519,594 | 9.06 |
| Germany | 74 | 83,280,000 | 8.89 |
| Lesotho | 2 | 2,311,472 | 8.65 |
| Eswatini | 1 | 1,230,506 | 8.13 |
| Dominican Republic | 9 | 11,331,265 | 7.94 |
| Croatia | 3 | 3,859,686 | 7.77 |
| Denmark | 4 | 5,946,952 | 6.73 |
| Sweden | 7 | 10,536,632 | 6.64 |
| Australia | 15 | 26,658,948 | 5.63 |
| Finland | 3 | 5,583,911 | 5.37 |
| Canada | 21 | 40,097,761 | 5.24 |
| Singapore | 3 | 5,917,648 | 5.07 |
| United Kingdom | 30 | 68,350,000 | 4.39 |
| Israel | 4 | 9,756,600 | 4.10 |
| Italy | 24 | 58,993,475 | 4.07 |
| Botswana | 1 | 2,480,244 | 4.03 |
| Guatemala | 7 | 18,124,838 | 3.86 |
| Greece | 4 | 10,405,588 | 3.84 |
| France | 26 | 68,287,487 | 3.81 |
| Slovakia | 2 | 5,426,740 | 3.69 |
| Argentina | 16 | 45,538,401 | 3.51 |
| Lithuania | 1 | 2,871,585 | 3.48 |
| Zimbabwe | 5 | 16,340,822 | 3.06 |
| Ecuador | 5 | 17,980,083 | 2.78 |
| Hong Kong | 2 | 7,536,100 | 2.65 |
| Venezuela | 6 | 28,300,854 | 2.12 |
| Brazil | 40 | 211,140,729 | 1.89 |
| Ireland | 1 | 5,307,600 | 1.88 |
| United States of America | 59 | 334,914,895 | 1.76 |
| Peru | 4 | 33,845,617 | 1.18 |
| Poland | 4 | 36,687,353 | 1.09 |
| Chile | 2 | 19,658,835 | 1.02 |
| United Arab Emirates | 1 | 10,483,751 | 0.95 |
| Colombia | 4 | 52,321,152 | 0.76 |
| Angola | 2 | 36,749,906 | 0.54 |
| Mexico | 5 | 129,739,759 | 0.39 |
| Cameroon | 1 | 28,372,687 | 0.35 |
| Philippines | 4 | 114,891,199 | 0.35 |
| Nepal | 1 | 29,694,614 | 0.34 |
| Mozambique | 1 | 33,635,160 | 0.30 |
| Taiwan | 4 | 6 | |

Table 4 is sorted from the highest to the lowest number of riders per capita. Very small countries naturally skew the results (four riders are enough to put Andorra at the top); however, riders from Switzerland, Namibia, Costa Rica and New Zealand deserve a special mention.
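The rate in Table 4 is riders divided by population, multiplied by a population base. Recomputing three rows of the table in R shows that the published values correspond to riders per 10 million population; dividing them by 10 gives riders per million. Only the three rows are taken from the table; the column names are illustrative.

```r
# Three rows copied from Table 4
countries <- data.frame(
  country    = c("Andorra", "South Africa", "Switzerland"),
  riders     = c(4, 666, 69),
  population = c(80856, 63212384, 8888093)
)

# Riders per million and per 10 million inhabitants
countries$per_1m  <- countries$riders / countries$population * 1e6
countries$per_10m <- countries$riders / countries$population * 1e7

# Sort from the highest to the lowest rate, as in Table 4
countries <- countries[order(-countries$per_10m), ]
print(countries, digits = 4)
```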

How well does prologue result predict overall result?

We all know the saying “You don’t win on the prologue, but you can lose on it.” To me, this essentially says that your prologue does not predict your overall performance.

Figure 2: Scatter plot of prologue position vs overall position.

In Figure 2, the red line is the 1:1 line, where prologue position equals overall position. Points to the right of the red line are riders who placed better overall than in the prologue; points to the left are riders who placed worse overall than in the prologue.

Table 5: Agreement between prologue position and position on each stage (and overall), measured by Spearman's correlation and R².

| Stage | Spearman's Correlation | R² |
|---|---|---|
| Prologue | 1.00 | 1.00 |
| Stage 1 | 0.95 | 0.90 |
| Stage 2 | 0.93 | 0.85 |
| Stage 3 | 0.92 | 0.84 |
| Stage 4 | 0.93 | 0.84 |
| Stage 5 | 0.91 | 0.81 |
| Stage 6 | 0.91 | 0.80 |
| Stage 7 | 0.90 | 0.78 |
| Overall | 0.94 | 0.87 |

Table 5 shows two metrics: Spearman’s correlation and R². Both show that prologue position and overall position are strongly correlated, although the correlation with the prologue decreases over the stages. Neither metric captures small changes in position: these barely register when analysing all riders, but for individual riders they matter a great deal.
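Both metrics are one-liners in R. The sketch below uses ten hypothetical positions in which neighbouring teams swap places. `cor(..., method = "spearman")` gives Spearman's correlation, and R² is taken here from a simple linear regression of the later position on prologue position, which is an assumption about how the R² column was computed. Without ties, Spearman's correlation also equals 1 − 6Σd²/(n(n² − 1)), where d is each team's difference in rank.

```r
# Hypothetical positions (NOT real results): neighbouring teams swap places
prologue_pos <- 1:10
overall_pos  <- c(2, 1, 3, 5, 4, 6, 7, 9, 8, 10)

# Spearman's rank correlation
rho <- cor(prologue_pos, overall_pos, method = "spearman")

# R^2 of a simple linear regression of overall position on prologue position
r2 <- summary(lm(overall_pos ~ prologue_pos))$r.squared

# Without ties, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
d <- rank(prologue_pos) - rank(overall_pos)
n <- length(d)
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

print(c(rho = rho, rho_formula = rho_formula, r2 = r2))
```

In this toy case R² equals the square of Spearman's correlation exactly, because positions are already ranks without ties.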

Figure 3: The accuracy of predicting the position of a team based on the prologue time.

Figure 3 shows how accurately a team's position can be predicted from its prologue result. The x-axis is the tolerance of the prediction: a tolerance of 100 means a prediction counts as correct if the predicted position is within 100 places of the actual position. The y-axis is the accuracy, i.e. the proportion of teams predicted correctly at that tolerance.
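An accuracy-versus-tolerance curve like Figure 3 can be sketched as the share of teams whose actual position lies within a given number of places of the predicted one. The R sketch below assumes the prediction is simply the prologue position and uses hypothetical positions; the helper `accuracy_at` is illustrative.

```r
# Hypothetical positions (NOT real results)
prologue_pos <- 1:10
overall_pos  <- c(2, 1, 3, 5, 4, 6, 7, 9, 8, 10)

# Proportion of teams whose actual position is within `tolerance` places of the
# predicted position (here the prologue position is used as the prediction)
accuracy_at <- function(tolerance, predicted, actual) {
  mean(abs(actual - predicted) <= tolerance)
}

tolerances <- 0:2
accuracy <- sapply(tolerances, accuracy_at,
                   predicted = prologue_pos, actual = overall_pos)
print(data.frame(tolerance = tolerances, accuracy = accuracy))
```

Plotting `accuracy` against `tolerances` over the full range of positions gives a curve that rises towards 1 as the tolerance grows.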
