Introduction

Do outsiders win more often on Saturdays? We have seen Saturdays when several outsiders won on the same day, which leaves some of us with the impression that Saturday is special. One explanation might be that Saturday is too busy for most owners to enter their horses, so weaker horses are entered, which makes the races more competitive and harder to predict. However, if a race is harder to predict, that should be reflected in the odds, and there should be no extreme outsiders.

So, is Saturday really that unusual? The aim of this notebook is to investigate how the odds on Saturdays differ from those on Wednesdays and Sundays.

As background for our analysis, races in Hong Kong are usually scheduled on Wednesdays and Sundays. A Sunday meeting is moved to Saturday in certain situations, such as turf maintenance, overseas broadcasts, or festivals.

We will be using R for the analysis.

Loading the data

We will first load the libraries and the data sets. We will be using a subset of the tidyverse packages, and the data is publicly available on Kaggle.

library(dplyr)
library(readr)
library(tidyr)
library(stringr)
library(ggplot2)
library(lubridate)

race_result_horse <- read_csv("../input/race-result-horse.csv")
race_result_race <- read_csv("../input/race-result-race.csv")

# shortcut for setting up theme for plotting
my_theme <- function(){
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))
}

The first step is to find the weekday of each race.

race_result <- race_result_race %>%
  # Full weekday name (e.g. "Saturday") for each race date
  mutate(race_weekday = wday(race_date, label = TRUE, abbr = FALSE)) %>%
  select(race_weekday, race_id, race_class) %>%
  right_join(race_result_horse) %>%
  filter(!is.na(win_odds)) # Remove withdrawn horses, which have no win odds
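
Since Saturday meetings are the exception, it may be worth checking how many races fall on each weekday before comparing them. The sketch below does this on a small made-up tibble; `toy_result` and its values are hypothetical stand-ins for `race_result`, not real data.

```r
library(dplyr)

# Hypothetical stand-in for race_result: one row per horse per race
toy_result <- tibble(
  race_id      = c("r1", "r1", "r2", "r2", "r3", "r3"),
  race_weekday = c("Saturday", "Saturday", "Sunday", "Sunday", "Sunday", "Sunday"),
  win_odds     = c(2.5, 30, 4.1, 12, 1.8, 99)
)

# Collapse to one row per race, then count races per weekday
races_per_weekday <- toy_result %>%
  distinct(race_id, race_weekday) %>%
  count(race_weekday, name = "n_races")

print(races_per_weekday)
```

If one weekday has far fewer races than the others, its density curve will be noisier, which is worth keeping in mind when reading the plots.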

To satisfy our curiosity, let’s look at the odds distribution of the winning horses on each weekday before going into details.

race_result_winner <- race_result %>% 
  filter(finishing_position == 1) %>% # Ignore dead heats (DH) for simplicity
  filter(race_weekday %in% c("Saturday", "Sunday", "Wednesday"))

ggplot(race_result_winner) + 
  geom_density(aes(fill = race_weekday, x = win_odds), alpha = 0.2) + 
  scale_x_log10() + # Log scale on the x-axis to reduce skewness
  labs(title = "The win odds of the winner of each race", 
       fill = "Weekday",
       x = "Win odds (log scale)", y = "density") +
  my_theme() + 
  # Arrow marking the high-odds (outsider) region; annotate() draws it
  # once instead of once per row of the data
  annotate(
    "segment",
    x = 50, y = 0.3, xend = 40, yend = 0.2,
    color = "blue", size = 2,
    arrow = arrow(length = unit(0.03, "npc"))
  )

From the density plot of winning odds, it seems there really is a higher proportion of outsider winners on Saturday than on Wednesday or Sunday. Can we conclude this right away, or are we missing something?
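
One way to put a number on this impression is to compute the share of winners whose odds exceed some cutoff, per weekday. The sketch below uses made-up data; `toy_winner` is a hypothetical stand-in for `race_result_winner`, and the cutoff of 20 for calling a winner an "outsider" is our own assumption, not something defined by the data set.

```r
library(dplyr)

# Hypothetical stand-in for race_result_winner: one winner per race
toy_winner <- tibble(
  race_weekday = c("Saturday", "Saturday", "Saturday", "Saturday",
                   "Sunday", "Sunday", "Sunday", "Sunday"),
  win_odds     = c(3.2, 25, 41, 6.5, 2.1, 8.4, 22, 4.7)
)

# Assumed cutoff: call a winner an "outsider" when its odds are 20 or more
outsider_cutoff <- 20

outsider_by_weekday <- toy_winner %>%
  group_by(race_weekday) %>%
  summarise(
    n_winners      = n(),
    outsider_share = mean(win_odds >= outsider_cutoff)
  )

print(outsider_by_weekday)
```

Trying a few different cutoffs would show whether the Saturday effect depends on where we draw the line.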

Odds distribution

While we notice a higher proportion of outsider winners on Saturday, there is a higher proportion of favourite winners as well. Are both just a consequence of the Saturday odds being more spread out in both directions?

To check this, we can compute the variance (or standard deviation) of the odds (or log odds) within each race. Here we use the standard deviation of the log odds.
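
To get a feel for what this statistic picks up, here is a rough illustration with two made-up races. The odds below are hypothetical: one evenly matched field, and one field with a short-priced favourite and some long shots.

```r
# Hypothetical win odds for two four-horse races
even_field     <- c(5, 6, 7, 8)       # evenly matched runners
lopsided_field <- c(1.5, 10, 40, 99)  # one hot favourite, several long shots

# Standard deviation of the log odds within each race
sd_even     <- sd(log(even_field))
sd_lopsided <- sd(log(lopsided_field))

print(c(even = sd_even, lopsided = sd_lopsided))
```

The lopsided field gets the larger value, so if Saturday races had both more extreme favourites and more extreme outsiders, we would expect their standard deviations to be higher.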

race_day_odds_variance <- race_result %>% 
  filter(race_weekday %in% c("Saturday", "Sunday", "Wednesday")) %>%
  group_by(race_id, race_weekday, race_class) %>%
  # Spread of the field within each race, measured on the log scale
  summarise(log_odds_std = sd(log(win_odds)), .groups = "drop")

ggplot(race_day_odds_variance) + 
  geom_density(aes(fill = race_weekday, x = log_odds_std), alpha = 0.2) + 
  labs(title = "Distribution of standard deviation of log odds for each race", 
       fill = "Weekday",
       x = "Sd(Log(odds))", y = "density") +
  my_theme() 

Saturday does not show noticeably more races with a high standard deviation than Sunday, which doesn’t support our hypothesis. Let’s plot a histogram of the odds to double-check.

ggplot(filter(race_result, race_weekday %in% c("Saturday", "Sunday", "Wednesday"))) + 
  geom_histogram(aes(fill = race_weekday, x = win_odds), alpha = 0.2, position = "identity") + 
  labs(title = "Distribution of win odds", 
       fill = "Weekday",
       x = "Win odds", y = "count") +
  my_theme() 

Again, there is no obvious difference in the odds distribution between Saturday, Sunday, and Wednesday.
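
Overlapping histograms can be hard to read, so a small table of quantiles per weekday may be a useful complement. This is only a sketch on made-up data; `toy_odds` and its values are hypothetical stand-ins for `race_result`.

```r
library(dplyr)

# Hypothetical stand-in for race_result: win odds of all runners
toy_odds <- tibble(
  race_weekday = c(rep("Saturday", 4), rep("Sunday", 5)),
  win_odds     = c(2, 4, 10, 50, 3, 5, 9, 30, 80)
)

# Quartiles of the win odds for each weekday
odds_quantiles <- toy_odds %>%
  group_by(race_weekday) %>%
  summarise(
    q25    = quantile(win_odds, 0.25, names = FALSE),
    median = median(win_odds),
    q75    = quantile(win_odds, 0.75, names = FALSE)
  )

print(odds_quantiles)
```

If the quartiles line up closely across weekdays on the real data, that would agree with what the histogram suggests.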

Summary

It seems that there really are more outsider winners on Saturday, even though the overall odds distribution looks much the same on all three days.

However, there could be many other factors, and I will leave further analysis to the reader. For example, is it because of the distribution of race classes? Is it because there are usually no superstars in these races?
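
As a possible starting point for the race class question, one could tabulate the share of each class per weekday. The sketch below uses made-up data; `toy_races` is hypothetical, and the class labels are assumptions about how `race_class` is coded.

```r
library(dplyr)

# Hypothetical one-row-per-race table; class labels are assumed
toy_races <- tibble(
  race_id      = paste0("r", 1:6),
  race_weekday = c("Saturday", "Saturday", "Saturday",
                   "Sunday", "Sunday", "Sunday"),
  race_class   = c("Class 4", "Class 4", "Class 3",
                   "Class 3", "Class 2", "Class 4")
)

# Share of each race class within each weekday
class_share <- toy_races %>%
  count(race_weekday, race_class, name = "n_races") %>%
  group_by(race_weekday) %>%
  mutate(share = n_races / sum(n_races)) %>%
  ungroup()

print(class_share)
```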

Enjoy!
