Housing Affordability Gap in the GTA¶

Author: Somaya Alhadi
This notebook explores the widening affordability gap in the Greater Toronto Area (GTA) by analyzing the trends in:

  • Housing Price Index (HPI),
  • Composite Benchmark Prices,
  • Median After-Tax Income (individual),
  • And their interactions over time.

We’ll also calculate growth rates, plot dual-axis trends, test stationarity, and visualize affordability metrics.

Step 1: Load Required Libraries¶

We begin by loading the necessary libraries for data import and manipulation. These include:

  • readxl for reading Excel files.
  • dplyr for data wrangling.
  • ggplot2 for visualization.
  • zoo for handling missing values.
In [1]:
# Load necessary packages
library(readxl)
library(tidyverse)
library(lubridate)
library(dplyr)
library(zoo)
library(ggplot2)
library(tseries)
Warning message:
"package 'readxl' was built under R version 4.3.3"
Warning message:
"package 'tidyverse' was built under R version 4.3.3"
Warning message:
"package 'ggplot2' was built under R version 4.3.3"
Warning message:
"package 'tidyr' was built under R version 4.3.3"
Warning message:
"package 'readr' was built under R version 4.3.3"
Warning message:
"package 'purrr' was built under R version 4.3.1"
Warning message:
"package 'dplyr' was built under R version 4.3.3"
Warning message:
"package 'forcats' was built under R version 4.3.1"
Warning message:
"package 'lubridate' was built under R version 4.3.3"
── Attaching core tidyverse packages ──────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Warning message:
"package 'zoo' was built under R version 4.3.1"

Attaching package: 'zoo'


The following objects are masked from 'package:base':

    as.Date, as.Date.numeric


Warning message:
"package 'tseries' was built under R version 4.3.1"
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 

Step 2: Import Excel Data¶

We define the file path and specify the sheet names we want to load. Using lapply(), we load all the specified sheets into a list named all_data, and then assign each sheet its corresponding name.

In [2]:
# Set Excel file path
file_path <- "your_dataset.xlsx"  # path hidden for privacy

# Define sheet names to load
sheet_names <- c("HPI-SA-GTA", "On-afterTaxIncom", "BankRate-BOC", "PolicyRate-BOC", "Prate-FRED")

Step 3: Extract Individual Sheets¶

Now, we extract each dataset from the all_data list into its own variable so that we can work with them individually.

In [3]:
# Load all sheets into a list
all_data <- lapply(sheet_names, function(sheet) {
  read_excel(file_path, sheet = sheet)
})

# Name each sheet properly
names(all_data) <- sheet_names
Warning message:
"Expecting numeric in B1609 / R1609C2: got 'Bank holiday'"
Warning message:
"Expecting numeric in B1807 / R1807C2: got 'Bank holiday'"
Warning message:
"Expecting numeric in B1869 / R1869C2: got 'Bank holiday'"
Warning message:
"Expecting numeric in B1870 / R1870C2: got 'Bank holiday'"
Warning message:
"Expecting numeric in B1874 / R1874C2: got 'Bank holiday'"
Warning message:
"Expecting numeric in B2025 / R2025C2: got 'Bank holiday'"
Warning message:
"Expecting numeric in B2068 / R2068C2: got 'Bank holiday'"
Warning message:
"Expecting numeric in B2130 / R2130C2: got 'Bank holiday'"
Warning message:
"Expecting numeric in B2134 / R2134C2: got 'Bank holiday'"
Warning message:
"Expecting numeric in B2394 / R2394C2: got 'Bank holiday'"
In [5]:
HPI_GTA <- all_data[["HPI-SA-GTA"]]
On_ATincome <- all_data[["On-afterTaxIncom"]]
BankRate <- all_data[["BankRate-BOC"]]
policy_rate <- all_data[["PolicyRate-BOC"]]
Prate <- all_data[["Prate-FRED"]]

Let’s check if the data from the HPI-SA-GTA sheet was imported correctly.

In [4]:
# View HPI sheet
View(all_data[["HPI-SA-GTA"]])
A tibble: 236 × 13
DateComposite_HPI_SASingle_Family_HPI_SAOne_Storey_HPI_SATwo_Storey_HPI_SATownhouse_HPI_SAApartment_HPI_SAComposite_Benchmark_SASingle_Family_Benchmark_SAOne_Storey_Benchmark_SATwo_Storey_Benchmark_SATownhouse_Benchmark_SAApartment_Benchmark_SA
<dttm><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
2005-01-01100.0100.0100.0100.0100.0100.0315800363900307500383300212000189100
2005-02-01100.3100.4100.6100.2100.1100.3316600365200309400384100212200189600
2005-03-01100.8100.9100.8100.8100.6101.0318300367200309900386500213300191000
2005-04-01101.2101.3101.7101.1100.8101.2319500368700312700387400213600191400
2005-05-01101.7101.8101.8101.7101.2102.1321300370600313000389900214600193100
2005-06-01102.4102.5102.1102.4101.6102.8323300372900314100392400215300194400
2005-07-01102.9103.1102.6103.0102.0102.9325000375100315400394800216200194500
2005-08-01103.5103.8103.8103.5102.4103.4327000377600319200396700217100195500
2005-09-01103.7103.8103.6103.6102.8103.8327500377800318600397000218000196300
2005-10-01104.5104.8104.7104.5103.6104.0330100381200322100400400219600196600
2005-11-01105.1105.2105.3104.9103.9104.8331800382900323800402200220200198200
2005-12-01105.4105.4105.3105.2104.7105.7333000383700323700403300222000199900
2006-01-01106.2106.3106.6105.9105.1105.9335300386900327700406100222800200300
2006-02-01106.6106.7106.8106.4105.8107.1336800388400328500407700224200202500
2006-03-01107.2107.4107.9106.9105.9106.9338600390900331900409900224500202100
2006-04-01107.8108.0108.3107.5106.6107.5340300392900333000412100225900203200
2006-05-01108.1108.4109.2107.8106.8107.8341500394300335700413100226500203800
2006-06-01108.2108.4108.8107.9107.5107.7341700394600334600413700227800203700
2006-07-01108.3108.4108.8107.9107.6108.5342000394400334500413500228200205200
2006-08-01108.4108.5108.5108.1107.8108.8342300394900333500414500228600205800
2006-09-01108.9109.0109.2108.5108.6109.6343900396600335700415700230200207300
2006-10-01108.7108.8109.0108.3108.6109.7343400395800335200415000230300207500
2006-11-01109.2109.3109.9108.7109.3109.9344800397800337900416800231800207800
2006-12-01109.7109.8110.7109.1109.6110.5346400399500340300418300232400208900
2007-01-01109.8109.8110.5109.1110.2111.3346800399400339900418200233600210400
2007-02-01111.2111.4111.9110.9110.8111.7351200405400344000425000235000211200
2007-03-01111.9112.1112.7111.5111.6112.9353500407900346600427200236600213500
2007-04-01112.7112.8112.9112.3112.1113.9355800410600347200430500237700215300
2007-05-01113.3113.4113.6112.9112.7115.0357800412800349300432900238900217400
2007-06-01114.2114.3114.7113.7113.4116.4360800416000352700435900240500220100
⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮
2022-03-01403.9421.2427.1418.1441.9405.71275600153280013134001602500936800767100
2022-04-01393.8409.0412.5406.5435.0403.01243700148830012684001558100922300762100
2022-05-01383.6396.5396.5394.8426.6398.61211400144280012192001513300904400753800
2022-06-01371.5382.8378.8382.1416.7389.21173200139310011649001464400883400735900
2022-07-01362.0371.6367.0370.8409.7384.01143300135210011285001421300868500726100
2022-08-01355.6364.4359.8362.6397.9380.31123100132600011063001390000843600719100
2022-09-01352.6361.9352.8361.5394.1375.11113500131700010848001385500835500709400
2022-10-01349.9358.8351.8357.6391.2372.61104900130560010817001370800829400704600
2022-11-01348.3357.6350.9356.0388.3369.31099900130120010790001364400823100698400
2022-12-01345.8354.8347.9354.2383.3367.81092000129120010699001357700812700695600
2023-01-01340.9349.9343.5350.4380.1364.61076600127320010563001343200805800689400
2023-02-01339.0349.1345.5349.3375.2358.01070600127030010624001339000795400676900
2023-03-01340.6352.0350.1351.6378.3356.41075700128110010767001347500802000673900
2023-04-01347.5360.8358.0360.2384.2357.71097300131280011010001380800814500676500
2023-05-01356.3370.4367.3369.7392.1364.71125200134800011293001417100831200689700
2023-06-01362.4376.9373.1376.1396.2372.51144500137170011474001441600839900704400
2023-07-01364.7379.7376.1378.6398.8375.01151700138160011566001451200845500709200
2023-08-01363.2377.9372.0376.8402.1374.91147100137510011439001444400852500708900
2023-09-01360.1374.3369.4373.1402.1375.31137200136210011359001430000852500709700
2023-10-01354.6368.6363.2367.5398.2370.01119900134150011167001408800844100699700
2023-11-01349.4363.1357.8361.8390.0366.91103300132130011003001386700826900693900
2023-12-01345.6358.7352.6358.1388.3363.61091500130520010843001372700823300687500
2024-01-01343.4357.1353.5357.0383.4359.91084500129940010870001368400812900680600
2024-02-01344.1359.1355.0359.6383.7356.71086700130690010916001378200813500674600
2024-03-01344.2360.3354.4361.2382.3355.61087000131120010899001384400810500672500
2024-04-01344.8361.7357.4362.0382.2355.81088900131610010989001387700810200672800
2024-05-01343.5360.6356.8360.8381.4353.81084900131240010971001382800808600669100
2024-06-01344.3361.3358.5361.0382.8354.41087400131480011024001383600811500670100
2024-07-01344.7361.6357.3361.4381.3354.61088600131580010986001385300808300670500
2024-08-01344.8361.7358.4361.1379.6353.41089000131630011020001384000804700668300

Step 4: Clean and Prepare the Data¶

We'll now:

  • Convert string-based Years columns to proper Date formats.
  • Generate a date column for the HPI data (monthly).
  • Rename columns for easier access.
  • Handle missing values using the zoo::na.locf() method (last observation carried forward).
In [6]:
# Rename income column for clarity
On_ATincome <- On_ATincome %>% rename(M_AfterTincome = `M-AfterTincome`)

# Create a monthly date sequence for HPI data
HPI_GTA$date <- seq(as.Date("2005-01-01"), by = "month", length.out = nrow(HPI_GTA))

# Convert On_ATincome to proper date format (yearly start)
On_ATincome$date <- as.Date(paste0(On_ATincome$Years, "-01-01"))

# Fill missing HPI values using last observation carried forward
HPI_GTA$Composite_HPI_SA <- na.locf(HPI_GTA$Composite_HPI_SA, na.rm = FALSE)
HPI_GTA$Composite_Benchmark_SA <- na.locf(HPI_GTA$Composite_Benchmark_SA, na.rm = FALSE)

# Check if any NA values remain
sum(is.na(HPI_GTA$Composite_HPI_SA))
sum(is.na(On_ATincome$M_AfterTincome))
sum(is.na(HPI_GTA$Composite_Benchmark_SA))
0
0
0
In [12]:
str (On_ATincome)
tibble [47 × 3] (S3: tbl_df/tbl/data.frame)
 $ Years         : num [1:47] 1976 1977 1978 1979 1980 ...
 $ M_AfterTincome: num [1:47] 64400 65500 65600 65100 67000 65600 62100 59700 61800 63200 ...
 $ date          : Date[1:47], format: "1976-01-01" "1977-01-01" ...

Step 4: Create Date Columns¶

To ensure consistent date alignment across datasets, we assign monthly dates to the HPI data and convert yearly income data to proper Date objects.

In [13]:
# Assign monthly dates to HPI data
HPI_GTA$date <- seq(as.Date("2005-01-01"), by = "month", length.out = nrow(HPI_GTA))

# Convert Year column in On_ATincome to Date (annual)
On_ATincome$date <- as.Date(paste0(On_ATincome$Years, "-01-01"))

# Confirm the structure
str(HPI_GTA$date)
str(On_ATincome$date)
 Date[1:236], format: "2005-01-01" "2005-02-01" "2005-03-01" "2005-04-01" "2005-05-01" ...
 Date[1:47], format: "1976-01-01" "1977-01-01" "1978-01-01" "1979-01-01" "1980-01-01" ...

Step 5: Handle Missing Values (Forward Fill)¶

To maintain continuity in our time series analysis, we forward-fill any missing values in the Housing Price Index (HPI) and Benchmark Index using the na.locf() method from the zoo package.

In [14]:
library(zoo)

# Fill missing values in HPI and Benchmark columns
HPI_GTA$Composite_HPI_SA <- na.locf(HPI_GTA$Composite_HPI_SA, na.rm = FALSE)
HPI_GTA$Composite_Benchmark_SA <- na.locf(HPI_GTA$Composite_Benchmark_SA, na.rm = FALSE)

# Check that no missing values remain
sum(is.na(HPI_GTA$Composite_HPI_SA))
sum(is.na(On_ATincome$M_AfterTincome))
sum(is.na(HPI_GTA$Composite_Benchmark_SA))
0
0
0

Step 6: Aggregate Monthly Housing Data to Yearly Averages¶

To observe long-term trends and enable meaningful comparison with income data, we compute the yearly average of:

  • Composite_HPI_SA (Housing Price Index)
  • Composite_Benchmark_SA (Benchmark Price Index)

This aggregation is necessary because the after-tax income data is reported annually, so we align the time frequencies across datasets.

In [15]:
library(dplyr)
library(lubridate)

# Aggregate Composite HPI to yearly average
HPI_year_rate <- HPI_GTA %>%
  group_by(Year = year(date)) %>%
  summarize(Yearly_Average = mean(Composite_HPI_SA, na.rm = TRUE))

# Aggregate Benchmark Index to yearly average
Benchmark_year_rate <- HPI_GTA %>%
  group_by(Year = year(date)) %>%
  summarize(Yearly_Average = mean(Composite_Benchmark_SA, na.rm = TRUE))

# Convert Year to Date for plotting
HPI_year_rate$Year <- as.Date(paste0(HPI_year_rate$Year, "-01-01"))
Benchmark_year_rate$Year <- as.Date(paste0(Benchmark_year_rate$Year, "-01-01"))

Step 7: Plot HPI vs Median After-Tax Income (Dual Y-Axis)¶

In this step, we create a dual-axis plot to show the relationship between:

  • Composite_HPI_SA (blue, monthly housing prices)
  • M_AfterTincome (red, annual income)

Because they differ in scale, we apply a transformation to income so it aligns visually with the HPI curve.

In [17]:
# Calculate scaling factor to align income with HPI visually
income_scale_factor <- max(HPI_GTA$Composite_HPI_SA, na.rm = TRUE) / 
  max(On_ATincome$M_AfterTincome, na.rm = TRUE)

# Plot with dual axes
ggplot() +
  # HPI line and points
  geom_point(data = HPI_GTA, aes(x = date, y = Composite_HPI_SA), 
             color = "blue", size = 1) +
  geom_line(data = HPI_GTA, aes(x = date, y = Composite_HPI_SA), 
            color = "blue", alpha = 0.5) +

  # Income line and points scaled to match HPI range
  geom_point(data = On_ATincome, aes(x = date, y = M_AfterTincome * income_scale_factor), 
             color = "red", size = 3) +
  geom_line(data = On_ATincome, aes(x = date, y = M_AfterTincome * income_scale_factor), 
            color = "red", linetype = "dashed") +

  # Format axis and labels
  scale_x_date(date_labels = "%Y", date_breaks = "1 year",
               limits = c(as.Date("2005-01-01"), Sys.Date())) +
  scale_y_continuous(name = "Housing Price Index (Blue)",
                     sec.axis = sec_axis(~ . / income_scale_factor, 
                                         name = "Median After-Tax Income (Red)")) +

  labs(title = "HPI and Median After-Tax Income Over Time",
       x = "Year") +
  theme_minimal()
Warning message:
"Removed 29 rows containing missing values or values outside the scale range (`geom_point()`)."
Warning message:
"Removed 29 rows containing missing values or values outside the scale range (`geom_line()`)."
No description has been provided for this image

Step 8 : Normalize Yearly HPI and Income (Base Year = 2005)¶

To compare relative growth, we use 2005 as the base year and index both HPI and Income to 100. This avoids mixing monthly vs. yearly scales and provides a clearer view of the affordability gap.

In [20]:
# Aggregate HPI to yearly average first
HPI_yearly <- HPI_GTA %>%
  group_by(Year = year(date)) %>%
  summarize(HPI = mean(Composite_HPI_SA, na.rm = TRUE))

# Prepare income as yearly with proper Year column
Income_yearly <- On_ATincome %>%
  mutate(Year = year(date)) %>%
  group_by(Year) %>%
  summarize(Income = mean(M_AfterTincome, na.rm = TRUE))

# Merge to ensure same years
growth_data <- inner_join(HPI_yearly, Income_yearly, by = "Year")

# Normalize using 2005 as base year
base_HPI <- growth_data$HPI[growth_data$Year == 2005]
base_income <- growth_data$Income[growth_data$Year == 2005]

growth_data <- growth_data %>%
  mutate(HPI_indexed = (HPI / base_HPI) * 100,
         Income_indexed = (Income / base_income) * 100)

# Plot
ggplot(growth_data, aes(x = Year)) +
  geom_line(aes(y = HPI_indexed, color = "HPI (Housing Price Index)"), size = 1.2) +
  geom_line(aes(y = Income_indexed, color = "After-Tax Income"), linetype = "dashed", size = 1.2) +
  scale_color_manual(values = c("HPI (Housing Price Index)" = "blue", "After-Tax Income" = "red")) +
  labs(title = "Indexed Growth of HPI vs After-Tax Income (Base = 2005)",
       y = "Indexed Value (2005 = 100)", x = "Year", color = "Legend") +
  theme_minimal()
No description has been provided for this image

Step 9: Calculate and Visualize the Affordability Ratio (HPI / Income)¶

The affordability ratio tells us how expensive housing has become relative to after-tax income.

By dividing the Housing Price Index (HPI) by income, we can observe how much income is needed to afford housing. A rising ratio indicates worsening affordability.

In [21]:
# Calculate affordability ratio (HPI / Income)
growth_data <- growth_data %>%
  mutate(Affordability_Ratio = HPI / Income)

# Plot the affordability ratio over time
ggplot(growth_data, aes(x = Year, y = Affordability_Ratio)) +
  geom_line(color = "darkgreen", size = 1.2) +
  labs(title = "Affordability Ratio Over Time: HPI / After-Tax Income",
       x = "Year",
       y = "Affordability Ratio",
       caption = "Higher values indicate lower affordability") +
  theme_minimal()
No description has been provided for this image

This pattern poses important questions for policymakers: Are housing prices driven more by investment speculation or constrained supply? How should income inequality and housing policy respond to this affordability divergence?

📉 Interpretation: The Growing Gap Between Income and Housing Prices¶

The chart clearly shows that:


🔺 Affordability has sharply worsened:¶

  • The HPI-to-Income ratio nearly tripled between 2005 and 2023.
  • Even though incomes have risen gradually, housing prices have surged much faster, pushing homeownership further out of reach for the average household.

⚠️ Critical surges in 2015–2017 and 2020–2023:¶

  • 2015–2017 marks the first major affordability shock, possibly driven by speculation and loose monetary policy.
  • A second, more dramatic spike followed post-2020, likely fueled by:
    • Record-low interest rates
    • Pandemic-era demand shifts
    • Supply bottlenecks

📉 Income hasn’t kept pace:¶

  • Income has grown modestly, but nowhere near the exponential increase
In [28]:
# Trim income_ts to match HPI range (start in 2005)
income_ts_short <- window(income_ts, start = 2005)
In [29]:
# Set base values from 2005
base_HPI <- hpi_yearly$Composite_HPI_SA[1]
base_income <- income_ts_short[1]

# Compute indexed (normalized) values
hpi_yearly$HPI_indexed <- (hpi_yearly$Composite_HPI_SA / base_HPI) * 100
income_indexed <- (income_ts_short / base_income) * 100
In [32]:
# First check how many rows you have in HPI yearly data
length(hpi_yearly$Composite_HPI_SA)  # let's say this returns 20

# Then subset income_ts accordingly (starting from 2005)
income_ts_short <- window(income_ts, start = 2005, end = 2005 + 19)
20
In [34]:
# Normalize to base year
base_HPI <- hpi_yearly$Composite_HPI_SA[1]
base_income <- income_ts_short[1]

hpi_yearly$HPI_indexed <- (hpi_yearly$Composite_HPI_SA / base_HPI) * 100
income_indexed <- (income_ts_short / base_income) * 100
In [36]:
# Create a unified data frame
indexed_data <- data.frame(
  Year = time(income_ts_short),        # Year vector (e.g., 2005 to 2024)
  HPI_indexed = hpi_yearly$HPI_indexed,
  Income_indexed = income_indexed
)
In [37]:
ggplot(indexed_data, aes(x = Year)) +
  geom_line(aes(y = HPI_indexed, color = "HPI Indexed"), size = 1) +
  geom_line(aes(y = Income_indexed, color = "Income Indexed"), size = 1, linetype = "dashed") +
  labs(title = "Indexed Growth: HPI vs Median After-Tax Income (Base = 2005)",
       y = "Indexed Value (Base = 100)", color = "Legend") +
  theme_minimal()
Don't know how to automatically pick scale for object of type <ts>. Defaulting to continuous.
No description has been provided for this image
In [ ]: