::install_github("LukeCe/spflow")
devtools# Check that version number should be at least 0.1.0.9010
5: Spatial Econometric Interaction Model
Getting started
The key packages to be used in this exercise are:
- tmap: for thematic mapping
- sf: for geospatial data wrangling
- spdep: for spatial weights matrix
- spflow: for spatial econometric models of O-D flow
- tidyverse: for non-spatial data wrangling
However, as the latest version of the spflow package is not available on CRAN yet, the install_github()
function of devtools package will be used to install the spflow package first.
The code chunk below then uses p_load()
of pacman package to check if the required packages have been installed on the computer. If they are, the packages will be launched.
::p_load(tmap, sf, spdep, sp, Matrix, spflow, knitr, tidyverse) pacman
Data Preparation
Before calibrating Spatial Econometric Interaction Models using spflow, three data sets are required:
- spatial weights
- tibble data frame O-D flow matrix (with distance)
- tibble data frame consisting of explanatory variables (does not need to differentiate origin vs. destination variables)
<- read_rds("data/rds/mpsz_nb.rds")
mpsz_nb <- read_rds("data/rds/mpsz_flow.rds")
mpsz_flow <- read_rds("data/rds/mpsz_var.rds") mpsz_var
Creating spflow_network-class
objects
spflow_network-class
is an S4 class containing information on a spatial network which is composed by a set of nodes that are linked by some neighbourhood relation. It can be created using spflow_network()
function of spflow package. For our model, we choose the contiguity based neighbourhood structure.
<- spflow_network(
mpsz_net id_net = "sg", # assign an id name, can give it any input
node_neighborhood = nb2mat(mpsz_nb$by_contiguity),
node_data = mpsz_var,
node_key_column = "SZ_CODE"
)
mpsz_net
Spatial network nodes with id: sg
--------------------------------------------------
Number of nodes: 313
Average number of links per node: 6.077
Density of the neighborhood matrix: 1.94% (non-zero connections)
Data on nodes:
SZ_NAME SZ_CODE BUSSTOP_COUNT AGE7_12 AGE13_24 AGE25_64
1 INSTITUTION HILL RVSZ05 2 330 360 2260
2 ROBERTSON QUAY SRSZ01 10 320 350 2200
3 FORT CANNING MUSZ02 6 0 10 30
4 MARINA EAST (MP) MPSZ05 2 0 0 0
5 SENTOSA SISZ01 1 200 260 1440
6 CITY TERMINALS BMSZ17 10 0 0 0
--- --- --- --- --- --- ---
308 NEE SOON YSSZ07 12 90 140 590
309 UPPER THOMSON BSSZ01 47 1590 3660 15980
310 SHANGRI-LA AMSZ05 12 810 1920 9650
311 TOWNSVILLE AMSZ04 9 980 2000 11320
312 MARYMOUNT BSSZ02 25 1610 4060 16860
313 TUAS VIEW EXTENSION TSSZ06 11 0 0 0
SCHOOL_COUNT BUSINESS_COUNT RETAILS_COUNT FINSERV_COUNT ENTERTN_COUNT
1 1 6 26 3 0
2 0 4 207 18 6
3 0 7 17 0 3
4 0 0 0 0 0
5 0 1 84 29 2
6 0 11 14 4 0
--- --- --- --- --- ---
308 0 0 7 0 0
309 3 21 305 30 0
310 3 0 53 9 0
311 1 0 83 11 0
312 3 19 135 8 0
313 0 53 3 1 0
FB_COUNT LR_COUNT COORD_X COORD_Y
1 4 3 103.84 1.29
2 38 11 103.84 1.29
3 4 7 103.85 1.29
4 0 0 103.88 1.29
5 38 20 103.83 1.25
6 15 0 103.85 1.26
--- --- --- --- ---
308 0 0 103.81 1.4
309 5 11 103.83 1.36
310 0 0 103.84 1.37
311 1 1 103.85 1.36
312 3 11 103.84 1.35
313 0 0 103.61 1.26
spflow_network_pair-class
is an S4 class containing information on O-D pairs. It can be created using spflow_network_pair()
function of spflow package.
<- spflow_network_pair(
mpsz_net_pairs id_orig_net = "sg",
id_dest_net = "sg",
pair_data = mpsz_flow,
orig_key_column = "ORIGIN_SZ",
dest_key_column = "DESTIN_SZ"
)
mpsz_net_pairs
Spatial network pair with id: sg_sg
--------------------------------------------------
Origin network id: sg (with 313 nodes)
Destination network id: sg (with 313 nodes)
Number of pairs: 97969
Completeness of pairs: 100.00% (97969/97969)
Data on node-pairs:
DESTIN_SZ ORIGIN_SZ DISTANCE TRIPS
1 RVSZ05 RVSZ05 0 67
314 SRSZ01 RVSZ05 305.74 251
627 MUSZ02 RVSZ05 951.83 0
940 MPSZ05 RVSZ05 5254.07 0
1253 SISZ01 RVSZ05 4975 0
1566 BMSZ17 RVSZ05 3176.16 0
--- --- --- --- ---
96404 YSSZ07 TSSZ06 26972.97 0
96717 BSSZ01 TSSZ06 25582.48 0
97030 AMSZ05 TSSZ06 26714.79 0
97343 AMSZ04 TSSZ06 27572.74 0
97656 BSSZ02 TSSZ06 26681.7 0
97969 TSSZ06 TSSZ06 0 270
spflow_network_multi-class
is an S4 class containing information on the origins, the destinations, and the O-D pairs. It can be created using spflow_network_multi()
function of spflow package and only works on spflow_network-class
and spflow_network_pair-class
.
<- spflow_network_multi(mpsz_net,
mpsz_multi_net
mpsz_net_pairs)
mpsz_multi_net
Collection of spatial network nodes and pairs
--------------------------------------------------
Contains 1 spatial network nodes
With id : sg
Contains 1 spatial network pairs
With id : sg_sg
Availability of origin-destination pair information:
ID_ORIG_NET ID_DEST_NET ID_NET_PAIR COMPLETENESS C_PAIRS C_ORIG C_DEST
sg sg sg_sg 100.00% 97969/97969 313/313 313/313
Check for Multicollinearity
<- log(1 + TRIPS) ~
cor_formula +
BUSSTOP_COUNT +
AGE7_12 +
AGE13_24 +
AGE25_64 +
SCHOOL_COUNT +
BUSINESS_COUNT +
RETAILS_COUNT +
FINSERV_COUNT P_(log(DISTANCE + 1)) # P stands for the impedence
<- pair_cor(
cor_mat
mpsz_multi_net,spflow_formula = cor_formula,
add_lags_x = FALSE
)
# Labels
colnames(cor_mat) <- paste0(substr(colnames(cor_mat),1,3),"...")
cor_image(cor_mat)
Base Model Calibration
There are currently three estimators of spatial econometric interaction models supported by spflow package:
- Maximum likelihood estimation (MLE) – default estimation procedure.
- Spatial two-stage least squares (S2SLS) – activate the S2SLS estimation via the
estimation_control
argument using the inputspflow_control(estimation_method = "s2sls")
. - Bayesian Markov Chain Monte Carlo (MCMC) – activate the MCMC estimation via the
estimation_control
argument using the inputspflow_control(estimation_method = "mcmc")
.
The function offers a formula interface adapted to spatial interaction models, which has the following structure:
Y ~ O_(X1) + D_(X2) + I_(X3) + P_(X4)
- O_(…) and D_(…) indicate which variables are used as characteristics of the origins and destinations respectively
- I_(…) indicates variables that should be used for the intra-regional parameters
- P_(…) declares which variables describe origin-destination pairs, which usually will include a measure of distance (distance decay).
<- spflow(
base_model spflow_formula = log(1 + TRIPS) ~
O_(BUSSTOP_COUNT +
+
AGE25_64) D_(SCHOOL_COUNT +
+
BUSINESS_COUNT +
RETAILS_COUNT +
FINSERV_COUNT) P_(log(DISTANCE + 1)),
spflow_networks = mpsz_multi_net
)
base_model
--------------------------------------------------
Spatial interaction model estimated by: MLE
Spatial correlation structure: SDM (model_9)
Dependent variable: log(1 + TRIPS)
--------------------------------------------------
Coefficients:
est sd t.stat p.val
rho_d 0.680 0.004 192.554 0.000
rho_o 0.678 0.004 187.732 0.000
rho_w -0.396 0.006 -65.591 0.000
(Intercept) 0.410 0.065 6.266 0.000
(Intra) 1.313 0.081 16.263 0.000
D_SCHOOL_COUNT 0.017 0.002 7.885 0.000
D_SCHOOL_COUNT.lag1 0.002 0.004 0.551 0.581
D_BUSINESS_COUNT 0.000 0.000 3.015 0.003
D_BUSINESS_COUNT.lag1 0.000 0.000 -0.249 0.804
D_RETAILS_COUNT 0.000 0.000 -0.306 0.759
D_RETAILS_COUNT.lag1 0.000 0.000 0.152 0.880
D_FINSERV_COUNT 0.002 0.000 6.787 0.000
D_FINSERV_COUNT.lag1 -0.002 0.001 -3.767 0.000
O_BUSSTOP_COUNT 0.002 0.000 6.806 0.000
O_BUSSTOP_COUNT.lag1 -0.001 0.000 -2.364 0.018
O_AGE25_64 0.000 0.000 7.336 0.000
O_AGE25_64.lag1 0.000 0.000 -2.797 0.005
P_log(DISTANCE + 1) -0.050 0.007 -6.793 0.000
--------------------------------------------------
R2_corr: 0.6942944
Observations: 97969
Model coherence: Validated
Based on the print result above, model 9 and MLE estimator is used (defaults). The R2 value is 0.694, which means the model accounts for 69.42% of the variation of flows.
rho_d - destination constrain rho_o - origin constrain rho_w - intrazonal constrain
D_SCHOOL_COUNT
has a t.stat that is statistically significant, but the t.stat of its lag is not statistically significant. Similarly for D_BUSINESS_COUNT
and its lag.
- School and Business counts in the zone will affect the attractiveness, but not school and business counts in neighbouring zones.
D_RETAILS_COUNT
and its lag are both not statistically significant.
- Not a good explanatory variable for weekday morning peak.
D_FINSERV_COUNT
and its lag are both statistically significant.
- Good explanatory variable: Financial services counts in the zone and in neighbouring zones will affect the attractiveness
O_BUSSTOP_COUNT
and its lag are both statistically significant. Simialrly for O_AGE25_64
and and its lag.
- Good explanatory variable: Financial services counts in the zone and in neighbouring zones will affect the attractiveness
<- par(mfrow = c(1,3),
old_par mar = c(2,2,2,2))
spflow_moran_plots(base_model)
par(old_par)
Next, pair_cor()
can be used to inspect the relationship of the residual and the explanatory variables by using the code chunk below.
Check for multi-collinearity again.
<- pair_cor(base_model)
corr_residual colnames(corr_residual) <- substr(colnames(corr_residual),1,3)
cor_image(corr_residual)
Working with model control
Example of running MLE Model 8:
<- log(1 + TRIPS) ~
spflow_formula O_(BUSSTOP_COUNT +
+
AGE25_64) D_(SCHOOL_COUNT +
+
BUSINESS_COUNT +
RETAILS_COUNT +
FINSERV_COUNT) P_(log(DISTANCE + 1))
<- spflow_control(
model_control estimation_method = "mle",
model = "model_8"
)
<- spflow(
mle_model8
spflow_formula,spflow_networks = mpsz_multi_net,
estimation_control = model_control
)
mle_model8
--------------------------------------------------
Spatial interaction model estimated by: MLE
Spatial correlation structure: SDM (model_8)
Dependent variable: log(1 + TRIPS)
--------------------------------------------------
Coefficients:
est sd t.stat p.val
rho_d 0.689 0.003 196.832 0.000
rho_o 0.687 0.004 192.214 0.000
rho_w -0.473 0.003 -142.469 0.000
(Intercept) 1.086 0.049 22.275 0.000
(Intra) 0.840 0.075 11.255 0.000
D_SCHOOL_COUNT 0.019 0.002 8.896 0.000
D_SCHOOL_COUNT.lag1 0.019 0.004 5.130 0.000
D_BUSINESS_COUNT 0.000 0.000 3.328 0.001
D_BUSINESS_COUNT.lag1 0.000 0.000 1.664 0.096
D_RETAILS_COUNT 0.000 0.000 -0.414 0.679
D_RETAILS_COUNT.lag1 0.000 0.000 -0.171 0.864
D_FINSERV_COUNT 0.002 0.000 6.150 0.000
D_FINSERV_COUNT.lag1 -0.003 0.001 -4.601 0.000
O_BUSSTOP_COUNT 0.003 0.000 7.676 0.000
O_BUSSTOP_COUNT.lag1 0.000 0.000 0.552 0.581
O_AGE25_64 0.000 0.000 6.870 0.000
O_AGE25_64.lag1 0.000 0.000 -0.462 0.644
P_log(DISTANCE + 1) -0.125 0.005 -22.865 0.000
--------------------------------------------------
R2_corr: 0.6965974
Observations: 97969
Model coherence: Validated
The R2 of Model 8 is 0.696, meaning the model accounts for 69.65% of the variation of flows – slightly higher than that for Model 9.
Example of running MLE Model 1 (unconstrained):
<- spflow_control(
model_control estimation_method = "mle",
model = "model_1"
)
<- spflow(
mle_model1
spflow_formula,spflow_networks = mpsz_multi_net,
estimation_control = model_control
)
mle_model1
--------------------------------------------------
Spatial interaction model estimated by: OLS
Spatial correlation structure: SLX (model_1)
Dependent variable: log(1 + TRIPS)
--------------------------------------------------
Coefficients:
est sd t.stat p.val
(Intercept) 11.384 0.069 164.255 0.000
(Intra) -6.006 0.112 -53.393 0.000
D_SCHOOL_COUNT 0.093 0.003 28.599 0.000
D_SCHOOL_COUNT.lag1 0.255 0.006 44.905 0.000
D_BUSINESS_COUNT 0.001 0.000 10.036 0.000
D_BUSINESS_COUNT.lag1 0.003 0.000 18.274 0.000
D_RETAILS_COUNT 0.000 0.000 -1.940 0.052
D_RETAILS_COUNT.lag1 0.000 0.000 -2.581 0.010
D_FINSERV_COUNT 0.005 0.000 10.979 0.000
D_FINSERV_COUNT.lag1 -0.016 0.001 -17.134 0.000
O_BUSSTOP_COUNT 0.014 0.001 25.865 0.000
O_BUSSTOP_COUNT.lag1 0.015 0.001 21.728 0.000
O_AGE25_64 0.000 0.000 14.479 0.000
O_AGE25_64.lag1 0.000 0.000 14.452 0.000
P_log(DISTANCE + 1) -1.281 0.008 -165.327 0.000
--------------------------------------------------
R2_corr: 0.2831458
Observations: 97969
Model coherence: Validated
The R2 of Model 1 (unconstrained) is only 0.2831, meaning the model accounts for 28.31% of the variation of flows.
Example of running MLE Model 2 (constrained):
<- spflow_control(
model_control estimation_method = "mle",
model = "model_2"
)
<- spflow(
mle_model2
spflow_formula,spflow_networks = mpsz_multi_net,
estimation_control = model_control
)
mle_model2
--------------------------------------------------
Spatial interaction model estimated by: MLE
Spatial correlation structure: SDM (model_2)
Dependent variable: log(1 + TRIPS)
--------------------------------------------------
Coefficients:
est sd t.stat p.val
rho_d 0.731 0.003 218.603 0.000
(Intercept) 3.068 0.067 45.899 0.000
(Intra) -0.398 0.093 -4.291 0.000
D_SCHOOL_COUNT 0.063 0.003 24.291 0.000
D_SCHOOL_COUNT.lag1 0.054 0.005 11.802 0.000
D_BUSINESS_COUNT 0.001 0.000 8.790 0.000
D_BUSINESS_COUNT.lag1 0.001 0.000 3.925 0.000
D_RETAILS_COUNT 0.000 0.000 -1.076 0.282
D_RETAILS_COUNT.lag1 0.000 0.000 0.251 0.802
D_FINSERV_COUNT 0.007 0.000 18.054 0.000
D_FINSERV_COUNT.lag1 -0.009 0.001 -12.691 0.000
O_BUSSTOP_COUNT 0.003 0.000 8.269 0.000
O_BUSSTOP_COUNT.lag1 0.004 0.001 7.808 0.000
O_AGE25_64 0.000 0.000 3.918 0.000
O_AGE25_64.lag1 0.000 0.000 4.421 0.000
P_log(DISTANCE + 1) -0.351 0.007 -47.000 0.000
--------------------------------------------------
R2_corr: 0.5527887
Observations: 97969
Model coherence: Validated
The R2 of Model 2 (constrained) is 0.5527, meaning the model accounts for 55.27% of the variation of flows – higher than that of Model 1 (unconstrained).