Code
::p_load(sf, sfdep, tmap, plotly, tidyverse, knitr) pacman
Magdalene Chan
November 25, 2023
December 3, 2023
As city-wide urban infrastructures become increasingly digital, datasets from technologies like GPS and RFID on vehicles offer opportunities to track movement patterns over space and time. For instance, smart cards and GPS devices on public buses collect routes and ridership data, containing valuable structure and patterns for understanding human movement and behavior within cities. Despite their potential, the practical use of these extensive location-aware datasets often remains limited to basic tracking and mapping within GIS applications due to the lack of comprehensive spatial and spatio-temporal analysis functions in conventional GIS tools.
Exploratory Spatial Data Analysis (ESDA) holds tremendous potential to address such complex problems. In this study, appropriate Local Indicators of Spatial Association (GLISA) and Emerging Hot Spot Analysis (EHSA) will be applied to undercover the spatial and spatio-temporal mobility patterns of public bus passengers in Singapore.
The following tasks will be undertaken in this exercise:
With reference to the passenger trips generated by origin at the hexagon level for the four time periods given above:
The code chunk below uses p_load()
of pacman
package to check if the required packages have been installed on the computer. If they are, the packages will be launched. The following packages will be used:
sf
package is used for importing, managing, and processing geospatial data.sfdep
package is used to create spatial weights matrix and LISA objects using the sf class to represent spatial data.tmap
package is used for thematic mapping.plotly
package is used to create interactive graphs and charts.tidyverse
package is used for aspatial data wrangling.knitr
package is used for dynamic report generation in R.The data sets used are:
The code chunk below uses the st_read()
function of sf package to import BusStop
shapefile into R as a simple feature data frame called BusStop
. As BusStop
uses svy21 projected coordinate system, the crs
is set to 3414.
Reading layer `BusStop' from data source
`C:\magdalenecjw\ISSS624 Geospatial\Take_Home_Exercise\Ex1\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
Classes 'sf' and 'data.frame': 5161 obs. of 4 variables:
$ BUS_STOP_N: chr "22069" "32071" "44331" "96081" ...
$ BUS_ROOF_N: chr "B06" "B23" "B01" "B05" ...
$ LOC_DESC : chr "OPP CEVA LOGISTICS" "AFT TRACK 13" "BLK 239" "GRACE INDEPENDENT CH" ...
$ geometry :sfc_POINT of length 5161; first list element: 'XY' num 13576 32884
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA
..- attr(*, "names")= chr [1:3] "BUS_STOP_N" "BUS_ROOF_N" "LOC_DESC"
There are a total of 5161 features in the BusStop
shapefile. Notably, BUS_STOP_N
is listed as a character variable. As this variable will be used as the identifier to link to the aspatial data, it should be transformed to a factor so that R treats it as a grouping variable.
Classes 'sf' and 'data.frame': 5161 obs. of 4 variables:
$ BUS_STOP_N: Factor w/ 5145 levels "01012","01013",..: 1008 1724 2118 4972 431 3787 1160 2941 1610 4983 ...
$ BUS_ROOF_N: chr "B06" "B23" "B01" "B05" ...
$ LOC_DESC : chr "OPP CEVA LOGISTICS" "AFT TRACK 13" "BLK 239" "GRACE INDEPENDENT CH" ...
$ geometry :sfc_POINT of length 5161; first list element: 'XY' num 13576 32884
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA
..- attr(*, "names")= chr [1:3] "BUS_STOP_N" "BUS_ROOF_N" "LOC_DESC"
Based on the output above, BUS_STOP_N
is now a factor of 5145 levels. However, there are a total of 5161 observations. As such, the duplicate records should be checked. The code chunk below shows a list of duplicated bus stop codes.
BUS_STOP_N | BUS_ROOF_N | LOC_DESC | geometry |
---|---|---|---|
11009 | B04 | Ghim Moh Ter | POINT (23101.34 32594.17) |
11009 | B04-TMNL | GHIM MOH TER | POINT (23100.58 32604.36) |
22501 | B02 | Blk 662A | POINT (13489.09 35536.4) |
22501 | B02 | BLK 662A | POINT (13488.02 35537.88) |
43709 | B06 | BLK 644 | POINT (18963.42 36762.8) |
43709 | B06 | BLK 644 | POINT (18952.02 36751.83) |
47201 | UNK | NA | POINT (22616.75 47793.68) |
47201 | NIL | W’LANDS NTH STN | POINT (22632.92 47934) |
51071 | B21 | MACRITCHIE RESERVOIR | POINT (28311.27 36036.92) |
51071 | B21 | MACRITCHIE RESERVOIR | POINT (28282.54 36033.93) |
52059 | B03 | OPP BLK 65 | POINT (30770.3 34460.06) |
52059 | B09 | BLK 219 | POINT (30565.45 36133.15) |
53041 | B05 | Upp Thomson Road | POINT (28105.8 37246.76) |
53041 | B07 | Upp Thomson Road | POINT (27956.34 37379.29) |
58031 | UNK | OPP CANBERRA DR | POINT (27089.69 47570.9) |
58031 | UNK | OPP CANBERRA DR | POINT (27111.07 47517.77) |
62251 | B03 | Bef Blk 471B | POINT (35500.54 39943.41) |
62251 | B03 | BEF BLK 471B | POINT (35500.36 39943.34) |
67421 | B01 | CHENG LIM STN EXIT B | POINT (34548.54 42052.15) |
67421 | NIL | CHENG LIM STN EXIT B | POINT (34741.77 42004.21) |
68091 | B01 | AFT BAKER ST | POINT (32164.11 42695.98) |
68091 | B08 | AFT BAKER ST | POINT (32038.84 43298.68) |
68099 | B02 | BEF BAKER ST | POINT (32154.9 42742.82) |
68099 | B07 | BEF BAKER ST | POINT (32004.05 43320.34) |
77329 | B01 | BEF PASIR RIS ST 53 | POINT (40765.35 39452.18) |
77329 | B03 | Pasir Ris Central | POINT (40728.15 39438.15) |
82221 | B01 | BLK 3A | POINT (35323.6 33257.05) |
82221 | B01 | Blk 3A | POINT (35308.74 33335.17) |
96319 | NA | Yusen Logistics | POINT (42187.23 34995.78) |
96319 | NIL | YUSEN LOGISTICS | POINT (42187.23 34995.78) |
97079 | B14 | OPP ST. JOHN’S CRES | POINT (44144.57 38980.25) |
97079 | B14 | OPP ST. JOHN’S CRES | POINT (44055.75 38908.5) |
The code chunk below uses the distinct()
function to keep only the unique rows based on the BUS_STOP_N
while preserving all other fields (.keep_all = TRUE
). By default, distinct()
keeps the first occurrence of each unique combination of values in the specified columns.
Classes 'sf' and 'data.frame': 5145 obs. of 4 variables:
$ BUS_STOP_N: Factor w/ 5145 levels "01012","01013",..: 1008 1724 2118 4972 431 3787 1160 2941 1610 4983 ...
$ BUS_ROOF_N: chr "B06" "B23" "B01" "B05" ...
$ LOC_DESC : chr "OPP CEVA LOGISTICS" "AFT TRACK 13" "BLK 239" "GRACE INDEPENDENT CH" ...
$ geometry :sfc_POINT of length 5145; first list element: 'XY' num 13576 32884
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA
..- attr(*, "names")= chr [1:3] "BUS_STOP_N" "BUS_ROOF_N" "LOC_DESC"
There are now a total of 5145 observations, aligned with the number of factor levels.
The code chunk below uses the read_csv()
function of readr
package (imported with the tidyverse
package) to import the csv files into R.
spc_tbl_ [5,709,512 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ YEAR_MONTH : chr [1:5709512] "2023-08" "2023-08" "2023-08" "2023-08" ...
$ DAY_TYPE : chr [1:5709512] "WEEKDAY" "WEEKENDS/HOLIDAY" "WEEKENDS/HOLIDAY" "WEEKDAY" ...
$ TIME_PER_HOUR : num [1:5709512] 16 16 14 14 17 17 17 17 7 17 ...
$ PT_TYPE : chr [1:5709512] "BUS" "BUS" "BUS" "BUS" ...
$ ORIGIN_PT_CODE : chr [1:5709512] "04168" "04168" "80119" "80119" ...
$ DESTINATION_PT_CODE: chr [1:5709512] "10051" "10051" "90079" "90079" ...
$ TOTAL_TRIPS : num [1:5709512] 7 2 3 10 5 4 3 22 3 3 ...
- attr(*, "spec")=
.. cols(
.. YEAR_MONTH = col_character(),
.. DAY_TYPE = col_character(),
.. TIME_PER_HOUR = col_double(),
.. PT_TYPE = col_character(),
.. ORIGIN_PT_CODE = col_character(),
.. DESTINATION_PT_CODE = col_character(),
.. TOTAL_TRIPS = col_double()
.. )
- attr(*, "problems")=<externalptr>
Based on the data frame structure seen above, ORIGIN_PT_CODE
and DESTINATION_PT_CODE
are listed as character variables. These variables are equivalent to BUS_STOP_N
of BusStop
sf data frame and should be transformed to factors so that R treats them as grouping variables.
spc_tbl_ [5,709,512 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ YEAR_MONTH : chr [1:5709512] "2023-08" "2023-08" "2023-08" "2023-08" ...
$ DAY_TYPE : chr [1:5709512] "WEEKDAY" "WEEKENDS/HOLIDAY" "WEEKENDS/HOLIDAY" "WEEKDAY" ...
$ TIME_PER_HOUR : num [1:5709512] 16 16 14 14 17 17 17 17 7 17 ...
$ PT_TYPE : chr [1:5709512] "BUS" "BUS" "BUS" "BUS" ...
$ ORIGIN_PT_CODE : Factor w/ 5067 levels "01012","01013",..: 104 104 4422 4422 2008 2008 832 832 779 355 ...
$ DESTINATION_PT_CODE: Factor w/ 5071 levels "01012","01013",..: 239 239 4736 4736 691 691 807 807 234 107 ...
$ TOTAL_TRIPS : num [1:5709512] 7 2 3 10 5 4 3 22 3 3 ...
- attr(*, "spec")=
.. cols(
.. YEAR_MONTH = col_character(),
.. DAY_TYPE = col_character(),
.. TIME_PER_HOUR = col_double(),
.. PT_TYPE = col_character(),
.. ORIGIN_PT_CODE = col_character(),
.. DESTINATION_PT_CODE = col_character(),
.. TOTAL_TRIPS = col_double()
.. )
- attr(*, "problems")=<externalptr>
Based on the output above, ORIGIN_PT_CODE
and DESTINATION_PT_CODE
are now factors.
The code chunk below extracts commuting flows for the four target time periods. For completeness, a list of all possible Bus Stops x Day Time x Time combinations is created. Following which, the commuting flow for each Bus Stops x Day Time x Time combination within the odbus
data frame is computed and missing combinations are imputed with a value of 0. Lastly the resultant data frame is pivoted wider to form a data frame where each row represents one bus stop code, similar to the BusStop sf data frame.
# Create a list of possible bus stop code, day type and time per hour for checking
BusStopList <- expand.grid(ORIGIN_PT_CODE = unique(BusStop$BUS_STOP_N),
DAY_TYPE = c("WEEKDAY", "WEEKENDS/HOLIDAY"),
TIME_PER_HOUR = unique(odbus$TIME_PER_HOUR))
# Commute commuting flow by bus stop code, day type and time per hour
odbus_recoded <- odbus %>%
group_by(ORIGIN_PT_CODE, DAY_TYPE, TIME_PER_HOUR) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
# Identify missing Bus Stops and impute a 0 value
odbus_missing <- BusStopList %>%
anti_join(odbus_recoded, by = c("ORIGIN_PT_CODE", "DAY_TYPE", "TIME_PER_HOUR")) %>%
mutate(TRIPS = 0)
# Combine the above two data frames
odbus_long <- rbind(odbus_recoded, odbus_missing) %>%
mutate(peak_period = case_when(
DAY_TYPE == "WEEKDAY" & TIME_PER_HOUR >= 6 & TIME_PER_HOUR <=9 ~
"weekday_morn",
DAY_TYPE == "WEEKDAY" & TIME_PER_HOUR >= 17 & TIME_PER_HOUR <=20 ~
"weekday_evening",
DAY_TYPE == "WEEKENDS/HOLIDAY" & TIME_PER_HOUR >= 11 & TIME_PER_HOUR <=14 ~
"weekend_morn",
DAY_TYPE == "WEEKENDS/HOLIDAY" & TIME_PER_HOUR >= 16 & TIME_PER_HOUR <=19 ~
"weekend_evening",
TRUE ~ "non_peak"))
# Pivot wider to form data frame with one bus stop code per row
odbus_wide_peak <- odbus_long %>%
group_by(ORIGIN_PT_CODE, peak_period) %>%
summarise(TOTAL_TRIPS = sum(TRIPS)) %>%
pivot_wider(names_from = peak_period,
values_from = TOTAL_TRIPS)
odbus_wide <- odbus_long %>%
filter(!(peak_period == "non_peak")) %>%
group_by(ORIGIN_PT_CODE, DAY_TYPE, TIME_PER_HOUR) %>%
summarise(TOTAL_TRIPS = sum(TRIPS)) %>%
mutate(Day_Time = paste(DAY_TYPE, TIME_PER_HOUR)) %>%
ungroup %>%
select(-c(DAY_TYPE, TIME_PER_HOUR)) %>%
pivot_wider(names_from = Day_Time,
values_from = TOTAL_TRIPS) %>%
left_join(odbus_wide_peak)
left_join()
of dplyr is then used to join the geographical data and commuting flow data table using the Bus Stop Code as the common identifier. Left join is done to ensure that the geospatial properties (geometry column) of the BusStop
sf data frame is retained and bus stops not found within the geospatial data set are dropped.
Classes 'sf' and 'data.frame': 5145 obs. of 25 variables:
$ BUS_STOP_N : Factor w/ 5199 levels "01012","01013",..: 1008 1724 2118 4972 431 3787 1160 2941 1610 4983 ...
$ BUS_ROOF_N : chr "B06" "B23" "B01" "B05" ...
$ LOC_DESC : chr "OPP CEVA LOGISTICS" "AFT TRACK 13" "BLK 239" "GRACE INDEPENDENT CH" ...
$ WEEKDAY 6 : num 0 0 497 67 46 ...
$ WEEKDAY 7 : num 9 0 623 128 36 ...
$ WEEKDAY 8 : num 1 0 631 62 60 ...
$ WEEKDAY 9 : num 5 0 373 50 48 ...
$ WEEKDAY 17 : num 19 0 654 254 47 ...
$ WEEKDAY 18 : num 17 0 485 104 60 ...
$ WEEKDAY 19 : num 6 0 382 33 58 70 149 284 841 221 ...
$ WEEKDAY 20 : num 1 0 221 8 30 42 18 140 610 91 ...
$ WEEKENDS/HOLIDAY 11: num 0 0 178 27 12 57 3 415 502 234 ...
$ WEEKENDS/HOLIDAY 12: num 2 0 201 39 32 44 3 447 551 246 ...
$ WEEKENDS/HOLIDAY 13: num 1 0 183 17 24 27 4 320 560 206 ...
$ WEEKENDS/HOLIDAY 14: num 3 0 138 27 13 30 0 254 357 172 ...
$ WEEKENDS/HOLIDAY 16: num 6 1 130 21 21 34 17 131 465 232 ...
$ WEEKENDS/HOLIDAY 17: num 4 0 151 23 10 41 37 133 541 171 ...
$ WEEKENDS/HOLIDAY 18: num 1 0 135 29 11 57 15 142 499 167 ...
$ WEEKENDS/HOLIDAY 19: num 1 0 130 11 19 41 15 91 368 140 ...
$ non_peak : num 163 6 3813 935 412 ...
$ weekday_evening : num 43 0 1742 399 195 ...
$ weekday_morn : num 15 0 2124 307 190 ...
$ weekend_evening : num 12 1 546 84 61 ...
$ weekend_morn : num 6 0 700 110 81 ...
$ geometry :sfc_POINT of length 5145; first list element: 'XY' num 13576 32884
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
..- attr(*, "names")= chr [1:24] "BUS_STOP_N" "BUS_ROOF_N" "LOC_DESC" "WEEKDAY 6" ...
There are a total of 5145 observations, which matches the number of features in the BusStop
sf data frame.
Spatial grids are commonly used in spatial analysis to divide the study area into equal-sized, regular polygons that can tessellate the whole study area. After that, a variable is summarized within each polygon. For this exercise, a hexagon layer of 250m (where 250m is the perpendicular distance between the hexagon centre and its edges) will be created.
In the code chunk below, st_make_grid
is used to create the hexagon grids using the svy21
projected coordinate system. In st_make_grid
, cellsize
argument refers to the cell size in the units that the crs of the spatial data is using. The cellsize can be defined as the distance between opposite edges. Since the data is set in SVY21 projected coordinate system, which uses metres as the unit, the value is set as c(500,500)
to create a hexagon layer of 250m. The resulting data frame is then converted into a sf data frame and an index is added for each hexagon grid.
In the code chunk below, st_join
and st_intersects()
is used to return a list of bus stops lying within each hexagon grid. A left join is done in this process (using the argument left = TRUE
) to ensure that the geospatial properties (geometry column) of the hex_grid
sf data frame is retained. After filtering out hexagon grids that do not contain any bus stops, the number of bus stops and the total number of trips at each time period is then computed for each hexagon grid.
hex_grid_count <- st_join(hex_grid, odBusStop, join = st_intersects, left = TRUE) %>%
filter(!(is.na(BUS_STOP_N))) %>%
group_by(index) %>%
summarise(
count = n(),
bus_stop_codes = paste(BUS_STOP_N, collapse = ", "),
bus_stop_names = paste(LOC_DESC, collapse = ", "),
across(starts_with("week"), sum)
) %>%
ungroup()
The hex_grid_count
sf data frame can now be used for plotting the geographical distribution of peak hour bus stops and commuting flow using tmap.
tmap package can be used to help visualise the geographical distribution of the commuting flow over the four target time periods. Such geovisualisations can help identify areas with high commuting flow, which can be further used for other operational needs such as traffic management, bus route planning and crowd control management.
For efficiency of plotting, only the columns that will be used for this section will be selected.
The following code chunks will review the geographical distribution of the weekdays and weekends peak hour commuting flows using a continuous scale fill.
The geographical distribution of the weekdays peak hour commuting flows is plotted side by side using tm_facets()
function. The free.scales
argument is set to FALSE
to show a consistent scale across both time periods.
The visualisation above shows a heavily right-skewed distribution across all the weekday peak hour time periods. Most bus stops have less than 100k commuter trips at each of these time periods. Given the current geographical distribution, it is challenging to identify the areas with high commuting flow. However, notable hexagon grids that can be distinctively identified as having high commuting flow >300k are:
The code chunk below displays the hexagon grids with significantly high commuting flow >300k in greater detail.
index | bus_stop_names | weekday_morn | weekday_evening | geometry |
---|---|---|---|---|
1251 | BLK 664C, Blk 662A, BLK 662D, BOON LAY MRT STN, BOON LAY INT, OPP BLK 662C, OPP BOON LAY MRT STN | 396701 | 551636 | POLYGON ((13720.12 35286.69… |
2411 | BLK 515, BEF W’LANDS STN EXIT 7, WOODLANDS REG INT, W’LANDS CIVIC CTR, BLK 894C, W’LANDS STN EXIT 4, W’LANDS STN EXIT 5, OPP W’LANDS CIVIC CTR | 313192 | 474311 | POLYGON ((22970.12 46112.01… |
3239 | BLK 322, BLK 422, AFT ANG MO KIO INT, AFT ANG MO KIO STN EXIT A, ANG MO KIO INT, ANG MO KIO STN | 212340 | 325858 | POLYGON ((29720.12 38750.79… |
4349 | BEDOK STN EXIT B, PANASONIC, OPP PANASONIC, BEDOK BUS INT, BEDOK STN EXIT A | 187052 | 370328 | POLYGON ((38720.12 33554.64… |
4539 | TAMPINES INT, TAMPINES INT, UOB TAMPINES CTR, OPP CENTURY SQ, OPP OUR TAMPINES HUB, OUR TAMPINES HUB, Aft Tampines Stn Exit E, Tampines Stn Exit D | 202366 | 446207 | POLYGON ((40220.12 37018.74… |
The geographical distribution of the weekends / holidays peak hour commuting flows is plotted side by side using tm_facets()
function. The free.scales
argument is set to FALSE
to show a consistent scale across both time periods.
tmap_mode("view")
tm_shape(hex_grid_count2) +
tm_fill(col = c("weekend_morn", "weekend_evening"),
palette = "Purples",
style = "cont",
popup.vars = c("Bus Stops" = "bus_stop_codes",
"Weekend/PH Morning" = "weekend_morn",
"Weekend/PH Evening" = "weekend_evening")) +
tm_facets(ncol = 1, free.scales = FALSE)
Similarly, the visualisation above shows a heavily right-skewed distribution across all the weekend peak hour time periods. However, the continuous scale suggests that weekend peak hour commuting flow volume is approximately one-fifth of weekday peak hour commuting flow volume. Most bus stops have less than 20k commuter trips at each of these time periods. Again, it is challenging to identify the areas with high commuting flow but notable hexagon grids that can be distinctively identified as having high commuting flow >80k are:
The code chunk below displays the hexagon grids with significantly more commuting flow >80k in greater detail.
index | bus_stop_names | weekend_morn | weekend_evening | geometry |
---|---|---|---|---|
1251 | BLK 664C, Blk 662A, BLK 662D, BOON LAY MRT STN, BOON LAY INT, OPP BLK 662C, OPP BOON LAY MRT STN | 108551 | 149047 | POLYGON ((13720.12 35286.69… |
2411 | BLK 515, BEF W’LANDS STN EXIT 7, WOODLANDS REG INT, W’LANDS CIVIC CTR, BLK 894C, W’LANDS STN EXIT 4, W’LANDS STN EXIT 5, OPP W’LANDS CIVIC CTR | 99396 | 123274 | POLYGON ((22970.12 46112.01… |
3239 | BLK 322, BLK 422, AFT ANG MO KIO INT, AFT ANG MO KIO STN EXIT A, ANG MO KIO INT, ANG MO KIO STN | 79535 | 94824 | POLYGON ((29720.12 38750.79… |
3578 | BLK 421, SERANGOON INT, S’GOON STN EXIT A / BLK 413, S’GOON STN EXIT C / BLK 201, S’GOON STN EXIT E, S’GOON STN EXIT H, BLK 409, S’GOON STN EXIT D / BLK 416 | 57084 | 80507 | POLYGON ((32470.12 36585.73… |
4349 | BEDOK STN EXIT B, PANASONIC, OPP PANASONIC, BEDOK BUS INT, BEDOK STN EXIT A | 101960 | 118967 | POLYGON ((38720.12 33554.64… |
4539 | TAMPINES INT, TAMPINES INT, UOB TAMPINES CTR, OPP CENTURY SQ, OPP OUR TAMPINES HUB, OUR TAMPINES HUB, Aft Tampines Stn Exit E, Tampines Stn Exit D | 95639 | 124569 | POLYGON ((40220.12 37018.74… |
However, due to the heavily right-skewed distribution, not much can be glimpsed from the current geovisualisation using continuous scale fill. A data classification method will be used to replot the maps.
The Jenks optimization method, also called the Jenks natural breaks classification method, is a data clustering method designed to determine the best arrangement of values into different classes by seeking to reduce the variance within classes and maximize the variance between classes. For this exercise, the number of breaks is defined at n=6
.
The following code chunks will review the geographical distribution of the weekdays and weekends peak hour commuting flows using a continuous scale fill.
The geographical distribution of the weekdays peak hour commuting flows is plotted side by side using tm_facets()
function. The free.scales
argument is set to FALSE
to show a consistent scale across both time periods.
tmap_mode("view")
tm_shape(hex_grid_count2) +
tm_fill(col = c("weekday_morn", "weekday_evening"),
palette = "Blues",
n = 6, style = "jenks",
popup.vars = c("Bus Stops" = "bus_stop_codes",
"Weekday Morning" = "weekday_morn",
"Weekday Evening" = "weekday_evening")) +
tm_facets(ncol = 1, free.scales = FALSE)
index | bus_stop_names | weekday_morn | weekday_evening | geometry |
---|---|---|---|---|
1251 | BLK 664C, Blk 662A, BLK 662D, BOON LAY MRT STN, BOON LAY INT, OPP BLK 662C, OPP BOON LAY MRT STN | 396701 | 551636 | POLYGON ((13720.12 35286.69… |
1438 | LAKESIDE STN, OPP BLK 203, BLK 517, OPP LAKESIDE STN | 94194 | 172864 | POLYGON ((15220.12 36152.72… |
1815 | CHOA CHU KANG INT, LOT 1 / CHOA CHU KANG STN, OPP NORTHVALE CONDO, OPP CHOA CHU KANG STN | 156028 | 289879 | POLYGON ((18220.12 40482.84… |
1904 | BLK 644, BLK 231, BLK 503, BLK 225, BT BATOK INT, BLK 240 | 137323 | 164006 | POLYGON ((18970.12 36585.73… |
2054 | BLK 431, CLEMENTI STN, CLEMENTI STN | 202467 | 184460 | POLYGON ((20220.12 32688.62… |
2062 | BT PANJANG STN EXIT A/ LRT, OPP BT PANJANG STN, AFT BT PANJANG STN, BT PANJANG INT, THE LINEAR, BT PANJANG STN EXIT B, OPP BLK 180, BLK 180 | 95420 | 230072 | POLYGON ((20220.12 39616.82… |
2411 | BLK 515, BEF W’LANDS STN EXIT 7, WOODLANDS REG INT, W’LANDS CIVIC CTR, BLK 894C, W’LANDS STN EXIT 4, W’LANDS STN EXIT 5, OPP W’LANDS CIVIC CTR | 313192 | 474311 | POLYGON ((22970.12 46112.01… |
3204 | BLK 84B, OPP TOA PAYOH STN, BLK 138B, BLK 79C, TOA PAYOH BUS INT, TOA PAYOH BUS INT, TOA PAYOH STN | 166638 | 248170 | POLYGON ((29470.12 34853.68… |
3239 | BLK 322, BLK 422, AFT ANG MO KIO INT, AFT ANG MO KIO STN EXIT A, ANG MO KIO INT, ANG MO KIO STN | 212340 | 325858 | POLYGON ((29720.12 38750.79… |
3578 | BLK 421, SERANGOON INT, S’GOON STN EXIT A / BLK 413, S’GOON STN EXIT C / BLK 201, S’GOON STN EXIT E, S’GOON STN EXIT H, BLK 409, S’GOON STN EXIT D / BLK 416 | 111428 | 299786 | POLYGON ((32470.12 36585.73… |
3859 | NAUNG RESIDENCE, THE MIDTOWN, HOUGANG CTRL INT, BLK 302, HOUGANG CTRL INT, BLK 327 | 106496 | 183771 | POLYGON ((34720.12 38750.79… |
3924 | BLK 248A, SENGKANG INT, COMPASSVALE INT, BLK 240, Sengkang Stn, BLK 259C, BLK 250A | 77090 | 211180 | POLYGON ((35220.12 41348.87… |
3957 | PUNGGOL STN / WATERWAY PT, PUNGGOL BUS INTERCHANGE, BEF BLK 264, OPP BLK 272C, BLK 272C | 88871 | 179509 | POLYGON ((35470.12 42647.91… |
4018 | BLK 102C, BEF PUNGGOL RD, AFT PUNGGOL RD, OPP BLK 103A | 207583 | 146562 | POLYGON ((35970.12 41781.88… |
4349 | BEDOK STN EXIT B, PANASONIC, OPP PANASONIC, BEDOK BUS INT, BEDOK STN EXIT A | 187052 | 370328 | POLYGON ((38720.12 33554.64… |
4539 | TAMPINES INT, TAMPINES INT, UOB TAMPINES CTR, OPP CENTURY SQ, OPP OUR TAMPINES HUB, OUR TAMPINES HUB, Aft Tampines Stn Exit E, Tampines Stn Exit D | 202366 | 446207 | POLYGON ((40220.12 37018.74… |
4635 | OPP PASIR RIS STN EXIT A, PASIR RIS INT, PASIR RIS INT, AFT PASIR RIS ST 53, AFT PASIR RIS DR 3, BLK 586, BEF PASIR RIS ST 53, AFT PASIR RIS STN EXIT A, BEF PASIR RIS STN | 88685 | 220510 | POLYGON ((40970.12 39183.81… |
The geographical distribution of weekday peak hour commuting flow is much clearer in this geovisualisation using Jenks Natural Breaks. Areas with high commuting flow could be defined as the last two classes (commuting flow >156k) and the code chunk above displays these hexagon grids in greater detail. It can also be observed that all of the highest commuting flow areas are in the housing estates.
The geographical distribution of the weekends / holidays peak hour commuting flows is plotted side by side using tm_facets()
function. The free.scales
argument is set to FALSE
to show a consistent scale across both time periods.
tmap_mode("view")
tm_shape(hex_grid_count2) +
tm_fill(col = c("weekend_morn", "weekend_evening"),
palette = "Purples",
n = 6, style = "jenks",
popup.vars = c("Bus Stops" = "bus_stop_codes",
"Weekend/PH Morning" = "weekend_morn",
"Weekend/PH Evening" = "weekend_evening")) +
tm_facets(ncol = 1, free.scales = FALSE)
The code chunk below displays the hexagon grids that fall into the last two classes (>31k) in greater detail.
index | bus_stop_names | weekend_morn | weekend_evening | geometry |
---|---|---|---|---|
1251 | BLK 664C, Blk 662A, BLK 662D, BOON LAY MRT STN, BOON LAY INT, OPP BLK 662C, OPP BOON LAY MRT STN | 108551 | 149047 | POLYGON ((13720.12 35286.69… |
1438 | LAKESIDE STN, OPP BLK 203, BLK 517, OPP LAKESIDE STN | 23964 | 37003 | POLYGON ((15220.12 36152.72… |
1778 | J GATEWAY, JURONG EAST TEMP INT, OPP JEM | 25234 | 37976 | POLYGON ((17970.12 34853.68… |
1815 | CHOA CHU KANG INT, LOT 1 / CHOA CHU KANG STN, OPP NORTHVALE CONDO, OPP CHOA CHU KANG STN | 62447 | 78360 | POLYGON ((18220.12 40482.84… |
1904 | BLK 644, BLK 231, BLK 503, BLK 225, BT BATOK INT, BLK 240 | 39157 | 43066 | POLYGON ((18970.12 36585.73… |
2024 | OPP REGENT PK, NAN HUA PR SCH, CLEMENTI INT, REGENT PK | 37828 | 40308 | POLYGON ((19970.12 33121.63… |
2054 | BLK 431, CLEMENTI STN, CLEMENTI STN | 44441 | 52125 | POLYGON ((20220.12 32688.62… |
2062 | BT PANJANG STN EXIT A/ LRT, OPP BT PANJANG STN, AFT BT PANJANG STN, BT PANJANG INT, THE LINEAR, BT PANJANG STN EXIT B, OPP BLK 180, BLK 180 | 48824 | 64095 | POLYGON ((20220.12 39616.82… |
2102 | W’LANDS CHECKPT | 42251 | 38354 | POLYGON ((20470.12 46978.04… |
2133 | W’LANDS CHECKPT | 28240 | 38286 | POLYGON ((20720.12 47411.05… |
2135 | JOHOR BAHRU CHECKPT, JOHOR BAHRU CHECKPT | 31773 | 38431 | POLYGON ((20720.12 49143.1,… |
2411 | BLK 515, BEF W’LANDS STN EXIT 7, WOODLANDS REG INT, W’LANDS CIVIC CTR, BLK 894C, W’LANDS STN EXIT 4, W’LANDS STN EXIT 5, OPP W’LANDS CIVIC CTR | 99396 | 123274 | POLYGON ((22970.12 46112.01… |
2854 | HARBOURFRONT STN EXIT A, OPP VIVOCITY, HARBOURFRONT/VIVOCITY | 24091 | 41675 | POLYGON ((26720.12 27492.46… |
2950 | OPP TIONG BAHRU STN, TIONG BAHRU STN, CTRL GREEN CONDO | 37980 | 35154 | POLYGON ((27470.12 29657.53… |
3014 | OPP NGEE ANN CITY, BEF ORCHARD STN EXIT B, ORCHARD STN / LUCKY PLAZA, ORCHARD STN / TANGS PLAZA, OPP ORCHARD STN / ION | 35047 | 54816 | POLYGON ((27970.12 31389.58… |
3166 | OPP HONG LIM CPLX, CHINATOWN STN EXIT E, MAXWELL RD FC, NEW BRIDGE CTR, OPP SRI MARIAMMAN TP, CHINATOWN STN EXIT C | 43599 | 45553 | POLYGON ((29220.12 29224.51… |
3204 | BLK 84B, OPP TOA PAYOH STN, BLK 138B, BLK 79C, TOA PAYOH BUS INT, TOA PAYOH BUS INT, TOA PAYOH STN | 71600 | 78369 | POLYGON ((29470.12 34853.68… |
3239 | BLK 322, BLK 422, AFT ANG MO KIO INT, AFT ANG MO KIO STN EXIT A, ANG MO KIO INT, ANG MO KIO STN | 79535 | 94824 | POLYGON ((29720.12 38750.79… |
3292 | ST JOSEPH’S CH, AFT BUGIS STN EXIT C, OPP BUGIS STN EXIT C, BRAS BASAH CPLX, HOTEL GRAND PACIFIC, BEF BENCOOLEN STN EXIT A, AFT WATERLOO ST, BEF WATERLOO ST | 29286 | 38048 | POLYGON ((30220.12 30956.57… |
3578 | BLK 421, SERANGOON INT, S’GOON STN EXIT A / BLK 413, S’GOON STN EXIT C / BLK 201, S’GOON STN EXIT E, S’GOON STN EXIT H, BLK 409, S’GOON STN EXIT D / BLK 416 | 57084 | 80507 | POLYGON ((32470.12 36585.73… |
3822 | SIMS VILLE, BLK 1015, PAYA LEBAR STN EXIT D, OPP BLK 1015, PAYA LEBAR STN EXIT C, PAYA LEBAR STN EXIT B | 28803 | 34103 | POLYGON ((34470.12 33121.63… |
3859 | NAUNG RESIDENCE, THE MIDTOWN, HOUGANG CTRL INT, BLK 302, HOUGANG CTRL INT, BLK 327 | 46128 | 50760 | POLYGON ((34720.12 38750.79… |
3924 | BLK 248A, SENGKANG INT, COMPASSVALE INT, BLK 240, Sengkang Stn, BLK 259C, BLK 250A | 44740 | 53323 | POLYGON ((35220.12 41348.87… |
3957 | PUNGGOL STN / WATERWAY PT, PUNGGOL BUS INTERCHANGE, BEF BLK 264, OPP BLK 272C, BLK 272C | 41764 | 49806 | POLYGON ((35470.12 42647.91… |
4006 | OPP PARKWAY PARADE, PARKWAY PARADE, OPP 112 KATONG | 31559 | 37707 | POLYGON ((35970.12 31389.58… |
4018 | BLK 102C, BEF PUNGGOL RD, AFT PUNGGOL RD, OPP BLK 103A | 39349 | 41052 | POLYGON ((35970.12 41781.88… |
4349 | BEDOK STN EXIT B, PANASONIC, OPP PANASONIC, BEDOK BUS INT, BEDOK STN EXIT A | 101960 | 118967 | POLYGON ((38720.12 33554.64… |
4539 | TAMPINES INT, TAMPINES INT, UOB TAMPINES CTR, OPP CENTURY SQ, OPP OUR TAMPINES HUB, OUR TAMPINES HUB, Aft Tampines Stn Exit E, Tampines Stn Exit D | 95639 | 124569 | POLYGON ((40220.12 37018.74… |
4635 | OPP PASIR RIS STN EXIT A, PASIR RIS INT, PASIR RIS INT, AFT PASIR RIS ST 53, AFT PASIR RIS DR 3, BLK 586, BEF PASIR RIS ST 53, AFT PASIR RIS STN EXIT A, BEF PASIR RIS STN | 40921 | 54305 | POLYGON ((40970.12 39183.81… |
For the weekend peak hour commuting flow, areas with high commuting flow could be defined as the last two classes (commuting flow >31k) and the code chunk above displays these hexagon grids in greater detail. Using Jenks Natural Breaks for data classification also reveals more high commuting flow areas that are not just in the major housing estates, but also in the town area and the area surrounding Woodlands Checkpoint.
Given that each target time period spans across four whole hours, it would also be useful to identify time trends across the four hours among these high commuting flow areas. This could help guide better operational decisions such as staffing decisions. Emerging Hot Spot Analysis (EHSA) could be performed to uncover this as it aims to classify each location into one of seventeen categories based on ESRI’s emerging hot spot classification criteria.
To perform EHSA, time series cubes are needed.
Spatio-temporal data are typically presented in long formats, where a row identifies a unique location and time observation represented by a column dedicated to time and another to locations. However, such data are traditionally not linked to the geographies as they contain only an identifier of the location, but not the spatial representation that makes the data meaningful for geospatial analysis.
The spacetime
class in sfdep package can be used to create lattice data that is a representation of spatio-temporal data for a set of regions containing the geometries over a number of different time-periods. There are two possible layouts: spatio-temporal full grid and sparse grids.
Given a number of spatial features n
, and time periods m
, a spatio-temporal full grid contains n×m
rows. Each location has a recorded observation for each of the time periods in m
. When there are missing observations for some locations or time periods and they are entirely omitted from the data set, that is a spatio-temporal sparse grid.
For this analysis, time series cubes will be created for each of the four target time periods respectively using the earlier created hex_grid_count
.
In the code chunks below, relevant columns that correspond to the target time periods will be selected from hex_grid_count
and joined with hex_grid
to create the n
number of spatial features. The corresponding sf data frame will then be pivoted longer to create a time series data containing the m
time periods. Once the data frame has nxm
rows, apply the spacetime()
function to create the time series cube.
# Select relevant columns from hex_grid_count
weekday_morn_hex <- hex_grid_count %>%
select(index, count, bus_stop_codes, bus_stop_names, geometry,
`WEEKDAY 6`, `WEEKDAY 7`, `WEEKDAY 8`, `WEEKDAY 9`)
# Join with hex_grid and pivot longer to create time series data
weekday_morn_full <- st_join(hex_grid, weekday_morn_hex, join = st_within) %>%
select(-c(index.y)) %>%
rename(index = index.x) %>%
rename_with(~gsub("WEEKDAY ", "", .), matches("^WEEKDAY \\d+$")) %>%
pivot_longer(cols = c(`6`, `7`, `8`, `9`),
names_to = "TIME_PER_HOUR", values_to = "TRIPS",
names_transform = as.integer, values_transform = as.integer) %>%
mutate(across(where(is.integer), ~ifelse(is.na(.), 0, .))) %>%
select(index, TIME_PER_HOUR, TRIPS) %>%
st_set_geometry(NULL)
# Apply spacetime() function
weekday_morn_cube <- spacetime(weekday_morn_full, hex_grid,
.loc_col = "index", .time_col = "TIME_PER_HOUR")
# Verify that resultant output is a spacetime cube
is_spacetime_cube(weekday_morn_cube)
[1] TRUE
The TRUE
return confirms that weekday_morn_cube
object is indeed an time-space cube.
# Select relevant columns from hex_grid_count
weekday_evening_hex <- hex_grid_count %>%
select(index, count, bus_stop_codes, bus_stop_names, geometry,
`WEEKDAY 17`, `WEEKDAY 18`, `WEEKDAY 19`, `WEEKDAY 20`)
# Join with hex_grid and pivot longer to create time series data
weekday_evening_full <- st_join(hex_grid, weekday_evening_hex, join = st_within) %>%
select(-c(index.y)) %>%
rename(index = index.x) %>%
rename_with(~gsub("WEEKDAY ", "", .), matches("^WEEKDAY \\d+$")) %>%
pivot_longer(cols = c(`17`, `18`, `19`, `20`),
names_to = "TIME_PER_HOUR", values_to = "TRIPS",
names_transform = as.integer, values_transform = as.integer) %>%
mutate(across(where(is.integer), ~ifelse(is.na(.), 0, .))) %>%
select(index, TIME_PER_HOUR, TRIPS) %>%
st_set_geometry(NULL)
# Apply spacetime() function
weekday_evening_cube <- spacetime(weekday_evening_full, hex_grid,
.loc_col = "index", .time_col = "TIME_PER_HOUR")
# Verify that resultant output is a spacetime cube
is_spacetime_cube(weekday_evening_cube)
[1] TRUE
The TRUE
return confirms that weekday_evening_cube
object is indeed an time-space cube.
# Select relevant columns from hex_grid_count
weekend_morn_hex <- hex_grid_count %>%
select(index, count, bus_stop_codes, bus_stop_names, geometry,
`WEEKENDS/HOLIDAY 11`, `WEEKENDS/HOLIDAY 12`,
`WEEKENDS/HOLIDAY 13`, `WEEKENDS/HOLIDAY 14`)
# Join with hex_grid and pivot longer to create time series data
weekend_morn_full <- st_join(hex_grid, weekend_morn_hex, join = st_within) %>%
select(-c(index.y)) %>%
rename(index = index.x) %>%
rename_with(~gsub("WEEKENDS/HOLIDAY ", "", .), matches("^WEEKENDS/HOLIDAY \\d+$")) %>%
pivot_longer(cols = c(`11`, `12`, `13`, `14`),
names_to = "TIME_PER_HOUR", values_to = "TRIPS",
names_transform = as.integer, values_transform = as.integer) %>%
mutate(across(where(is.integer), ~ifelse(is.na(.), 0, .))) %>%
select(index, TIME_PER_HOUR, TRIPS) %>%
st_set_geometry(NULL)
# Apply spacetime() function
weekend_morn_cube <- spacetime(weekend_morn_full, hex_grid,
.loc_col = "index", .time_col = "TIME_PER_HOUR")
# Verify that resultant output is a spacetime cube
is_spacetime_cube(weekend_morn_cube)
[1] TRUE
The TRUE
return confirms that weekend_morn_cube
object is indeed an time-space cube.
# Select relevant columns from hex_grid_count
weekend_evening_hex <- hex_grid_count %>%
select(index, count, bus_stop_codes, bus_stop_names, geometry,
`WEEKENDS/HOLIDAY 16`, `WEEKENDS/HOLIDAY 17`,
`WEEKENDS/HOLIDAY 18`, `WEEKENDS/HOLIDAY 19`)
# Join with hex_grid and pivot longer to create time series data
weekend_evening_full <- st_join(hex_grid, weekend_evening_hex, join = st_within) %>%
select(-c(index.y)) %>%
rename(index = index.x) %>%
rename_with(~gsub("WEEKENDS/HOLIDAY ", "", .), matches("^WEEKENDS/HOLIDAY \\d+$")) %>%
pivot_longer(cols = c(`16`, `17`, `18`, `19`),
names_to = "TIME_PER_HOUR", values_to = "TRIPS",
names_transform = as.integer, values_transform = as.integer) %>%
mutate(across(where(is.integer), ~ifelse(is.na(.), 0, .))) %>%
select(index, TIME_PER_HOUR, TRIPS) %>%
st_set_geometry(NULL)
# Apply spacetime() function
weekend_evening_cube <- spacetime(weekend_evening_full, hex_grid,
.loc_col = "index", .time_col = "TIME_PER_HOUR")
# Verify that resultant output is a spacetime cube
is_spacetime_cube(weekend_evening_cube)
[1] TRUE
The TRUE
return confirms that weekend_evening_cube
object is indeed an time-space cube.
To compute the local Gi* statistics, spatial weights must first be derived. The code chunks below will be used to identify neighbors and to derive an inverse distance weights for each target time period. activate()
is used to first activate the geometry context, followed by mutate()
to create two new columns nb
and wt
. Then activate the data context again and copy over the nb
and wt
columns to each time-slice using set_nbs()
and set_wts()
.
In the code chunks below, the new columns are used to manually calculate the local Gi* for each location. This is done by grouping by TIME_PER_HOUR
and using local_gstar_perm()
of sfdep package. After which, unnest()
is used to unnest the gi_star
column of the newly created output.
With the above Gi* statistics, it is possible to evaluate each location for a trend using the Mann-Kendall test.
The code chunk below performs the Mann-Kendall Test for the following high commuting flow hexagon grids:
[1] 1251 1438 1815 1904 2054 2062 2411 3204 3239 3578 3859 3924 3957 4018 4349
[16] 4539 4635
plot_weekdaymorn <- gi_weekday_morn %>%
ungroup() %>%
filter(index %in% weekday_156k$index) |>
select(index, TIME_PER_HOUR, gi_star)
# Plot result using ggplot2 functions and ggplotly() of plotly package
p1 <- ggplot(data = plot_weekdaymorn,
aes(x = TIME_PER_HOUR,
y = gi_star,
color = as.factor(index))) +
geom_line() +
labs(color = "Hexagon Grid") +
theme_light()
ggplotly(p1)
The Mann-Kendall test is done for each hexagon grid in the code chunk below.
# A tibble: 17 × 7
index tau sl S D varS geometry
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <POLYGON [m]>
1 3859 1.00 0.0894 6 6.00 8.67 ((34720.12 38750.79, 34470.12 38895.13…
2 1438 -0.667 0.308 -4 6.00 8.67 ((15220.12 36152.72, 14970.12 36297.06…
3 2054 0.667 0.308 4 6.00 8.67 ((20220.12 32688.62, 19970.12 32832.95…
4 2062 -0.667 0.308 -4 6.00 8.67 ((20220.12 39616.82, 19970.12 39761.16…
5 2411 -0.667 0.308 -4 6.00 8.67 ((22970.12 46112.01, 22720.12 46256.35…
6 3204 0.667 0.308 4 6.00 8.67 ((29470.12 34853.68, 29220.12 34998.02…
7 3578 0.667 0.308 4 6.00 8.67 ((32470.12 36585.73, 32220.12 36730.07…
8 3924 -0.667 0.308 -4 6.00 8.67 ((35220.12 41348.87, 34970.12 41493.21…
9 3957 -0.667 0.308 -4 6.00 8.67 ((35470.12 42647.91, 35220.12 42792.25…
10 4018 -0.667 0.308 -4 6.00 8.67 ((35970.12 41781.88, 35720.12 41926.22…
11 4635 -0.667 0.308 -4 6.00 8.67 ((40970.12 39183.81, 40720.12 39328.14…
12 1815 -0.333 0.734 -2 6.00 8.67 ((18220.12 40482.84, 17970.12 40627.18…
13 1904 -0.333 0.734 -2 6.00 8.67 ((18970.12 36585.73, 18720.12 36730.07…
14 3239 0.333 0.734 2 6.00 8.67 ((29720.12 38750.79, 29470.12 38895.13…
15 4539 0.333 0.734 2 6.00 8.67 ((40220.12 37018.74, 39970.12 37163.08…
16 1251 0 1 0 6.00 8.67 ((13720.12 35286.69, 13470.12 35431.03…
17 4349 0 1 0 6.00 8.67 ((38720.12 33554.64, 38470.12 33698.98…
In the above result, sl is the p-value.
The chart and the table above suggests a slight upward trend across the 4 hours of the weekday morning peak period for hexagon grid 3859.
For all the other hexagon grids, there is no significant upward or downward trend in commuting flow across the 4 hours of the weekday morning peak period.
The code chunk below performs the Mann-Kendall Test for the following high commuting flow hexagon grids:
[1] 1251 1438 1815 1904 2054 2062 2411 3204 3239 3578 3859 3924 3957 4018 4349
[16] 4539 4635
plot_weekdayevening <- gi_weekday_evening %>%
ungroup() %>%
filter(index %in% weekday_156k$index) |>
select(index, TIME_PER_HOUR, gi_star)
# Plot result using ggplot2 functions and ggplotly() of plotly package
p2 <- ggplot(data = plot_weekdayevening,
aes(x = TIME_PER_HOUR,
y = gi_star,
color = as.factor(index))) +
geom_line() +
labs(color = "Hexagon Grid") +
theme_light()
ggplotly(p2)
The Mann-Kendall test is done for each hexagon grid in the code chunk below.
# A tibble: 17 × 7
index tau sl S D varS geometry
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <POLYGON [m]>
1 1904 -1.00 0.0894 -6 6.00 8.67 ((18970.12 36585.73, 18720.12 36730.07…
2 4018 -1.00 0.0894 -6 6.00 8.67 ((35970.12 41781.88, 35720.12 41926.22…
3 1251 0.667 0.308 4 6.00 8.67 ((13720.12 35286.69, 13470.12 35431.03…
4 1438 -0.667 0.308 -4 6.00 8.67 ((15220.12 36152.72, 14970.12 36297.06…
5 2062 0.667 0.308 4 6.00 8.67 ((20220.12 39616.82, 19970.12 39761.16…
6 3204 -0.667 0.308 -4 6.00 8.67 ((29470.12 34853.68, 29220.12 34998.02…
7 3239 0.667 0.308 4 6.00 8.67 ((29720.12 38750.79, 29470.12 38895.13…
8 3578 0.667 0.308 4 6.00 8.67 ((32470.12 36585.73, 32220.12 36730.07…
9 4349 0.667 0.308 4 6.00 8.67 ((38720.12 33554.64, 38470.12 33698.98…
10 4539 0.667 0.308 4 6.00 8.67 ((40220.12 37018.74, 39970.12 37163.08…
11 4635 0.667 0.308 4 6.00 8.67 ((40970.12 39183.81, 40720.12 39328.14…
12 2054 -0.333 0.734 -2 6.00 8.67 ((20220.12 32688.62, 19970.12 32832.95…
13 3859 0.333 0.734 2 6.00 8.67 ((34720.12 38750.79, 34470.12 38895.13…
14 3957 0.333 0.734 2 6.00 8.67 ((35470.12 42647.91, 35220.12 42792.25…
15 1815 0 1 0 6.00 8.67 ((18220.12 40482.84, 17970.12 40627.18…
16 2411 0 1 0 6.00 8.67 ((22970.12 46112.01, 22720.12 46256.35…
17 3924 0 1 0 6.00 8.67 ((35220.12 41348.87, 34970.12 41493.21…
In the above result, sl is the p-value.
The chart and the table above suggests a slight downward trend across the 4 hours of the weekday evening peak period for hexagon grids 1904 and 4018.
For all the other hexagon grids, there is no significant upward or downward trend in commuting flow across the 4 hours of the weekday evening peak period.
The code chunk below performs the Mann-Kendall Test for the following high commuting flow hexagon grids:
[1] 1251 1438 1778 1815 1904 2024 2054 2062 2102 2133 2135 2411 2854 2950 3014
[16] 3166 3204 3239 3292 3578 3822 3859 3924 3957 4006 4018 4349 4539 4635
plot_weekendmorn <- gi_weekend_morn %>%
ungroup() %>%
filter(index %in% weekend_31k$index) |>
select(index, TIME_PER_HOUR, gi_star)
# Plot result using ggplot2 functions and ggplotly() of plotly package
p3 <- ggplot(data = plot_weekendmorn,
aes(x = TIME_PER_HOUR,
y = gi_star,
color = as.factor(index))) +
geom_line() +
labs(color = "Hexagon Grid") +
theme_light()
ggplotly(p3)
The Mann-Kendall test is done for each hexagon grid in the code chunk below.
# A tibble: 29 × 7
index tau sl S D varS geometry
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <POLYGON [m]>
1 1438 -1.00 0.0894 -6 6.00 8.67 ((15220.12 36152.72, 14970.12 36297.06,…
2 2102 -1.00 0.0894 -6 6.00 8.67 ((20470.12 46978.04, 20220.12 47122.37,…
3 2133 -1.00 0.0894 -6 6.00 8.67 ((20720.12 47411.05, 20470.12 47555.39,…
4 2950 -1.00 0.0894 -6 6.00 8.67 ((27470.12 29657.53, 27220.12 29801.86,…
5 3859 -1.00 0.0894 -6 6.00 8.67 ((34720.12 38750.79, 34470.12 38895.13,…
6 4018 -1.00 0.0894 -6 6.00 8.67 ((35970.12 41781.88, 35720.12 41926.22,…
7 1251 1.00 0.0894 6 6.00 8.67 ((13720.12 35286.69, 13470.12 35431.03,…
8 1778 1.00 0.0894 6 6.00 8.67 ((17970.12 34853.68, 17720.12 34998.02,…
9 2411 1.00 0.0894 6 6.00 8.67 ((22970.12 46112.01, 22720.12 46256.35,…
10 2854 1.00 0.0894 6 6.00 8.67 ((26720.12 27492.46, 26470.12 27636.8, …
# … with 19 more rows
In the above result, sl is the p-value.
The chart and the table above suggests a slight downward trend across the 4 hours of the weekend morning peak period for hexagon grids 1438, 2102, 2133, 2950, 3859 and 4018.
Conversely, there is a slight upward trend across the 4 hours of the weekend morning peak period for hexagon grids 1251, 1778, 2411, 2854, 3014, 3166, 3292, 3578, 3924, 3957, 4349, 4539 and 4635.
For all the other hexagon grids, there is no significant upward or downward trend in commuting flow across the 4 hours of the weekend morning peak period.
The code chunk below performs the Mann-Kendall Test for the following high commuting flow hexagon grids:
[1] 1251 1438 1778 1815 1904 2024 2054 2062 2102 2133 2135 2411 2854 2950 3014
[16] 3166 3204 3239 3292 3578 3822 3859 3924 3957 4006 4018 4349 4539 4635
plot_weekendevening <- gi_weekend_evening %>%
ungroup() %>%
filter(index %in% weekend_31k$index) |>
select(index, TIME_PER_HOUR, gi_star)
# Plot result using ggplot2 functions and ggplotly() of plotly package
p4 <- ggplot(data = plot_weekendevening,
aes(x = TIME_PER_HOUR,
y = gi_star,
color = as.factor(index))) +
geom_line() +
labs(color = "Hexagon Grid") +
theme_light()
ggplotly(p4)
The Mann-Kendall test is done for each hexagon grid in the code chunk below.
# A tibble: 29 × 7
index tau sl S D varS geometry
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <POLYGON [m]>
1 2950 -1.00 0.0894 -6 6.00 8.67 ((27470.12 29657.53, 27220.12 29801.86,…
2 3166 -1.00 0.0894 -6 6.00 8.67 ((29220.12 29224.51, 28970.12 29368.85,…
3 3292 -1.00 0.0894 -6 6.00 8.67 ((30220.12 30956.57, 29970.12 31100.9, …
4 3924 -1.00 0.0894 -6 6.00 8.67 ((35220.12 41348.87, 34970.12 41493.21,…
5 4018 -1.00 0.0894 -6 6.00 8.67 ((35970.12 41781.88, 35720.12 41926.22,…
6 1251 1.00 0.0894 6 6.00 8.67 ((13720.12 35286.69, 13470.12 35431.03,…
7 1778 1.00 0.0894 6 6.00 8.67 ((17970.12 34853.68, 17720.12 34998.02,…
8 2062 1.00 0.0894 6 6.00 8.67 ((20220.12 39616.82, 19970.12 39761.16,…
9 2411 1.00 0.0894 6 6.00 8.67 ((22970.12 46112.01, 22720.12 46256.35,…
10 2854 1.00 0.0894 6 6.00 8.67 ((26720.12 27492.46, 26470.12 27636.8, …
# … with 19 more rows
In the above result, sl is the p-value.
The chart and the table above suggests a slight downward trend across the 4 hours of the weekend evening peak period for hexagon grids 2950, 3166, 3292, 3924 and 4018.
Conversely, there is a slight upward trend across the 4 hours of the weekend morning peak period for hexagon grids 1251, 1778, 2062, 2411, 2854 and 4635.
For all the other hexagon grids, there is no significant upward or downward trend in commuting flow across the 4 hours of the weekend morning peak period.
In the code chunks below, emerging_hotspot_analysis()
of sfdep package is used to perform Emerging Hot Spot Analysis on a spacetime object e.g. weekday_morn_cube
. The .var
argument takes the name of a variable of interest (e.g. TRIPS
), the k
argument is used to specify the number of time lags which is set to 1 by default, and the nsim
argument maps the number of simulations to be performed.
For visualisation, ggplot2 functions are used to reveal the distribution of EHSA classes as a bar chart. To visualise the geographic distribution of the EHSA classes, hex_grid
sf data frame is joined to the EHSA data frames. Then, tmap functions will be used to plot categorical choropleth maps.
More than half of the hexagon grids had no pattern detected - likely due to the large number of 0s in the data frame. The next largest class of hexagon grids are “sporadic cold spots”, which refers to locations that are on-again then off-again cold spots.
The following choropleth map is charted at an alpha value of 0.1 (CI 90%). - Based on the Mann-Kendall Test performed earlier, there are no areas with a significant trend observed at alpha value of 0.05 (CI 95%).
wkd_morn_ehsa_grid <- hex_grid %>%
left_join(wkd_morn_ehsa,
by = c("index" = "location"))
# Plot the choropleth map for significant values
morn_ehsa_sig <- wkd_morn_ehsa_grid %>%
# Remove coldspot areas since most of these areas have low bus stop density
filter(p_value < 0.1)
tmap_mode("view")
tm_shape(morn_ehsa_sig) +
tm_fill("classification", palette = my_palette) +
tm_borders(alpha = 0.4)
Based on the map, there are clusters of hot spots in several areas of Singapore during weekday morning peak hours. Among the grids identified earlier as high commuting flow regions, most are also hot spot areas. However, there are also clusters which were previously not identified as a high commuting flow region but should also be reviewed when doing further analysis.
More than half of the hexagon grids had no pattern detected - likely due to the large number of 0s in the data frame. The next largest class of hexagon grids are “sporadic cold spots”. Among the “hot spots” classes, more areas are “sporadic hot spots”.
The following choropleth map is charted at an alpha value of 0.1 (CI 90%). - Based on the Mann-Kendall Test performed earlier, there are no areas with a significant trend observed at alpha value of 0.05 (CI 95%).
wkd_evening_ehsa_grid <- hex_grid %>%
left_join(wkd_evening_ehsa,
by = c("index" = "location"))
# Plot the choropleth map for significant values
ehsa_sig <- wkd_evening_ehsa_grid %>%
# alpha value set at 0.1 based on Mann-Kendall Test performed earlier
filter(p_value < 0.1)
tm_shape(ehsa_sig) +
tm_fill("classification", palette = my_palette) +
tm_borders(alpha = 0.4)
Based on the map, there are clusters of hot spots in several areas of Singapore during weekday evening peak hours. Among the grids identified earlier as high commuting flow regions, most are also hot spot areas. However, there are also clusters which were previously not identified as a high commuting flow region but should also be reviewed when doing further analysis.
More than half of the hexagon grids had no pattern detected - likely due to the large number of 0s in the data frame. The next largest class of hexagon grids are “sporadic cold spots”, which refers to locations that are on-again then off-again cold spots.
The following choropleth map is charted at an alpha value of 0.1 (CI 90%). - Based on the Mann-Kendall Test performed earlier, there are no areas with a significant trend observed at alpha value of 0.05 (CI 95%).
wke_morn_ehsa_grid <- hex_grid %>%
left_join(wke_morn_ehsa,
by = c("index" = "location"))
# Plot the choropleth map for significant values
wkemorn_ehsa_sig <- wke_morn_ehsa_grid %>%
# alpha value set at 0.1 based on Mann-Kendall Test performed earlier
filter(p_value < 0.1)
tm_shape(wkemorn_ehsa_sig) +
tm_fill("classification", palette = my_palette) +
tm_borders(alpha = 0.4)
Based on the map, there are clusters of hot spots in several areas of Singapore during weekend morning peak hours. However, not all grids identified earlier as high commuting flow regions are also hot spot areas. The biggest cluster of hot spot area is in the Downtown region, though some larger housing estates such as Tampines, Boon Lay, Punggol/Sengkang are also hot spot areas. This could be due to the timings used for the weekend morning peak hour definition being around lunch time and considerably later than the weekday morning peak hours.
More than half of the hexagon grids had no pattern detected - likely due to the large number of 0s in the data frame. The next largest class of hexagon grids are “sporadic cold spots”, which refers to locations that are on-again then off-again cold spots.
The following choropleth map is charted at an alpha value of 0.1 (CI 90%). - Based on the Mann-Kendall Test performed earlier, there are no areas with a significant trend observed at alpha value of 0.05 (CI 95%).
wke_evening_ehsa_grid <- hex_grid %>%
left_join(wke_evening_ehsa,
by = c("index" = "location"))
# Plot the choropleth map for significant values
evening_ehsa_sig <- wke_evening_ehsa_grid %>%
# alpha value set at 0.1 based on Mann-Kendall Test performed earlier
filter(p_value < 0.1)
tm_shape(evening_ehsa_sig) +
tm_fill("classification", palette = my_palette) +
tm_borders(alpha = 0.4)
Based on the map, there are clusters of hot spots in several areas of Singapore during weekend/PH evening peak hours. However, not all grids identified earlier as high commuting flow regions are also hot spot areas. Most of the clusters are found in the housing estates and around Woodlands Checkpoint. The Downtown area, while still a high commuting flow region, is no longer a hot spot area by evening time.
The geographical distribution of weekday and weekend peak hour commuting flow is heavily right-skewed, and can hence, be better identified in geovisualisation using Jenks Natural Breaks data classification. On weekdays, all of the highest commuting flow areas are in the housing estates, while on weekends / PH, high commuting flow areas that are not just in the major housing estates, but also in the town area and the area surrounding Woodlands Checkpoint.
There are clusters of hot spots in several areas of Singapore during all four peak hour time periods. Among the grids identified earlier as high commuting flow regions for weekday mornings and evenings, most are also hot spot areas. However, there are also clusters which were previously not identified as a high commuting flow region but should also be reviewed when doing further analysis.
For weekends/PH, not all grids identified as high commuting flow regions are also hot spot areas. During the weekend / PH morning peak, the biggest cluster of hot spot area is in the Downtown region, though some larger housing estates such as Tampines, Boon Lay, Punggol/Sengkang are also hot spot areas. This could be due to the timings used for the weekend morning peak hour definition being around lunch time and considerably later than the weekday morning peak hours. Conversely, during the weekend / PH evening peak, most clusters are found in the housing estates and around Woodlands Checkpoint. The Downtown area, while still a high commuting flow region, is no longer a hot spot area by evening time.