Introduction
Analyzing public transit feeds is important for understanding a
network's territorial coverage and dynamics, in both their spatial and
temporal dimensions. GTFShift provides several methods that
encapsulate pre-defined methodologies for this analysis. This document
explores their applicability with simple examples.
This article uses a GTFS feed from the library's GTFS database for Portugal as an example. Refer to vignette("download") for more details.
# Get GTFS from library GTFS database for Portugal
data = read.csv(system.file("extdata", "gtfs_sources_pt.csv", package = "GTFShift"))
gtfs_id = "lisboa"
gtfs = GTFShift::load_feed(data$URL[data$ID == gtfs_id], create_transfers=FALSE)
Analyse hourly frequency per stop
To analyse frequencies at stops, use
GTFShift::get_stop_frequency_hourly(). For each stop, it produces
an aggregated count of the buses serving it per hour.
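To make the idea of an hourly count per stop concrete, here is a minimal base R sketch on a toy table. The `stop_times` data frame and its values are invented for illustration, and the sketch is not the package's implementation, which works on a full GTFS feed and restricts the count to the services running on the analysed date.

```r
# Toy table mimicking two columns of a GTFS stop_times.txt file (invented values)
stop_times <- data.frame(
  stop_id = c("A", "A", "A", "B", "B", "A"),
  departure_time = c("08:05:00", "08:35:00", "09:10:00",
                     "08:20:00", "23:15:00", "08:55:00"),
  stringsAsFactors = FALSE
)

# Extract the hour from the HH:MM:SS string
stop_times$hour <- as.integer(substr(stop_times$departure_time, 1, 2))

# Each row is one vehicle serving the stop, so count rows per stop and hour
stop_times$frequency <- 1L
frequency_toy <- aggregate(frequency ~ stop_id + hour, data = stop_times, FUN = sum)
frequency_toy[order(frequency_toy$stop_id, frequency_toy$hour), ]
```

In this toy data, stop A is served three times between 08:00 and 09:00, which is the kind of value reported in the frequency column.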
By default, the analysis is performed for the next business Wednesday in Portugal. Refer to
GTFShift::calendar_nextBusinessWednesday()
for more details. You can override this using the date
parameter.
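As a rough illustration of how a reference day can be chosen, the sketch below computes the first Wednesday strictly after a given date in base R. The helper `next_wednesday()` is hypothetical and ignores public holidays, which GTFShift::calendar_nextBusinessWednesday() presumably accounts for, so treat it as an assumption-laden stand-in rather than the package's logic.

```r
# Hypothetical helper: first Wednesday strictly after `from` (holidays ignored)
next_wednesday <- function(from = Sys.Date()) {
  wday <- as.POSIXlt(from)$wday          # 0 = Sunday, ..., 3 = Wednesday
  days_ahead <- (3L - wday) %% 7L        # days until the coming Wednesday
  if (days_ahead == 0L) days_ahead <- 7L # `from` is a Wednesday: jump a week
  from + days_ahead
}

next_wednesday(as.Date("2024-01-01"))    # 2024-01-01 is a Monday
```

A date obtained this way could then be supplied through the date parameter. Check the function documentation for the format it expects.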
# Perform frequency analysis
frequencies_stop = GTFShift::get_stop_frequency_hourly(gtfs)
summary(frequencies_stop)
#> stop_id hour frequency geometry
#> Length:39040 Min. : 6.0 Min. : 1.000 POINT :39040
#> Class :character 1st Qu.:10.0 1st Qu.: 3.000 epsg:4326 : 0
#> Mode :character Median :14.0 Median : 6.000 +proj=long...: 0
#> Mean :14.2 Mean : 7.063
#> 3rd Qu.:18.0 3rd Qu.: 9.000
#> Max. :23.0 Max. :41.000
It returns an sf data.frame that can be
displayed using mapview or stored in GeoPackage format.
# Display map of stops served more than twice between 08:00 and 09:00
mapview::mapview(
  frequencies_stop |>
    dplyr::filter(hour == 8 & frequency > 2),
  zcol = "frequency",
  legend = TRUE,
  cex = 4,
  layer.name = "Frequency (hour)"
)
# Store in GeoPackage format
# sf::st_write(frequencies_stop, "database/transit/bus_stop_frequency.gpkg", append=FALSE, quiet = TRUE)
Analyse hourly frequency per route
The frequency analysis can also be performed route-wise. For this
purpose, use GTFShift::get_route_frequency_hourly(), which returns
results aggregated per hour and route, so that each route can be
analysed individually.
By default, the analysis is performed for the next business Wednesday in Portugal. Refer to
GTFShift::calendar_nextBusinessWednesday()
for more details. You can override this using the date
parameter.
frequencies_route = GTFShift::get_route_frequency_hourly(gtfs)
summary(frequencies_route)
#> route_id route_short_name direction_id hour
#> Length:3336 Length:3336 Min. :0.0000 Min. : 0.00
#> Class :character Class :character 1st Qu.:0.0000 1st Qu.: 9.00
#> Mode :character Mode :character Median :0.0000 Median :13.00
#> Mean :0.4472 Mean :13.14
#> 3rd Qu.:1.0000 3rd Qu.:18.00
#> Max. :1.0000 Max. :23.00
#> frequency shape_id geometry
#> Min. : 1.000 Length:3336 LINESTRING :3336
#> 1st Qu.: 2.000 Class :character epsg:4326 : 0
#> Median : 3.000 Mode :character +proj=long...: 0
#> Mean : 3.204
#> 3rd Qu.: 4.000
#> Max. :11.000
quantile(frequencies_route$frequency)
#> 0% 25% 50% 75% 100%
#> 1 2 3 4 11
The overline parameter allows for an even more aggregated view of
the operation: it clusters routes that overlap and converts them into
a single route network. This gives a better visualization of the
frequency volumes on each segment of the network and can help
prioritize interventions.
frequencies_route_overline = GTFShift::get_route_frequency_hourly(gtfs, overline = TRUE)
summary(frequencies_route_overline)
#> frequency hour geometry
#> Min. : 1.000 Min. : 0.00 LINESTRING :134450
#> 1st Qu.: 3.000 1st Qu.: 9.00 epsg:4326 : 0
#> Median : 6.000 Median :13.00 +proj=long...: 0
#> Mean : 9.374 Mean :13.02
#> 3rd Qu.: 12.000 3rd Qu.:18.00
#> Max. :114.000 Max. :23.00
quantile(frequencies_route_overline$frequency)
#> 0% 25% 50% 75% 100%
#> 1 3 6 12 114
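The aggregation behind the overline option can be pictured with a small base R sketch. Here routes are reduced to ordered node sequences, which is a simplifying assumption: the real method works on line geometries through stplanr::overline2(), and all names and values below are invented.

```r
# Toy routes described as ordered node sequences, each with an hourly frequency
routes <- list(
  r1 = list(nodes = c("a", "b", "c", "d"), frequency = 4),
  r2 = list(nodes = c("b", "c", "e"), frequency = 6)
)

# Break each route into its consecutive node pairs (segments)
segments <- do.call(rbind, lapply(names(routes), function(id) {
  n <- routes[[id]]$nodes
  data.frame(
    segment = paste(head(n, -1), tail(n, -1), sep = "-"),
    frequency = routes[[id]]$frequency
  )
}))

# Sum the frequencies of all routes that share a segment
network_toy <- aggregate(frequency ~ segment, data = segments, FUN = sum)
network_toy
```

Segment b-c is used by both routes, so it carries the sum of their frequencies, while the remaining segments keep the frequency of the single route using them.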
Improve visualization
Using the overline parameter in
GTFShift::get_route_frequency_hourly() might not be the best option
if the GTFS shapes do not share exactly the same geometry. In those
cases, the overlapping lines might not be merged correctly, causing
inconsistent results, such as a street with different frequencies
along it despite there being no bus stop between the points where the
frequency changes.
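The effect of slightly mismatched geometries can be reproduced with a toy example in base R. Two routes use the same street segment, but their coordinates differ by about a metre. This is only an illustration of the problem, with invented coordinates, and not a description of how stplanr::overline2() works internally. Rounding is used here as a crude stand-in for geometry correction.

```r
# Same street segment digitised twice with slightly different coordinates
segments <- data.frame(
  route = c("r1", "r2"),
  x1 = c(-9.14000, -9.14001), y1 = c(38.72000, 38.72001),
  x2 = c(-9.13000, -9.13001), y2 = c(38.73000, 38.72999),
  frequency = c(4, 6)
)

# Key built from exact coordinates: the two copies are treated as different lines
segments$key_exact <- with(segments, paste(x1, y1, x2, y2))
exact <- aggregate(frequency ~ key_exact, data = segments, FUN = sum)

# Key built from coordinates rounded to 3 decimals: the copies collapse into one
segments$key_snapped <- with(
  segments,
  paste(round(x1, 3), round(y1, 3), round(x2, 3), round(y2, 3))
)
snapped <- aggregate(frequency ~ key_snapped, data = segments, FUN = sum)

nrow(exact)    # number of distinct segments with exact matching
nrow(snapped)  # number of distinct segments after rounding
```

With exact matching the street keeps two separate frequencies, whereas after rounding it is a single segment carrying their sum. Rounding is not robust in general, since points close to a rounding boundary can still end up apart.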

This is a known issue of stplanr::overline2(), the method used for
the network aggregation, and it has not been solved yet.
As an alternative, GTFShift provides some methods to overcome this
issue, either by correcting the route geometry or by aggregating
the network with open data.
Correcting geometry with OSM open data
GTFShift offers several methods to obtain route geometry from OpenStreetMap. Refer to vignette("osm") for more details.
Aggregating the network with OSM open data
There are several methods to aggregate a transit network. One approach is to determine the centerlines of the roads where the vehicles operate. GTFShift provides a method that wraps the Python neatnet package for this purpose. Refer to vignette("osm") for more details.
During the development of this project, no R packages suited to this purpose were found. The centerline package has this feature on its roadmap. Currently, solutions are available for Python and ArcGIS.
Aggregating frequencies over a target network
As an alternative to GTFShift::get_route_frequency_hourly() with the
overline=TRUE parameter, GTFShift::network_overline() provides a
different frequency aggregation functionality.
Given a target network, it identifies the segments corresponding to each route and uses them to aggregate the attribute defined in the parameters.
The example below uses the centerlines of the Carris network, generated using ArcGIS, as the target network.
# Load the target network shipped with the package
network = sf::st_read(
  system.file("extdata", "centerline_carris.gpkg", package = "GTFShift"),
  quiet = TRUE
)
# Aggregate the 08:00 route frequencies over the target network
frequencies_route_overline_improved = GTFShift::network_overline(
  network,
  frequencies_route |> dplyr::filter(hour == 8),
  attr = "frequency"
)
quantile(frequencies_route_overline_improved$frequency)
#> 0% 25% 50% 75% 100%
#> 1 5 10 18 114
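The idea of aggregating over a target network can be sketched in base R under strong simplifying assumptions. Target segments are reduced to their midpoints, routes to a few vertices in planar coordinates, and a segment is assigned to a route when any route vertex lies within a hypothetical `tolerance` of the midpoint. How GTFShift::network_overline() actually matches routes to segments is not described here, so the matching rule below is purely illustrative.

```r
# Target network: segments summarised by their midpoint (toy planar coordinates)
target <- data.frame(id = c("s1", "s2", "s3"), mx = c(0, 100, 200), my = c(0, 0, 0))

# Routes: vertices sampled along each route, plus the attribute to aggregate
routes <- list(
  r1 = list(xy = cbind(x = c(0, 50, 100), y = c(3, 2, 1)), frequency = 4),
  r2 = list(xy = cbind(x = c(100, 150, 200), y = c(-2, -1, 0)), frequency = 6)
)

tolerance <- 10  # hypothetical matching distance, in the same units as the coordinates

# TRUE for each target segment whose midpoint is within `tolerance` of a route vertex
matches <- function(route) {
  vapply(seq_len(nrow(target)), function(i) {
    d <- sqrt((route$xy[, "x"] - target$mx[i])^2 + (route$xy[, "y"] - target$my[i])^2)
    any(d <= tolerance)
  }, logical(1))
}

# Sum the attribute of every route matched to each target segment
target$frequency <- Reduce(`+`, lapply(routes, function(r) matches(r) * r$frequency))
target
```

Because the frequencies are attached to the target geometry, two routes drawn slightly apart on the same street still contribute to the same segment, here s2.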