
Introduction

Analyzing public transit feeds is important to understand their territorial coverage and dynamics, in both the spatial and temporal dimensions. GTFShift provides several methods that encapsulate pre-defined methodologies for this purpose. This document explores their applicability with simple examples.

This article uses a GTFS feed from the library's GTFS database for Portugal as an example. Refer to vignette("download") for more details.

library(dplyr) # provides filter(), used on the results below
# Get GTFS from the library's GTFS database for Portugal
data = read.csv(system.file("extdata", "gtfs_sources_pt.csv", package = "GTFShift"))
gtfs_id = "lisboa"
gtfs = GTFShift::load_feed(data$URL[data$ID == gtfs_id], create_transfers=FALSE)

Analyse hourly frequency per stop

To analyse frequencies at stops, use GTFShift::get_stop_frequency_hourly(). For each stop, it counts how many buses serve it in each hour.
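To illustrate what this hourly count means, the sketch below tallies departures per stop and hour on a small, hypothetical stop_times table using base R. It is not GTFShift's implementation: it assumes the hour is taken from departure_time and that each stop_times row represents one bus serving the stop.

```r
# Hypothetical stop_times table with GTFS-style "HH:MM:SS" times
stop_times <- data.frame(
  trip_id = c("t1", "t2", "t3", "t1", "t2"),
  stop_id = c("A", "A", "A", "B", "B"),
  departure_time = c("08:05:00", "08:35:00", "09:10:00", "08:12:00", "08:42:00")
)

# Extract the hour (GTFS allows values >= 24 for trips running past midnight)
stop_times$hour <- as.integer(substr(stop_times$departure_time, 1, 2))

# Count one service per stop_times row, grouped by stop and hour
stop_freq <- aggregate(trip_id ~ stop_id + hour, data = stop_times, FUN = length)
names(stop_freq)[names(stop_freq) == "trip_id"] <- "frequency"
stop_freq
```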

By default, the analysis is performed for the next business Wednesday in Portugal (see GTFShift::calendar_nextBusinessWednesday() for details). To analyse a different day, use the date parameter.
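The rule of picking the next business Wednesday can be sketched in base R as below. The function next_business_wednesday() and its holidays argument are hypothetical; GTFShift::calendar_nextBusinessWednesday() may apply a different rule, for instance with its own list of Portuguese holidays or by counting the current day.

```r
# Hypothetical helper: first Wednesday strictly after `from` that is not a holiday
next_business_wednesday <- function(from = Sys.Date(), holidays = as.Date(character())) {
  d <- from + 1
  repeat {
    # as.POSIXlt()$wday: 0 = Sunday, 1 = Monday, ..., 3 = Wednesday
    if (as.POSIXlt(d)$wday == 3 && !(d %in% holidays)) return(d)
    d <- d + 1
  }
}

next_business_wednesday(as.Date("2024-01-01"))  # 2024-01-03
next_business_wednesday(as.Date("2024-01-01"), holidays = as.Date("2024-01-03"))  # 2024-01-10
```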

# Perform frequency analysis
frequencies_stop = GTFShift::get_stop_frequency_hourly(gtfs)
summary(frequencies_stop)
#>    stop_id               hour        frequency               geometry    
#>  Length:39040       Min.   : 6.0   Min.   : 1.000   POINT        :39040  
#>  Class :character   1st Qu.:10.0   1st Qu.: 3.000   epsg:4326    :    0  
#>  Mode  :character   Median :14.0   Median : 6.000   +proj=long...:    0  
#>                     Mean   :14.2   Mean   : 7.063                        
#>                     3rd Qu.:18.0   3rd Qu.: 9.000                        
#>                     Max.   :23.0   Max.   :41.000

It returns an sf data frame that can be displayed with mapview or stored in GeoPackage format.

# Display map
mapview::mapview(
  frequencies_stop |>
    filter(hour == 8 &
           frequency > 2),
  zcol = "frequency",
  legend = TRUE,
  cex = 4,
  layer.name = "Frequency (hour)"
)

# Store in GeoPackage format
# sf::st_write(frequencies_stop, "database/transit/bus_stop_frequency.gpkg", append=FALSE, quiet = TRUE)
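To check that a result survives the round trip to GeoPackage, a self-contained sketch with sf is shown below. The toy points, the layer name and the temporary file are hypothetical stand-ins for frequencies_stop and the path used above.

```r
library(sf)

# Hypothetical stop frequencies with WGS84 coordinates
pts <- data.frame(
  stop_id = c("A", "B"), hour = c(8L, 8L), frequency = c(2L, 5L),
  lon = c(-9.14, -9.13), lat = c(38.71, 38.72)
)
pts_sf <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)

# Write to a temporary GeoPackage and read the layer back
path <- tempfile(fileext = ".gpkg")
st_write(pts_sf, path, layer = "bus_stop_frequency", quiet = TRUE)
back <- st_read(path, layer = "bus_stop_frequency", quiet = TRUE)
```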

Analyse hourly frequency per route

The frequency analysis can also be performed per route. For this purpose, use GTFShift::get_route_frequency_hourly(), which returns results aggregated per route, direction and hour.

By default, each route is analysed individually.

As for stops, the analysis is performed by default for the next business Wednesday in Portugal (see GTFShift::calendar_nextBusinessWednesday() for details). To analyse a different day, use the date parameter.
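The route-level count can be sketched in base R as follows, with hypothetical trips and stop_times tables. As an assumption, each trip is assigned to the hour of its first departure and counted once per route and direction; GTFShift::get_route_frequency_hourly() may use a different rule.

```r
# Hypothetical GTFS tables
trips <- data.frame(
  trip_id = c("t1", "t2", "t3", "t4"),
  route_id = c("R1", "R1", "R1", "R2"),
  direction_id = c(0L, 0L, 1L, 0L)
)
stop_times <- data.frame(
  trip_id = c("t1", "t1", "t2", "t2", "t3", "t4"),
  stop_sequence = c(1L, 2L, 1L, 2L, 1L, 1L),
  departure_time = c("08:00:00", "08:10:00", "08:30:00", "08:40:00", "08:15:00", "09:05:00")
)

# Keep the first stop of each trip and take its departure hour
is_first <- stop_times$stop_sequence == ave(stop_times$stop_sequence, stop_times$trip_id, FUN = min)
first <- stop_times[is_first, ]
first$hour <- as.integer(substr(first$departure_time, 1, 2))

# Count trips per route, direction and hour
x <- merge(trips, first[, c("trip_id", "hour")], by = "trip_id")
route_freq <- aggregate(trip_id ~ route_id + direction_id + hour, data = x, FUN = length)
names(route_freq)[names(route_freq) == "trip_id"] <- "frequency"
route_freq
```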

frequencies_route = GTFShift::get_route_frequency_hourly(gtfs)
summary(frequencies_route)
#>    route_id         route_short_name    direction_id         hour      
#>  Length:3336        Length:3336        Min.   :0.0000   Min.   : 0.00  
#>  Class :character   Class :character   1st Qu.:0.0000   1st Qu.: 9.00  
#>  Mode  :character   Mode  :character   Median :0.0000   Median :13.00  
#>                                        Mean   :0.4472   Mean   :13.14  
#>                                        3rd Qu.:1.0000   3rd Qu.:18.00  
#>                                        Max.   :1.0000   Max.   :23.00  
#>    frequency        shape_id                  geometry   
#>  Min.   : 1.000   Length:3336        LINESTRING   :3336  
#>  1st Qu.: 2.000   Class :character   epsg:4326    :   0  
#>  Median : 3.000   Mode  :character   +proj=long...:   0  
#>  Mean   : 3.204                                          
#>  3rd Qu.: 4.000                                          
#>  Max.   :11.000
quantile(frequencies_route$frequency)
#>   0%  25%  50%  75% 100% 
#>    1    2    3    4   11

Setting the overline parameter to TRUE provides an even more aggregated view of the operation: overlapping routes are merged into a single route network, and the frequencies of the routes sharing each segment are summed. This makes it easier to visualize the frequency volume on each segment of the network and can help prioritize interventions in it.
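The idea behind overlining can be illustrated with a topological toy example in base R: each route is broken into segments between consecutive nodes, and the frequencies of all routes sharing a segment are summed (summing is also the default aggregation in stplanr::overline2()). The routes, node IDs and frequencies are hypothetical, and real overlining works on geometries rather than node IDs.

```r
# Hypothetical routes as ordered node IDs, with services per hour at 8 a.m.
routes <- list(R1 = c("a", "b", "c", "d"), R2 = c("b", "c", "e"), R3 = c("c", "d"))
services <- c(R1 = 4, R2 = 6, R3 = 3)

# Break each route into segments between consecutive nodes
segs <- do.call(rbind, lapply(names(routes), function(r) {
  n <- routes[[r]]
  data.frame(from = head(n, -1), to = tail(n, -1), frequency = services[[r]])
}))

# Direction-independent key, so that a-b and b-a are the same segment
segs$segment <- ifelse(segs$from < segs$to,
                       paste(segs$from, segs$to, sep = "-"),
                       paste(segs$to, segs$from, sep = "-"))

# Sum the frequencies of all routes sharing each segment
overlined <- aggregate(frequency ~ segment, data = segs, FUN = sum)
overlined
```

Segment b-c is shared by R1 and R2, so its overlined frequency is 4 + 6 = 10.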

frequencies_route_overline = GTFShift::get_route_frequency_hourly(gtfs, overline = TRUE)
summary(frequencies_route_overline)
#>    frequency            hour                geometry     
#>  Min.   :  1.000   Min.   : 0.00   LINESTRING   :134450  
#>  1st Qu.:  3.000   1st Qu.: 9.00   epsg:4326    :     0  
#>  Median :  6.000   Median :13.00   +proj=long...:     0  
#>  Mean   :  9.374   Mean   :13.02                         
#>  3rd Qu.: 12.000   3rd Qu.:18.00                         
#>  Max.   :114.000   Max.   :23.00
quantile(frequencies_route_overline$frequency)
#>   0%  25%  50%  75% 100% 
#>    1    3    6   12  114

Aggregated frequencies for 8 a.m.

mapview::mapview(
  frequencies_route |> filter(hour == 8 & frequency > 2),
  zcol = "frequency",
  layer.name = "Frequency (hour)"
)

Overlined route frequencies for 8 a.m.

# Overlined segments with more than 2 services per hour at 8 a.m.
mapview::mapview(
  frequencies_route_overline |> filter(
    hour == 8 &
      frequency > 2 # optional
  ),
  zcol = "frequency",
  # lwd = "frequency",
  layer.name = "Frequency (hour)"
)

Improve visualization

Using the overline parameter of GTFShift::get_route_frequency_hourly() may not be the best option if the GTFS shapes of overlapping routes do not share exactly the same geometry. In such cases, the overlapping lines may not be merged correctly, producing inconsistent results, such as a street whose frequency changes along its length even though there is no bus stop where the change occurs.

Overline error example

This is a known, as yet unresolved, issue with stplanr::overline2(), the function used for the network aggregation.
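The sketch below, in base R, shows why slightly different shapes break the aggregation: two hypothetical routes follow the same street, but one has a vertex offset by about half a metre, so their segments no longer match exactly. Snapping coordinates to a coarse grid (here by rounding to tens of metres, assuming a projected CRS in metres) makes them match again. This is only a crude illustration, not the fix used by GTFShift; grid snapping can also split points that fall on either side of a grid boundary.

```r
# Two hypothetical routes along the same street (coordinates in metres)
r1 <- data.frame(x = c(0, 100, 200), y = c(0, 0, 0))
r2 <- data.frame(x = c(0, 100.4, 200), y = c(0, 0.3, 0))

# Build one text key per segment; optionally snap vertices by rounding first
seg_keys <- function(pts, digits = NULL) {
  if (!is.null(digits)) {
    pts$x <- round(pts$x, digits)
    pts$y <- round(pts$y, digits)
  }
  p <- paste(pts$x, pts$y)
  paste(head(p, -1), tail(p, -1), sep = " -> ")
}

length(unique(c(seg_keys(r1), seg_keys(r2))))          # 4: nothing is shared
length(unique(c(seg_keys(r1, -1), seg_keys(r2, -1))))  # 2: segments now match
```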

As an alternative, GTFShift provides methods to work around this issue, either by correcting the route geometries or by aggregating the network with open data.

Correcting geometry with OSM open data

GTFShift offers several methods to obtain route geometries from OpenStreetMap. Refer to vignette("osm") for more details.

Aggregating the network with OSM open data

There are several methods to aggregate a transit network. One approach is to determine the centerlines of the roads on which the vehicles operate. GTFShift provides a method that wraps the Python neatnet package for this purpose. Refer to vignette("osm") for more details.

During the development of this project, no R package suited to this purpose was found. The centerline package has this feature on its roadmap. Currently, solutions are available in Python and ArcGIS.

Aggregating frequencies over a target network

As an alternative to calling GTFShift::get_route_frequency_hourly() with overline = TRUE, GTFShift::network_overline() aggregates frequencies in a different way.

Given a target network, it identifies the network segments that correspond to each route and aggregates, over those segments, the attribute given in the attr parameter.
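One possible way to assign routes to a target network is sketched below with sf: a network segment is considered served by a route when it lies entirely within a buffer around the route geometry, and the frequencies of all routes serving it are summed. The geometries, the EPSG:3763 projected CRS and the 5 m tolerance are assumptions for illustration; GTFShift::network_overline() may match segments differently.

```r
library(sf)

# Hypothetical target network: two centerline segments (projected CRS, metres)
network <- st_sf(
  segment_id = c("s1", "s2"),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 0), c(100, 0))),
    st_linestring(rbind(c(100, 0), c(100, 100))),
    crs = 3763
  )
)

# Hypothetical routes whose shapes deviate slightly from the centerlines
routes <- st_sf(
  route_id = c("R1", "R2"),
  frequency = c(4, 6),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 1), c(100, 1), c(101, 100))),
    st_linestring(rbind(c(0, -2), c(99, -2))),
    crs = 3763
  )
)

# A segment is served by a route if it lies entirely inside the route's buffer
tolerance <- 5
served <- st_within(network, st_buffer(routes, tolerance))

# Sum the frequencies of all routes serving each segment
network$frequency <- vapply(seq_len(nrow(network)),
                            function(k) sum(routes$frequency[served[[k]]]),
                            numeric(1))
network
```

A tolerance that is too large may attach parallel streets to a route, while one that is too small reproduces the mismatch problem described above.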

The example below uses, as the target network, the centerlines of the Carris network, generated with ArcGIS.

network = sf::st_read(
  system.file("extdata", "centerline_carris.gpkg", package = "GTFShift"), 
  quiet = TRUE
)

frequencies_route_overline_improved = GTFShift::network_overline(
  network, 
  frequencies_route |> filter(hour == 8),
  attr = "frequency"
)

quantile(frequencies_route_overline_improved$frequency)
#>   0%  25%  50%  75% 100% 
#>    1    5   10   18  114
mapview::mapview(
  frequencies_route_overline_improved |> filter(frequency > 20),
  zcol = "frequency",
  layer.name = "Frequency (hour)"
)