13 Euclidean and routing distances

We will show how to estimate Euclidean distances (as the crow flies) using the sf package, and distances along a road network using the openrouteservice package.

13.1 Euclidean distances

Taking the survey respondents’ location, we will estimate the distance to the university (IST) using the sf package.

13.1.1 Import survey data frame and convert to sf

We will use a survey dataset with 200 observations, with the following variables: ID, Affiliation, Age, Sex, Transport Mode to IST, and latitude and longitude coordinates.

library(dplyr)

SURVEY = read.csv("../data/SURVEY.txt", sep = "\t") # tab delimiter
names(SURVEY)

[1] "ID"   "AFF"  "AGE"  "SEX"  "MODE" "lat"  "lon"

As we have the coordinates, we can convert this data frame to a spatial feature, as explained in the Introduction to spatial data section.

library(sf)

SURVEYgeo = st_as_sf(SURVEY, coords = c("lon", "lat"), crs = 4326) # convert to sf data

13.1.2 Create new point at the university

Using coordinates from Instituto Superior Técnico, we can directly create a simple feature and assign its crs.

UNIVERSITY = data.frame(place = "IST",
                        lon = -9.1397404,
                        lat = 38.7370168) |>  # first a data frame
  st_as_sf(coords = c("lon", "lat"), # then a spatial feature
           crs = 4326)

Visualize in a map:

library(mapview)
mapview(SURVEYgeo, zcol = "MODE") + mapview(UNIVERSITY, col.region = "red", cex = 12)

13.1.3 Straight lines

First we will create lines connecting the survey locations to the university, using the st_nearest_points() function.

This function finds the nearest points between two geometries and returns a line connecting them. This can be useful to find the nearest train station to each point, for instance.

As we only have 1 point in the UNIVERSITY layer, we will get as many lines as there are survey responses: 200.

SURVEYeuclidean = st_nearest_points(SURVEYgeo, UNIVERSITY, pairwise = TRUE) |>
  st_as_sf() # this creates lines

mapview(SURVEYeuclidean)

Note that with pairwise = FALSE (the default), the function creates a line for each combination of points from the two layers. With pairwise = TRUE, it only creates lines between corresponding points (first with first, second with second, and so on), which is what you want if, for instance, both layers have the same number of points in matching order.
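A quick way to check what the pairwise argument does is to count the lines returned for two small layers. The sketch below uses two made-up layers of two points each (the coordinates and the projected CRS are arbitrary choices for illustration, not part of the survey data).

```r
library(sf)

# Two hypothetical layers with two points each, in a projected CRS
a = st_sfc(st_point(c(0, 0)), st_point(c(1, 0)), crs = 3857)
b = st_sfc(st_point(c(0, 1)), st_point(c(1, 1)), crs = 3857)

# Default (pairwise = FALSE): one line per combination of points
n_all = length(st_nearest_points(a, b))

# pairwise = TRUE: one line per pair of corresponding points
n_pairs = length(st_nearest_points(a, b, pairwise = TRUE))

c(all_combinations = n_all, pairwise = n_pairs) # 4 and 2
```

With a single point in the second layer, as in the UNIVERSITY case, both settings produce one line per survey point.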

13.1.4 Distance

Now we can estimate the distance using the st_length() function.

# compute the line length and add directly in the first survey layer
SURVEYgeo = SURVEYgeo |> 
  mutate(distance = st_length(SURVEYeuclidean))

# remove the units - can be useful
SURVEYgeo$distance = units::drop_units(SURVEYgeo$distance)

We could also estimate the distance directly with the st_distance() function, although we would not get an sf object with lines.

SURVEYgeo = SURVEYgeo |> 
  mutate(distance = st_distance(SURVEYgeo, UNIVERSITY)[,1] |>  # in meters
           units::drop_units()) |>  # remove units
  mutate(distance = round(distance)) # round to integer

summary(SURVEYgeo$distance)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    298    1106    2186    2658    3683    8600

SURVEYgeo is still an sf object with point geometry.
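For intuition about what a straight-line distance means for longitude/latitude coordinates, here is a minimal base R sketch of the haversine (great-circle) formula. It is not what sf runs internally: the function name haversine(), the spherical Earth radius and the second test point are assumptions made for illustration, so the values can differ slightly from st_distance().

```r
# Great-circle distance in meters between lon/lat points (haversine formula).
# r is a mean Earth radius; treating the Earth as a sphere is an approximation.
haversine = function(lon1, lat1, lon2, lat2, r = 6371008.8) {
  to_rad = pi / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(pmin(1, a))) # pmin() guards against rounding above 1
}

# IST coordinates from above; the second point is hypothetical (0.01 degrees north)
d = haversine(-9.1397404, 38.7370168, -9.1397404, 38.7470168)
d # about 1112 meters
```

The function is vectorized, so it also accepts whole columns of coordinates against a single destination.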

13.2 Routing Engines

There are different types of routing engines, regarding the type of network they use, the type of transportation they consider, and the type of data they need. We can have:

- Uni-modal vs. Multi-modal
  - One mode per trip vs. One trip with multiple legs that can be made with different modes
  - Multi-modal routing may require GTFS data (realistic Public Transit)
- Output level of the results
  - Routes (1 journey = 1 route)
  - Legs
  - Segments
- Routing profiles
  - Type of user
  - fastest / shortest path
  - avoid barriers / tolls, etc
- Local vs. Remote (service request - usually web API)
  - Speed vs. Quota limits / price
  - Hard vs. Easy set up
  - Hardware limitations in local routing
  - Global coverage in remote routing, with frequent updates

Examples: OSRM, Dodgr, r5r, Googleway, CycleStreets, HERE.

13.3 Routing distances with `openrouteservice`

We will use the openrouteservice R package to estimate the distances using a road network (Oleś 2025).

To properly use the openrouteservice package, you need to set up the ORS provider. See the setup instructions for more details.

We will use only respondents located up to 2 km from the university.

SURVEYsample = SURVEYgeo |> filter(distance <= 2000)
nrow(SURVEYsample)

[1] 95

We need an id (unique identifier) for each survey location, so we can compare them later.

Also, we need to extract the coordinates from both datasets, to be used in the routing function openrouteservice::ors_directions().

# create id columns for both datasets
SURVEYsample = SURVEYsample |>
  mutate(id = c(1:nrow(SURVEYsample))) |>  # from 1 to the number of rows
  mutate(coordinates = st_coordinates(geometry)) # extract coordinates

UNIVERSITY = UNIVERSITY |>
  mutate(id = 1) |> # only one row
  mutate(coordinates = st_coordinates(geometry)) # extract coordinates

13.3.1 Distances by car

Estimate the routes with time and distance by car, from survey locations to University.

This one is not that easy to set up, because the function retrieves only one result per request :( So we use a loop.

library(openrouteservice)

SURVEYcar = data.frame() # initial empty data frame

# loop - the origin (i) is the survey location, and the UNIVERSITY is always the same destination
for (i in 1:nrow(SURVEYsample)) {
  ROUTES1 = ors_directions(
    data.frame(
      lon = c(SURVEYsample$coordinates[i, 1], UNIVERSITY$coordinates[1, 1]),
      lat = c(SURVEYsample$coordinates[i, 2], UNIVERSITY$coordinates[1, 2])
    ),
    profile = "driving-car", # or cycling-regular foot-walking
    preference = "fastest", # or shortest
    output = "sf"
  )
  ROUTES1$distance = ROUTES1$summary[[1]]$distance # extract these values from summary
  ROUTES1$duration = ROUTES1$summary[[1]]$duration
  
  SURVEYcar = rbind(SURVEYcar, ROUTES1) # to keep adding in the same df
}

SURVEYcar = SURVEYcar |>
  select(distance, duration, geometry) |> # discard unnecessary variables
  mutate(ID = SURVEYsample$ID) # cbind with survey ID

13.3.2 Distances by foot

Repeat the same for foot-walking.

SURVEYwalk = data.frame() # initial empty data frame

# loop - the origin (i) is the survey location, and the UNIVERSITY is always the same destination
for (i in 1:nrow(SURVEYsample)) {
  ROUTES1 = ors_directions(
    data.frame(
      lon = c(SURVEYsample$coordinates[i, 1], UNIVERSITY$coordinates[1, 1]),
      lat = c(SURVEYsample$coordinates[i, 2], UNIVERSITY$coordinates[1, 2])
    ),
    profile = "foot-walking", # or driving-car cycling-regular cycling-electric
    preference = "fastest", # or shortest
    output = "sf"
  )
  ROUTES1$distance = ROUTES1$summary[[1]]$distance # extract these values from summary
  ROUTES1$duration = ROUTES1$summary[[1]]$duration
  
  SURVEYwalk = rbind(SURVEYwalk, ROUTES1) # to keep adding in the same df
}

SURVEYwalk = SURVEYwalk |>
  select(distance, duration, geometry) |> # discard unnecessary variables
  mutate(ID = SURVEYsample$ID) # cbind with survey ID

13.3.3 Distances by bike

And for cycling-regular.

SURVEYbike = data.frame() # initial empty data frame

# loop - the origin (i) is the survey location, and the UNIVERSITY is always the same destination
for (i in 1:nrow(SURVEYsample)) {
  ROUTES1 = ors_directions(
    data.frame(
      lon = c(SURVEYsample$coordinates[i, 1], UNIVERSITY$coordinates[1, 1]),
      lat = c(SURVEYsample$coordinates[i, 2], UNIVERSITY$coordinates[1, 2])
    ),
    profile = "cycling-regular", 
    preference = "fastest", # or shortest
    output = "sf"
  )
  ROUTES1$distance = ROUTES1$summary[[1]]$distance # extract these values from summary
  ROUTES1$duration = ROUTES1$summary[[1]]$duration
  
  SURVEYbike = rbind(SURVEYbike, ROUTES1) # to keep adding in the same df
}

SURVEYbike = SURVEYbike |>
  select(distance, duration, geometry) |> # discard unnecessary variables
  mutate(ID = SURVEYsample$ID) # cbind with survey ID

names(SURVEYcar)

[1] "distance" "duration" "ID"       "geometry"

If we want to know only the time and distance, and not the route itself, we can use ors_matrix().
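The three loops above differ only in the profile, so the repeated part could be wrapped in a small helper. The sketch below is one possible way to do it, not part of the openrouteservice package: route_all() and toy_router() are hypothetical names, and toy_router() is a stand-in that returns a straight-line length so the example runs offline without an API key.

```r
# Hypothetical helper: route every origin to a single destination.
# `route_fun` stands in for a call such as ors_directions(); it receives a
# two-row lon/lat data frame and must return a one-row data frame.
route_all = function(origins, destination, route_fun, ...) {
  routes = lapply(seq_len(nrow(origins)), function(i) {
    od = data.frame(
      lon = c(origins[i, 1], destination[1, 1]),
      lat = c(origins[i, 2], destination[1, 2])
    )
    route_fun(od, ...)
  })
  do.call(rbind, routes) # bind once, instead of growing a data frame in the loop
}

# Stand-in "router" for offline use: straight-line length in coordinate units
toy_router = function(od, profile = "toy") {
  data.frame(profile = profile,
             distance = sqrt(diff(od$lon)^2 + diff(od$lat)^2))
}

origins = matrix(c(0, 0,
                   3, 4), ncol = 2, byrow = TRUE)
destination = matrix(c(0, 0), ncol = 2)

toy_routes = route_all(origins, destination, toy_router, profile = "toy")
toy_routes$distance # 0 5
```

In the real case, route_fun would be a small wrapper around ors_directions() that also extracts distance and duration from the summary, as in the loops above; that wrapper is left out here because it needs a configured ORS provider.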

13.4 Compare distances

We can now compare the euclidean and routing distances that we estimated for the survey locations under 2 km.

summary(SURVEYsample$distance) # Euclidean

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    298     790    1046    1112    1470    1963

summary(SURVEYwalk$distance) # Walk

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  467.3  1009.0  1447.3  1471.8  1953.5  2769.4

summary(SURVEYcar$distance) # Car

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  435.4  1227.0  1686.7  1771.4  2210.3  3503.2

summary(SURVEYbike$distance) # Bike

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    436    1247    1752    1791    2225    3758

What can you understand from these results?

13.4.1 Circuity

Compare 1 single route.

mapview(SURVEYeuclidean[165,], color = "black") + # 1556 meters
  mapview(SURVEYwalk[78,], color = "red") + # 2126 meters
  mapview(SURVEYcar[78,], color = "blue") + # 2058 meters
  mapview(SURVEYbike[78,], color = "gold") # 2025 meters

With this we can see the circuity of the routes, a measure of route / transportation efficiency: the ratio between the routing distance and the Euclidean distance.

For longer distances, the circuity for car (1.32 in this example) is usually lower than for walking (1.37) or biking; for shorter distances, the opposite tends to hold.
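As a worked example, the circuity of the single route mapped above can be computed from the distances noted in the code comments (1556 m Euclidean, 2126 m walking, 2058 m by car, 2025 m by bike). The object names below are made up for illustration.

```r
# Distances (in meters) of the single route compared above
euclidean = 1556
route_example = data.frame(
  mode = c("walk", "car", "bike"),
  routing = c(2126, 2058, 2025)
)

# circuity = routing distance / Euclidean distance
route_example$circuity = round(route_example$routing / euclidean, 2)
route_example$circuity # 1.37 1.32 1.30

# For the whole sample, the same ratio could be computed row by row, e.g.
# SURVEYwalk$distance / SURVEYsample$distance, assuming both keep the same row order
```

A circuity of 1 would mean the route is a perfectly straight line; higher values indicate larger detours.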

13.5 Visualize routes

Visualize with 30% opacity (alpha = 0.3), to see where the routes overlap.

mapview(SURVEYwalk, alpha = 0.3)

mapview(SURVEYcar, alpha = 0.3, color = "red")

mapview(SURVEYbike, alpha = 0.3, color = "darkgreen")

We can also use the overline() function from the stplanr package to break up the routes where they overlap, and add up their values along the shared segments.

# we create a value that we can later sum
# it can be the number of trips represented by this route
SURVEYwalk$trips = 1 # in this case is only one respondent per route

SURVEYwalk_overline = stplanr::overline(
  SURVEYwalk,
  attrib = "trips",
  fun = sum
)

mapview(SURVEYwalk_overline, zcol = "trips", lwd = 3)

With this, we can show how many people travel along each route segment, based on the survey dataset¹.
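The idea behind this aggregation can be sketched in base R with a toy example. This is not how stplanr implements overline(), which works on the actual line geometries; here each route is simply described by the street segments it uses, and all segment labels are hypothetical.

```r
# Toy routes described by the segments they use (hypothetical labels)
toy_segments = list(
  r1 = c("A-B", "B-C", "C-IST"),
  r2 = c("D-B", "B-C", "C-IST"),
  r3 = c("E-C", "C-IST")
)

# Each route carries one trip, so counting segment occurrences sums the trips
trips_per_segment = table(unlist(toy_segments))

as.integer(trips_per_segment["C-IST"]) # 3: all routes share the last segment
as.integer(trips_per_segment["B-C"])   # 2
as.integer(trips_per_segment["A-B"])   # 1
```

Segments closer to the destination accumulate more trips, which is the pattern the overline map makes visible through the trips column.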

Questions

How many people are entering IST by the stairs near Bar de Civil?
And by the North gate?
And from Alameda stairs?

¹ Assuming all travel by the shortest path.
