library(dplyr)
SURVEY = read.csv("../data/SURVEY.txt", sep = "\t") # tab delimiter
names(SURVEY)[1] "ID" "AFF" "AGE" "SEX" "MODE" "lat" "lon"
We will show how to estimate euclidean distances (as crown flights) using sf package, and the distances using a road network using openrouteservice package.
Taking the survey respondents’ location, we will estimate the distance to the university (IST) using the sf package.
We will use a survey dataset with 200 observations, with the following variables: ID, Affiliation, Age, Sex, Transport Mode to IST, and latitude and longitude coordinates.
library(dplyr)
SURVEY = read.csv("../data/SURVEY.txt", sep = "\t") # tab delimiter
names(SURVEY)[1] "ID" "AFF" "AGE" "SEX" "MODE" "lat" "lon"
As we have the coordinates, we can convert this data frame to a spatial feature, as explained in the Introduction to spatial data section.
library(sf)
SURVEYgeo = st_as_sf(SURVEY, coords = c("lon", "lat"), crs = 4326) # convert to as sf dataUsing coordinates from Instituto Superior Técnico, we can directly create a simple feature and assign its crs.
UNIVERSITY = data.frame(place = "IST",
lon = -9.1397404,
lat = 38.7370168) |> # first a dataframe
st_as_sf(coords = c("lon", "lat"), # then a spacial feature
crs = 4326)Visualize in a map:
library(mapview)
mapview(SURVEYgeo, zcol = "MODE") + mapview(UNIVERSITY, col.region = "red", cex = 12)First we will create lines connecting the survey locations to the university, using the st_nearest_points() function.
This function finds returns the nearest points between two geometries, and creates a line between them. This can be useful to find the nearest train station to each point, for instance.
As we only have 1 point at UNIVERSITY layer, we will have the same number of lines as number of surveys = 200.
SURVEYeuclidean = st_nearest_points(SURVEYgeo, UNIVERSITY, pairwise = TRUE) |>
st_as_sf() # this creates lines
mapview(SURVEYeuclidean)Note that if we have more than one point in the second layer, the pairwise = TRUE will create a line for each combination of points. Set to FALSE if, for instance, you have the same number of points in both layers and want to create a line between the corresponding points.
Now we can estimate the distance using the st_length() function.
# compute the line length and add directly in the first survey layer
SURVEYgeo = SURVEYgeo |>
mutate(distance = st_length(SURVEYeuclidean))
# remove the units - can be useful
SURVEYgeo$distance = units::drop_units(SURVEYgeo$distance) We could also estimate the distance using the st_distance() function directly, although we would not get and sf with lines.
SURVEYgeo = SURVEYgeo |>
mutate(distance = st_distance(SURVEYgeo, UNIVERSITY)[,1] |> # in meters
units::drop_units()) |> # remove units
mutate(distance = round(distance)) # round to integer
summary(SURVEYgeo$distance) Min. 1st Qu. Median Mean 3rd Qu. Max.
298 1106 2186 2658 3683 8600
SURVEYgeo is still a points’ sf.
There are different types of routing engines, regarding the type of network they use, the type of transportation they consider, and the type of data they need. We can have:
Uni-modal vs. Multi-modal
Output level of the results
Routing profiles
Local vs. Remote (service request - usually web API)
openrouteserviceWe will use the openrouteservice r package to estimate the distances using a road network (Oleś 2025).
To properly use the openrouteservice package, you need to setup the ORS provider. See the setup instructions for more details.
We will use only respondents with a distance to the university less than 2 km.
SURVEYsample = SURVEYgeo |> filter(distance <= 2000)
nrow(SURVEYsample)[1] 95
We need an id (unique identifier) for each survey location, so we can compare them later.
Also, we need to extract the coordinates, from both datasets, to be used in the routing functions of openrouteservice::directions().
# create id columns for both datasets
SURVEYsample = SURVEYsample |>
mutate(id = c(1:nrow(SURVEYsample))) |> # from 1 to the number of rows
mutate(coordinates = st_coordinates(geometry)) # extract coordinates
UNIVERSITY = UNIVERSITY |>
mutate(id = 1) |> # only one row
mutate(coordinates = st_coordinates(geometry)) # extract coordinatesEstimate the routes with time and distance by car, from survey locations to University.
This one is not that easy to set-up because the function is prepared to retrieve only one result per request :( So we do a loop function.
library(openrouteservice)
SURVEYcar = data.frame() # initial empty data frame
# loop - the origin (i) is the survey location, and the UNIVERSITY is always the same destination
for (i in 1:nrow(SURVEYsample)) {
ROUTES1 = ors_directions(
data.frame(
lon = c(SURVEYsample$coordinates[i, 1], UNIVERSITY$coordinates[1, 1]),
lat = c(SURVEYsample$coordinates[i, 2], UNIVERSITY$coordinates[1, 2])
),
profile = "driving-car", # or cycling-regular foot-walking
preference = "fastest", # or shortest
output = "sf"
)
ROUTES1$distance = ROUTES1$summary[[1]]$distance # extract these values from summary
ROUTES1$duration = ROUTES1$summary[[1]]$duration
SURVEYcar = rbind(SURVEYcar, ROUTES1) # to keep adding in the same df
}
SURVEYcar = SURVEYcar |>
select(distance, duration, geometry) |> # discard unnecessary variables
mutate(ID = SURVEYsample$ID) # cbind with syrvey IDRepeat the same for foot-walking.
SURVEYwalk = data.frame() # initial empty data frame
# loop - the origin (i) is the survey location, and the UNIVERSITY is always the same destination
for (i in 1:nrow(SURVEYsample)) {
ROUTES1 = ors_directions(
data.frame(
lon = c(SURVEYsample$coordinates[i, 1], UNIVERSITY$coordinates[1, 1]),
lat = c(SURVEYsample$coordinates[i, 2], UNIVERSITY$coordinates[1, 2])
),
profile = "foot-walking", # or driving-car cycling-regular cycling-electric
preference = "fastest", # or shortest
output = "sf"
)
ROUTES1$distance = ROUTES1$summary[[1]]$distance # extract these values from summary
ROUTES1$duration = ROUTES1$summary[[1]]$duration
SURVEYwalk = rbind(SURVEYwalk, ROUTES1) # to keep adding in the same df
}
SURVEYwalk = SURVEYwalk |>
select(distance, duration, geometry) |> # discard unnecessary variables
mutate(ID = SURVEYsample$ID) # cbind with survey IDAnd for cycling-regular.
SURVEYbike = data.frame() # initial empty data frame
# loop - the origin (i) is the survey location, and the UNIVERSITY is always the same destination
for (i in 1:nrow(SURVEYsample)) {
ROUTES1 = ors_directions(
data.frame(
lon = c(SURVEYsample$coordinates[i, 1], UNIVERSITY$coordinates[1, 1]),
lat = c(SURVEYsample$coordinates[i, 2], UNIVERSITY$coordinates[1, 2])
),
profile = "cycling-regular",
preference = "fastest", # or shortest
output = "sf"
)
ROUTES1$distance = ROUTES1$summary[[1]]$distance # extract these values from summary
ROUTES1$duration = ROUTES1$summary[[1]]$duration
SURVEYbike = rbind(SURVEYbike, ROUTES1) # to keep adding in the same df
}
SURVEYbike = SURVEYbike |>
select(distance, duration, geometry) |> # discard unnecessary variables
mutate(ID = SURVEYsample$ID) # cbind with survey IDnames(SURVEYcar)[1] "distance" "duration" "ID" "geometry"
If we want to know only time and distance, and not the route itself, we can use the ors_matrix().
We can now compare the euclidean and routing distances that we estimated for the survey locations under 2 km.
summary(SURVEYsample$distance) # Euclidean Min. 1st Qu. Median Mean 3rd Qu. Max.
298 790 1046 1112 1470 1963
summary(SURVEYwalk$distance) # Walk Min. 1st Qu. Median Mean 3rd Qu. Max.
467.3 1009.0 1447.3 1471.8 1953.5 2769.4
summary(SURVEYcar$distance) # Car Min. 1st Qu. Median Mean 3rd Qu. Max.
435.4 1227.0 1686.7 1771.4 2210.3 3503.2
summary(SURVEYbike$distance) # Bike Min. 1st Qu. Median Mean 3rd Qu. Max.
436 1247 1752 1791 2225 3758
What can you understand from this results?
Compare 1 single route.
mapview(SURVEYeuclidean[165,], color = "black") + # 1556 meters
mapview(SURVEYwalk[78,], color = "red") + # 2126 meters
mapview(SURVEYcar[78,], color = "blue") + # 2058 meters
mapview(SURVEYbike[78,], color = "gold") # 2025 metersWith this we can see the circuity of the routes, a measure of route / transportation efficiency, which is the ratio between the routing distance and the euclidean distance.
The circuity for car (1.32) is usually lower than for walking (1.37) or biking, for longer distances, and higher opposite for shorter distances.
Visualize with transparency of 30%, to get a clue when they overlay.
mapview(SURVEYwalk, alpha = 0.3)mapview(SURVEYcar, alpha = 0.3, color = "red")mapview(SURVEYbike, alpha = 0.3, color = "darkgreen")We can also use the overline() function from stplanr package to break up the routes when they overline, and add them up.
# we create a value that we can later sum
# it can be the number of trips represented by this route
SURVEYwalk$trips = 1 # in this case is only one respondent per route
SURVEYwalk_overline = stplanr::overline(
SURVEYwalk,
attrib = "trips",
fun = sum
)
mapview(SURVEYwalk_overline, zcol = "trips", lwd = 3)With this we can visually inform on how many people travel along a route, from the survey dataset1.
Assuming all travel by the shortest path.↩︎