Get your dataset ready!
Using R and GIS
Materials for the course delivered at the EIT Doctoral Training Network Annual Forum, in Gent (Belgium), 19th and 20th September 2024.
1 Introduction
This course provides tools for exploring and preparing transportation datasets with R, an open-source programming language widely used for data analytics in urban mobility.
It also offers guidance on reproducible methods for handling large datasets that require manipulation and/or spatial analysis.
The course has a hands-on approach, where participants will learn the basics of coding, data manipulation, and spatial analysis for urban mobility and transportation.
1.1 Mobility data
New forms of technology are generating a growing volume of mobility data, resulting in very large and diverse datasets.
Knowing how to get, treat and analyze complex datasets with up-to-date technologies is extremely relevant for academia, policy makers and start-ups, since it allows them to:
acquire a critical view of urban mobility based on data;
spatially identify locations in the city that require policy priorities;
and improve the efficiency of data analysis processes.
Why R and GIS
Most academic programs focus on teaching modelling and in-depth data analysis. However, there is also a need to learn how to explore and prepare a dataset for modelling. Programming and GIS techniques offer considerable advantages here: flexibility, reproducibility, and transparency, since every step of the process is explicit and can be understood.
GIS techniques are traditionally not covered in transportation programs, despite being highly relevant when doing accessibility analysis or dealing with georeferenced transportation data, such as bike-sharing trip route datasets, origin-destination flow datasets, home/work locations, GTFS public transit data, and so on. There is a need to learn how to locate these open datasets, how to explore them, and how to integrate them into transportation and urban analysis. Additionally, the use of open-source software and datasets allows researchers to apply methods that are reproducible and transparent.
TL;DR
Open-source tools widely used in data analytics and spatial analysis
Flexibility and reproducibility in data manipulation and visualization
Critical for urban mobility and transportation research, with spatial relevance
Large transportation datasets are becoming increasingly common
1.2 Course objectives
Introduce R Programming Basics
Equip participants with foundational skills in R programming
Emphasize reproducible research practices to ensure transparency and replicability in analyses
Teach Data Manipulation Techniques
Use key R packages for data cleaning, manipulation, and summarization of datasets
Enable participants to efficiently handle large and complex transportation datasets
Spatial Data Visualization
Introduce methods for quick and effective spatial data visualization using R and GIS tools
Provide hands-on experience with creating interactive maps and visualizations
Perform Basic Spatial Analysis
Teach participants how to perform spatial analysis of transportation datasets using GIS techniques with R
Cover practical applications such as georeferencing data, accessibility analysis, and routing origin-destination (OD) pairs
Utilize real-world transportation data for practical, hands-on learning
1.3 Target audience
Ph.D. candidates from the DTN and other researchers
Policy makers and practitioners in urban mobility
Beginner to intermediate R users; no prior experience needed
1.4 Recommended readings
- Engel (2023) Introduction to R.
- Lovelace, Nowosad, and Muenchow (2024) Geocomputation with R.
- Pereira and Herszenhut (2023) Introduction to urban accessibility: a practical guide with R.