1+1
[1] 2
In this chapter we will introduce to the R basics and some exercises to get familiar to how R works.
1+1
[1] 2
5-2
[1] 3
2*2
[1] 4
8/2
[1] 4
round(3.14)
[1] 3
round(3.14, 1) # The "1" indicates to round it up to 1 decimal digit.
[1] 3.1
You can use help ?round
in the console to see the description of the function, and the default arguments.
c(1, 2, 3)
[1] 1 2 3
c(1:3) # The ":" indicates a range between the first and second numbers.
[1] 1 2 3
ctrl + shift + c
# Comments help you organize your code. The software will not run the comment.
A simple table with the number of trips by car, PT, walking, and cycling in a hypothetical street segment at a certain period.
Define variables
<- c("car", "PT", "walking", "cycling") # you can use "=" or "<-"
modes = c(200, 50, 300, 150) # uppercase letters modify Trips
Join the variables to create a table
= data.frame(modes, Trips) table_example
Take a look at the table
Visualize the table by clicking on the “Data” in the “Environment” page or use :
View(table_example)
Look at the first row
1,] #rows and columns start from 1 in R, differently from Python which starts from 0. table_example[
modes Trips
1 car 200
Look at first row and column
1,1] table_example[
[1] "car"
Dataset: the number of trips between all municipalities in the Lisbon Metropolitan Area, Portugal (INE 2018).
You can click directly in the file under the “Files” pan, or:
= readRDS("data/TRIPSmode.Rds") data
After you type "
you can use tab
to navigate between folders and files and enter
to autocomplete.
Summary statistics
summary(data)
Origin Destination Total Walk
Length:315 Length:315 Min. : 7 Min. : 0
Class :character Class :character 1st Qu.: 330 1st Qu.: 0
Mode :character Mode :character Median : 1090 Median : 0
Mean : 16825 Mean : 4033
3rd Qu.: 5374 3rd Qu.: 0
Max. :875144 Max. :306289
Bike Car PTransit Other
Min. : 0.00 Min. : 0 Min. : 0.0 Min. : 0.0
1st Qu.: 0.00 1st Qu.: 263 1st Qu.: 5.0 1st Qu.: 0.0
Median : 0.00 Median : 913 Median : 134.0 Median : 0.0
Mean : 80.19 Mean : 9956 Mean : 2602.6 Mean : 152.4
3rd Qu.: 0.00 3rd Qu.: 4408 3rd Qu.: 975.5 3rd Qu.: 62.5
Max. :5362.00 Max. :349815 Max. :202428.0 Max. :11647.0
Check the structure of the data
str(data)
'data.frame': 315 obs. of 8 variables:
$ Origin : chr "Alcochete" "Alcochete" "Alcochete" "Alcochete" ...
$ Destination: chr "Alcochete" "Almada" "Amadora" "Barreiro" ...
$ Total : num 20478 567 188 867 114 ...
$ Walk : num 6833 0 0 0 0 ...
$ Bike : num 320 0 0 0 0 0 0 0 91 0 ...
$ Car : num 12484 353 107 861 114 ...
$ PTransit : num 833 0 81 5 0 ...
$ Other : num 7 214 0 0 0 0 0 0 0 0 ...
Check the first values of each variable
data
head(data, 3) # first 3 values
Origin Destination Total Walk Bike Car PTransit Other
1 Alcochete Alcochete 20478 6833 320 12484 833 7
2 Alcochete Almada 567 0 0 353 0 214
3 Alcochete Amadora 188 0 0 107 81 0
Check the number of rows (observations) and columns (variables)
nrow(data)
[1] 315
ncol(data)
[1] 8
Open the dataset
View(data)
Check the total number of trips
Use $
to select a variable of the data
sum(data$Total)
[1] 5299853
Percentage of car trips related to the total
sum(data$Car)/sum(data$Total) * 100
[1] 59.17638
Percentage of active trips related to the total
sum(data$Walk) + sum(data$Bike)) / sum(data$Total) * 100 (
[1] 24.44883
Create a column with the sum of the number of trips for active modes
$Active = data$Walk + data$Bike data
Filter by condition (create new tables)
Filter trips only with origin from Lisbon
= data[data$Origin == "Lisboa",] data_Lisbon
Filter trips with origin different from Lisbon
= data[data$Origin != "Lisboa",] data_out_Lisbon
Filter trips with origin and destination in Lisbon
= data[data$Origin == "Lisboa" & data$Destination == "Lisboa",] data_in_Out_Lisbon
Remove the first column
= data[ ,-1] #first column data
Create a table only with origin, destination and walking trips
There are many ways to do the same operation.
names(data)
[1] "Destination" "Total" "Walk" "Bike" "Car"
[6] "PTransit" "Other" "Active"
= data[ ,c(1,2,4)] data_walk2
= data[ ,-c(3,5:9)] data_walk3
Save data in .csv and .Rds
write.csv(data, 'data/dataset.csv', row.names = FALSE)
saveRDS(data, 'data/dataset.Rds') #Choose a different file.
= read.csv("data/dataset.csv")
csv_file = readRDS("data/dataset.Rds") rds_file