In this chapter we introduce the basics of R, with some exercises to get familiar with how R works.
1+1
[1] 2
5-2
[1] 3
2*2
[1] 4
8/2
[1] 4
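Besides +, -, * and /, R has operators for powers, integer division and remainders, which come up often when working with counts. A short sketch, with the expected results in the comments:

```r
2^3      # exponentiation: 8
7 %/% 3  # integer division: 2
7 %% 3   # remainder of the division (modulo): 1
sqrt(16) # square root: 4
```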
round(3.14)
[1] 3
round(3.14, 1) # The second argument, 1, is the number of decimal places to keep.
[1] 3.1
Type ?round (or help(round)) in the console to open the function's documentation, including its default arguments.
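round() does not always round a trailing 5 up. R's documentation for round() says it follows the IEC 60559 standard and goes to the even digit, so 2.5 becomes 2. A small sketch comparing it with floor() and ceiling(), which always round down and up:

```r
round(2.5)        # 2: the halfway case goes to the even number
round(3.5)        # 4
round(3.14159, 2) # 3.14
floor(3.7)        # 3: always rounds down
ceiling(3.2)      # 4: always rounds up
round(1234, -2)   # 1200: negative digits round to tens, hundreds, ...
```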
c(1, 2, 3)
[1] 1 2 3
c(1:3) # ":" creates the sequence of integers from the first number to the second; 1:3 alone gives the same result.
[1] 1 2 3
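Vectors created with c() or ":" can be used directly in calculations, and R applies the operation to each element. A sketch of a few common vector functions (x and y are illustrative names):

```r
x <- c(1, 2, 3)
y <- 1:3

x * 2                 # each element times 2: 2 4 6
x + y                 # element-wise sum: 2 4 6
seq(0, 10, by = 2)    # sequence with a step: 0 2 4 6 8 10
rep("car", times = 3) # repeat a value: "car" "car" "car"
length(x)             # number of elements: 3
class(x)              # "numeric"
class(y)              # "integer": ":" creates integers
```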
In RStudio, Ctrl + Shift + C comments or uncomments the selected lines.
# Comments help you organize your code. R does not run anything written after the # on a line.
A simple table with the number of trips by car, PT (public transport), walking, and cycling in a hypothetical street segment during a given period.
Define variables
modes <- c("car", "PT", "walking", "cycling") # you can assign with either "=" or "<-"
Trips = c(200, 50, 300, 150) # R is case-sensitive: Trips and trips are different names
Join the variables to create a table
table_example = data.frame(modes, Trips)
Take a look at the table
View the table by clicking on its name under “Data” in the “Environment” pane, or use:
View(table_example)
Look at the first row
table_example[1,] # Rows and columns are indexed from 1 in R, unlike Python, which starts from 0.
  modes Trips
1   car   200
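Rows and columns can also be selected by a range of positions, by column name, or by a condition. A sketch that recreates the example table so it runs on its own:

```r
modes <- c("car", "PT", "walking", "cycling")
Trips <- c(200, 50, 300, 150)
table_example <- data.frame(modes, Trips)

table_example[2:3, ]       # rows 2 to 3, all columns
table_example[, "Trips"]   # the Trips column, as a vector
table_example$Trips        # the same column, using $
# Rows where the condition is TRUE, column Trips: 300
table_example[table_example$modes == "walking", "Trips"]
```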
Look at the first row and first column
table_example[1,1]
[1] "car"
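New columns can be computed from existing ones. A minimal sketch that adds each mode's share of the trips in the hypothetical street segment; the Share column name is an illustrative choice:

```r
modes <- c("car", "PT", "walking", "cycling")
Trips <- c(200, 50, 300, 150)
table_example <- data.frame(modes, Trips)

# Share of each mode in the total number of trips (700), in percent, 1 decimal place
table_example$Share <- round(table_example$Trips / sum(table_example$Trips) * 100, 1)
table_example
```

The division is applied to every row at once, so no loop is needed.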
Dataset: the number of trips between all municipalities in the Lisbon Metropolitan Area, Portugal (INE 2018).
You can click the file directly in the “Files” pane, or run:
data = readRDS("data/TRIPSmode.Rds")
After you type " inside the parentheses, you can press Tab to navigate between folders and files, and Enter to autocomplete.
Summary statistics
summary(data)
    Origin          Destination            Total             Walk
 Length:315         Length:315         Min.   :     7   Min.   :     0
 Class :character   Class :character   1st Qu.:   330   1st Qu.:     0
 Mode  :character   Mode  :character   Median :  1090   Median :     0
                                       Mean   : 16825   Mean   :  4033
                                       3rd Qu.:  5374   3rd Qu.:     0
                                       Max.   :875144   Max.   :306289
      Bike              Car            PTransit            Other
 Min.   :   0.00   Min.   :     0   Min.   :     0.0   Min.   :    0.0
 1st Qu.:   0.00   1st Qu.:   263   1st Qu.:     5.0   1st Qu.:    0.0
 Median :   0.00   Median :   913   Median :   134.0   Median :    0.0
 Mean   :  80.19   Mean   :  9956   Mean   :  2602.6   Mean   :  152.4
 3rd Qu.:   0.00   3rd Qu.:  4408   3rd Qu.:   975.5   3rd Qu.:   62.5
 Max.   :5362.00   Max.   :349815   Max.   :202428.0   Max.   :11647.0
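Each statistic in summary() comes from a function you can call on its own, which is useful when you need just one value. A sketch on a hypothetical stand-in vector holding the Total values of the first three rows of the dataset:

```r
# Stand-in for data$Total: Total values of the first three rows only
total <- c(20478, 567, 188)

summary(total)                  # the same statistics as above, for one variable
min(total)                      # 188
median(total)                   # 567
mean(total)                     # about 7077.7
max(total)                      # 20478
quantile(total, c(0.25, 0.75))  # 1st and 3rd quartiles: 377.5 and 10522.5
```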
Check the structure of the data
str(data)
'data.frame': 315 obs. of 8 variables:
$ Origin : chr "Alcochete" "Alcochete" "Alcochete" "Alcochete" ...
$ Destination: chr "Alcochete" "Almada" "Amadora" "Barreiro" ...
$ Total : num 20478 567 188 867 114 ...
$ Walk : num 6833 0 0 0 0 ...
$ Bike : num 320 0 0 0 0 0 0 0 91 0 ...
$ Car : num 12484 353 107 861 114 ...
$ PTransit : num 833 0 81 5 0 ...
$ Other : num 7 214 0 0 0 0 0 0 0 0 ...
Check the first values of each variable
head(data, 3) # first 3 rows
     Origin Destination Total Walk Bike   Car PTransit Other
1 Alcochete   Alcochete 20478 6833  320 12484      833     7
2 Alcochete      Almada   567    0    0   353        0   214
3 Alcochete     Amadora   188    0    0   107       81     0
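tail() is the counterpart of head() and shows the last rows; both take the number of rows as their second argument. A sketch on a hypothetical three-row stand-in (data_sample) built from the rows printed above:

```r
# Stand-in for data: the first three rows shown by head(data, 3)
data_sample <- data.frame(
  Origin      = c("Alcochete", "Alcochete", "Alcochete"),
  Destination = c("Alcochete", "Almada", "Amadora"),
  Total       = c(20478, 567, 188),
  Walk        = c(6833, 0, 0),
  Bike        = c(320, 0, 0),
  Car         = c(12484, 353, 107),
  PTransit    = c(833, 0, 81),
  Other       = c(7, 214, 0)
)

tail(data_sample, 1)        # last row only
head(data_sample$Total, 2)  # head() also works on a single column: 20478 567
```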
Check the number of rows (observations) and columns (variables)
nrow(data)
[1] 315
ncol(data)
[1] 8
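dim() returns the number of rows and columns at once, and names() lists the variables. A sketch on the same hypothetical three-row stand-in (the full dataset gives 315 and 8):

```r
# Stand-in for data: the first three rows of the dataset
data_sample <- data.frame(
  Origin      = c("Alcochete", "Alcochete", "Alcochete"),
  Destination = c("Alcochete", "Almada", "Amadora"),
  Total       = c(20478, 567, 188),
  Walk        = c(6833, 0, 0),
  Bike        = c(320, 0, 0),
  Car         = c(12484, 353, 107),
  PTransit    = c(833, 0, 81),
  Other       = c(7, 214, 0)
)

dim(data_sample)    # 3 8: rows first, then columns
names(data_sample)  # "Origin" "Destination" "Total" ... "Other"
```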
Open the dataset
View(data)
Check the total number of trips
Use $ to select a variable of the data
sum(data$Total)
[1] 5299853
Percentage of car trips relative to the total
sum(data$Car)/sum(data$Total) * 100
[1] 59.17638
Percentage of active trips (walking and cycling) relative to the total
(sum(data$Walk) + sum(data$Bike)) / sum(data$Total) * 100
[1] 24.44883
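Instead of one sum() per mode, colSums() adds up several columns at once, which gives the share of every mode in a single step. A sketch on the hypothetical three-row stand-in, so the percentages differ from the full dataset; mode_cols and shares are illustrative names:

```r
# Stand-in for data: the first three rows of the dataset
data_sample <- data.frame(
  Origin      = c("Alcochete", "Alcochete", "Alcochete"),
  Destination = c("Alcochete", "Almada", "Amadora"),
  Total       = c(20478, 567, 188),
  Walk        = c(6833, 0, 0),
  Bike        = c(320, 0, 0),
  Car         = c(12484, 353, 107),
  PTransit    = c(833, 0, 81),
  Other       = c(7, 214, 0)
)

mode_cols <- c("Walk", "Bike", "Car", "PTransit", "Other")
# Sum of each mode column, divided by the total number of trips, in percent
shares <- colSums(data_sample[, mode_cols]) / sum(data_sample$Total) * 100
round(shares, 2)
```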
Create a column with the sum of the number of trips for active modes
data$Active = data$Walk + data$Bike
Filter by condition (create new tables)
Filter trips only with origin from Lisbon
data_Lisbon = data[data$Origin == "Lisboa",]
Filter trips with origin different from Lisbon
data_out_Lisbon = data[data$Origin != "Lisboa",]
Filter trips with origin and destination in Lisbon
data_in_Out_Lisbon = data[data$Origin == "Lisboa" & data$Destination == "Lisboa",]
Remove the first column
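data_no_origin = data[ ,-1] # drops the first column (Origin); saving it to a new object keeps Origin in data for the next steps
Create a table only with origin, destination and walking trips
There are many ways to do the same operation.
names(data)
[1] "Origin"      "Destination" "Total"       "Walk"        "Bike"
[6] "Car"         "PTransit"    "Other"       "Active"
data_walk2 = data[ ,c(1,2,4)] # keep columns 1, 2 and 4: Origin, Destination, Walk
data_walk3 = data[ ,-c(3,5:9)] # drop columns 3 and 5 to 9, which leaves the same three
Save data in .csv and .Rds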
write.csv(data, 'data/dataset.csv', row.names = FALSE)
saveRDS(data, 'data/dataset.Rds') # choose a file name different from the original, so the raw data is not overwritten
csv_file = read.csv("data/dataset.csv")
rds_file = readRDS("data/dataset.Rds")
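The two formats do not return exactly the same object. .Rds stores the R object as it is, while .csv is plain text and R guesses the column types when reading it back. A sketch using a hypothetical small table and temporary files (so nothing is written to the data folder):

```r
# Hypothetical small table standing in for data
trips <- data.frame(modes = c("car", "PT"), Trips = c(200, 50))

# Temporary file paths, deleted when the R session ends
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".Rds")

write.csv(trips, csv_path, row.names = FALSE)
saveRDS(trips, rds_path)

csv_back <- read.csv(csv_path)
rds_back <- readRDS(rds_path)

identical(rds_back, trips)  # TRUE: the object comes back unchanged
class(trips$Trips)          # "numeric"
class(csv_back$Trips)       # "integer": whole numbers in a .csv are read as integers
```

.csv is convenient for sharing with other software, while .Rds is the safer choice for reloading data in R.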