4  R basics

This chapter introduces the basics of R, with some exercises to get familiar with how R works.

4.1 Math operations

Addition

1+1
[1] 2

Subtraction

5-2
[1] 3

Multiplication

2*2
[1] 4

Division

8/2
[1] 4

Round the number

round(3.14)
[1] 3
round(3.14, 1) # The "1" rounds the result to 1 decimal place.
[1] 3.1

You can type ?round (or help(round)) in the console to see the description of the function and its default arguments.
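As a small sketch of how `round` differs from always rounding up or down, the lines below compare it with `floor` and `ceiling`. These two base R functions are not used elsewhere in this chapter and are shown only for contrast.

```r
x <- 3.14

round(x, 1)           # nearest value with 1 decimal place: 3.1
floor(x)              # always rounds down: 3
ceiling(x)            # always rounds up: 4
round(1234.5678, -2)  # negative digits round to tens, hundreds, ...: 1200
```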

4.2 Basic shortcuts

Combine values into a vector

c(1, 2, 3)
[1] 1 2 3
c(1:3) # The ":" creates the sequence of integers from the first number to the second.
[1] 1 2 3
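Once values are combined into a vector, most operations apply to every element at once. The following is a minimal sketch of that behaviour. The names `x`, `doubled`, `added`, and `evens` are chosen here for illustration, and `seq` is a base R function not otherwise used in this chapter.

```r
x <- c(1, 2, 3)

doubled <- x * 2               # each element is multiplied: 2 4 6
added   <- x + c(10, 20, 30)   # element-by-element addition: 11 22 33
length(x)                      # number of elements: 3
evens   <- seq(2, 10, by = 2)  # a sequence with a step of 2: 2 4 6 8 10
```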

Create a comment with Ctrl + Shift + C

# Comments help you organize your code. The software will not run the comment. 

Create a table

A simple table with the number of trips by car, public transport (PT), walking, and cycling on a hypothetical street segment over a certain period.

Define variables

modes <- c("car", "PT", "walking", "cycling") # you can assign with "=" or "<-"
Trips = c(200, 50, 300, 150) # R is case-sensitive: "Trips" and "trips" are different names

Join the variables to create a table

table_example = data.frame(modes, Trips)

Take a look at the table

View the table by clicking on it under “Data” in the “Environment” pane, or use:

View(table_example)

Look at the first row

table_example[1,] # rows and columns are numbered from 1 in R, unlike Python, where they start from 0.
  modes Trips
1   car   200

Look at the value in the first row and first column

table_example[1,1]
[1] "car"
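Rows and columns can also be selected by name or by condition rather than by position. The sketch below rebuilds the example table so that it runs on its own. `car_share` is a name introduced here for illustration.

```r
modes <- c("car", "PT", "walking", "cycling")
Trips <- c(200, 50, 300, 150)
table_example <- data.frame(modes, Trips)

table_example$Trips        # a whole column as a vector: 200 50 300 150
table_example[2, "Trips"]  # row 2 of the column named "Trips": 50

# The row (all columns) where modes is "walking"
table_example[table_example$modes == "walking", ]

# Share of car trips in the total, in percent
car_share <- table_example$Trips[1] / sum(table_example$Trips) * 100
```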

4.3 Practical exercise

Dataset: the number of trips between all municipalities in the Lisbon Metropolitan Area, Portugal (INE 2018).

Import dataset

You can click directly on the file in the “Files” pane, or:

data = readRDS("data/TRIPSmode.Rds")

After you type " you can press Tab to navigate between folders and files, and Enter to autocomplete.

Take a first look at the data

Summary statistics

summary(data)
    Origin          Destination            Total             Walk       
 Length:315         Length:315         Min.   :     7   Min.   :     0  
 Class :character   Class :character   1st Qu.:   330   1st Qu.:     0  
 Mode  :character   Mode  :character   Median :  1090   Median :     0  
                                       Mean   : 16825   Mean   :  4033  
                                       3rd Qu.:  5374   3rd Qu.:     0  
                                       Max.   :875144   Max.   :306289  
      Bike              Car            PTransit            Other        
 Min.   :   0.00   Min.   :     0   Min.   :     0.0   Min.   :    0.0  
 1st Qu.:   0.00   1st Qu.:   263   1st Qu.:     5.0   1st Qu.:    0.0  
 Median :   0.00   Median :   913   Median :   134.0   Median :    0.0  
 Mean   :  80.19   Mean   :  9956   Mean   :  2602.6   Mean   :  152.4  
 3rd Qu.:   0.00   3rd Qu.:  4408   3rd Qu.:   975.5   3rd Qu.:   62.5  
 Max.   :5362.00   Max.   :349815   Max.   :202428.0   Max.   :11647.0  

Check the structure of the data

str(data)
'data.frame':   315 obs. of  8 variables:
 $ Origin     : chr  "Alcochete" "Alcochete" "Alcochete" "Alcochete" ...
 $ Destination: chr  "Alcochete" "Almada" "Amadora" "Barreiro" ...
 $ Total      : num  20478 567 188 867 114 ...
 $ Walk       : num  6833 0 0 0 0 ...
 $ Bike       : num  320 0 0 0 0 0 0 0 91 0 ...
 $ Car        : num  12484 353 107 861 114 ...
 $ PTransit   : num  833 0 81 5 0 ...
 $ Other      : num  7 214 0 0 0 0 0 0 0 0 ...

Check the first values of each variable

data
head(data, 3) # first 3 rows
     Origin Destination Total Walk Bike   Car PTransit Other
1 Alcochete   Alcochete 20478 6833  320 12484      833     7
2 Alcochete      Almada   567    0    0   353        0   214
3 Alcochete     Amadora   188    0    0   107       81     0

Check the number of rows (observations) and columns (variables)

nrow(data)
[1] 315
ncol(data)
[1] 8

Open the dataset

View(data)

Explore the data

Check the total number of trips

Use $ to select a variable of the data

sum(data$Total)
[1] 5299853

Percentage of car trips relative to the total

sum(data$Car)/sum(data$Total) * 100
[1] 59.17638

Percentage of active trips (walking and cycling) relative to the total

(sum(data$Walk) + sum(data$Bike)) / sum(data$Total) * 100
[1] 24.44883
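The same percentage can be computed for every mode in one step. The sketch below is one possible way to do it with `colSums`, a base R function not used elsewhere in this chapter. To stay runnable without the data file, it uses only the first three rows of the dataset, copied from the `head(data, 3)` output above, so the resulting shares are not those of the full dataset. The names `trips`, `mode_cols`, `mode_totals`, and `mode_share` are chosen here for illustration.

```r
# First three rows of the dataset, copied from the head(data, 3) output
trips <- data.frame(
  Origin      = c("Alcochete", "Alcochete", "Alcochete"),
  Destination = c("Alcochete", "Almada", "Amadora"),
  Total       = c(20478, 567, 188),
  Walk        = c(6833, 0, 0),
  Bike        = c(320, 0, 0),
  Car         = c(12484, 353, 107),
  PTransit    = c(833, 0, 81),
  Other       = c(7, 214, 0)
)

mode_cols   <- c("Walk", "Bike", "Car", "PTransit", "Other")
mode_totals <- colSums(trips[, mode_cols])           # trips per mode, summed over rows
mode_share  <- mode_totals / sum(trips$Total) * 100  # percentage of each mode
round(mode_share, 1)
```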

Modify original data

Create a column with the sum of the number of trips for active modes

data$Active = data$Walk + data$Bike

Filter by condition (create new tables)

Keep only trips with origin in Lisbon

data_Lisbon = data[data$Origin == "Lisboa",]

Filter trips with origin different from Lisbon

data_out_Lisbon = data[data$Origin != "Lisboa",]

Filter trips with origin and destination in Lisbon

data_in_Out_Lisbon = data[data$Origin == "Lisboa" & data$Destination == "Lisboa",]
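The filters above all follow the same pattern: a logical condition selects the rows to keep. As a hedged sketch, the block below applies that pattern to a small toy table whose values are invented for illustration only. It also shows `subset`, a base R alternative not used in this chapter, and the `|` (or) operator. All object names are hypothetical.

```r
# Hypothetical toy data: values invented for illustration only
toy <- data.frame(
  Origin      = c("Lisboa", "Lisboa", "Almada", "Amadora"),
  Destination = c("Lisboa", "Almada", "Lisboa", "Amadora"),
  Total       = c(100, 40, 30, 20)
)

# subset() evaluates the condition inside the data frame, so "toy$" is not needed
from_lisbon   <- subset(toy, Origin == "Lisboa")
within_lisbon <- subset(toy, Origin == "Lisboa" & Destination == "Lisboa")

# "|" keeps rows where at least one of the conditions is true
to_or_from <- toy[toy$Origin == "Lisboa" | toy$Destination == "Lisboa", ]

nrow(from_lisbon)    # 2
nrow(within_lisbon)  # 1
nrow(to_or_from)     # 3
```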

Remove the first column

data = data[ ,-1] # a negative index drops that column (here the first one, Origin)

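Create a table only with destination and walking trips

The origin column was removed in the previous step, so the columns are now numbered starting from the destination. There are many ways to do the same operation.

names(data)
[1] "Destination" "Total"       "Walk"        "Bike"        "Car"        
[6] "PTransit"    "Other"       "Active"     
data_walk2 = data[ ,c(1,3)] # keep columns 1 (Destination) and 3 (Walk)
data_walk3 = data[ ,-c(2,4:8)] # drop every other column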
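Column positions shift whenever a column is added or removed, so selecting by name is often safer than selecting by number. The following is a minimal sketch on a small table built from values shown in the `head(data, 3)` output. The names `toy`, `by_name`, and `by_drop` are hypothetical.

```r
# Small stand-in table, with values taken from the head(data, 3) output
toy <- data.frame(
  Destination = c("Alcochete", "Almada", "Amadora"),
  Total       = c(20478, 567, 188),
  Walk        = c(6833, 0, 0),
  Bike        = c(320, 0, 0)
)

# Keep columns by name
by_name <- toy[, c("Destination", "Walk")]

# Drop columns by name: keep every column whose name is not in the list
by_drop <- toy[, !(names(toy) %in% c("Total", "Bike"))]

names(by_name)  # "Destination" "Walk"
names(by_drop)  # "Destination" "Walk"
```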

Export data

Save data in .csv and .Rds

write.csv(data, 'data/dataset.csv', row.names = FALSE)
saveRDS(data, 'data/dataset.Rds') # Use a different file name so the original file is not overwritten.

Import data

csv_file = read.csv("data/dataset.csv")
rds_file = readRDS("data/dataset.Rds")
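The two formats behave slightly differently when read back. The sketch below illustrates this with a small toy table written to temporary files, so that it runs without the `data` folder. `tempfile` is a base R function not used in this chapter, and the object names are hypothetical.

```r
toy <- data.frame(
  Destination = c("Alcochete", "Almada"),
  Total       = c(20478, 567)
)

# Temporary file paths, so nothing in the project folder is touched
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".Rds")

write.csv(toy, csv_path, row.names = FALSE)
saveRDS(toy, rds_path)

from_csv <- read.csv(csv_path)
from_rds <- readRDS(rds_path)

class(toy$Total)       # "numeric"
class(from_rds$Total)  # "numeric": .Rds stores the R object with its types
class(from_csv$Total)  # "integer": .csv is plain text, so types are guessed again on import
```

A .csv file can be opened in almost any program, while .Rds is specific to R but keeps column types as they were.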