R Prog. Part 1


Done by: Low Yi Xiang



The R programming

R is an open source language with multiple libraries available. It is a widely used programming language for various purposes, such as data wrangling, visualization, modeling and even building this deck!


R's libraries go well beyond data wrangling: they can build dashboards, rich interactive graphics, or slides such as the one you are looking at right now!

Downloading R

Head over to https://cran.r-project.org/bin/ or search Google for "Install R" and click on the first link.


Choose the operating system your machine runs, then download and install the software.


Download RStudio

Head over to the RStudio website https://www.rstudio.com/products/rstudio/download2/ , download RStudio (free license) and install it.


Launch RStudio and you should arrive at its main window:

RStudio!


Scripting Environment

Click on the top-left icon with the green "plus" sign to open a new script.


Saving Scripts

You can create a folder of your choice and click on the save icon (floppy disk). A pop-up menu should guide you along.


Alternatively, if you prefer to save a new copy at a different location, do the usual File -> Save as -> ...

Learning R (part 1)

Things we will go through in the next 1-2 hours!


Basic calculations

In any programming language, brackets are very important (e.g. every opening bracket must be closed, and commas must be used carefully).

In R, you can do basic calculations as in most scientific computing languages, such as Matlab or Python.

1+1 #addition

10-2 #subtraction

100+2 - 4 #addition and subtraction

224*2 #multiplication

84/4 #division

2^7 #power

R can handle strings too!

"a"
"b"
"this is a cat"


Assigning Variables

In R, you can assign values or the results of calculations to names with the "<-" or "=" symbol.

one <- 1
two <- 2
three <- 3
cat <- "cat"
dog <- "dog"

You can print them out by typing their names or with the print command.

print(cat)
## [1] "cat"


Do More Stuff With Variables!

You can perform calculations with variables, for example:

one+one

two/three

three*two + one 

However, if you have not defined the variable, an error is returned

four
## Error in eval(expr, envir, enclos): object 'four' not found

You can also assign the results of these calculations to new variables:

four = two*two
four = three+one

print(four)
## [1] 4


if-else statements

You can write conditions to check on variables, including other kinds of variables such as lists, matrices and dataframes. Recall that brackets are important, so watch out for them!


Understanding conditions

The if statement allows you to check a condition, and the condition must evaluate to TRUE or FALSE. Examples of conditions can be found below:

animal <- "cat"
print(animal == "cat")
## [1] TRUE

There are other functions (more on that later) and comparison symbols, such as != (not equal to), >= (greater than or equal to), and <= (smaller than or equal to).

Examples:
one <- 1 ; two <- 2
two <= one
## [1] FALSE
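
The comparison symbols listed above can be tried side by side. This is only a small sketch; the variable names on the left-hand side are made up for this illustration.

```r
# Values chosen only for this illustration
one <- 1
two <- 2

not_equal        <- one != two  # TRUE: 1 is not equal to 2
greater_or_equal <- two >= one  # TRUE: 2 is at least 1
smaller_or_equal <- two <= two  # TRUE: 2 is equal to itself
strictly_greater <- one > two   # FALSE: 1 is not greater than 2

print(c(not_equal, greater_or_equal, smaller_or_equal, strictly_greater))
```

Each comparison returns a single TRUE or FALSE, which is exactly what an if statement needs.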


Writing your first if statement

Here is an example of an if statement. (animal == "cat") is the condition, and the round brackets are required.

In addition, notice the curly brackets, which enclose the code that runs when the condition is TRUE.

animal <- "cat"
if(animal == "cat"){
  print("your animal is a cat!")
}
## [1] "your animal is a cat!"

Exercise:

Question 1.1

What happens when you declare a variable animal <- "cat" and write an if statement that prints "your animal is a bird" if the animal is a bird?

Question 1.2

Write another if statement that prints "your animal is not a dog" if the animal is not a dog.


Answer for Question 1

Question 1.1 Solution

animal <- "cat"

if(animal == "bird"){
  print("your animal is a bird")
}

Nothing is printed out!

Question 1.2 Solution

animal <- "cat"

if(animal != "dog"){
  print("your animal is not a dog")
}
## [1] "your animal is not a dog"

What if, for question 1.1, you wanted to extend the if statement so that it also prints "your animal is not a bird" when the animal is not a bird?


Else statement

Introducing the else statement

Example:

animal <- "cat"

if(animal == "bird"){
  print("your animal is a bird")
}else{
  print("your animal is not a bird")
}
## [1] "your animal is not a bird"

What if you have multiple conditions you want to check? One way to do it is to write multiple else-if statements.

animal <- "cat"

if(animal == "bird"){
  print("your animal is a bird")
}else if(animal == "dog"){
  print("your animal is a dog")
}else{
  print("your animal is not a bird or dog")
}
## [1] "your animal is not a bird or dog"


ifelse statements

There is also an ifelse function that is fairly convenient for simple tasks. In R, you can type ?ifelse, or any function name with a question mark in front, to access the documentation.


Try the command ?ifelse and read the documentation now.


Here's an additional example:

first_digit <- 2
second_digit <- 4

ifelse(first_digit > second_digit, "first digit is bigger", "second digit is bigger or equal")
## [1] "second digit is bigger or equal"
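
One reason ifelse is convenient is that it checks every element of a vector in one go, whereas a plain if statement expects a single TRUE or FALSE. A minimal sketch, where the scores and the pass mark of 50 are made-up values:

```r
# Hypothetical exam scores, used only for this illustration
scores <- c(35, 80, 55)

# ifelse tests each element and returns one answer per element
result <- ifelse(scores >= 50, "pass", "fail")
print(result)
## [1] "fail" "pass" "pass"
```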


More on conditions

You should have noticed by now that all of these statements require a condition that evaluates to either TRUE or FALSE.

You can combine these conditions with the and (&, &&) and or (|, ||) operators, or assign them to variables.

digit1 <- 1 ; digit2 <- 2 ; digit3 <- 3
condition1 <- digit2 >= digit1
condition2 <- digit3 <= digit2

print(condition1 & condition2 ) #TRUE AND FALSE = FALSE
## [1] FALSE
print(condition1 || condition2) #TRUE OR FALSE = TRUE 
## [1] TRUE
#you can then stack them together. 
condition3 <- condition1 || condition2
if(condition3){
  #code here
}
## NULL
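
A short note on the single and double symbols: & and | compare vectors element by element, while && and || are meant for single TRUE/FALSE values, which is what an if statement needs. The sketch below uses made-up values to show both.

```r
# Made-up values for this illustration
digits <- c(1, 5, 10)

# & works element by element and returns one result per element
in_range <- digits >= 2 & digits <= 8
print(in_range)
## [1] FALSE  TRUE FALSE

# && is used with single values, e.g. inside an if statement
single <- 5
single_in_range <- single >= 2 && single <= 8
if(single_in_range){
  print("single is between 2 and 8")
}

# ! flips TRUE to FALSE and FALSE to TRUE
print(!single_in_range)
## [1] FALSE
```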


Vectors

In R, sometimes you want to store multiple values, such as observations of people's heights. You can store them in a vector by using c(). More information can be found with ?c

Example:

random_numbers <- c(1,6,3,1,8,5,7,9,0,2)

You can perform math operations on vectors, try them out!

sum(random_numbers) #find the sum
mean(random_numbers) #find the average
mode(random_numbers) #returns the storage mode ("numeric"), not the most frequent item
min(random_numbers) #find the minimum number 
max(random_numbers) #find the maximum number
random_numbers*2  #multiply each element by 2
random_numbers -1 #subtract 1 from each element

There are a lot more functions available for vectors, but usually you will google them as you need them along the way.
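
A word of caution on mode(): in R it reports how the values are stored (for example "numeric"), not the most frequent value. Base R has no built-in function for the statistical mode, so one possible approach, sketched below, is to count the values with table() and pick the largest count with which.max(). If several values tie, this sketch simply returns the first one.

```r
random_numbers <- c(1, 6, 3, 1, 8, 5, 7, 9, 0, 2)

print(mode(random_numbers))  # storage mode, not the statistical mode
## [1] "numeric"

# Count how many times each value appears
counts <- table(random_numbers)

# Take the name of the largest count and turn it back into a number
most_frequent <- as.numeric(names(counts)[which.max(counts)])
print(most_frequent)
## [1] 1
```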


Vectors Methods

Vectors come with helper functions, such as length(vector), which returns the length of the vector. You can also index specific parts of a vector with square brackets. Multiple elements can be extracted with another vector as follows:

random_numbers[1] #extract the first element
## [1] 1
random_numbers[c(4,6)] #extract the 4th and 6th element
## [1] 1 5

You can also use a TRUE/FALSE vector

greater_than_two <- random_numbers > 2
greater_than_two
##  [1] FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
random_numbers[greater_than_two]
## [1] 6 3 8 5 7 9
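
A few more indexing tricks build on the same ideas. The sketch below reuses the same vector; the names on the left-hand side are made up for this illustration.

```r
random_numbers <- c(1, 6, 3, 1, 8, 5, 7, 9, 0, 2)

n <- length(random_numbers)        # number of elements
last_element <- random_numbers[n]  # use the length to get the last element

# which() gives the positions where the condition is TRUE
positions <- which(random_numbers > 2)
print(positions)
## [1] 2 3 5 6 7 8

# Two conditions can be combined with & before indexing
between <- random_numbers[random_numbers > 2 & random_numbers < 8]
print(between)
## [1] 6 3 5 7

# A negative index drops that element instead of selecting it
without_first <- random_numbers[-1]
```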


Lists

One limitation of vectors is that they can only store individual elements. Suppose that you want to store 2 different vectors together; c() simply merges them into one flat vector:

a <- c(1,2,3)
b <- c(4,5,6)
new_vect <- c(a,b)
print(new_vect)
## [1] 1 2 3 4 5 6

Lists overcome this problem:

new_list <- list(a,b)
print(new_list)
## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] 4 5 6


Indexing Lists

Lists can be indexed in the same way as vectors, but they need double square brackets to extract an element: list[[element]].

Example:

new_list[[c(1)]]
## [1] 1 2 3

In Lists, it is also possible to assign names and index them by their names. For example:

new_list2 <- list(first_A=a,second_B=b)
new_list2[["first_A"]]
## [1] 1 2 3
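
Named elements can also be reached with the $ symbol, and names() lists the names that are available. A small sketch follows; the element third_C is a made-up addition for this illustration.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)
new_list2 <- list(first_A = a, second_B = b)

print(names(new_list2))  # see which names exist
## [1] "first_A"  "second_B"

second <- new_list2$second_B  # same result as new_list2[["second_B"]]
print(second)
## [1] 4 5 6

# Assigning to a new name adds an element to the list
new_list2[["third_C"]] <- c(7, 8, 9)
print(length(new_list2))
## [1] 3
```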


Indexing Lists (Part2)

In lists, indexing multiple elements is slightly different: use single square brackets, which return a list.

Example

new_list[c(1,2)] 
## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] 4 5 6

Using double square brackets with a vector would mean that you are taking sub-elements.

new_list[[c(1,2)]] #take the first element, then take its second element
## [1] 2

Usually to take sub-elements, one would extract out each element and use appropriate indexing for that class. In this example the class happens to be a vector.

new_list[[1]][c(1,2)]
## [1] 1 2


List (Advanced)

Lists have apply functions (such as lapply) that are beyond the scope of this course. You can type ?lapply to find out more.


Do note that with the introduction of dataframes and the dplyr package (part 2), most people prefer using dataframes rather than lists when it comes to manipulating data.


Nevertheless, lists are still very important and can be extremely useful, as they can store many different variables. In fact, they can store just about anything, such as models, dataframes and functions. They can also be used for functional programming, which is considered an advanced topic.
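
For the curious, here is a small taste of the apply functions. This is only a sketch with made-up data: lapply runs a function on every element of a list and returns a list, while sapply does the same but tries to simplify the result into a vector.

```r
# Made-up list for this illustration
number_list <- list(first = c(1, 2, 3), second = c(4, 5, 6))

# lapply: apply mean() to each element, get a list back
means_as_list <- lapply(number_list, mean)
print(means_as_list$first)
## [1] 2

# sapply: same idea, but the result is simplified to a named vector
means_as_vector <- sapply(number_list, mean)
print(means_as_vector)
##  first second 
##      2      5
```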


Functions

If you are familiar with programming, you will understand that functions are very important.


For those who are not familiar, functions are essentially reusable pieces of code that you can apply to variables, so that the same process always produces the same kind of output.


In other words, sometimes you need to apply the same code multiple times. This is where functions are useful, as they (1) reduce the code you need to write, (2) improve readability, and (3) save time.



Functions (Example)

For instance, suppose you need to square a variable, subtract the original value from it, and add 8.

x <- 2 
x <- x^2- x + 8 
x
## [1] 10

Suppose you need to do it multiple times; a function might be a better approach.

x<-2
function_example <- function(x){
  x <- x^2-x+8
  return(x)
}
x <- function_example(x)
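
The benefit shows once the function is called more than once. The sketch below defines the same calculation again so that it runs on its own (the names result, y1, y2 and y3 are made up for this illustration) and then reuses it with different inputs.

```r
function_example <- function(x){
  result <- x^2 - x + 8  # square, subtract the original value, add 8
  return(result)
}

y1 <- function_example(2)
print(y1)
## [1] 10

y2 <- function_example(3)
print(y2)
## [1] 14

# The same function also works on a whole vector at once
y3 <- function_example(c(1, 2, 3))
print(y3)
## [1]  8 10 14
```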


Functions (Details)

As seen in the previous slides, functions are essentially pieces of code that you can re-use without explicitly typing them out again. A function contains 3 parts: the declaration (function name and input variables), the body, and the output, which is given by return.


In the earlier example, function_example is the function name, x is the input variable, x <- x^2 - x + 8 is the body, and return(x) is the output.


Here's another (trivial) example. Suppose we have two numbers and multiply them together with a function:

multiply_two_numbers <- function(x,y){
  new_number <- x*y
  return(new_number)
}


Functions (Summary)

The code below shows the outline of a function:


<function_name> <- function(input_1,input_2, ... , input_n){ #notice the brackets
  #your code here
        .
        .
        .
  return(<return the variables you require>)
}


Functions (Questions)


Question 2.1

Write a function that takes in a string variable and checks whether it is a dog or a cat; if it is neither, return the string "your animal is neither a cat nor a dog".


Question 2.2

Write a function that takes in a vector of numerical values and returns a list of results containing the length, max, min, and mean.


Functions (Answers)

Question 2.1

test_cat_dog <- function(animal){

  if(animal == "dog"){
    return("your animal is a dog")

  }else if(animal == "cat"){
    return("your animal is a cat")

  }else{
    return("your animal is neither a cat nor a dog")
  }
}

test_cat_dog("dog")
## [1] "your animal is a dog"
test_cat_dog("bird")
## [1] "your animal is neither a cat nor a dog"



Functions (Answers)

Question 2.2

summary_stats <- function(x){
  length_x <- length(x)
  mean_x   <- mean(x)
  min_x    <- min(x)
  max_x    <- max(x)

  return_list <- list(length = length_x, mean = mean_x, min = min_x, max = max_x)

  return(return_list)
}

summary_stats(c(1,2,3,4,5))
## $length
## [1] 5
## 
## $mean
## [1] 3
## 
## $min
## [1] 1
## 
## $max
## [1] 5



Functions (Further comments)


There are other features of functions in R, such as inheritance, functional programming or specifying default values, which are out of the scope of this course.


To recap, functions are an extremely useful way to write neater and shorter code. As a rule of thumb, if you need to write the same code twice or more, it is probably a good idea to write a function for it.


Sometimes you will be using functions that are built by others in the form of packages, so it is important that you know how to write, read and call functions to help your data analysis in R!



Loops

There are times when you need to do some task over and over again - in this case, you should think of loops!


There are two kinds of loops: for and while loops.


for loops run over a fixed set of values and perform some task for each one. More examples will be shown later.


while loops run until a certain condition is satisfied.


For Loops

In for loops, the structure is as follows:

for(i in 1:10){ #do something for ten times 
  #do something
}

It is also possible to specify a vector to 'loop' through:

student_names <- c("Mary","John","Peter","Berry")
for(i in student_names){
  print(i)
  #code to do task related to each student's name. 
}
## [1] "Mary"
## [1] "John"
## [1] "Peter"
## [1] "Berry"
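
Sometimes you need the position of each element as well as its value. One common approach, sketched below, is to loop over the positions with seq_along() and use indexing inside the loop. The vector name_labels is made up for this illustration.

```r
student_names <- c("Mary", "John", "Peter", "Berry")

# Empty character vector with one slot per student
name_labels <- character(length(student_names))

for(i in seq_along(student_names)){  # i takes the values 1, 2, 3, 4
  name_labels[i] <- paste(i, student_names[i])
}
print(name_labels)
## [1] "1 Mary"  "2 John"  "3 Peter" "4 Berry"
```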



For Loops (Questions)

Question 3.1

Specify a vector of 1:100 and using a for loop, compute the sum of all numbers in this range that are divisible by 3.

Hint1: the remainder of a division can be found with the modulo operator %% in R, e.g. 4%%2 gives 0, while 4%%3 gives 1.

Hint2: you can specify a variable to keep track of the running sum.

numbers <- 1:100
running_sum <- 0 
for(i in numbers){
  running_sum <- running_sum + i  #keeping track of the total sum.
}
print(running_sum)
## [1] 5050



For Loops (Answers)

Answer:

numbers <- 1:100
running_sum <- 0 
for(i in numbers){
  if(i %%3 ==0){
      running_sum <- running_sum + i  #keeping track of the total sum.
  }
}
print(running_sum)
## [1] 1683
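
As a cross-check, the same sum can be computed without a loop by combining the TRUE/FALSE indexing and sum() from the vectors section. This is just an alternative sketch, not a replacement for the exercise; the variable names are made up.

```r
numbers <- 1:100

# TRUE where the number is divisible by 3, FALSE otherwise
divisible_by_three <- numbers %% 3 == 0

# Keep only the TRUE positions and add them up
vectorised_sum <- sum(numbers[divisible_by_three])
print(vectorised_sum)
## [1] 1683
```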

While Loops

While loops are generally used when you want to achieve a task and are uncertain about the number of steps you need to take. The structure is as follows:

condition <- TRUE
while(condition){ #notice the brackets
  #do some stuff 
  #if the condition is fulfilled, change it to FALSE and the while loop stops running. 
  }

As an example:

i=1 ; condition <- TRUE
while(condition){
  print(i) ; i<- i+1
  if(i == 3){
    condition<-FALSE
  }
}
## [1] 1
## [1] 2

Be careful with while loops, as you might encounter infinite loops: the loop will run forever if the condition never becomes FALSE!


While Loops (Question)

Question 3.2

Using a while loop that starts from 1 and adds up only the numbers divisible by three, find out how far you need to count before the running sum is greater than 500.

Hint:

sum_required <- 500
condition <- TRUE
i <- 1 #start from 1
running_sum <- 0
while(condition){
  #check if i is divisible by three
  #if yes, running_sum <- running_sum+i 
  #check if running_sum exceeds sum_required
  i <- i+1 #add 1 to "i" to start the next iteration. 
}



While Loops (Answer)

Question 3.2

Answer:

sum_required <- 500
condition <- TRUE
i <- 1 #start from 1
running_sum <- 0
while(condition){
  if(i %% 3 == 0 ){
    running_sum <- running_sum +i
  }
  if(running_sum >= sum_required){
    condition <- FALSE
  }
  i<- i+1
}
print(i - 1) #i was increased once more after the sum was reached, so subtract 1
## [1] 54


Bonus Challenge: how many numbers divisible by 3 were added in total to achieve a sum exceeding 500?


While Loop (Bonus Challenge)

sum_required <- 500
condition <- TRUE
i <- 1 #start from 1
running_sum <- 0
counter <- 0 
while(condition){
  if(i %% 3 == 0 ){
    running_sum <- running_sum +i
    counter <- counter+1 #count each time a number divisible by 3 is added
  }
  if(running_sum >= sum_required){
    condition <- FALSE
  }
  i<- i+1
}
print(counter)
## [1] 18



Loops (cont)

There are two additional commands that are useful: break and next.

The break command stops the loop from running, while the next command skips to the next iteration of the loop.

"break" command
for( i in 1:4){
  if(i == 3){break}
  print(i)
}
## [1] 1
## [1] 2

"next" command

for( i in 1:4){
  if(i == 3){next}
  print(i)
}
## [1] 1
## [1] 2
## [1] 4
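
break and next work in while loops too. As a sketch of one possible alternative to a TRUE/FALSE condition variable, the loop below runs "forever" with while(TRUE) and relies on break to stop once a running sum of numbers divisible by 3 exceeds 500. Note that i is increased at the top of the loop, so next cannot cause an infinite loop here.

```r
running_sum <- 0
i <- 0
while(TRUE){
  i <- i + 1
  if(i %% 3 != 0){next}         # skip numbers that are not divisible by 3
  running_sum <- running_sum + i
  if(running_sum > 500){break}  # stop as soon as the sum exceeds 500
}
print(i)
## [1] 54
print(running_sum)
## [1] 513
```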


Installing packages

R is widely contributed to by people all over the world; at the time of writing there are 8992 packages available on CRAN, not counting other libraries on GitHub.


You can browse the available packages on CRAN by date or by name.


In the next part we will be playing around with dataframes, so please run the following code in your console.

install.packages("dplyr")
install.packages("tidyr")
install.packages("packrat")
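
If you are not sure whether the installation worked, one way to check is sketched below. It uses requireNamespace(), which returns TRUE when a package can be loaded, and only reports what is missing without installing anything. The variable names are made up for this illustration.

```r
needed <- c("dplyr", "tidyr", "packrat")

# TRUE for each package that is installed, FALSE otherwise
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

missing_packages <- needed[!is_installed]
if(length(missing_packages) > 0){
  print(paste("still to install:", paste(missing_packages, collapse = ", ")))
}else{
  print("all packages are installed")
}
```

Once a package is installed, it is loaded into a session with library(), for example library(dplyr).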



End!



