Introduction

In this tutorial, we look at some fundamental control structures used in statistical programming: if-else statements and loops. We start by examining each structure separately.

Part 1: If-Else Statements

General Construction:

if (CONDITION) {
    ACTION
}
if (CONDITION) {
    ACTION 1
} else {
    ACTION 2
}
ifelse(CONDITION,ACTION1,ACTION2)

Chunk 1: Illustration of If

x = 3
if(x > 0){
  print(log(x))
}

x = -3
if(x > 0){
  print(log(x))
}

Chunk 2: Illustration of If-Else

x = 3
if(x > 0){
  print(log(x))
} else{
  message("Unable to Take Logarithm")
}

x = -3
if(x > 0){
  print(log(x))
} else {
  message("Unable to Take Logarithm")
}

Exercise 1

Write code that takes numbers a and b as input and prints ‘a is greater than b’ if a>b, otherwise prints ‘a is not greater than b’.

a = 10
b = 8
# write code here
if (a>b){
  print('a is greater than b')
} else {
  print('a is not greater than b')
}
## [1] "a is greater than b"

Chunk 3: Potential Problem of If-Else Statements

x = BLANK
if(x > 0){
  print(log(x))
}

if(x > 0){
  print(log(x))
} else{
  message("Unable to Take Logarithm")
}

Chunk 4: Fixing Potential Problem in Chunk 3

x=BLANK
if(is.numeric(x)){
  if(x > 0){
    print(log(x))
  } else{
    message("Unable to Take Logarithm")
  }
} else{
  message("Please Input Numbers")
}
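
Chunk 4 guards against non-numeric input, but a numeric x can still break the condition if it is missing (NA) or holds more than one value. The sketch below shows one possible stricter check. The function name safe_log and its choice to return NA are assumptions made for illustration, not part of the tutorial.

```r
# Hypothetical helper: returns log(x) for a single positive number, NA otherwise
safe_log <- function(x) {
  # || short-circuits, so is.na(x) is only reached for a single numeric value
  if (!is.numeric(x) || length(x) != 1 || is.na(x)) {
    message("Please Input Numbers")
    return(NA_real_)
  }
  if (x > 0) {
    log(x)
  } else {
    message("Unable to Take Logarithm")
    NA_real_
  }
}

print(safe_log(3))        # a valid input
print(safe_log("3"))      # character input is rejected
print(safe_log(NA_real_)) # a missing numeric value is rejected
```

Without the is.na() check, a numeric NA would pass is.numeric() and then make the comparison x > 0 evaluate to NA, which if() cannot use.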

Exercise 2

Redo Exercise 1, but check the data types before doing the comparison. Hint: && (and) and || (or) combine multiple logical expressions into a single TRUE or FALSE. Avoid & and | in an if statement: they are vectorized and return one result per element.

a = '10'
b = 8
# write code here
if (is.numeric(a) && is.numeric(b)){
  if (a>b){
    print('a is greater than b')
  } else {
    print('a is not greater than b')
  }
} else {
  message("Please Input Numbers")
}
## Please Input Numbers
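
The hint in Exercise 2 can be made concrete with a small comparison of & and &&. This is only an illustrative sketch; the variable names elementwise, short_circuit and vectorized are made up for the example.

```r
a <- c(1, 5, 10)
b <- c(2, 5, 3)

# `&` works element by element and returns one TRUE/FALSE per position
elementwise <- (a > 2) & (b > 2)
print(elementwise)

# `&&` stops as soon as the result is known: log(x) is never evaluated here
x <- "10"
short_circuit <- is.numeric(x) && log(x) > 0
print(short_circuit)

# `&` evaluates both sides, so log("10") raises an error that we catch
vectorized <- tryCatch(
  is.numeric(x) & log(x) > 0,
  error = function(e) "error"
)
print(vectorized)
```

Because && skips the right-hand side once the left-hand side is FALSE, a type check placed first protects the expression that follows it.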

Chunk 5: Vectorized Version with ifelse()

x=c(-1,3,200)
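print(log(x)) # log(-1) returns NaN with a warning

# if() expects a single TRUE/FALSE. With a vector condition, R 4.2.0 and later
# stops with an error; earlier versions warn and use only the first element.
y1 =  if(x > 0){
        log(x)
      } else{
        NA
      }
print(y1)
# ifelse() evaluates the condition element by element
y2 = ifelse(x>0,log(x),NA)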
print(y2)
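
One detail of ifelse() worth knowing: it computes log(x) for the whole vector before picking values, so the negative entry still triggers a NaN warning. A possible alternative, sketched here with the made-up names pos and y3, is to take the logarithm only where the condition holds.

```r
x <- c(-1, 3, 200)

# ifelse() evaluates log(x) on every element, including -1, and warns
warned <- tryCatch(
  { ifelse(x > 0, log(x), NA); FALSE },
  warning = function(w) TRUE
)
print(warned)

# Alternative: start from NA and fill in only the positive entries
pos <- x > 0
y3 <- rep(NA_real_, length(x))
y3[pos] <- log(x[pos])
print(y3)
```

Both approaches give the same values; the subsetting version simply avoids computing logarithms that are thrown away.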

Chunk 6: Nested ifelse() Statements

library(ggplot2) # provides ggplot(), geom_bar() and theme_minimal()
x=rnorm(1000,mean=0,sd=1)
y=ifelse(abs(x)<1,"Within 1 SD",ifelse(abs(x)>2,"Far Far Away","Between 1 and 2 SD"))
y.fct=factor(y,levels=c("Within 1 SD","Between 1 and 2 SD","Far Far Away"))
ggplot() +
  geom_bar(aes(x=y.fct),fill="lightskyblue1") +
  theme_minimal()
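
As a sanity check on the nested ifelse() in Chunk 6, the share of observations in each category can be compared with the theoretical probabilities of a standard normal distribution. The sketch below uses only base R; the seed and the names expected and observed are assumptions for illustration.

```r
set.seed(1)  # hypothetical seed, only so the sketch is reproducible
x <- rnorm(1000, mean = 0, sd = 1)
y <- ifelse(abs(x) < 1, "Within 1 SD",
            ifelse(abs(x) > 2, "Far Far Away", "Between 1 and 2 SD"))

# Theoretical shares of a standard normal in each band
expected <- c(
  "Within 1 SD"        = 2 * pnorm(1) - 1,
  "Between 1 and 2 SD" = 2 * (pnorm(2) - pnorm(1)),
  "Far Far Away"       = 2 * (1 - pnorm(2))
)

# Observed shares, reordered to match `expected`
observed <- prop.table(table(y))[names(expected)]

print(round(expected, 4))
print(round(observed, 4))
```

With 1000 draws, the observed shares should land close to roughly 68%, 27% and 5%, which is also what the bar heights in the plot reflect.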

Exercise 3

Use the ifelse() function to create a new column WageLevel from the lwage column of the Wages dataset in the Ecdat package.

  • If lwage is greater than \(7.0\), the value is High Wage.
  • If lwage is lower than \(6.4\), the value is Low Wage.
  • In all other cases, the value is Normal.
library(dplyr) # provides %>% and mutate()
library(Ecdat) # provides the Wages dataset
New_Wages = Wages %>%
  mutate(WageLevel=ifelse(lwage>7.0,'High Wage',
                          ifelse(lwage<6.4, 'Low Wage', 'Normal')
                          )
         )
head(New_Wages,5)
##   exp wks bluecol ind south smsa married  sex union ed black   lwage WageLevel
## 1   3  32      no   0   yes   no     yes male    no  9    no 5.56068  Low Wage
## 2   4  43      no   0   yes   no     yes male    no  9    no 5.72031  Low Wage
## 3   5  40      no   0   yes   no     yes male    no  9    no 5.99645  Low Wage
## 4   6  39      no   0   yes   no     yes male    no  9    no 5.99645  Low Wage
## 5   7  42      no   1   yes   no     yes male    no  9    no 6.06146  Low Wage
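
To see how the cut-offs in Exercise 3 treat values that sit exactly on a boundary, the same nested ifelse() can be tried on a tiny vector. The lwage values below are made up for illustration and do not come from the Wages dataset.

```r
# Hypothetical log-wage values chosen to sit on and around the cut-offs
lwage <- c(5.5, 6.4, 6.8, 7.0, 7.3)

WageLevel <- ifelse(lwage > 7.0, "High Wage",
                    ifelse(lwage < 6.4, "Low Wage", "Normal"))

print(data.frame(lwage, WageLevel))
```

Because both comparisons are strict, values of exactly 6.4 and 7.0 fall into the Normal category.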

Part 2: Loops

Chunk 1: Checking Geometric Series Proof with for loop

Geometric Series: \(a, ar, ar^2, ar^3,...\)

Formula of Sum: \(\sum_{k=0}^\infty ar^k=\frac{a}{1-r}\), for \(|r|<1\).
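
For reference, the limit follows from the standard partial-sum argument, sketched here with \(S_n\) denoting the sum of the first \(n+1\) terms (a symbol introduced only for this derivation).

```latex
\[
S_n = \sum_{k=0}^{n} a r^k, \qquad
r S_n = \sum_{k=1}^{n+1} a r^k
\]
\[
S_n - r S_n = a - a r^{n+1}
\quad\Longrightarrow\quad
S_n = \frac{a\,(1 - r^{n+1})}{1 - r}, \qquad r \neq 1
\]
\[
\frac{a}{1-r} - S_n = \frac{a\,r^{n+1}}{1-r} \;\longrightarrow\; 0
\quad \text{as } n \to \infty \text{ when } |r| < 1
\]
```

The last line also gives the exact gap between a partial sum and the limit, which is what the loops in this chunk approximate numerically.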

a=1 #Any Number
r=1/2 #Any Number Between -1 and 1: abs(r)<1

theoretical.limit=a/(1-r)

START=a

FINISH.1 = START + a*r^1

FINISH.2 = FINISH.1 + a*r^2

FINISH.3 = FINISH.2 + a*r^3

FINISH.10 = a 
for(k in 1:10){
  FINISH.10=FINISH.10+a*r^k
}

FINISH.100 = a 
for(k in 1:100){
  FINISH.100=FINISH.100+a*r^k
}

library(tibble) # provides tibble()
DATA = tibble(k=c(1,2,3,10,100,"Infinity"),
            SUMMATION=c(FINISH.1,FINISH.2,FINISH.3,
                        FINISH.10,FINISH.100,
                        theoretical.limit))
print(DATA)

ABSOLUTE.ERROR = abs(FINISH.100-theoretical.limit)
print(ABSOLUTE.ERROR)
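
The loop result can also be cross-checked in two other ways: with a vectorized sum, and with the standard closed form for a finite geometric sum, \(a(1-r^{n+1})/(1-r)\). The sketch below does this for n = 10; the names loop_sum, vector_sum, closed_form and gap are chosen for the example.

```r
a <- 1
r <- 1/2
n <- 10

# Same accumulation as the for loop above
loop_sum <- a
for (k in 1:n) {
  loop_sum <- loop_sum + a * r^k
}

# Vectorized: build all terms a*r^0, ..., a*r^n and add them up
vector_sum <- sum(a * r^(0:n))

# Closed form of the finite geometric sum
closed_form <- a * (1 - r^(n + 1)) / (1 - r)

# Remaining distance to the infinite-series limit a/(1-r)
gap <- a / (1 - r) - closed_form

print(c(loop_sum, vector_sum, closed_form))
print(gap)
```

All three sums should agree, and the gap equals \(a r^{n+1}/(1-r)\), which shrinks geometrically as n grows.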

Exercise 4

set.seed(4)
u = rnorm(100)

Use a for loop to calculate the sum of the squares of the first 10 elements of the vector u.

sum10 = 0
for(i in 1:10){
  sum10 = sum10 + u[i]^2
}
print(sum10)
## [1] 13.08209

Use a for loop and an if statement to calculate the sum of the elements of the vector u with even indices.

sum_even = 0
for(i in 1:100){
  if(i%%2==0){
    sum_even = sum_even + u[i]
  }
}
print(sum_even)
## [1] -1.567888
# The same approach for the elements with odd indices
sum_odd = 0
for(i in 1:100){
  if(i%%2!=0){
    sum_odd = sum_odd + u[i]
  }
}
print(sum_odd)
## [1] 11.22039
# Alternative without if: loop over the even indices only
sum_even = 0
for(i in seq(2,100,2)){
  sum_even = sum_even + u[i]
}
print(sum_even)
## [1] -1.567888
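
The loops in Exercise 4 are good practice, but the same quantities can be obtained without a loop by indexing the vector and calling sum(). A short sketch, with the _vec names made up for the example:

```r
set.seed(4)
u <- rnorm(100)

# Sum of squares of the first 10 elements
sum10_vec <- sum(u[1:10]^2)

# Sums over even and odd positions, selected with seq()
sum_even_vec <- sum(u[seq(2, 100, by = 2)])
sum_odd_vec  <- sum(u[seq(1, 100, by = 2)])

print(c(sum10_vec, sum_even_vec, sum_odd_vec))
```

These values should agree with the loop results up to floating-point rounding, and the even and odd sums together add up to sum(u).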

Chunk 2: Checking Geometric Series Proof with while loop

a=1
r=1/2

FINISH=a
k=0
while(abs(FINISH-a/(1-r)) > 1e-10) {
  k=k+1
  FINISH = FINISH + a*r^k
  #if(k>100) break
}
print(c(k,FINISH))
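
A while loop runs forever if its condition never becomes FALSE, for example when abs(r) >= 1. The commented-out break above hints at a safeguard; one way to build it into the condition is sketched below. The names tol, max_iter, partial and converged, and the cap of 1000, are assumptions for illustration.

```r
a <- 1
r <- 1/2
tol <- 1e-10      # same tolerance as in the chunk above
max_iter <- 1000  # hypothetical cap on the number of steps

limit <- a / (1 - r)
partial <- a
k <- 0
# Stop when close enough to the limit OR when the cap is reached
while (abs(partial - limit) > tol && k < max_iter) {
  k <- k + 1
  partial <- partial + a * r^k
}

# Distinguish "converged" from "gave up at the cap"
converged <- abs(partial - limit) <= tol
print(c(k, partial))
print(converged)
```

For a = 1 and r = 1/2 the distance to the limit after k steps is \(2^{-k}\), so the loop stops at k = 34, the first k with \(2^{-k} \le 10^{-10}\).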

Chunk 3: Saving Steps in Geometric Series for Figure

a=10
r=-0.75
theoretical.limit=a/(1-r)

K=10 #How Many Steps Do You Want to Save?

summation=rep(NA,(K+1))
summation[1]=a
for (k in 1:K) {
  summation[k+1]=summation[k] + a*r^k
}

ggplot() +
  geom_line(aes(x=1:(K+1),y=summation)) +
  geom_hline(yintercept=theoretical.limit,
             linetype="dashed")
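
The running sums saved in Chunk 3 are cumulative sums of the individual terms, so base R's cumsum() can produce the same vector in one line. A sketch comparing the two, with summation_loop and summation_vec as made-up names:

```r
a <- 10
r <- -0.75
K <- 10

# Loop version, as in the chunk above
summation_loop <- rep(NA_real_, K + 1)
summation_loop[1] <- a
for (k in 1:K) {
  summation_loop[k + 1] <- summation_loop[k] + a * r^k
}

# Vectorized version: cumulative sum of the terms a*r^0, ..., a*r^K
summation_vec <- cumsum(a * r^(0:K))

print(isTRUE(all.equal(summation_loop, summation_vec)))
```

Pre-allocating the vector with rep(NA, K + 1), as the loop does, is still the right habit whenever a step cannot be vectorized.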

Exercise 5

Write a for loop that draws one random sample from each of 100 normal distributions with means 0, 1, ..., 99, and save the samples in a vector a.

  • The kth component of a is generated from \(N(k-1,1)\).
  • Hint: The function rnorm generates random samples from a normal distribution.
set.seed(100)
a = rep(NA,100)
for (k in 1:100){
  a[k] = rnorm(1,k-1,1)
}
print(a)
##   [1] -0.5021924  1.1315312  1.9210829  3.8867848  4.1169713  5.3186301
##   [7]  5.4182093  7.7145327  7.1747406  8.6401379 10.0898861 11.0962745
##  [13] 11.7983660 13.7398405 14.1233795 14.9706833 15.6111458 17.5108563
##  [19] 17.0861858 21.3102968 19.5619100 21.7640606 22.2619613 23.7734046
##  [25] 23.1856209 24.5615494 25.2797784 27.2309445 26.8422705 29.2470760
##  [31] 29.9088864 32.7573756 31.8620704 32.8888065 33.3099857 34.7782058
##  [37] 36.1829077 37.4173233 39.0654023 39.9702020 39.8983708 42.4032035
##  [43] 40.2232244 43.6228674 43.4777166 46.3222310 45.6365597 48.3190657
##  [49] 48.0437791 47.1213441 49.5529378 49.2614021 52.1788648 54.8974657
##  [55] 51.7280745 55.9804641 54.6011744 58.8248724 59.3812987 58.1611481
##  [61] 59.7380042 60.9311560 61.6211164 65.5819589 64.1298341 64.2869750
##  [67] 66.6379942 67.2016916 67.9300831 68.9075101 70.4489033 69.9356443
##  [73] 70.8375807 74.6485217 71.9379040 75.0127497 74.9124717 77.2705395
##  [79] 79.0084519 76.9255952 80.8968223 80.9500042 80.6546507 81.0687885
##  [85] 84.7095816 84.8420950 86.2163679 87.8173621 89.7271758 88.8962297
##  [91] 89.4428777 92.4283014 91.1070426 91.8424288 93.4697035 97.4456828
##  [97] 95.1675042 97.4135198 96.8213169 97.8259652
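
Since rnorm() accepts a vector of means, the loop in Exercise 5 can also be written as a single call. The sketch below compares the two under the same seed; a_loop and a_vec are names chosen for the example, and the comparison assumes R's default random number generator settings.

```r
# Loop version, as in the exercise
set.seed(100)
a_loop <- rep(NA_real_, 100)
for (k in 1:100) {
  a_loop[k] <- rnorm(1, mean = k - 1, sd = 1)
}

# Vectorized version: the kth draw uses mean k - 1
set.seed(100)
a_vec <- rnorm(100, mean = 0:99, sd = 1)

print(isTRUE(all.equal(a_loop, a_vec)))
```

Resetting the seed before each version is what makes the two sets of draws comparable.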