Introduction

ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
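The three ingredients named above (data, aesthetic mappings, and a graphical primitive) can be seen in a minimal sketch. It assumes the built-in `faithful` data set from base R, chosen only for illustration because it is not part of this lab.

```r
library(ggplot2)

# Assumption: `faithful` (from the base R datasets package) is used purely
# as an illustration; the lab itself works with the mpg data.
p <- ggplot(data = faithful) +                  # the data
  geom_point(aes(x = eruptions, y = waiting))   # the primitive and the mappings

print(p)
```

Each `geom_*()` function adds one layer of graphical primitives, and `aes()` states which variable drives which visual property.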

This lab assignment is based on the mpg data set from the ggplot2 package. This data set contains a subset of the fuel economy data that the EPA makes available on http://fueleconomy.gov. Each row of the data frame represents a different car model. There are 234 rows and 11 variables in the data set. You can type ?mpg in the console to check the details of the data set.
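Besides reading the help page, a quick way to get oriented is to inspect the data frame directly. The following is a small sketch using only base R functions on the `mpg` data that ships with ggplot2.

```r
library(ggplot2)

dim(mpg)      # number of rows and number of variables
names(mpg)    # variable names, including displ, hwy, class, drv and cyl
head(mpg, 3)  # first three rows
```

The dimensions reported by `dim(mpg)` should match the 234 rows and 11 variables described above.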

You will need to modify the code chunks so that the code in each chunk works (usually this means replacing anything in ALL CAPS). You will also need to modify the code outside the code chunks. When you get the desired result for each step, change Eval=F to Eval=T and knit the document to HTML to make sure it works. After you complete the lab, submit the HTML file of your completed work to Sakai before the deadline.

Exercises

Part 1: Basic Plot

  1. Use a scatterplot to visualize the relationship between displ (engine displacement) and hwy (highway miles per gallon) from mpg, with displ on the x-axis and hwy on the y-axis.
ggplot(data = FILL_DATA) +
  geom_point(aes(x = VARIABLE, y = VARIABLE))
  1. Add a smooth curve to the previous scatterplot, using linear regression (lm) as the smoothing method.
ggplot(data = FILL_DATA) +
  geom_point(aes(x = VARIABLE, y = VARIABLE)) +
  geom_smooth(aes(x = VARIABLE, y = VARIABLE), method = FILL)
  1. Generate the same plot as in (b), but specify the aesthetic mappings in the ggplot() function. Is there any difference between plot (c) and plot (b)?
ggplot(FILL) +
  geom_point() +
  geom_smooth(method = FILL)

ANSWER:___________________

  1. Generate the same plot as in (b), but with the color of the scatterplot points controlled by class (type of car) in mpg.
ggplot(data = FILL_DATA) +
  geom_point(FILL) +
  geom_smooth(FILL)

Part 2: Advanced Plot

  1. Use facet_wrap to visualize the relationship between displ and hwy based on class.
ggplot(data = FILL_DATA) +
    geom_point(aes(x = VARIABLE, y = VARIABLE)) +
    facet_wrap(VARIABLE ~ ., nrow = 2)
  1. Use facet_grid to visualize the relationship between displ and hwy for each combination of drv (type of drive train) and cyl (number of cylinders).
ggplot(data = FILL_DATA) +
    geom_point(aes(x = VARIABLE, y = VARIABLE)) +
    facet_grid(VARIABLE ~ VARIABLE)

(Note that both drv and cyl are categorical variables. Cross-tabulating them forms a contingency table. The final plot visualizes the relationship between displ and hwy for each cell of that contingency table.)
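To see the contingency table that the note refers to, one option is base R's `table()` function. This is only a sketch for inspecting the data; the name `drv_cyl` is introduced here for illustration.

```r
library(ggplot2)

# Cross-tabulate drive train type (rows) against number of cylinders (columns).
# `drv_cyl` is a hypothetical name used only in this sketch.
drv_cyl <- table(drv = mpg$drv, cyl = mpg$cyl)
print(drv_cyl)
```

Each cell counts the car models with that combination of drv and cyl. Cells with a count of zero correspond to panels that contain no points.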

  1. Compare the following plot with the plot in (a). What is the difference?
ggplot(data = mpg) + 
    geom_point(aes(x = displ, y = hwy), position = "jitter")

ANSWER: ___________________

  1. geom_jitter is a convenient shortcut for geom_point(position = "jitter"). Generate the plot in (g) with geom_jitter and make the points semi-transparent, with a transparency level of 0.5.
ggplot(data = FILL_DATA) + 
    geom_jitter(FILL)
  1. Generate a boxplot of hwy grouped by class.
ggplot(data = FILL_DATA) + 
    geom_boxplot(aes(x = VARIABLE, y = VARIABLE))

Flip the coordinates of the boxplot with coord_flip().

ggplot(data = FILL_DATA) + 
    geom_boxplot(aes(x = VARIABLE, y = VARIABLE)) +
    FUNCTION