Introduction

The task is to explore the US census population estimates by county for 2022 from the package usmap. The data frame (countypop) has 3222 rows and 4 variables:

Each row of the data frame represents a different county or a county equivalent. For the sake of simplicity, when we say a county, that also includes a county equivalent and when we say a state, that also includes the District of Columbia. Answer the following questions.

You will need to modify the code chunks so that the code works within each chunk (usually this means modifying anything in ALL CAPS). You will also need to modify the code outside the code chunk. When you get the desired result for each step, change eval=F to eval=T and knit the document to HTML to make sure it works. After you complete the lab, submit the HTML file of your completed work to Canvas before the deadline.

Exercises

Part 1: Length and Unique

  1. How many unique 2-letter state abbreviations are there (2 points)? Use the length and unique functions.
FUNCTION1(FUNCTION2(VARIABLE))
  1. What is the total number of counties in the US (2 points)? Use the length and unique functions.
FUNCTION1(FUNCTION2(VARIABLE))
  1. How many unique county names are there (2 points)? Use the length and unique functions.
FUNCTION1(FUNCTION2(VARIABLE))
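
As a small illustration of the pattern used in Part 1, the sketch below applies unique and length to a made-up vector. The vector `fruits` and the other variable names are hypothetical and are not part of countypop; the same nesting should carry over to a column of the data frame.

```r
# Hypothetical toy vector, not taken from countypop
fruits <- c("apple", "pear", "apple", "fig", "pear")

# unique() drops repeated values, keeping the first occurrence of each
distinct_fruits <- unique(fruits)

# length() counts the elements of a vector
n_total <- length(fruits)              # all entries, repeats included
n_distinct <- length(distinct_fruits)  # distinct entries only

print(n_total)
print(n_distinct)
```

Nesting the two calls, as in `length(unique(fruits))`, gives the number of distinct values in one step.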

Part 2: Count and Arrange

  1. What are the top 10 most common county names (2 points)? Count how many times each county name occurs, arrange the counts in descending order, and show the first 10 observations.
DATANAME %>%
  count(COUNTY_VARIABLE) %>%
  arrange(ORDER_FUNCTION(n)) %>%
  head(NUMBER_OF_OBS_TO_SHOW)
  1. Which state has the smallest number of counties (2 points)? Count the number of observations in each state, arrange the data in ascending order, and show the first observation.
countypop %>%
  count(STATE_VARIABLE) %>%
  arrange(n) %>%
  head(NUMBER_OF_OBS_TO_SHOW)
  1. Which state has the largest county in terms of population? How many people live in that county (2 points)? Arrange the data by pop_2022 in descending order. The first observation contains the information.
pop_2022=countypop$pop_2022
arrange(countypop,ORDER_FUNCTION(POP_VARIABLE))[1,]
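
The count, arrange, and head steps in Part 2 can be sketched on a toy data frame. Everything named here (`pets`, `species`, `weight`, `species_counts`, `heaviest`) is made up for illustration and does not come from countypop; the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data, not taken from countypop
pets <- data.frame(
  name    = c("Rex", "Tom", "Ava", "Sam", "Kit", "Max"),
  species = c("dog", "cat", "dog", "fish", "cat", "dog"),
  weight  = c(30, 4, 25, 1, 5, 40)
)

# count() returns one row per species with the row count in a column named n;
# desc() reverses the sort so the most frequent species comes first
species_counts <- pets %>%
  count(species) %>%
  arrange(desc(n)) %>%
  head(2)

# Sorting the whole data frame and taking row 1 gives the single largest record
heaviest <- arrange(pets, desc(weight))[1, ]

print(species_counts)
print(heaviest)
```

Dropping `desc()` sorts in ascending order instead, which is the form to use when looking for the smallest count.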

Part 3: Group_by and Summarize

  1. How many people live in each of the states (2 points)? Group the observations by the variable that serves as the state identifier, then summarize the data to get the total number of people in each state.
countypop %>%
  group_by(STATE_VARIABLE) %>%
  summarise(total_pop=SUM_FUNCTION(POP_VARIABLE))
  1. What is the average population of a county in North Carolina (2 points)? Filter the data to keep observations from ‘NC’, then summarise the data to get the average population.
countypop %>%
  filter(STATE_VARIABLE==NORTH_CAROLINA) %>%
  summarise(AVERAGE_FUNCTION(VARIABLE))
  1. What is the largest county in terms of population of each of the states (4 points)?
countypop %>%
  group_by(STATE_VARIABLE) %>%
  summarise(county=COUNTY_VARIABLE[which.max(POP_VARIABLE)],MAX_FUNCTION(POP_VARIABLE))
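
The grouping and summarising steps in Part 3 can be sketched on a toy data frame. All names here (`pets`, `species`, `weight`, `by_species`, `dog_mean`) are hypothetical and unrelated to countypop; the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data, not taken from countypop
pets <- data.frame(
  name    = c("Rex", "Tom", "Ava", "Sam", "Kit", "Max"),
  species = c("dog", "cat", "dog", "fish", "cat", "dog"),
  weight  = c(30, 4, 25, 1, 5, 40)
)

# group_by() splits the rows by species; summarise() then collapses each
# group to one row. which.max() returns the position of the largest weight
# within the group, which is used to pick the matching name.
by_species <- pets %>%
  group_by(species) %>%
  summarise(
    total_weight = sum(weight),
    heaviest     = name[which.max(weight)],
    max_weight   = max(weight)
  )

# filter() keeps only the rows matching a condition before summarising;
# note that the value being compared is a quoted string
dog_mean <- pets %>%
  filter(species == "dog") %>%
  summarise(mean_weight = mean(weight))

print(by_species)
print(dog_mean)
```

Naming each summary column (for example `max_weight = max(weight)`) keeps the output readable; an unnamed summary gets the expression itself as its column name.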