Assignment

Part 1: Cholesterol Experiment

The data frame CHOL1 contains experimental results from randomly assigning 18 people to one of two competing margarine brands “A” and “B”. The cholesterol of these patients was measured once before using the margarine brand, once after 4 weeks with the margarine brand, and then again after 8 weeks with the margarine brand. Researchers want to see if there is benefit of these brands of margarine on reducing an individual’s cholesterol and want to determine if there is a statistically significant difference between the two competing brands.

Q1 (3 Points)

Start by examing the tables CHOL1 and CHOL2 and answering the following questions with Yes or No responses.

Is the variable ID in CHOL1 a primary key?

Answer (1 Point): ANSWER_HERE

Is the variable, Margarine in CHOL1 a primary key?

Answer (1 Point): ANSWER_HERE

Is the variable, Brand in CHOL2 a primary key?

Answer (1 Point): ANSWER_HERE

Q2 (2 Points)

In a new data frame called CHOL1a based on CHOL1, rename the variables After4weeks and After8weeks to nonsynctactic variable names 4 and 8, respectively. Use names(CHOL1a) to display this modification.

Q3 (4 Points)

Use the pivot_longer() function or gather() function on CHOL1a to create a new numeric variable called Week that contains numeric values 4 or 8 and a new numeric variable called Response that contains the Cholesterol after the corresponding number of weeks. Create a new data frame called CHOL1b with these modifications and use str(Chol1b) to show that both variables have been created correctly and are indeed numeric (an integer variable is a specific type of numeric variable).

Q4 (4 Points)

Now working with CHOL2, we want to spread the variable Statistic across multiple columns. Do this in a new data frame called CHOL2a and use print(CHOL2a) to display the modified complete table.

Q5 (3 Points)

Start by examing the tables CHOL1b and CHOL2a and answering the following questions with Yes or No responses.

Is the variable ID in CHOL1b a primary key?

Answer (1 Point): ANSWER_HERE

Is the variable, Margarine in CHOL1b a primary key?

Answer (1 Point): ANSWER_HERE

Is the variable, Brand in CHOL2a a primary key?

Answer (1 Point): ANSWER_HERE

Q6 (4 Points)

Get the nutritional facts of the different margarine brands in CHOL2a into the experimental results found in CHOL1b using a join. Create a new data frame named CHOL.COMBINED and display the table using head(CHOL.COMBINED). This final data frame should contain 36 observations and 10 variables.

Part 2: Linking Important Information to 2017 Violent Crimes Data

In the zipped folder, there are 5 CSV files. In this section, we are going to merge all of that data into one object called FINAL.VIOLENT.

Q1 (2 Points)

The dataset S_VS_D contains a variable CLASS where “S=Safe” and “D=Dangerous” according to the article These Are the 2018 Safest and Most Dangerous States in the U.S by Steve Karantzoulidis. We seek to compare the violent crime statistics for states not in this list. Use a filtering join to create a new data frame called VIOLENT2 that only contains violent crime statistics from the states not represented in the data frame S_VS_D. Use str(VIOLENT2) to display the variables and the dimensions of VIOLENT2.

Q2 (8 Points)

Start by creating a new data set called VIOLENT3 based on VIOLENT2 that fixes some problems in the variable City. Specifically, we would like to change “Louisville Metro” to “Louisville”. Next, create a new data frame named VIOLENT4 that connects the population change and density measures from 2019 contained in CENSUS to the cities and states in VIOLENT3. Use head(VIOLENT4) to give a preview of the new merged dataset.

Finally, in a complete sentence, identify any location(s) (Cities and States) missing violent crime information.

Code and Output (4 Points):

Answer (4 Points): ANSWER_HERE

Q3 (10 Points)

Either ambitiously using one step or less-ambitiously using multiple steps add the longitude and latitude information provided in ZIP to the cities and states in VIOLENT4. You will need to use STATE_ABBREV data to link these two data frames. Your final data frame named FINAL.VIOLENT should contain all of the information in VIOLENT4 along with the variables lat and lon from ZIP. There should be no state abbreviations in FINAL.VIOLENT since this information is redundant. Use str(FINAL.VIOLENT) to demonstrate that everything worked as planned.

In FINAL.VIOLENT identify what cities are missing latitude and longitude. Closely, inspect both the ZIP and VIOLENT4 data frames. Report the location(s) missing geographical information and explain in complete sentences why this happened.

Finally, challenge yourself and attempt to fix this problem in a new data frame called FINAL.VIOLENT.FIX. Use a combination of str() and filter() to only display the data in FINAL.VIOLENT.FIX for the location(s) that FINAL.VIOLENT was missing latitude and longitude. Do this in the second code chunk below.

Code and Output (6 Points):

Answer (2 Points): ANSWER_HERE

Code and Output (2 Point):

Analysis 2: Joining Cholesterol with Crimes

FIRSTNAME LASTNAME

September 26, 2024

Instructions

Introduction

Assignment

Part 1: Cholesterol Experiment

Q1 (3 Points)

Q2 (2 Points)

Q3 (4 Points)

Q4 (4 Points)

Q5 (3 Points)

Q6 (4 Points)

Part 2: Linking Important Information to 2017 Violent Crimes Data

Q1 (2 Points)

Q2 (8 Points)

Q3 (10 Points)