Purpose

The purpose of the project proposal is to demonstrate your ability to find relevant and interesting data and to propose many thoughtful and creative questions about that data. This is the first stage in your final project, and the data you select will be heavily explored for the remainder of the project. Work as a team to find data that intrigues the entire team and develop questions that are valuable for exploration.

Requirements

All members of the group should be involved in selecting the data. The data should have at least 5 variables that are not identifiers and that will be studied in depth. Of all the variables, at least 2 should be categorical. If your data contains only numeric variables, your group should decide how to treat at least two of them as categorical. You may use multiple datasets in your project, but they will need to be merged at some point. To ensure future parts of the final project go smoothly, I recommend finding a dataset that contains more than 10 variables. To help your group avoid plagiarism, I recommend selecting datasets that are not attached to many online analyses. Please don’t use datasets from R packages.
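As a rough illustration of the data requirements above, the following base R sketch bins a numeric variable into categories, merges two datasets, and counts the non-identifier and categorical variables. Everything in it is hypothetical: the data frames `survey` and `regions`, their columns, and the age cut points are made up for the example and do not come from any course dataset.

```r
# Hypothetical toy data standing in for a dataset your group found online.
survey <- data.frame(
  id     = 1:6,                          # identifier, does not count as a variable
  age    = c(19, 23, 31, 45, 52, 67),
  income = c(18, 32, 41, 58, 63, 39),
  hours  = c(10, 40, 38, 45, 42, 5),
  region = c("N", "S", "S", "E", "W", "N"),
  stringsAsFactors = FALSE
)

# Treat a numeric variable as categorical by binning it.
# The cut points are an arbitrary choice made for this example.
survey$age_group <- cut(
  survey$age,
  breaks = c(0, 29, 59, Inf),
  labels = c("under 30", "30 to 59", "60 and over")
)

# A second hypothetical dataset that shares the key column "region".
regions <- data.frame(
  region     = c("N", "S", "E", "W"),
  population = c(120, 340, 210, 95),
  stringsAsFactors = FALSE
)

# Merge the two datasets on the shared key.
merged <- merge(survey, regions, by = "region")

# Count non-identifier variables and how many of them are categorical.
vars   <- setdiff(names(merged), "id")
is_cat <- sapply(merged[vars], function(x) is.factor(x) || is.character(x))

length(vars) >= 5   # at least 5 non-identifier variables?
sum(is_cat) >= 2    # at least 2 categorical variables?
```

With your own data, you would replace the toy data frames with the result of reading your files (for example with `read.csv`) and choose cut points that make sense for your variables.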

Each member of the group is required to design at least two initial questions. These questions can be very general but should not be trivial. I recommend discussing the data as a group, designing questions together, and then delegating the questions for future use. In later project parts, your group will be required to investigate these questions and then devise new follow-up questions for future analysis. Think generally about these initial questions so there is room for growth. Choose questions that have not been analyzed online for the data you have selected.

A template for the project proposal is provided on the course website. In this template, I need to see four key things.

  • Header: Title, Group number, Date
  • Roles for each of the group members
  • Hyperlink to the online source of the data
  • Ten or eight questions, depending on group size (at least two per member), each typed out in the form of a question

The Deliverer is responsible for compiling all the information into the RMarkdown template provided on the course website. This document should be carefully proofread and submitted as an HTML file via Canvas by the due date. If the proposal is submitted late, your group will not receive a grade for it. This penalty will affect the entire group.

The Creator should schedule a 10-minute meeting with the instructor during lecture time on one of the designated dates (listed on the class website). To reserve your time slot, email your instructor with a specific 10-minute interval. Time slots will be assigned in the order emails are received and posted on a Google spreadsheet linked on the course website.

In this meeting, the Creator should come prepared with the dataset downloaded and ready to display on a laptop. The Creator will tell the instructor where the group found the data and give a summary of the variables it contains. The Creator should mention how many variables are of interest for future analyses, which of the variables are numerical, and which are categorical or will be treated as categorical. We will go through your initial questions to detect any problems that may arise.
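One way to prepare the variable summary for the meeting is a small helper that lists each variable with its type. The sketch below is only an illustration: the function `summarize_variables`, its `id_cols` argument, and the toy data frame are hypothetical names, and it assumes that every non-numeric column should be reported as categorical.

```r
# Hypothetical helper: one row per non-identifier variable, with its type,
# number of distinct values, and number of missing values.
summarize_variables <- function(df, id_cols = character(0)) {
  vars <- setdiff(names(df), id_cols)
  data.frame(
    variable  = vars,
    # Assumption: anything that is not numeric is reported as categorical.
    type      = vapply(df[vars],
                       function(x) if (is.numeric(x)) "numerical" else "categorical",
                       character(1)),
    n_unique  = vapply(df[vars], function(x) length(unique(x)), integer(1)),
    n_missing = vapply(df[vars], function(x) sum(is.na(x)), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up data used only to show the call.
toy <- data.frame(
  id     = 1:4,
  height = c(160.5, 172.0, NA, 181.2),
  sport  = c("swim", "run", "run", "swim"),
  year   = factor(c("first", "second", "first", "third")),
  stringsAsFactors = FALSE
)

variable_summary <- summarize_variables(toy, id_cols = "id")
print(variable_summary)
```

A numeric variable that your group plans to treat as categorical will still show up as numerical here, so note those variables separately before the meeting.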

Rubric

Requirement                                   Points
Source of Data Given                          1 Point
Data Has 5 Variables with 2 Categorical       2 Points
At Least 2 Questions Per Group Member         2 Points
Followed RMarkdown Template                   1 Point
Prepared for the Meeting                      4 Points
Total                                         10 Points