Purpose

The purpose of the final paper is to summarize results for two interesting questions using a combination of figures, tables, and modeling techniques. Written communication is an integral part of data science. This is your opportunity to develop a high-quality blog post or article that could be published on the web and used in future job interviews. The findings should not overlap with what you have done in the EDA report.

Requirements

After the exploratory data analysis, your group should have two questions that are interesting, relevant, and worth sharing with the world. These questions should involve multiple variables and should be answerable by predictive modeling techniques. Be innovative and creative. Do not try to answer questions that have obvious solutions or have been extensively studied. Pick questions that would prompt a reader or fellow researcher to ask more questions or engage in discussion.

The final paper consists of four sections: Introduction, Data, Results, and Conclusion. A simple RMarkdown template with predefined headings is provided on the course website. The template also contains requirements and suggestions for each of the four sections.

The Deliverer is responsible for compiling all the information into the RMarkdown template provided on the course website. The compiled HTML document, along with a separate RMarkdown file containing the project’s code, must be carefully proofread and submitted via Canvas by the due date. If the document and code are submitted late, your group will not receive a grade for the final report. This penalty will affect the entire group.

In the final HTML document, there should be absolutely no R code. The writing and proofreading of the document should be shared by all members of the group. All figures should have appropriate legends, titles, and colors. You are encouraged to use Markdown syntax for subsections, bold and italic text, hyperlinks, tables, etc.
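One common way to keep R code out of the knitted HTML is to set global chunk options with knitr near the top of the RMarkdown file. The lines below are a minimal sketch, assuming the report is knitted with knitr and that the course template does not already set these options; check the template before adding them. They would normally sit inside a setup chunk that is itself hidden (for example, a chunk with the option include=FALSE).

```r
# Sketch of a global setup, assuming the template does not already define one.
# These are standard knitr chunk options applied to every chunk in the file.
knitr::opts_chunk$set(
  echo = FALSE,     # do not print R source code in the output
  message = FALSE,  # suppress messages, such as package start-up notes
  warning = FALSE   # suppress warnings in the rendered document
)
```

With these options, figures and tables produced by each chunk still appear in the HTML, but the code that generated them does not.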

A separate RMarkdown file containing all the code for reproducing the project’s results should be submitted along with the HTML document. The R code will be evaluated based on its ability to successfully reproduce the results presented in the project.

Ten points of the final project are based on an average score measuring overall contribution as seen by you and the other members of your group. Each group member should score every person in their group on a continuous scale from 0 (Bad) to 10 (Good). Before the due date of the final paper, every member is required to submit the group scoring through the Google survey link on the course website. Your name and your scores will remain private between you and me. If you fail to submit this group scoring before the deadline, a 2-point penalty will be applied and I will give the other members a score of 10.

Rubric

Requirement Points
Introduction: 2 Questions Clearly Defined 1 Point
Introduction: Am I Interested? 3 Points
Data: Adequately Describes Data 1 Point
Results: Appropriate Methods Using Multiple Models 5 Points
Results: Adequately Explains Results 3 Points
Results: Technical Accuracy 6 Points
Conclusion: Summarize Questions with Results 1 Point
Conclusion: Do I Want to Learn More? 1 Point
Overall: No R Code in the HTML Document 1 Point
Overall: Followed RMarkdown Template 1 Point
Overall: Free of Spelling and Grammatical Errors 3 Points
Code: Reproducible 4 Points
Individual Score 10 Points
Total 40 Points