---
title: "Final Project"
---
# General Instructions
You will work in a **pre-assigned team of 3 or 4 students** with a **Teaching Fellow (TF)** who will supervise your project. Your team will select one project approach from the options below and report to your assigned TF throughout the semester.
**Important**: All project communication should go through your assigned TF. Do not contact the instructors directly about project questions.
::: {.callout-note collapse="true" title="Click to View Team Assignments"}
| Team | TF | Team Slack Channel | Members |
|:--------------|:--------------|:---------------------------------|:----------------|
| Team 1 | Angela | [#bst-260-project-team1](https://harvard.enterprise.slack.com/archives/C09MCF1BW5Q) | Motohiko Adomi, Jethro Au, Kate Bucci, Anthony Candelmo |
| Team 2 | Angela | [#bst-260-project-team2](https://harvard.enterprise.slack.com/archives/C09M919D58V) | Tiger Chaisutyakorn, Yixiao Chen, EK Cheng, Huihan Cui |
| Team 3 | Angela | [#bst-260-project-team3](https://harvard.enterprise.slack.com/archives/C09MG1JMZ60) | Sylvia Deng, Yinuo Du, Qianyu Fan |
| Team 4 | Angela | [#bst-260-project-team4](https://harvard.enterprise.slack.com/archives/C09N6P792N4) | Tanvi Gaitonde, Gabrielle Gonzalez, Allen Gu, Camila Guetter |
| Team 5 | Angela | [#bst-260-project-team5](https://harvard.enterprise.slack.com/archives/C09M632U4MR) | Siwei Guo, Hannah Hamling, Shuying Han, Runpeng Hu |
| Team 6 | Ava | [#bst-260-project-team6](https://harvard.enterprise.slack.com/archives/C09MG1KMD7W) | Anthea Hua, Jiaming Huang, Helen Keetley, Helena Li |
| Team 7 | Ava | [#bst-260-project-team7](https://harvard.enterprise.slack.com/archives/C09M91K2A13) | Stellen Li, Yutong Li, Jason Liang, Julia Lin |
| Team 8 | Ava | [#bst-260-project-team8](https://harvard.enterprise.slack.com/archives/C09N6P85R8Q) | Siyan Lin, Cindy Liu, Jasper Liu, Junhao Luo |
| Team 9 | Ava | [#bst-260-project-team9](https://harvard.enterprise.slack.com/archives/C09LX1CB49M) | Peng Luo, Yuxi Luo, Adriana Manjon, Neel Mirani |
| Team 10 | Ava | [#bst-260-project-team10](https://harvard.enterprise.slack.com/archives/C09M634FFL3) | Bao Han Ngo, Ryan Ou, Katelyn Power, Chloe Qiu |
| Team 11 | Emma | [#bst-260-project-team11](https://harvard.enterprise.slack.com/archives/C09MG1MD0UC) | Yvonne Qiu, Ziyue Qiu, Varshini Ramanathan, April Ren |
| Team 12 | Emma | [#bst-260-project-team12](https://harvard.enterprise.slack.com/archives/C09MCF7DCJE) | Aanika Schueler, Emily Shen, Shriya Sai Shivakumar, Erica Song |
| Team 13 | Emma | [#bst-260-project-team13](https://harvard.enterprise.slack.com/archives/C09M91CB1QD) | Rahul Srinivasaragavan, Nyah Strickland, Christina Wang, Hengyuan Wang |
| Team 14 | Emma | [#bst-260-project-team14](https://harvard.enterprise.slack.com/archives/C09MCF7K5PU) | Siwen Wang, Yuanshu Wang, Emily Weng, Andrew Wu |
| Team 15 | Emma | [#bst-260-project-team15](https://harvard.enterprise.slack.com/archives/C09N6P9TXLG) | Kai Wu, Yanting Wu, Zhentian Wu, Baoyue Xing |
| Team 16 | Jing | [#bst-260-project-team16](https://harvard.enterprise.slack.com/archives/C09M91DAJ5T) | Lavinia Xu, Xinyu Xu, Xu Yan, Cuiqiyun Yang |
| Team 17 | Jing | [#bst-260-project-team17](https://harvard.enterprise.slack.com/archives/C09LX1ELXT9) | Yuntian Yang, Haochen Ye, Haihan Yuan, Yiwei Yun |
| Team 18 | Jing | [#bst-260-project-team18](https://harvard.enterprise.slack.com/archives/C09MCF89VQA) | Irene Zhang, Iris Zhang, Yiyang Zhang, Zihan Zhang |
| Team 19 | Jing | [#bst-260-project-team19](https://harvard.enterprise.slack.com/archives/C09M636P8UB) | Johnny Zhao, Mengze Zhao, Tianyu Zhao, Yinuo Zhao |
| Team 20 | Jing | [#bst-260-project-team20](https://harvard.enterprise.slack.com/archives/C09MG1PK9E0) | Zifan Zhao, Junyi Zhou, Zi Zhu |
**Note**: All project communication with your TF should happen in your team's Slack channel listed above.
:::
**Grading**: The final project accounts for 20% of your course grade, divided as follows:
- **Final Project Report**: 10% of course grade
- **Oral Presentation**: 10% of course grade
## GitHub Repository Setup
### Creating Your Team Repository
**Repository Naming Convention**: `bst-260-2025F-team#` (where \# is your team number, e.g., `bst-260-2025F-team1`, `bst-260-2025F-team15`)
**Setup Process**:
1. **One team member** creates the repository on GitHub with the exact naming convention above
2. Make the repository **private**
3. Add all team members as collaborators with **write access**
4. Add your assigned TF as a collaborator with **write access**
5. Set up the initial directory structure (see requirements below)
### Notifying Your TF
Once your repository is created:
1. Post the repository URL in your team's Slack channel (see table above)
2. Include your team number and all team member names
3. Your TF will confirm access and provide initial feedback
**Example Slack message**:
```
Team 5 Repository: https://github.com/username/bst-260-2025F-team5
Members: Siwei Guo, Hannah Hamling, Shuying Han, Runpeng Hu
Ready for initial review of `Outline.txt`
```
## Communication Guidelines
**All project communication must go through your assigned TF using your team's private Slack channel**:
- Your TF is your primary point of contact for all project matters
- TFs will approve your `Outline.txt` and provide ongoing feedback
- **TFs will review and approve your final project submission**
- TFs will conduct your oral evaluation
- Only your TF can escalate issues to instructors if necessary
## Project Framework: NHANES Data Analysis
All teams will work with the **NHANES (National Health and Nutrition Examination Survey)** dataset using the `NHANES` R package. You must choose between two versions and justify your choice:
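- **NHANES**: Data resampled from `NHANESraw` to adjust for oversampling, so it can be treated as a simple random sample of the US population (recommended for most analyses)
- **NHANESraw**: The raw survey data, including the sampling weights needed for survey-weighted analyses (recommended for teams interested in epidemiological methods using survey weights)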
**Project Differentiation**: Each team's project must be distinct from the others, differing in **age group**, **outcome**, or **methodological approach**.
## Project Planning Requirements
**Within one week of team assignment**, you must:
1. Create your GitHub repository (see setup instructions above)
2. Choose your specific project focus (see options below)
3. Create a file called `Outline.txt` (plain text) in your GitHub repository with:
    - Your analysis plan and research questions
    - Team member responsibilities for each activity
    - Weekly breakdown of tasks and deliverables
4. Notify your TF via Slack when ready for review and feedback
You should update this outline document throughout the project to track progress and any changes to your plan.
## Project Timeline & Deadlines
| Milestone | Deadline | Responsibility |
|:----------|:---------|:---------------|
| **`Outline.txt` Submission** | **October 31, 2025** | Teams post in their Slack channel |
| **`Outline.txt` Feedback** | **November 7, 2025** | TFs provide approval/revision requests |
| **Revised Outline (if needed)** | **November 12, 2025** | Teams address TF feedback |
| **Oral Presentation Scheduling** | **December 12, 2025** | Teams schedule with TF for December 16-19 window |
| **Final Project Submission** | **December 15, 2025** | Teams notify TF when repository is complete |
**Important**: All deadlines are firm. Late submissions will result in grade penalties.
## Available Project Approaches
### Option 1: Age-Specific Health Analysis
Focus on one of these age groups with appropriate health outcomes:
- **Ages 2-18**: Growth patterns, childhood health indicators (note: different BMI/growth relationships than adults)
- **Ages 19-40**: Young adult health patterns, lifestyle factors
- **Ages 40+**: Disease risk factors, aging-related health patterns
**Key considerations**: Include important covariates (age, sex, race) and their interactions. Consult relevant literature for risk factors specific to your chosen age group and outcome.
### Option 2: Statistical Methodology Focus
Instead of focusing on specific health outcomes, examine statistical approaches:
- **Multiple imputation methods** for handling missing data patterns
- **Spline models** for interpretable non-linear associations with continuous variables
- **Survey methods** for population-level estimates and inference
- **Machine learning approaches** (logistic regression, random forests) for data structure exploration
- **Interactive applications** using Shiny/Posit Connect for data exploration tools
### Option 3: Missing Data Analysis
Investigate missing data patterns in NHANES:
- Understand why observations are missing for different variables
- Apply modern missing data methods
- Compare analytical approaches under different missingness assumptions
### Option 4: Custom Research Question
Propose your own research question using NHANES data, subject to TF approval. Must demonstrate clear analytical approach and feasibility.
## Important Data Considerations
**Missing Values**: Many NHANES variables have substantial missing data. Consider whether this affects your analysis or could be the focus of your study.
**Outcome Variables**: Examples include systolic blood pressure, diabetes risk, depression scores, sleep patterns, smoking status, BMI. Continuous outcomes may be easier to model initially.
## Project Submission Requirements
You will submit your project using Git. Your project should be completely reproducible, meaning all the code and data needed to render your report from scratch should be in the repository.
**Required Submissions to your TF:**
1. **GitHub Repository**: Submit the link to your team's GitHub repository with all components below
2. **Oral Presentation**: Each team must schedule a 20-minute oral presentation with their assigned TF via Zoom. **Scheduling must be completed by December 12, 2025,** for presentations during **December 16-19, 2025**. **All team members must be present** during the presentation. The TF will ask each team member specific questions about different aspects of the project based on their individual contributions detailed in the contribution summary document (e.g., if a member contributed to data analysis, they may be asked about model choice, coding decisions, statistical methods, etc.).
### Oral Presentation Evaluation Rubric
The oral presentation will be evaluated as a **group grade** out of 10 points based on the following criteria:
| Score | Criteria |
|:------|:---------|
| 0-1 | No meeting scheduled or major absence of team members |
| 2-3 | Limited understanding of project components; significant gaps in explanations |
| 4-5 | Moderate understanding of project components; difficulty explaining individual contributions/methodological choices |
| 6-7 | Good understanding of project components; able to explain most individual contributions/methodological choices |
| 8-9 | Very good understanding of all project components; strong defense of analytical approaches with minimal gaps |
| 10 | Excellent understanding of all project components; exceptional ability to defend and discuss all aspects of the analysis |
### Final Report Evaluation Rubric
The final report will be evaluated as a **group grade** out of 10 points based on the following criteria:
| Score | Criteria |
|:------|:---------|
| **9-10** (Excellent) | Clear, well-structured report with sophisticated analysis. |
| **8-8.9** (Very Good) | Well-written report with good analysis. |
| **7-7.9** (Good) | Adequate report structure with acceptable analysis. |
| **6-6.9** (Satisfactory) | Report meets basic requirements but has several areas for improvement. |
| **4-5.9** (Needs Improvement) | Report has significant structural problems or analysis errors. |
| **0-3.9** (Inadequate) | Report does not meet basic requirements. |
**Detailed Criteria:**
**9-10 Points (Excellent):** Clear, well-structured report with sophisticated analysis. All sections meet word count requirements. Exceptional use of statistical methods, excellent data visualization, and insightful interpretation. Professional formatting with proper citations. Demonstrates deep understanding of NHANES data and chosen methodology.
**8-8.9 Points (Very Good):** Well-written report with good analysis. Minor issues in structure or presentation. Good use of statistical methods and visualizations. Most sections meet requirements with solid interpretation of results.
**7-7.9 Points (Good):** Adequate report structure with acceptable analysis. Some sections may be slightly under/over word count. Basic statistical methods applied correctly. Visualizations present but could be improved. Interpretation shows understanding but lacks depth.
**6-6.9 Points (Satisfactory):** Report meets basic requirements but has several areas for improvement. Statistical analysis is basic or contains minor errors. Limited interpretation of results. Some formatting or structural issues.
**4-5.9 Points (Needs Improvement):** Report has significant structural problems or analysis errors. Poor data visualization or interpretation. Major sections missing or substantially under word count. Limited understanding of methodology.
**0-3.9 Points (Inadequate):** Report does not meet basic requirements. Major analysis errors, missing key sections, or demonstrates poor understanding of the data and methods. Unprofessional presentation.
### TF Feedback Process
**Your TF will provide detailed feedback and final grades through GitHub Issues:**
1. **After your final submission**, your TF will create Issues in your repository for:
    - Overall project feedback and final grade breakdown
    - Specific comments on analysis, methods, or presentation
    - Individual contribution assessment based on commit history and contribution files
2. **Grade breakdown will include**:
    - Final Report Score (out of 10 points)
    - Oral Presentation Score (out of 10 points)
    - Overall Final Project Grade calculation
3. **Teams can respond** to TF feedback through Issue comments if clarification is needed
**Note**: Check your repository for Issues after the final project deadline for comprehensive feedback and grades.
## Report Structure
You will prepare a comprehensive report in the style of an academic paper. The report has five sections, each with an approximate word count, for a target of 2,500 to 3,000 words in total, with up to four figures and up to two tables.
### Abstract (150-200 words) {.nonumber}
- **Purpose**: The abstract provides a concise summary of your project, including its objectives, key findings, and significance. Write this section last, after completing all other sections, to accurately reflect your project's focus and main results.
- **Guidelines**: Limit this section to 150-200 words. Briefly outline the purpose of your study, the approach you used, and the primary results and conclusions. The abstract should be clear, succinct, and give readers an immediate understanding of what your project entails.
### Introduction (500-600 words) {.nonumber}
- **Purpose**: The introduction sets the stage for your project, presenting the background and rationale for your analysis. Explain why the topic is significant and justify your choice of NHANES vs. NHANESraw.
- **Guidelines**: Start with a broad overview of the topic, gradually narrowing down to your specific focus. Conclude with a clear statement of your research questions, hypotheses, or objectives. Use 2-3 paragraphs to establish a solid foundation for the rest of the paper.
### Methods (600-700 words) {.nonumber}
- **Purpose**: This section details the data sources, methods, and analytical techniques you used to conduct your analysis. It should be specific enough that someone else could replicate your study using the same resources and approach.
- **Guidelines**: Describe the NHANES dataset version you used and justify your choice. Outline your approach for cleaning and analyzing the data, including any statistical or computational methods applied. Clearly explain any assumptions or limitations in your approach, particularly regarding missing data handling.
### Results (500-600 words) {.nonumber}
- **Purpose**: The results section presents the main findings of your analysis without interpretation. Organize the data logically to highlight key insights, using tables, figures, and charts to illustrate trends and comparisons.
- **Guidelines**: For each result, briefly describe it and refer to relevant visuals or tables where appropriate. Do not provide explanations or discuss implications in this section; focus only on presenting the findings clearly and accurately.
### Discussion (600-700 words) {.nonumber}
- **Purpose**: In the discussion, interpret the significance of your findings, explore potential implications, and relate the results back to your initial research questions or hypotheses. This section allows you to discuss any patterns, unexpected findings, or limitations and suggest possible future research.
- **Guidelines**: Analyze your results in the context of your research question and relevant health literature. Consider what your findings reveal, any limitations they may have (particularly regarding missing data or survey design), and how they might impact future work or policy. End with a brief conclusion summarizing your main insights.
Your final report should be professionally formatted, with each section clearly labeled and referenced. Aim for clarity, precision, and a well-organized presentation of your analysis.
**Total Word Count**: Approximately 2,500-3,000 words.
### Use of AI Tools {.nonumber}
Students are welcome to use AI tools as a complementary aid, but they must clearly state in their report where AI was used (e.g., text generation, editing, data analysis suggestions, or AI-assisted conclusions). AI should serve as a productivity and learning tool, not as the primary author of the report.
## Supplementary Methods (no limit) {.nonumber}
You can include a **separate document** titled Supplementary Methods.
- **Purpose**: Share any mathematical derivations, data visualizations, or tables needed to justify the choices described in the Methods Section. You can also provide further support for the claims made in the Results Section. You can refer to this document in the main report.
- **Guidelines**: There is no limit on the length of this section or on the number of figures and tables. However, be careful not to overwhelm the graders with too much information.
## GitHub Repository Requirements {.nonumber}
Your repository must include:
- **Directory structure**: `code`, `data`, and `docs` directories
- **Analysis scripts**: At least one script for data wrangling in the `code` directory
- **Main report**: One file called `final-project.qmd` that renders to produce the final report (can be in the `code` directory or the repository root)
- **README file**: Explaining how to reproduce all results
- **Project outline**: `Outline.txt` with your initial plan and progress updates
- **Individual contribution files**: Each team member creates a file, named with their own name, documenting their specific contributions and effort (hours/week)
- **Data handling**: Include code that loads NHANES data via the R package (no need to store raw data)
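As a minimal sketch, assuming a POSIX-style shell run from the folder where the repository should live, the layout listed above could be scaffolded as follows. The folder name uses team 5 as a placeholder, and `README.md` and the `.gitkeep` placeholder files are assumptions: the requirements only ask for a README file and the three directories.

```bash
#!/usr/bin/env bash
set -eu

# Placeholder folder name; replace 5 with your own team number.
REPO_DIR="bst-260-2025F-team5"

# The three required directories.
mkdir -p "$REPO_DIR/code" "$REPO_DIR/data" "$REPO_DIR/docs"

# Git does not track empty directories, so add a placeholder file to each
# (the .gitkeep name is a common convention, not a course requirement).
touch "$REPO_DIR/code/.gitkeep" "$REPO_DIR/data/.gitkeep" "$REPO_DIR/docs/.gitkeep"

# Required files, created empty; the report may instead live in code/.
touch "$REPO_DIR/README.md" "$REPO_DIR/Outline.txt" "$REPO_DIR/final-project.qmd"

# Show the resulting top-level layout.
ls -A "$REPO_DIR"
```

The individual contribution files are left out here because their names depend on your team members.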
**Git Requirements**: *We expect to see at least five meaningful commits by each person* demonstrating collaborative development throughout the project timeline.
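One way a team might check its commit counts before submitting is sketched below. It assumes a shell with `awk` available; the helper name `check_commits` and the sample names and counts are made up for illustration. The helper reads the summary format printed by `git shortlog -sn` (a count, a tab, then the author name) and flags anyone below five commits. It only counts commits, so whether they are meaningful is still for your TF to judge.

```bash
#!/usr/bin/env bash
set -eu

# Minimum number of commits expected per person (from the requirement above).
MIN_COMMITS=5

# Reads lines shaped like `git shortlog -sn` output ("<count><TAB><author>")
# and prints "<author><TAB><count><TAB><status>" for each author.
check_commits() {
  awk -v min="$MIN_COMMITS" '{
    count = $1 + 0          # first field is the commit count
    $1 = ""                 # drop the count, leaving the author name
    sub(/^[ \t]+/, "")      # trim leading whitespace
    status = (count >= min + 0) ? "ok" : "below minimum"
    printf "%s\t%d\t%s\n", $0, count, status
  }'
}

# Inside your repository you would run:
#   git shortlog -sn --no-merges HEAD | check_commits
# Demonstration with made-up counts (hypothetical names):
printf '    12\tMember A\n     3\tMember B\n' | check_commits
```

Passing `HEAD` explicitly matters when piping, because `git shortlog` otherwise reads its input from standard input when that is not a terminal. Counts are grouped by the author name recorded in each commit, so a member who commits under two different names will appear twice.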