In this class, students were challenged to present data-driven problems and solutions. The course was driven by the production and presentation of data science products and feedback.
Dataset: The dataset used to answer the general question and test the hypotheses listed above comes from a 2014 survey conducted by the Open Sourcing Mental Illness organization. This organization aims to change the way mental health is discussed in the tech community by addressing stigmatization and providing necessary resources to those suffering from mental health disorders. In addition to participant demographics, the survey gathers several responses in an effort to measure attitudes towards mental health and the frequency of mental health disorders in the tech workplace.
Research Question: What are the strongest predictors of whether an employee will discuss health concerns with their direct supervisor(s)?
The research question was explored with a hyperparameter-tuned Random Forest model.
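As a rough illustration of what a hyperparameter-tuned Random Forest used for ranking predictors might look like, here is a minimal sketch with scikit-learn. It is not the project's actual code: the synthetic data, the feature names, the parameter grid, and the helper names `tune_random_forest` and `rank_predictors` are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def tune_random_forest(X, y, param_grid, seed=0):
    """Grid-search a Random Forest with 3-fold CV and return the fitted search."""
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid=param_grid,
        cv=3,
        scoring="accuracy",
    )
    search.fit(X, y)
    return search


def rank_predictors(search, feature_names):
    """Sort features by the best model's impurity-based importance, highest first."""
    importances = search.best_estimator_.feature_importances_
    order = np.argsort(importances)[::-1]
    return [(feature_names[i], float(importances[i])) for i in order]


# Synthetic stand-in data (assumption): the real survey columns are not used here.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Hypothetical grid; the values tuned in the project are not stated in the text.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = tune_random_forest(X, y, param_grid)
ranking = rank_predictors(search, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances are only one way to read "strongest predictors"; permutation importance on held-out data is a common alternative when features differ a lot in cardinality.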
You can find a more detailed description of this project at this link: https://kss7yy.github.io/ds_4002/Week_One/Presentations/Albini_Annaparedy_Sarnaik_Vellayan_DS4002_W1_Code.html
Dataset: The dataset used in this project is a sub-sample of a dataset provided by the NIH containing 112,120 Chest X-Ray images obtained from 30,805 unique patients. The labels in the original dataset were generated through the use of Natural Language Processing techniques to text-mine disease classifications from the associated radiological reports. The labels are expected to be approximately 90% accurate and suitable for weakly-supervised learning.
Research Question: How do changes in class imbalances affect the performance of a state-of-the-art Convolutional Neural Network model in classifying Chest X-Ray Images?
The research question was explored with state-of-the-art (SOTA) model architectures such as VGG16, ResNet50, and DenseNet, as well as custom models.
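One way to study the effect of class imbalance is to subsample the training labels to a chosen minority fraction and then compare per-class recall rather than overall accuracy. The sketch below shows that idea in plain Python for a binary case. The label names, the helper names `subsample_to_ratio` and `per_class_recall`, and the toy counts are assumptions for illustration; they are not taken from the project.

```python
import random
from collections import Counter


def subsample_to_ratio(labels, minority_label, minority_fraction, seed=0):
    """Return sorted indices of a subsample in which `minority_label` makes up
    roughly `minority_fraction` of the examples. All other examples are kept;
    minority examples are dropped at random."""
    if not 0 < minority_fraction < 1:
        raise ValueError("minority_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    minority = [i for i, lab in enumerate(labels) if lab == minority_label]
    majority = [i for i, lab in enumerate(labels) if lab != minority_label]
    # Solve k / (k + len(majority)) = minority_fraction for k.
    target = round(minority_fraction * len(majority) / (1 - minority_fraction))
    keep = rng.sample(minority, min(target, len(minority)))
    return sorted(majority + keep)


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}


# Toy labels (assumption): 80 majority and 40 minority examples.
labels = ["no_finding"] * 80 + ["finding"] * 40
indices = subsample_to_ratio(labels, "finding", minority_fraction=0.2)
subset = [labels[i] for i in indices]
print(Counter(subset))

# A classifier that always predicts the majority class looks accurate overall
# but has zero recall on the minority class.
always_majority = ["no_finding"] * len(subset)
recall = per_class_recall(subset, always_majority)
print(recall)
```

Repeating the subsampling at several minority fractions and retraining the same architecture each time gives a curve of per-class recall against imbalance, which is easier to interpret than accuracy alone.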