Mindpetal DC Metro Ridership Challenge: Karen Li, Cadence Cheng, Ilayda Dogan, Hellen Ou
Presentation Link: https://docs.google.com/presentation/d/1pGZ0ZPhntu_YJR_VRc4jg5CCGpK4VH-hrabetFci150/edit?usp=sharing
This repository includes:
- README
- CSV Files with data: data.csv, df.csv, dt.csv
- Data analysis and visualization notebooks and scripts: Mindpetal_Data_Visualizations.ipynb, IC25.R, Starter_Notebook.ipynb
Thousands of people ride the metro every day, each for their own reasons. As college students who rely on the metro ourselves, we wanted to take a deeper look at what it means to residents by exploring what motivates riders.
To this end, we examined how date, time, station, day of the week, and temperature (from NOAA) affect ridership. We used NumPy and pandas to merge the datasets and add a temperature column, and we used Matplotlib, seaborn, and R to visualize trends in the data.

We found several notable trends and outliers. Weekdays had consistently higher ridership than weekends, with weekday commuters outnumbering weekend leisure riders. We noticed a spike in activity on June 5th, which we traced to a baseball game. We also found a marked drop in entries on December 21st across all stations, which we attribute to the holidays. In addition, we examined individual stations to see whether any departed from the general patterns. The Dulles and Reagan Airport stations did not follow the general weekday/weekend pattern, which we attribute to air travel not being tied to a weekday work schedule.
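The merge and the weekday/weekend comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code in our notebooks: the column names (`date`, `station`, `entries`, `temp_f`), the helper functions, and the toy values are all assumptions made up for the example.

```python
import pandas as pd

# Toy ridership records. Column names and values are hypothetical,
# chosen only to illustrate the approach.
ridership = pd.DataFrame({
    "date": pd.to_datetime(["2024-06-03", "2024-06-03", "2024-06-08", "2024-06-08"]),
    "station": ["A", "B", "A", "B"],
    "entries": [1200, 800, 500, 300],
})

# Toy daily temperatures standing in for the NOAA data (hypothetical values).
temperature = pd.DataFrame({
    "date": pd.to_datetime(["2024-06-03", "2024-06-08"]),
    "temp_f": [78.0, 85.0],
})


def add_temperature(rides, temps):
    """Attach a temperature column and day-of-week labels to each ridership row."""
    # Left merge keeps every ridership row, even if a date has no temperature.
    merged = rides.merge(temps, on="date", how="left")
    merged["day_of_week"] = merged["date"].dt.day_name()
    # dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend.
    merged["day_type"] = merged["date"].dt.dayofweek.map(
        lambda d: "weekend" if d >= 5 else "weekday"
    )
    return merged


def mean_daily_entries(merged):
    """Average the system-wide daily entry totals for weekdays and weekends."""
    # First sum entries across stations for each day, then average by day type.
    daily = merged.groupby(["date", "day_type"], as_index=False)["entries"].sum()
    return daily.groupby("day_type")["entries"].mean()


merged = add_temperature(ridership, temperature)
summary = mean_daily_entries(merged)
print(summary)
```

Summing per day before averaging keeps the comparison from being skewed by how many stations report on a given date.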
The metro is a key part of our DMV (DC, Maryland, and Virginia) culture. On weekdays, it serves as transport for commuters. On weekends, it is how families and friends travel for fun. On special occasions, it supports DC's cultural and sporting events. The metro is the heart of the DMV community, and supporting it is more important than ever.
- Python: NumPy, pandas, Matplotlib, seaborn
- R: ggplot2
- RStudio, Google Colab, Jupyter Notebooks, Visual Studio Code