This repository was archived by the owner on Jun 1, 2023. It is now read-only.

Description
I noticed that seg_id_nat 1638 had a high RMSE for the RGCN model (RMSE = 6.13) after I updated with the most recent data. This reach is right below the Neversink. It looks like there are multiple monitoring sites on this reach, and the older sites from EcoSheds were farther downstream than the new USGS site, which is capturing colder dynamics from the reservoir.

We may want to reconsider what data we're keeping, particularly at these reservoir sites where different places along the reach can have really different temperature signals. I don't think this is the reason this site is doing so poorly, though. I think it's doing poorly because the model is clearly not picking up the reservoir influence, and I think the EcoSheds data was added after this model was trained, so the model didn't get a chance to see any of the NYCDEC data:
library(dplyr)
library(ggplot2)
# predictions vs. observations for seg_id_nat 1638, 2005-12-31 through 2008-01-01
compare <- feather::read_feather('3_predictions/out/compare_predictions_obs.feather')
compare <- compare %>%
  filter(seg_id_nat %in% '1638') %>%
  filter(!is.na(rgcn2_full_temp_c)) %>%
  filter(date <= as.Date('2008-01-01') & date >= as.Date('2005-12-31'))
# line = RGCN predictions, points = observed mean temperature
ggplot(compare, aes(x = date, y = rgcn2_full_temp_c)) +
  geom_line() +
  geom_point(aes(y = mean_temp_c))
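
One way to check whether sites at different places along the reach carry different temperature signals would be to compute RMSE per site rather than per segment. The sketch below is only illustrative: it uses a small invented data frame in place of `compare`, and the `site_id` column is a hypothetical name, since the prediction file may not carry site-level identifiers at all.

```r
# Toy stand-in for `compare`; every value here is invented for illustration.
# `site_id` is a hypothetical column name, not one confirmed to exist in the real file.
toy <- data.frame(
  seg_id_nat = "1638",
  site_id = rep(c("upstream_site", "downstream_site"), each = 3),
  rgcn2_full_temp_c = c(10, 12, 14, 10, 12, 14),
  mean_temp_c = c(6, 7, 8, 10, 11, 15)
)

# Root mean squared error, ignoring missing pairs
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2, na.rm = TRUE))

# RMSE computed separately for each monitoring site on the reach
site_rmse <- sapply(
  split(toy, toy$site_id),
  function(d) rmse(d$rgcn2_full_temp_c, d$mean_temp_c)
)
print(site_rmse)
```

If one site's RMSE is much larger than the others on the same segment, that would suggest the sites are seeing different thermal regimes and that averaging them into a single reach-level observation may be masking the reservoir signal.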
