-
Notifications
You must be signed in to change notification settings - Fork 0
Claude Code Review and Repository Feedback #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
quickcoffee
wants to merge
2
commits into
master
Choose a base branch
from
claude/repo-feedback-review-011CUrGqY9DzZhYDigRxr8TB
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,57 @@ | ||
| # R user files | ||
| .Rproj.user | ||
| .Rhistory | ||
| .RData | ||
| .Rdata | ||
|
|
||
| # OAuth tokens | ||
| .httr-oauth | ||
|
|
||
| # knitr and R markdown default cache directories | ||
| *_cache/ | ||
| /cache/ | ||
|
|
||
| # Temporary files created by R markdown | ||
| *.utf8.md | ||
| *.knit.md | ||
|
|
||
| # R Environment Variables | ||
| .Renviron | ||
|
|
||
| # pkgdown site | ||
| docs/ | ||
| doc/ | ||
| Meta/ | ||
|
|
||
| # translation temp files | ||
| po/*~ | ||
|
|
||
| # RStudio files | ||
| .Rproj.user/ | ||
| *.Rproj | ||
|
|
||
| # produced vignettes | ||
| vignettes/*.html | ||
| vignettes/*.pdf | ||
|
|
||
| # R check outputs | ||
| *.Rcheck/ | ||
|
|
||
| # Package build artifacts | ||
| *.tar.gz | ||
| *.tgz | ||
|
|
||
| # MacOS | ||
| .DS_Store | ||
|
|
||
| # GitHub dependencies | ||
| depends.Rds | ||
| .github/depends.Rds | ||
| .github/R-version | ||
|
|
||
| # IDE | ||
| .vscode/ | ||
| .idea/ | ||
|
|
||
| # Test outputs | ||
| tests/testthat/_snaps/ |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,5 @@ | ||
| # Generated by roxygen2: do not edit by hand | ||
|
|
||
| export("%>%") | ||
| export(scrape_coronavirusupdate) | ||
| importFrom(magrittr,"%>%") |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,58 @@ | ||
| # coronavirusupdate 0.0.1.9000 (Development version) | ||
|
|
||
| ## Major Improvements | ||
|
|
||
| ### Code Quality & Reliability | ||
| * Added comprehensive error handling to all scraping functions with informative error messages | ||
| * Implemented input validation across all functions | ||
| * Added graceful handling of NULL inputs and empty results | ||
| * Scraping functions now provide detailed warnings when extraction fails | ||
|
|
||
| ### Documentation | ||
| * Added complete roxygen2 documentation to all functions | ||
| * Improved function descriptions with parameter details and return values | ||
| * Added usage examples and implementation details | ||
| * All internal functions now properly documented with @keywords internal | ||
|
|
||
| ### Testing | ||
| * Set up testthat testing framework | ||
| * Added unit tests for all extraction functions | ||
| * Added data validation tests to ensure data quality | ||
| * Added input validation tests for main scraping function | ||
| * Created test suite for edge cases and error handling | ||
|
|
||
| ### GitHub Actions & Automation | ||
| * Updated all GitHub Actions to latest versions (checkout@v4, cache@v3, setup-r@v2) | ||
| * Improved workflow to skip commits when no new data is available | ||
| * Enhanced commit messages to show number of new episodes added | ||
| * Added helper script to count new episodes for informative commit messages | ||
|
|
||
| ### Package Infrastructure | ||
| * Updated .gitignore with standard R package exclusions | ||
| * Added NEWS.md for tracking package changes | ||
| * Updated DESCRIPTION with testthat dependency | ||
| * Improved RoxygenNote to version 7.2.3 | ||
|
|
||
| ### Data Quality | ||
| * Maintained existing speaker name normalization | ||
| * Preserved incremental scraping functionality | ||
| * Kept multi-format output support (RDS, RDA, Parquet) | ||
|
|
||
| ## Bug Fixes | ||
| * Fixed potential crashes from NULL HTML responses | ||
| * Improved handling of malformed or changed website structure | ||
| * Better error messages for debugging scraping failures | ||
|
|
||
| --- | ||
|
|
||
| # coronavirusupdate 0.0.1 | ||
|
|
||
| ## Initial Release | ||
|
|
||
| * Initial package release | ||
| * Scraping functionality for NDR Coronavirus-Update podcast transcripts | ||
| * Incremental scraping support (only fetches new episodes) | ||
| * Speaker name normalization | ||
| * Multiple output formats (RDS, RDA, Parquet) | ||
| * Automated weekly updates via GitHub Actions | ||
| * Tidy data format with one row per paragraph | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,38 @@ | ||
| # TODO get episode length from iframe player rather than html_node | ||
| #' Extract episode length from episode HTML | ||
| #' | ||
| #' Extracts the duration of a podcast episode from the HTML content. | ||
| #' The duration is typically found in parentheses in the h2 element. | ||
| #' | ||
| #' @param .episode_html An xml_document object containing the episode HTML, | ||
| #' typically obtained via \code{xml2::read_html()} | ||
| #' | ||
| #' @return A character string containing the episode duration, or NA_character_ | ||
| #' if the duration cannot be extracted | ||
| #' | ||
| #' @note TODO: Consider getting episode length from iframe player rather than html_node | ||
| #' for more reliable extraction | ||
| #' | ||
| #' @keywords internal | ||
| extract_episode_length <- function(.episode_html) { | ||
| .episode_html %>% | ||
| rvest::html_node(css = ".textcontent h2") %>% | ||
| rvest::html_text() %>% | ||
| stringr::str_extract(pattern = "(?<=\\().{2,20}(?=\\)$)") | ||
| tryCatch({ | ||
| if (is.null(.episode_html)) { | ||
| warning("Episode HTML is NULL, returning NA for episode length") | ||
| return(NA_character_) | ||
| } | ||
|
|
||
| result <- .episode_html %>% | ||
| rvest::html_node(css = ".textcontent h2") %>% | ||
| rvest::html_text() %>% | ||
| stringr::str_extract(pattern = "(?<=\\().{2,20}(?=\\)$)") | ||
|
|
||
| if (is.na(result) || length(result) == 0) { | ||
| warning("Could not extract episode length from HTML") | ||
| return(NA_character_) | ||
| } | ||
|
|
||
| return(result) | ||
| }, error = function(e) { | ||
| warning(paste("Error extracting episode length:", e$message)) | ||
| return(NA_character_) | ||
| }) | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,37 @@ | ||
| #' Extract last change date from episode HTML | ||
| #' | ||
| #' Extracts and parses the last modification date of a podcast episode transcript | ||
| #' from the HTML content. The date is parsed from German date format. | ||
| #' | ||
| #' @param .episode_html An xml_document object containing the episode HTML, | ||
| #' typically obtained via \code{xml2::read_html()} | ||
| #' | ||
| #' @return A POSIXct datetime object representing when the transcript was last | ||
| #' modified, or NA if the date cannot be extracted or parsed | ||
| #' | ||
| #' @keywords internal | ||
| extract_last_change <- function(.episode_html) { | ||
| .episode_html %>% | ||
| rvest::html_node(css = ".lastchanged") %>% | ||
| rvest::html_text() %>% | ||
| stringr::str_remove(pattern = "[:alpha:]+[:punct:]") %>% | ||
| stringr::str_remove(pattern = "Uhr") %>% | ||
| stringr::str_squish() %>% | ||
| lubridate::dmy_hm() | ||
| tryCatch({ | ||
| if (is.null(.episode_html)) { | ||
| warning("Episode HTML is NULL, returning NA for last change date") | ||
| return(lubridate::as_datetime(NA)) | ||
| } | ||
|
|
||
| result <- .episode_html %>% | ||
| rvest::html_node(css = ".lastchanged") %>% | ||
| rvest::html_text() %>% | ||
| stringr::str_remove(pattern = "[:alpha:]+[:punct:]") %>% | ||
| stringr::str_remove(pattern = "Uhr") %>% | ||
| stringr::str_squish() %>% | ||
| lubridate::dmy_hm() | ||
|
|
||
| if (is.na(result)) { | ||
| warning("Could not parse last change date from HTML") | ||
| } | ||
|
|
||
| return(result) | ||
| }, error = function(e) { | ||
| warning(paste("Error extracting last change date:", e$message)) | ||
| return(lubridate::as_datetime(NA)) | ||
| }) | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,11 +1,34 @@ | ||
| #' Extract speaker names from transcript nodes | ||
| #' | ||
| #' Extracts and cleans speaker names from HTML transcript nodes. Speaker names | ||
| #' are identified by strong tags and specific text patterns (capitalized text | ||
| #' ending with a colon). Includes manual fixes for known edge cases. | ||
| #' | ||
| #' @param .transcript_nodes An xml_nodeset containing the transcript paragraph | ||
| #' nodes, typically obtained via \code{extract_transcript_nodes()} | ||
| #' | ||
| #' @return A character vector of speaker names, with NA for paragraphs without | ||
| #' identified speakers, or an empty character vector if extraction fails | ||
| #' | ||
| #' @keywords internal | ||
| extract_speaker_name <- function(.transcript_nodes) { | ||
| rvest::html_node(x = .transcript_nodes, xpath = "strong") %>% | ||
| rvest::html_text(trim = TRUE) %>% | ||
| stringr::str_squish() %>% | ||
| stringr::str_extract(pattern = "^[:upper:][:alpha:]+.+\\:$") %>% | ||
| stringr::str_remove(pattern = ":") %>% | ||
| # manual fix for episode 38 | ||
| stringr::str_replace(pattern = "Eine Bitte an unsere HΓΆrer", replacement = "Korinna Hennig") %>% | ||
| stringr::str_squish() %>% | ||
| dplyr::na_if(y = "") | ||
| tryCatch({ | ||
| if (is.null(.transcript_nodes) || length(.transcript_nodes) == 0) { | ||
| warning("Transcript nodes are NULL or empty, returning empty character vector") | ||
| return(character(0)) | ||
| } | ||
|
|
||
| rvest::html_node(x = .transcript_nodes, xpath = "strong") %>% | ||
| rvest::html_text(trim = TRUE) %>% | ||
| stringr::str_squish() %>% | ||
| stringr::str_extract(pattern = "^[:upper:][:alpha:]+.+\\:$") %>% | ||
| stringr::str_remove(pattern = ":") %>% | ||
| # manual fix for episode 38 | ||
| stringr::str_replace(pattern = "Eine Bitte an unsere HΓΆrer", replacement = "Korinna Hennig") %>% | ||
| stringr::str_squish() %>% | ||
| dplyr::na_if(y = "") | ||
| }, error = function(e) { | ||
| warning(paste("Error extracting speaker names:", e$message)) | ||
| return(character(0)) | ||
| }) | ||
| } |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NEWS.md states RoxygenNote was updated to 7.2.3, but DESCRIPTION file shows 7.3.0, creating an inconsistency. These version numbers should match.