Skip to content

Conversation

@stephhazlitt
Copy link
Member

@stephhazlitt stephhazlitt commented Jan 10, 2025

Changes to address Issue #353:

  • Adds facet.limit=1000 to bcdc_search_facets() (which uses the packages_search endpoint) so that >50 organizations are pulled with bcdc_search_facets("organizations") and bcdc_list_organizations().
  • Changes bcdc_list_group_records() to use group_package_show endpoint instead of group_show as the latter was returning a truncated list (10) of a group's datasets (even though the docs suggest it should return 1000). NOTE: This change results in a returned object of the same class and with almost the same set of columns (plus replacement_record and retention expiry_date cols and minus support_url col).
  • Changes bcdc_list_organization_records() to use package_search instead of organization_show endpoint, since organization_show endpoint returns a truncated list (10) of a group's datasets (as per the docs) and unlike group there is no organization_package_show ☹️.

@stephhazlitt
Copy link
Member Author

stephhazlitt commented Jan 14, 2025

I notice we don't have any tests for these helper functions and thinking it might be useful to add one/s, however with the number of groups/organizations and number of records per group/organization always changing it might be a bit fiddly. We could get the dataset/package count from bcdata::bcdc_list_organizations() and test that the group/organization with the highest count at any given pull matches the rows returned for that group/organization withbcdata::bcdc_list_organization_records(organization-name)?

@stephhazlitt stephhazlitt changed the title WIP: various API end point use fixes for issue #353 Various API end point use fixes for issue #353 Jan 14, 2025
Copy link
Collaborator

@ateucher ateucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful work @stephhazlitt! Great sleuthing, and great fixes.

It's a pretty quick review (I only tested it locally quite quickly), but a few questions for you inline...

As for tests, your idea is pretty clever, that might work well. Even basic tests that the classes of the returned object are correct are helpful too. And maybe to address the bug specifically, use a facet/group/org that we know has > 50/10/10 records, and write a test for nrow > 50/10/10?

Will you add a NEWS item as well?

One other question, probably out of scope for this PR but thought I'd ask: Did you find a way to find the list of facets available to search? I wonder if this list: c("license_id", "download_audience", "res_format", "publish_state", "organization", "groups") is the full list now?

@stephhazlitt
Copy link
Member Author

Thanks @ateucher. I'll explore further changes based on your feedback, add some tests as per the discussion and update the NEWS.md file and then ask for a new review.

Copy link
Collaborator

@boshek boshek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! My only comments are that we should expose all these as options rather than arguments or hard coded values. Thanks @stephhazlitt!

@stephhazlitt
Copy link
Member Author

One other question, probably out of scope for this PR but thought I'd ask: Did you find a way to find the list of facets available to search? I wonder if this list: c("license_id", "download_audience", "res_format", "publish_state", "organization", "groups") is the full list now?

I spent some time trying to solve this but no luck. In the end, after trying and failing with many queries, ChatGPT advised that most often the GUI will include all enabled search facets in the right hand column—by this logic, we have them all. The list depends on how the CKAN instance is configured on the server, so I could ask the catalogue team to confirm.

@stephhazlitt stephhazlitt requested a review from ateucher January 19, 2025 05:38
Copy link
Collaborator

@ateucher ateucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome @stephhazlitt thank you! One suggested change about the nrows comparison, and a mild suggestion to add tests for the options... if you don't have time for that though I think that's okay and you can go ahead and merge.

Copy link
Collaborator

@boshek boshek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. If you don't end up making option specific tests (which is totally fine) I think an issue would be good to create.

Copy link
Collaborator

@ateucher ateucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect @stephhazlitt, let's get this merged!

@stephhazlitt stephhazlitt merged commit 456d6fc into main Jan 21, 2025
7 checks passed
@stephhazlitt stephhazlitt deleted the 353-fix branch January 21, 2025 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants