@khaeru khaeru commented Jan 15, 2026

  • Add new CLI command tdc check-record.
    This command retrieves the metadata for a given record from the TDC CKAN API and checks/prints some properties of the record. For example:

    $ tdc check-record registered-vehicles-in-tanzania
    - https://portal.transport-data.org/@ministry of works and transport of tanzania/registered-vehicles-in-tanzania
    - Title: 'Registered Vehicles in Tanzania'
    - Category: tdc_formatted
    
    - Number of files by extension: 1 .csv, 2 .pdf, 1 .xlsx
    - Number of data files: 2
    - Number of possible SDMX-CSV files: 1
    
    Criteria for a TDC Formatted record:
    - At least one file in CSV format: True
    - Correct category assigned: True
    - CSV file(s) are in SDMX-CSV format (not implemented yet): True
    - Overall: YES
    
    Criteria for a TDC Harmonized record—all of the above, plus:
    - Correct category assigned: False
    - Overall: NO
    • Retrieve and cache the data file(s).
    • Check the contents of data file(s).
  • Filter/ignore the warning that ckan/ckanapi#218 ("Replace pkg_resources with importlib.metadata") addresses. The upstream PR was merged, but the package has not been released since 2024, so the warning still appears.
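
For illustration only, a filter of this kind could look like the sketch below, using the standard-library `warnings` module. The function name `ignore_pkg_resources_warning` and the message prefix are assumptions, not taken from this PR; the prefix should be adjusted to the text of the warning actually emitted.

```python
import warnings


def ignore_pkg_resources_warning() -> None:
    """Suppress warnings whose message starts with the assumed prefix."""
    # `message` is a regular expression matched against the start of the
    # warning text. The prefix below is an assumption about the wording.
    warnings.filterwarnings(
        "ignore",
        message="pkg_resources is deprecated",
    )
```

Matching on the message rather than on the category alone keeps unrelated deprecation warnings visible.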

How to review

  • Try running tdc check-record for some known record IDs.
    • Report whether the outputs appear correct.
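
To cross-check the output by hand, the "TDC Formatted" criteria from the example in the description can be recomputed from a record's metadata. The following is a minimal sketch, not the implementation in this PR: the function name `summarize`, the metadata key `tdc_category`, and the `example.org` URLs are hypothetical, and the SDMX-CSV criterion is left out because the description marks it as not implemented yet.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlparse


def summarize(record: dict) -> dict:
    """Recompute two 'TDC Formatted' criteria from a CKAN-style record dict."""
    # Count resources by the file extension found in each resource URL path.
    extensions = Counter(
        PurePosixPath(urlparse(r["url"]).path).suffix.lower()
        for r in record.get("resources", [])
    )
    has_csv = extensions[".csv"] >= 1
    # "tdc_category" is an assumed key name for the record's category.
    category_ok = record.get("tdc_category") == "tdc_formatted"
    return {
        "extensions": dict(extensions),
        "has_csv": has_csv,
        "category_ok": category_ok,
        "formatted": has_csv and category_ok,
    }


# Hypothetical record mirroring the file counts in the example output.
example = {
    "tdc_category": "tdc_formatted",
    "resources": [
        {"url": "https://example.org/files/a.csv"},
        {"url": "https://example.org/files/b.pdf"},
        {"url": "https://example.org/files/c.pdf"},
        {"url": "https://example.org/files/d.xlsx"},
    ],
}
result = summarize(example)
```

If a hand-computed summary like this disagrees with what `tdc check-record` prints for the same record, that difference is worth reporting in the review.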

PR checklist

  • Checks all ✅
  • Update documentation
  • Update doc/whatsnew.rst

@khaeru khaeru self-assigned this Jan 15, 2026
@khaeru khaeru added the enh New feature or request label Jan 15, 2026
khaeru added 10 commits January 26, 2026 15:07
- Convert "resources" collection to instances of Resource.
- Add portal_url() method.
- Add type hints for known members/attributes.
- Add .fetch() method.
- Add type hints for known attributes.
- Use in existing modules.
- Reduce MODULES_WITH_CLI to internal/non-provider modules.
- Use sub-paths from registry in .is_available() call.
- Handle ConnectionError/HTTPSConnectionPool max retries exceeded.
  This may be caused by repeated queries to incorrect URLs.