feat: add validate-text-file subcommand and fix --strict for unresolvable CURIEs #13

Open
github-actions[bot] wants to merge 1 commit into main from feat/validate-text-file-issue-12

Closes #12.

Summary

  • New validate-text-file subcommand: extracts ontology term CURIE and label pairs from a text/markdown file with a regex and validates them, without requiring an intermediate LinkML schema.
  • New EnumValidator.validate_curie_label_pairs() method: reusable core logic for validating CURIE+label pairs independently of a schema.
  • Bug fix: --strict now errors on unresolvable CURIEs with unconfigured prefixes. Previously these were reported as INFO and passed silently even with --strict; they are now reported as ERROR. Affects both validate-schema and validate-text-file.

Usage

# Default @term pattern
linkml-term-validator validate-text-file document.md \
  --config oak_config.yaml --strict -v

# Custom regex
linkml-term-validator validate-text-file document.md \
  --regex '@term (\S+) "([^"]*)"' \
  --curie-group 1 --label-group 2 \
  --config oak_config.yaml --strict -v

Example markdown document:

## Validated Identifiers
- @term MONDO:0009282 "multiple acyl-CoA dehydrogenase deficiency"
- @term ORDO:26791 "Multiple acyl-CoA dehydrogenase deficiency"
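The extraction step can be sketched in a few lines of Python. This is a minimal illustration, not the code in this PR: `extract_pairs` and its parameters are hypothetical names, and the pattern is the one passed to --regex in the usage example (it is an assumption that the built-in default behaves the same way).

```python
import re

# Pattern copied from the --regex usage example; assumed to resemble the default.
TERM_PATTERN = r'@term (\S+) "([^"]*)"'


def extract_pairs(text, pattern=TERM_PATTERN, curie_group=1, label_group=2):
    """Return (curie, label) tuples for every match of pattern in text.

    curie_group and label_group mirror the --curie-group and --label-group
    options: they select which capture group holds which value.
    """
    compiled = re.compile(pattern)
    return [
        (match.group(curie_group), match.group(label_group))
        for match in compiled.finditer(text)
    ]


document = """## Validated Identifiers
- @term MONDO:0009282 "multiple acyl-CoA dehydrogenase deficiency"
- @term ORDO:26791 "Multiple acyl-CoA dehydrogenase deficiency"
"""

pairs = extract_pairs(document)
for curie, label in pairs:
    print(curie, "->", label)
```

Each extracted pair would then be checked against the ontology; that lookup is left out here because it depends on the configured OAK adapters.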

Behaviour

| Situation | Without --strict | With --strict |
|---|---|---|
| Label matches ontology | ✅ pass | ✅ pass |
| Label mismatch | ❌ error | ❌ error |
| CURIE not found, configured prefix | ❌ error | ❌ error |
| CURIE not found, unconfigured prefix | ✅ skip (INFO) | ❌ error |
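The decision table above can be expressed as a small function. This is a sketch under stated assumptions: `classify` and its arguments are hypothetical, `ontology_label` being `None` stands in for "CURIE not found", and exact string comparison is assumed (the actual validator may normalise labels before comparing).

```python
from typing import Optional


def classify(
    expected_label: str,
    ontology_label: Optional[str],
    prefix_configured: bool,
    strict: bool,
) -> str:
    """Map one CURIE+label check to PASS, INFO, or ERROR per the table."""
    if ontology_label is None:
        # CURIE could not be resolved.
        if prefix_configured:
            return "ERROR"
        # Unconfigured prefix: skipped unless --strict is set.
        return "ERROR" if strict else "INFO"
    if ontology_label != expected_label:
        # Label mismatch is always an error (exact match is an assumption).
        return "ERROR"
    return "PASS"


for strict in (False, True):
    print(strict, classify("heart", None, prefix_configured=False, strict=strict))
```

Only the last row of the table depends on --strict, which is why the flag is consulted in a single branch.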

Test plan

  • test_validate_text_file_help: help text contains all new flags
  • test_validate_text_file_valid_terms: 3 valid terms → exit 0
  • test_validate_text_file_label_mismatch: wrong label → exit 1 with mismatch message
  • test_validate_text_file_unresolvable_configured_prefix: nonexistent CURIE in configured ontology → exit 1
  • test_validate_text_file_unresolvable_unconfigured_no_strict: unconfigured prefix without --strict → exit 0 (skip)
  • test_validate_text_file_unresolvable_unconfigured_strict: unconfigured prefix with --strict → exit 1
  • test_validate_text_file_custom_regex: custom regex with swapped group order → exit 0
  • test_validate_text_file_no_matches: file with no @term lines → exit 0 with warning
  • test_validate_text_file_invalid_regex: bad regex → exit 1 with message
  • test_validate_text_file_verbose: verbose mode shows each CURIE
  • test_validate_schema_strict_unresolvable_unconfigured: --strict fix for validate-schema

All 24 CLI tests pass, along with the full existing test suite (57 tests total).
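Three of the edge cases in the test plan (swapped group order, no matches, invalid regex) can be illustrated without the CLI. The sketch below is hypothetical: `run_extraction`, its return shape, and the message strings are assumptions for illustration, and only the exit codes follow the test plan.

```python
import re


def run_extraction(text, pattern, curie_group=1, label_group=2):
    """Return (exit_code, message, pairs) for the extraction step only."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        # Bad regex: exit 1 with a message.
        return 1, f"invalid regex: {exc}", []
    pairs = [
        (m.group(curie_group), m.group(label_group))
        for m in compiled.finditer(text)
    ]
    if not pairs:
        # No matches is a warning, not a failure.
        return 0, "warning: no matches found", []
    return 0, "ok", pairs


# Swapped group order: the label is captured first, the CURIE second.
swapped = run_extraction(
    '"heart" is UBERON:0000948',
    r'"([^"]*)" is (\S+)',
    curie_group=2,
    label_group=1,
)
no_match = run_extraction("plain prose", r'@term (\S+) "([^"]*)"')
bad_regex = run_extraction("anything", r"@term (")

print(swapped[2], no_match[0], bad_regex[0])
```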

🤖 Generated with Claude Code

…able CURIEs

Closes #12.

Changes:
- New `validate-text-file` CLI subcommand that reads a text/markdown file,
  extracts CURIE+label pairs via a user-specified regex (default: `@term CURIE "label"`),
  resolves each CURIE via OAK, and reports label mismatches or unresolvable identifiers.
  Supports --regex, --curie-group, --label-group, --strict, --config, --no-cache,
  --cache-dir, and --verbose options.
- New `EnumValidator.validate_curie_label_pairs()` method that encapsulates the
  CURIE+label validation logic independently of a LinkML schema.
- Bug fix: `--strict` now also treats unresolvable CURIEs with unconfigured prefixes
  as errors in both `validate-schema` and `validate-text-file` (previously they
  silently passed as INFO).
- 11 new tests covering: valid terms, label mismatch, unresolvable CURIEs
  (configured and unconfigured prefixes), strict mode, custom regex, verbose output,
  no-matches warning, invalid regex, and the --strict fix for validate-schema.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Add validate text-file subcommand for regex-based extraction from text/markdown