@alinakbase
Collaborator

This refactor introduces a cleaner, modular, and more maintainable architecture for UniProt data parsing within the cdm_data_loader_utils package. The new design separates concerns across multiple parser components, centralizes shared identifier extraction, enhances XML utilities, and adds a comprehensive test suite to ensure long-term stability.

Key improvements include:
• Modular parser structure under cdm_data_loader_utils/parsers/
• Unified shared identifier extraction (shared_identifiers.py)
• Robust XML parsing utilities (xml_utils.py)
• Refactored UniProt parser (uniprot.py) with clearer logic paths
• Complete tests for UniProt refactor, including:
  • shared identifiers
  • XML utilities
  • UniProt entry parsing
• Cleaner directory layout aligned with CDM conventions

This refactor provides a foundation for future expansion (features, evidence, associations, and publications) while improving maintainability and reducing duplicated logic.
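
To illustrate what centralized identifier extraction on top of small XML helpers might look like, here is a minimal sketch. It is not the code in this PR: the names `find_all_text` and `extract_identifiers`, the namespace mapping, and the sample entry are all assumptions made for illustration, and the actual `shared_identifiers.py` and `xml_utils.py` may be organized quite differently.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI chosen to match the sample document below;
# the real parser may resolve namespaces differently.
NS = {"up": "http://uniprot.org/uniprot"}

# Made-up entry for illustration only; not real UniProt data.
SAMPLE = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry dataset="Swiss-Prot">
    <accession>P00001</accession>
    <accession>Q00002</accession>
    <name>EXAMPLE_HUMAN</name>
    <dbReference type="PDB" id="1ABC"/>
  </entry>
</uniprot>"""


def find_all_text(elem, path, ns=NS):
    """Hypothetical XML helper: stripped, non-empty text of every match."""
    texts = (child.text for child in elem.findall(path, ns))
    return [t.strip() for t in texts if t and t.strip()]


def extract_identifiers(entry, ns=NS):
    """Hypothetical shared extractor for one <entry> element."""
    accessions = find_all_text(entry, "up:accession", ns)
    # Keep only cross-references that carry both a type and an id.
    xrefs = [
        (ref.get("type"), ref.get("id"))
        for ref in entry.findall("up:dbReference", ns)
        if ref.get("type") and ref.get("id")
    ]
    return {
        "primary": accessions[0] if accessions else None,
        "secondary": accessions[1:],
        "xrefs": xrefs,
    }


root = ET.fromstring(SAMPLE)
entry = root.find("up:entry", NS)
ids = extract_identifiers(entry)
print(ids)
```

Keeping the extraction in one function like this means each parser can call the same helper instead of repeating its own `findall` logic, which is the kind of duplication the description says the refactor reduces.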

@ialarmedalien ialarmedalien changed the base branch from main to develop December 4, 2025 16:27
@ialarmedalien ialarmedalien changed the base branch from develop to main December 4, 2025 16:30
@ialarmedalien ialarmedalien force-pushed the uniprot-refactor-v2 branch 2 times, most recently from 3b89f65 to 2e45b47 Compare December 4, 2025 17:03
@ialarmedalien ialarmedalien changed the base branch from main to develop December 4, 2025 17:03
@ialarmedalien ialarmedalien force-pushed the uniprot-refactor-v2 branch 3 times, most recently from 2a781b3 to bba5e5a Compare December 10, 2025 22:17
@alinakbase alinakbase force-pushed the uniprot-refactor-v2 branch 2 times, most recently from ec65f68 to bfbf335 Compare December 22, 2025 23:37
if os.path.exists(tmp_path):
    try:
        os.remove(tmp_path)
    except Exception:
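
The fragment above is cut off after `except Exception:`, so what the handler does is not visible here. As a hedged sketch of one common way to write this kind of temp-file cleanup, the version below skips the separate existence check and suppresses only the "file already gone" case. The helper name `remove_if_present` is hypothetical and is not taken from the PR.

```python
import contextlib
import os
import tempfile


def remove_if_present(path):
    """Delete path; ignore only the case where it no longer exists."""
    # Suppressing FileNotFoundError alone lets other failures
    # (for example permission errors) surface to the caller.
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)


# Usage: create a scratch file, then clean it up twice.
fd, tmp_path = tempfile.mkstemp()
os.close(fd)
remove_if_present(tmp_path)  # removes the file
remove_if_present(tmp_path)  # second call is a no-op
```

Removing the file directly also avoids the small window between `os.path.exists` and `os.remove` in which another process could delete it.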
@codecov

codecov bot commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 56.22407% with 211 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.92%. Comparing base (b3d8269) to head (72fa98f).
⚠️ Report is 1 commit behind head on develop.

Files with missing lines                                 Patch %   Lines
src/cdm_data_loader_utils/parsers/uniprot.py             48.75%    144 Missing ⚠️
src/cdm_data_loader_utils/parsers/uniref.py              60.00%    54 Missing ⚠️
src/cdm_data_loader_utils/parsers/xml_utils.py           80.35%    11 Missing ⚠️
...data_loader_utils/parsers/gene_association_file.py    0.00%     1 Missing ⚠️
...dm_data_loader_utils/parsers/shared_identifiers.py    88.88%    1 Missing ⚠️
Additional details and impacted files

@@             Coverage Diff             @@
##           develop      #45      +/-   ##
===========================================
+ Coverage    43.78%   44.92%   +1.13%     
===========================================
  Files           42       44       +2     
  Lines         2592     2767     +175     
===========================================
+ Hits          1135     1243     +108     
- Misses        1457     1524      +67     
Files with missing lines                                 Coverage Δ
...data_loader_utils/parsers/gene_association_file.py    62.06% <0.00%> (ø)
...dm_data_loader_utils/parsers/shared_identifiers.py    88.88% <88.88%> (ø)
src/cdm_data_loader_utils/parsers/xml_utils.py           80.35% <80.35%> (ø)
src/cdm_data_loader_utils/parsers/uniref.py              55.18% <60.00%> (+13.47%) ⬆️
src/cdm_data_loader_utils/parsers/uniprot.py             49.46% <48.75%> (-7.73%) ⬇️
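
The project-level figures in the coverage diff above can be checked from the Hits and Lines rows. The sketch below is only a worked arithmetic check, not Codecov's implementation. The report does not state its rounding rule, so truncation to two decimals is an assumption, chosen because it reproduces the displayed values.

```python
import math


def pct(hits, lines):
    """Coverage as a percentage of executable lines."""
    return 100.0 * hits / lines


def truncate2(value):
    """Truncate (not round) to two decimals; assumed display rule."""
    return math.floor(value * 100) / 100


# Figures copied from the coverage diff: (hits, misses, lines).
base_hits, base_misses, base_lines = 1135, 1457, 2592
head_hits, head_misses, head_lines = 1243, 1524, 2767

# Hits and misses should add up to the line totals.
assert base_hits + base_misses == base_lines
assert head_hits + head_misses == head_lines

base = pct(base_hits, base_lines)
head = pct(head_hits, head_lines)

print(truncate2(base))         # base (develop) project coverage
print(truncate2(head))         # head project coverage
print(truncate2(head - base))  # change in project coverage
```

Under that truncation assumption the three printed values match the 43.78%, 44.92%, and +1.13% shown in the diff.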

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update dd0c28e...72fa98f.

3 participants