Conversation

@candleindark
Member

@candleindark candleindark commented Dec 1, 2025

This PR closes #1752. It removes hardcoded vendor-specific information that is now made available through a Config object defined in dandischema. Additionally, it includes the GH workflow changes from #1771 to verify that the vendor-specific information was removed correctly.

Specifically, this PR replaces every use of the "DANDI:" string that is tied to the DANDI Archive instance with a vendor-agnostic (i.e., DANDI-instance-agnostic) setup.

TODOs:

Notes:

  1. Hardcoded vendor-specific info in `dandi service-scripts publish-dandiset-version-doi` is not replaced or removed in this PR. That will be handled in #1704 ("Make the dandi service-scripts publish-dandiset-version-doi command DANDI instance specific, incorporating vendor information of a particular DANDI instance") instead.
  2. The remaining failures in tests with the marker "obolibrary" are unrelated to this PR and are documented in #1769 ("Failures in tests with marker obolibrary").

Release Notes

Removed hardcoded vendor specific info so that dandi-cli can now connect to different DANDI instances with different vendor specific info.

@candleindark candleindark added vendoring patch Increment the patch version when merged labels Dec 1, 2025
@codecov

codecov bot commented Dec 1, 2025

Codecov Report

❌ Patch coverage is 85.29412% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.07%. Comparing base (3501190) to head (997761c).
⚠️ Report is 7 commits behind head on master.

Files with missing lines | Patch % | Lines
dandi/utils.py | 78.57% | 3 Missing ⚠️
dandi/cli/cmd_service_scripts.py | 71.42% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1760      +/-   ##
==========================================
+ Coverage   75.03%   75.07%   +0.04%     
==========================================
  Files          84       84              
  Lines       11889    11910      +21     
==========================================
+ Hits         8921     8942      +21     
  Misses       2968     2968              
Flag | Coverage Δ
unittests | 75.07% <85.29%> (+0.04%) ⬆️

@candleindark
Member Author

@yarikoptic Is

dandi-cli/dandi/dandiset.py

Lines 131 to 139 in 7cf6ace

# New formalized model, but see below DANDI: way
# TODO: add schemaVersion handling but only after we have them provided
# in all metadata records from dandi-api server
if id_.get("propertyID") != "DANDI":
    raise ValueError(
        f"Got following identifier record when was expecting a record "
        f"with 'propertyID: DANDI': {id_}"
    )
id_ = str(id_.get("value", ""))

ever relevant? Has the identifier of the Dandiset model ever been a dictionary?

@yarikoptic
Member

@yarikoptic Is

dandi-cli/dandi/dandiset.py

Lines 131 to 139 in 7cf6ace

# New formalized model, but see below DANDI: way
# TODO: add schemaVersion handling but only after we have them provided
# in all metadata records from dandi-api server
if id_.get("propertyID") != "DANDI":
    raise ValueError(
        f"Got following identifier record when was expecting a record "
        f"with 'propertyID: DANDI': {id_}"
    )
id_ = str(id_.get("value", ""))

ever relevant? Has the identifier of the Dandiset model ever been a dictionary?

Running the tests did not step through that branch. It was originally added in 9006f96, which leads to dandi/dandi-archive#63 (comment), which shows that this is indeed something that was changed on the dandi-archive side a while back. Thus we can safely remove this block, IMHO.

So that the checks of `id` and `identifier`
in metadata allow different instance names
So that the check of `id` in metadata allows
different instance names
@yarikoptic
Member

should it be taken out of the draft?

@candleindark candleindark mentioned this pull request Dec 8, 2025
1 task
@candleindark candleindark marked this pull request as ready for review December 8, 2025 18:51
@candleindark candleindark marked this pull request as draft December 8, 2025 18:52
@candleindark candleindark marked this pull request as ready for review December 8, 2025 18:52
@candleindark candleindark marked this pull request as draft December 8, 2025 18:53
@candleindark
Member Author

candleindark commented Dec 8, 2025

should it be taken out of the draft?

It is a base PR that I will merge other PRs into. I will seek approval from you to merge other PRs into this one. Each of the other PRs warrants a judgment. An example of those PRs is #1767.

To reflect that instance names are no longer
restricted to "DANDI"
@candleindark
Member Author

candleindark commented Dec 9, 2025

I think we can leave the "DANDI:" string in the following lines in place. It is specific to how datasets in the DANDI Archive are published on identifiers.org.

(
    re.compile(
        rf"https?://identifiers\.org/DANDI:{DANDISET_ID_REGEX}"
        rf"(?:/{PUBLISHED_VERSION_REGEX})?",
        flags=re.I,
    ),
    {"handle_redirect": "pass"},
    "https://identifiers.org/DANDI:<dandiset id>[/<version id>]"
    " (<version id> cannot be 'draft')",
),

@yarikoptic Let me know if I am wrong.
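
To illustrate what the retained pattern accepts, here is a minimal, self-contained sketch. The values of `DANDISET_ID_REGEX` and `PUBLISHED_VERSION_REGEX` below are assumptions chosen for illustration (the real constants live in dandi-cli and may differ), and `is_identifiers_org_url` is a hypothetical helper, not a function from the codebase.

```python
import re

# Assumed stand-ins for the constants used in the snippet above;
# the actual definitions in dandi-cli may differ.
DANDISET_ID_REGEX = r"[0-9]{6}"
PUBLISHED_VERSION_REGEX = r"[0-9]+\.[0-9]+\.[0-9]+"

# Same shape as the pattern quoted above: the "DANDI:" prefix stays literal
# because it reflects how identifiers.org names the DANDI Archive namespace.
IDENTIFIERS_ORG_URL = re.compile(
    rf"https?://identifiers\.org/DANDI:{DANDISET_ID_REGEX}"
    rf"(?:/{PUBLISHED_VERSION_REGEX})?",
    flags=re.I,
)


def is_identifiers_org_url(url: str) -> bool:
    """Hypothetical helper: True if the whole string matches the pattern."""
    return IDENTIFIERS_ORG_URL.fullmatch(url) is not None
```

Under these assumptions, a URL with a published version matches, while one ending in `/draft` or using a different namespace prefix does not.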

…from_doi`

This is only a replacement for the code that strips
the "DANDI:" prefix. The replacement works for dandisets
of different DANDI instances. However, `update_dandiset_from_doi`
is far from robust. This replacement doesn't fix the underlying
weaknesses that stem from its assumptions. Further improvements to
`update_dandiset_from_doi` are needed.
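
A rough sketch of the kind of instance-agnostic prefix stripping described in this commit message follows. It is not the code from the PR: `KNOWN_INSTANCE_NAMES` and `strip_instance_prefix` are hypothetical names, and the literal tuple stands in for whatever the dandischema configuration actually provides.

```python
# Hypothetical list of instance names, for illustration only. In the PR the
# instance information is said to come from dandischema, not a literal tuple.
KNOWN_INSTANCE_NAMES = ("DANDI", "EMBER")


def strip_instance_prefix(identifier: str, instance_names=KNOWN_INSTANCE_NAMES) -> str:
    """Remove a leading "<INSTANCE>:" prefix if it names a known instance.

    Identifiers without a recognized prefix are returned unchanged.
    """
    for name in instance_names:
        prefix = f"{name}:"
        if identifier.startswith(prefix):
            return identifier[len(prefix):]
    return identifier
```

Returning the input unchanged when no prefix matches keeps the helper safe to call on identifiers that were already stripped.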
@candleindark
Member Author

I think that keys such as "dandi:dandi-etag", "dandi-etag", and "dandi:sha2-256" are not DANDI instance specific and are used across different DANDI instances.

For example,

dandi-cli/dandi/download.py

Lines 317 to 324 in f5b8bc3

if "dandi:dandi-etag" in d:
    digests = {"dandi-etag": d["dandi:dandi-etag"]}
else:
    raise RuntimeError(
        f"dandi-etag not available for asset. Known digests: {d}"
    )
try:
    digests["sha256"] = d["dandi:sha2-256"]

@yarikoptic Let me know if I am wrong.
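
The quoted excerpt stops partway through the `try` block. As a self-contained sketch of the same lookup, the hypothetical function `extract_digests` below treats "dandi:sha2-256" as optional; that handling is an assumption, since the excerpt does not show what follows the `try`.

```python
def extract_digests(d: dict) -> dict:
    """Hypothetical helper mirroring the excerpt above.

    The "dandi:"-prefixed keys name digest algorithms, not a particular
    DANDI instance, so they are looked up as fixed strings.
    """
    if "dandi:dandi-etag" in d:
        digests = {"dandi-etag": d["dandi:dandi-etag"]}
    else:
        raise RuntimeError(
            f"dandi-etag not available for asset. Known digests: {d}"
        )
    # Assumption: a missing sha256 digest is tolerated rather than an error.
    sha256 = d.get("dandi:sha2-256")
    if sha256 is not None:
        digests["sha256"] = sha256
    return digests
```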

Contributor

Copilot AI left a comment

Pull request overview

This PR removes hardcoded vendor-specific information (particularly "DANDI:" prefix references) from the codebase and replaces them with vendor-agnostic implementations using configuration from dandischema. The changes allow the CLI to work with different DANDI instances without hardcoded dependencies on the DANDI Archive instance.

Key changes:

  • Replaced hardcoded "DANDI:" string checks with dynamic pattern matching using ID_PATTERN from dandischema
  • Removed vendor-specific validation logic from identifier parsing
  • Updated prefix stripping to handle any known instance name instead of just "DANDI:"
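
As a sketch of what pattern-based checking can look like in place of a hardcoded "DANDI:" comparison: the pattern below is an assumed stand-in, not the actual `ID_PATTERN` from dandischema, and `parse_identifier` is a hypothetical helper.

```python
import re

# Assumed shape of an instance name (uppercase letters and hyphens). The real
# ID_PATTERN in dandischema is not reproduced here and may differ.
ASSUMED_INSTANCE_NAME_PATTERN = r"[A-Z][-A-Z]*"

ASSUMED_DANDISET_IDENTIFIER = re.compile(
    rf"(?P<instance>{ASSUMED_INSTANCE_NAME_PATTERN}):(?P<id>[0-9]{{6}})"
)


def parse_identifier(identifier: str) -> tuple[str, str]:
    """Split "<INSTANCE>:<6-digit id>" into (instance name, dandiset id)."""
    m = ASSUMED_DANDISET_IDENTIFIER.fullmatch(identifier)
    if m is None:
        raise ValueError(f"Not a recognized dandiset identifier: {identifier!r}")
    return m["instance"], m["id"]
```

A test can then assert on the parsed dandiset id without caring which instance name precedes the colon.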

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File | Description
dandi/tests/test_download.py | Updated assertion to use pattern matching instead of hardcoded "DANDI:" prefix
dandi/dandiset.py | Removed hardcoded validation for "DANDI" propertyID in identifier records
dandi/dandiarchive.py | Updated comment to be vendor-agnostic
dandi/cli/tests/test_service_scripts.py | Changed assertions to use pattern matching with ID_PATTERN
dandi/cli/cmd_service_scripts.py | Generalized prefix stripping to handle any known instance name

@candleindark candleindark marked this pull request as ready for review December 10, 2025 07:43
* feat: reimplement `dandi.utils.is_url`

So that it can determine whether a given
string is a standard HTTP, HTTPS, FTP URL,
or DANDI URL more precisely. Additionally, this
solution supports DANDI URL of different DANDI
archive instances.

* test: Add tests for `dandi.utils.is_url`
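
For readers unfamiliar with the idea, here is a minimal sketch of a URL check along the lines this commit message describes. It is not the implementation merged in the PR: `is_url_sketch`, the instance-name pattern, and the accepted "<INSTANCE>:<id>[/<version>]" form are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Assumed shape of "<INSTANCE>:<dandiset id>[/<version>]"; the instance-name
# pattern is illustrative and not taken from dandischema.
ASSUMED_INSTANCE_URL = re.compile(
    r"[A-Za-z][-A-Za-z0-9]*:[0-9]{6}(?:/[^/\s]+)?"
)


def is_url_sketch(s: str) -> bool:
    """Return True for http/https/ftp URLs with a host, or instance-style IDs."""
    try:
        parsed = urlparse(s)
    except ValueError:
        # urlparse rejects some malformed inputs (e.g. a broken IPv6 host).
        return False
    if parsed.scheme in {"http", "https", "ftp"}:
        return bool(parsed.netloc)
    return ASSUMED_INSTANCE_URL.fullmatch(s) is not None
```

Requiring a non-empty host for the standard schemes is what keeps strings such as "http://" or a Windows path like "C:\data" from being treated as URLs.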
@yarikoptic
Member

Yes, both identifiers and checksum are ok to remain dandi. Note that some test run is failing... Otherwise this is ready?

@candleindark
Member Author

Yes, both identifiers and checksum are ok to remain dandi. Note that some test run is failing... Otherwise this is ready?

Yes, this is ready. The test failure is unrelated to this PR since it appears in #1768 as well. I am looking into the cause of the failure now.

Configure CI to test against a vendor specific DANDI API
other than the default instance of DANDI API
@yarikoptic yarikoptic merged commit 78b7700 into master Dec 15, 2025
37 of 38 checks passed
@yarikoptic yarikoptic deleted the remove-hardcode branch December 15, 2025 22:46
@github-actions

🚀 PR was released in 0.74.0 🚀

Labels

patch Increment the patch version when merged released vendoring

Development

Successfully merging this pull request may close these issues.

Review operation and codebase for further hardcoded DANDI identifiers

3 participants