@chinyeungli commented Dec 10, 2025

Ref #630

  • Updated the mine_maven.py pipeline and maven_crawler.py
  • Added Maven Root/Base URLs
  • Updated the regular expressions used to collect links and artifact timestamps in maven_crawler.py
  • Updated the parameter needed for get_classifier_from_artifact_url()
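As a rough illustration of the third bullet, the sketch below pairs each link with its timestamp in a directory-listing page. It is not the code in maven_crawler.py: the pattern, the function name sketch_collect_links_and_timestamps, and the sample HTML layout are all assumptions made for the example.

```python
import re

# Hypothetical pattern: an anchor tag followed by whitespace and a
# "YYYY-MM-DD HH:MM" timestamp. The real listing format may differ.
LINK_AND_TIMESTAMP = re.compile(
    r'<a href="(?P<href>[^"]+)"[^>]*>[^<]*</a>\s+'
    r'(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2})'
)


def sketch_collect_links_and_timestamps(text):
    """Return (href, timestamp) pairs for links that carry a timestamp."""
    return [
        (match.group("href"), match.group("timestamp"))
        for match in LINK_AND_TIMESTAMP.finditer(text)
    ]


# Made-up sample resembling a directory index; not copied from a real page.
SAMPLE = """
<a href="../">../</a>
<a href="1.0/" title="1.0/">1.0/</a>        2020-01-02 03:04         -
<a href="maven-metadata.xml" title="maven-metadata.xml">maven-metadata.xml</a>   2021-05-06 07:08       345
"""

pairs = sketch_collect_links_and_timestamps(SAMPLE)
```

Links without a timestamp, such as the parent-directory entry, are skipped because the pattern requires a timestamp after the closing tag.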

Signed-off-by: Chin Yeung Li <tli@nexb.com>

@JonoYang JonoYang left a comment

Thanks. I left some comments about creating test files.


def test_collect_links_and_artifact_timestamps_repo_maven_apache_org(self):
    # https://repo.maven.apache.org/maven2
    text = """
Member

The HTML should live in individual test files, which the tests then load, as is done here: https://github.com/aboutcode-org/purldb/blob/main/packagedb/tests/test_api.py#L220

Contributor Author

Moved the HTML to individual test files.
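The fixture-file approach discussed in this thread can be sketched roughly as follows. The helper read_fixture, the test class, and the file name index.html are hypothetical, and the linked purldb test may load its data differently. A temporary directory stands in for a committed test data directory only to keep the sketch self-contained.

```python
import tempfile
import unittest
from pathlib import Path


def read_fixture(data_dir, name):
    """Return the text of a fixture file stored under data_dir."""
    return (Path(data_dir) / name).read_text(encoding="utf-8")


class FixtureExampleTest(unittest.TestCase):
    def test_reads_html_fixture(self):
        # A real suite would commit the HTML under a test data directory;
        # the temporary directory here is only for illustration.
        with tempfile.TemporaryDirectory() as data_dir:
            Path(data_dir, "index.html").write_text(
                '<a href="1.0/">1.0/</a>', encoding="utf-8"
            )
            text = read_fixture(data_dir, "index.html")
        self.assertIn('href="1.0/"', text)
```

Keeping the HTML out of the test module makes the test body short and lets the fixture be refreshed without touching the assertions.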

return f"""\
Collect PackageURLs from Maven ({commit_batch}/{total_commit_batch})
Tool: {tool_name}@v{VERSION}
Member

Use settings.PURLDB_VERSION here instead of VERSION.
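A minimal sketch of that suggestion, under stated assumptions: the settings object below is a stand-in for Django's settings (so the example runs without a configured project), the version value is made up, and the function name, default tool name, and the point where the message ends are all hypothetical, since the snippet under review is truncated.

```python
from types import SimpleNamespace

# Stand-in for django.conf.settings; the version string is invented.
settings = SimpleNamespace(PURLDB_VERSION="0.0.0")


def sketch_commit_message(commit_batch, total_commit_batch, tool_name="hypothetical-tool"):
    """Build a commit message that reads the version from settings."""
    return f"""\
Collect PackageURLs from Maven ({commit_batch}/{total_commit_batch})
Tool: {tool_name}@v{settings.PURLDB_VERSION}
"""


message = sketch_commit_message(1, 3)
```

Reading the version from one settings attribute avoids a separate module-level constant drifting out of sync.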

Signed-off-by: Chin Yeung Li <tli@nexb.com>
@chinyeungli

@keshav-space I noticed you’ve already done some refactoring work. The changes in this PR conflict with that effort and can’t be merged directly into main. If you’re available, it would be great if you could handle the refactor again since you have a stronger grasp of the logic. Otherwise, I’m happy to take it on and will do my best to align with the approach you established.

@chinyeungli

Closing this because a new PR for the same issue has been created at #805.