Conversation

@J535D165
Collaborator

Usually, I'm not a big fan of solutions like this, but given the importance of performance, I think this pragmatic solution can be acceptable. I utilize large, real-world datasets to benchmark the parser's performance and frequently need to switch between branches.

Btw, I'm making nice progress on the PubMed parsing PR, but there are still some open challenges. Performance is one of them.
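
For context, a minimal sketch of what such a branch-to-branch timing run could look like (the `export.ris` path and repeat count are placeholders, not the actual benchmark setup):

```python
# Rough timing harness: parse the same file a few times and keep the best run.
import time

import rispy

def benchmark(path="export.ris", repeats=5):
    timings = []
    for _ in range(repeats):
        with open(path, encoding="utf-8") as f:
            start = time.perf_counter()
            entries = rispy.load(f)
            timings.append(time.perf_counter() - start)
    print(f"parsed {len(entries)} records; best of {repeats}: {min(timings):.3f}s")

if __name__ == "__main__":
    benchmark()
```

Running the same script from each branch against the same file gives a crude but repeatable comparison.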

@J535D165 requested a review from shapiromatron May 23, 2025 22:13
@shapiromatron self-assigned this May 28, 2025
Collaborator

@shapiromatron left a comment

Seems like an ok solution, though I wonder if it'd be better to see whether a synthetic benchmark dataset could be generated instead of hidden data that can only be used by one of our repository collaborators.

# created from tests
export.ris

# extra benchmark data only for internal use (because of copyright)
Collaborator

Any chance we could create some synthetic data using something like faker? https://github.com/joke2k/faker
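
To sketch that idea (the file name, tag set, and record count below are just illustrative), something like this could emit an arbitrarily large synthetic RIS file:

```python
# Sketch: generate fake RIS journal records with Faker for benchmarking.
from faker import Faker

fake = Faker()
Faker.seed(0)  # reproducible output

def fake_record():
    lines = [
        "TY  - JOUR",
        *[f"AU  - {fake.last_name()}, {fake.first_name()}" for _ in range(3)],
        f"TI  - {fake.sentence(nb_words=10)}",
        f"JO  - {fake.company()}",
        f"PY  - {fake.year()}",
        f"AB  - {fake.paragraph(nb_sentences=5)}",
        "ER  - ",
    ]
    return "\n".join(lines)

with open("synthetic.ris", "w", encoding="utf-8") as f:
    f.write("\n".join(fake_record() for _ in range(10_000)) + "\n")
```

The caveat is that synthetic records won't reproduce the field-length and tag-frequency quirks of real exports, so the timings may not track the internal benchmark data exactly, but it would let anyone run the benchmark.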
