build(medcat and medcat-den): CU-869ddh1jv Avoid test resources in releases #503
Draft
mart-r wants to merge 9 commits into
The underlying issue
medcat-den source distributions are pushed to TestPyPI on every commit. Because they include test-time resources (test / fake models), they are rather large (~32MB). Over time this has meant we've reached PyPI's per-project storage limit of 10GB. As a result, medcat-den workflows on the main branch are now failing because the TestPyPI uploads are failing.
Caveats to consider
The idea of packaging your tests (along with the resources required to run them) is quite common for source distributions. In fact, the default behaviour seems to be to include everything that is tracked by git. There are a number of ways to get around this (e.g. removing the files before building, or pruning in MANIFEST.in), but they seem to either run counter to open source principles or not really follow modern package building standards.
The proposed plan
To make excluding test resources from releases a viable option, I plan to store test-time models centrally in the repo. This means they won't be included in the builds, since they're outside the scope of each package's source. It also has the added benefit of allowing us to reuse the same test models across multiple projects within the repo (e.g. medcat and medcat-den, but why not medcat-service as well). On top of that, there needs to be a way to access these files from a source distribution. Because the source distribution no longer includes these test-time resources, they need to be fetched. The plan uses pooch to do the fetching from the relevant version on GitHub, but the logic defaults to local files if available. This will also involve including these files in the relevant releases. Along the way we also need to make some changes to the exact paths that are used to interact with these models in the test suite (but that shouldn't be extensive).
This is the plan: