-
Notifications
You must be signed in to change notification settings - Fork 29
Fix notebooks generation flow #76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Great to tighten up the workflow and coupling, but from the diffs it seems that a lot of spelling and grammar mistakes are reintroduced to the notebooks. @mieczyslaw, which did you spellcheck, the .py or the .ipynb? I have a suspicion that we are syncing the wrong way and overwriting your fixes with this pull-request? |
|
Oh, right, I didn't think about that and was hoping that whichever is the newer is a good one (but jupytext behaviour is somewhat still a surprise for me). I suspect @mieczyslaw fixed the .ipynb files, then we should pull them from main and regenerate *.py |
|
If I remember correctly, I corrected everything I found re speelchecks (you would see that in git logs of my merged branch). In my recent branch I wanted to fix import which seemed to be broken (but I suppose ipynb needs to be regenerated from py, basically worth checking in which of two the import is correct after sanitization in class name change) plus I wanted to change the capitalization of one of the notebooks for consistency. With my branch I am happy to reset to the main after this PR is merged (when all works as expected) and do the required changes again. I suppose I am a bit confused about flow - do people write *py files and generate notebooks from them, or other way round? Do we support both ways, or best to keep it one way. |
|
I don't see in this PR any update to dev section of contributing readme - possibly great to update it to the flow for regeneration is clear. |
|
I looked in the git history of my commit and I corrected both py and ipynb files, but not all ipynb unfortuanately - I trusted PyCharm check. Do you want me to look back into changes to spelling/grammar in Py and make sure they are also present in notebooks, so then we can regenerate py from notebooks using the updated machinery in this PR? I think we need to decide whether py or ipynb is the source of truth. |
|
I think we may need a mechanism to sync both ways. I find it easier to write the notebooks as notebooks, but I think the .py files should be the official source as it's easier to track genuine changes to the code there. Thus a command to sync from notebook to .py on a single file basis would be handy, whereas the sync from .py to notebooks and rerunning all could work on all files? |
|
Definitely we should add some notes in the dev doc for us self to remember. |
|
It would be great to create those processes as nox (https://nox.thea.codes/en/stable/) sessions (we can add later on tests as well plus we can generate py from ipynb and check diff in the pipelines). This way we would have a run in a separate env (to avoid local additions) and trackable changes to py <-> ipynb generation. Happy to help with setting those up. |
670e282 to
f3fd021
Compare
|
Update, the issue with the previous notebooks was the fact that new versions of jupytext preserves old pairs, after manual recreation it is probably fine. I used .py files as a source of truth for the new pairs Running Only issue is - notebook 13 looks broken - not sure what is happening and my brain activity is on zero for now |
|
Great! I'll find some time in the weekend and I can also look into notebook 13 issue then. |
|
I think it looks good, and the fix of notebook 13 follows shortly. |
| "source": [ | ||
| "# Molecule standardization\n", | ||
| "When building machine learning models of molecules, it is important to standardize the molecules. We often don't want different predictions just because things are drawn in slightly different forms, such as protonated or deprotonated carboxylic acids. Scikit-mol provides a very basic standardize transformer based on the molvs implementation in RDKit" | ||
| "When building machine learning models of molecules, it is important to standardize the molecules. We often don't want different predictions just because things are drawn in slightly different forms, such as protonated or deprotanted carboxylic acids. Scikit-mol provides a very basic standardize transformer based on the molvs implementation in RDKit" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something to update as a small fix later on
| print(f"Test score is :{pipe.score(mol_list_test, y_test):0.2F}") | ||
| # %% [markdown] | ||
| # Nevermind the performance, or the exact value of the prediction, this is for demonstration purposes. We can easily predict on lists of molecules | ||
| # Nevermind the performance, or the exact value of the prediction, this is for demonstration purpures. We can easily predict on lists of molecules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something to fix later on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, did we reintroduce a lot of the type fixes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like only the two I found, so easy to fix with the future PR, probably not worth creating a separate PR for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, great improvement @asiomchen
So the notebook pairs were broken in many ways and this PR aims to solve that
Makefilecommand fixed to works with multifolder setup (somehowjupytextdoes not respect global config frompyproject.toml)pre-commitadded - it check that all the pairs exist and are in sync automatically.gitignore- we need them for docsAnd additionally typing issue in
parallel,pyis fixed