Skip to content

Conversation

@asiomchen
Copy link
Collaborator

So the notebook pairs were broken in many ways and this PR aims to solve that

  1. Makefile command fixed to works with multifolder setup (somehow jupytext does not respect global config from pyproject.toml )
  2. Notebook were formatted with ruff and reexecuted
  3. New pre-commit added - it check that all the pairs exist and are in sync automatically
  4. Notebook are excluded from .gitignore - we need them for docs

And additionally typing issue in parallel,py is fixed

@asiomchen asiomchen requested a review from EBjerrum as a code owner May 5, 2025 20:32
@EBjerrum
Copy link
Owner

EBjerrum commented May 6, 2025

Great to tighten up the workflow and coupling, but from the diffs it seems that a lot of spelling and grammar mistakes are reintroduced to the notebooks. @mieczyslaw, which did you spellcheck, the .py or the .ipynb? I have a suspicion that we are syncing the wrong way and overwriting your fixes with this pull-request?

@asiomchen
Copy link
Collaborator Author

Oh, right, I didn't think about that and was hoping that whichever is the newer is a good one (but jupytext behaviour is somewhat still a surprise for me).

I suspect @mieczyslaw fixed the .ipynb files, then we should pull them from main and regenerate *.py

@mieczyslaw
Copy link
Contributor

If I remember correctly, I corrected everything I found re speelchecks (you would see that in git logs of my merged branch). In my recent branch I wanted to fix import which seemed to be broken (but I suppose ipynb needs to be regenerated from py, basically worth checking in which of two the import is correct after sanitization in class name change) plus I wanted to change the capitalization of one of the notebooks for consistency. With my branch I am happy to reset to the main after this PR is merged (when all works as expected) and do the required changes again. I suppose I am a bit confused about flow - do people write *py files and generate notebooks from them, or other way round? Do we support both ways, or best to keep it one way.

@mieczyslaw
Copy link
Contributor

I don't see in this PR any update to dev section of contributing readme - possibly great to update it to the flow for regeneration is clear.

@mieczyslaw
Copy link
Contributor

I looked in the git history of my commit and I corrected both py and ipynb files, but not all ipynb unfortuanately - I trusted PyCharm check. Do you want me to look back into changes to spelling/grammar in Py and make sure they are also present in notebooks, so then we can regenerate py from notebooks using the updated machinery in this PR? I think we need to decide whether py or ipynb is the source of truth.

@EBjerrum
Copy link
Owner

EBjerrum commented May 8, 2025

I think we may need a mechanism to sync both ways. I find it easier to write the notebooks as notebooks, but I think the .py files should be the official source as it's easier to track genuine changes to the code there. Thus a command to sync from notebook to .py on a single file basis would be handy, whereas the sync from .py to notebooks and rerunning all could work on all files?

@EBjerrum
Copy link
Owner

EBjerrum commented May 8, 2025

Definitely we should add some notes in the dev doc for us self to remember.

@mieczyslaw
Copy link
Contributor

It would be great to create those processes as nox (https://nox.thea.codes/en/stable/) sessions (we can add later on tests as well plus we can generate py from ipynb and check diff in the pipelines). This way we would have a run in a separate env (to avoid local additions) and trackable changes to py <-> ipynb generation. Happy to help with setting those up.

@asiomchen asiomchen force-pushed the bugfix/notebook-pairs branch from 670e282 to f3fd021 Compare May 8, 2025 16:30
@asiomchen
Copy link
Collaborator Author

asiomchen commented May 8, 2025

Update, the issue with the previous notebooks was the fact that new versions of jupytext preserves old pairs, after manual recreation it is probably fine. I used .py files as a source of truth for the new pairs

Running make sync-notebooks updates the pairs and if the new notebook is created, then the same command will create corresponding .py file. @EBjerrum this command also works both from notebook to script and vice versa.

Only issue is - notebook 13 looks broken - not sure what is happening and my brain activity is on zero for now

@EBjerrum
Copy link
Owner

EBjerrum commented May 9, 2025

Great! I'll find some time in the weekend and I can also look into notebook 13 issue then.

@EBjerrum
Copy link
Owner

I think it looks good, and the fix of notebook 13 follows shortly.

@EBjerrum EBjerrum merged commit fe0e97d into EBjerrum:main May 11, 2025
10 checks passed
"source": [
"# Molecule standardization\n",
"When building machine learning models of molecules, it is important to standardize the molecules. We often don't want different predictions just because things are drawn in slightly different forms, such as protonated or deprotonated carboxylic acids. Scikit-mol provides a very basic standardize transformer based on the molvs implementation in RDKit"
"When building machine learning models of molecules, it is important to standardize the molecules. We often don't want different predictions just because things are drawn in slightly different forms, such as protonated or deprotanted carboxylic acids. Scikit-mol provides a very basic standardize transformer based on the molvs implementation in RDKit"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something to update as a small fix later on

print(f"Test score is :{pipe.score(mol_list_test, y_test):0.2F}")
# %% [markdown]
# Nevermind the performance, or the exact value of the prediction, this is for demonstration purposes. We can easily predict on lists of molecules
# Nevermind the performance, or the exact value of the prediction, this is for demonstration purpures. We can easily predict on lists of molecules
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something to fix later on

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, did we reintroduce a lot of the type fixes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like only the two I found, so easy to fix with the future PR, probably not worth creating a separate PR for that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, great improvement @asiomchen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants