Skip to content

Feat: Add pre-commit hook that strips notebooks of metadata#1519

Open
haroon0x wants to merge 4 commits intogoogle-deepmind:mainfrom
haroon0x:feat/pre-commit
Open

Feat: Add pre-commit hook that strips notebooks of metadata#1519
haroon0x wants to merge 4 commits intogoogle-deepmind:mainfrom
haroon0x:feat/pre-commit

Conversation

@haroon0x
Copy link
Copy Markdown

@haroon0x haroon0x commented Dec 3, 2025

Fixes #1372

Summary

This PR adds a pre-commit hook using nbstripout to automatically strip metadata from Jupyter notebooks before commits. This keeps diffs clean and readable by removing execution counts, cell outputs, and random cell IDs.

Changes Made

1. Pre-commit Configuration (.pre-commit-config.yaml)

2. Documentation (docs/development.md)

  • Added new "Pre-commit Hooks" section explaining:
    • How to install and set up pre-commit hooks
    • What each hook does (including nbstripout)

3. Cleaned Notebooks

  • Stripped metadata from all 19 existing notebooks in the repository:

What nbstripout Does

The hook automatically removes:

  • Execution counts: "execution_count": 1"execution_count": null
  • Cell outputs: All printed results, plots, and output data
  • Cell IDs: Random IDs like "GDUs0Q8gPYes" → Sequential "0", "1", "2"...

@rdyro
Copy link
Copy Markdown
Collaborator

rdyro commented Dec 3, 2025

Thanks!

Is there a way to do this without reformatting the notebooks? I think this PR reindents all notebook files.

Can you squash to a single commit please?

@haroon0x
Copy link
Copy Markdown
Author

haroon0x commented Dec 3, 2025

@rdyro I’ve removed the notebook formatting and squashed the commit.

@haroon0x
Copy link
Copy Markdown
Author

haroon0x commented Dec 7, 2025

@rdyro i don't see any other way to reformat the notebooks without including it any pr.

@rdyro
Copy link
Copy Markdown
Collaborator

rdyro commented Dec 8, 2025

Hmmm, I see this hook removes figures, we definitely want to keep those since they are part of the notebook examples when displayed in the docs.

Can we do this hook without removing cell outputs themselves (just metadata)?

@haroon0x
Copy link
Copy Markdown
Author

haroon0x commented Dec 9, 2025

@rdyro do you want me to include the reformatting of the notebooks in this pr ?

@rdyro
Copy link
Copy Markdown
Collaborator

rdyro commented Dec 9, 2025

I'm trying to review what this hook is actually suggesting. It's still trying to delete output images, I think?

@haroon0x
Copy link
Copy Markdown
Author

haroon0x commented Dec 9, 2025

In the git diff i see the images. what makes you think that the output images are still being removed ?
scrennshot

@rdyro
Copy link
Copy Markdown
Collaborator

rdyro commented Dec 9, 2025

I'm looking at the CI, it seems the hook still wants to remove the images before passing.

You're right that you're not modifying them now.

@rdyro
Copy link
Copy Markdown
Collaborator

rdyro commented Dec 9, 2025

Maybe I'm misreading this. That's a big diff it seems, but this PR is a good idea, so once the CI passes, I'll be happy to merge!

@haroon0x
Copy link
Copy Markdown
Author

@rdyro ci is passing now.

@haroon0x
Copy link
Copy Markdown
Author

@rdyro any changes ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add pre-commit hook that strips notebooks of metadata

2 participants