feat: add Jupyter and Databricks notebook parsing support #69
Open
michael-denyer wants to merge 11 commits into tirth8205:main from
Extract code cells from .ipynb files, filter magic/shell commands, concatenate with offset tracking, and parse as Python via tree-sitter.

Supports:
- Python kernel detection (phase 1)
- Magic command filtering (%pip, !ls)
- Cell index tracking in node.extra["cell_index"]
- Cross-cell function calls and imports
- Edge cases: empty notebooks, non-Python kernels, malformed JSON

Includes test fixture and 12 tests in TestNotebookParsing.
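The extract, filter, and concatenate steps in this commit message could look roughly like the sketch below. It is not the PR's code: `extract_python_source` and `cell_for_line` are hypothetical names, and dropping every line that starts with `%` or `!` is a simplification that can misfire on continuation lines or leave an indented block empty.

```python
import json


def extract_python_source(notebook_json):
    """Return (source, offsets); each offset is (cell_index, first_line, line_count)."""
    nb = json.loads(notebook_json)
    chunks = []
    offsets = []
    line_no = 1  # 1-based line number in the concatenated source
    for cell_index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        if isinstance(src, list):  # .ipynb stores source as a string or a list of lines
            src = "".join(src)
        # Drop IPython magics (%pip ...) and shell escapes (!ls).
        kept = [ln for ln in src.splitlines()
                if not ln.lstrip().startswith(("%", "!"))]
        if not kept:
            continue
        offsets.append((cell_index, line_no, len(kept)))
        chunks.append("\n".join(kept))
        line_no += len(kept)
    return "\n".join(chunks), offsets


def cell_for_line(offsets, line):
    """Map a line of the concatenated source back to its notebook cell index."""
    for cell_index, start, count in offsets:
        if start <= line < start + count:
            return cell_index
    return None
```

A parser could then run once over the concatenated source and use `cell_for_line` to fill something like `node.extra["cell_index"]` for each node it finds.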
Split _parse_notebook into two methods:
- _parse_notebook: extracts cells from .ipynb JSON, builds list[CellInfo], delegates to _parse_notebook_cells
- _parse_notebook_cells: shared method that parses cells grouped by language (Python/R via tree-sitter, SQL via regex)

Also expands supported notebook languages from Python-only to Python and R. Updates test_non_python_kernel to use an actually unsupported language (Scala) since R is now supported.
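As an illustration of the "cells grouped by language" dispatch, here is a minimal sketch. The fields of `CellInfo` are assumed (the PR's actual class may differ), and `TABLE_REF`, `group_by_language`, and `sql_table_refs` are hypothetical names; the regex only catches simple `FROM`/`JOIN`/`INTO`/`TABLE` references.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CellInfo:  # assumed shape, for illustration only
    cell_index: int
    language: str
    source: str


# Naive pattern for table names following common SQL keywords.
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN|INTO|TABLE)\s+([A-Za-z_][\w.]*)", re.IGNORECASE
)


def group_by_language(cells):
    """Bucket cells by language so each group can go to its own parser."""
    groups = defaultdict(list)
    for cell in cells:
        groups[cell.language].append(cell)
    return dict(groups)


def sql_table_refs(cell):
    """Return table names referenced in a SQL cell, in order of appearance."""
    return TABLE_REF.findall(cell.source)
```

Grouping first lets the Python and R groups be concatenated and handed to a tree-sitter parser once per language, while SQL cells take the cheaper regex path.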
Detect and parse Databricks-exported .py notebooks (identified by the '# Databricks notebook source' header). Splits on COMMAND delimiters, classifies cells by MAGIC prefix (%sql, %r, %md, %sh), and delegates to the existing _parse_notebook_cells shared method. SQL table refs, Python functions, cross-cell calls, and cell_index tracking all work.
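The header check, delimiter split, and MAGIC classification described here can be sketched as follows. This is an assumption-laden illustration, not the PR's implementation: the function names, the `"unknown"` fallback, and the exact magic-to-language table are hypothetical.

```python
HEADER = "# Databricks notebook source"
DELIMITER = "# COMMAND ----------"
MAGIC_PREFIX = "# MAGIC"
# Assumed mapping; only the magics mentioned in this PR are listed.
MAGIC_LANGUAGES = {
    "%sql": "sql", "%r": "r", "%md": "markdown",
    "%sh": "shell", "%python": "python", "%scala": "scala",
}


def _strip_magic(line):
    """Remove the '# MAGIC' prefix and one following space, if present."""
    rest = line[len(MAGIC_PREFIX):]
    return rest[1:] if rest.startswith(" ") else rest


def classify_cell(cell_lines, default_language="python"):
    """Return (language, source) for one cell, or None if the cell is empty."""
    text = "\n".join(cell_lines).strip()
    if not text:
        return None
    body = text.splitlines()
    if not body[0].startswith(MAGIC_PREFIX):
        return default_language, text
    stripped = [_strip_magic(ln) if ln.startswith(MAGIC_PREFIX) else ln
                for ln in body]
    tokens = stripped[0].split(None, 1)
    if not tokens or tokens[0] not in MAGIC_LANGUAGES:
        return "unknown", "\n".join(stripped)
    # Keep any code that follows the magic on the same line.
    return MAGIC_LANGUAGES[tokens[0]], "\n".join(tokens[1:] + stripped[1:])


def split_databricks_export(text):
    """Return [(cell_index, language, source)], or None if the header is missing."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != HEADER:
        return None
    raw_cells, current = [], []
    for line in lines[1:]:
        if line.strip() == DELIMITER:
            raw_cells.append(current)
            current = []
        else:
            current.append(line)
    raw_cells.append(current)
    result = []
    for index, cell_lines in enumerate(raw_cells):
        classified = classify_cell(cell_lines)
        if classified is not None:
            result.append((index, *classified))
    return result
```

Returning `None` for files without the header lets a caller fall back to ordinary Python parsing for regular `.py` files.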
ace72bb to 2da0a49
Summary
- `.ipynb` (Jupyter/Databricks) notebook parsing: extracts functions, classes, imports, and calls from code cells across Python, SQL, R, and Scala kernels
- `.py` notebook export parsing: detects the `# Databricks notebook source` header and splits on `# COMMAND ----------` markers
- `_parse_notebook_cells` method handles multi-language cell dispatch with per-cell line offset tracking

Test plan
- `.ipynb` parsing with Python kernel cells
- `.ipynb` with `%python`, `%sql`, `%r`, `%scala` magic commands
- `.py` export format parsing