feat: add Jupyter and Databricks notebook parsing support by michael-denyer · Pull Request #69 · tirth8205/code-review-graph

michael-denyer · 2026-03-26T16:45:55Z

Summary

Add .ipynb (Jupyter/Databricks) notebook parsing — extracts functions, classes, imports, and calls from code cells across Python, SQL, R, and Scala kernels
Add Databricks .py notebook export parsing — detects # Databricks notebook source header and splits on # COMMAND ---------- markers
Extract SQL table references (FROM, JOIN, INTO, CREATE TABLE/VIEW) as import edges for cross-language lineage
Shared _parse_notebook_cells method handles multi-language cell dispatch with per-cell line offset tracking
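
For reviewers, a minimal sketch of the kind of regex-based table extraction described above. The pattern and the `extract_sql_tables` name are illustrative assumptions, not the code in this PR, and a regex like this will report false positives on expressions such as `EXTRACT(YEAR FROM col)`.

```python
import re

# Hypothetical pattern: the PR's actual regex is not shown in this description.
# Matches the identifier that follows FROM, JOIN, INTO, or CREATE TABLE/VIEW.
_SQL_TABLE_RE = re.compile(
    r"\b(?:FROM|JOIN|INTO"
    r"|CREATE\s+(?:OR\s+REPLACE\s+)?(?:TABLE|VIEW)(?:\s+IF\s+NOT\s+EXISTS)?)"
    r"\s+([A-Za-z_][\w.]*)",
    re.IGNORECASE,
)


def extract_sql_tables(sql: str) -> list[str]:
    """Return referenced table names in first-seen order, without duplicates."""
    seen: dict[str, None] = {}
    for match in _SQL_TABLE_RE.finditer(sql):
        seen.setdefault(match.group(1), None)
    return list(seen)
```

Each returned name would then become the target of an import edge from the notebook node.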

Test plan

Jupyter .ipynb parsing with Python kernel cells
Databricks multi-language .ipynb with %python, %sql, %r, %scala magic commands
Databricks .py export format parsing
SQL table regex extraction tests
R-kernel notebook cells (xfail pending the R language PR #43, "feat: add R language parsing support")
Edge cases: empty notebooks, non-code cells, malformed JSON

Extract code cells from .ipynb files, filter magic/shell commands, concatenate with offset tracking, and parse as Python via tree-sitter.

Supports:
- Python kernel detection (phase 1)
- Magic command filtering (%pip, !ls)
- Cell index tracking in node.extra["cell_index"]
- Cross-cell function calls and imports
- Edge cases: empty notebooks, non-Python kernels, malformed JSON

Includes test fixture and 12 tests in TestNotebookParsing.
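
A rough sketch of the extraction step this commit describes, under stated assumptions: `extract_code_cells` is a hypothetical name, the cell index counts all cells (not only code cells), and magic/shell lines are blanked rather than dropped so that line numbers inside a cell stay stable. The real implementation may differ on each of these points.

```python
import json


def extract_code_cells(ipynb_text: str) -> tuple[str, list[tuple[int, int]]]:
    """Concatenate code cells and record where each one starts.

    Returns (source, offsets), where offsets holds (cell_index, start_line)
    pairs and start_line is 1-based within the concatenated source.
    """
    try:
        notebook = json.loads(ipynb_text)
    except json.JSONDecodeError:
        return "", []  # malformed JSON: nothing to parse
    if not isinstance(notebook, dict):
        return "", []

    lines: list[str] = []
    offsets: list[tuple[int, int]] = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of line strings
            source = "".join(source)
        # Blank out magic (%) and shell (!) lines instead of removing them.
        kept = [
            "" if line.lstrip().startswith(("%", "!")) else line
            for line in source.splitlines()
        ]
        if not any(line.strip() for line in kept):
            continue  # nothing left to parse in this cell
        offsets.append((index, len(lines) + 1))
        lines.extend(kept)
    return "\n".join(lines), offsets
```

The recorded offsets are what would let a parsed node be mapped back to its originating cell, for example to fill `node.extra["cell_index"]`.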

Split _parse_notebook into two methods:
- _parse_notebook: extracts cells from .ipynb JSON, builds list[CellInfo], delegates to _parse_notebook_cells
- _parse_notebook_cells: shared method that parses cells grouped by language (Python/R via Tree-sitter, SQL via regex)

Also expands supported notebook languages from Python-only to Python and R. Updates test_non_python_kernel to use an actually unsupported language (Scala) since R is now supported.
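
To illustrate the grouping step, here is a small sketch. The `CellInfo` fields, the `SUPPORTED_LANGUAGES` set, and `group_cells_by_language` are assumptions made for this example; the description only names the class and the two methods.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CellInfo:
    # Field names are assumed; the PR description only names the class.
    cell_index: int
    language: str
    source: str


# Assumed set, based on "Python/R via Tree-sitter, SQL via regex".
SUPPORTED_LANGUAGES = {"python", "r", "sql"}


def group_cells_by_language(cells: list[CellInfo]) -> dict[str, list[CellInfo]]:
    """Bucket cells by language, dropping languages with no parser."""
    groups: dict[str, list[CellInfo]] = defaultdict(list)
    for cell in cells:
        if cell.language in SUPPORTED_LANGUAGES:
            groups[cell.language].append(cell)
    return dict(groups)
```

With cells grouped this way, each bucket can be handed to the matching backend while the `cell_index` on each entry preserves the link back to the notebook.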

Detect and parse Databricks-exported .py notebooks (identified by the '# Databricks notebook source' header). Splits on COMMAND delimiters, classifies cells by MAGIC prefix (%sql, %r, %md, %sh), and delegates to the existing _parse_notebook_cells shared method. SQL table refs, Python functions, cross-cell calls, and cell_index tracking all work.
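
The splitting and classification described here might be sketched as follows. This is an illustration only: `split_databricks_export`, `_classify`, and the language labels in `MAGIC_LANGUAGES` are hypothetical names, and cells without a MAGIC prefix are assumed to be Python.

```python
from typing import Optional

HEADER = "# Databricks notebook source"
DELIMITER = "# COMMAND ----------"
MAGIC_PREFIX = "# MAGIC"
# Label names on the right are assumptions chosen for this sketch.
MAGIC_LANGUAGES = {"%sql": "sql", "%r": "r", "%md": "markdown", "%sh": "shell"}


def _classify(raw_lines: list[str]) -> Optional[tuple[str, str]]:
    """Turn one cell's raw lines into (language, source), or None if empty."""
    body = list(raw_lines)
    while body and not body[0].strip():
        body.pop(0)
    while body and not body[-1].strip():
        body.pop()
    if not body:
        return None
    if not body[0].startswith(MAGIC_PREFIX):
        return "python", "\n".join(body)  # assumed default language
    # Drop the "# MAGIC" prefix and the single space that follows it.
    stripped = [
        line[len(MAGIC_PREFIX):].removeprefix(" ")
        if line.startswith(MAGIC_PREFIX) else line
        for line in body
    ]
    first = stripped[0].split(maxsplit=1)
    magic = first[0] if first else ""
    language = MAGIC_LANGUAGES.get(magic, "unknown")
    # Keep any code that follows the magic on the same line.
    return language, "\n".join(first[1:] + stripped[1:])


def split_databricks_export(text: str) -> list[tuple[str, str]]:
    """Return (language, source) per cell, or [] if the header is missing."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != HEADER:
        return []
    cells: list[tuple[str, str]] = []
    current: list[str] = []
    for line in lines[1:] + [DELIMITER]:  # sentinel flushes the last cell
        if line.strip() == DELIMITER:
            cell = _classify(current)
            if cell is not None:
                cells.append(cell)
            current = []
        else:
            current.append(line)
    return cells
```

The resulting pairs are the sort of input that could be wrapped into CellInfo entries and passed to _parse_notebook_cells.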

michael-denyer added 10 commits March 27, 2026 10:35

docs: add Databricks notebook support spec and implementation plan

c36dc5a

feat(parser): add CellInfo and SQL table regex

a9f937e

test: add SQL table regex extraction tests

061c5fa

test: add Databricks multi-language .ipynb tests

5255d96

test: add R-kernel notebook and edge case tests

b9a49a7

test: mark R parser tests as xfail pending PR tirth8205#43

bc6cda8

fix: address ruff import sorting issues

2da0a49

michael-denyer force-pushed the feat/notebook-support branch from ace72bb to 2da0a49 on March 27, 2026 10:35

Merge remote-tracking branch 'origin/main' into feat/notebook-support

4ace7de

michael-denyer wants to merge 11 commits into tirth8205:main from michael-denyer:feat/notebook-support