
review diff_parser

aakash-anko edited this page May 29, 2026 · 2 revisions

review / diff_parser.py

Runs git diff and parses raw unified diff text into structured DiffFile objects using the unidiff library.


Key Concepts

| Term | Definition | Example |
| --- | --- | --- |
| diff | The set of changes between two versions of code, showing added (+) and removed (-) lines. | - old_line\n+ new_line shows that old_line was replaced with new_line. |
| hunk | A contiguous block of changes within a diff. One diff can contain multiple hunks (changes in different parts of a file). | A diff might have hunk 1 (lines 10-15 changed) and hunk 2 (lines 80-85 changed). |
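
To make the two terms concrete, the sketch below builds a unified diff with Python's standard difflib module. This is only an illustration: diff_parser.py itself relies on git and unidiff, and the file name demo.txt and the line contents are invented. The two edits are far enough apart that the resulting diff contains two hunks.

```python
import difflib

# Hypothetical file contents: 30 lines with two widely separated edits.
old = [f"line {i}\n" for i in range(1, 31)]
new = list(old)
new[2] = "changed 3\n"    # edit near the top of the file
new[24] = "changed 25\n"  # edit near the bottom of the file

# n=1 keeps one line of context around each change.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="a/demo.txt", tofile="b/demo.txt", n=1)
)

# Every hunk starts with an "@@ ... @@" header.
hunk_headers = [line.rstrip("\n") for line in diff_lines if line.startswith("@@")]

print("".join(diff_lines))
print(f"{len(hunk_headers)} hunks:", hunk_headers)
```

Each header records where the hunk starts and how many lines it spans in the old and new versions of the file.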

get_diff — line 8

Runs git diff as a subprocess and returns the raw unified diff text as a string.

Parameters

| Param | Type | Default | Purpose |
| --- | --- | --- | --- |
| staged | bool | False | If True, diffs only staged changes |
| target_branch | str \| None | None | If set, diffs against that branch |
| commit | str \| None | None | If set, shows the diff for that specific commit (SHA or ref) |
| repo_path | str \| None | None | Working directory for the git command |

Priority: commit > target_branch > staged > unstaged (default).
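
The priority order can be sketched as a small if/elif chain. This is a reconstruction from the walkthroughs on this page, not the actual source of get_diff: the helper name build_diff_command is hypothetical, and whether the real code tests truthiness or "is not None" is an assumption.

```python
from typing import List, Optional


def build_diff_command(
    staged: bool = False,
    target_branch: Optional[str] = None,
    commit: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build the git argument list by priority."""
    cmd = ["git", "diff", "--unified=5"]
    if commit:
        # Highest priority: diff the commit against its first parent.
        cmd += [f"{commit}~1", commit]
    elif target_branch:
        # Three-dot form, as in the "main...HEAD" walkthrough.
        cmd.append(f"{target_branch}...HEAD")
    elif staged:
        cmd.append("--staged")
    # Otherwise: plain `git diff`, which shows unstaged changes.
    return cmd


print(build_diff_command())
print(build_diff_command(staged=True, target_branch="main"))
print(build_diff_command(staged=True, target_branch="main", commit="abc1234"))
```

Because the branches are mutually exclusive, at most one option contributes arguments, and lower-priority options are silently ignored.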

Example 1: Unstaged diff

Input: staged=False, target_branch=None, commit=None, repo_path="/home/user/myapp"

Line 9: cmd = ["git", "diff", "--unified=5"]

Line 10–11: commit is None → skip

Line 12–13: target_branch is None → skip branch arg

Line 14–15: staged is False → skip --staged

Line 17–22: subprocess.run(["git", "diff", "--unified=5"], cwd="/home/user/myapp") runs in that directory

Line 24: returns result.stdout → raw diff string like "diff --git a/foo.py b/foo.py\n..."

Example 2: Staged diff against main

Input: staged=True, target_branch="main", commit=None, repo_path=None

Line 9: cmd = ["git", "diff", "--unified=5"]

Line 10–11: commit is None → skip

Line 12–13: target_branch is "main" → cmd.append("main...HEAD") → ["git", "diff", "--unified=5", "main...HEAD"]

Note: target_branch takes priority over staged because it comes first in the elif chain, so staged=True is ignored here.

Line 17–22: runs in the current directory (cwd=None)

Line 24: returns result.stdout

Example 3: Review a specific commit

Input: commit="abc1234", staged=True, target_branch="main", repo_path="/project"

Line 10–11: commit is "abc1234" → cmd = ["git", "diff", "--unified=5", "abc1234~1", "abc1234"]

Note: commit takes highest priority, so staged and target_branch are ignored.

Line 17–22: subprocess.run(cmd, cwd="/project")

Line 24: returns the diff showing what that commit changed (parent → commit)
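
All three examples end the same way: the command runs as a subprocess in repo_path and its stdout is returned. A minimal sketch of that step follows. The helper name run_and_capture and the capture_output/text arguments are assumptions, since the page only shows the cwd argument and result.stdout. The Python interpreter stands in for git so that the snippet runs without a repository.

```python
import subprocess
import sys
from typing import List, Optional


def run_and_capture(cmd: List[str], repo_path: Optional[str] = None) -> str:
    """Hypothetical helper: run cmd in repo_path and return its stdout as text."""
    result = subprocess.run(
        cmd,
        cwd=repo_path,        # None means "use the current working directory"
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
    )
    return result.stdout


# The Python interpreter stands in for git here.
output = run_and_capture(
    [sys.executable, "-c", "print('diff --git a/foo.py b/foo.py')"]
)
print(output)
```

With git available, the equivalent call would be run_and_capture(["git", "diff", "--unified=5"], repo_path="/home/user/myapp").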


get_parsed_diff — line 25

Parses a raw unified diff string into a list of DiffFile objects using the unidiff.PatchSet parser.

Parameters

Param Type Purpose
diff_text str Raw unified diff text from git diff

Example

Input:

diff_text = """diff --git a/math.py b/math.py
--- a/math.py
+++ b/math.py
@@ -10,3 +10,3 @@
 def add(a, b):
-    return a - b
+    return a + b
"""

Walkthrough:

Line 28–29: diff_text.strip() is non-empty → proceed

Line 31: patch = PatchSet(diff_text) → parses into a PatchSet with 1 PatchedFile

Line 32: diff_files = []

Loop — patched_file = first (and only) PatchedFile for math.py:

Line 35: hunks = []

Loop — hunk = first hunk (the @@ block):

Line 37: lines = []

Loop — line = first line (def add(a, b):):

Line 38–40: line.is_added → False

Line 41–43: line.is_removed → False

Line 44–45: → change_type = "context", line_no = 10 (target_line_no)

Line 47–50: appends ChangedLine(line_number=10, content="def add(a, b):\n", change_type="context")

Loop — line = second line (- return a - b):

Line 41–43: line.is_removed → True → change_type = "removed", line_no = 11 (source_line_no)

Line 47–50: appends ChangedLine(line_number=11, content="    return a - b\n", change_type="removed")

Loop — line = third line (+ return a + b):

Line 38–40: line.is_added → True → change_type = "added", line_no = 11 (target_line_no)

Line 47–50: appends ChangedLine(line_number=11, content="    return a + b\n", change_type="added")

Line 52–55: appends DiffHunk(start_line=10, end_line=13, lines=[...3 ChangedLines...])
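
The per-line classification traced above can be reproduced with a hand-rolled loop. This sketch does not use unidiff: it tracks source and target line numbers itself to show why the removed and added lines both get line number 11. The ChangedLine field names come from this page, while making it a dataclass and the helper name classify_hunk_body are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChangedLine:
    line_number: int
    content: str
    change_type: str  # "added", "removed", or "context"


def classify_hunk_body(
    body: List[str], source_start: int, target_start: int
) -> List[ChangedLine]:
    """Hand-rolled stand-in for the per-line loop; not the unidiff API."""
    source_no, target_no = source_start, target_start
    lines: List[ChangedLine] = []
    for raw in body:
        marker, content = raw[:1], raw[1:]
        if marker == "+":
            # Added lines exist only in the new file.
            lines.append(ChangedLine(target_no, content, "added"))
            target_no += 1
        elif marker == "-":
            # Removed lines exist only in the old file.
            lines.append(ChangedLine(source_no, content, "removed"))
            source_no += 1
        else:
            # Context lines exist on both sides; report the target number.
            lines.append(ChangedLine(target_no, content, "context"))
            source_no += 1
            target_no += 1
    return lines


body = [" def add(a, b):\n", "-    return a - b\n", "+    return a + b\n"]
result = classify_hunk_body(body, source_start=10, target_start=10)
for changed in result:
    print(changed.line_number, changed.change_type, repr(changed.content))
```

The removed line advances only the source counter and the added line advances only the target counter, so a one-line replacement reports the same line number on both sides.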

Line 57–65: appends DiffFile(...):

DiffFile(
    file_path="math.py",
    language=detect_language(Path("math.py")),  # → "python"
    hunks=[DiffHunk(start_line=10, end_line=13, lines=[...])],
    is_new_file=False,
    is_deleted=False,
    added_lines=1,
    removed_lines=1,
)

Return: [DiffFile(file_path="math.py", language="python", added_lines=1, removed_lines=1, ...)]
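
For readers who want to experiment with the returned structure, the sketch below models the three result types as dataclasses and rebuilds the example value by hand. The field names are taken from this page. Whether the real classes are dataclasses is an assumption, and count_changes is a hypothetical helper for checking that the per-file counters agree with the hunk contents.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChangedLine:
    line_number: int
    content: str
    change_type: str  # "added", "removed", or "context"


@dataclass
class DiffHunk:
    start_line: int
    end_line: int
    lines: List[ChangedLine] = field(default_factory=list)


@dataclass
class DiffFile:
    file_path: str
    language: str
    hunks: List[DiffHunk]
    is_new_file: bool
    is_deleted: bool
    added_lines: int
    removed_lines: int


def count_changes(diff_file: DiffFile, change_type: str) -> int:
    """Hypothetical helper: count lines of one change type across all hunks."""
    return sum(
        1
        for hunk in diff_file.hunks
        for line in hunk.lines
        if line.change_type == change_type
    )


# The example result from the walkthrough, built by hand.
hunk = DiffHunk(
    start_line=10,
    end_line=13,
    lines=[
        ChangedLine(10, "def add(a, b):\n", "context"),
        ChangedLine(11, "    return a - b\n", "removed"),
        ChangedLine(11, "    return a + b\n", "added"),
    ],
)
diff_file = DiffFile(
    file_path="math.py",
    language="python",
    hunks=[hunk],
    is_new_file=False,
    is_deleted=False,
    added_lines=1,
    removed_lines=1,
)

print(count_changes(diff_file, "added"), count_changes(diff_file, "removed"))
```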
