Fix: duplicate_graph defined at module level instead of as ParsingService method #660

Open
arnavp27 wants to merge 3 commits into potpie-ai:main from arnavp27:fix/duplicate-graph-indentation

Conversation

@arnavp27

@arnavp27 arnavp27 commented Feb 27, 2026

Summary

Fixes a silent structural bug where duplicate_graph was accidentally defined
as a module-level function instead of as a method of the ParsingService class,
making it completely unreachable through normal usage.

Root Cause

In app/modules/parsing/graph_construction/parsing_service.py, the
duplicate_graph function was defined at column 0 (no indentation), placing
it outside the ParsingService class body. Every other method in the class is
correctly indented at 4 spaces.

# BEFORE (broken) — defined at module level, outside the class
async def duplicate_graph(self, old_repo_id: str, new_repo_id: str):
    await self.search_service.clone_search_indices(old_repo_id, new_repo_id)
    ...

# AFTER (fixed) — correctly indented as a class method
    async def duplicate_graph(self, old_repo_id: str, new_repo_id: str):
        await self.search_service.clone_search_indices(old_repo_id, new_repo_id)
        ...

What Breaks at Runtime

Two failure modes result from this bug:

  1. AttributeError on the class: any caller doing
    service.duplicate_graph(old_id, new_id) on a ParsingService instance
    would immediately get:

AttributeError: 'ParsingService' object has no attribute 'duplicate_graph'

because the method does not exist on the class.

  2. AttributeError inside the function: if called directly as a
    standalone function (e.g. duplicate_graph(some_obj, old_id, new_id)), the
    accesses to self.search_service and self.inference_service would raise
    AttributeError (starting with the first one reached) unless the caller
    manually passed a ParsingService instance as the first argument, which is
    not how it was ever intended to be called.

The function references self.search_service (line 622) and
self.inference_service (lines 627, 670). Both are instance attributes set in
ParsingService.__init__, confirming it was always intended to be an instance
method of the class.
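
Both failure modes can be reproduced in isolation. The sketch below uses a hypothetical stand-in class, Service, rather than the real ParsingService, and a synchronous function for brevity (the real one is async); it only illustrates how Python treats a function defined at column 0 after a class body.

```python
class Service:
    """Hypothetical stand-in for ParsingService (not the real class)."""

    def __init__(self):
        self.search_service = object()


# Defined at column 0, so this is a module-level function, not a method,
# even though its first parameter happens to be named `self`.
def duplicate_graph(self, old_repo_id, new_repo_id):
    return self.search_service


service = Service()

# Failure mode 1: the class never received the attribute.
has_method = hasattr(Service, "duplicate_graph")

try:
    service.duplicate_graph("old", "new")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# The free function only works if an instance is passed in by hand.
direct_result = duplicate_graph(service, "old", "new")

# Failure mode 2: any other object lacks the instance attributes.
try:
    duplicate_graph(object(), "old", "new")
    inner_error = None
except AttributeError as exc:
    inner_error = str(exc)
```

Indenting the function by four spaces so that it sits inside the class body makes both errors go away.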

Fix

Re-indented the entire duplicate_graph function body (lines 621–711) by 4
spaces so it is correctly nested inside the ParsingService class.

Test Added

Added a regression test in tests/unit/parsing/test_parsing_service_method.py
that directly proves the bug and guards against regression:

from app.modules.parsing.graph_construction.parsing_service import ParsingService


def test_duplicate_graph_is_a_method_of_parsing_service():
    assert hasattr(ParsingService, "duplicate_graph"), (
        "duplicate_graph is not a method of ParsingService. "
        "It is defined at module level due to missing indentation."
    )

This test fails on the original code and passes after the fix. It requires
no external dependencies (no database, no Neo4j, no mocks) — it purely validates
class structure.
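
A slightly stricter structural check is also possible. The sketch below is not part of this PR: it uses a hypothetical stand-in class and a hypothetical helper, check_is_async_method, to show how a test could additionally confirm that the attribute is a coroutine function rather than just any attribute with that name.

```python
import inspect


class ParsingServiceStandIn:
    """Hypothetical stand-in; a real test would import ParsingService."""

    async def duplicate_graph(self, old_repo_id: str, new_repo_id: str):
        ...


def check_is_async_method(cls, name: str) -> bool:
    """Return True if `name` is defined on `cls` as a coroutine function."""
    # getattr_static avoids triggering descriptors or __getattr__ hooks.
    attr = inspect.getattr_static(cls, name, None)
    return inspect.iscoroutinefunction(attr)
```

Like the hasattr check, this needs no database, Neo4j instance, or mocks.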

Files Changed

  • app/modules/parsing/graph_construction/parsing_service.py - indentation fix
  • tests/unit/parsing/test_parsing_service_method.py - new regression test

Summary by CodeRabbit

  • Bug Fixes

    • Improved reliability of repository graph duplication with robust error propagation and clearer diagnostics for failures.
  • Tests

    • Added regression tests to verify graph duplication behavior and prevent regressions.

@coderabbitai
Contributor

coderabbitai Bot commented Feb 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9d7256 and 38ffc8f.

📒 Files selected for processing (1)
  • app/modules/parsing/graph_construction/parsing_service.py

Walkthrough

The top-level duplicate_graph function was moved into ParsingService as an async method, duplicate_graph(self, old_repo_id, new_repo_id). It preserves the batched node/relationship copying, now clones search indices first, and raises ParsingServiceError on failure. A unit test ensures the method exists on the class.

Changes

| Cohort / File(s) | Summary |
|---|---|
| ParsingService Refactor: app/modules/parsing/graph_construction/parsing_service.py | Moved duplicate_graph from module level into ParsingService as an async method; awaits clone_search_indices first; performs batched node and relationship duplication using separate driver sessions; on exception logs and raises ParsingServiceError. |
| Test Addition: tests/unit/parsing/test_parsing_service_method.py | Added regression test asserting ParsingService exposes a duplicate_graph attribute (method). |
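
The batched duplication mentioned in the summary can be pictured as an offset-based paging loop. The sketch below is a generic illustration under assumptions, not the code from parsing_service.py: copy_in_batches, fetch_batch, and write_batch are hypothetical names, and plain Python lists stand in for the Neo4j reads and writes.

```python
from typing import Callable, Sequence


def copy_in_batches(
    fetch_batch: Callable[[int, int], Sequence[dict]],
    write_batch: Callable[[Sequence[dict]], None],
    batch_size: int,
) -> int:
    """Copy records page by page until a fetch returns nothing."""
    offset = 0
    copied = 0
    while True:
        batch = fetch_batch(offset, batch_size)
        if not batch:
            break  # an empty page means the source is exhausted
        write_batch(batch)
        copied += len(batch)
        offset += batch_size
    return copied


# Usage with in-memory stand-ins for the source and target graphs.
source = [{"id": i} for i in range(5)]
target: list = []
total = copy_in_batches(
    lambda offset, size: source[offset:offset + size],
    target.extend,
    batch_size=2,
)
```

Keeping the batch size bounded limits how much data any single query has to hold in memory.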

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐇 I hopped through code from field to tree,
A function found a class to be.
Async whiskers, batches tight,
Errors wrapped and logs alight.
Together now—repo twins gleam bright.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and accurately describes the main structural fix: moving duplicate_graph from module-level into the ParsingService class where it belongs. |

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
app/modules/intelligence/tools/code_changes_manager.py (1)

5395-5402: Avoid per-file project/service initialization inside the diff loop.

Line [5395] re-queries Project and Line [5401] recreates CodeProviderService for every file. For larger change sets, this can add significant avoidable latency and DB load. Consider resolving these once before the loop and reusing them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/modules/intelligence/tools/code_changes_manager.py` around lines 5395 -
5402, The code is re-querying Project and instantiating CodeProviderService
inside the per-file diff loop; move the Project lookup
(db.query(Project).filter(Project.id == project_id).first()) and the
CodeProviderService creation (CodeProviderService(db)) out of the loop so they
are resolved once and reused; if the loop may contain multiple project_id
values, build a small cache keyed by project_id to reuse the same Project
instance and a single CodeProviderService per DB/session instead of recreating
them per file.
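
The caching idea in this nitpick can be sketched as follows. This is an illustration under assumptions, not the code in code_changes_manager.py: resolve_projects_once and load_project are hypothetical names, with load_project standing in for the Project database query.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def resolve_projects_once(
    file_entries: Iterable[Tuple[str, str]],
    load_project: Callable[[str], dict],
) -> List[Tuple[str, dict]]:
    """Pair each file path with its project, loading each project at most once.

    `file_entries` yields (project_id, file_path) pairs; `load_project` is a
    hypothetical stand-in for the per-project database lookup.
    """
    cache: Dict[str, dict] = {}
    resolved = []
    for project_id, file_path in file_entries:
        if project_id not in cache:
            cache[project_id] = load_project(project_id)
        resolved.append((file_path, cache[project_id]))
    return resolved
```

With a single project_id the lookup runs once regardless of how many files are in the change set.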
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/modules/parsing/graph_construction/parsing_service.py`:
- Around line 705-710: In duplicate_graph(), the except block currently logs the
exception with logger.exception but swallows it; update the handler to re-raise
a descriptive error after logging (e.g., raise ParsingError(f"Failed to
duplicate graph: {e}") from e or re-raise the original exception) so callers
know the operation failed; locate the try/except inside duplicate_graph and
replace the silent swallow with a raise that wraps the caught exception to
preserve context.
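
A minimal sketch of the log-and-re-raise pattern described above, under stated assumptions: ParsingServiceError is declared locally here as a stand-in for the project's own exception, and failing_clone is a hypothetical step that always fails.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class ParsingServiceError(Exception):
    """Local stand-in; the real exception lives in the potpie codebase."""


async def failing_clone(old_repo_id: str, new_repo_id: str) -> None:
    """Hypothetical stand-in for a step that fails."""
    raise RuntimeError("index unavailable")


async def duplicate_graph(old_repo_id: str, new_repo_id: str) -> None:
    try:
        # Fallible steps run inside the try so they share one error path.
        await failing_clone(old_repo_id, new_repo_id)
    except Exception as e:
        logger.exception(
            "Error duplicating graph from %s to %s", old_repo_id, new_repo_id
        )
        # `from e` keeps the original exception available as __cause__.
        raise ParsingServiceError(f"Failed to duplicate graph: {e}") from e
```

Callers then see a ParsingServiceError whose __cause__ is the original exception, instead of a call that appears to succeed.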

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1571110 and b20915a.

📒 Files selected for processing (4)
  • app/modules/intelligence/agents/chat_agents/history_processor.py
  • app/modules/intelligence/tools/code_changes_manager.py
  • app/modules/parsing/graph_construction/parsing_service.py
  • tests/unit/parsing/test_parsing_service_method.py

Comment thread app/modules/parsing/graph_construction/parsing_service.py Outdated
@arnavp27 arnavp27 force-pushed the fix/duplicate-graph-indentation branch from b20915a to a9d7256 Compare February 27, 2026 22:32
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/modules/parsing/graph_construction/parsing_service.py`:
- Around line 621-625: Move the await
self.search_service.clone_search_indices(old_repo_id, new_repo_id) call inside
the existing try block in duplicate_graph so any exceptions are caught and
rethrown as ParsingServiceError; locate the duplicate_graph method and the
try/except that currently wraps node_batch_size/relationship_batch_size and
ensure clone_search_indices is invoked before other work but after entering the
try, and that the except block catches errors and raises ParsingServiceError
(using the same pattern/format as other error handling in this method).
- Around line 627-628: The async method duplicate_graph() currently opens a
synchronous Neo4j session with with self.inference_service.driver.session(),
which blocks the event loop; fix by either switching InferenceService.driver to
use the async Neo4j driver and replace that block with an async context (async
with self.inference_service.driver.session() as session and await calls), or
keep the sync driver but move the blocking session work into a threadpool via
asyncio.get_running_loop().run_in_executor(...) (i.e., wrap the whole session
usage and any session.run()/transaction calls inside a function submitted to
run_in_executor). Update duplicate_graph() accordingly and ensure any subsequent
calls that expect results are awaited/returned from the executor task.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b20915a and a9d7256.

📒 Files selected for processing (2)
  • app/modules/parsing/graph_construction/parsing_service.py
  • tests/unit/parsing/test_parsing_service_method.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/parsing/test_parsing_service_method.py

Comment thread app/modules/parsing/graph_construction/parsing_service.py
Comment on lines +627 to +628
with self.inference_service.driver.session() as session:
    offset = 0
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# First, find the method around lines 627-628
echo "=== Method signature and context around lines 627-628 ==="
sed -n '600,650p' app/modules/parsing/graph_construction/parsing_service.py | head -60

# Find the method definition that contains line 627
echo -e "\n=== Searching for method definition before line 627 ==="
sed -n '580,627p' app/modules/parsing/graph_construction/parsing_service.py | tail -50

Repository: potpie-ai/potpie

Length of output: 5089


🏁 Script executed:

#!/bin/bash

# Find InferenceService class definition
echo "=== Finding InferenceService class ==="
fd -t f -name "*.py" | xargs grep -l "class InferenceService" 

# Check how driver is defined in InferenceService
echo -e "\n=== InferenceService driver initialization ==="
fd -t f -name "*.py" | xargs grep -B5 -A10 "class InferenceService"

Repository: potpie-ai/potpie

Length of output: 627


🏁 Script executed:

#!/bin/bash

# Check where inference_service is instantiated and what driver it gets
echo "=== Find inference_service instantiation ==="
rg -n "inference_service\s*=" --type py | head -20

# Check for self.inference_service definition
echo -e "\n=== Find self.inference_service assignment ==="
rg -n "self\.inference_service\s*=" --type py -A3 | head -30

Repository: potpie-ai/potpie

Length of output: 1610


🏁 Script executed:

#!/bin/bash

# Check Neo4j driver usage patterns in the codebase
echo "=== Check driver session usage patterns ==="
rg -n "driver\.session\(\)" --type py | head -20

# Check if there are any async driver usages
echo -e "\n=== Check for async driver usage ==="
rg -n "async.*driver|driver.*async" --type py | head -20

# Specifically check if session() has async variant
echo -e "\n=== Search for async session calls ==="
rg -n "async with.*session\(\)" --type py

Repository: potpie-ai/potpie

Length of output: 2694


🏁 Script executed:

#!/bin/bash

# Find and display InferenceService class and driver property
echo "=== InferenceService class definition ==="
rg -n "class InferenceService" --type py -A 30

# Find driver initialization in InferenceService
echo -e "\n=== InferenceService driver initialization ==="
rg -n "self.driver\s*=" app/modules/parsing/knowledge_graph/inference_service.py -A 3

# Check the driver property definition
echo -e "\n=== InferenceService driver property ==="
rg -n "@property" app/modules/parsing/knowledge_graph/inference_service.py -A 5 | grep -A 5 "driver"

Repository: potpie-ai/potpie

Length of output: 3468


🏁 Script executed:

#!/bin/bash

# Look at the __init__ method of InferenceService to see how driver is set
echo "=== InferenceService __init__ method ==="
sed -n '1,150p' app/modules/parsing/knowledge_graph/inference_service.py | head -100

Repository: potpie-ai/potpie

Length of output: 3634


Use the async Neo4j driver or run the synchronous session calls in an executor via loop.run_in_executor().

The async def duplicate_graph() method uses synchronous, blocking Neo4j session calls (with self.inference_service.driver.session()). InferenceService.driver is initialized with GraphDatabase.driver(), which is the synchronous driver, so these calls block the event loop when run inside a coroutine. Either migrate to the async Neo4j driver (async with) or run the session work in a thread pool with asyncio.get_running_loop().run_in_executor().
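
The executor option can be sketched as below. This is an illustration under assumptions rather than the change made in this PR: copy_nodes_blocking is a hypothetical function standing in for the synchronous Neo4j session work, simulated here with time.sleep.

```python
import asyncio
import time


def copy_nodes_blocking(old_repo_id: str, new_repo_id: str) -> int:
    """Hypothetical stand-in for the synchronous Neo4j session work."""
    time.sleep(0.01)  # simulates blocking driver I/O
    return 42  # pretend number of nodes copied


async def duplicate_graph(old_repo_id: str, new_repo_id: str) -> int:
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor, so the
    # blocking call runs in a worker thread while the event loop stays free.
    return await loop.run_in_executor(
        None, copy_nodes_blocking, old_repo_id, new_repo_id
    )
```

On Python 3.9 and later, asyncio.to_thread(copy_nodes_blocking, old_repo_id, new_repo_id) is a shorter way to get the same effect.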

@sonarqubecloud

Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)
D Maintainability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud
