Fix: duplicate_graph defined at module level instead of as ParsingService method #660
arnavp27 wants to merge 3 commits into potpie-ai:main
Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
Actionable comments posted: 1
🧹 Nitpick comments (1)
app/modules/intelligence/tools/code_changes_manager.py (1)
5395-5402: Avoid per-file project/service initialization inside the diff loop. Line 5395 re-queries `Project` and line 5401 recreates `CodeProviderService` for every file. For larger change sets, this can add significant avoidable latency and DB load. Consider resolving these once before the loop and reusing them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/modules/intelligence/tools/code_changes_manager.py` around lines 5395 - 5402, The code is re-querying Project and instantiating CodeProviderService inside the per-file diff loop; move the Project lookup (db.query(Project).filter(Project.id == project_id).first()) and the CodeProviderService creation (CodeProviderService(db)) out of the loop so they are resolved once and reused; if the loop may contain multiple project_id values, build a small cache keyed by project_id to reuse the same Project instance and a single CodeProviderService per DB/session instead of recreating them per file.
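The hoist-and-cache pattern the comment describes can be sketched generically. Everything below is illustrative: `fake_db_query` stands in for the `db.query(Project)...first()` lookup, and the helper names are not from the codebase.

```python
from typing import Callable, Dict

def make_cached_lookup(fetch: Callable[[str], object]) -> Callable[[str], object]:
    # Resolve each project_id once and reuse the result across files,
    # instead of re-querying inside the per-file diff loop.
    cache: Dict[str, object] = {}

    def lookup(project_id: str) -> object:
        if project_id not in cache:
            cache[project_id] = fetch(project_id)
        return cache[project_id]

    return lookup

calls = []

def fake_db_query(project_id: str) -> dict:
    # Stand-in for the real DB lookup; records how often it is hit.
    calls.append(project_id)
    return {"id": project_id}

get_project = make_cached_lookup(fake_db_query)

# Three files, two distinct projects: the lookup fires only twice.
files = [("p1", "a.py"), ("p1", "b.py"), ("p2", "c.py")]
projects = [get_project(pid) for pid, _ in files]
```

If the loop only ever sees a single `project_id`, the cache degenerates to a plain hoisted lookup before the loop, which is the simpler fix the comment suggests first.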
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/modules/parsing/graph_construction/parsing_service.py`:
- Around line 705-710: In duplicate_graph(), the except block currently logs the
exception with logger.exception but swallows it; update the handler to re-raise
a descriptive error after logging (e.g., raise ParsingError(f"Failed to
duplicate graph: {e}") from e or re-raise the original exception) so callers
know the operation failed; locate the try/except inside duplicate_graph and
replace the silent swallow with a raise that wraps the caught exception to
preserve context.
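A sketch of the log-then-re-raise pattern the comment asks for. The `ParsingError` class and the failing body are assumed stand-ins, not code from the repository:

```python
import logging

logger = logging.getLogger(__name__)

class ParsingError(Exception):
    """Assumed error type; the project may define its own."""

def duplicate_graph(old_repo_id: str, new_repo_id: str) -> None:
    try:
        # Stand-in for the real graph-copy work.
        raise RuntimeError("connection lost")
    except Exception as e:
        logger.exception("Failed to duplicate graph")
        # Re-raise instead of swallowing, so callers learn the operation
        # failed; `from e` preserves the original traceback as __cause__.
        raise ParsingError(f"Failed to duplicate graph: {e}") from e

try:
    duplicate_graph("old-id", "new-id")
except ParsingError as err:
    caught = err
```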
---
Nitpick comments:
In `@app/modules/intelligence/tools/code_changes_manager.py`:
- Around line 5395-5402: The code is re-querying Project and instantiating
CodeProviderService inside the per-file diff loop; move the Project lookup
(db.query(Project).filter(Project.id == project_id).first()) and the
CodeProviderService creation (CodeProviderService(db)) out of the loop so they
are resolved once and reused; if the loop may contain multiple project_id
values, build a small cache keyed by project_id to reuse the same Project
instance and a single CodeProviderService per DB/session instead of recreating
them per file.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
app/modules/intelligence/agents/chat_agents/history_processor.py
app/modules/intelligence/tools/code_changes_manager.py
app/modules/parsing/graph_construction/parsing_service.py
tests/unit/parsing/test_parsing_service_method.py
Force-pushed: b20915a to a9d7256
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/modules/parsing/graph_construction/parsing_service.py`:
- Around line 621-625: Move the await
self.search_service.clone_search_indices(old_repo_id, new_repo_id) call inside
the existing try block in duplicate_graph so any exceptions are caught and
rethrown as ParsingServiceError; locate the duplicate_graph method and the
try/except that currently wraps node_batch_size/relationship_batch_size and
ensure clone_search_indices is invoked before other work but after entering the
try, and that the except block catches errors and raises ParsingServiceError
(using the same pattern/format as other error handling in this method).
- Around line 627-628: The async method duplicate_graph() currently opens a
synchronous Neo4j session with with self.inference_service.driver.session(),
which blocks the event loop; fix by either switching InferenceService.driver to
use the async Neo4j driver and replace that block with an async context (async
with self.inference_service.driver.session() as session and await calls), or
keep the sync driver but move the blocking session work into a threadpool via
asyncio.get_running_loop().run_in_executor(...) (i.e., wrap the whole session
usage and any session.run()/transaction calls inside a function submitted to
run_in_executor). Update duplicate_graph() accordingly and ensure any subsequent
calls that expect results are awaited/returned from the executor task.
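The first finding ("move the await inside the try and rethrow as `ParsingServiceError`") can be sketched as follows. The `ParsingServiceError` name and the failing stand-in are assumptions, not code from the repository:

```python
import asyncio

class ParsingServiceError(Exception):
    """Assumed error type, mirroring the method's other error handling."""

async def clone_search_indices(old_repo_id: str, new_repo_id: str) -> None:
    # Stand-in for self.search_service.clone_search_indices(...);
    # simulates a backend failure.
    raise ConnectionError("search backend unavailable")

async def duplicate_graph(old_repo_id: str, new_repo_id: str) -> None:
    try:
        # Inside the try block: a failure here is caught and rethrown
        # below rather than escaping as a raw backend exception.
        await clone_search_indices(old_repo_id, new_repo_id)
        # ... node/relationship batch copying would follow here ...
    except Exception as e:
        raise ParsingServiceError(
            f"Failed to duplicate graph {old_repo_id} -> {new_repo_id}: {e}"
        ) from e

try:
    asyncio.run(duplicate_graph("old", "new"))
except ParsingServiceError as err:
    caught = err
```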
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
app/modules/parsing/graph_construction/parsing_service.py
tests/unit/parsing/test_parsing_service_method.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/parsing/test_parsing_service_method.py
    with self.inference_service.driver.session() as session:
        offset = 0
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, find the method around lines 627-628
echo "=== Method signature and context around lines 627-628 ==="
sed -n '600,650p' app/modules/parsing/graph_construction/parsing_service.py | head -60
# Find the method definition that contains line 627
echo -e "\n=== Searching for method definition before line 627 ==="
sed -n '580,627p' app/modules/parsing/graph_construction/parsing_service.py | tail -50

Repository: potpie-ai/potpie
Length of output: 5089
🏁 Script executed:
#!/bin/bash
# Find InferenceService class definition
echo "=== Finding InferenceService class ==="
fd -t f -e py | xargs grep -l "class InferenceService"
# Check how driver is defined in InferenceService
echo -e "\n=== InferenceService driver initialization ==="
fd -t f -e py | xargs grep -B5 -A10 "class InferenceService"

Repository: potpie-ai/potpie
Length of output: 627
🏁 Script executed:
#!/bin/bash
# Check where inference_service is instantiated and what driver it gets
echo "=== Find inference_service instantiation ==="
rg -n "inference_service\s*=" --type py | head -20
# Check for self.inference_service definition
echo -e "\n=== Find self.inference_service assignment ==="
rg -n "self\.inference_service\s*=" --type py -A3 | head -30

Repository: potpie-ai/potpie
Length of output: 1610
🏁 Script executed:
#!/bin/bash
# Check Neo4j driver usage patterns in the codebase
echo "=== Check driver session usage patterns ==="
rg -n "driver\.session\(\)" --type py | head -20
# Check if there are any async driver usages
echo -e "\n=== Check for async driver usage ==="
rg -n "async.*driver|driver.*async" --type py | head -20
# Specifically check if session() has async variant
echo -e "\n=== Search for async session calls ==="
rg -n "async with.*session\(\)" --type py

Repository: potpie-ai/potpie
Length of output: 2694
🏁 Script executed:
#!/bin/bash
# Find and display InferenceService class and driver property
echo "=== InferenceService class definition ==="
rg -n "class InferenceService" --type py -A 30
# Find driver initialization in InferenceService
echo -e "\n=== InferenceService driver initialization ==="
rg -n "self.driver\s*=" app/modules/parsing/knowledge_graph/inference_service.py -A 3
# Check the driver property definition
echo -e "\n=== InferenceService driver property ==="
rg -n "@property" app/modules/parsing/knowledge_graph/inference_service.py -A 5 | grep -A 5 "driver"

Repository: potpie-ai/potpie
Length of output: 3468
🏁 Script executed:
#!/bin/bash
# Look at the __init__ method of InferenceService to see how driver is set
echo "=== InferenceService __init__ method ==="
sed -n '1,150p' app/modules/parsing/knowledge_graph/inference_service.py | head -100

Repository: potpie-ai/potpie
Length of output: 3634
Use the async Neo4j driver or wrap synchronous session calls with run_in_executor().
The async def duplicate_graph() method uses synchronous, blocking Neo4j session calls (with self.inference_service.driver.session()). The InferenceService.driver is initialized with GraphDatabase.driver(), which is the synchronous driver, so these calls will block the event loop in an async context. Either migrate to the async Neo4j driver (AsyncGraphDatabase.driver() with async with) or run the session work in a thread pool via loop.run_in_executor().
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/modules/parsing/graph_construction/parsing_service.py` around lines 627 -
628, The async method duplicate_graph() currently opens a synchronous Neo4j
session with with self.inference_service.driver.session(), which blocks the
event loop; fix by either switching InferenceService.driver to use the async
Neo4j driver and replace that block with an async context (async with
self.inference_service.driver.session() as session and await calls), or keep the
sync driver but move the blocking session work into a threadpool via
asyncio.get_running_loop().run_in_executor(...) (i.e., wrap the whole session
usage and any session.run()/transaction calls inside a function submitted to
run_in_executor). Update duplicate_graph() accordingly and ensure any subsequent
calls that expect results are awaited/returned from the executor task.
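The executor workaround can be sketched without a live Neo4j instance. The blocking function below is a stand-in for the synchronous `with driver.session(): ...` block; all names are illustrative:

```python
import asyncio
import time

def _blocking_session_work(old_repo_id: str, new_repo_id: str) -> str:
    # Stand-in for the synchronous Neo4j session block; in the real
    # method this would open driver.session() and run Cypher batches.
    time.sleep(0.05)  # simulates blocking I/O
    return f"copied {old_repo_id} -> {new_repo_id}"

async def duplicate_graph(old_repo_id: str, new_repo_id: str) -> str:
    loop = asyncio.get_running_loop()
    # The whole blocking block runs in the default threadpool, so the
    # event loop remains free to service other coroutines meanwhile.
    return await loop.run_in_executor(
        None, _blocking_session_work, old_repo_id, new_repo_id
    )

result = asyncio.run(duplicate_graph("repo-old", "repo-new"))
```

The alternative is the native async driver (`neo4j.AsyncGraphDatabase.driver()` with `async with driver.session()`), which avoids the threadpool hop but requires migrating every session call in `InferenceService`.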
Summary

Fixes a silent structural bug where `duplicate_graph` was accidentally defined as a module-level function instead of as a method of the `ParsingService` class, making it completely unreachable through normal usage.
Root Cause

In `app/modules/parsing/graph_construction/parsing_service.py`, the `duplicate_graph` function was defined at column 0 (no indentation), placing it outside the `ParsingService` class body. Every other method in the class is correctly indented at 4 spaces.
What Breaks at Runtime

Two failure modes result from this bug:

1. AttributeError on the class - Any caller doing `service.duplicate_graph(old_id, new_id)` on a `ParsingService` instance would immediately get:

   AttributeError: 'ParsingService' object has no attribute 'duplicate_graph'

   because the method simply does not exist on the class.

2. AttributeError inside the function — If somehow called directly as a standalone function (e.g. `duplicate_graph(some_obj, old_id, new_id)`), both `self.search_service` and `self.inference_service` would raise `AttributeError` unless the caller manually passed a `ParsingService` instance as the first argument — which is not how it was ever intended to be called.

The function references `self.search_service` (line 622) and `self.inference_service` (lines 627, 670), both instance attributes set in `ParsingService.__init__`, confirming it was always intended to be an instance method.

Fix

Re-indented the entire `duplicate_graph` function body (lines 621–711) by 4 spaces so it is correctly nested inside the `ParsingService` class.
ParsingServiceclass.Test Added
Added a regression test in
tests/unit/parsing/test_parsing_service_method.pythat directly proves the bug and guards against regression:
def test_duplicate_graph_is_a_method_of_parsing_service(): assert hasattr(ParsingService, "duplicate_graph"), ( "duplicate_graph is not a method of ParsingService. " "It is defined at module level due to missing indentation." )This test fails on the original code and passes after the fix. It requires
no external dependencies (no database, no Neo4j, no mocks) — it purely validates
class structure.
Files Changed

- app/modules/parsing/graph_construction/parsing_service.py - indentation fix
- tests/unit/parsing/test_parsing_service_method.py - new regression test

Summary by CodeRabbit

Bug Fixes
Tests