generation module_explainer

generation / module_explainer.py

Generates LLM-powered explanations for individual modules or all modules at once, including their files, dependencies, and role in the system.

Key Concepts

| Term | Definition | Example |
| --- | --- | --- |
| embedding | A numerical vector (a list of numbers) that represents the meaning of a piece of text. Texts with similar meaning map to vectors that are close together. | The code `def add(a, b): return a+b` might become `[0.12, -0.45, 0.78, ...]` (1536 numbers for some OpenAI embedding models, such as text-embedding-3-small). |
| RAG | Retrieval-Augmented Generation: instead of asking an LLM to answer from memory, first retrieve relevant documents, then include them in the prompt. | Question: `"What does scan_directory do?"` → retrieve the source code of scan_directory → include it in the LLM prompt → get an answer grounded in that code. |
| LLM | Large Language Model: an AI model (such as GPT-4 or Claude) that generates text given a prompt. | `get_llm()` returns a ChatOpenAI instance that can answer questions about code. |
| LangChain | A Python framework for building LLM-powered applications. It provides chains (prompt → LLM → parser), structured output, and more. | `prompt \| llm \| StrOutputParser()` creates a chain that formats a prompt, sends it to the LLM, and extracts the string response. |
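
To make "similar text → similar vectors" concrete, here is a small sketch that compares toy vectors with cosine similarity. Cosine similarity is only one common way to measure closeness and is an assumption here, as are the three-dimensional vectors and the variable names; real embeddings have hundreds or thousands of dimensions and come from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors standing in for real embeddings (hypothetical values).
add_fn = [0.12, -0.45, 0.78]    # e.g. "def add(a, b): return a+b"
sum_fn = [0.10, -0.40, 0.80]    # e.g. a similar summing helper
http_fn = [-0.70, 0.60, 0.05]   # e.g. unrelated HTTP client code

# The two arithmetic helpers score higher with each other than with the HTTP code.
print(cosine_similarity(add_fn, sum_fn) > cosine_similarity(add_fn, http_fn))
```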

Source: src/codewalk/generation/module_explainer.py

Functions

`_get_depended_by(module_name, module_graph)` (line 48)

What it does: Performs a reverse lookup on the module dependency graph to find the dependents of the given module, that is, every module whose dependency list contains it.

Example input:

module_name = "embeddings"
module_graph = {
    "rag": ["embeddings", "analysis"],
    "analysis": ["ingestion"],
    "generation": ["embeddings"],
}

Line-by-line walkthrough:

Line 59: depended_by = [] — empty list to collect results.
Line 60: for other_module, dependencies in module_graph.items() — iterates all modules.
- First iteration: other_module = "rag", dependencies = ["embeddings", "analysis"].
- Line 61: if module_name in dependencies → "embeddings" in ["embeddings", "analysis"] → True.
- Line 62: depended_by.append("rag") → depended_by = ["rag"].
- Second iteration: other_module = "analysis", dependencies = ["ingestion"].
- Line 61: "embeddings" in ["ingestion"] → False. Skip.
- Third iteration: other_module = "generation", dependencies = ["embeddings"].
- Line 61: "embeddings" in ["embeddings"] → True.
- Line 62: depended_by.append("generation") → depended_by = ["rag", "generation"].
Line 64: return depended_by → ["rag", "generation"].

Return value: ["rag", "generation"]
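
The walkthrough above can be sketched as the following function. This is a reconstruction from the described steps, not a copy of the source file; the name `get_depended_by` and the type hints are assumptions made for illustration.

```python
from typing import Dict, List


def get_depended_by(module_name: str, module_graph: Dict[str, List[str]]) -> List[str]:
    """Return every module whose dependency list contains module_name."""
    depended_by = []
    for other_module, dependencies in module_graph.items():
        # A module is a dependent if the target appears among its dependencies.
        if module_name in dependencies:
            depended_by.append(other_module)
    return depended_by


module_graph = {
    "rag": ["embeddings", "analysis"],
    "analysis": ["ingestion"],
    "generation": ["embeddings"],
}
print(get_depended_by("embeddings", module_graph))  # ['rag', 'generation']
```

Results follow the insertion order of `module_graph`, because Python dictionaries preserve insertion order. A module that nothing depends on yields an empty list.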

`_format_file_list(files)` (line 66)

What it does: Converts a list of full file paths into a bulleted Markdown list showing just the filenames.

Example input:

files = [
    "src/codewalk/analysis/dependency_graph.py",
    "src/codewalk/analysis/code_parser.py"
]

Line-by-line walkthrough:

Line 74: return "\n".join(f"- {path.split('/')[-1]}" for path in sorted(files))
- sorted(files) → ["src/codewalk/analysis/code_parser.py", "src/codewalk/analysis/dependency_graph.py"] (alphabetical).
- For "src/codewalk/analysis/code_parser.py": .split("/")[-1] → "code_parser.py" → "- code_parser.py".
- For "src/codewalk/analysis/dependency_graph.py": .split("/")[-1] → "dependency_graph.py" → "- dependency_graph.py".
- Joins with "\n".

Return value:

- code_parser.py
- dependency_graph.py
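
As a runnable illustration, the one-line body quoted in the walkthrough can be wrapped like this. The function name `format_file_list` and the type hints are assumptions; the expression itself is the one shown for line 74.

```python
from typing import List


def format_file_list(files: List[str]) -> str:
    """Render file paths as a Markdown bullet list of bare filenames."""
    # Sort the full paths first, then keep only the last path segment of each.
    return "\n".join(f"- {path.split('/')[-1]}" for path in sorted(files))


files = [
    "src/codewalk/analysis/dependency_graph.py",
    "src/codewalk/analysis/code_parser.py",
]
print(format_file_list(files))
```

Two details follow from this expression: sorting is applied to the full paths, so files in different directories are ordered by directory before filename, and splitting on `"/"` assumes POSIX-style separators.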

`explain_module(module_name, module_info, module_graph)` (line 76)

What it does: Generates a Markdown explanation for one specific module by formatting its metadata into an LLM prompt and returning the generated text.

Example input:

module_name = "analysis"
module_info = {
    "files": ["src/codewalk/analysis/code_parser.py", "src/codewalk/analysis/dependency_graph.py"],
    "languages": {"python": 2},
    "file_count": 2
}
module_graph = {
    "analysis": ["ingestion"],
    "rag": ["embeddings", "analysis"],
}

Line-by-line walkthrough:

Line 90: depends_on = module_graph.get(module_name, []) → module_graph.get("analysis", []) → ["ingestion"].
Line 93: depended_by = _get_depended_by(module_name, module_graph) — calls the reverse lookup. Checks every module: "rag" depends on ["embeddings", "analysis"] which includes "analysis" → adds "rag". Result: depended_by = ["rag"].
Line 96: file_list = _format_file_list(module_info["files"]) → "- code_parser.py\n- dependency_graph.py".
Lines 97–99: languages = ", ".join(...) — iterates sorted(module_info["languages"].items()) → [("python", 2)] → "python(2)". Result: languages = "python(2)".
Lines 102–105: prompt = ChatPromptTemplate.from_messages([...]) — builds the prompt template with the system prompt (lines 11–33) and human prompt (lines 35–46). Placeholders: {module_name}, {file_count}, {file_list}, {languages}, {depends_on}, {depended_by}.
Line 107: llm = get_llm() — gets configured LLM.
Line 108: chain = prompt | llm | StrOutputParser() — prompt → LLM → string.
Lines 110–117: explanation = chain.invoke({...}) — fills placeholders:
- "module_name" → "analysis"
- "file_count" → 2
- "file_list" → "- code_parser.py\n- dependency_graph.py"
- "languages" → "python(2)"
- "depends_on" → "ingestion" (joined from ["ingestion"])
- "depended_by" → "rag" (joined from ["rag"])
Line 119: return explanation — the LLM-generated Markdown string.

Return value: A Markdown string explaining the analysis module's purpose, key files, dependencies, and role in the system.
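
The sketch below shows how the placeholder values passed to `chain.invoke` might be assembled, leaving out the LLM call so it runs without LangChain or an API key. The helper name `build_prompt_inputs` is hypothetical, and the `", "` separator used to join `depends_on` and `depended_by` is an assumption; the walkthrough only says the lists are joined.

```python
from typing import Dict, List


def build_prompt_inputs(module_name: str, module_info: dict,
                        module_graph: Dict[str, List[str]]) -> dict:
    """Assemble the values for the six prompt placeholders (hypothetical helper)."""
    depends_on = module_graph.get(module_name, [])
    # Reverse lookup: modules that list module_name among their dependencies.
    depended_by = [m for m, deps in module_graph.items() if module_name in deps]
    file_list = "\n".join(
        f"- {path.split('/')[-1]}" for path in sorted(module_info["files"])
    )
    languages = ", ".join(
        f"{lang}({count})" for lang, count in sorted(module_info["languages"].items())
    )
    return {
        "module_name": module_name,
        "file_count": module_info["file_count"],
        "file_list": file_list,
        "languages": languages,
        "depends_on": ", ".join(depends_on),    # separator is an assumption
        "depended_by": ", ".join(depended_by),  # separator is an assumption
    }


module_info = {
    "files": ["src/codewalk/analysis/code_parser.py",
              "src/codewalk/analysis/dependency_graph.py"],
    "languages": {"python": 2},
    "file_count": 2,
}
module_graph = {"analysis": ["ingestion"], "rag": ["embeddings", "analysis"]}
inputs = build_prompt_inputs("analysis", module_info, module_graph)
print(inputs)
```

In the function described above, a dictionary of this shape is what the `prompt | llm | StrOutputParser()` chain receives, and the chain's string output becomes the return value.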

`explain_all_modules(module_results)` (line 121)

What it does: Iterates over every module in the codebase and calls explain_module() for each one, returning a dictionary of all explanations.

Example input:

module_results = {
    "modules": {
        "analysis": {"files": ["src/codewalk/analysis/code_parser.py"], "languages": {"python": 1}, "file_count": 1},
        "rag": {"files": ["src/codewalk/rag/chain.py"], "languages": {"python": 1}, "file_count": 1},
    },
    "module_graph": {"analysis": ["ingestion"], "rag": ["embeddings"]},
}

Line-by-line walkthrough:

Line 131: explanations = {} — empty dict to collect module → explanation pairs.
Line 132: for module_name, module_info in sorted(module_results["modules"].items()) — iterates alphabetically. First: module_name = "analysis", module_info = {"files": [...], "languages": {"python": 1}, "file_count": 1}.
Line 133: _log(f"[explainer] Explaining module: {module_name}...") — logs "[explainer] Explaining module: analysis...".
Lines 134–138: explanations[module_name] = explain_module(module_name, module_info, module_results["module_graph"]) — calls explain_module("analysis", {...}, {"analysis": ["ingestion"], "rag": ["embeddings"]}). Stores the returned Markdown in explanations["analysis"].
Second iteration: module_name = "rag". Logs "[explainer] Explaining module: rag...". Calls explain_module("rag", {...}, {...}). Stores result in explanations["rag"].
Line 140: return explanations.

Return value:

{
    "analysis": "## analysis\n**Purpose**: ...(LLM-generated markdown)...",
    "rag": "## rag\n**Purpose**: ...(LLM-generated markdown)...",
}
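
The loop described above can be sketched with the per-module explainer passed in as a parameter, so the example runs without calling an LLM. The names `explain_all` and `fake_explain` are hypothetical, and `print` stands in for the `_log` helper; the real function calls `explain_module()` directly.

```python
from typing import Callable, Dict


def explain_all(module_results: dict,
                explain: Callable[[str, dict, dict], str]) -> Dict[str, str]:
    """Call explain() for every module, in alphabetical order of module name."""
    explanations = {}
    for module_name, module_info in sorted(module_results["modules"].items()):
        print(f"[explainer] Explaining module: {module_name}...")
        explanations[module_name] = explain(
            module_name, module_info, module_results["module_graph"]
        )
    return explanations


def fake_explain(module_name: str, module_info: dict, module_graph: dict) -> str:
    """Stand-in for explain_module(); returns placeholder Markdown, not LLM output."""
    return f"## {module_name}\n(placeholder for LLM-generated markdown)"


module_results = {
    "modules": {
        "rag": {"files": ["src/codewalk/rag/chain.py"],
                "languages": {"python": 1}, "file_count": 1},
        "analysis": {"files": ["src/codewalk/analysis/code_parser.py"],
                     "languages": {"python": 1}, "file_count": 1},
    },
    "module_graph": {"analysis": ["ingestion"], "rag": ["embeddings"]},
}
explanations = explain_all(module_results, fake_explain)
print(list(explanations))  # ['analysis', 'rag']
```

Because each module triggers one LLM call in the real function, the total cost and latency grow linearly with the number of modules.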
