
generation module_explainer

aakash-anko edited this page May 25, 2026 · 1 revision

generation / module_explainer.py

Generates LLM-powered explanations for individual modules or all modules at once, including their files, dependencies, and role in the system.


Key Concepts

  • embedding
    • Definition: A numerical vector (list of numbers) that represents the meaning of text. Similar text → similar vectors.
    • Example: The code def add(a, b): return a+b might become [0.12, -0.45, 0.78, ...] (1536 numbers with OpenAI's text-embedding-ada-002 or text-embedding-3-small).

  • RAG
    • Definition: Retrieval-Augmented Generation. Instead of asking an LLM to answer from memory, first retrieve relevant documents, then include them in the prompt.
    • Example: Question: "What does scan_directory do?" → retrieve the source code of scan_directory → include it in the LLM prompt → get an answer grounded in that code.

  • LLM
    • Definition: Large Language Model, an AI model (such as GPT-4 or Claude) that generates text given a prompt.
    • Example: get_llm() returns a ChatOpenAI instance that can answer questions about code.

  • LangChain
    • Definition: A Python framework for building LLM-powered applications. It provides chains (prompt → LLM → parser), structured output, and more.
    • Example: prompt | llm | StrOutputParser() creates a chain that formats a prompt, sends it to the LLM, and extracts the string response.
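To make the prompt | llm | StrOutputParser() example concrete, here is a toy, pure-Python model of how the | operator composes steps into a chain. Step, fake_llm, parser and the prompt text are all hypothetical; LangChain's real runnables also handle message objects, batching and streaming, so treat this only as an illustration of the data flow.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a LangChain runnable: wraps one function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its output into b.
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


# Hypothetical pieces mirroring prompt -> LLM -> parser (no real model is called).
prompt = Step(lambda inputs: f"Explain the {inputs['module_name']} module.")
fake_llm = Step(lambda text: {"content": f"(model reply to: {text})"})
parser = Step(lambda message: message["content"])  # like StrOutputParser: message -> str

chain = prompt | fake_llm | parser
print(chain.invoke({"module_name": "analysis"}))
# (model reply to: Explain the analysis module.)
```

Each | only wires functions together; nothing runs until invoke() is called with the placeholder values.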

Source: src/codewalk/generation/module_explainer.py


Functions

_get_depended_by(module_name, module_graph) (line 48)

What it does: Performs a reverse lookup on the module dependency graph to find which modules depend ON the given module.

Example input:

module_name = "embeddings"
module_graph = {
    "rag": ["embeddings", "analysis"],
    "analysis": ["ingestion"],
    "generation": ["embeddings"],
}

Line-by-line walkthrough:

  • Line 59: depended_by = [] — empty list to collect results.

  • Line 60: for other_module, dependencies in module_graph.items() — iterates all modules.

    • First iteration: other_module = "rag", dependencies = ["embeddings", "analysis"].

    • Line 61: if module_name in dependencies → "embeddings" in ["embeddings", "analysis"] → True.

    • Line 62: depended_by.append("rag") → depended_by = ["rag"].

    • Second iteration: other_module = "analysis", dependencies = ["ingestion"].

    • Line 61: "embeddings" in ["ingestion"] → False. Skip.

    • Third iteration: other_module = "generation", dependencies = ["embeddings"].

    • Line 61: "embeddings" in ["embeddings"] → True.

    • Line 62: depended_by.append("generation") → depended_by = ["rag", "generation"].

  • Line 64: return depended_by → ["rag", "generation"].

Return value: ["rag", "generation"]
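A minimal runnable reconstruction of the walkthrough (written from the steps listed here, not copied from module_explainer.py). It also includes a hypothetical invert_graph helper, not part of the module, that computes every reverse lookup in one pass. This can matter because explain_all_modules triggers one full scan of the graph per module.

```python
def get_depended_by(module_name: str, module_graph: dict[str, list[str]]) -> list[str]:
    """Reverse lookup: modules whose dependency list contains module_name."""
    depended_by = []
    for other_module, dependencies in module_graph.items():
        if module_name in dependencies:
            depended_by.append(other_module)
    return depended_by


def invert_graph(module_graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Hypothetical helper: build the reverse lookup for every module in a single pass."""
    reverse: dict[str, list[str]] = {}
    for other_module, dependencies in module_graph.items():
        for dep in dependencies:
            reverse.setdefault(dep, []).append(other_module)
    return reverse


module_graph = {
    "rag": ["embeddings", "analysis"],
    "analysis": ["ingestion"],
    "generation": ["embeddings"],
}
print(get_depended_by("embeddings", module_graph))  # ['rag', 'generation']
print(invert_graph(module_graph)["embeddings"])     # ['rag', 'generation']
```

Both versions keep the graph's insertion order, so the results match the walkthrough order ("rag" before "generation").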


_format_file_list(files) (line 66)

What it does: Converts a list of full file paths into a bulleted Markdown list showing just the filenames.

Example input:

files = [
    "src/codewalk/analysis/dependency_graph.py",
    "src/codewalk/analysis/code_parser.py"
]

Line-by-line walkthrough:

  • Line 74: return "\n".join(f"- {path.split('/')[-1]}" for path in sorted(files))
    • sorted(files) → ["src/codewalk/analysis/code_parser.py", "src/codewalk/analysis/dependency_graph.py"] (alphabetical).
    • For "src/codewalk/analysis/code_parser.py": .split("/")[-1] → "code_parser.py" → "- code_parser.py".
    • For "src/codewalk/analysis/dependency_graph.py": .split("/")[-1] → "dependency_graph.py" → "- dependency_graph.py".
    • Joins with "\n".

Return value:

- code_parser.py
- dependency_graph.py
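The one-line function from the walkthrough, wrapped in a runnable sketch. The name format_file_list is used here so the sketch is not mistaken for the module's private _format_file_list.

```python
def format_file_list(files: list[str]) -> str:
    """Bulleted Markdown list of bare filenames, ordered by full path."""
    # split("/")[-1] keeps only the last path component (the filename).
    return "\n".join(f"- {path.split('/')[-1]}" for path in sorted(files))


files = [
    "src/codewalk/analysis/dependency_graph.py",
    "src/codewalk/analysis/code_parser.py",
]
print(format_file_list(files))
# - code_parser.py
# - dependency_graph.py
```

Two details are worth noting. First, sorting happens on the full path, not on the filename, so files from different directories are grouped by directory: ["src/b/a.py", "src/a/z.py"] becomes "- z.py" followed by "- a.py". Second, splitting on "/" assumes POSIX-style separators.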

explain_module(module_name, module_info, module_graph) (line 76)

What it does: Generates a Markdown explanation for one specific module by formatting its metadata into an LLM prompt and returning the generated text.

Example input:

module_name = "analysis"
module_info = {
    "files": ["src/codewalk/analysis/code_parser.py", "src/codewalk/analysis/dependency_graph.py"],
    "languages": {"python": 2},
    "file_count": 2
}
module_graph = {
    "analysis": ["ingestion"],
    "rag": ["embeddings", "analysis"],
}

Line-by-line walkthrough:

  • Line 90: depends_on = module_graph.get(module_name, []) → module_graph.get("analysis", []) → ["ingestion"].

  • Line 93: depended_by = _get_depended_by(module_name, module_graph) — calls the reverse lookup. Checks every module: "rag" depends on ["embeddings", "analysis"] which includes "analysis" → adds "rag". Result: depended_by = ["rag"].

  • Line 96: file_list = _format_file_list(module_info["files"]) → "- code_parser.py\n- dependency_graph.py".

  • Lines 97–99: languages = ", ".join(...) iterates over sorted(module_info["languages"].items()) → [("python", 2)] and formats each pair as "python(2)". Result: languages = "python(2)".

  • Lines 102–105: prompt = ChatPromptTemplate.from_messages([...]) — builds the prompt template with the system prompt (lines 11–33) and human prompt (lines 35–46). Placeholders: {module_name}, {file_count}, {file_list}, {languages}, {depends_on}, {depended_by}.

  • Line 107: llm = get_llm() returns the configured LLM.

  • Line 108: chain = prompt | llm | StrOutputParser() — prompt → LLM → string.

  • Lines 110–117: explanation = chain.invoke({...}) — fills placeholders:

    • "module_name" → "analysis"
    • "file_count" → 2
    • "file_list" → "- code_parser.py\n- dependency_graph.py"
    • "languages" → "python(2)"
    • "depends_on" → "ingestion" (joined from ["ingestion"])
    • "depended_by" → "rag" (joined from ["rag"])
  • Line 119: return explanation — the LLM-generated Markdown string.

Return value: A Markdown string explaining the analysis module's purpose, key files, dependencies, and role in the system.
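The LLM call itself needs a configured model, but the values passed to chain.invoke() can be reproduced offline. build_prompt_inputs below is a hypothetical helper that assembles that dictionary from the steps in the walkthrough. It assumes lists are joined with ", ", because the walkthrough only shows one-item lists. What the real module sends for an empty depends_on or depended_by list is not shown, so here that case simply yields "".

```python
def get_depended_by(module_name: str, module_graph: dict[str, list[str]]) -> list[str]:
    # Reverse lookup, same logic as _get_depended_by.
    return [other for other, deps in module_graph.items() if module_name in deps]


def format_file_list(files: list[str]) -> str:
    # Same logic as _format_file_list.
    return "\n".join(f"- {path.split('/')[-1]}" for path in sorted(files))


def build_prompt_inputs(module_name: str, module_info: dict, module_graph: dict[str, list[str]]) -> dict:
    """Hypothetical helper: assemble the values explain_module passes to chain.invoke()."""
    depends_on = module_graph.get(module_name, [])            # modules this one depends on
    depended_by = get_depended_by(module_name, module_graph)  # modules that depend on this one
    languages = ", ".join(
        f"{lang}({count})" for lang, count in sorted(module_info["languages"].items())
    )
    return {
        "module_name": module_name,
        "file_count": module_info["file_count"],
        "file_list": format_file_list(module_info["files"]),
        "languages": languages,
        # Assumption: ", " as the separator; the walkthrough only shows one-item lists.
        "depends_on": ", ".join(depends_on),
        "depended_by": ", ".join(depended_by),
    }


module_info = {
    "files": ["src/codewalk/analysis/code_parser.py", "src/codewalk/analysis/dependency_graph.py"],
    "languages": {"python": 2},
    "file_count": 2,
}
module_graph = {"analysis": ["ingestion"], "rag": ["embeddings", "analysis"]}

inputs = build_prompt_inputs("analysis", module_info, module_graph)
print(inputs["depends_on"], inputs["depended_by"], inputs["languages"])  # ingestion rag python(2)
# In the real module these values are filled into the prompt by (prompt | llm | StrOutputParser()).invoke(...).
```

Keeping this formatting separate from the LLM call makes the deterministic part easy to test without network access.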


explain_all_modules(module_results) (line 121)

What it does: Iterates over every module in the codebase and calls explain_module() for each one, returning a dictionary of all explanations.

Example input:

module_results = {
    "modules": {
        "analysis": {"files": ["src/codewalk/analysis/code_parser.py"], "languages": {"python": 1}, "file_count": 1},
        "rag": {"files": ["src/codewalk/rag/chain.py"], "languages": {"python": 1}, "file_count": 1},
    },
    "module_graph": {"analysis": ["ingestion"], "rag": ["embeddings"]},
}

Line-by-line walkthrough:

  • Line 131: explanations = {} — empty dict to collect module → explanation pairs.

  • Line 132: for module_name, module_info in sorted(module_results["modules"].items()) — iterates alphabetically. First: module_name = "analysis", module_info = {"files": [...], "languages": {"python": 1}, "file_count": 1}.

  • Line 133: _log(f"[explainer] Explaining module: {module_name}...") — logs "[explainer] Explaining module: analysis...".

  • Lines 134–138: explanations[module_name] = explain_module(module_name, module_info, module_results["module_graph"]) — calls explain_module("analysis", {...}, {"analysis": ["ingestion"], "rag": ["embeddings"]}). Stores the returned Markdown in explanations["analysis"].

  • Second iteration: module_name = "rag". Logs "[explainer] Explaining module: rag...". Calls explain_module("rag", {...}, {...}). Stores result in explanations["rag"].

  • Line 140: return explanations.

Return value:

{
    "analysis": "## analysis\n**Purpose**: ...(LLM-generated markdown)...",
    "rag": "## rag\n**Purpose**: ...(LLM-generated markdown)...",
}
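A minimal sketch of the loop, assuming the input shape shown in the example. Two details are hypothetical. First, the explain_fn parameter replaces the direct call to explain_module so the sketch runs without an LLM. Second, print stands in for the module's _log helper, whose implementation is not shown.

```python
from typing import Callable

ExplainFn = Callable[[str, dict, dict], str]


def explain_all_modules(module_results: dict, explain_fn: ExplainFn) -> dict[str, str]:
    """Explain every module in alphabetical order; explain_fn stands in for explain_module."""
    explanations: dict[str, str] = {}
    # Sorting the (name, info) pairs orders them by module name, so output order is stable.
    for module_name, module_info in sorted(module_results["modules"].items()):
        print(f"[explainer] Explaining module: {module_name}...")  # stand-in for _log
        explanations[module_name] = explain_fn(
            module_name, module_info, module_results["module_graph"]
        )
    return explanations


def fake_explain(module_name: str, module_info: dict, module_graph: dict) -> str:
    # Hypothetical stub: returns a predictable string instead of calling an LLM.
    depends_on = ", ".join(module_graph.get(module_name, []))
    return f"## {module_name}\n**Depends on**: {depends_on}"


module_results = {
    "modules": {
        "rag": {"files": ["src/codewalk/rag/chain.py"], "languages": {"python": 1}, "file_count": 1},
        "analysis": {"files": ["src/codewalk/analysis/code_parser.py"], "languages": {"python": 1}, "file_count": 1},
    },
    "module_graph": {"analysis": ["ingestion"], "rag": ["embeddings"]},
}

result = explain_all_modules(module_results, fake_explain)
print(list(result))  # ['analysis', 'rag']
```

"rag" is listed first in the input to show that the sort, not dictionary order, decides the processing order. With the real explain_module, each iteration is one sequential LLM request. The per-module calls are independent, so they could in principle run concurrently.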
