generation module_explainer
Generates LLM-powered explanations for individual modules or all modules at once, including their files, dependencies, and role in the system.
| Term | Definition | Example |
|---|---|---|
| embedding | A numerical vector (list of numbers) that represents the meaning of text. Similar text → similar vectors. | The code `def add(a, b): return a+b` might become `[0.12, -0.45, 0.78, ...]` (1536 numbers for some OpenAI embedding models). |
| RAG | Retrieval-Augmented Generation: instead of asking an LLM to answer from memory, first retrieve relevant documents, then include them in the prompt. | Question: "What does `scan_directory` do?" → retrieve the source code of `scan_directory` → include it in the LLM prompt → get an accurate answer. |
| LLM | Large Language Model: an AI model (such as GPT-4 or Claude) that generates text given a prompt. | `get_llm()` returns a `ChatOpenAI` instance that can answer questions about code. |
| LangChain | A Python framework for building LLM-powered applications. Provides chains (prompt → LLM → parser), structured output, and more. | `prompt \| llm \| StrOutputParser()` creates a chain that formats a prompt, sends it to the LLM, and extracts the string response. |
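
To make the `prompt | llm | StrOutputParser()` idea concrete, here is a toy sketch of how the `|` operator can compose steps so that each step's output feeds the next. The `Step` class, the stub "LLM", and the prompt wording are all hypothetical stand-ins written for illustration; none of this is LangChain's real implementation or the project's actual prompt.

```python
class Step:
    """Toy stand-in for a chain step (hypothetical, not the LangChain API)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new step that runs `a`, then passes its result to `b`.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Step 1: fill a prompt from a dict of variables (made-up prompt text).
format_prompt = Step(lambda variables: f"Explain the module {variables['module_name']}.")
# Step 2: a stub "LLM" that wraps its reply in a message-like dict.
fake_llm = Step(lambda prompt_text: {"content": f"(stub reply to: {prompt_text})"})
# Step 3: extract the plain string, the role StrOutputParser plays in the real chain.
extract_text = Step(lambda message: message["content"])

chain = format_prompt | fake_llm | extract_text
result = chain.invoke({"module_name": "analysis"})
print(result)
```

The point of the sketch is only the data flow: variables go in, a string comes out, and the middle step can be swapped without touching the other two.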
Source: src/codewalk/generation/module_explainer.py
What it does: `_get_depended_by` performs a reverse lookup on the module dependency graph to find which modules depend on the given module.
Example input:
module_name = "embeddings"
module_graph = {
"rag": ["embeddings", "analysis"],
"analysis": ["ingestion"],
"generation": ["embeddings"],
}
Line-by-line walkthrough:
- Line 59: `depended_by = []` creates an empty list to collect results.
- Line 60: `for other_module, dependencies in module_graph.items()` iterates over all modules.
  - First iteration: `other_module = "rag"`, `dependencies = ["embeddings", "analysis"]`.
  - Line 61: `if module_name in dependencies` → `"embeddings" in ["embeddings", "analysis"]` → `True`.
  - Line 62: `depended_by.append("rag")` → `depended_by = ["rag"]`.
  - Second iteration: `other_module = "analysis"`, `dependencies = ["ingestion"]`.
  - Line 61: `"embeddings" in ["ingestion"]` → `False`. Skip.
  - Third iteration: `other_module = "generation"`, `dependencies = ["embeddings"]`.
  - Line 61: `"embeddings" in ["embeddings"]` → `True`.
  - Line 62: `depended_by.append("generation")` → `depended_by = ["rag", "generation"]`.
- Line 64: `return depended_by` → `["rag", "generation"]`.
Return value: ["rag", "generation"]
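
The walkthrough above can be reproduced with a short standalone sketch. It is reconstructed from the described steps rather than copied from `module_explainer.py`, so the function name `get_depended_by_sketch` is a placeholder and the real code may differ in details.

```python
def get_depended_by_sketch(module_name, module_graph):
    """Return the modules whose dependency list contains `module_name`."""
    depended_by = []
    for other_module, dependencies in module_graph.items():
        # A module "depends on" module_name if it lists it as a dependency.
        if module_name in dependencies:
            depended_by.append(other_module)
    return depended_by


module_graph = {
    "rag": ["embeddings", "analysis"],
    "analysis": ["ingestion"],
    "generation": ["embeddings"],
}

reverse_deps = get_depended_by_sketch("embeddings", module_graph)
print(reverse_deps)  # ['rag', 'generation']
```

Results come back in the dictionary's insertion order, which is why `"rag"` precedes `"generation"` here. Each lookup scans the whole graph, which is likely fine for a handful of modules.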
What it does: `_format_file_list` converts a list of full file paths into a bulleted Markdown list showing just the filenames.
Example input:
files = [
"src/codewalk/analysis/dependency_graph.py",
"src/codewalk/analysis/code_parser.py"
]
Line-by-line walkthrough:
- Line 74: `return "\n".join(f"- {path.split('/')[-1]}" for path in sorted(files))`
  - `sorted(files)` → `["src/codewalk/analysis/code_parser.py", "src/codewalk/analysis/dependency_graph.py"]` (alphabetical).
  - For `"src/codewalk/analysis/code_parser.py"`: `.split("/")[-1]` → `"code_parser.py"` → `"- code_parser.py"`.
  - For `"src/codewalk/analysis/dependency_graph.py"`: `.split("/")[-1]` → `"dependency_graph.py"` → `"- dependency_graph.py"`.
  - Joins the items with `"\n"`.
Return value:
- code_parser.py
- dependency_graph.py
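
As a quick check of the steps above, the one-liner from line 74 can be run on its own. The wrapper name `format_file_list_sketch` is a placeholder; only the expression inside it comes from the walkthrough.

```python
def format_file_list_sketch(files):
    """Turn full paths into a bulleted list of bare filenames."""
    # Sorting is applied to the full paths, not to the extracted filenames.
    return "\n".join(f"- {path.split('/')[-1]}" for path in sorted(files))


files = [
    "src/codewalk/analysis/dependency_graph.py",
    "src/codewalk/analysis/code_parser.py",
]

file_list = format_file_list_sketch(files)
print(file_list)
# - code_parser.py
# - dependency_graph.py
```

Note that splitting on `"/"` assumes forward-slash paths. If paths might use backslashes, something like `pathlib.PureWindowsPath(path).name` could be substituted, though nothing in the source suggests this is needed.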
What it does: `explain_module` generates a Markdown explanation for one specific module by formatting its metadata into an LLM prompt and returning the generated text.
Example input:
module_name = "analysis"
module_info = {
"files": ["src/codewalk/analysis/code_parser.py", "src/codewalk/analysis/dependency_graph.py"],
"languages": {"python": 2},
"file_count": 2
}
module_graph = {
"analysis": ["ingestion"],
"rag": ["embeddings", "analysis"],
}
Line-by-line walkthrough:
- Line 90: `depends_on = module_graph.get(module_name, [])` → `module_graph.get("analysis", [])` → `["ingestion"]`.
- Line 93: `depended_by = _get_depended_by(module_name, module_graph)` calls the reverse lookup. It checks every module: `"rag"` depends on `["embeddings", "analysis"]`, which includes `"analysis"`, so `"rag"` is added. Result: `depended_by = ["rag"]`.
- Line 96: `file_list = _format_file_list(module_info["files"])` → `"- code_parser.py\n- dependency_graph.py"`.
- Lines 97–99: `languages = ", ".join(...)` iterates `sorted(module_info["languages"].items())` → `[("python", 2)]` → `"python(2)"`. Result: `languages = "python(2)"`.
- Lines 102–105: `prompt = ChatPromptTemplate.from_messages([...])` builds the prompt template from the system prompt (lines 11–33) and the human prompt (lines 35–46). Placeholders: `{module_name}`, `{file_count}`, `{file_list}`, `{languages}`, `{depends_on}`, `{depended_by}`.
- Line 107: `llm = get_llm()` gets the configured LLM.
- Line 108: `chain = prompt | llm | StrOutputParser()` composes prompt → LLM → string.
- Lines 110–117: `explanation = chain.invoke({...})` fills the placeholders:
  - `"module_name"` → `"analysis"`
  - `"file_count"` → `2`
  - `"file_list"` → `"- code_parser.py\n- dependency_graph.py"`
  - `"languages"` → `"python(2)"`
  - `"depends_on"` → `"ingestion"` (joined from `["ingestion"]`)
  - `"depended_by"` → `"rag"` (joined from `["rag"]`)
- Line 119: `return explanation` returns the LLM-generated Markdown string.
Return value: A Markdown string explaining the analysis module's purpose, key files, dependencies, and role in the system.
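
The data preparation that happens before the LLM call can be sketched without LangChain. The helper name `build_prompt_variables` is hypothetical, and the `", "` separator used for `depends_on` and `depended_by` is an assumption: the walkthrough only shows single-item lists, so the real separator is not visible.

```python
def build_prompt_variables(module_name, module_info, module_graph):
    """Collect the placeholder values described in the walkthrough (illustrative)."""
    depends_on = module_graph.get(module_name, [])
    # Reverse lookup: which modules list module_name as a dependency?
    depended_by = [
        other for other, deps in module_graph.items() if module_name in deps
    ]
    file_list = "\n".join(
        f"- {path.split('/')[-1]}" for path in sorted(module_info["files"])
    )
    languages = ", ".join(
        f"{lang}({count})"
        for lang, count in sorted(module_info["languages"].items())
    )
    return {
        "module_name": module_name,
        "file_count": module_info["file_count"],
        "file_list": file_list,
        "languages": languages,
        # Assumption: lists are joined with ", " (not confirmed by the source).
        "depends_on": ", ".join(depends_on),
        "depended_by": ", ".join(depended_by),
    }


module_info = {
    "files": [
        "src/codewalk/analysis/code_parser.py",
        "src/codewalk/analysis/dependency_graph.py",
    ],
    "languages": {"python": 2},
    "file_count": 2,
}
module_graph = {
    "analysis": ["ingestion"],
    "rag": ["embeddings", "analysis"],
}

variables = build_prompt_variables("analysis", module_info, module_graph)
print(variables)
```

In the real function, a dictionary like this is what gets passed to `chain.invoke({...})`. Everything before that call is deterministic, so it can be tested without an API key.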
What it does: Iterates over every module in the codebase and calls explain_module() for each one, returning a dictionary of all explanations.
Example input:
module_results = {
"modules": {
"analysis": {"files": ["src/codewalk/analysis/code_parser.py"], "languages": {"python": 1}, "file_count": 1},
"rag": {"files": ["src/codewalk/rag/chain.py"], "languages": {"python": 1}, "file_count": 1},
},
"module_graph": {"analysis": ["ingestion"], "rag": ["embeddings"]},
}
Line-by-line walkthrough:
- Line 131: `explanations = {}` creates an empty dict to collect module → explanation pairs.
- Line 132: `for module_name, module_info in sorted(module_results["modules"].items())` iterates alphabetically. First: `module_name = "analysis"`, `module_info = {"files": [...], "languages": {"python": 1}, "file_count": 1}`.
- Line 133: `_log(f"[explainer] Explaining module: {module_name}...")` logs `"[explainer] Explaining module: analysis..."`.
- Lines 134–138: `explanations[module_name] = explain_module(module_name, module_info, module_results["module_graph"])` calls `explain_module("analysis", {...}, {"analysis": ["ingestion"], "rag": ["embeddings"]})` and stores the returned Markdown in `explanations["analysis"]`.
- Second iteration: `module_name = "rag"`. Logs `"[explainer] Explaining module: rag..."`. Calls `explain_module("rag", {...}, {...})`. Stores the result in `explanations["rag"]`.
- Line 140: `return explanations`.
Return value:
{
"analysis": "## analysis\n**Purpose**: ...(LLM-generated markdown)...",
"rag": "## rag\n**Purpose**: ...(LLM-generated markdown)...",
}
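
The loop described above can be exercised with a stub in place of the LLM-backed `explain_module()`. In this sketch, `explain_all_sketch`, `stub_explain`, and the injectable `explain_one` and `log` parameters are illustrative assumptions; the real function calls `explain_module()` and `_log()` directly.

```python
def explain_all_sketch(module_results, explain_one, log=print):
    """Call `explain_one` for every module, in alphabetical order."""
    explanations = {}
    for module_name, module_info in sorted(module_results["modules"].items()):
        log(f"[explainer] Explaining module: {module_name}...")
        explanations[module_name] = explain_one(
            module_name, module_info, module_results["module_graph"]
        )
    return explanations


def stub_explain(module_name, module_info, module_graph):
    # Stand-in for the LLM call: returns a fixed Markdown skeleton.
    return f"## {module_name}\n**Files**: {module_info['file_count']}"


module_results = {
    "modules": {
        "rag": {"files": ["src/codewalk/rag/chain.py"], "languages": {"python": 1}, "file_count": 1},
        "analysis": {"files": ["src/codewalk/analysis/code_parser.py"], "languages": {"python": 1}, "file_count": 1},
    },
    "module_graph": {"analysis": ["ingestion"], "rag": ["embeddings"]},
}

log_lines = []
explanations = explain_all_sketch(module_results, stub_explain, log=log_lines.append)
print(list(explanations))  # ['analysis', 'rag']
```

Because the loop is sequential, the walkthrough implies one LLM request per module. Passing the explainer in as a parameter, as done here, is one way to test the iteration logic without making those requests.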