Skip to content

Latest commit

 

History

History
118 lines (82 loc) · 6.82 KB

File metadata and controls

118 lines (82 loc) · 6.82 KB

Code-Fragment Value Audit

Generated by scripts/_audit_code_value.py.

This audit flags Python code fragments that add no pedagogical or technical value: pure-data schemas with no behaviour, constants-only blocks, string-template-only assignments, identity-print demos, mock/fake API functions returning canned responses, pure-delegation wrappers, and pseudo-API blocks using invented method names.

Each fragment is scored across nine value signals. A fragment is flagged when it matches at least one strong no-value pattern (e.g. mock/fake API), reaches a weighted score of 4, or trips two or more signals.

  • Total Python fragments scanned: 1050
  • Fragments skipped (captions legitimately describe schemas / data models): 27
  • Fragments with parse errors (excluded): 120
  • Fragments flagged as low-value: 2

Calibration note: This audit is intentionally conservative. The initial loose-threshold run produced 46 candidates, of which ~40 were false positives from real PEFT/transformers code where the detector misjudged method names like load_adapter, get_or_create_collection, search_by_metadata as 'invented' (they are real third-party SDK methods). The tightened rules require either an explicit mock/fake function or a fragment with imports of zero known libraries AND an English-sentence-shaped method name (e.g. solve_the_problem).

Parse-error caveat: 120 fragments failed AST parsing (typically due to broken indentation in fragment extraction) and were excluded. Most are high-value implementation code (federated LoRA, bootstrap tests, etc.) but a few could harbour low-value patterns the audit cannot reach. The code-fragment-fix-report.md and missing-imports-audit.md cover related issues in those fragments.

Signal weights (higher = stronger signal of no-value)

  • mock_or_fake_api (weight 6)
  • pure_data_block (weight 5)
  • constants_only (weight 5)
  • wrapper_without_impl (weight 5)
  • pseudo_api_block (weight 5)
  • string_template_only (weight 4)
  • identity_print_demo (weight 4)
  • no_imports_no_calls (weight 3)
  • trivial_print_only (weight 3)

Signal frequency across flagged fragments

  • string_template_only: 1
  • no_imports_no_calls: 1
  • mock_or_fake_api: 1

Top 30 Worst Offenders

Sorted by weighted score (sum of signal weights), then by number of signals tripped.

# File Line Fragment Score # signals Top signals Caption (truncated)
1 part-5-retrieval-conversation/module-20-conversational-ai/section-20.1.html 298 20.1.2 7 2 string_template_only, no_imports_no_calls Building a stateful dialogue manager that tracks conversation phase, collects required slo...
2 appendices/appendix-k-langchain/section-k.1.html 262 L.1.7 6 1 mock_or_fake_api Pattern: pass the question through while also running a retrieval step

Rewrite Sketches (Top 10)

1. part-5-retrieval-conversation/module-20-conversational-ai/section-20.1.html (Code Fragment 20.1.2, line 298)

Caption: Building a stateful dialogue manager that tracks conversation phase, collects required slots, and transitions between states based on user input.

Signals tripped: string_template_only, no_imports_no_calls (score: 7)

# Implementation example
CUSTOMER_SUPPORT_PROMPT = """You are Aria, a customer support specialist for TechFlow,
an electronics retailer. Follow these guidelines precisely.
## Identity and Tone
- Professional yet warm; use the customer's name when available
- Empathetic but efficient; acknowledge frustration, then move to solutions
- Never sarcastic, condescending, or dismissive
- Use simple language; avoid technical jargon unless the customer uses it first
## Capabilities
You CAN:
- Look up order status using the check_order tool
- Process returns and exchanges for orders within 30 days
- Apply discount codes and promotional offers
- Update shipping addresses before an order ships
# ... (truncated)

Rewrite sketch (effort S): Pair the template with a render+call. Use langchain_core.prompts.ChatPromptTemplate.from_messages(...).invoke({...}) (or python's .format(**vars)), then ship the rendered messages to client.chat.completions.create(...) and print the model's reply. Show both the template AND the resulting LLM output side by side. Lesson: templates are abstract until you see what they produce; the reader needs to see one filled-in instance and its response.

2. appendices/appendix-k-langchain/section-k.1.html (Code Fragment L.1.7, line 262)

Caption: Pattern: pass the question through while also running a retrieval step

Signals tripped: mock_or_fake_api (score: 6)

from langchain_core.runnables import RunnablePassthrough, RunnableParallel

# Pattern: pass the question through while also running a retrieval step
# (retriever would be a real vectorstore retriever in practice)
def mock_retriever(query: dict) -> str:
    return "Python was created by Guido van Rossum in 1991."

setup = RunnableParallel(
    context=lambda x: mock_retriever(x),
    question=RunnablePassthrough()
)

rag_prompt = ChatPromptTemplate.from_template(
    "Context: {context}\n\nAnswer this question: {question}"
# ... (truncated)

Rewrite sketch (effort M): Replace the canned response with a real call. Use the openai Python SDK (client.chat.completions.create(model='gpt-4o-mini', messages=[...])) or anthropic (client.messages.create(model='claude-3-5-sonnet', max_tokens=512, messages=[...])). Show the actual response object the reader will see in their terminal, so they recognise the shape (.choices[0].message.content or .content[0].text). Lesson: how a real LLM call looks at the SDK layer and what fields the response carries (usage tokens, finish_reason, model id).

Recommended Editorial Priority

Findings by top-level folder (sorted by count):

Folder Flagged Highest-score example
part-5-retrieval-conversation 1 Fragment 20.1.2 (7 pts)
appendices 1 Fragment L.1.7 (6 pts)

Priority guidance:

  1. Fix mock_or_fake_api first. A function that returns a hard-coded string named call_llm actively teaches the wrong mental model -- readers leave with an API shape that does not exist.

  2. Then pseudo_api_block. Invented method names compile but fail at the REPL; this destroys trust in the entire chapter.

  3. Then pure_data_block and constants_only. These are pedagogically soft but waste page-budget. Move the data into prose and reclaim the code block for a runnable example.

  4. Defer wrapper_without_impl and string_template_only. These are structural smells; rewriting them often requires changes to adjacent fragments and prose, so batch them per-section.

  5. Treat identity_print_demo and trivial_print_only as last-pass polish. They are not actively misleading; they just don't earn their page-budget.