Add loss analysis pipeline for autorater CSVs #1
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
def call_openai(model: str, prompt: str) -> str:
    if openai is None:
        raise RuntimeError(
            "openai package is not installed. Install openai>=0.27.0 or run with --dry-run."
        )
    response = openai.ChatCompletion.create(
        model=model,
        temperature=0.0,
        messages=[
            {
                "role": "system",
                "content": "You classify agent errors into predefined categories.",
            },
            {"role": "user", "content": prompt},
        ],
    )
    return response["choices"][0]["message"]["content"]
```
Use OpenAI v1 client instead of removed ChatCompletion API
The dependency instructions recommend pip install openai, which today installs the v1+ SDK. In that version the legacy openai.ChatCompletion.create entry point is removed, so call_openai raises AttributeError before any work is done. Either pin the older 0.x series or migrate to the new OpenAI() client (client.chat.completions.create). As written the script crashes immediately for anyone following the README.
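A minimal sketch of the suggested migration, assuming the script keeps its optional-import guard and that `OPENAI_API_KEY` is set in the environment (the v1 client reads it automatically); everything else matches the existing signature:

```python
# Hedged sketch: call_openai migrated to the OpenAI v1+ SDK.
# Assumes the script's existing optional-import pattern is kept.
try:
    from openai import OpenAI  # v1+ SDK exposes a client class, not module-level calls
except ImportError:
    OpenAI = None


def call_openai(model: str, prompt: str) -> str:
    if OpenAI is None:
        raise RuntimeError(
            "openai package is not installed. Install openai>=1.0 or run with --dry-run."
        )
    client = OpenAI()  # picks up OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        temperature=0.0,
        messages=[
            {
                "role": "system",
                "content": "You classify agent errors into predefined categories.",
            },
            {"role": "user", "content": prompt},
        ],
    )
    # v1 responses are typed objects, not dicts
    return response.choices[0].message.content
```

The `--dry-run` escape hatch in the error message is carried over from the original; only the import, the client construction, and the response accessors change.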
```python
def build_prompt(categories: Sequence[Category], rows: pd.DataFrame) -> str:
    category_lines = [
        f"- {cat.name}: {cat.description}" if cat.description else f"- {cat.name}"
        for cat in categories
    ]
    rows_payload = rows[
        [
            "task_id",
            "step_index",
            "website_issue",
            "screenshot_description_correct",
            "thought_reasonable",
            "action_matches_thought",
            "incorrect_coordinates",
            "issue_summary",
            "autorater_failure",
            "g_lab_url",
        ]
    ]
    serialised_rows = json.dumps(rows_payload.to_dict(orient="records"), ensure_ascii=False)
    prompt = textwrap.dedent(
        f"""
        You are an expert analyst labelling loss types for a web browsing agent.
        You will receive {len(rows)} log entries, each corresponding to an agent
        step that contained an issue. For each entry choose exactly one category
        from the list provided below and return a JSON array with {len(rows)}
        objects in the same order. Each object must contain the keys
        "task_id", "step_index", "category", and "explanation". The
        explanation should be a short (<=20 word) reason for your choice.

        Categories:
        {os.linesep.join(category_lines)}

        Input rows (JSON array):
        {serialised_rows}
        """
    ).strip()
    return prompt


def call_openai(model: str, prompt: str) -> str:
    if openai is None:
        raise RuntimeError(
            "openai package is not installed. Install openai>=0.27.0 or run with --dry-run."
        )
    response = openai.ChatCompletion.create(
        model=model,
        temperature=0.0,
        messages=[
            {
                "role": "system",
                "content": "You classify agent errors into predefined categories.",
            },
            {"role": "user", "content": prompt},
        ],
    )
    return response["choices"][0]["message"]["content"]


def parse_model_output(text: str, expected_len: int) -> List[Dict[str, str]]:
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model output is not valid JSON: {exc}\n{text}") from exc
```
Prompt allows non-JSON wrapping that parse_model_output cannot handle
parse_model_output feeds the model reply straight into json.loads, but the prompt in build_prompt never instructs the model to emit raw JSON only. Chat completions commonly wrap results in prose or a Markdown ```json code fence, which makes json.loads raise and terminate the pipeline even when the model classified every row correctly. Either the prompt should explicitly require a pure JSON array with no surrounding text, or the parser should strip code fences before decoding.
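One way to harden the parser side is a small pre-processing step. This is a sketch, not the script's code: the helper name `strip_code_fence` is hypothetical, and it simply removes an optional ```json fence before json.loads sees the text.

```python
import json
import re


def strip_code_fence(text: str) -> str:
    """Remove an enclosing Markdown code fence (```json ... ```), if present.

    Hypothetical helper: parse_model_output could call this on the raw
    model reply before handing it to json.loads.
    """
    match = re.match(r"^\s*```(?:json)?\s*\n(.*?)\n?\s*```\s*$", text, re.DOTALL)
    return match.group(1) if match else text.strip()
```

With this in place, `data = json.loads(strip_code_fence(text))` accepts both bare JSON and fenced replies; pairing it with an explicit "return only a JSON array, no other text" instruction in the prompt covers both ends.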
Summary
Testing
https://chatgpt.com/codex/tasks/task_e_68e5b87712f883329ae943bce47a9a38