34 changes: 34 additions & 0 deletions .agent/workflows/README.md
@@ -0,0 +1,34 @@
# Job Workflow Orchestration

This directory contains workflow definitions for the job application pipeline.

## Entrypoint

Use the merged workflow as the primary entrypoint:

- `intake-pipeline.md` (end-to-end: intake + tailor + finalize)

Legacy file:

- `tailor-finalize.md` (deprecated shim; points to `intake-pipeline.md`)

## Skills Used

- `job-matching-expertise` (classification stage)
- `resume-crafting-expertise` (resume bullet stage)

## Tools Used

- `scrape_jobs`
- `bulk_read_new_jobs`
- `bulk_update_job_status`
- `initialize_shortlist_trackers`
- `career_tailor`
- `finalize_resume_batch`

## Quick Start

```bash
# Open and execute the merged workflow
/intake-pipeline
```
203 changes: 203 additions & 0 deletions .agent/workflows/intake-pipeline.md
@@ -0,0 +1,203 @@
---
description: "End-to-end job pipeline: intake + tailor + finalize"
---

# Workflow: End-to-End Job Pipeline

## Goal

Run the full pipeline in one document:

1. Scrape jobs
2. Read new jobs
3. Classify jobs
4. Update job statuses
5. Initialize shortlist trackers
6. Collect shortlist trackers
7. Bootstrap resume workspace
8. Fill resume bullets
9. Compile PDFs
10. Finalize DB + tracker status

---

## Prerequisites

- MCP server running (`JOBWORKFLOW_DB`/`JOBWORKFLOW_ROOT` configured)
- Full resume exists (`JOBWORKFLOW_FULL_RESUME_PATH`)
- Resume template exists (`JOBWORKFLOW_RESUME_TEMPLATE_PATH`)
- Trackers dir configured (`JOBWORKFLOW_TRACKERS_DIR`, default `trackers/`)
- Skills available:
- `job-matching-expertise` (Step 3)
- `resume-crafting-expertise` (Step 8)

---

## Stage A: Intake

### Step 1: Scrape Jobs

**MCP Tool**: `scrape_jobs`

```python
scrape_jobs(dry_run=False)
```

**SUCCESS CRITERIA**: `inserted > 0` OR `duplicate > 0`
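
The success criterion can be checked mechanically. A minimal sketch, assuming the tool returns a dict with `inserted` and `duplicate` counters (the field names follow the criterion above):

```python
def intake_succeeded(result: dict) -> bool:
    """Hypothetical helper; assumes scrape_jobs returns these counters."""
    return result.get("inserted", 0) > 0 or result.get("duplicate", 0) > 0
```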

### Step 2: Read New Jobs Queue

**MCP Tool**: `bulk_read_new_jobs`

```python
result_read = bulk_read_new_jobs()
jobs = result_read["jobs"]
```

**SUCCESS CRITERIA**: `len(jobs) > 0`

### Step 3: Classify Jobs

**Reference Skill**: `job-matching-expertise`
**Mandatory**: load and apply this skill before classification.

For each job, produce a `classification` in `shortlist | reviewed | reject`.
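
Classification labels can be validated before they reach Step 4. A minimal sketch; the `(label, reason)` pair shape is an assumption carried into Step 4's `classifications`:

```python
VALID_CLASSIFICATIONS = {"shortlist", "reviewed", "reject"}

def check_classifications(classifications):
    """Raise if any (label, reason) pair carries an unknown label."""
    for label, reason in classifications:
        if label not in VALID_CLASSIFICATIONS:
            raise ValueError(f"unknown classification: {label!r}")
```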

### Step 4: Update Job Statuses

**MCP Tool**: `bulk_update_job_status`

```python
updates = [
{"id": job["id"], "status": classification}
for job, (classification, reason) in zip(jobs, classifications)
]

bulk_update_job_status(updates=updates)
```

**SUCCESS CRITERIA**: `updated_count > 0`

### Step 5: Initialize Shortlist Trackers

**MCP Tool**: `initialize_shortlist_trackers`

```python
initialize_shortlist_trackers(force=False, dry_run=False)
```

**SUCCESS CRITERIA**: `created_count > 0` OR `skipped_count > 0`

---

## Stage B: Tailor and Finalize

```python
# Important: these two passes must use different force settings
bootstrap_force = True
compile_force = False
```

### Step 6: Collect Shortlist Trackers

```bash
trackers_dir="${JOBWORKFLOW_TRACKERS_DIR:-trackers}"
trackers=$(find "$trackers_dir" -name "*.md" -type f -print0 | \
xargs -0 grep -Eil "status:[[:space:]]*(shortlist|reviewed)" | \
head -10)

items=$(echo "$trackers" | jq -R -s -c 'split("\n")[:-1] | map({tracker_path: .})')
```
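
If shell tooling is unavailable, the same collection can be done in Python. A sketch that mirrors the pipeline above (find `*.md`, match the status line case-insensitively, cap at 10):

```python
import pathlib
import re

STATUS_RE = re.compile(r"status:\s*(shortlist|reviewed)", re.IGNORECASE)

def collect_tracker_items(trackers_dir="trackers", limit=10):
    """Build career_tailor items from trackers whose status qualifies."""
    items = []
    for path in sorted(pathlib.Path(trackers_dir).rglob("*.md")):
        if STATUS_RE.search(path.read_text(encoding="utf-8")):
            items.append({"tracker_path": str(path)})
            if len(items) >= limit:
                break
    return items
```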

**SUCCESS CRITERIA**: `len(items) > 0`

### Step 7: Bootstrap Resume Workspace

**MCP Tool**: `career_tailor`

```python
result_bootstrap = career_tailor(items=items, force=bootstrap_force)
```

**SUCCESS CRITERIA**: each item has `resume_tex_path` and `ai_context_path` for Step 8 editing.

### Step 8: Fill Resume Bullets

**Reference Skill**: `resume-crafting-expertise`
**Mandatory**: load and apply this skill before editing any `resume.tex`.

For each successful item from Step 7:

1. Open `resume_tex_path`
2. Use `ai_context_path` + full resume facts
3. Replace tokens matching `*-BULLET-POINT-*`
4. Keep LaTeX structure unchanged
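
The replacement in steps 3–4 can be sketched as a pure text transform. Assumptions: placeholder tokens look like `PROJECT-BULLET-POINT-1` (the exact format is a guess from the glob above), and every token must have a grounded bullet:

```python
import re

# assumed token shape, e.g. PROJECT-BULLET-POINT-1
TOKEN_RE = re.compile(r"[A-Z][A-Z-]*-BULLET-POINT-\d+")

def fill_bullets(tex: str, bullets: dict) -> str:
    """Swap each placeholder for its bullet text; LaTeX structure untouched."""
    def replace(match):
        token = match.group(0)
        if token not in bullets:
            raise KeyError(f"no grounded bullet for {token}")
        return bullets[token]
    return TOKEN_RE.sub(replace, tex)
```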

Hard guardrails (must pass):

1. Section-scoped grounding (no cross-section fact drift):
- `Project Experience` bullets: only from project facts in `full_resume.md` / `ai_context.md`.
- `Qishu Data ... Machine Learning Engineer Intern` bullets: only from internship facts.
- `University of Waterloo ... Researcher (...)` bullets: only from Waterloo research facts.
2. No fabricated claims: every bullet must be traceable to `ai_context.md`.
3. No duplicate bullets in one resume: exact duplicate `\resumeItem{...}` text is forbidden.
4. If any guardrail fails for a resume, do not advance that resume to Step 9.

Validation:

```bash
find data/applications -name "resume.tex" | xargs grep -n "BULLET-POINT"
```

Duplicate check:

```bash
for f in data/applications/*/resume/resume.tex; do
  dups=$(rg -o '\\resumeItem\{.*\}' "$f" | sort | uniq -d)
if [ -n "$dups" ]; then
echo "Duplicate bullets in $f"
echo "$dups"
fi
done
```
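
The same duplicate check in Python, matching guardrail 3 (note: bullets containing nested braces would need a fuller parser than this `[^{}]*` sketch):

```python
import re
from collections import Counter

def duplicate_bullets(tex: str):
    """Return \\resumeItem{...} strings appearing more than once."""
    items = re.findall(r"\\resumeItem\{[^{}]*\}", tex)
    return [item for item, count in Counter(items).items() if count > 1]
```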

**SUCCESS CRITERIA**:

- no `BULLET-POINT` matches
- duplicate check returns empty
- all edited bullets are section-consistent with `ai_context.md`

### Step 9: Compile PDFs

**MCP Tool**: `career_tailor` (second pass)

```python
result_compile = career_tailor(items=items, force=compile_force)
```

Use only `result_compile["successful_items"]` in the next step.

**SUCCESS CRITERIA**: `result_compile["success_count"] > 0`

### Step 10: Finalize Database and Trackers

**MCP Tool**: `finalize_resume_batch`

```python
result_finalize = finalize_resume_batch(
items=result_compile["successful_items"],
dry_run=False,
)
```

**SUCCESS CRITERIA**: `result_finalize["success_count"] > 0`

---

## Workflow Completion Checklist

- [ ] Steps 1-5 (intake) completed
- [ ] Steps 6-10 (tailor/finalize) completed
- [ ] All placeholders removed
- [ ] PDFs compiled successfully
- [ ] Finalization committed to DB + tracker
5 changes: 4 additions & 1 deletion .gitignore
@@ -35,6 +35,9 @@ htmlcov/
.obsidian/workspace.json
.obsidian/cache

# Local editor settings
.vscode/

# macOS clutter
.DS_Store

@@ -66,4 +69,4 @@ data/templates/full_resume.md
data/templates/resume_skeleton.tex
trackers/*.md
!trackers/template.md
!trackers/Job Application.md
!trackers/job-application.md
114 changes: 114 additions & 0 deletions .kiro/specs/refactor-job-status-enum/design.md
@@ -0,0 +1,114 @@
# Design Document: Refactor Job Status to Enum

## Overview

This design outlines the migration from using hardcoded 'magic strings' for job statuses to centralized, type-safe Enums.

This refactoring will significantly improve code clarity, reduce the risk of typo-related bugs, and enhance maintainability by creating a single source of truth for status definitions.

## Scope

**In scope:**

* Refactoring all Python code (`.py` files) within the `mcp-server-python` directory to replace hardcoded status strings with Enum members.
* Defining two distinct Enums for the two status systems (database vs. tracker).
* Updating all associated tests to use the new Enums.

**Out of scope:**

* Changes to the database schema itself.
* Changes to the semantic meaning or lifecycle of existing statuses.
* Modifying any frontend or external client-side logic that consumes statuses. The API contract will continue to accept and return plain strings.

## Current State Summary

The codebase currently contains two separate and inconsistent systems for managing job statuses, both relying on hardcoded strings:

1. **Database Statuses:** A set of lowercase strings (`new`, `shortlist`, `reviewed`, etc.) used in the `jobs` database and related data access logic.
2. **Tracker Statuses:** A set of capitalized strings (`Reviewed`, `Resume Written`, etc.) used in the frontmatter of Markdown tracker files and the business logic that governs them.

This duplication and inconsistency are spread across numerous files, including `utils/validation.py`, `db/*.py`, `utils/tracker_policy.py`, and the entire `tests/` suite, making the code brittle and difficult to maintain.

## Target Architecture

### 1) Centralized Enum Definitions

A new file will be introduced to act as the single source of truth for all status definitions:

* **File:** `mcp-server-python/models/status.py`

This file will contain two distinct Enum classes:

```python
from enum import Enum

class JobDbStatus(str, Enum):
"""Enum for statuses used in the 'jobs' database table."""
NEW = "new"
SHORTLIST = "shortlist"
REVIEWED = "reviewed"
REJECT = "reject"
RESUME_WRITTEN = "resume_written"
APPLIED = "applied"

class JobTrackerStatus(str, Enum):
"""Enum for statuses used in the frontmatter of Markdown tracker files."""
REVIEWED = "Reviewed"
RESUME_WRITTEN = "Resume Written"
APPLIED = "Applied"
INTERVIEW = "Interview"
OFFER = "Offer"
REJECTED = "Rejected"
GHOSTED = "Ghosted"
```
Inheriting from `(str, Enum)` ensures that the Enums are compatible with string operations and can be easily serialized to JSON/string format at the API boundaries.
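
A quick demonstration of that compatibility (abbreviated Enum for the sketch):

```python
import json
from enum import Enum

class JobDbStatus(str, Enum):
    NEW = "new"
    SHORTLIST = "shortlist"

# members compare equal to their string values...
assert JobDbStatus.NEW == "new"
# ...serialize as plain strings...
assert json.dumps({"status": JobDbStatus.NEW}) == '{"status": "new"}'
# ...and the constructor looks members up by value
assert JobDbStatus("shortlist") is JobDbStatus.SHORTLIST
```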

### 2) Refactoring Pattern

All application logic will be updated to import and use these Enums.

**Example (Before):**
```python
# in db/jobs_reader.py
def query_new_jobs(conn):
return conn.execute("SELECT * FROM jobs WHERE status = 'new'")
```

**Example (After):**
```python
# in db/jobs_reader.py
from models.status import JobDbStatus

def query_new_jobs(conn):
return conn.execute("SELECT * FROM jobs WHERE status = ?", (JobDbStatus.NEW,))
```

Pydantic models used at the API boundary will automatically handle the conversion between incoming strings and the internal Enum types, preserving the external contract.
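
Where a boundary is not covered by Pydantic, the equivalent manual conversion uses the Enum's by-value lookup. A sketch; the abbreviated Enum stands in for the full definition above:

```python
from enum import Enum

class JobDbStatus(str, Enum):
    NEW = "new"
    SHORTLIST = "shortlist"
    REJECT = "reject"

def coerce_db_status(raw: str) -> JobDbStatus:
    """Convert an incoming string to the Enum; reject unknown values."""
    try:
        return JobDbStatus(raw)  # Enum(...) looks members up by value
    except ValueError as exc:
        raise ValueError(f"unknown job status: {raw!r}") from exc
```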

## Migration Phases

### Phase 1: Foundation
- Create the `status.py` file and define the `JobDbStatus` and `JobTrackerStatus` Enums.

### Phase 2: Core Logic
- Refactor `utils/validation.py` to use the new Enums, removing the hardcoded `ALLOWED_STATUSES` lists.
- Refactor `utils/tracker_policy.py` and the database layer (`db/*.py`).
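
Where validation logic still needs a plain allowed-set (for membership checks or error messages), it can be derived from the Enum rather than hand-maintained. A sketch with an abbreviated Enum:

```python
from enum import Enum

class JobDbStatus(str, Enum):
    NEW = "new"
    SHORTLIST = "shortlist"
    REVIEWED = "reviewed"

# single source of truth: derive the set instead of listing strings again
ALLOWED_DB_STATUSES = frozenset(status.value for status in JobDbStatus)

def is_valid_db_status(status: str) -> bool:
    return status in ALLOWED_DB_STATUSES
```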

### Phase 3: Tools and API
- Refactor all scripts in the `tools/` directory.
- Update `server.py` and any related API documentation or examples.

### Phase 4: Tests
- Systematically update the entire `tests/` suite to use the Enums for test setup and assertions. This is the largest phase.

### Phase 5: Verification & Cleanup
- Perform a final, full-codebase search for any remaining hardcoded strings to ensure none were missed.
- Ensure all tests pass and the application functions correctly.

## Risks and Mitigations

1. **Risk:** A hardcoded status string is missed during refactoring.
* **Mitigation:** The final verification phase (Phase 5) involves using `grep` or a similar search tool to comprehensively scan for remaining instances. A passing test suite is the primary gate.

2. **Risk:** Confusion between `JobDbStatus` and `JobTrackerStatus` during development.
* **Mitigation:** The clear and explicit naming of the Enums is designed to prevent this. Code reviews should pay special attention to ensuring the correct Enum is used in the correct context.
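
The Phase 5 sweep (Risk 1's mitigation) can be sketched as a small scanner; the path layout and the exemption of `status.py` are assumptions:

```python
import pathlib
import re

# the literal list mirrors the database Enum; extend it for tracker statuses
STATUS_LITERAL_RE = re.compile(
    r"['\"](new|shortlist|reviewed|reject|resume_written|applied)['\"]"
)

def find_hardcoded_statuses(root: str):
    """Return (path, line number, line) for surviving status literals."""
    hits = []
    for path in pathlib.Path(root).rglob("*.py"):
        if path.name == "status.py":  # the Enum definitions themselves
            continue
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if STATUS_LITERAL_RE.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```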