Field Report: Running ApplyPilot End-to-End
I've been running ApplyPilot as my primary job search tool for about two days (Senior/Staff Backend/Platform Engineer, Seattle/Remote). This is a detailed field report covering setup experience, pipeline results, bugs I encountered and fixed, and suggestions for improvement.
Note: I generated this report, and it references features I built that are not yet integrated into ApplyPilot. My fork has substantial changes (detailed below as well), so I'll need some time to clean it up before publishing. I'm also building the next pipeline: a markdown + CLI based job tracker that auto-enriches with information.
One more note for whoever is reading this: if you want to interview or hire me, you can reach me via LinkedIn: https://www.linkedin.com/in/elninja/
TL;DR: The concept is excellent and the architecture is sound. I got 112 successful applications out of 1,503 discovered jobs (7.5%
end-to-end conversion). The Tier 1→2 pipeline works great after some fixes. Tier 3 (auto-apply) is where most friction lives — Workday login
walls block ~70% of apply attempts.
Pipeline Results
Funnel
| Stage | Count | Drop-off |
| --- | --- | --- |
| Discovered | 1,503 | — |
| Enriched | 1,420 | 5.5% lost (detail errors) |
| Scored | 1,420 | 0% (all enriched jobs scored) |
| Score 7+ | 543 | 62% filtered (working as intended) |
| Tailored | 543 | 0% (100% of 7+ tailored) |
| Cover Letter | 543 | 0% (100% covered) |
| Applied | 112 | 79% blocked at apply stage |
Score Distribution
| Score | Count |
| --- | --- |
| 10 | 98 |
| 9 | 249 |
| 8 | 106 |
| 7 | 90 |
| 6 | 148 |
| 5 | 185 |
| 4 | 157 |
| 3 | 171 |
| 2 | 70 |
| 1 | 146 |
Scoring skews high — 36% of jobs scored 7+. For my profile (10+ years, Go/Kotlin/Python/K8s/AWS), this seems reasonable but I wonder if the
scoring prompt could be tighter.
Sources
| Source | Jobs |
| --- | --- |
| LinkedIn | 601 |
| Indeed | 235 |
| Thomson Reuters (Workday) | 174 |
| Netflix (Workday) | 103 |
| Dice | 81 |
| SimplyHired | 77 |
| Moderna (Workday) | 46 |
| Motorola (Workday) | 38 |
| NVIDIA (Workday) | 34 |
| Talent.com | 26 |
| HN "Who is Hiring" | ~30 |
| Others | ~58 |
Apply Results
112 successful applications to companies including Netflix (7), Airbnb, HubSpot, Mastercard, NVIDIA, Grafana Labs, Twilio, Rippling, and others.
Apply errors (589 total):
| Error | Count | % of Errors |
| --- | --- | --- |
| workday_login_required | 470 | 80% |
| not_eligible_location | 40 | 7% |
| expired | 26 | 4% |
| email_verification | 10 | 2% |
| stuck / captcha / account_required | 18 | 3% |
| Other | 25 | 4% |
The Workday login wall is the #1 blocker. 470 out of 589 apply errors (80%) are because Workday requires an authenticated session that the agent
can't handle.
Artifacts Generated
2,019 tailored resume files (txt + pdf pairs)
1,083 cover letter files (txt + pdf pairs)
336 apply agent log files
Time Estimates Per Stage
These are rough estimates from running the pipeline across multiple sessions:
| Stage | Time | Notes |
| --- | --- | --- |
| Discovery | ~5-10 min | JobSpy + Workday scraping in parallel. Workday is the bottleneck. |
| Enrichment | ~15-20 min | 1,420 jobs, mostly fast. Some sites need Playwright fallback. |
| Scoring | ~20-30 min | Gemini Flash handles this well. Rate limits add some wait time. |
| Tailoring | ~45-60 min | Quality model (Gemini Pro), validation loop adds retries. Most time-intensive Tier 2 stage. |
| Cover Letters | ~20-30 min | Similar to scoring in complexity. |
| Auto-Apply | ~4-6 hours total | 2 workers, ~2-5 min per job. Chrome startup + form navigation is slow. |
| Total | ~6-8 hours | Spread across multiple sessions over ~1 week. |
The `--stream` flag for running score + tailor + cover concurrently is a huge time saver.
Bugs I Encountered and Fixed
1. Gemini Thinking Token Budget (Related to #12)
Gemini 2.5+ models use "thinking tokens" that consume the `max_tokens` budget. The default 2048 was far too low: a simple scoring response needs ~30 visible tokens, but the model burns 1,200+ on thinking. I had to increase the limits to:
Scoring: 8,192
Tailoring: 4,096 for validation, 16,384 for generation
Cover letters: 8,192
2. LLM Client Singleton / Stale Environment (Related to #9)
llm.py reads API keys at module import time. If config.load_env() isn't called before importing llm, the client has no keys. I restructured
the import order to ensure env loading happens first.
3. Model Fallback Chain Needed Updating
The original model list included deprecated Gemini models. I rebuilt both cascades:
Fast: gemini-2.5-flash → gemini-3-flash → gemini-2-flash → gemini-2-flash-lite → gpt-4.1-nano → gpt-4.1-mini → claude-haiku-4-5
Quality: gemini-3.1-pro-preview → gemini-2.5-pro → gemini-3-pro → gemini-2.5-flash → gpt-4.1-mini → gpt-4.1-nano → claude-sonnet-4-5 → claude-haiku-4-5
The 429 rate-limit handling (mark a model exhausted for 5 minutes, fall to the next) works great once the chain is populated.
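The exhaust-and-fall-through behavior can be sketched roughly like this (a simplified illustration of the mechanism, not the project's actual implementation):

```python
import time

COOLDOWN_SECONDS = 300  # a rate-limited model is skipped for 5 minutes


class FallbackChain:
    """Try models in order, skipping any still inside a 429 cooldown."""

    def __init__(self, models):
        self.models = list(models)
        self.exhausted_until = {}  # model name -> timestamp when usable again

    def mark_exhausted(self, model, now=None):
        """Call this after a 429 so the model is bypassed for a while."""
        now = time.time() if now is None else now
        self.exhausted_until[model] = now + COOLDOWN_SECONDS

    def next_available(self, now=None):
        """Return the first model not cooling down, or None if all are."""
        now = time.time() if now is None else now
        for model in self.models:
            if self.exhausted_until.get(model, 0) <= now:
                return model
        return None
```

A caller would loop: ask for `next_available()`, attempt the request, and on a 429 call `mark_exhausted()` and try again with the next model.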
4. Docker MCP Toolkit Interference
If Docker Desktop with MCP Toolkit is installed, it exposes mcp__MCP_DOCKER__browser_* tools that shadow the local Playwright MCP server. These
Docker-based tools can't access host files, breaking resume/cover letter uploads. Fix: pass --strict-mcp-config to the Claude subprocess in launcher.py.
5. URL Normalization
Many Workday scraped URLs were relative (e.g., /en/sites/CX/job/12345). These broke enrichment. I added URL normalization at insert time using
base URLs from sites.yaml.
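The fix amounts to resolving relative paths against the portal's base URL at insert time; Python's standard `urljoin` handles both cases cleanly (the function name and base URL below are illustrative):

```python
from urllib.parse import urljoin, urlparse


def normalize_job_url(url, base_url):
    """Resolve relative scraped URLs against the site's base URL.

    Absolute http(s) URLs pass through unchanged; relative paths like
    /en/sites/CX/job/12345 are joined onto the base from sites.yaml.
    """
    if urlparse(url).scheme in ("http", "https"):
        return url
    return urljoin(base_url, url)
```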
6. Company Extraction
Jobs from aggregators (Indeed, LinkedIn) had no company field, making it hard to spread applications across employers. I added company
extraction from application_url domains (patterns for Workday, Greenhouse, Lever, iCIMS, Ashby).
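The idea is that most ATS URLs embed the employer slug in either the subdomain (Workday) or the first path segment (Greenhouse, Lever, Ashby). A sketch under those assumptions; the exact patterns below are illustrative and would need tuning per provider:

```python
import re
from urllib.parse import urlparse

# (host pattern, where the employer slug lives); group 1 captures the slug.
ATS_PATTERNS = [
    (r"^([a-z0-9-]+)\.wd\d+\.myworkdayjobs\.com$", "host"),  # Workday subdomain
    (r"^boards\.greenhouse\.io$", r"^/([a-z0-9-]+)"),        # Greenhouse path
    (r"^jobs\.lever\.co$", r"^/([a-z0-9-]+)"),               # Lever path
    (r"^jobs\.ashbyhq\.com$", r"^/([a-z0-9-]+)"),            # Ashby path
]


def extract_company(application_url):
    """Best-effort employer slug from an ATS application URL, else None."""
    parsed = urlparse(application_url)
    host, path = parsed.netloc.lower(), parsed.path
    for host_pat, path_pat in ATS_PATTERNS:
        m = re.match(host_pat, host)
        if not m:
            continue
        if path_pat == "host":  # slug is the subdomain itself
            return m.group(1)
        pm = re.match(path_pat, path)  # slug is the first path segment
        if pm:
            return pm.group(1)
    return None
```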
7. ANTHROPIC_API_KEY Leaking to Subprocess
When the apply launcher spawns claude subprocesses, if ANTHROPIC_API_KEY is in the environment, it overrides Max plan auth and bills to the
API key instead. I added explicit env stripping in launcher.py.
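The stripping itself is a one-liner on a copied environment; passing the cleaned copy to the subprocess leaves the parent's environment untouched (the launch call in the comment is a hypothetical usage, not launcher.py's exact code):

```python
import os
import subprocess


def strip_api_key(env):
    """Return a copy of env without ANTHROPIC_API_KEY, so the child
    process falls back to its own (e.g. Max plan) authentication."""
    clean = dict(env)
    clean.pop("ANTHROPIC_API_KEY", None)
    return clean


# Hypothetical launch: the claude CLI inherits everything except the key.
# subprocess.Popen(["claude", "--print"], env=strip_api_key(os.environ))
```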
8. Fabrication in Cover Letters
One cover letter for SeatGeek fabricated a company name from my resume ("Underground Elephant" was a real company I worked at, but the LLM used
it in the wrong context). The resume_facts system helps but isn't bulletproof.
9. Banned Word False Positives (Related to #10)
The fabrication watchlist used substring matching — "rust" matched "TrustSec", "dedicated" matched a legitimate resume phrase. I changed banned words to warnings rather than hard errors, letting the LLM judge handle tone.
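An alternative (or complementary) fix is word-boundary matching instead of raw substring checks, so "rust" no longer fires inside "TrustSec". A minimal sketch, with an illustrative function name:

```python
import re


def flag_banned_words(text, watchlist):
    """Return watchlist entries that appear as whole words in text.

    \b boundaries prevent substring false positives such as
    'rust' matching inside 'TrustSec'.
    """
    hits = []
    for word in watchlist:
        if re.search(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE):
            hits.append(word)
    return hits
```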
10. Chrome Extension Path Resolution
The apply agent loads uBlock Origin and 1Password from the user's Chrome profile. Extension paths include version directories that change on
updates. I added dynamic resolution that picks the latest version directory and silently skips uninstalled extensions.
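Chrome keeps unpacked extensions under `<profile>/Extensions/<id>/<version>/`, so "pick the latest version" reduces to a numeric sort over the version directories. A sketch of that resolution (directory layout assumed as described; function name illustrative):

```python
from pathlib import Path


def resolve_extension_path(extensions_dir, extension_id):
    """Return the newest version directory for an extension, or None.

    Returning None lets the caller silently skip uninstalled extensions.
    """
    ext_root = Path(extensions_dir) / extension_id
    if not ext_root.is_dir():
        return None
    versions = [p for p in ext_root.iterdir() if p.is_dir()]
    if not versions:
        return None

    def version_key(p):
        # Compare dotted components numerically: 1.10.0 > 1.9.2.
        # Chrome sometimes suffixes "_0", which split("_") discards.
        return [int(x) for x in p.name.split("_")[0].split(".") if x.isdigit()]

    return max(versions, key=version_key)
```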
What We Built On Top
Beyond bug fixes, here are features I added to my fork:
Hacker News "Who is Hiring" scraper — Parses monthly threads, deobfuscates emails, creates synthetic URLs for contact-only posts
HTML Dashboard (applypilot dashboard) — Rich dashboard with Active/Archive/Applied tabs, fit score badges, company grouping, one-click
links to applications
Company-aware apply prioritization — ROW_NUMBER() PARTITION BY company in the job acquisition query spreads applications across
employers instead of applying to 10 Netflix jobs in a row
Two-tier model strategy — Flash models for speed-critical tasks, Pro models for quality writing
`applypilot run score tailor cover --stream` runs stages concurrently
`employers.yaml` with 48 Workday employer portals for direct scraping
Suggestions for Improvement
High Priority
Workday auth strategy — 80% of apply failures are `workday_login_required`. Options: mark Workday jobs as "manual apply" and generate a manual actions list.
Model config should be externalized — Hardcoded model lists break when Google deprecates models. A models.yaml config would let users
update without code changes.
max_tokens should scale with task — Default 2048 is too low for thinking models. The project should detect thinking model capabilities and
auto-adjust, or at minimum document this prominently.
Apply error categorization — Currently errors are free-text strings. A structured error taxonomy would enable better retry logic
(permanent vs transient errors) and reporting.
Medium Priority
Resume validation strictness should default to `normal` or `lenient` — Strict mode causes excessive retries (related to #4, "[EXHAUSTED_RETRIES] attempts=4 for tailored resume", and #14, "'Must fit 1 page' hard rule conflicts with preserved_companies requirement causing EXHAUSTED_RETRIES"). Most "failures" are false positives from substring matching.
Duplicate PDF filename collisions (related to #11 and #17, "Duplicate job titles cause tailored resume and cover letter PDFs to overwrite each other") — Jobs with identical titles overwrite each other's PDFs. Hash the URL into the filename.
Company field should be first-class — Add company extraction at discovery/enrichment time, not just from `application_url`. This enables better deduplication and employer diversity.
Dashboard should be a long-running web server — The current applypilot dashboard generates a static HTML file. A live dashboard with
auto-refresh would be much more useful during active pipeline runs.
Nice to Have
Job deduplication across sources — Same job appears on LinkedIn, Indeed, and the company's Workday portal. Fuzzy matching on title +
company could reduce noise.
Apply success verification — After submitting, check for confirmation emails or "application received" pages to verify success beyond the
agent's self-reported confidence.
Metrics/analytics — Track conversion rates over time, cost per application, which sources yield the best fit scores, etc.
Config validation (applypilot doctor) — v0.3.0 added this, which is great. Expanding it to validate API key quotas, model availability,
and browser setup would help a lot with onboarding.
Setup Notes for Other Users
Things I wish I knew before starting:
Call config.load_env() before importing llm — The LLM client reads API keys at import time. Get the order wrong and you get silent
failures.
Set high max_tokens — If you're using Gemini 2.5+, thinking tokens eat your budget. 2048 is not enough.
pip install -e . — Editable install means source edits take effect immediately. Great for iterating.
Docker MCP Toolkit — If you have Docker Desktop, disable MCP Toolkit or use --strict-mcp-config for apply.
Workday jobs are the majority — Many "LinkedIn" and "Indeed" jobs link to Workday portals. Expect login walls.
The Gemini free tier works — But you'll hit 429s frequently. The fallback chain handles it, just takes longer.
Summary
ApplyPilot is an impressive project that delivers on its core promise — I went from zero to 112 real job applications in about a week with
minimal manual intervention. The Tier 1→2 pipeline (discover → enrich → score → tailor → cover) is solid. The Tier 3 auto-apply works but is
bottlenecked by Workday login requirements.
The architecture is well-designed and extensible. I was able to add significant features (HN scraper, dashboard, company-aware prioritization,
two-tier models, Chrome extensions) without fighting the codebase. The three-tier separation of concerns is clean.
Thank you @Pickle-Pixel for open-sourcing this! Happy to contribute any of my fixes/features back upstream if there's interest. Just note it might take me a while: I made a lot of changes, and since I was using the pipeline while still building on it, I need to clean it up and scrub my personal information first.