fix: restore compatibility with Perplexity API changes (2026|06) #39
DaveG7 wants to merge 7 commits into
Conversation
|
Hello @DaveG7, thanks for your PR. What I like: What I don't like:
|
…rect fetch
Address PR simwai#39 review feedback on conversation-extractor:
- Restore ApiResponseSchema, validated against a live 2026 /rest/thread/{id} response. Pagination is the top-level has_next_page/next_cursor pair (not collection_info, which is the list endpoint). Diagnose-and-continue: shape drift writes a diagnostic and falls through to the per-entry EntrySchema gate.
- Restore ApiDiagnosticsWriter calls (zod_error / unknown_shape / empty_entries) so the debug/api-diagnostics.jsonl path the REPL references works again.
- Keep the page.evaluate()+fetch approach for consistency with library-discovery (the response-listener was the lone divergent /rest/ path); replace hardcoded version=2.18 with shared DEFAULT_API_VERSION.
- Remove dead adaptive-timeout no-ops (reduceTimeout/recoverTimeout) and their now-unused worker-pool callers.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
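The diagnose-and-continue idea in this commit message can be sketched roughly as follows. This is a dependency-free illustration, not the PR's code: hand-written checks stand in for the Zod-based ApiResponseSchema, and `inspectThreadResponse`, `InspectedPage`, `Diagnostic` and `toJsonl` are hypothetical names. Only the field names (`entries`, `has_next_page`, `next_cursor`) and the three diagnostic labels come from the text above.

```typescript
// Hypothetical sketch: plain type checks stand in for the Zod schema.
// The "zod_error" label is reused from the commit message for a failed field check.
type DiagnosticKind = "zod_error" | "unknown_shape" | "empty_entries";

interface Diagnostic {
  kind: DiagnosticKind;
  detail: string;
}

interface InspectedPage {
  entries: unknown[];
  hasNextPage: boolean;
  nextCursor: string | null;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Never throws: shape drift is recorded as a diagnostic and whatever entries
// are usable are passed on to a later per-entry check.
function inspectThreadResponse(raw: unknown): { page: InspectedPage; diagnostics: Diagnostic[] } {
  const diagnostics: Diagnostic[] = [];
  const page: InspectedPage = { entries: [], hasNextPage: false, nextCursor: null };

  if (!isRecord(raw)) {
    diagnostics.push({ kind: "unknown_shape", detail: "top-level value is not an object" });
    return { page, diagnostics };
  }

  const rawEntries = raw.entries;
  const hasNext = raw.has_next_page;
  const cursor = raw.next_cursor;

  if (!Array.isArray(rawEntries)) {
    diagnostics.push({ kind: "zod_error", detail: "entries is not an array" });
  } else if (rawEntries.length === 0) {
    diagnostics.push({ kind: "empty_entries", detail: "entries array is empty" });
  } else {
    page.entries = rawEntries;
  }

  if (typeof hasNext !== "boolean") {
    diagnostics.push({ kind: "zod_error", detail: "has_next_page is not a boolean" });
  } else {
    page.hasNextPage = hasNext;
  }

  page.nextCursor = typeof cursor === "string" ? cursor : null;
  return { page, diagnostics };
}

// One JSON object per line, the usual layout of a .jsonl file.
function toJsonl(diagnostics: Diagnostic[]): string {
  return diagnostics.map((d) => JSON.stringify(d)).join("\n");
}
```

A caller would append the `toJsonl` output to the diagnostics file and keep processing `page.entries` either way, which is the "continue" half of the pattern.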
A live thread with 212 entries paginates across 3 pages of ~99; the new single-fetch path only returned page 1, truncating long conversations. fetchThreadData now keeps the single-fetch fast path for normal threads and, when has_next_page is true, follows the top-level next_cursor (same URL + &cursor=<encoded>) accumulating entries in API order until the thread is complete, capped at 50 pages. Split the per-page fetch+validate into fetchThreadPage. This restores the long-thread coverage the old response listener provided. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
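As a rough illustration of the cursor-following loop described in this commit message, here is a minimal sketch. It is not the PR's fetchThreadData: `fetchAllEntries`, `RawThreadPage` and the injected `FetchPage` callback are hypothetical names, and the sketch assumes the base URL already carries a query string so that `&cursor=` can simply be appended.

```typescript
// Hypothetical sketch of cursor pagination with a hard page cap.
interface RawThreadPage {
  entries: unknown[];
  has_next_page: boolean;
  next_cursor: string | null;
}

// The per-page fetch is injected so the loop can be exercised without a browser.
type FetchPage = (url: string) => Promise<RawThreadPage>;

const MAX_PAGES = 50; // cap taken from the commit message

async function fetchAllEntries(baseUrl: string, fetchPage: FetchPage): Promise<unknown[]> {
  const entries: unknown[] = [];
  let url = baseUrl;

  for (let pageNo = 0; pageNo < MAX_PAGES; pageNo++) {
    const page = await fetchPage(url);
    entries.push(...page.entries); // keep API order

    // Short threads stop here after a single request.
    if (!page.has_next_page || page.next_cursor === null) break;

    // Same URL plus the encoded cursor for the next page.
    url = `${baseUrl}&cursor=${encodeURIComponent(page.next_cursor)}`;
  }
  return entries;
}
```

Injecting the page fetcher keeps the loop independent of how a page is actually retrieved, so the same accumulation logic would apply whether pages come from an in-page fetch or from captured network responses.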
|
Hey @simwai, thanks for the careful review; you were right to push back on the parts I'd stripped out. Went back through all of it. Sanitized request/response you asked for:
{
"entries": [
{
"uuid": "<uuid>",
"status": "COMPLETED",
"thread_title": "i am a simple test thread",
"query_str": "i am a simple test thread",
"updated_datetime": "2026-06-04T20:01:55.688657",
"blocks": [
{ "intended_usage": "ask_text_0_markdown",
"markdown_block": { "answer": "Understood — this is a test thread." } },
{ "intended_usage": "ask_text",
"markdown_block": { "answer": "Understood — this is a test thread." } }
// plan / workflow_root / answer_tabs / pending_followups blocks elided
]
// classifier_results / mhe_predictions / social_info etc. elided (not used)
}
],
"background_entries": [],
"has_next_page": false,
"next_cursor": null,
"status": "success",
"thread_metadata": { "title": "i am a simple test thread", "...": "..." }
}
Two things it cleared up: pagination is a top-level
On your three points:
1. Validator restored. Dropping it was the wrong move.
2. Diagnostics restored.
3. On pagination: I dug into this properly. Short threads come back in one response (
Thanks again, happy to adjust any of it.
Cheers Dave |
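For readers following the sample response above, a per-entry check over that shape might look like the sketch below. It is an illustration only: `extractEntry` and `ExtractedEntry` are hypothetical names, and treating the block whose `intended_usage` is `ask_text` as the answer is an assumption drawn from the sample, not something the thread confirms about the real extractor.

```typescript
// Hypothetical per-entry gate over the sample shape shown in the comment above.
interface ExtractedEntry {
  uuid: string;
  query: string;
  answer: string | null; // null when no usable answer block is found
}

function isObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns null for entries that lack the fields this sketch relies on.
function extractEntry(raw: unknown): ExtractedEntry | null {
  if (!isObject(raw)) return null;

  const uuid = raw.uuid;
  const query = raw.query_str;
  if (typeof uuid !== "string" || typeof query !== "string") return null;

  const blocks = Array.isArray(raw.blocks) ? raw.blocks : [];
  let answer: string | null = null;

  for (const block of blocks) {
    // Assumption: the "ask_text" block carries the answer markdown.
    if (!isObject(block) || block.intended_usage !== "ask_text") continue;
    const markdown = block.markdown_block;
    if (isObject(markdown) && typeof markdown.answer === "string") {
      answer = markdown.answer;
      break;
    }
  }
  return { uuid, query, answer };
}
```

Returning `null` instead of throwing lets a caller skip a malformed entry and keep the rest of the thread.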
|
Thank you for the explanation. However, consistency with library-discovery.ts is not a strong enough reason to replace the working network listener. The listener handled pagination reliably and had no timing issues. The page.evaluate fetch approach introduces undeniable timing downsides that the listener did not have. I would prefer to keep the original listener and only update the Zod schema and pagination detection as needed. |
|
I would also like to know whether the HTTP request and response you sent are real or entirely AI-generated. |
|
Ciao @simwai Sorry for the delay, family and so on... ;-) We could actually switch to our native language, but let's stick to the international one. I fully understand your skepticism. But I am not here to keep you entertained and waste your time; time is what most people lack, me included. I am seriously contributing in the ways I can. To answer your question briefly: my first PR was rushed; I had no time to dig into your code base. I was looking for a fast way to extract my data from Perplexity, which I have been using for about a year. I wanted to build a small pipeline in a Docker container but had problems getting it to run, as you need the Headless=false option. When I tested your version locally (before #39 ), it wasn't working for me. At the time I wasn't aware of the available debugging output, which would have helped. I leaned on AI assistance to get something working quickly, which stripped some of your original logic; I apologize for that. The second PR should be better, and so far it's working fine for me. It's your repo, you choose what you can use and what not. No offense taken; on the contrary, I will dig into the timing concerns you mentioned, as this is precisely the kind of thing I do not yet have the experience to spot directly. You're an engineer, I am not. My main path is a different one (natural science), but IT and tech are my second pillar: a passion and interest since childhood that I try to bring into my primary field. I wish you a pleasant weekend and look forward to hearing from you. Cheers Dave |
|
@DaveG7 I understand that the last versions were buggy. I did not do enough testing myself before merging the output of the agent. I will investigate the requests and responses a bit more today. I am sure we can get this done somehow. By the way, it would help if you said on which URL you spotted the request/response. |
Description
Perplexity changed their REST API response format, breaking both thread discovery and conversation extraction.
Library discovery (library-discovery.ts):
and mode_type
threads
Conversation extraction (conversation-extractor.ts):
reliable
Related Issues
Fixes #
Checklist
Diagnostic Logs
N/A — The conversation extractor no longer uses debug/api-diagnostics.jsonl — the response-listener that wrote to it was replaced by a direct API call, making per-response diagnostics unnecessary.
Cheers, Dave