One-click collection-intelligence setup #1982
New corpora landed with unreadable document indexes (raw import metadata as descriptions, 0% summary coverage) because nothing composed the existing enrichment machinery at corpus setup: the reference-web CTA installed only the deterministic half, and the LLM action templates (descriptions, summaries) sat in the Action Library waiting to be manually added and batch-run. CorpusIntelligenceSetupService composes the default bundle in one idempotent call: installs the reference-enrichment add_document action and starts the first weave, clones the Document Description Updater + Document Summary Generator templates, and batch-runs each over every document already present. Exposed as the setupCorpusIntelligence mutation + corpusIntelligenceSetupStatus query; createCorpus now returns objId so the New Corpus modal's default-on opt-in can chain setup after creation. An IntelligenceSetupBanner inside IntelligencePanel offers setup on both the intelligence overview and the insight-panel CAML embed, and disappears once the bundle is installed. Live-proven on the dev stack: status flips not-set-up -> fully-set-up, the weave starts, and the description/summary agents drain over existing docs (the Fervo demo corpus index is now human-readable).
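The "one idempotent call" shape can be sketched against an in-memory stand-in. This is a minimal sketch under stated assumptions: the `Corpus` dataclass, its fields, and `setup_corpus_intelligence` are hypothetical names, not the real Django models or the CorpusIntelligenceSetupService API. Only the two template names come from the PR. Each step checks existing state first, so a second call reuses what is installed and queues only documents not yet processed.

```python
from dataclasses import dataclass, field

# The two template names come from the PR; every other name is hypothetical.
TEMPLATE_NAMES = ("Document Description Updater", "Document Summary Generator")


@dataclass
class Corpus:
    """In-memory stand-in for a corpus; not the real Django model."""

    documents: list[str]
    reference_action_installed: bool = False
    weave_started: bool = False
    installed_templates: set[str] = field(default_factory=set)
    # (template name, document id) pairs already queued for a run.
    runs: set[tuple[str, str]] = field(default_factory=set)


def setup_corpus_intelligence(corpus: Corpus) -> dict:
    summary = {"reference_installed_now": False, "queued": {}}

    # Step 1: install the reference-enrichment action once and start the first
    # weave. (This sketch only starts a weave on first install.)
    if not corpus.reference_action_installed:
        corpus.reference_action_installed = True
        corpus.weave_started = True
        summary["reference_installed_now"] = True

    # Step 2: install each template (adding to a set is a no-op on re-run),
    # then queue only documents this template has not been run on yet.
    for name in TEMPLATE_NAMES:
        corpus.installed_templates.add(name)
        pending = [doc for doc in corpus.documents if (name, doc) not in corpus.runs]
        corpus.runs.update((name, doc) for doc in pending)
        summary["queued"][name] = len(pending)

    return summary


# Usage: the second call converges instead of redoing work.
corpus = Corpus(documents=["doc-1", "doc-2"])
first = setup_corpus_intelligence(corpus)
second = setup_corpus_intelligence(corpus)
```

A document added after setup is picked up by the next call, because the skip check is per (template, document) pair rather than per template.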
Code Review, PR #1982: One-click collection-intelligence setup

Overall: Solid feature addition. The service architecture is clean, idempotency is handled correctly at every layer, and permission semantics follow project conventions. A few items are worth addressing before merge.

What it does (summary)

Issues to Address
1. Backend
Codecov Report
❌ Patch coverage is …
…note
- intelligence_setup.py: count active documents through CorpusDocumentService.get_corpus_documents (corpus-as-gate; the setup user holds UPDATE→READ) instead of reaching into the private Corpus._get_active_documents. Same set; include_caml=False on both.
- corpus_mutations.py: rate-limit SetupCorpusIntelligence with WRITE_HEAVY, matching StartCorpusActionBatchRun (which likewise fans out batch agent runs).
- intelligence_setup.py: document the deliberately non-atomic in-flight check / analysis start (a duplicate weave is recoverable; the writer is idempotent).

createCorpus already emits obj_id via DRFMutation (base.py:197); no change needed. log_action accepts **extra, so the kwargs call is safe.
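Why a non-atomic check-then-start is acceptable can be shown with a small sketch. All names here (`AnalysisRegistry`, `EnrichmentStore`, `maybe_start_weave`, `run_weave`) are hypothetical stand-ins, not the project's classes; the sketch only assumes what the commit states: the in-flight check and the start are not atomic, and the writer is idempotent (modelled here as an upsert keyed by document and reference).

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisRegistry:
    """Hypothetical tracker of running weave analyses."""

    in_flight: int = 0

    def has_in_flight(self) -> bool:
        return self.in_flight > 0

    def start(self) -> None:
        self.in_flight += 1


@dataclass
class EnrichmentStore:
    """Hypothetical enrichment table keyed by (document id, reference)."""

    rows: dict[tuple[str, str], str] = field(default_factory=dict)

    def upsert(self, doc_id: str, ref: str, value: str) -> None:
        # Idempotent write: the same key is overwritten, never duplicated.
        self.rows[(doc_id, ref)] = value


def maybe_start_weave(registry: AnalysisRegistry) -> bool:
    # Check-then-act with no lock: two concurrent callers can both see
    # "nothing in flight" and both start a weave. That race is tolerated
    # because every weave writes through the idempotent upsert.
    if registry.has_in_flight():
        return False
    registry.start()
    return True


def run_weave(store: EnrichmentStore, links: list[tuple[str, str, str]]) -> None:
    for doc_id, ref, value in links:
        store.upsert(doc_id, ref, value)


# Usage: a duplicate weave (what the race would produce) leaves the same rows.
links = [("doc-1", "ref-a", "cites"), ("doc-2", "ref-a", "cites")]
single, duplicated = EnrichmentStore(), EnrichmentStore()
run_weave(single, links)
run_weave(duplicated, links)
run_weave(duplicated, links)
```

The trade-off is wasted work on a rare race instead of a lock around the check and start; correctness rests entirely on the writer being idempotent.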
Code Review: One-click collection-intelligence setup

Overview: This PR adds a well-structured orchestration layer that composes existing enrichment machinery (reference-web analyzer, description/summary templates) into a single idempotent setupCorpusIntelligence call. No blockers found, but a few items are worth addressing.

Strengths
Issues to address before merge

1. Incomplete TypeScript output type (mutations.ts)
SetupCorpusIntelligenceOutputs.summary is missing referenceActionAlreadyInstalled, which is required=True in the backend CorpusIntelligenceSetupSummaryType. The frontend does not consume it today, but the type should mirror the schema.

2. Misleading toast when the batch-run cap is hit (IntelligenceSetupBanner.tsx)
When batch_run_on_corpus hits BATCH_RUN_MAX_DOCS, outcome.queued_count stays 0 but ok=True. The toast is built from sum(t.queuedCount), so when the cap fires on all templates the user sees 'Collection intelligence is set up.' even though nothing was queued. Consider checking payload.summary.templates.some(t => t.error) and showing a toast.warn instead.

Optional cleanup
Summary: Orchestration design, idempotency, permission gating, and service layer usage are all sound. Items 1 and 2 are worth fixing before merge. The rest is optional cleanup.
- mutations.ts: add referenceActionAlreadyInstalled to SetupCorpusIntelligenceOutputs.summary (interface + query selection) to mirror the required backend field.
- IntelligenceSetupBanner: when ok=True but nothing was queued AND a template carried an error (e.g. the BATCH_RUN_MAX_DOCS cap), show toast.warning instead of 'Collection intelligence is set up.'
- corpus_actions.py: INTELLIGENCE_SETUP_TEMPLATE_NAMES list -> tuple (the module constant is never mutated); the test compares against list(...) accordingly.
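The banner's toast rule is frontend (TypeScript) logic; it is sketched here in Python only to keep one language across these examples. `TemplateResult` and `choose_toast` are hypothetical names, and only the success message is quoted from the PR; the error and warning wording is illustrative. The rule follows the fix: warn only when ok=True, nothing was queued, and at least one template reported an error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemplateResult:
    """Per-template fields the review refers to (queuedCount, error)."""

    name: str
    queued_count: int = 0
    error: Optional[str] = None


def choose_toast(ok: bool, templates: list[TemplateResult]) -> tuple[str, str]:
    """Return (level, message). Only the success message is quoted from the PR."""
    if not ok:
        return ("error", "Collection intelligence setup failed.")
    queued = sum(t.queued_count for t in templates)
    errors = [t.error for t in templates if t.error]
    if queued == 0 and errors:
        # ok=True but nothing queued, e.g. every template hit BATCH_RUN_MAX_DOCS.
        return ("warning", "Set up, but nothing was queued: " + "; ".join(errors))
    return ("success", "Collection intelligence is set up.")
```

Note that "nothing queued, no errors" still counts as success: on a re-run every document may already have been processed, which is the converged state rather than a failure.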
Code Review: One-click collection-intelligence setup

Overview
Solid orchestration layer that composes existing enrichment machinery into a single idempotent call. The design is clean: the service layer owns the logic, GraphQL is a thin adapter, and the frontend stays simple. The idempotency story is well reasoned, and the IDOR-safe messaging convention is followed correctly throughout.

Issues to Address
1.
```python
# Current per-template clone step: only IntegrityError is handled.
try:
    with transaction.atomic():
        action = template.clone_to_corpus(corpus, creator=user)
except IntegrityError:
    ...
```

If …

Suggested fix: add a broader except after IntegrityError:

```python
except IntegrityError:
    action = CorpusAction.objects.filter(corpus=corpus, source_template=template).first()
    outcome.already_installed = action is not None
except Exception as exc:
    # Record the failure on this template's outcome and move to the next template.
    outcome.error = f"Failed to install template: {exc}"
    logger.exception("Intelligence setup: clone failed for %r on corpus %s", name, corpus.pk)
    continue
```

2. Please confirm the existing …

Minor Issues

3. CT test mock missing
```diff
 summary: {
   referenceAvailable: true,
   referenceActionInstalledNow: true,
+  referenceActionAlreadyInstalled: false,
   referenceAnalysisStarted: true,
```

4. …
What's Well Done
Summary
Two items before merge: the unhandled-exception gap in …
Root cause of the codecov backend-patch miss: test_enrichment_backfill.py and test_enrichment_writer.py imported config.graphql.schema at module level. Under --cov instrumentation, that import builds the graphene schema at collection time, and the build errors (graphene-django CustomField field resolution), so the whole file is dropped and its coverage, including the bootstrap_authority management command and the enrichment services, never reaches the upload. That is why a well-tested command showed 0% patch coverage. Defer the schema import into the _execute helpers (the pattern already used by test_enrichment_tools.py / test_governance_graph.py), so the build happens at runtime and the files' coverage is always captured. Also close the genuinely uncovered bootstrap_authority branches: unknown creator, unreadable spec file, and missing 'sections' list (now 100%), plus relink_corpora_for_keys per-corpus failure isolation and the empty-keys short-circuit.
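The deferred-import pattern can be sketched as follows. The `_execute` signature, its `module_name` parameter, and the assumption that the module exposes `schema.execute(query)` are illustrative, not the project's actual helper; the stand-in module exists only so the sketch runs without the project installed. The point is that nothing at module level touches the schema, so collecting the test file cannot fail on the schema build.

```python
import importlib
import sys
import types

# No schema import at module level: collecting this file never builds the schema.
SCHEMA_MODULE = "config.graphql.schema"


def _execute(query: str, module_name: str = SCHEMA_MODULE):
    # Deferred import: the module (and thus the schema build) is loaded only
    # when a test calls this helper, i.e. at run time rather than collection time.
    schema_module = importlib.import_module(module_name)
    return schema_module.schema.execute(query)


# Usage with a stand-in module registered under a fake name (illustration only).
class _EchoSchema:
    def execute(self, query: str) -> dict:
        return {"data": {"echo": query}}


stand_in = types.ModuleType("stand_in_schema")
stand_in.schema = _EchoSchema()
sys.modules["stand_in_schema"] = stand_in
result = _execute("{ ping }", module_name="stand_in_schema")
```

If the schema build fails, only the tests that call the helper error out; the rest of the file is still collected and its coverage is still recorded.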
…eature/corpus-intelligence-setup
Review fixes:
- _setup_templates now contains non-IntegrityError clone failures per template (broad except + log + continue) instead of letting them abort the loop and return a 500 with earlier templates half-installed, honoring the bundle's graceful partial-success contract.
- status(): documented the three-query cost (fine at one per page load; revisit if it is ever polled or rendered per corpus-list row).
- IntelligenceSetupBanner.ct.tsx setupMock now includes referenceActionAlreadyInstalled, matching the real SETUP_CORPUS_INTELLIGENCE selection set.

Coverage (codecov/patch/Backend and /Frontend both below target):
- Backend: cover intelligence_setup error/edge branches: analyzer not registered, in-flight analysis suppresses a duplicate start, failed analysis start, inactive template, and the new contained clone-failure path.
- Frontend: cover the banner's error-toast, soft-warning (nothing queued + template error), clean set-up, and catch (network error) branches.

Confirmed (no change needed): createCorpus.objId is exposed server-side via DRFMutation (config/graphql/base.py:197), so the post-create intelligence setup chain in Corpuses.tsx receives a real id.
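The partial-success contract for the template loop can be sketched without Django. `setup_templates`, `TemplateOutcome`, and the injected `clone` / `find_existing` / `batch_run` callables are hypothetical; `DuplicateRowError` stands in for django.db.IntegrityError so the sketch is runnable on its own. A failed clone is recorded on that template's outcome and logged, and the loop moves on instead of aborting with earlier templates half-installed.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("intelligence_setup_sketch")


class DuplicateRowError(Exception):
    """Stand-in for django.db.IntegrityError raised by a unique constraint."""


@dataclass
class TemplateOutcome:
    name: str
    already_installed: bool = False
    queued_count: int = 0
    error: Optional[str] = None


def setup_templates(
    names: list[str],
    clone: Callable[[str], None],
    find_existing: Callable[[str], Optional[object]],
    batch_run: Callable[[str], int],
) -> list[TemplateOutcome]:
    outcomes = []
    for name in names:
        outcome = TemplateOutcome(name)
        outcomes.append(outcome)
        try:
            clone(name)
        except DuplicateRowError:
            # Already cloned (or lost a race): reuse the existing row.
            outcome.already_installed = find_existing(name) is not None
        except Exception as exc:
            # Contain the failure to this template; the loop keeps going.
            outcome.error = f"Failed to install template: {exc}"
            logger.exception("Intelligence setup: clone failed for %r", name)
            continue
        outcome.queued_count = batch_run(name)
    return outcomes


# Usage: one template fails, the others still install and queue.
def flaky_clone(name: str) -> None:
    if name == "existing":
        raise DuplicateRowError(name)
    if name == "broken":
        raise RuntimeError("db unavailable")


results = {
    o.name: o
    for o in setup_templates(
        ["fresh", "existing", "broken", "later"],
        clone=flaky_clone,
        find_existing=lambda name: object(),
        batch_run=lambda name: 2,
    )
}
```

The broad `except Exception` is deliberate here: the contract is per-template partial success, and `logger.exception` keeps the traceback so the contained failure is still visible in logs.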
Problem
A freshly created corpus lands with an unreadable document index: titles all share the same truncated prefix and Document.description holds whatever the importer wrote (raw metadata dumps like **filing_type:** S-1 **doc_role:** exhibit **exhibit_number:** 10.(2)(…), and summary coverage sits at 0%. All the machinery to fix this already exists (the reference-web analyzer, and the Document Description Updater / Document Summary Generator action templates from the Action Library), but nothing composed it at corpus setup, and nothing batch-ran the LLM actions over documents already in the corpus.

What this adds
CorpusIntelligenceSetupService (opencontractserver/corpuses/services/intelligence_setup.py): one idempotent composite call that
- installs the reference-enrichment add_document CorpusAction (the same row the governance-graph CTA creates) and starts the first weave immediately;
- clones the Document Description Updater + Document Summary Generator templates (INTELLIGENCE_SETUP_TEMPLATE_NAMES) and batch-runs each over every existing document.

Re-running converges: existing rows are reused, already-run documents are skipped, and an in-flight weave isn't duplicated. Per-template failures (e.g. the BATCH_RUN_MAX_DOCS cap) surface in the summary without failing the whole call.

GraphQL:
setupCorpusIntelligence mutation + corpusIntelligenceSetupStatus query; createCorpus now returns objId so follow-ups can chain off creation.

Frontend (both entry points):
IntelligenceSetupBanner, mounted inside IntelligencePanel, appears on the intelligence overview and the insight-panel CAML embed, offers "Set up", reports the queued fan-out in a toast, and hides once the bundle is installed.

Verification
- test_intelligence_setup.py: service install/idempotence/permission-gating/status + GraphQL smoke (7 tests).
- IntelligenceSetupBanner.ct.tsx: offer → run → hide, and silent-when-set-up (2 CT tests, doc screenshot).

Stacked on #1977.
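The banner's offer → run → hide cycle depends on a "fully set up" status. The sketch below shows one way such a status could be derived; the `SetupStatus` shape, `REQUIRED_TEMPLATES`, and `banner_visible` are hypothetical, and it does not model the three queries the real status() issues. It assumes "fully set up" means the reference action plus both templates are installed, matching the PR's description of the bundle.

```python
from dataclasses import dataclass

# Template names from the PR; the status shape below is hypothetical.
REQUIRED_TEMPLATES = frozenset({"Document Description Updater", "Document Summary Generator"})


@dataclass(frozen=True)
class SetupStatus:
    reference_action_installed: bool
    installed_templates: frozenset[str]

    @property
    def missing_templates(self) -> frozenset[str]:
        return REQUIRED_TEMPLATES - self.installed_templates

    @property
    def fully_set_up(self) -> bool:
        return self.reference_action_installed and not self.missing_templates


def banner_visible(status: SetupStatus) -> bool:
    # Offer "Set up" until the whole bundle is installed; stay silent afterwards.
    return not status.fully_set_up


# Usage: the status flips from not set up to fully set up.
before = SetupStatus(reference_action_installed=False, installed_templates=frozenset())
partial = SetupStatus(True, frozenset({"Document Summary Generator"}))
after = SetupStatus(True, REQUIRED_TEMPLATES)
```

Treating a partial install as "not set up" keeps the banner available so a re-run can converge the missing pieces.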