diff --git a/README.md b/README.md index b31be26ac..6598b122d 100644 --- a/README.md +++ b/README.md @@ -29,14 +29,45 @@ findings = await agent.structured_response( [![Sponsor](https://img.shields.io/static/v1?label=Sponsor&message=%E2%9D%A4&logo=GitHub&color=%23fe8e86)](https://github.com/sponsors/JSv4) | | | -| ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Backend coverage | [![backend](https://codecov.io/gh/Open-Source-Legal/OpenContracts/branch/main/graph/badge.svg?flag=backend&token=RdVsiuaTVz)](https://app.codecov.io/gh/Open-Source-Legal/OpenContracts?flags%5B0%5D=backend) | | Frontend coverage | [![frontend](https://codecov.io/gh/Open-Source-Legal/OpenContracts/branch/main/graph/badge.svg?flag=frontend&token=RdVsiuaTVz)](https://app.codecov.io/gh/Open-Source-Legal/OpenContracts?flags%5B0%5D=frontend) | | Meta | [![code style - black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![types - Mypy](https://img.shields.io/badge/types-Mypy-blue.svg)](https://github.com/python/mypy) [![imports - 
isort](https://img.shields.io/badge/imports-isort-ef8336.svg)](https://github.com/pycqa/isort) [![License - MIT](https://img.shields.io/badge/license-MIT-green)](https://opensource.org/licenses/MIT) | --- -![Discovery Landing Page](docs/assets/images/screenshots/auto/landing--discovery-page--anonymous.png) +## From documents to a citation graph — in about a minute + +Create a corpus, drop in your documents, and click **Set up**. That one click installs the +intelligence bundle: agents describe and summarize every document, and the reference web +starts weaving — every statutory citation detected, resolved, and drawn as an edge. + +![Create a corpus and set up collection intelligence in one click](docs/assets/images/gifs/demo-1-create-and-setup.gif) + +By the end of the clip, 36 SEC filings are a navigable graph — wired to the Delaware +General Corporation Law, the Securities Act, and the SEC rules they cite, section by +section. Law the library doesn't hold yet isn't dropped on the floor: it's tracked as a +backlog, automatically, until you ingest it. + +### Then explore it — and ask it questions + +Citations are highlighted inline on the filings themselves. The References panel lists +everything a document cites — click any cite to open the statute, with its own +cross-references and everything that cites it back. The ask bar runs a corpus-scoped +agent whose answers come back grounded and cited. + +![Explore the citation graph — inline citations, the references panel, and grounded answers](docs/assets/images/gifs/demo-2-explore-and-ask.gif) + +Everything in both clips is the stock product against a local install — no custom code, +and every surface the UI touches is also reachable over the API and MCP server below. 
+ +Here's the artifact those clips produce, frozen so you can read it — every filing wired to +the exact section of law it cites, with bodies of law the library doesn't hold yet drawn as +dashed nodes, tracked until you ingest them: + +![The governance graph — filings linked to the statute sections they cite, down to the section, with un-ingested law tracked as dashed nodes](docs/assets/images/screenshots/auto/corpus--governance-graph--with-data.png) + +--- ## Build on it @@ -155,7 +186,7 @@ The engine — annotation, corpus management, AI agents, MCP server, vector sear This is not another chat-with-your-PDFs tool. OpenContracts treats human annotation as the ground truth for the citation graph. Teams define custom label schemas, annotate documents with precise selections (including multi-page spans), and map relationships between concepts. AI builds on top of that work — it doesn't replace it. -![Document Annotator](docs/assets/images/screenshots/auto/readme--document-annotator--with-pdf.png) +![Precise, layout-faithful annotations on a PDF — colored label spans, multi-page sections, and the annotation sidebar](docs/assets/images/screenshots/auto/annotations--pdf-canvas--with-labels.png) ### Corpuses, Not File Cabinets @@ -163,7 +194,7 @@ Documents are organized into corpuses — version-controlled collections with fo This is `git` for the citation graph: branch, build, share, never lose work. -![Corpus Home](docs/assets/images/screenshots/auto/readme--corpus-home--with-chat.png) +![Collection intelligence overview — document, connection, annotation, and extract counts, summary coverage, dominant labels, and the governance graph](docs/assets/images/screenshots/auto/corpus--intelligence-overview--with-data.png) ### AI Agents That Work With What You've Built @@ -171,7 +202,7 @@ Configurable AI agents can search your documents, query your annotations, and pa @mention an agent in a discussion thread. Ask it to compare clauses across a hundred contracts. 
Let it surface patterns your team annotated last quarter. The agent's power comes from the quality of the citation graph underneath it. -![AI Agent Response](docs/assets/images/screenshots/auto/threads--agent-message--response.png) +![An agent grounding its answer in tool calls — similarity search, exact-text search, and document lookups over the corpus](docs/assets/images/screenshots/auto/chat--tool-popover--multi-tool.png) ### Collaboration Where the Citations Live @@ -189,7 +220,9 @@ This is the DRY principle applied to the citation graph: annotate once, build on --- -## See it in Action +## Annotation flows + +The human side of the graph — precise, layout-faithful annotation on PDFs and text: ### PDF Annotation Flow @@ -240,10 +273,10 @@ docker compose -f production.yml up -d The discover/landing page and the `/about` page are driven by a JSON content pack so deployers can retarget the messaging without forking the codebase. Two variants ship in the repo: -| Variant key | Framing | Best fit | -| --------------- | ------------------------------------------------------ | ------------------------------------------------------------------------------- | -| `default` | _Open-source document intelligence you can build on._ | The OSS project's repo and most self-hosted deployments — developer-facing. | -| `public-record` | _The citation layer underneath the public record._ | End-user deployments curating public-domain documents (named-incumbents pitch). | +| Variant key | Framing | Best fit | +| --------------- | ----------------------------------------------------- | ------------------------------------------------------------------------------- | +| `default` | _Open-source document intelligence you can build on._ | The OSS project's repo and most self-hosted deployments — developer-facing. | +| `public-record` | _The citation layer underneath the public record._ | End-user deployments curating public-domain documents (named-incumbents pitch). 
| Switch variants at runtime by setting `REACT_APP_LANDING_VARIANT` in `frontend/public/env-config.js` — no rebuild required. Unknown variant keys fall back to `default`. diff --git a/changelog.d/1982-review-fixes.fixed.md b/changelog.d/1982-review-fixes.fixed.md new file mode 100644 index 000000000..74f291c35 --- /dev/null +++ b/changelog.d/1982-review-fixes.fixed.md @@ -0,0 +1,60 @@ +- **Intelligence setup: large corpora no longer silently skip enrichment.** + `CorpusActionService` gained `batch_run_action(user, action, allow_partial=)` + (`opencontractserver/corpuses/services/corpus_actions.py`) — the trusted-caller + variant the one-click setup now uses with `allow_partial=True`, queuing the + first `BATCH_RUN_MAX_DOCS` documents (deterministic id order) instead of + refusing outright when a corpus exceeds the per-call cap. The per-template + outcome (`TemplateSetupOutcome.remaining_count`, exposed as `remainingCount` + on `IntelligenceTemplateOutcomeType`) reports the deferred remainder and the + banner toast surfaces it. Previously a 250-doc corpus got a success toast, a + permanently hidden banner, and zero documents enriched. +- **Intelligence-setup status no longer demands deployment-unavailable pieces.** + `IntelligenceSetupStatus.is_fully_set_up` + (`opencontractserver/corpuses/services/intelligence_setup.py`) excludes the + reference action when no enrichment analyzer is registered + (`reference_available`, new on the status payload) and excludes bundle + templates that are unseeded/inactive deployment-wide — either condition + previously made the setup CTA an undismissable zombie whose every click + toasted success. +- **Setup CTA hidden from viewers who can't run it.** The status payload gained + `can_setup` (mirrors the mutation's permission gate); + `IntelligenceSetupBanner.tsx` renders nothing unless `canSetup` — read-only + and anonymous viewers of a public not-set-up corpus previously saw a + guaranteed-to-fail "Set up" button. 
+- **Permission tier harmonized to CRUD.** `setupCorpusIntelligence` (service + + mutation docstrings, `config/graphql/corpus_mutations.py`) now requires CRUD + on the corpus — the tier `AddTemplateToCorpus` and `CreateCorpusAction` + already gate the identical writes at; it previously required only UPDATE, a + weaker path to the same row installs. +- **Reference action can no longer be double-installed.** The governance + graph's "Map the reference web" bootstrap + (`GovernanceGraphLive.tsx`) consults `corpusIntelligenceSetupStatus` and + skips `createCorpusAction` when the add_document reference action already + exists (a duplicate row would run the enrichment analyzer twice on every + future upload); the server side switched to `get_or_create` to narrow the + concurrent-race window. +- **Post-create setup opt-in surfaces soft failures.** `Corpuses.tsx` now + inspects the resolved `setupCorpusIntelligence.ok` and shows the + "couldn't start" toast — an `ok=false` envelope was previously discarded, + leaving users to believe enrichment was running. +- **Setup warning toast names the actual failures.** The banner aggregates + `templates[].error` into the warning instead of a generic guess. +- **In-flight weave reported as started.** When an enrichment analysis is + already QUEUED/RUNNING, `CorpusIntelligenceSetupService.setup` + (`opencontractserver/corpuses/services/intelligence_setup.py`) no longer leaves + `reference_analysis_started=False` — the reference web *is* being built, so the + summary (and the banner toast's "reference web weaving" note) now reflects it + instead of silently omitting it. +- **`setup()` permission lookup collapsed to one IDOR-safe call.** The READ + `get_or_none` + separate `require_permission(CRUD)` pair became a single + `get_or_none(Corpus, …, PermissionTypes.CRUD)` (the canonical pattern + `status()` already uses) — no behavior change, but no longer a divergent + double-check. 
+- **Dedup/cleanup.** Template installs go through a single shared + `CorpusActionService.install_template` (dedupe fast-path, savepoint clone, + IntegrityError recovery, CRUD grant) used by both `AddTemplateToCorpus` and + the bundle; the enrichment analyzer lookup goes through the new lookup-only + `EnrichmentService.get_analyzer()` next to the converge logic; setup + prefetches bundle templates with `name__in` and derives + `total_active_documents` from the batch summary instead of a redundant + corpus-document count. diff --git a/changelog.d/1982-setup-templates-partial-success.fixed.md b/changelog.d/1982-setup-templates-partial-success.fixed.md new file mode 100644 index 000000000..b9978f9f4 --- /dev/null +++ b/changelog.d/1982-setup-templates-partial-success.fixed.md @@ -0,0 +1 @@ +- `CorpusIntelligenceSetupService._setup_templates` (`opencontractserver/corpuses/services/intelligence_setup.py`) now contains non-`IntegrityError` clone failures (e.g. `OperationalError`, `ValueError`) per template instead of letting them propagate out of the loop. Previously such a failure aborted the remaining templates and returned a 500 with earlier templates left half-installed; the bundle's graceful partial-success contract is now honored — the failing template records its error and the sweep continues. diff --git a/changelog.d/corpus-intelligence-setup.added.md b/changelog.d/corpus-intelligence-setup.added.md new file mode 100644 index 000000000..36329c50e --- /dev/null +++ b/changelog.d/corpus-intelligence-setup.added.md @@ -0,0 +1,31 @@ +- **One-click collection-intelligence setup** — the orchestration layer the + enrichment pieces were missing: nothing previously composed the + deterministic reference web with the LLM document enrichment at corpus + setup, so new corpora landed with unreadable document indexes (raw import + metadata as descriptions, 0% summary coverage) until each action was + manually added from the Action Library and batch-run. 
+ - `CorpusIntelligenceSetupService` + (`opencontractserver/corpuses/services/intelligence_setup.py`): + idempotent composite that installs the reference-enrichment + `add_document` action + starts the first weave, clones the + *Document Description Updater* and *Document Summary Generator* + templates (bundle pinned in + `opencontractserver/constants/corpus_actions.py` + `INTELLIGENCE_SETUP_TEMPLATE_NAMES`), and batch-runs each over every + document already in the corpus. Re-running converges: existing action + rows are reused, already-run documents are skipped, an in-flight + reference analysis is not duplicated. + - GraphQL: `setupCorpusIntelligence` mutation + + `corpusIntelligenceSetupStatus` query + (`config/graphql/corpus_mutations.py`, `corpus_queries.py`, + `corpus_types.py`); `createCorpus` now returns `objId` so follow-up + mutations can chain off creation. + - Frontend: `IntelligenceSetupBanner` + (`frontend/src/components/corpuses/CorpusHome/intelligence/`) renders a + setup CTA inside `IntelligencePanel` (so both the intelligence overview + and the `insight-panel` CAML embed surface it) and hides once the bundle + is installed; the New Corpus modal gains a default-on "Set up collection + intelligence" opt-in that chains the mutation after creation + (`CorpusModal.tsx`, `views/Corpuses.tsx`). + - Tests: `opencontractserver/tests/test_intelligence_setup.py` (service + + GraphQL), `frontend/tests/IntelligenceSetupBanner.ct.tsx`. 
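Reviewer note (not part of the patch): the convergence behaviour described above for `CorpusIntelligenceSetupService` (existing action rows are reused, already-run documents are skipped) can be sketched with a toy in-memory model. This is only an illustration under stated assumptions. `ToyCorpus`, `ToyOutcome`, and `toy_setup` are hypothetical names invented here, nothing below touches Django or the real service, and the toy treats a queued document as one that has already run. The outcome field names are borrowed from `IntelligenceTemplateOutcomeType` elsewhere in this diff.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCorpus:
    """Hypothetical stand-in for a corpus; not the real Django model."""

    document_ids: list[int]  # assumed unique
    installed_actions: set[str] = field(default_factory=set)
    # Maps action name -> ids of documents that action has already processed.
    already_run: dict[str, set[int]] = field(default_factory=dict)


@dataclass
class ToyOutcome:
    template_name: str
    installed_now: bool
    already_installed: bool
    queued_count: int
    skipped_already_run_count: int


def toy_setup(corpus: ToyCorpus, template_names: list[str]) -> list[ToyOutcome]:
    """Install each template at most once and queue only unseen documents."""
    outcomes = []
    for name in template_names:
        already_installed = name in corpus.installed_actions
        corpus.installed_actions.add(name)  # a repeat call reuses the entry

        done = corpus.already_run.setdefault(name, set())
        to_queue = [d for d in sorted(corpus.document_ids) if d not in done]
        skipped = len(corpus.document_ids) - len(to_queue)
        done.update(to_queue)  # toy assumption: queued means ran

        outcomes.append(
            ToyOutcome(
                template_name=name,
                installed_now=not already_installed,
                already_installed=already_installed,
                queued_count=len(to_queue),
                skipped_already_run_count=skipped,
            )
        )
    return outcomes


bundle = ["Document Description Updater", "Document Summary Generator"]
corpus = ToyCorpus(document_ids=[3, 1, 2])
first_run = toy_setup(corpus, bundle)
second_run = toy_setup(corpus, bundle)  # nothing new to install or queue
```

In this toy, a second call reports every template as already installed and queues nothing; adding a document afterwards and calling again queues only that document. The real service additionally handles the reference-enrichment action, permissions, and the per-call batch cap, none of which is modelled here.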
diff --git a/changelog.d/graphql-spec-validation.fixed.md b/changelog.d/graphql-spec-validation.fixed.md new file mode 100644 index 000000000..f8588dbb1 --- /dev/null +++ b/changelog.d/graphql-spec-validation.fixed.md @@ -0,0 +1,48 @@ +- **GraphQL spec validation restored on the served endpoint (security).** + ``GraphQLView(validation_rules=[DepthLimit…])`` REPLACED graphql-core's + spec rule set (that is ``validate()``'s semantics for an explicit rules + list), silently disabling every standard GraphQL validation — unknown + arguments/fields and variable-type checks — in all environments. Invalid + queries executed with the bogus parts ignored, which let ~26 invalid + frontend documents ship unnoticed (several backing silently-broken + features). ``config/graphql/schema.py`` now builds + ``[*specified_rules, DepthLimitValidationRule(, DisableIntrospection)]``, + pinned by ``test_security_hardening.TestServedValidationRulesIncludeSpecRules``. + Every shipped frontend document is now swept in CI by + ``opencontractserver/tests/architecture/test_frontend_graphql_documents.py`` + (ad-hoc: ``scripts/validate_frontend_graphql.py``; the sweep strips Apollo + ``@client`` selections and skips fragment-only/interpolated documents). 
+- **All 26 invalid frontend documents repaired**, including features that + could never have worked: ``deleteMetadataColumn`` and ``updateFieldset`` + were called by the UI but did not exist server-side (both now implemented + in ``config/graphql/extract_mutations.py`` via the BaseService + get_or_none/require_permission pattern with IDOR-unified messages); + ``GET_CORPUS_CHAT_MESSAGES`` used a misspelled argument + relay shape on a + plain list field (corpus chat history always loaded empty objects); + ``tokenAuth`` was schema-conditional on ``USE_AUTH0`` (now always the + ``WithUser`` payload, so the login document validates everywhere); the + document-by-id redirect selected the nonexistent ``DocumentType.corpus`` + (corpus context now sourced from the route's slug resolution where it + exists — the previous mock-only field meant graph-node click-throughs + always landed on standalone paths); dead ``ADD_DOCUMENT_TO_CORPUS`` + removed; plus variable-type (ID!/String!, JSONString/GenericScalar, + String/enum) and payload-field corrections across vote, thread-moderation, + research-report, TOC and corpus-list documents. +- **Presigned file URLs no longer outlive their signatures.** The AWS + settings branch derived the shared file-URL cache lifetime from + ``_AWS_EXPIRY`` (the stored objects' HTTP CacheControl max-age, 7 days) + instead of the presign lifetime (``AWS_QUERYSTRING_EXPIRE``, 1 hour), so + redis served dead 403 pdf/pawls/txt links for up to 5 hours. + ``AWS_QUERYSTRING_EXPIRE`` is now explicit, the cache TTL derives from it, + and ``clamp_shared_url_cache_ttl`` (``opencontractserver/utils/files.py``) + enforces TTL ≤ half the signature lifetime even against env overrides. 
+- **3-minute analysis-annotation responses fixed.** + ``UserFeedbackQuerySet.visible_to_user`` expressed annotation-inherited + visibility as ``commented_annotation_id__in=`` — an uncorrelated ``IN`` materialized over the entire + annotations table on every evaluation (~0.8s each; 216 pagination counts + made ``GetAnnotationsForAnalysis`` take ~176s for a 108-mention document). + Rewritten as a correlated ``Exists`` pinned to the feedback row's + annotation id — identical semantics (permissioning invariant suites pass), + measured 176s → 2.3s. Shape pinned by + ``test_feedback.TestVisibilityQueryShape``. diff --git a/changelog.d/pdf-inline-citations.fixed.md b/changelog.d/pdf-inline-citations.fixed.md new file mode 100644 index 000000000..b91253da8 --- /dev/null +++ b/changelog.d/pdf-inline-citations.fixed.md @@ -0,0 +1,48 @@ +- **Inline citations now render on PDF documents.** The enrichment writer + (`opencontractserver/enrichment/writer.py`) projected nothing visible on + PDFs: it stored every reference mention as a `SPAN_LABEL` char-offset + annotation, which the PDF viewer (token-indexed PAWLs renderer) cannot + paint — citations showed in the References panel but never inline on the + filings themselves. Mentions on PDF documents are now projected onto PAWLs + token bounding boxes via PlasmaPDF (`TOKEN_LABEL`, real page numbers + instead of the hardcoded `page=1`), with the char span preserved in + `data.char_span` for dedupe. Projection handles real-ingest drift between + `txt_extract_file` and the PAWLs text via whitespace-insensitive + ordinal-occurrence remapping (covers hard line-wraps and see-quoted SECTION + refs whose raw text extends left of the span start) and falls back to the + span representation when the mention text genuinely is not in the PAWLs + text. Re-running enrichment upgrades pre-fix span mentions **in place** + (same row — `CorpusReference`/`Relationship` FKs survive), so a corpus + re-enrich is also the backfill. 
+- **Shared span→token projection utility.** The format-aware document-text + loader and the PlasmaPDF span projection moved from private helpers in + `opencontractserver/utils/extraction_grounding.py` to + `opencontractserver/utils/span_projection.py` + (`load_document_text_and_layer`, `project_span_to_token_annotation`); + datacell grounding and the enrichment writer now share one implementation. +- **Reference-mention merge fixed and made usable.** + `useReferenceMentions` (frontend): (1) the analyses discovery query used + `analyses(corpusId:)` — an argument that does not exist in the schema and + was silently ignored (see validation gap below), so the hook swept every + enrichment analysis platform-wide; it now uses the real + `analyzedCorpusId` filter. (2) The per-analysis fetch used a `useLazyQuery` + handle re-executed in a loop, whose promise was observed never settling — + replaced with `client.query`. (3) The fetch now uses a lean + `GET_REFERENCE_MENTIONS_FOR_ANALYSIS` selection: the previous full + selection (per-annotation userFeedback / relationships / document / corpus) + measured **~176s** server-side for 108 mentions vs ~0s for the lean one. + Net effect: inline cites appear within seconds of opening a PDF. +- **Known issues surfaced during this work (deliberately NOT fixed here):** + (1) `GraphQLView(validation_rules=[DepthLimit…])` REPLACES graphql-core's + spec rule set, so standard GraphQL validation (unknown arguments/fields, + variable types) is disabled on the served endpoint; ~34 shipped frontend + documents currently fail spec validation (`scripts/validate_frontend_graphql.py` + enumerates them) — restoring `[*specified_rules, …]` must land with those + query fixes (documented in `config/graphql/schema.py`). (2) Presigned file + URLs are cached for `FILE_URL_SHARED_CACHE_TTL=21600`s while + `AWS_QUERYSTRING_EXPIRE` defaults to 3600s — cached links 403 for hours. 
+ (3) `Document.update_summary`'s docstring claims it updates + `md_summary_file` but it only writes a revision, so intelligence-panel + summary coverage stays 0%. (4) `add_document`-triggered agent actions + re-fire on agent-authored document writes (runaway agent loop / unbounded + LLM spend). diff --git a/config/graphql/corpus_mutations.py b/config/graphql/corpus_mutations.py index c94f45d14..489e72c0e 100644 --- a/config/graphql/corpus_mutations.py +++ b/config/graphql/corpus_mutations.py @@ -8,12 +8,13 @@ import graphene from django.conf import settings from django.core.exceptions import PermissionDenied -from django.db import DatabaseError, IntegrityError, transaction +from django.db import DatabaseError, transaction from django.utils import timezone from graphql_jwt.decorators import login_required, user_passes_test from graphql_relay import from_global_id, to_global_id from config.graphql.base import DRFDeletion, DRFMutation +from config.graphql.corpus_types import CorpusIntelligenceSetupSummaryType from config.graphql.graphene_types import ( CorpusActionExecutionType, CorpusActionType, @@ -1603,38 +1604,22 @@ def mutate(root, info, template_id: str, corpus_id: str) -> "AddTemplateToCorpus # Get the template (templates are global, no user filter needed) template = CorpusActionTemplate.objects.get(pk=template_pk, is_active=True) - # Fast-path duplicate check (avoids wasted clone + rollback). - # The unique constraint + IntegrityError catch below handles the - # race-condition window between this check and the insert. - if CorpusAction.objects.filter( - corpus=corpus, source_template=template - ).exists(): - return AddTemplateToCorpus( - ok=False, - message="This template has already been added to the corpus", - obj=None, - ) + # Shared install recipe (dedupe fast-path, savepoint-wrapped + # clone, IntegrityError race recovery, CRUD grant) — the same + # method the one-click intelligence setup uses, so the two + # install paths cannot drift. 
+ from opencontractserver.corpuses.services import CorpusActionService - # Clone the template into a CorpusAction. - # Wrap in a savepoint so that a race-condition IntegrityError - # does not abort the outer transaction (PostgreSQL requirement). - try: - with transaction.atomic(): - action = template.clone_to_corpus(corpus, creator=user) - except IntegrityError: + action, created = CorpusActionService.install_template( + user, corpus, template, request=info.context + ) + if not created: return AddTemplateToCorpus( ok=False, message="This template has already been added to the corpus", obj=None, ) - set_permissions_for_obj_to_user( - user, - action, - [PermissionTypes.CRUD], - request=info.context, - ) - return AddTemplateToCorpus( ok=True, message="Template added to corpus successfully", @@ -1655,6 +1640,52 @@ def mutate(root, info, template_id: str, corpus_id: str) -> "AddTemplateToCorpus ) +class SetupCorpusIntelligence(graphene.Mutation): + """One-click collection-intelligence setup. + + Composes the default enrichment bundle in a single idempotent call: + installs the reference-enrichment analyzer as an ``add_document`` action + and starts the first weave (deterministic), then clones the description + + summary action templates and batch-runs each over every document already + in the corpus (LLM). Safe to repeat — every step skips work that already + exists. Requires CRUD permission on the corpus — the tier + AddTemplateToCorpus and CreateCorpusAction gate the identical writes at. + """ + + class Arguments: + corpus_id = graphene.ID( + required=True, description="ID of the corpus to set up." 
+ ) + + ok = graphene.Boolean() + message = graphene.String() + summary = graphene.Field(CorpusIntelligenceSetupSummaryType) + + @login_required + @graphql_ratelimit(rate=RateLimits.WRITE_HEAVY) + def mutate(root, info, corpus_id: str) -> "SetupCorpusIntelligence": + from opencontractserver.corpuses.services import ( + CorpusIntelligenceSetupService, + ) + + failure_msg = "Corpus not found or you don't have permission." + try: + corpus_pk = int(from_global_id(corpus_id)[1]) + except Exception: + return SetupCorpusIntelligence(ok=False, message=failure_msg, summary=None) + + result = CorpusIntelligenceSetupService.setup( + info.context.user, corpus_pk, request=info.context + ) + if not result.ok: + return SetupCorpusIntelligence(ok=False, message=result.error, summary=None) + return SetupCorpusIntelligence( + ok=True, + message="Collection intelligence setup started.", + summary=result.value, + ) + + class ToggleCorpusMemory(graphene.Mutation): """ Toggle the agent memory system on/off for a corpus. diff --git a/config/graphql/corpus_queries.py b/config/graphql/corpus_queries.py index 30033a969..04598b819 100644 --- a/config/graphql/corpus_queries.py +++ b/config/graphql/corpus_queries.py @@ -17,6 +17,7 @@ CorpusDocumentGraphNodeType, CorpusDocumentGraphType, CorpusIntelligenceAggregatesType, + CorpusIntelligenceSetupStatusType, LabelDistributionEntryType, ) from config.graphql.filters import CorpusCategoryFilter, CorpusFilter @@ -304,6 +305,42 @@ def resolve_deleted_documents_in_corpus(self, info, corpus_id) -> Any: ) # CORPUS STATS RESOLVERS ##################################### + corpus_intelligence_setup_status = graphene.Field( + CorpusIntelligenceSetupStatusType, + corpus_id=graphene.ID(required=True), + description=( + "Which pieces of the default collection-intelligence bundle " + "(reference-web action + description/summary templates) are " + "already installed on the corpus. Null when the corpus is not " + "visible to the requesting user." 
+ ), + ) + + @graphql_ratelimit_dynamic(get_rate=get_user_tier_rate("READ_LIGHT")) + def resolve_corpus_intelligence_setup_status(self, info, corpus_id) -> Any: + """Visibility-scoped via ``CorpusIntelligenceSetupService.status``. + + Deliberately NOT ``@login_required``: the setup banner reads this on the + intelligence overview and the ``insight-panel`` CAML embed, both of which + anonymous users can reach for a public corpus. There is no privilege + escalation — ``status`` filters the corpus through ``visible_to_user`` + (returning ``None`` for an invisible corpus) and reports ``can_setup`` + from CRUD, which an anonymous user never has. Anonymous viewers of a + public corpus therefore see read-only status and no actionable button. + """ + from opencontractserver.corpuses.services import ( + CorpusIntelligenceSetupService, + ) + + try: + corpus_pk = int(from_global_id(corpus_id)[1]) + except Exception: + return None + result = CorpusIntelligenceSetupService.status( + info.context.user, corpus_pk, request=info.context + ) + return result.value if result.ok else None + corpus_stats = graphene.Field(CorpusStatsType, corpus_id=graphene.ID(required=True)) @graphql_ratelimit_dynamic(get_rate=get_user_tier_rate("READ_MEDIUM")) diff --git a/config/graphql/corpus_types.py b/config/graphql/corpus_types.py index b578e8f82..4b1e5ba16 100644 --- a/config/graphql/corpus_types.py +++ b/config/graphql/corpus_types.py @@ -956,3 +956,86 @@ def resolve_created(self, info) -> Any: """Document creation timestamp — historical revisions used the same field name.""" return self.created + + +class IntelligenceTemplateOutcomeType(graphene.ObjectType): + """Per-template result from the one-click intelligence setup.""" + + template_name = graphene.String(required=True) + installed_now = graphene.Boolean( + required=True, description="Template was cloned into the corpus by this call." 
+ ) + already_installed = graphene.Boolean( + required=True, description="The corpus already had this template's action." + ) + queued_count = graphene.Int( + required=True, description="Documents queued for an agent run by this call." + ) + skipped_already_run_count = graphene.Int( + required=True, description="Documents skipped because they already ran." + ) + error = graphene.String( + required=True, + description="Per-template failure (empty string when the step succeeded).", + ) + remaining_count = graphene.Int( + required=True, + description=( + "Documents deferred past the per-call batch cap — re-run setup " + "(or wait for the add_document trigger) to process them." + ), + ) + + +class CorpusIntelligenceSetupSummaryType(graphene.ObjectType): + """Result envelope for ``setupCorpusIntelligence``. + + Mirrors ``IntelligenceSetupSummary`` from + ``opencontractserver.corpuses.services.intelligence_setup`` — graphene's + default resolver reads the dataclass attributes directly. + """ + + reference_available = graphene.Boolean( + required=True, + description="The reference-enrichment analyzer is registered on this deployment.", + ) + reference_action_installed_now = graphene.Boolean(required=True) + reference_action_already_installed = graphene.Boolean(required=True) + reference_analysis_started = graphene.Boolean( + required=True, description="An immediate reference-web weave was started." 
+ ) + total_active_documents = graphene.Int(required=True) + templates = graphene.List( + graphene.NonNull(IntelligenceTemplateOutcomeType), required=True + ) + + +class CorpusIntelligenceSetupStatusType(graphene.ObjectType): + """Which intelligence-bundle pieces a corpus already has installed.""" + + reference_available = graphene.Boolean( + required=True, + description="The reference-enrichment analyzer is registered on this deployment.", + ) + reference_action_installed = graphene.Boolean(required=True) + installed_template_names = graphene.List( + graphene.NonNull(graphene.String), required=True + ) + missing_template_names = graphene.List( + graphene.NonNull(graphene.String), required=True + ) + is_fully_set_up = graphene.Boolean( + required=True, + description=( + "Every deployment-installable bundle piece is installed " + "(unavailable pieces — unregistered analyzer, inactive template — " + "are excluded)." + ), + ) + can_setup = graphene.Boolean( + required=True, + description=( + "The requesting user holds the permission setupCorpusIntelligence " + "requires (CRUD) — drives the setup CTA's visibility." + ), + ) diff --git a/config/graphql/extract_mutations.py b/config/graphql/extract_mutations.py index 74fcd69c2..294570d47 100644 --- a/config/graphql/extract_mutations.py +++ b/config/graphql/extract_mutations.py @@ -355,6 +355,59 @@ def mutate(root, info, column_id, **kwargs) -> "UpdateMetadataColumn": ) +class DeleteMetadataColumn(graphene.Mutation): + """Delete a manual-entry metadata column definition (values cascade).""" + + class Arguments: + column_id = graphene.ID(required=True) + + ok = graphene.Boolean() + message = graphene.String() + + @login_required + def mutate(root, info, column_id) -> "DeleteMetadataColumn": + from opencontractserver.types.enums import PermissionTypes + + # Unified message blocks IDOR enumeration: same response whether the + # column does not exist or the caller lacks DELETE permission. 
+ not_found_msg = "Column not found or you do not have permission to delete it." + + try: + user = info.context.user + column = BaseService.get_or_none( + Column, from_global_id(column_id)[1], user, request=info.context + ) + # require_permission returns "" on grant and a non-empty error + # string on denial, so a truthy result means "denied". Guard the + # None case first to avoid calling require_permission on a missing + # object. + if column is None: + return DeleteMetadataColumn(ok=False, message=not_found_msg) + if BaseService.require_permission( + column, user, PermissionTypes.DELETE, request=info.context + ): + return DeleteMetadataColumn(ok=False, message=not_found_msg) + + # Mirrors UpdateMetadataColumn: only manual-entry (metadata) + # columns are managed through this surface — extract columns + # have their own lifecycle (DeleteColumn). + if not column.is_manual_entry: + return DeleteMetadataColumn( + ok=False, message="Only manual entry columns can be deleted" + ) + + column.delete() + return DeleteMetadataColumn( + ok=True, message="Metadata field deleted successfully" + ) + + except Exception: + logger.exception("Error deleting metadata field") + return DeleteMetadataColumn( + ok=False, message="Error deleting metadata field." + ) + + class SetMetadataValue(graphene.Mutation): """Set a metadata value for a document. 
@@ -541,6 +594,55 @@ def mutate(root, info, name, description) -> "CreateFieldset": return CreateFieldset(ok=True, message="SUCCESS!", obj=fieldset) +class UpdateFieldset(graphene.Mutation): + """Rename / re-describe a fieldset the caller may UPDATE.""" + + class Arguments: + id = graphene.ID(required=True) + name = graphene.String(required=False) + description = graphene.String(required=False) + + ok = graphene.Boolean() + message = graphene.String() + obj = graphene.Field(FieldsetType) + + @login_required + def mutate(root, info, id, name=None, description=None) -> "UpdateFieldset": + from opencontractserver.types.enums import PermissionTypes + + # Unified message blocks IDOR enumeration: same response whether the + # fieldset does not exist or the caller lacks UPDATE permission. + not_found_msg = "Fieldset not found or you do not have permission to update it." + + try: + user = info.context.user + fieldset = BaseService.get_or_none( + Fieldset, from_global_id(id)[1], user, request=info.context + ) + # require_permission returns "" on grant and a non-empty error + # string on denial, so a truthy result means "denied". Guard the + # None case first to avoid calling require_permission on a missing + # object. 
+ if fieldset is None: + return UpdateFieldset(ok=False, message=not_found_msg) + if BaseService.require_permission( + fieldset, user, PermissionTypes.UPDATE, request=info.context + ): + return UpdateFieldset(ok=False, message=not_found_msg) + + if name is not None: + fieldset.name = name + if description is not None: + fieldset.description = description + fieldset.save() + + return UpdateFieldset(ok=True, message="SUCCESS!", obj=fieldset) + + except Exception: + logger.exception("Error updating fieldset") + return UpdateFieldset(ok=False, message="Error updating fieldset.") + + class UpdateColumnMutation(DRFMutation): class Arguments: name = graphene.String(required=False) diff --git a/config/graphql/mutations.py b/config/graphql/mutations.py index 09c1a42eb..abc9a043d 100644 --- a/config/graphql/mutations.py +++ b/config/graphql/mutations.py @@ -7,7 +7,6 @@ import graphene import graphql_jwt -from django.conf import settings # Import agent mutations from config.graphql.agent_mutations import ( @@ -93,6 +92,7 @@ RemoveDocumentsFromCorpus, RunCorpusAction, SetCorpusVisibility, + SetupCorpusIntelligence, StartCorpusActionBatchRun, StartCorpusFork, ToggleCorpusMemory, @@ -139,6 +139,7 @@ CreateMetadataColumn, DeleteColumn, DeleteExtract, + DeleteMetadataColumn, DeleteMetadataValue, EditDatacell, RejectDatacell, @@ -148,6 +149,7 @@ StartExtract, UpdateColumnMutation, UpdateExtractMutation, + UpdateFieldset, UpdateMetadataColumn, ) @@ -244,11 +246,15 @@ class Mutation(graphene.ObjectType): - # TOKEN MUTATIONS (IF WE'RE NOT OUTSOURCING JWT CREATION TO AUTH0) ####### - if not settings.USE_AUTH0: - token_auth = ObtainJSONWebTokenWithUser.Field() - else: - token_auth = graphql_jwt.ObtainJSONWebToken.Field() + # TOKEN MUTATIONS ######################################################### + # Always the ``WithUser`` payload: gating the field's TYPE on USE_AUTH0 + # made the schema change shape per deployment, so the frontend's + # LOGIN_MUTATION (which selects ``user``) was 
schema-INVALID on Auth0 + # deployments — harmless only while spec validation was disabled. Under + # Auth0 this mutation is simply never used (the frontend gates login on + # REACT_APP_USE_AUTH0, and password auth is rejected by the backends), + # but it stays schema-valid everywhere. + token_auth = ObtainJSONWebTokenWithUser.Field() verify_token = graphql_jwt.Verify.Field() refresh_token = graphql_jwt.Refresh.Field() @@ -334,6 +340,7 @@ class Mutation(graphene.ObjectType): run_corpus_action = RunCorpusAction.Field() start_corpus_action_batch_run = StartCorpusActionBatchRun.Field() add_template_to_corpus = AddTemplateToCorpus.Field() + setup_corpus_intelligence = SetupCorpusIntelligence.Field() toggle_corpus_memory = ToggleCorpusMemory.Field() # CORPUS CATEGORY MUTATIONS (superuser-only) ############################### @@ -371,6 +378,7 @@ class Mutation(graphene.ObjectType): # EXTRACT MUTATIONS ########################################################## create_fieldset = CreateFieldset.Field() + update_fieldset = UpdateFieldset.Field() create_column = CreateColumn.Field() update_column = UpdateColumnMutation.Field() @@ -394,6 +402,7 @@ class Mutation(graphene.ObjectType): # NEW METADATA MUTATIONS (Column/Datacell based) ################################ create_metadata_column = CreateMetadataColumn.Field() update_metadata_column = UpdateMetadataColumn.Field() + delete_metadata_column = DeleteMetadataColumn.Field() set_metadata_value = SetMetadataValue.Field() delete_metadata_value = DeleteMetadataValue.Field() diff --git a/config/graphql/schema.py b/config/graphql/schema.py index da2594e63..8d8fefb8b 100644 --- a/config/graphql/schema.py +++ b/config/graphql/schema.py @@ -1,15 +1,29 @@ import graphene from django.conf import settings +from graphql.validation import specified_rules from config.graphql.mutations import Mutation from config.graphql.queries import Query from config.graphql.security import DepthLimitValidationRule, DisableIntrospection -# Build 
validation rules: always enforce depth limits, disable introspection -# in production. +# Build validation rules: the FULL GraphQL spec rule set, plus depth limiting +# always and introspection disabling in production. +# +# The spec rules MUST be listed explicitly: graphql-core's +# ``validate(schema, document, rules)`` REPLACES the default rule set when +# ``rules`` is provided. Passing only the custom hardening rules silently +# disabled every standard validation (unknown arguments/fields, variable +# type checks, ...) on the served endpoint — invalid queries executed with +# the bogus parts ignored instead of erroring, which let ~26 invalid +# frontend documents ship unnoticed. Pinned by +# ``test_security_hardening.TestServedValidationRulesIncludeSpecRules``; the +# frontend documents themselves are swept by +# ``tests/architecture/test_frontend_graphql_documents.py`` (and +# ``scripts/validate_frontend_graphql.py`` for ad-hoc runs). +# # NOTE: This list is built at import time. Tests that override settings.DEBUG # after import must use graphql-core's validate() directly with the rule classes. -validation_rules: list = [DepthLimitValidationRule] +validation_rules: list = [*specified_rules, DepthLimitValidationRule] if not settings.DEBUG: validation_rules.append(DisableIntrospection) diff --git a/config/settings/base.py b/config/settings/base.py index 629d230f3..717ac6be8 100644 --- a/config/settings/base.py +++ b/config/settings/base.py @@ -393,7 +393,14 @@ AWS_STORAGE_BUCKET_NAME = env("AWS_STORAGE_BUCKET_NAME", default="dummy-bucket") # https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html#settings AWS_QUERYSTRING_AUTH = True + # Presigned-URL signature lifetime (seconds). Made explicit (rather than + # relying on django-storages' implicit 3600 default) because the shared + # file-URL cache TTL below MUST be derived from it. 
+ AWS_QUERYSTRING_EXPIRE = env.int("AWS_QUERYSTRING_EXPIRE", default=3600) # DO NOT change these unless you know what you're doing. + # NOTE: this is the HTTP CacheControl max-age for the stored OBJECTS — + # it has nothing to do with how long presigned URLs stay valid (that is + # AWS_QUERYSTRING_EXPIRE above). _AWS_EXPIRY = 60 * 60 * 24 * 7 # https://django-storages.readthedocs.io/en/latest/backends/amazon-S3.html#settings AWS_S3_OBJECT_PARAMETERS = { @@ -533,15 +540,28 @@ # window. The TTL is held well under the signed-URL lifetime so a cached URL is # always served with ample validity remaining. 0 disables the shared cache # (LOCAL storage URLs are relative + free; only the per-request memo applies). +from opencontractserver.utils.files import ( # noqa: E402 (pure helper, no app/model imports at module level) + clamp_shared_url_cache_ttl as _clamp_shared_url_cache_ttl, +) + if STORAGE_BACKEND == "GCP": _signed_url_lifetime_seconds = int(GS_EXPIRATION.total_seconds()) elif STORAGE_BACKEND == "AWS": - _signed_url_lifetime_seconds = _AWS_EXPIRY + # The PRESIGN lifetime (AWS_QUERYSTRING_EXPIRE) — NOT ``_AWS_EXPIRY``, + # which is the stored objects' HTTP CacheControl max-age (7 days) and + # says nothing about signature validity. Deriving from the wrong value + # let this cache serve dead (403) links for up to 5 hours. + _signed_url_lifetime_seconds = AWS_QUERYSTRING_EXPIRE else: _signed_url_lifetime_seconds = 0 -FILE_URL_SHARED_CACHE_TTL = env.int( - "FILE_URL_SHARED_CACHE_TTL", - default=max(0, min(_signed_url_lifetime_seconds // 2, 6 * 60 * 60)), +# Clamped even when set explicitly via env: a TTL beyond half the signature +# lifetime can only ever serve expired links. 
+FILE_URL_SHARED_CACHE_TTL = _clamp_shared_url_cache_ttl( + env.int( + "FILE_URL_SHARED_CACHE_TTL", + default=max(0, min(_signed_url_lifetime_seconds // 2, 6 * 60 * 60)), + ), + _signed_url_lifetime_seconds, ) # Max concurrent signBlob round trips when ``FileUrlPrewarmMiddleware`` pre-signs diff --git a/docs/assets/images/gifs/demo-1-create-and-setup.gif b/docs/assets/images/gifs/demo-1-create-and-setup.gif new file mode 100644 index 000000000..8f7e0a28b Binary files /dev/null and b/docs/assets/images/gifs/demo-1-create-and-setup.gif differ diff --git a/docs/assets/images/gifs/demo-2-explore-and-ask.gif b/docs/assets/images/gifs/demo-2-explore-and-ask.gif new file mode 100644 index 000000000..8c427d596 Binary files /dev/null and b/docs/assets/images/gifs/demo-2-explore-and-ask.gif differ diff --git a/docs/assets/images/screenshots/auto/corpus--governance-graph--with-data.png b/docs/assets/images/screenshots/auto/corpus--governance-graph--with-data.png new file mode 100644 index 000000000..903e364f8 Binary files /dev/null and b/docs/assets/images/screenshots/auto/corpus--governance-graph--with-data.png differ diff --git a/docs/assets/images/screenshots/auto/corpus--intelligence-overview--with-data.png b/docs/assets/images/screenshots/auto/corpus--intelligence-overview--with-data.png new file mode 100644 index 000000000..a3c989466 Binary files /dev/null and b/docs/assets/images/screenshots/auto/corpus--intelligence-overview--with-data.png differ diff --git a/frontend/src/__tests__/id-based-navigation.test.tsx b/frontend/src/__tests__/id-based-navigation.test.tsx index 7be752c1e..af7dd0035 100644 --- a/frontend/src/__tests__/id-based-navigation.test.tsx +++ b/frontend/src/__tests__/id-based-navigation.test.tsx @@ -339,11 +339,16 @@ describe("ID-based Navigation", () => { ); await waitFor(() => { - // Should redirect to canonical URL with corpus context - expect(mockNavigate).toHaveBeenCalledWith( - "/d/john-doe/test-corpus/test-document", - { replace: true } - ); + // 
Slug resolution failed entirely (corpusBySlugs is null in the mock + // above), so no corpus context survives — the ID fallback redirects + // to the document's STANDALONE canonical path. DocumentType exposes + // no `corpus` field (documents relate to corpora via paths); the old + // corpus-context expectation here was sourced from a mock-only field + // the server never returned. The corpus-context variant of this + // redirect is taken when corpusBySlugs resolves. + expect(mockNavigate).toHaveBeenCalledWith("/d/john-doe/test-document", { + replace: true, + }); }); }); }); diff --git a/frontend/src/components/admin/global_agent_management.graphql.ts b/frontend/src/components/admin/global_agent_management.graphql.ts index cc25172ca..6c97409f6 100644 --- a/frontend/src/components/admin/global_agent_management.graphql.ts +++ b/frontend/src/components/admin/global_agent_management.graphql.ts @@ -52,7 +52,7 @@ export const CREATE_GLOBAL_AGENT_CONFIGURATION = gql` $systemInstructions: String! $availableTools: [String] $permissionRequiredTools: [String] - $badgeConfig: JSONString + $badgeConfig: GenericScalar $avatarUrl: String $scope: String! 
$isPublic: Boolean @@ -88,7 +88,7 @@ export const UPDATE_GLOBAL_AGENT_CONFIGURATION = gql` $systemInstructions: String $availableTools: [String] $permissionRequiredTools: [String] - $badgeConfig: JSONString + $badgeConfig: GenericScalar $avatarUrl: String $isActive: Boolean $isPublic: Boolean diff --git a/frontend/src/components/cookies/CookieConsent.tsx b/frontend/src/components/cookies/CookieConsent.tsx index fa5c66ea8..bd6ddbbd4 100644 --- a/frontend/src/components/cookies/CookieConsent.tsx +++ b/frontend/src/components/cookies/CookieConsent.tsx @@ -472,9 +472,7 @@ export const CookieConsentDialog = () => { setAnalyticsConsent(true); showCookieAcceptModal(false); } else { - toast.error( - `Failed to record consent: ${data.acceptCookieConsent.message}` - ); + toast.error("Failed to record consent"); // Still close the modal and set localStorage as fallback localStorage.setItem("oc_cookieAccepted", "true"); setAnalyticsConsent(true); diff --git a/frontend/src/components/corpuses/CorpusHome/intelligence/GovernanceGraphLive.tsx b/frontend/src/components/corpuses/CorpusHome/intelligence/GovernanceGraphLive.tsx index 197d19b5b..1a2bdcb84 100644 --- a/frontend/src/components/corpuses/CorpusHome/intelligence/GovernanceGraphLive.tsx +++ b/frontend/src/components/corpuses/CorpusHome/intelligence/GovernanceGraphLive.tsx @@ -13,9 +13,12 @@ import { import { GET_GOVERNANCE_GRAPH, GET_ANALYZERS_FOR_ENRICHMENT, + GET_CORPUS_INTELLIGENCE_SETUP_STATUS, GetGovernanceGraphInputType, GetGovernanceGraphOutputType, GetAnalyzersForEnrichmentOutputType, + GetCorpusIntelligenceSetupStatusInputType, + GetCorpusIntelligenceSetupStatusOutputType, GovernanceGraphNode, GovernanceGraphEdge, } from "../../../../graphql/queries"; @@ -166,6 +169,10 @@ export const GovernanceGraphLive: React.FC = ({ GET_ANALYZERS_FOR_ENRICHMENT, { fetchPolicy: "network-only" } ); + const [fetchSetupStatus] = useLazyQuery< + GetCorpusIntelligenceSetupStatusOutputType, + GetCorpusIntelligenceSetupStatusInputType + 
>(GET_CORPUS_INTELLIGENCE_SETUP_STATUS, { fetchPolicy: "network-only" }); const [startAnalysis] = useMutation( START_ANALYSIS ); @@ -204,23 +211,33 @@ export const GovernanceGraphLive: React.FC = ({ return; } - // Keep the web growing: install the add_document action. A failure - // here (e.g. collaborator without update rights) shouldn't abort the - // already-running first weave — surface it softly instead. - try { - await createCorpusAction({ - variables: { - corpusId, - trigger: "add_document", - analyzerId: analyzer.id, - name: "Reference enrichment (auto)", - }, - }); - } catch { - toast.info( - "Mapping started — but the keep-it-updated action couldn't be " + - "installed (you may need edit rights on this corpus)." - ); + // Keep the web growing: install the add_document action — unless the + // corpus already has one (e.g. installed by one-click intelligence + // setup); a second row would run the analyzer twice on every future + // upload. A failure here (e.g. collaborator without edit rights) + // shouldn't abort the already-running first weave — surface it softly. + const { data: setupStatusData } = await fetchSetupStatus({ + variables: { corpusId }, + }); + const actionAlreadyInstalled = + !!setupStatusData?.corpusIntelligenceSetupStatus + ?.referenceActionInstalled; + if (!actionAlreadyInstalled) { + try { + await createCorpusAction({ + variables: { + corpusId, + trigger: "add_document", + analyzerId: analyzer.id, + name: "Reference enrichment (auto)", + }, + }); + } catch { + toast.info( + "Mapping started — but the keep-it-updated action couldn't be " + + "installed (you may need edit rights on this corpus)." 
+ ); + } } setWeaving(true); @@ -234,6 +251,7 @@ export const GovernanceGraphLive: React.FC = ({ }, [ corpusId, fetchAnalyzers, + fetchSetupStatus, startAnalysis, createCorpusAction, startPolling, diff --git a/frontend/src/components/corpuses/CorpusHome/intelligence/IntelligencePanel.tsx b/frontend/src/components/corpuses/CorpusHome/intelligence/IntelligencePanel.tsx index ab9ced474..81a478174 100644 --- a/frontend/src/components/corpuses/CorpusHome/intelligence/IntelligencePanel.tsx +++ b/frontend/src/components/corpuses/CorpusHome/intelligence/IntelligencePanel.tsx @@ -15,6 +15,7 @@ import { GetCorpusIntelligenceAggregatesInputType, GetCorpusIntelligenceAggregatesOutputType, } from "../../../../graphql/queries"; +import { IntelligenceSetupBanner } from "./IntelligenceSetupBanner"; /** * IntelligencePanel — the insight-framed "at a glance" panel of the Corpus @@ -244,6 +245,10 @@ export const IntelligencePanel: React.FC = ({ return ( + {/* One-click bundle setup — silent once the corpus is fully set up. + Mounted here so both the overview and the insight-panel CAML embed + surface it. */} + {statsInitialLoading ? 
( <> diff --git a/frontend/src/components/corpuses/CorpusHome/intelligence/IntelligenceSetupBanner.tsx b/frontend/src/components/corpuses/CorpusHome/intelligence/IntelligenceSetupBanner.tsx new file mode 100644 index 000000000..4094bbaee --- /dev/null +++ b/frontend/src/components/corpuses/CorpusHome/intelligence/IntelligenceSetupBanner.tsx @@ -0,0 +1,211 @@ +import React, { useCallback, useState } from "react"; +import { useMutation, useQuery } from "@apollo/client"; +import styled, { keyframes } from "styled-components"; +import { Loader2, Sparkles } from "lucide-react"; +import { toast } from "react-toastify"; + +import { OS_LEGAL_COLORS } from "../../../../assets/configurations/osLegalStyles"; +import { + GET_CORPUS_INTELLIGENCE_SETUP_STATUS, + GetCorpusIntelligenceSetupStatusInputType, + GetCorpusIntelligenceSetupStatusOutputType, +} from "../../../../graphql/queries"; +import { + SETUP_CORPUS_INTELLIGENCE, + SetupCorpusIntelligenceInputs, + SetupCorpusIntelligenceOutputs, +} from "../../../../graphql/mutations"; + +/** + * IntelligenceSetupBanner — the one-click "set up collection intelligence" + * entry point. + * + * Renders a slim call-to-action when the corpus is missing pieces of the + * default intelligence bundle (reference-web action, document descriptions, + * document summaries) and nothing at all once the bundle is installed — + * a fully set-up corpus shouldn't advertise setup. Mounted inside + * ``IntelligencePanel`` so every surface that shows the at-a-glance panel + * (intelligence overview and the ``insight-panel`` CAML embed) gets it. + * + * The mutation is idempotent server-side: it installs whatever is missing, + * starts the first reference weave, and batch-runs the description/summary + * agents over every document already in the corpus. 
+ */ + +interface IntelligenceSetupBannerProps { + corpusId: string; + testId?: string; +} + +const spin = keyframes` + to { transform: rotate(360deg); } +`; + +const Banner = styled.div` + display: flex; + align-items: center; + justify-content: space-between; + flex-wrap: wrap; + gap: 0.75rem; + padding: 0.85rem 1.25rem; + background: linear-gradient( + 100deg, + rgba(37, 99, 235, 0.06), + rgba(201, 164, 92, 0.08) + ); + border: 1px solid ${OS_LEGAL_COLORS.border}; + border-radius: 14px; +`; + +const BannerText = styled.div` + display: flex; + align-items: center; + gap: 0.6rem; + font-size: 0.8125rem; + color: ${OS_LEGAL_COLORS.textSecondary}; + + svg { + flex-shrink: 0; + width: 16px; + height: 16px; + color: ${OS_LEGAL_COLORS.primaryBlue}; + } + + strong { + color: ${OS_LEGAL_COLORS.textPrimary}; + font-weight: 600; + } +`; + +const SetupButton = styled.button` + display: inline-flex; + align-items: center; + gap: 0.45rem; + padding: 0.5rem 1rem; + border: none; + border-radius: 10px; + background: ${OS_LEGAL_COLORS.primaryBlue}; + color: white; + font-size: 0.8125rem; + font-weight: 600; + cursor: pointer; + transition: background 0.15s ease; + + &:hover { + background: ${OS_LEGAL_COLORS.primaryBlueHover}; + } + + &:disabled { + opacity: 0.7; + cursor: default; + } + + svg { + width: 14px; + height: 14px; + } + + .spinning { + animation: ${spin} 1.1s linear infinite; + } +`; + +export const IntelligenceSetupBanner: React.FC< + IntelligenceSetupBannerProps +> = ({ corpusId, testId = "intelligence-setup-banner" }) => { + const [submitting, setSubmitting] = useState(false); + + const { data, refetch } = useQuery< + GetCorpusIntelligenceSetupStatusOutputType, + GetCorpusIntelligenceSetupStatusInputType + >(GET_CORPUS_INTELLIGENCE_SETUP_STATUS, { variables: { corpusId } }); + + const [setupIntelligence] = useMutation< + SetupCorpusIntelligenceOutputs, + SetupCorpusIntelligenceInputs + >(SETUP_CORPUS_INTELLIGENCE); + + const handleSetup = useCallback(async () => { 
+ setSubmitting(true); + try { + const { data: result } = await setupIntelligence({ + variables: { corpusId }, + }); + const payload = result?.setupCorpusIntelligence; + if (!payload?.ok) { + toast.error( + payload?.message || "Couldn't set up collection intelligence." + ); + return; + } + const templates = payload.summary?.templates ?? []; + const queued = templates.reduce((sum, t) => sum + t.queuedCount, 0); + const remaining = templates.reduce( + (sum, t) => sum + (t.remainingCount ?? 0), + 0 + ); + const templateErrors = templates + .filter((t) => t.error) + .map((t) => `${t.templateName}: ${t.error}`); + if (queued > 0) { + toast.success( + `Setting up — ${queued} document enrichment ${ + queued === 1 ? "run" : "runs" + } queued${ + payload.summary?.referenceAnalysisStarted + ? ", reference web weaving" + : "" + }${ + remaining > 0 + ? `; ${remaining} more deferred past the per-run cap — re-run later to continue` + : "" + }.` + ); + } else if (templateErrors.length > 0) { + // ok=True but nothing queued and a template carried an error — don't + // claim it's fully set up; surface the actual failures. + toast.warning( + "Collection intelligence installed, but some document runs " + + `couldn't be queued — ${templateErrors.join("; ")}` + ); + } else { + toast.success("Collection intelligence is set up."); + } + // Status flips to fully-set-up as soon as the actions exist, which + // hides the banner — the per-surface panels (summary coverage, + // governance graph) show the enrichment landing as it completes. + await refetch(); + } catch { + toast.error("Couldn't set up collection intelligence."); + } finally { + setSubmitting(false); + } + }, [corpusId, refetch, setupIntelligence]); + + const status = data?.corpusIntelligenceSetupStatus; + // Silent while loading, on error, once fully set up, and for viewers who + // can't run setup (canSetup mirrors the mutation's CRUD gate — rendering + // the CTA for them would offer a guaranteed-to-fail click). 
The banner + // only exists to offer setup, never to report state. + if (!status || status.isFullySetUp || !status.canSetup) return null; + + return ( + + + + + Set up collection intelligence — map the reference + web, then describe and summarize every document automatically. + + + + {submitting ? : } + Set up + + + ); +}; diff --git a/frontend/src/components/corpuses/CorpusMapView.tsx b/frontend/src/components/corpuses/CorpusMapView.tsx index b0ee76792..2e0ac1db6 100644 --- a/frontend/src/components/corpuses/CorpusMapView.tsx +++ b/frontend/src/components/corpuses/CorpusMapView.tsx @@ -181,7 +181,7 @@ export const CorpusMapView: React.FC = ({ if (!resolvedDoc) { return; } - const url = getDocumentUrl(resolvedDoc, resolvedDoc.corpus); + const url = getDocumentUrl(resolvedDoc); if (url !== "#") { navigate(url); } diff --git a/frontend/src/components/corpuses/CorpusModal.tsx b/frontend/src/components/corpuses/CorpusModal.tsx index 049039dcb..669822447 100644 --- a/frontend/src/components/corpuses/CorpusModal.tsx +++ b/frontend/src/components/corpuses/CorpusModal.tsx @@ -7,6 +7,7 @@ import { ModalBody, ModalFooter, Button, + Checkbox, Input, Textarea, Spinner, @@ -41,6 +42,9 @@ export type CorpusModalMode = "CREATE" | "EDIT" | "VIEW"; export interface CorpusFormData { id?: string; + /** Create mode only: run the one-click collection-intelligence setup + * (reference web + document descriptions/summaries) after creation. */ + setupIntelligence?: boolean; title?: string; slug?: string; description?: string; @@ -362,6 +366,9 @@ export const CorpusModal: React.FC = ({ // Form state const [title, setTitle] = useState(""); + // Create-mode opt-in for the post-create intelligence setup (default on — + // the recommended path; the agent runs scale with document count). 
+ const [setupIntelligence, setSetupIntelligence] = useState(true); const [slug, setSlug] = useState(""); const [description, setDescription] = useState(""); const [icon, setIcon] = useState(null); @@ -619,6 +626,7 @@ export const CorpusModal: React.FC = ({ formData.categories = categories; formData.license = license; formData.licenseLink = licenseLink; + formData.setupIntelligence = setupIntelligence; } onSubmit(formData); @@ -637,6 +645,7 @@ export const CorpusModal: React.FC = ({ categories, license, licenseLink, + setupIntelligence, ]); // Get header text based on mode @@ -840,6 +849,21 @@ export const CorpusModal: React.FC = ({ upward /> + + {isCreate && ( + + ) => + setSetupIntelligence(e.target.checked) + } + disabled={loading} + label="Set up collection intelligence — map the reference web and auto-describe/summarize documents as they arrive" + /> + + )} diff --git a/frontend/src/components/documents/VersionHistoryPanel.tsx b/frontend/src/components/documents/VersionHistoryPanel.tsx index e6c85bd61..f3640af0a 100644 --- a/frontend/src/components/documents/VersionHistoryPanel.tsx +++ b/frontend/src/components/documents/VersionHistoryPanel.tsx @@ -56,7 +56,7 @@ export const GET_DOCUMENT_VERSION_HISTORY = gql` // GraphQL mutation for restoring a document to a previous version export const RESTORE_DOCUMENT_TO_VERSION = gql` - mutation RestoreDocumentToVersion($documentId: ID!, $corpusId: ID!) { + mutation RestoreDocumentToVersion($documentId: String!, $corpusId: String!) 
{ restoreDocumentToVersion(documentId: $documentId, corpusId: $corpusId) { ok message diff --git a/frontend/src/components/knowledge_base/document/document_kb/useReferenceMentions.ts b/frontend/src/components/knowledge_base/document/document_kb/useReferenceMentions.ts index 768e2b301..961801620 100644 --- a/frontend/src/components/knowledge_base/document/document_kb/useReferenceMentions.ts +++ b/frontend/src/components/knowledge_base/document/document_kb/useReferenceMentions.ts @@ -1,9 +1,9 @@ import { useEffect, useRef } from "react"; -import { useLazyQuery } from "@apollo/client"; +import { useApolloClient } from "@apollo/client"; import { GET_ANALYSES_FOR_CORPUS_ENRICHMENT, - GET_ANNOTATIONS_FOR_ANALYSIS, + GET_REFERENCE_MENTIONS_FOR_ANALYSIS, GetAnalysesForCorpusEnrichmentInputType, GetAnalysesForCorpusEnrichmentOutputType, } from "../../../../graphql/queries"; @@ -25,11 +25,13 @@ import { convertToServerAnnotation } from "../../../../utils/transform"; * cross-references to be visible and clickable without hunting for an * analysis selector. 
* - * Flow: discover the corpus's reference-enrichment Analyses (matched on - * `analyzer.taskName` — the corpus filter on the analyses query is loose, so - * every matching analysis is tried; `fullAnnotationList(documentId)` scopes - * each to this document), fetch their annotations via the existing - * per-analysis query (which enforces analysis visibility server-side), and + * Flow: discover the corpus's reference-enrichment Analyses (scoped + * server-side via `analyses(analyzedCorpusId:)` and matched on + * `analyzer.taskName`; `fullAnnotationList(documentId)` scopes + * each to this document), fetch their annotations via the LEAN + * per-analysis query (analysis visibility is enforced server-side; the full + * GET_ANNOTATIONS_FOR_ANALYSIS selection costs minutes for ~100 mentions — + * see GET_REFERENCE_MENTIONS_FOR_ANALYSIS), and * merge them into `pdfAnnotations` — id-deduped, and gated one-shot per * (document, corpus) so Apollo's cache/network double emissions can't * double-insert. @@ -49,14 +51,12 @@ export function useReferenceMentions( const mergedForRef = useRef(""); const mergeKey = `${documentId}:${corpusId ?? ""}`; - const [discoverAnalyses] = useLazyQuery< - GetAnalysesForCorpusEnrichmentOutputType, - GetAnalysesForCorpusEnrichmentInputType - >(GET_ANALYSES_FOR_CORPUS_ENRICHMENT, { fetchPolicy: "cache-first" }); - - const [fetchMentions] = useLazyQuery(GET_ANNOTATIONS_FOR_ANALYSIS, { - fetchPolicy: "cache-first", - }); + // client.query (not useLazyQuery): this is an imperative await-in-a-loop + // flow, and a lazy-query handle re-executed with changing variables can + // leave a promise unsettled (observed live: the per-analysis loop hung on + // the first await and the merge never ran). client.query promises always + // settle. 
+ const client = useApolloClient(); // Keep a live ref of current annotations for the async merge below — // depending on `pdfAnnotations` directly would re-run the effect on every @@ -73,7 +73,14 @@ export function useReferenceMentions( let succeeded = false; (async () => { try { - const { data } = await discoverAnalyses({ variables: { corpusId } }); + const { data } = await client.query< + GetAnalysesForCorpusEnrichmentOutputType, + GetAnalysesForCorpusEnrichmentInputType + >({ + query: GET_ANALYSES_FOR_CORPUS_ENRICHMENT, + variables: { corpusId }, + fetchPolicy: "cache-first", + }); const enrichmentAnalyses = (data?.analyses?.edges ?? []) .map((e) => e.node) .filter( @@ -91,8 +98,10 @@ export function useReferenceMentions( const fresh: ReturnType[] = []; const existingIds = new Set(annotationsRef.current.map((a) => a.id)); for (const analysis of enrichmentAnalyses) { - const { data: annData } = await fetchMentions({ + const { data: annData } = await client.query({ + query: GET_REFERENCE_MENTIONS_FOR_ANALYSIS, variables: { analysisId: analysis.id, documentId }, + fetchPolicy: "cache-first", }); if (cancelled) return; for (const ann of annData?.analysis?.fullAnnotationList ?? []) { @@ -123,13 +132,5 @@ export function useReferenceMentions( // retry while the component stays mounted. if (!succeeded) mergedForRef.current = ""; }; - }, [ - ready, - mergeKey, - corpusId, - documentId, - discoverAnalyses, - fetchMentions, - addMultipleAnnotations, - ]); + }, [ready, mergeKey, corpusId, documentId, client, addMultipleAnnotations]); } diff --git a/frontend/src/components/layout/NavMenu.tsx b/frontend/src/components/layout/NavMenu.tsx index eeb17f364..860453b15 100644 --- a/frontend/src/components/layout/NavMenu.tsx +++ b/frontend/src/components/layout/NavMenu.tsx @@ -68,14 +68,17 @@ const navbarCustomStyles = ` background: rgba(255, 255, 255, 0.15) !important; color: rgba(255, 255, 255, 0.9) !important; } - /* cite wordmark — Source Serif 4 with the brackets preserved. 
+ /* [OpenContracts] wordmark — Source Serif 4 with the brackets preserved. Overrides the @os-legal/ui default of 600-weight Inter so the - wordmark reads as a typographic mark, not a UI label. */ + wordmark reads as a typographic mark, not a UI label. Sized a step + below the old [cite] mark so the longer name keeps the same visual + weight in the bar. */ .oc-navbar__brand-name { font-family: ${OS_LEGAL_TYPOGRAPHY.fontFamilySerif} !important; font-weight: 400 !important; - font-size: 22px !important; - letter-spacing: -0.5px !important; + font-size: 19px !important; + letter-spacing: -0.4px !important; + white-space: nowrap !important; } `; @@ -232,7 +235,7 @@ export const NavMenu = () => { size={28} bracketColor={OS_LEGAL_COLORS.warmPaper} nodeColor={OS_LEGAL_COLORS.accent} - ariaLabel="cite" + ariaLabel="OpenContracts" /> ), [] @@ -244,7 +247,7 @@ export const NavMenu = () => { { = ({ // getDocumentUrl accepts the redirect query's slug/creator shape // directly (a structural subset of DocumentType) and returns "#" when // slugs are missing, so no cast is needed. - const url = getDocumentUrl(document, document.corpus); + const url = getDocumentUrl(document); if (url !== "#") { navigate(url); } diff --git a/frontend/src/graphql/mutations.ts b/frontend/src/graphql/mutations.ts index e0847eb96..afda66009 100644 --- a/frontend/src/graphql/mutations.ts +++ b/frontend/src/graphql/mutations.ts @@ -249,6 +249,9 @@ export interface CreateCorpusOutputs { createCorpus: { ok?: boolean; message?: string; + /** Global id of the created corpus — lets follow-up mutations (e.g. + * setupCorpusIntelligence) chain off the create. 
*/ + objId?: string | null; }; } @@ -277,6 +280,68 @@ export const CREATE_CORPUS = gql` ) { ok message + objId + } + } +`; + +// ---------------- Collection-intelligence setup ---------------- +// One-click composite: installs the reference-enrichment add_document action +// and the description/summary agent templates, starts the first reference +// weave, and batch-runs the agents over every document already present. +// Idempotent server-side — safe to call repeatedly. + +export interface IntelligenceTemplateOutcome { + templateName: string; + installedNow: boolean; + alreadyInstalled: boolean; + queuedCount: number; + skippedAlreadyRunCount: number; + error: string; + /** Documents deferred past the per-call batch cap — re-run to continue. */ + remainingCount: number; +} + +export interface SetupCorpusIntelligenceInputs { + corpusId: string; +} + +export interface SetupCorpusIntelligenceOutputs { + setupCorpusIntelligence: { + ok: boolean; + message?: string | null; + summary?: { + referenceAvailable: boolean; + referenceActionInstalledNow: boolean; + referenceActionAlreadyInstalled: boolean; + referenceAnalysisStarted: boolean; + totalActiveDocuments: number; + templates: IntelligenceTemplateOutcome[]; + } | null; + }; +} + +export const SETUP_CORPUS_INTELLIGENCE = gql` + mutation setupCorpusIntelligence($corpusId: ID!) 
{ + setupCorpusIntelligence(corpusId: $corpusId) { + ok + message + summary { + referenceAvailable + referenceActionInstalledNow + referenceActionAlreadyInstalled + referenceAnalysisStarted + totalActiveDocuments + templates { + templateName + installedNow + alreadyInstalled + queuedCount + skippedAlreadyRunCount + error + remainingCount + } + } } } `; @@ -323,7 +388,6 @@ export interface AcceptCookieConsentInputs {} export interface AcceptCookieConsentOutputs { acceptCookieConsent: { ok?: boolean; - message?: string; }; } @@ -331,7 +395,6 @@ export const ACCEPT_COOKIE_CONSENT = gql` mutation { acceptCookieConsent { ok - message } } `; @@ -1487,7 +1550,7 @@ export interface RequestUpdateFieldsetInputType { export const REQUEST_UPDATE_FIELDSET = gql` mutation UpdateFieldset($id: ID!, $name: String, $description: String) { updateFieldset(id: $id, name: $name, description: $description) { - msg + message ok obj { id @@ -2792,7 +2855,7 @@ export interface UpdateMessageOutput { * Returns the updated message with vote counts and current user's vote status. */ export const UPVOTE_MESSAGE = gql` - mutation UpvoteMessage($messageId: ID!) { + mutation UpvoteMessage($messageId: String!) { voteMessage(messageId: $messageId, voteType: "upvote") { ok message @@ -2831,7 +2894,7 @@ export interface UpvoteMessageOutput { * Returns the updated message with vote counts and current user's vote status. */ export const DOWNVOTE_MESSAGE = gql` - mutation DownvoteMessage($messageId: ID!) { + mutation DownvoteMessage($messageId: String!) { voteMessage(messageId: $messageId, voteType: "downvote") { ok message @@ -2858,7 +2921,7 @@ export interface DownvoteMessageOutput { * Returns the updated message with vote counts and current user's vote status (null after removal). */ export const REMOVE_VOTE = gql` - mutation RemoveVote($messageId: ID!) { + mutation RemoveVote($messageId: String!) 
{ removeVote(messageId: $messageId) { ok message @@ -3083,11 +3146,11 @@ export interface RemoveCorpusVoteOutput { // ============================================================================ export const PIN_THREAD = gql` - mutation PinThread($conversationId: ID!) { + mutation PinThread($conversationId: String!) { pinThread(conversationId: $conversationId) { ok message - conversation { + obj { id isPinned pinnedBy { @@ -3108,7 +3171,7 @@ export interface PinThreadOutput { pinThread: { ok: boolean; message: string; - conversation: { + obj: { id: string; isPinned: boolean; pinnedBy: { @@ -3121,11 +3184,11 @@ export interface PinThreadOutput { } export const UNPIN_THREAD = gql` - mutation UnpinThread($conversationId: ID!) { + mutation UnpinThread($conversationId: String!) { unpinThread(conversationId: $conversationId) { ok message - conversation { + obj { id isPinned pinnedBy { @@ -3146,7 +3209,7 @@ export interface UnpinThreadOutput { unpinThread: { ok: boolean; message: string; - conversation: { + obj: { id: string; isPinned: boolean; pinnedBy: { @@ -3159,11 +3222,11 @@ export interface UnpinThreadOutput { } export const LOCK_THREAD = gql` - mutation LockThread($conversationId: ID!) { + mutation LockThread($conversationId: String!) { lockThread(conversationId: $conversationId) { ok message - conversation { + obj { id isLocked lockedBy { @@ -3184,7 +3247,7 @@ export interface LockThreadOutput { lockThread: { ok: boolean; message: string; - conversation: { + obj: { id: string; isLocked: boolean; lockedBy: { @@ -3197,11 +3260,11 @@ export interface LockThreadOutput { } export const UNLOCK_THREAD = gql` - mutation UnlockThread($conversationId: ID!) { + mutation UnlockThread($conversationId: String!) 
{ unlockThread(conversationId: $conversationId) { ok message - conversation { + obj { id isLocked lockedBy { @@ -3222,7 +3285,7 @@ export interface UnlockThreadOutput { unlockThread: { ok: boolean; message: string; - conversation: { + obj: { id: string; isLocked: boolean; lockedBy: { @@ -3239,15 +3302,6 @@ export const DELETE_THREAD = gql` deleteThread(conversationId: $conversationId) { ok message - conversation { - id - isDeleted - deletedBy { - id - username - } - deletedAt - } } } `; @@ -3260,15 +3314,6 @@ export interface DeleteThreadOutput { deleteThread: { ok: boolean; message: string; - conversation: { - id: string; - isDeleted: boolean; - deletedBy: { - id: string; - username: string; - } | null; - deletedAt: string | null; - } | null; }; } @@ -3277,15 +3322,6 @@ export const RESTORE_THREAD = gql` restoreThread(conversationId: $conversationId) { ok message - conversation { - id - isDeleted - deletedBy { - id - username - } - deletedAt - } } } `; @@ -3298,15 +3334,6 @@ export interface RestoreThreadOutput { restoreThread: { ok: boolean; message: string; - conversation: { - id: string; - isDeleted: boolean; - deletedBy: { - id: string; - username: string; - } | null; - deletedAt: string | null; - } | null; }; } @@ -3317,7 +3344,7 @@ export interface RestoreThreadOutput { /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// export const RESTORE_DELETED_DOCUMENT = gql` - mutation RestoreDeletedDocument($documentId: ID!, $corpusId: ID!) { + mutation RestoreDeletedDocument($documentId: String!, $corpusId: String!) { restoreDeletedDocument(documentId: $documentId, corpusId: $corpusId) { ok message diff --git a/frontend/src/graphql/queries.ts b/frontend/src/graphql/queries.ts index a8821a6d3..ece12de16 100644 --- a/frontend/src/graphql/queries.ts +++ b/frontend/src/graphql/queries.ts @@ -73,7 +73,6 @@ export const GET_DOCUMENTS = gql` $hasLabelWithId: String $annotateDocLabels: Boolean! 
$hasAnnotationsWithIds: String - $includeMetadata: Boolean! $includeCaml: Boolean ) { documents( @@ -712,6 +711,40 @@ export interface GetCorpusStatsOutputType { corpusStats: CorpusStats; } +// Which pieces of the default collection-intelligence bundle (reference-web +// action + description/summary agent templates) a corpus already has — +// drives the one-click setup banner's visibility. +export interface CorpusIntelligenceSetupStatus { + referenceAvailable: boolean; + referenceActionInstalled: boolean; + installedTemplateNames: string[]; + missingTemplateNames: string[]; + isFullySetUp: boolean; + /** The viewer holds the permission the setup mutation requires (CRUD). */ + canSetup: boolean; +} + +export interface GetCorpusIntelligenceSetupStatusInputType { + corpusId: string; +} + +export interface GetCorpusIntelligenceSetupStatusOutputType { + corpusIntelligenceSetupStatus: CorpusIntelligenceSetupStatus | null; +} + +export const GET_CORPUS_INTELLIGENCE_SETUP_STATUS = gql` + query corpusIntelligenceSetupStatus($corpusId: ID!) { + corpusIntelligenceSetupStatus(corpusId: $corpusId) { + referenceAvailable + referenceActionInstalled + installedTemplateNames + missingTemplateNames + isFullySetUp + canSetup + } + } +`; + export const GET_CORPUS_STATS = gql` query corpusStats($corpusId: ID!) { corpusStats(corpusId: $corpusId) { @@ -995,8 +1028,8 @@ export interface GetAnalysesForCorpusEnrichmentOutputType { } export const GET_ANALYSES_FOR_CORPUS_ENRICHMENT = gql` - query analysesForCorpusEnrichment($corpusId: ID) { - analyses(corpusId: $corpusId) { + query analysesForCorpusEnrichment($corpusId: String) { + analyses(analyzedCorpusId: $corpusId) { edges { node { id @@ -2971,6 +3004,36 @@ export interface GetAnnotationsForAnalysisOutput { analysis: AnalysisType; } +// Lean variant for the reference-mention merge (useReferenceMentions): only +// the fields convertToServerAnnotation renders. 
The full +// GET_ANNOTATIONS_FOR_ANALYSIS selection drags per-annotation userFeedback / +// relationship / document / corpus resolvers — measured at ~176s for 108 +// mention annotations vs ~0s for this selection. +export const GET_REFERENCE_MENTIONS_FOR_ANALYSIS = gql` + query GetReferenceMentionsForAnalysis($analysisId: ID!, $documentId: ID) { + analysis(id: $analysisId) { + id + fullAnnotationList(documentId: $documentId) { + id + annotationLabel { + id + text + color + icon + description + labelType + } + annotationType + page + rawText + linkUrl + json + structural + } + } + } +`; + export const GET_ANNOTATIONS_FOR_ANALYSIS = gql` query GetAnnotationsForAnalysis($analysisId: ID!, $documentId: ID) { analysis(id: $analysisId) { @@ -3017,7 +3080,7 @@ export const GET_ANNOTATIONS_FOR_ANALYSIS = gql` } allSourceNodeInRelationship { id - annotationLabel { + relationshipLabel { id text color @@ -3034,7 +3097,7 @@ export const GET_ANNOTATIONS_FOR_ANALYSIS = gql` } allTargetNodeInRelationship { id - annotationLabel { + relationshipLabel { id text color @@ -4446,38 +4509,6 @@ export const GET_DOCUMENT_ONLY = GET_DOCUMENT_WITH_STRUCTURE; export type GetDocumentOnlyInput = GetDocumentWithStructureInput; export type GetDocumentOnlyOutput = GetDocumentWithStructureOutput; -/** - * Mutation to add a document to a corpus - */ -export interface AddDocumentToCorpusInput { - documentId: string; - corpusId: string; -} - -export interface AddDocumentToCorpusOutput { - addDocumentToCorpus: { - success: boolean; - message: string; - corpus: { - id: string; - title: string; - }; - }; -} - -export const ADD_DOCUMENT_TO_CORPUS = gql` - mutation AddDocumentToCorpus($documentId: ID!, $corpusId: ID!) 
{ - addDocumentToCorpus(documentId: $documentId, corpusId: $corpusId) { - success - message - corpus { - id - title - } - } - } -`; - /** * Query to get user's corpuses for the Add to Corpus modal */ @@ -4496,7 +4527,7 @@ export interface GetMyCorpusesOutput { export const GET_MY_CORPUSES = gql` query GetMyCorpuses { - corpuses(isPublic: false, myPermissions: ["UPDATE"]) { + corpuses(isPublic: false) { edges { node { id @@ -4731,29 +4762,17 @@ export const GET_CORPUS_CONVERSATIONS = gql` `; export const GET_CORPUS_CHAT_MESSAGES = gql` - query GetCorpusChatMessages( - $conversationId: ID! - $cursor: String - $limit: Int - ) { - chatMessages( - conversation_Id: $conversationId - first: $limit - after: $cursor - ) { - edges { - node { - id - content - msgType - createdAt - data - creator { - id - slug - email - } - } + query GetCorpusChatMessages($conversationId: ID!) { + chatMessages(conversationId: $conversationId) { + id + content + msgType + createdAt + data + creator { + id + slug + email } } } @@ -4799,23 +4818,19 @@ export interface GetCorpusChatMessagesInputs { } export interface GetCorpusChatMessagesOutputs { - chatMessages: { - edges: Array<{ - node: { - id: string; - content: string; - msgType: string; - createdAt: string; - data: { - sources?: WebSocketSources[]; - message_id?: string; - }; - creator: { - email: string; - }; - }; - }>; - }; + chatMessages: Array<{ + id: string; + content: string; + msgType: string; + createdAt: string; + data: { + sources?: WebSocketSources[]; + message_id?: string; + }; + creator: { + email: string; + }; + }>; } export const GET_ME = gql` @@ -4947,17 +4962,6 @@ export const GET_DOCUMENT_BY_ID_FOR_REDIRECT = gql` username email } - corpus { - id - slug - title - creator { - id - slug - username - email - } - } } } `; @@ -4977,17 +4981,6 @@ export interface GetDocumentByIdForRedirectOutput { username: string; email: string; }; - corpus: { - id: string; - slug: string; - title: string; - creator: { - id: string; - slug: 
string; - username: string; - email: string; - }; - } | null; } | null; } @@ -6365,7 +6358,7 @@ export const GET_CORPUS_DOCUMENT_TOC_EDGES = gql` query GetCorpusDocumentTocEdges( $corpusId: ID $first: Int - $relationshipType: String + $relationshipType: DocumentsDocumentRelationshipRelationshipTypeChoices $annotationLabelText: String ) { documentRelationships( @@ -6606,14 +6599,6 @@ export const GET_RESEARCH_REPORT = gql` id slug } - corpus { - id - slug - creator { - id - slug - } - } } } } @@ -6672,14 +6657,6 @@ export const RESOLVE_RESEARCH_REPORT_BY_SLUG = gql` id slug } - corpus { - id - slug - creator { - id - slug - } - } } } } diff --git a/frontend/src/hooks/useNavigateToDocumentById.ts b/frontend/src/hooks/useNavigateToDocumentById.ts index 00f1f1547..899652a87 100644 --- a/frontend/src/hooks/useNavigateToDocumentById.ts +++ b/frontend/src/hooks/useNavigateToDocumentById.ts @@ -8,7 +8,7 @@ import { GetDocumentByIdForRedirectOutput, } from "../graphql/queries"; import { buildCanonicalPath } from "../utils/navigationUtils"; -import { CorpusType, DocumentType } from "../types/graphql-api"; +import { DocumentType } from "../types/graphql-api"; /** * Navigate to a document's canonical slug path given only its global id. @@ -42,10 +42,10 @@ export function useNavigateToDocumentById(): ( // CorpusType — enough for buildCanonicalPath, which only reads slug and // creator.slug. Narrow through `unknown` rather than `any` so the cast // stays explicit and the any-baseline guard is not tripped. - const path = buildCanonicalPath( - doc as unknown as DocumentType, - doc.corpus as unknown as CorpusType - ); + // The redirect query carries no corpus context (DocumentType has no + // corpus field — documents relate to corpora via paths), so this + // resolves the document's standalone canonical path. 
+ const path = buildCanonicalPath(doc as unknown as DocumentType); if (path) navigate(path + (queryString || "")); }, [resolveDocumentById, navigate] diff --git a/frontend/src/routing/CentralRouteManager.tsx b/frontend/src/routing/CentralRouteManager.tsx index c99ab5791..42155efef 100644 --- a/frontend/src/routing/CentralRouteManager.tsx +++ b/frontend/src/routing/CentralRouteManager.tsx @@ -481,10 +481,13 @@ export function CentralRouteManager() { if (idData?.document) { // Redirect to canonical slug URL // Type assertion: redirect query doesn't include analyses field, - // but buildCanonicalPath only needs slug and creator + // but buildCanonicalPath only needs slug and creator. + // Corpus context comes from the ROUTE's slug resolution above — + // the redirect query itself carries none (DocumentType has no + // corpus field; documents relate to corpora via paths). const canonicalPath = buildCanonicalPath( idData.document as any, - idData.document.corpus as any + (data?.corpusBySlugs as any) ?? 
undefined ); if (canonicalPath) { navigate(canonicalPath + location.search, { replace: true }); diff --git a/frontend/src/views/Corpuses.tsx b/frontend/src/views/Corpuses.tsx index 6877836aa..9c38a0fc3 100644 --- a/frontend/src/views/Corpuses.tsx +++ b/frontend/src/views/Corpuses.tsx @@ -73,6 +73,9 @@ import { CREATE_CORPUS, CreateCorpusOutputs, CreateCorpusInputs, + SETUP_CORPUS_INTELLIGENCE, + SetupCorpusIntelligenceInputs, + SetupCorpusIntelligenceOutputs, DELETE_CORPUS, DeleteCorpusOutputs, DeleteCorpusInputs, @@ -834,6 +837,11 @@ export const Corpuses = () => { /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// // Query to delete corpus /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// + const [trySetupIntelligence] = useMutation< + SetupCorpusIntelligenceOutputs, + SetupCorpusIntelligenceInputs + >(SETUP_CORPUS_INTELLIGENCE); + const [tryCreateCorpus, { loading: create_corpus_loading }] = useMutation< CreateCorpusOutputs, CreateCorpusInputs @@ -950,6 +958,28 @@ export const Corpuses = () => { .then((data) => { if (data.data?.createCorpus.ok) { toast.success("SUCCESS. Created corpus."); + // Opt-in one-click intelligence setup: installs the reference-web + // action + description/summary agents and kicks them off. Failure + // here must not read as a failed corpus creation — surface softly. + const newCorpusId = data.data.createCorpus.objId; + if (formData.setupIntelligence && newCorpusId) { + const setupNotStarted = () => + toast.info( + "Corpus created, but intelligence setup couldn't start — " + + "you can run it from the corpus page." + ); + trySetupIntelligence({ + variables: { corpusId: newCorpusId }, + }) + .then((result) => { + // The mutation reports soft failures as ok=false rather than + // throwing — without this check the opt-in fails silently. 
+ if (!result.data?.setupCorpusIntelligence?.ok) { + setupNotStarted(); + } + }) + .catch(setupNotStarted); + } } else { toast.error(`FAILED on server: ${data.data?.createCorpus.message}`); } diff --git a/frontend/tests/DocumentKnowledgeBaseCorpusless.ct.tsx b/frontend/tests/DocumentKnowledgeBaseCorpusless.ct.tsx index 58e7a801d..b2411ffb1 100644 --- a/frontend/tests/DocumentKnowledgeBaseCorpusless.ct.tsx +++ b/frontend/tests/DocumentKnowledgeBaseCorpusless.ct.tsx @@ -7,7 +7,6 @@ import { GET_DOCUMENT_KNOWLEDGE_AND_ANNOTATIONS, GET_DOCUMENT_ANNOTATIONS_ONLY, GET_MY_CORPUSES, - ADD_DOCUMENT_TO_CORPUS, GET_CONVERSATIONS, } from "../src/graphql/queries"; import { LINK_DOCUMENTS_TO_CORPUS } from "../src/graphql/mutations"; diff --git a/frontend/tests/DocumentReferencesPanel.ct.tsx b/frontend/tests/DocumentReferencesPanel.ct.tsx index 552ce3978..cc360033b 100644 --- a/frontend/tests/DocumentReferencesPanel.ct.tsx +++ b/frontend/tests/DocumentReferencesPanel.ct.tsx @@ -198,6 +198,9 @@ test.describe("DocumentReferencesPanel", () => { }) => { // Inbound rows have no link_url — they resolve the source document's slugs // via GET_DOCUMENT_BY_ID_FOR_REDIRECT, then navigate to its canonical path. + // The redirect query carries no corpus context (DocumentType has no corpus + // field — documents relate to corpora via paths), so the resulting path is + // the document's standalone canonical path: /d/{creator}/{doc}. 
const redirectMock = { request: { query: GET_DOCUMENT_BY_ID_FOR_REDIRECT, @@ -215,17 +218,6 @@ test.describe("DocumentReferencesPanel", () => { username: "acme", email: "acme@example.com", }, - corpus: { - id: CORPUS_ID, - slug: "ipo-s1-filings", - title: "Select 2026 IPO S-1 Filings", - creator: { - id: "VXNlcjox", - slug: "acme", - username: "acme", - email: "acme@example.com", - }, - }, }, }, }, @@ -243,7 +235,7 @@ test.describe("DocumentReferencesPanel", () => { > arrived} /> ({ + request: { + query: GET_CORPUS_INTELLIGENCE_SETUP_STATUS, + variables: { corpusId: CORPUS_ID }, + }, + result: { + data: { + corpusIntelligenceSetupStatus: { + referenceAvailable: true, + referenceActionInstalled, + installedTemplateNames: [], + missingTemplateNames: [], + isFullySetUp: false, + canSetup: true, + }, + }, + }, +}); + // Node click-through resolves the document's slugs via the redirect query, -// then navigates to its canonical path. Beta Energy's primary node is the +// then navigates to its standalone canonical path (the redirect query +// carries no corpus context). Beta Energy's primary node is the // unambiguous target (no exhibit shares its title). 
const redirectMock = { request: { @@ -242,17 +267,6 @@ const redirectMock = { username: "acme", email: "acme@example.com", }, - corpus: { - id: CORPUS_ID, - slug: "ipo-s1-filings", - title: "Select 2026 IPO S-1 Filings", - creator: { - id: "VXNlcjox", - slug: "acme", - username: "acme", - email: "acme@example.com", - }, - }, }, }, }, @@ -391,6 +405,7 @@ test.describe("GovernanceGraphLive", () => { makeGraphMock(null), analyzersMock(ENRICHMENT_ANALYZER_TASK_NAME), startAnalysisMock, + setupStatusMock(false), createCorpusActionMock, ]} addTypename={false} @@ -414,6 +429,52 @@ test.describe("GovernanceGraphLive", () => { await component.unmount(); }); + test("bootstrap skips installing the action when one is already installed", async ({ + mount, + page, + }) => { + // No createCorpusActionMock on purpose: with the reference action already + // installed (e.g. by one-click intelligence setup) the bootstrap must not + // fire CREATE_CORPUS_ACTION at all — an unexpected call would error and + // surface the "couldn't be installed" info toast. + const component = await mount( + + + <> + + + + + + ); + + const bootstrap = page.locator( + '[data-testid="governance-graph-live-bootstrap"]' + ); + await expect(bootstrap).toBeVisible({ timeout: 10000 }); + await bootstrap.click(); + + await expect( + page.locator('[data-testid="governance-graph-live-weaving"]') + ).toBeVisible({ timeout: 10000 }); + await expect( + page.getByText(/keep-it-updated action couldn't be installed/i) + ).toHaveCount(0); + + await component.unmount(); + }); + test("bootstrap surfaces an error when enrichment is unavailable", async ({ mount, page, @@ -466,7 +527,7 @@ test.describe("GovernanceGraphLive", () => { > arrived} />