47 changes: 37 additions & 10 deletions README.md
@@ -29,14 +29,39 @@ findings = await agent.structured_response(
[![Sponsor](https://img.shields.io/static/v1?label=Sponsor&message=%E2%9D%A4&logo=GitHub&color=%23fe8e86)](https://github.com/sponsors/JSv4)

| | |
| ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Backend coverage | [![backend](https://codecov.io/gh/Open-Source-Legal/OpenContracts/branch/main/graph/badge.svg?flag=backend&token=RdVsiuaTVz)](https://app.codecov.io/gh/Open-Source-Legal/OpenContracts?flags%5B0%5D=backend) |
| Frontend coverage | [![frontend](https://codecov.io/gh/Open-Source-Legal/OpenContracts/branch/main/graph/badge.svg?flag=frontend&token=RdVsiuaTVz)](https://app.codecov.io/gh/Open-Source-Legal/OpenContracts?flags%5B0%5D=frontend) |
| Meta | [![code style - black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![types - Mypy](https://img.shields.io/badge/types-Mypy-blue.svg)](https://github.com/python/mypy) [![imports - isort](https://img.shields.io/badge/imports-isort-ef8336.svg)](https://github.com/pycqa/isort) [![License - MIT](https://img.shields.io/badge/license-MIT-green)](https://opensource.org/licenses/MIT) |

---

![Discovery Landing Page](docs/assets/images/screenshots/auto/landing--discovery-page--anonymous.png)
## From documents to a citation graph — in about a minute

Create a corpus, drop in your documents, and click **Set up**. That one click installs the
intelligence bundle: agents describe and summarize every document, and the reference web
starts weaving — every statutory citation detected, resolved, and drawn as an edge.

![Create a corpus and set up collection intelligence in one click](docs/assets/images/gifs/demo-1-create-and-setup.gif)

By the end of the clip, 36 SEC filings are a navigable graph — wired to the Delaware
General Corporation Law, the Securities Act, and the SEC rules they cite, section by
section. Law the library doesn't hold yet isn't dropped on the floor: it's tracked as a
backlog, automatically, until you ingest it.

### Then explore it — and ask it questions

Citations are highlighted inline on the filings themselves. The References panel lists
everything a document cites — click any cite to open the statute, with its own
cross-references and everything that cites it back. The ask bar runs a corpus-scoped
agent whose answers come back grounded and cited.

![Explore the citation graph — inline citations, the references panel, and grounded answers](docs/assets/images/gifs/demo-2-explore-and-ask.gif)

Everything in both clips is the stock product against a local install — no custom code,
and every surface the UI touches is also reachable over the API and MCP server below.

---

## Build on it

@@ -155,23 +180,23 @@ The engine — annotation, corpus management, AI agents, MCP server, vector sear

This is not another chat-with-your-PDFs tool. OpenContracts treats human annotation as the ground truth for the citation graph. Teams define custom label schemas, annotate documents with precise selections (including multi-page spans), and map relationships between concepts. AI builds on top of that work — it doesn't replace it.

![Document Annotator](docs/assets/images/screenshots/auto/readme--document-annotator--with-pdf.png)
![Precise, layout-faithful annotations on a PDF — colored label spans, multi-page sections, and the annotation sidebar](docs/assets/images/screenshots/auto/annotations--pdf-canvas--with-labels.png)

### Corpuses, Not File Cabinets

Documents are organized into corpuses — version-controlled collections with folder hierarchies, fine-grained permissions, and full history. Fork a public corpus to build on someone else's annotations. Restore any previous version. Every change is tracked.

This is `git` for the citation graph: branch, build, share, never lose work.

![Corpus Home](docs/assets/images/screenshots/auto/readme--corpus-home--with-chat.png)
![Collection intelligence overview — document, connection, annotation, and extract counts, summary coverage, dominant labels, and the governance graph](docs/assets/images/screenshots/auto/corpus--intelligence-overview--with-data.png)

### AI Agents That Work With What You've Built

Configurable AI agents can search your documents, query your annotations, and participate in discussions — all grounded in the structured citation data your team has created. They don't hallucinate in a vacuum; they reason over real, curated edges.

@mention an agent in a discussion thread. Ask it to compare clauses across a hundred contracts. Let it surface patterns your team annotated last quarter. The agent's power comes from the quality of the citation graph underneath it.

![AI Agent Response](docs/assets/images/screenshots/auto/threads--agent-message--response.png)
![An agent grounding its answer in tool calls — similarity search, exact-text search, and document lookups over the corpus](docs/assets/images/screenshots/auto/chat--tool-popover--multi-tool.png)

### Collaboration Where the Citations Live

@@ -189,7 +214,9 @@ This is the DRY principle applied to the citation graph: annotate once, build on

---

## See it in Action
## Annotation flows

The human side of the graph — precise, layout-faithful annotation on PDFs and text:

### PDF Annotation Flow

@@ -240,10 +267,10 @@ docker compose -f production.yml up -d

The discover/landing page and the `/about` page are driven by a JSON content pack so deployers can retarget the messaging without forking the codebase. Two variants ship in the repo:

| Variant key | Framing | Best fit |
| --------------- | ----------------------------------------------------- | ------------------------------------------------------------------------------- |
| `default` | _Open-source document intelligence you can build on._ | The OSS project's repo and most self-hosted deployments — developer-facing. |
| `public-record` | _The citation layer underneath the public record._ | End-user deployments curating public-domain documents (named-incumbents pitch). |

Switch variants at runtime by setting `REACT_APP_LANDING_VARIANT` in `frontend/public/env-config.js` — no rebuild required. Unknown variant keys fall back to `default`.

49 changes: 49 additions & 0 deletions changelog.d/1982-review-fixes.fixed.md
@@ -0,0 +1,49 @@
- **Intelligence setup: large corpora no longer silently skip enrichment.**
`CorpusActionService` gained `batch_run_action(user, action, allow_partial=)`
(`opencontractserver/corpuses/services/corpus_actions.py`) — the trusted-caller
variant the one-click setup now uses with `allow_partial=True`, queuing the
first `BATCH_RUN_MAX_DOCS` documents (deterministic id order) instead of
refusing outright when a corpus exceeds the per-call cap. The per-template
outcome (`TemplateSetupOutcome.remaining_count`, exposed as `remainingCount`
on `IntelligenceTemplateOutcomeType`) reports the deferred remainder and the
banner toast surfaces it. Previously a 250-doc corpus got a success toast, a
permanently hidden banner, and zero documents enriched.
- **Intelligence-setup status no longer demands deployment-unavailable pieces.**
`IntelligenceSetupStatus.is_fully_set_up`
(`opencontractserver/corpuses/services/intelligence_setup.py`) excludes the
reference action when no enrichment analyzer is registered
(`reference_available`, new on the status payload) and excludes bundle
templates that are unseeded/inactive deployment-wide — either condition
previously made the setup CTA an undismissable zombie whose every click
toasted success.
- **Setup CTA hidden from viewers who can't run it.** The status payload gained
`can_setup` (mirrors the mutation's permission gate);
`IntelligenceSetupBanner.tsx` renders nothing unless `canSetup` — read-only
and anonymous viewers of a public not-set-up corpus previously saw a
guaranteed-to-fail "Set up" button.
- **Permission tier harmonized to CRUD.** `setupCorpusIntelligence` (service +
mutation docstrings, `config/graphql/corpus_mutations.py`) now requires CRUD
on the corpus — the tier `AddTemplateToCorpus` and `CreateCorpusAction`
already gate the identical writes at; it previously required only UPDATE, a
weaker path to the same row installs.
- **Reference action can no longer be double-installed.** The governance
graph's "Map the reference web" bootstrap
(`GovernanceGraphLive.tsx`) consults `corpusIntelligenceSetupStatus` and
skips `createCorpusAction` when the add_document reference action already
exists (a duplicate row would run the enrichment analyzer twice on every
future upload); the server side switched to `get_or_create` to narrow the
concurrent-race window.
- **Post-create setup opt-in surfaces soft failures.** `Corpuses.tsx` now
inspects the resolved `setupCorpusIntelligence.ok` and shows the
"couldn't start" toast — an `ok=false` envelope was previously discarded,
leaving users to believe enrichment was running.
- **Setup warning toast names the actual failures.** The banner aggregates
`templates[].error` into the warning instead of a generic guess.
- **Dedup/cleanup.** Template installs go through a single shared
`CorpusActionService.install_template` (dedupe fast-path, savepoint clone,
IntegrityError recovery, CRUD grant) used by both `AddTemplateToCorpus` and
the bundle; the enrichment analyzer lookup goes through the new lookup-only
`EnrichmentService.get_analyzer()` next to the converge logic; setup
prefetches bundle templates with `name__in` and derives
`total_active_documents` from the batch summary instead of a redundant
corpus-document count.
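
The `allow_partial` behavior described in the first entry above can be sketched in plain Python. This is only an illustration under stated assumptions: `plan_batch`, `BatchOutcome`, and the cap value of 100 are hypothetical names and numbers, not the real `CorpusActionService.batch_run_action` implementation, and the strict default (refuse when over the cap) is inferred from the changelog wording.

```python
from dataclasses import dataclass

# Hypothetical cap: the real BATCH_RUN_MAX_DOCS value is not stated in the changelog.
BATCH_RUN_MAX_DOCS = 100


@dataclass
class BatchOutcome:
    queued_ids: list[int]   # documents queued by this call
    remaining_count: int    # documents deferred to a later call


def plan_batch(doc_ids: list[int], allow_partial: bool = False,
               cap: int = BATCH_RUN_MAX_DOCS) -> BatchOutcome:
    """Decide which documents one batch run should queue."""
    ordered = sorted(doc_ids)  # deterministic id order
    if len(ordered) <= cap:
        return BatchOutcome(ordered, 0)
    if not allow_partial:
        # Assumed strict path: refuse instead of silently truncating.
        raise ValueError(f"corpus has {len(ordered)} documents; cap is {cap}")
    # Trusted-caller path: queue the first `cap` documents, report the rest.
    return BatchOutcome(ordered[:cap], len(ordered) - cap)
```

With a 250-document corpus and the assumed cap of 100, `plan_batch(ids, allow_partial=True)` queues the 100 lowest ids and reports `remaining_count == 150`, which is the number a caller could surface in a toast.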
1 change: 1 addition & 0 deletions changelog.d/1982-setup-templates-partial-success.fixed.md
@@ -0,0 +1 @@
- `CorpusIntelligenceSetupService._setup_templates` (`opencontractserver/corpuses/services/intelligence_setup.py`) now contains non-`IntegrityError` clone failures (e.g. `OperationalError`, `ValueError`) per template instead of letting them propagate out of the loop. Previously such a failure aborted the remaining templates and returned a 500 with earlier templates left half-installed; the bundle's graceful partial-success contract is now honored — the failing template records its error and the sweep continues.
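
A minimal sketch of the per-template containment described above, assuming a hypothetical `clone` callable and a hypothetical `TemplateOutcome` record. The real `_setup_templates` also has a separate `IntegrityError` recovery path that this sketch does not model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TemplateOutcome:
    name: str
    installed: bool
    error: Optional[str] = None


def setup_templates(names: list[str],
                    clone: Callable[[str], None]) -> list[TemplateOutcome]:
    """Clone each template, recording failures instead of aborting the sweep."""
    outcomes: list[TemplateOutcome] = []
    for name in names:
        try:
            clone(name)
        except Exception as exc:
            # Contain the failure to this template and keep going.
            outcomes.append(
                TemplateOutcome(name, False, f"{type(exc).__name__}: {exc}")
            )
            continue
        outcomes.append(TemplateOutcome(name, True))
    return outcomes
```

The point of the shape is that the `try` wraps a single iteration, so one failing template yields one failed outcome and the later templates still get their turn.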
31 changes: 31 additions & 0 deletions changelog.d/corpus-intelligence-setup.added.md
@@ -0,0 +1,31 @@
- **One-click collection-intelligence setup** — the orchestration layer the
enrichment pieces were missing: nothing previously composed the
deterministic reference web with the LLM document enrichment at corpus
setup, so new corpora landed with unreadable document indexes (raw import
metadata as descriptions, 0% summary coverage) until each action was
manually added from the Action Library and batch-run.
- `CorpusIntelligenceSetupService`
(`opencontractserver/corpuses/services/intelligence_setup.py`):
idempotent composite that installs the reference-enrichment
`add_document` action + starts the first weave, clones the
*Document Description Updater* and *Document Summary Generator*
templates (bundle pinned in
`opencontractserver/constants/corpus_actions.py`
`INTELLIGENCE_SETUP_TEMPLATE_NAMES`), and batch-runs each over every
document already in the corpus. Re-running converges: existing action
rows are reused, already-run documents are skipped, an in-flight
reference analysis is not duplicated.
- GraphQL: `setupCorpusIntelligence` mutation +
`corpusIntelligenceSetupStatus` query
(`config/graphql/corpus_mutations.py`, `corpus_queries.py`,
`corpus_types.py`); `createCorpus` now returns `objId` so follow-up
mutations can chain off creation.
- Frontend: `IntelligenceSetupBanner`
(`frontend/src/components/corpuses/CorpusHome/intelligence/`) renders a
setup CTA inside `IntelligencePanel` (so both the intelligence overview
and the `insight-panel` CAML embed surface it) and hides once the bundle
is installed; the New Corpus modal gains a default-on "Set up collection
intelligence" opt-in that chains the mutation after creation
(`CorpusModal.tsx`, `views/Corpuses.tsx`).
- Tests: `opencontractserver/tests/test_intelligence_setup.py` (service +
GraphQL), `frontend/tests/IntelligenceSetupBanner.ct.tsx`.
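
The "re-running converges" property of the setup service can be illustrated with an in-memory model. Everything here is hypothetical (the `Corpus` dataclass, `setup_intelligence`, the action names); it only shows the idempotency idea of reusing existing action rows and skipping documents that have already been run, not the service's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Corpus:
    doc_ids: list[int]
    # Names of actions already installed on the corpus.
    actions: set[str] = field(default_factory=set)
    # (action name, document id) pairs that have already been run.
    runs: set[tuple[str, int]] = field(default_factory=set)


def setup_intelligence(corpus: Corpus, bundle: list[str]) -> dict[str, int]:
    """Install missing bundle actions and queue only documents not yet run."""
    created = queued = 0
    for name in bundle:
        if name not in corpus.actions:  # reuse an existing row if present
            corpus.actions.add(name)
            created += 1
        for doc_id in corpus.doc_ids:
            if (name, doc_id) not in corpus.runs:  # skip already-run documents
                corpus.runs.add((name, doc_id))
                queued += 1
    return {"created": created, "queued": queued}
```

Calling `setup_intelligence` twice on the same corpus does work only the first time; a later call after new documents arrive queues just those documents.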
48 changes: 48 additions & 0 deletions changelog.d/graphql-spec-validation.fixed.md
@@ -0,0 +1,48 @@
- **GraphQL spec validation restored on the served endpoint (security).**
``GraphQLView(validation_rules=[DepthLimit…])`` REPLACED graphql-core's
spec rule set (that is ``validate()``'s semantics for an explicit rules
list), silently disabling every standard GraphQL validation — unknown
arguments/fields and variable-type checks — in all environments. Invalid
queries executed with the bogus parts ignored, which let ~26 invalid
frontend documents ship unnoticed (several backing silently-broken
features). ``config/graphql/schema.py`` now builds
``[*specified_rules, DepthLimitValidationRule(, DisableIntrospection)]``,
pinned by ``test_security_hardening.TestServedValidationRulesIncludeSpecRules``.
Every shipped frontend document is now swept in CI by
``opencontractserver/tests/architecture/test_frontend_graphql_documents.py``
(ad-hoc: ``scripts/validate_frontend_graphql.py``; the sweep strips Apollo
``@client`` selections and skips fragment-only/interpolated documents).
- **All 26 invalid frontend documents repaired**, including features that
could never have worked: ``deleteMetadataColumn`` and ``updateFieldset``
were called by the UI but did not exist server-side (both now implemented
in ``config/graphql/extract_mutations.py`` via the BaseService
get_or_none/require_permission pattern with IDOR-unified messages);
``GET_CORPUS_CHAT_MESSAGES`` used a misspelled argument + relay shape on a
plain list field (corpus chat history always loaded empty objects);
``tokenAuth`` was schema-conditional on ``USE_AUTH0`` (now always the
``WithUser`` payload, so the login document validates everywhere); the
document-by-id redirect selected the nonexistent ``DocumentType.corpus``
(corpus context now sourced from the route's slug resolution where it
exists — the previous mock-only field meant graph-node click-throughs
always landed on standalone paths); dead ``ADD_DOCUMENT_TO_CORPUS``
removed; plus variable-type (ID!/String!, JSONString/GenericScalar,
String/enum) and payload-field corrections across vote, thread-moderation,
research-report, TOC and corpus-list documents.
- **Presigned file URLs no longer outlive their signatures.** The AWS
settings branch derived the shared file-URL cache lifetime from
``_AWS_EXPIRY`` (the stored objects' HTTP CacheControl max-age, 7 days)
instead of the presign lifetime (``AWS_QUERYSTRING_EXPIRE``, 1 hour), so
Redis served dead 403 pdf/pawls/txt links for up to 5 hours.
``AWS_QUERYSTRING_EXPIRE`` is now explicit, the cache TTL derives from it,
and ``clamp_shared_url_cache_ttl`` (``opencontractserver/utils/files.py``)
enforces TTL ≤ half the signature lifetime even against env overrides.
- **3-minute analysis-annotation responses fixed.**
``UserFeedbackQuerySet.visible_to_user`` expressed annotation-inherited
visibility as ``commented_annotation_id__in=<visible-annotations
subquery>`` — an uncorrelated ``IN`` materialized over the entire
annotations table on every evaluation (~0.8s each; 216 pagination counts
made ``GetAnnotationsForAnalysis`` take ~176s for a 108-mention document).
Rewritten as a correlated ``Exists`` pinned to the feedback row's
annotation id — identical semantics (permissioning invariant suites pass),
measured 176s → 2.3s. Shape pinned by
``test_feedback.TestVisibilityQueryShape``.
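
The `IN` to correlated `Exists` rewrite in the last entry can be checked for semantic equivalence on a toy schema. The sketch below uses the standard-library `sqlite3` module with hypothetical `annotation` and `feedback` tables and an assumed `is_public` flag standing in for the real visibility rules. It says nothing about PostgreSQL query plans or the measured timings; it only shows that both forms select the same feedback rows, including for a NULL annotation id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE annotation (id INTEGER PRIMARY KEY, is_public INTEGER NOT NULL);
    CREATE TABLE feedback (id INTEGER PRIMARY KEY, commented_annotation_id INTEGER);
    INSERT INTO annotation VALUES (1, 1), (2, 0), (3, 1);
    INSERT INTO feedback VALUES (10, 1), (11, 2), (12, 3), (13, NULL);
""")

# Uncorrelated form: the subquery does not reference the outer feedback row.
IN_FORM = """
    SELECT f.id FROM feedback f
    WHERE f.commented_annotation_id IN (
        SELECT a.id FROM annotation a WHERE a.is_public = 1
    )
    ORDER BY f.id
"""

# Correlated form: the subquery is pinned to the outer row's annotation id.
EXISTS_FORM = """
    SELECT f.id FROM feedback f
    WHERE EXISTS (
        SELECT 1 FROM annotation a
        WHERE a.id = f.commented_annotation_id AND a.is_public = 1
    )
    ORDER BY f.id
"""

in_rows = [row[0] for row in conn.execute(IN_FORM)]
exists_rows = [row[0] for row in conn.execute(EXISTS_FORM)]
print(in_rows, exists_rows)
```

In Django the correlated form is typically written with `Exists(...)` over a queryset filtered by `OuterRef("commented_annotation_id")`; whether the project's queryset looks exactly like that is not shown in this changelog.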