Merged
3 changes: 3 additions & 0 deletions .prettierignore
@@ -0,0 +1,3 @@
dist/
lib/
node_modules/
8 changes: 8 additions & 0 deletions .prettierrc.json
@@ -0,0 +1,8 @@
{
  "semi": true,
  "trailingComma": "none",
  "singleQuote": true,
  "printWidth": 80,
  "tabWidth": 2,
  "endOfLine": "lf"
}
58 changes: 47 additions & 11 deletions CLAUDE.md
@@ -14,7 +14,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
npm run build # TypeScript check + esbuild bundle → dist/index.js
npm run format # Prettier format all .ts files
npm run format-check # Check formatting without writing
npm run lint # ESLint
npm run lint # ESLint (src/ only)
npm test # Vitest
npm run test:coverage # Vitest with coverage report
```
@@ -25,16 +25,15 @@ npm run test:coverage # Vitest with coverage report
- esbuild for bundling
- vitest for testing (80% coverage threshold)
- ESLint 9 flat config
- `@actions/core`, `@octokit/rest`, `adm-zip`
- `@actions/core`, `@octokit/rest`, `adm-zip`, `js-yaml`

## Architecture

### Pipeline

```
Discover → Fetch → Parse → Filter → Download → Extract → Index
(repo) (releases) (metadata) (channel/ (zip) (files) (JSON/
stage) JSONL)
Discover → Check Manifest → Fetch → Parse → Filter → Dedup → Download → Extract → Index → Delta Save
(repo)     (channels.yml)   (ETag)  (meta)  (ch/st)  (hash)  (zip)      (files)   (JSON)  (state)
```

Each stage is an interface with a default implementation. `main.ts` is the composition root — it constructs all dependencies and injects them into `AggregationPipeline`.
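
The composition-root idea above can be illustrated with a minimal sketch. It assumes two simplified, synchronous stage interfaces; the names `RepoDiscovererSketch`, `ReleaseFetcherSketch`, and `PipelineSketch` are hypothetical stand-ins, not the signatures in `src/domain/types.ts`, and the real stages presumably return promises.

```typescript
// Hypothetical, simplified stage interfaces (assumption: the real ones are async).
interface RepoDiscovererSketch {
  discover(): string[];
}

interface ReleaseFetcherSketch {
  fetch(repo: string): string[];
}

// The pipeline only knows the interfaces, never the concrete classes.
class PipelineSketch {
  private readonly discoverer: RepoDiscovererSketch;
  private readonly fetcher: ReleaseFetcherSketch;

  constructor(
    discoverer: RepoDiscovererSketch,
    fetcher: ReleaseFetcherSketch
  ) {
    this.discoverer = discoverer;
    this.fetcher = fetcher;
  }

  run(): Record<string, string[]> {
    const result: Record<string, string[]> = {};
    for (const repo of this.discoverer.discover()) {
      result[repo] = this.fetcher.fetch(repo);
    }
    return result;
  }
}

// Composition root: the one place where concrete implementations are chosen.
const explicitDiscoverer: RepoDiscovererSketch = {
  discover: () => ['OrgA/repo-one', 'OrgA/repo-two']
};
const stubFetcher: ReleaseFetcherSketch = {
  fetch: (repo) => [`${repo}@v1.0.0`]
};
const pipelineResult = new PipelineSketch(explicitDiscoverer, stubFetcher).run();
```

Because the pipeline depends only on interfaces, a test can pass stub objects like the ones above instead of implementations that call the GitHub API.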
@@ -44,17 +43,25 @@ Each stage is an interface with a default implementation. `main.ts` is the compo
```
src/
├── domain/ Types and interfaces
│ └── types.ts ReleaseMetadataJson, AggregatedDocument, IRepoDiscoverer, etc.
│ └── types.ts All domain types, pipeline interfaces (IRepoDiscoverer, IReleaseFetcher, etc.)
├── discovery/ Repo discovery strategies
│ ├── topic-discoverer.ts Search by GitHub topic across organizations
│ └── explicit-discoverer.ts Parse explicit owner/repo list
├── fetching/ Release fetching with pagination + ETag
│ └── release-fetcher.ts PaginatedReleaseFetcher — fetches all pages, handles 304
├── manifest/ Channel manifest reading
│ └── manifest-reader.ts GitHubManifestReader — reads .metanorma/channels.yml
├── filtering/ Release filtering
│ ├── channel-filter.ts Filter by channel (audience/category)
│ ├── channel-filter.ts Filter by channel (with overlaps() for manifest check)
│ └── stage-filter.ts Filter by publication stage
├── processing/ Asset processing
│ └── asset-processor.ts Download zip, extract, canonicalize filenames
├── caching/ Persistent cache (backed by actions/cache)
│ └── cache-store.ts FileCacheStore + NullCacheStore
├── processing/ Asset processing with file routing
│ └── asset-processor.ts Extract zip, canonicalize filenames, route to subdirs
├── indexing/ Index generation
│ └── index-generator.ts JSON and JSONL document index
├── delta/ Delta aggregation state
│ └── state-manager.ts DeltaStateManager — content-hash dedup, stale file cleanup
├── shared/ Utilities
│ ├── logger.ts PrefixLogger with scoped() for per-repo context
│ └── concurrency.ts mapWithConcurrency for bounded parallelism
@@ -66,10 +73,30 @@ src/
### Pipeline Interfaces

- `IRepoDiscoverer` — discover repos (topic search or explicit list)
- `ChannelFilter` — filter releases by configured channels
- `IReleaseFetcher` — fetch all releases with pagination and ETag support
- `IManifestReader` — read `.metanorma/channels.yml` for early repo filtering
- `ICacheStore` — key-value cache for ETags and delta state
- `ChannelFilter` — filter releases by configured channels (with `overlaps()` for manifest)
- `StageFilter` — filter releases by configured stages
- `AssetProcessor` — download, extract, and canonicalize zip contents
- `AssetProcessor` — download, extract, canonicalize, and route zip contents
- `IndexGenerator` — produce JSON/JSONL document index
- `DeltaStateManager` — content-hash dedup, stale file cleanup, state persistence
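
As an illustration of the manifest check, here is a minimal sketch of what an `overlaps()`-style test could look like. The function name `channelsOverlap`, its signature, exact-string matching, and the "empty configuration means no restriction" rule are all assumptions for the example, not the actual `ChannelFilter` API.

```typescript
// Hypothetical helper: true when the repo's manifest declares at least one
// channel that the action is configured to aggregate.
// Assumption: an empty configured list means "no channel restriction".
function channelsOverlap(
  configured: readonly string[],
  manifest: readonly string[]
): boolean {
  if (configured.length === 0) {
    return true;
  }
  const wanted = new Set(configured);
  // Exact-match comparison of audience/category strings (assumption).
  return manifest.some((channel) => wanted.has(channel));
}
```

A repo whose manifest lists only `member/drafts` would be skipped when the action is configured for `public/standards`, before any releases are fetched.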

### Caching Architecture

When `cache-dir` is set, the action uses a `FileCacheStore` (persisted via `actions/cache`) for:
- **ETags**: Skip repos whose releases haven't changed (HTTP 304)
- **Content hashes**: Skip re-downloading releases with unchanged content
- **Delta state**: Track processed releases per repo, clean up stale files

Null implementations (`NullCacheStore`, `NullManifestReader`, `NullDeltaManager`) are used when caching is disabled.
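
The null-object arrangement described above might look roughly like the sketch below. The `get`/`set` method names are assumptions, and an in-memory `Map` stands in for the file-backed store so the example stays self-contained; the real `FileCacheStore` persists to `cache-dir`.

```typescript
// Hypothetical key-value contract (method names are assumptions).
interface CacheStoreSketch {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// Null object: accepts writes and never returns a hit, so callers
// need no "is caching enabled?" branches.
class NullCacheStoreSketch implements CacheStoreSketch {
  get(_key: string): string | undefined {
    return undefined;
  }
  set(_key: string, _value: string): void {}
}

// In-memory stand-in for the file-backed store.
class MemoryCacheStoreSketch implements CacheStoreSketch {
  private readonly entries = new Map<string, string>();

  get(key: string): string | undefined {
    return this.entries.get(key);
  }
  set(key: string, value: string): void {
    this.entries.set(key, value);
  }
}

// An empty cache-dir disables caching, matching the input's documented default.
function selectCacheStore(cacheDir: string): CacheStoreSketch {
  return cacheDir === ''
    ? new NullCacheStoreSketch()
    : new MemoryCacheStoreSketch();
}
```

With this shape, the pipeline code reads and writes ETags the same way in both modes; only the composition root decides which store is injected.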

### File Routing

`AssetProcessor` supports three output structures via `file-routing` input:
- `flat` (default): all files in `output-dir/`
- `by-doctype`: `{output-dir}/{doctype}/` subdirectories
- `by-format`: `{output-dir}/{ext}/` subdirectories
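
A minimal sketch of the three routing modes, assuming a hypothetical `routeFile` helper. The function name, its parameters, and the `unknown` fallback for files without an extension are illustrative assumptions; only the three mode names and the resulting directory layouts come from the list above.

```typescript
type FileRoutingMode = 'flat' | 'by-doctype' | 'by-format';

// Hypothetical helper: computes the destination path for one extracted file.
function routeFile(
  outputDir: string,
  fileName: string,
  doctype: string,
  mode: FileRoutingMode
): string {
  switch (mode) {
    case 'flat':
      return `${outputDir}/${fileName}`;
    case 'by-doctype':
      return `${outputDir}/${doctype}/${fileName}`;
    case 'by-format': {
      const dot = fileName.lastIndexOf('.');
      // Assumption: files without an extension go to an "unknown" directory.
      const ext = dot > 0 ? fileName.slice(dot + 1) : 'unknown';
      return `${outputDir}/${ext}/${fileName}`;
    }
  }
}
```

For example, `routeFile('_site/cc', 'cc-18011.pdf', 'standard', 'by-format')` yields `_site/cc/pdf/cc-18011.pdf`.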

### Release Metadata Protocol

@@ -106,6 +133,12 @@ Strip edition suffixes using regex `/-ed\d+(\.\d+)?(-[a-z0-9]+)?\./`:
}
```
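
The edition-suffix regex quoted in the hunk header above can be exercised with a short sketch. The regex is taken verbatim from the document; the helper name `canonicalizeFileName` and the choice to replace the match with a single `.` are assumptions about how it is applied.

```typescript
// Regex copied from CLAUDE.md; matches "-ed2.", "-ed1.2.", "-ed1.2-draft." and similar.
const EDITION_SUFFIX = /-ed\d+(\.\d+)?(-[a-z0-9]+)?\./;

// Hypothetical helper: drops the edition suffix but keeps the extension dot.
function canonicalizeFileName(fileName: string): string {
  return fileName.replace(EDITION_SUFFIX, '.');
}

const canonicalPdf = canonicalizeFileName('cc-18011-ed2.pdf'); // 'cc-18011.pdf'
const canonicalHtml = canonicalizeFileName('doc-ed1.2-draft.html'); // 'doc.html'
const untouched = canonicalizeFileName('cc-18011.pdf'); // 'cc-18011.pdf'
```

Names without an edition suffix pass through unchanged, so applying the helper twice gives the same result as applying it once.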

### Error Reporting

- `aggregation-report` output includes per-release error details (`errors` array in `RepoReport`)
- `failed-repos` output lists repos that had processing errors
- `fail-on-error: true` fails the action when any repo has errors
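
To make the relationship between these outputs concrete, here is a sketch under stated assumptions. `RepoReport` and its `errors` array are named in the list above, but the other fields (`repo`, `release`, `message`) and both helper functions are hypothetical.

```typescript
// Hypothetical shape; only the errors array is confirmed by the document.
interface RepoReportSketch {
  repo: string;
  errors: { release: string; message: string }[];
}

// failed-repos: every repo that recorded at least one error.
function listFailedRepos(reports: readonly RepoReportSketch[]): string[] {
  return reports.filter((r) => r.errors.length > 0).map((r) => r.repo);
}

// fail-on-error: fail only when the flag is set and something went wrong.
function shouldFailAction(
  reports: readonly RepoReportSketch[],
  failOnError: boolean
): boolean {
  return failOnError && listFailedRepos(reports).length > 0;
}

const sampleReports: RepoReportSketch[] = [
  { repo: 'OrgA/repo-one', errors: [] },
  {
    repo: 'OrgA/repo-two',
    errors: [{ release: 'v1.0.0', message: 'zip asset missing' }]
  }
];
```

With the sample data, `listFailedRepos(sampleReports)` contains only `OrgA/repo-two`, and the action would fail only if `fail-on-error` were enabled.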

## Conventions

- Immutable value objects (readonly props, no setters)
@@ -115,3 +148,6 @@ Strip edition suffixes using regex `/-ed\d+(\.\d+)?(-[a-z0-9]+)?\./`:
- Logger prefix: `[mn-aggregate]`
- Content hash on first line of release body for change detection
- Channels use hierarchical `audience/category` format (e.g., `public/standards`)
- Test helper factories (`makeDeps`, `makeRelease`, `mockFetch`) for DRY test setup
- `vi.fn()` for all mocks, `vi.clearAllMocks()` in `beforeEach`
- Real temp directories in tests, cleaned up in `afterEach`
59 changes: 52 additions & 7 deletions README.md
@@ -17,6 +17,24 @@ Part of the [actions-mn](https://github.com/actions-mn) ecosystem. Consumes rele
token: ${{ secrets.GITHUB_TOKEN }}
```

### With caching (recommended)

```yaml
- uses: actions/cache@v4
  with:
    path: .cache/mn-aggregate
    key: mn-aggregate-${{ github.run_id }}
    restore-keys: mn-aggregate-

- uses: actions-mn/aggregate@v1
  with:
    organizations: CalConnect
    channels: 'public/standards'
    output-dir: _site/cc
    cache-dir: .cache/mn-aggregate
    token: ${{ secrets.GITHUB_TOKEN }}
```

## Inputs

| Input | Description | Default |
@@ -28,9 +46,13 @@ Part of the [actions-mn](https://github.com/actions-mn) ecosystem. Consumes rele
| `stages` | Comma-separated stages to include. Empty = all. | `''` |
| `output-dir` | Directory for extracted document files | `_site/documents` |
| `index-format` | Index format: `json` or `jsonl` | `json` |
| `file-routing` | File output structure: `flat`, `by-doctype`, or `by-format` | `flat` |
| `canonicalize` | Strip edition suffixes from filenames | `true` |
| `include-drafts` | Include GitHub draft releases | `false` |
| `fail-on-error` | Fail the action if any repo processing fails | `false` |
| `concurrency` | Max parallel repo processing | `4` |
| `cache-dir` | Directory for persistent cache (ETags, content hashes, delta state). Empty = no caching. | `''` |
| `force-full` | Force full aggregation, ignoring cached state | `false` |
| `token` | GitHub token for API access | `${{ github.token }}` |

## Outputs
@@ -41,16 +63,21 @@ Part of the [actions-mn](https://github.com/actions-mn) ecosystem. Consumes rele
| `index-path` | Path to the generated index file |
| `repo-count` | Number of repos scanned |
| `channels-found` | JSON array of all channels found |
| `aggregation-report` | JSON object with per-repo statistics |
| `aggregation-report` | JSON object with per-repo statistics and error details |
| `failed-repos` | JSON array of repos that had processing errors |

## How it works

1. **Discover** — Finds repos by GitHub topic or from an explicit list
2. **Fetch** — Lists releases from each repo via GitHub API
3. **Parse** — Extracts `mn-release-metadata` JSON from release bodies
4. **Filter** — Includes releases matching configured channels and stages
5. **Download** — Downloads zip assets, extracts, and canonicalizes filenames
6. **Index** — Generates a structured JSON document index
2. **Check manifest** — Reads `.metanorma/channels.yml` to skip repos with no matching channels
3. **Fetch** — Lists all releases with pagination; sends ETag to skip unchanged repos
4. **Parse** — Extracts `mn-release-metadata` JSON from release bodies
5. **Filter** — Includes releases matching configured channels and stages
6. **Dedup** — Skips releases with unchanged content hashes
7. **Download** — Downloads zip assets, extracts, and canonicalizes filenames
8. **Route** — Organizes files by flat/by-doctype/by-format structure
9. **Index** — Generates a structured JSON document index
10. **Delta save** — Persists state for incremental runs
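
The ETag handling in the Fetch step can be sketched as two pure functions around the HTTP call. This is an illustration of standard conditional-request logic (`If-None-Match` and HTTP 304), not the code in `PaginatedReleaseFetcher`; the function names and the `FetchOutcomeSketch` type are hypothetical.

```typescript
// Hypothetical result type for one conditional fetch.
interface FetchOutcomeSketch {
  changed: boolean;
  etag: string | undefined;
}

// Send If-None-Match only when a previous ETag is cached.
function conditionalHeaders(
  cachedEtag: string | undefined
): Record<string, string> {
  const headers: Record<string, string> = {};
  if (cachedEtag !== undefined) {
    headers['If-None-Match'] = cachedEtag;
  }
  return headers;
}

// 304 means the releases list is unchanged, so the repo can be skipped
// and the cached ETag kept; any other status carries a fresh ETag.
function interpretReleaseResponse(
  status: number,
  responseEtag: string | undefined,
  cachedEtag: string | undefined
): FetchOutcomeSketch {
  if (status === 304) {
    return { changed: false, etag: cachedEtag };
  }
  return { changed: true, etag: responseEtag };
}
```

On the first run no ETag is cached, so the request is unconditional; later runs send the stored value and skip the repo whenever the server answers 304.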

## Index format

@@ -78,14 +105,21 @@ The action writes `index.json` (or `index.jsonl`) to the output directory:
output-dir: _site/guides
```

### Multi-org
### Multi-org with caching

```yaml
- uses: actions/cache@v4
  with:
    path: .cache/mn-aggregate
    key: mn-aggregate-${{ github.run_id }}
    restore-keys: mn-aggregate-

- uses: actions-mn/aggregate@v1
  with:
    organizations: 'OrgA,OrgB'
    channels: 'public/standards'
    output-dir: _site/docs
    cache-dir: .cache/mn-aggregate
    token: ${{ secrets.PAT_TOKEN }}
```

@@ -102,6 +136,17 @@ The action writes `index.json` (or `index.jsonl`) to the output directory:
token: ${{ secrets.MEMBER_TOKEN }}
```

### Structured output by document type

```yaml
- uses: actions-mn/aggregate@v1
  with:
    organizations: CalConnect
    channels: 'public/standards'
    output-dir: _site/cc
    file-routing: by-doctype
```

## Backward compatibility

Releases without `mn-release-metadata` (pre-channel releases) are always included, ensuring smooth migration from older versions of `actions-mn/release`.
45 changes: 0 additions & 45 deletions TODO/01-etag-caching.md

This file was deleted.

50 changes: 0 additions & 50 deletions TODO/02-repo-channel-manifest.md

This file was deleted.

49 changes: 0 additions & 49 deletions TODO/03-content-hash-dedup.md

This file was deleted.
