A complete reframe of Yakabuff/redarc, modernizing the frontend, search capabilities, and data ingestion workflow.
| Before | After |
|---|---|
| Bootstrap 2 era styling | Dark theme with JetBrains Mono + Source Serif 4 |
| `class` instead of `className` | Proper React patterns throughout |
| Direct DOM manipulation for pagination | State-driven cursor-based pagination |
| Hardcoded year dropdown (stops at 2023) | Dynamic date range filters |
| No loading/error/empty states | Full loading skeletons, error boundaries, empty states |
| Search requires ALL fields | Optional filters, search across all subreddits |
| Fixed/basic search behavior | Advanced search filters, keywords filter, partial/phrase matching, emoji-aware search |
| Sorting tied to form submit | Sort updates on change, with dedicated sort controls above results |
| Fixed page size | Per-page result selector in search results |
| No file upload UI | Full drag-and-drop upload with progress tracking |
| Inline fetch calls everywhere | Centralized API client (src/utils/api.js) |
| No responsive design | Mobile-first responsive layout |
| Endpoint | Method | Purpose |
|---|---|---|
| `/upload` | POST | Upload NDJSON files via multipart form |
| `/upload/status` | GET | Check upload job progress |
| `/stats` | GET | Archive-wide statistics |
| `/admin/delete` | POST | Admin delete by filters with preview + confirmation |
```text
BEFORE (CLI-only data ingestion):
  $ python3 load_sub.py <file>
  $ python3 load_comments.py <file>
  $ python3 load_sub_fts.py <file>
  $ python3 load_comments_fts.py <file>
  $ python3 index.py [subreddit]

AFTER (UI + CLI data ingestion):
  Drag & drop in browser, OR python3 load_sub.py <file> as before.
  The /upload endpoint handles:
    - Parsing NDJSON
    - Inserting to main PG
    - Inserting to FTS PG
    - Auto-indexing subreddits
    - Background processing
    - Progress reporting
```
```text
redarc/
├── api/
│   ├── app.py               # Updated: CORS middleware, new routes
│   ├── upload.py            # NEW: File upload + processing + stats
│   ├── admin_delete.py      # NEW: Admin delete-by-filter endpoint
│   ├── comments.py          # Unchanged
│   ├── submissions.py       # Unchanged
│   ├── subreddits.py        # Unchanged
│   ├── search.py            # Unchanged
│   ├── submit.py            # Unchanged
│   ├── media.py             # Unchanged
│   ├── progress.py          # Unchanged
│   ├── status.py            # Unchanged
│   ├── unlist.py            # Unchanged
│   ├── watch.py             # Unchanged
│   └── redarc_logger.py     # Unchanged
│
├── frontend/
│   ├── src/
│   │   ├── main.jsx         # Entry point with router
│   │   ├── utils/
│   │   │   └── api.js       # NEW: Centralized API client
│   │   ├── components/      # NEW: Reusable UI components
│   │   │   ├── Header.jsx
│   │   │   ├── Breadcrumb.jsx
│   │   │   ├── EmptyState.jsx
│   │   │   ├── Loading.jsx
│   │   │   ├── Pagination.jsx
│   │   │   ├── CommentTree.jsx
│   │   │   └── Toast.jsx
│   │   └── pages/           # NEW: Page-level components
│   │       ├── Index.jsx    # Subreddit grid + stats
│   │       ├── Subreddit.jsx # Submission list
│   │       ├── Thread.jsx   # Post + threaded comments
│   │       ├── Search.jsx   # Full-text search with filters
│   │       ├── Upload.jsx   # File upload + URL submit
│   │       └── Admin.jsx    # Watch/unlist/progress + danger-zone delete
│   ├── package.json         # Updated deps (Tailwind, Lucide)
│   └── vite.config.js
│
├── scripts/                 # Unchanged (CLI still works)
├── ingest/                  # Unchanged
├── docker-compose.yml       # Unchanged
└── Dockerfile               # Unchanged
```
`POST /upload` — upload an NDJSON file for processing.
Request: multipart/form-data
| Field | Type | Required | Description |
|---|---|---|---|
| `file` | File | Yes | Data file (.json, .ndjson, .zst, .zstd) |
| `type` | String | No | `submissions`, `comments`, or `auto` (default: `auto`) |
| `password` | String | No | Ingest password if `INGEST_PASSWORD` is set |
| `target` | String | No | `main`, `fts`, or `both` (default: `both`) |
| `auto_index` | String | No | `true` or `false` (default: `true`) |
| `on_conflict` | String | No | `skip` or `update` (default: `skip`) |
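The optional fields above fall back to server-side defaults. As a small illustration (the helper name and the client-side validation are hypothetical, not part of the API), a client might merge its overrides over those defaults before posting the form:

```python
# Field defaults mirror the table above.
UPLOAD_DEFAULTS = {
    "type": "auto",         # auto-detect submissions vs comments
    "target": "both",       # write to main and FTS databases
    "auto_index": "true",
    "on_conflict": "skip",  # ON CONFLICT DO NOTHING
}

ALLOWED_FIELDS = {"type", "password", "target", "auto_index", "on_conflict"}

def build_upload_form(**overrides):
    """Merge caller options over the documented defaults, rejecting typos."""
    unknown = set(overrides) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown upload fields: {sorted(unknown)}")
    return {**UPLOAD_DEFAULTS, **overrides}
```

The resulting dict would be sent as the `data` part of the multipart request, alongside the `file` part.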
Response: 202 Accepted
```json
{
  "status": "accepted",
  "job_id": "a3f2b1c8",
  "filename": "RS_2023-01.ndjson",
  "file_size": 1048576
}
```

`GET /upload/status` — check upload job progress.
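Since `/upload` returns `202 Accepted` with a `job_id`, clients are expected to poll for completion. A minimal transport-agnostic sketch (the terminal status values other than `processing` are assumptions; pass your own HTTP wrapper as `fetch_status`):

```python
import time

def wait_for_job(fetch_status, job_id, poll_interval=2.0, timeout=600.0):
    """Poll upload status until the job leaves the in-progress states.

    `fetch_status` is any callable mapping a job_id to the status dict
    returned by GET /upload/status, so this sketch works with requests,
    urllib, or a test stub.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)
        # "queued" is assumed here; the docs only show "processing".
        if status.get("status") not in ("queued", "processing"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout}s")
        time.sleep(poll_interval)
```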
Query params: ?job_id=a3f2b1c8 (optional — omit for all jobs)
Response:
```json
{
  "id": "a3f2b1c8",
  "filename": "RS_2023-01.ndjson",
  "status": "processing",
  "lines_processed": 45000,
  "inserted": 44800,
  "skipped": 150,
  "errors": 50,
  "subreddits": ["programming", "python"]
}
```

`GET /stats` — archive-wide statistics.
Response:
```json
{
  "subreddits": 42,
  "submissions": 284521,
  "comments": 3842156,
  "total_records": 4126677
}
```

`POST /admin/delete` — danger-zone admin delete with a two-step flow.
Behavior:
- The first call uses `dry_run=true` to preview matched row counts.
- The execute call requires `confirm_text=DELETE`.
- The execute call requires a valid `ADMIN_PASSWORD` (the preview does not).
- The delete is scoped to exactly one subreddit.
Request: application/json
| Field | Type | Required | Description |
|---|---|---|---|
| `dry_run` | Bool | No | Preview mode (default `true`) |
| `target` | String | Yes | `submissions` or `comments` |
| `subreddit` | String | Yes | Single subreddit only |
| `author` | String | No | Optional exact author filter |
| `keywords` | String | No | Optional keyword filter |
| `after` | Integer | No | Unix timestamp lower bound |
| `before` | Integer | No | Unix timestamp upper bound |
| `confirm_text` | String | Execute only | Must be `DELETE` |
| `password` | String | Execute only | Must match `ADMIN_PASSWORD` |
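The preview-then-execute flow can be sketched as a small client helper. This is a non-authoritative sketch: `post` is any callable that sends the JSON body and returns the parsed response, and the `matched` field in the preview response is an assumption, not documented above.

```python
def admin_delete(post, *, target, subreddit, password, **filters):
    """Run the two-step delete against POST /admin/delete.

    Step 1 previews with dry_run=true (no password sent); step 2
    executes with confirm_text=DELETE plus the admin password.
    """
    base = {"target": target, "subreddit": subreddit, **filters}

    # Step 1: dry run — password intentionally omitted.
    preview = post({**base, "dry_run": True})
    if preview.get("matched", 0) == 0:
        return preview  # nothing to delete; stop before the dangerous call

    # Step 2: execute with the required confirmation token and password.
    return post({
        **base,
        "dry_run": False,
        "confirm_text": "DELETE",
        "password": password,
    })
```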
All API calls are centralized in frontend/src/utils/api.js:
```js
import { fetchSubreddits, search, uploadFile, fetchUploadStatus } from './utils/api';

// Fetch subreddits with abort support
const controller = new AbortController();
const subs = await fetchSubreddits(controller.signal);

// Full-text search
const results = await search({
  type: 'submission',
  subreddit: 'programming',
  query: 'rust',
  keywords: 'performance',
  match: 'partial', // or 'phrase'
  sort_by: 'new',
  limit: 50,
  after: '1672531200',
});

// Upload file
const job = await uploadFile(fileObject, {
  type: 'submissions',
  password: 'mypass',
  target: 'both',
  autoIndex: true,
  onConflict: 'skip', // or 'update'
});

// Poll job status
const status = await fetchUploadStatus(job.job_id);
```

The upload endpoint replaces the need to run four separate CLI scripts. Here's what happens internally:
1. File received via multipart POST
2. Saved to /tmp/redarc_uploads/
3. Background thread spawned
4. NDJSON parsed line-by-line (streaming, low memory)
5. Each line parsed with same logic as load_sub.py / load_comments.py
6. Batch INSERT (500 rows) into main PG
7. Batch INSERT into FTS PG (if target=both|fts)
8. Conflict handling: `ON CONFLICT DO NOTHING` (`skip`) or `DO UPDATE` (`update`)
9. Auto-index: UPDATE subreddits table with counts
10. Temp file cleaned up
11. Job status updated (poll via /upload/status)
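Steps 4–6 above (streaming parse plus 500-row batching) can be sketched as follows; the helper name is illustrative, not the actual `upload.py` code:

```python
import itertools
import json

def iter_batches(lines, batch_size=500):
    """Lazily parse NDJSON lines and yield lists of up to batch_size records.

    Streaming via generators keeps memory flat regardless of file size:
    only one batch of parsed records is held at a time.
    """
    records = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(itertools.islice(records, batch_size))
        if not batch:
            return
        yield batch

# Each yielded batch would then be flushed with a single multi-row
# INSERT, e.g.:  INSERT INTO ... VALUES ... ON CONFLICT DO NOTHING
# (skip mode) or ... DO UPDATE (update mode).
```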
Auto-detection: When type=auto, the parser checks for title/selftext fields (→ submission) vs body field (→ comment). This handles mixed files gracefully.
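The auto-detection heuristic amounts to a per-record field check; a minimal sketch (the exact precedence when a record somehow carries both field sets is an assumption):

```python
def detect_record_type(record):
    """Classify one NDJSON record as 'submission' or 'comment'.

    Mirrors the documented heuristic: submissions carry title/selftext,
    comments carry body.
    """
    if "title" in record or "selftext" in record:
        return "submission"
    if "body" in record:
        return "comment"
    raise ValueError("record has neither submission nor comment fields")
```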
- Backend: Replace `api/app.py` with the new version. Add `api/upload.py` and `api/admin_delete.py`.
- Frontend: Complete replacement. Delete the old `frontend/src/` and replace it with the new structure.
- Docker: Host port mappings are configurable in `docker-compose.yml`:
  - `REDARC_HTTP_PORT` (default: 8088)
  - `PG_HOST_PORT` (default: 55432)
  - `PGFTS_HOST_PORT` (default: 55433)
  - `REDIS_HOST_PORT` (default: 56379)
  - Internal API bind uses `API_PORT` (default: 18000)
- Data: No database schema changes. All existing data is preserved.
- Env vars: Add the port env vars above to `.env` if you need custom bindings.
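For reference, a `.env` that pins all the bindings explicitly at their defaults might look like this (adjust values as needed):

```shell
# .env — host port bindings (values shown are the documented defaults)
REDARC_HTTP_PORT=8088
PG_HOST_PORT=55432
PGFTS_HOST_PORT=55433
REDIS_HOST_PORT=56379
# Internal API bind
API_PORT=18000
```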
- Frontend routes changed (but old URL patterns still work via the router)
- Bootstrap CSS removed entirely (replaced with Tailwind + custom CSS)
- `class` → `className` throughout (was a React anti-pattern)
The modernized frontend uses:
- Typography: JetBrains Mono (UI/code) + Source Serif 4 (prose/titles)
- Colors: Dark base (#0a0a0b) with Reddit orange (#ff4500) accent
- Components: All custom — no component library dependency
- Icons: Lucide React (tree-shakeable, 1KB per icon)
- Layout: CSS Grid + Flexbox, responsive breakpoints at 768px
MIT — same as original.