Codex Harness

A practical Codex handbook plus a reversible RTK + Graphify setup harness.

This repo has two jobs:

1. Learn Codex: what it is, how to use it, and what common words like skills, MCP, plugins, and subagents mean.
2. Install the harness: set up RTK and Graphify so Codex can inspect repositories with less noisy context and a clear rollback path.

This repo is written for people who want a clear operating model for Codex, from everyday use through a more structured RTK + Graphify setup.

Safety First

Codex can read files, edit files, and run commands, so give it a safe workspace.

Good defaults:

- work inside a Git repository
- keep raw data, secrets, and irreplaceable files outside that repository
- commit or stash work before large changes
- use sandbox/approval settings while learning
- read approval prompts before allowing commands outside the project

On Windows, WSL2 is usually the easiest place to run terminal-based developer tools. Git helps because you can inspect diffs and roll back tracked files, but it is not a backup system.
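The "commit or stash work before large changes" default can be checked mechanically. The sketch below is a minimal illustration, not part of the harness: the function names are hypothetical, and it assumes `git` is on your PATH. It reads `git status --porcelain` and lists any paths with uncommitted changes.

```python
import subprocess
from pathlib import Path


def parse_porcelain(output: str) -> list[str]:
    """Return the paths listed in `git status --porcelain` output."""
    paths = []
    for line in output.splitlines():
        if line.strip():
            # Porcelain v1 lines are "XY path": two status characters,
            # one space, then the path (renames appear as "old -> new").
            paths.append(line[3:])
    return paths


def uncommitted_paths(repo: Path) -> list[str]:
    """Ask Git which files in `repo` have uncommitted changes."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError outside a Git repository
    )
    return parse_porcelain(result.stdout)
```

Calling `uncommitted_paths(Path.cwd())` before a large Codex task returns an empty list when the working tree is clean; anything else is a prompt to commit or stash first.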

Start Here

| If you want to... | Read this first |
| --- | --- |
| Understand Codex concepts | Learn Codex |
| Install RTK + Graphify | Install the Harness |
| Copy commands | Useful Commands |
| Undo the setup | Rollback Guide |

flowchart LR
    A[Learn Codex] --> B[Use Codex on a repo]
    B --> C[Install Harness]
    C --> D[RTK: shorter command output]
    C --> E[Graphify: repo map]
    D --> F[Lower-noise Codex sessions]
    E --> F

Part 1: Learn Codex

Codex is a coding assistant that can act on a repository. A repository is basically a project folder with a history: it contains the files Codex should understand, such as code, documentation, configuration, small example data, and tests. In a research or data workflow, the repo usually holds the code that reads raw data, produces processed outputs, and records how the work was done. Keep large raw data, secrets, and bulky generated outputs outside Git unless your team has a deliberate system for tracking them.

You give Codex a task, and it can inspect files, edit code, run commands, explain choices, and help check whether the result works.

You do not need to learn every term before using it. Start with this loop:

1. Open a repository.
2. Explain the goal and constraints.
3. Let Codex inspect the code.
4. Review what it changes.
5. Run tests or checks.
6. Commit when the result is good.

| Concept | What it means in practice | When it makes sense | Guide |
| --- | --- | --- | --- |
| Codex CLI | Codex running in a terminal against files on your machine or in a remote shell. It can inspect the repo, edit files, run commands, and report what changed. | Use it for local development, focused repo edits, tests, refactors, documentation work, and iterative debugging. | Codex CLI |
| Codex app/cloud | Codex running through an app or hosted task flow rather than only your local terminal. | Use it for longer-running tasks, PR-oriented work, or work you want to delegate and review outside an active shell session. | Codex App and Cloud |
| `AGENTS.md` | Repo instructions that Codex reads before working in that repository. The model should follow the commands, style rules, safety rules, and repo-specific workflow written there. | Use it when a repo has stable conventions: test commands, formatting rules, architecture notes, generated-file warnings, or instructions like “read Graphify first.” | Settings |
| Skills | Reusable instruction packs for tasks you want done the same way each time. A skill can include a `SKILL.md`, references, scripts, and templates. | Use skills for repeatable workflows that need precision: polishing a deck, reviewing a PR, creating a plugin, checking official OpenAI docs, or following a lab/team process. | Concepts |
| MCP | Model Context Protocol: a way to connect Codex to external tools, data sources, or structured APIs through an MCP server. | Use MCP when Codex needs controlled access to something outside the repo, such as docs, browser automation, design files, databases, or internal systems. | Concepts |
| Plugins | Bundles that can package skills, MCP setup, connectors, and app integrations. | Use plugins when a workflow needs several capabilities together, such as GitHub review tools plus related skills, or a team integration with its own commands and context. | Concepts |
| Subagents | Additional agents that can work on separate, bounded tasks in parallel. | Use subagents for independent review or implementation slices: one agent checks docs, another checks scripts, another checks experiments. Keep write scopes separate. | Prompt Examples |
| Approvals/sandboxing | Controls for what Codex can read, write, run, or access over the network. | Use stricter settings when exploring unfamiliar repos, protecting data, or letting Codex run commands you have not reviewed yet. | Settings |

Getting Started With Codex

You need access to Codex through your OpenAI account. In practice this usually means a paid ChatGPT plan such as Plus, Pro, Business, Enterprise, Edu, Health, or Gov; availability and rate limits change, so check OpenAI's current Codex plan information before onboarding a team.

Three simple ways to start:

| Start here | Good for | First step |
| --- | --- | --- |
| VS Code | Editing code while staying in a familiar editor. | Install the Codex/OpenAI extension or integration you use, open a project folder, and ask Codex to explain the repo before editing. |
| Codex app/cloud | Delegating a task and reviewing the result without staying in a terminal. | Open the Codex app or web flow, connect/select the repo, describe the goal, and ask it to stop after a plan or diff summary. |
| Terminal / CLI | Direct local work where Codex can run commands, tests, and scripts. | Open a shell, `cd /path/to/project`, run `codex`, and start with a small read-only task. |

Example first request:

Read the README and main docs for this repository.
Explain what the repo is for and where I should make changes.
Do not edit files yet.

Part 2: Install the Harness

The harness exists for one main reason: feed Codex less junk.

Codex has limited useful context in each task. If it spends that context on huge file lists, raw logs, repeated grep output, vendored code, and broad scans, it has less room for the files and decisions that matter. RTK and Graphify help Codex start from compact summaries and a repository map, so it can spend fewer tokens getting oriented and more effort on the actual task.

Use the harness when you want Codex to inspect a repo with less noisy context, clearer structure, and a rollback path for the setup.

| Need | Where to go |
| --- | --- |
| Install the harness safely | Quickstart |
| Understand what RTK and Graphify add | Concepts and RTK + Graphify |
| Use Codex CLI day to day | Codex CLI Guide |
| Use the Codex app/cloud workflow | Codex App and Cloud |
| Configure Codex settings and instructions | Settings |
| Spawn useful subagents | Prompt Examples |
| Explore future RTK/Graphify-adjacent tools | Tooling Roadmap |
| Roll back the harness | Rollback Guide |

What This Harness Installs

| Component | Scope | What it does |
| --- | --- | --- |
| RTK binary | Global for your shell after install | Adds `rtk` to your PATH so any repo can use compact command summaries. |
| Graphify Python environment | Harness-local install | Installs Graphify inside this repo's `.venv`; the wrapper scripts use that environment to build or refresh graphs. |
| Global Codex guidance | Global Codex config area | Can update files under `~/.codex/`, such as `AGENTS.md`, `RTK.md`, and `config.toml`, so Codex knows RTK/Graphify exist. |
| Project activation | Per target repository | Adds or updates repo-specific Codex/Graphify files such as `AGENTS.md`, `.codex/hooks.json`, and `graphify-out/` in the project you activate. |
| Rollback manifest | Harness-local state | Records touched files and backups in `manifests/changes.json` and `state/backups/`. |

The installer is not purely project-local: `install.sh` prepares global shell/Codex support, while `activate.sh /path/to/project` applies project-specific guidance to one target repo. Read the Rollback Guide before installing on a machine you care about.
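To make the rollback idea concrete, here is a minimal sketch of a backup-then-record pattern. It is an illustration only: the manifest schema (a JSON list of `path`/`backup` entries) and the function names are assumptions, and the real `scripts/harness.py` may record changes differently.

```python
import json
import shutil
from pathlib import Path


def backup_and_record(target: Path, backup_dir: Path, manifest: Path) -> None:
    """Back up `target` (if it exists) and append an entry to the manifest.

    Call this BEFORE modifying or creating `target`.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    entries = json.loads(manifest.read_text()) if manifest.exists() else []
    entry = {"path": str(target), "backup": None}  # hypothetical schema
    if target.exists():
        # Prefix with the entry index so same-named files do not collide.
        backup = backup_dir / f"{len(entries)}_{target.name}"
        shutil.copy2(target, backup)
        entry["backup"] = str(backup)
    entries.append(entry)
    manifest.parent.mkdir(parents=True, exist_ok=True)
    manifest.write_text(json.dumps(entries, indent=2))


def rollback(manifest: Path) -> None:
    """Undo recorded changes in reverse order, then empty the manifest."""
    if not manifest.exists():
        return
    entries = json.loads(manifest.read_text())
    for entry in reversed(entries):
        target = Path(entry["path"])
        if entry["backup"] is None:
            # The file did not exist before the change, so remove it.
            target.unlink(missing_ok=True)
        else:
            shutil.copy2(entry["backup"], target)
    manifest.write_text("[]")
```

Recording an entry before every write is what makes the undo step mechanical: a file with a backup is restored, and a file without one is deleted because the harness created it.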

Repository Layout

| Path | Purpose |
| --- | --- |
| `README.md` | Project overview and command cookbook. |
| `scripts/harness.py` | Main bootstrap, activation, deactivation, and uninstall implementation. |
| `scripts/install.sh` | Bootstraps the harness on this machine. |
| `scripts/activate.sh` | Activates Graphify/Codex guidance for a target repo. |
| `scripts/build_graph.sh` | Builds a first Graphify report for a target repo. |
| `scripts/refresh_graph.sh` | Refreshes an existing Graphify report. |
| `scripts/deactivate.sh` | Restores files for one activated target repo. |
| `scripts/uninstall.sh` | Restores bootstrap files and removes harness-installed artifacts. |
| `templates/` | Instruction templates used by the harness and by humans reviewing behavior. |
| `docs/` | Practical Codex, RTK, Graphify, subagent, and rollback guides. |
| `manifests/changes.json` | Machine-local record of harness-managed changes. |
| `vendor/` | RTK and Graphify source checkouts. |

Codex Workflow Map

| Surface | Use it for |
| --- | --- |
| Codex CLI | Local terminal work, repo edits, command execution, tests, commits, and iterative debugging. |
| Codex app/cloud | Reviewing tasks, delegating repository work, PR-oriented workflows, and work you want tracked outside a local terminal. |
| `AGENTS.md` | Durable project instructions: commands, style, safety rules, and repo-specific workflow. |
| Subagents | Parallel, bounded review or implementation tasks when the user explicitly asks for them. |
| RTK | Lower-noise command output for Codex. |
| Graphify | High-signal repo maps before architecture or broad codebase questions. |

RTK + Graphify Mental Model

Codex has a limited working memory for each task. Huge command output and repeated file scans use that memory quickly.

RTK helps by making command output shorter. Graphify helps by giving Codex a map of the repository before it opens raw files.
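As a rough illustration of why shorter output matters, the sketch below collapses a long file listing into per-directory counts. This is not RTK's implementation or output format; `summarize_paths` is a hypothetical helper that only demonstrates the general idea of summarizing before reading raw output.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarize_paths(paths: list[str], top: int = 5) -> str:
    """Collapse a file listing into counts per top-level directory."""
    counts: Counter[str] = Counter()
    for raw in paths:
        parts = PurePosixPath(raw).parts  # a leading "./" is dropped here
        # Files at the repository root are grouped under ".".
        counts[parts[0] if len(parts) > 1 else "."] += 1
    lines = [f"{len(paths)} files"]
    # Sort by count, largest first, and keep only the busiest directories.
    for directory, count in counts.most_common()[:top]:
        lines.append(f"  {directory}/: {count}")
    return "\n".join(lines)
```

A listing of thousands of paths becomes a handful of lines, which is usually enough for a first decision about where to look next.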

Use this default pattern in activated repositories:

rtk git status
rtk git diff
rtk grep "function_or_setting"
rtk find . -type f
/home/<user>/codex_harness/scripts/refresh_graph.sh /path/to/project

Then ask Codex to start from:

Read graphify-out/GRAPH_REPORT.md first, then inspect only the files needed for this change.
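Starting from the report only helps if the report is current. The following sketch is one possible freshness check, not something the harness is known to provide: `graph_report_is_stale` is a hypothetical helper that compares the modification time of `graphify-out/GRAPH_REPORT.md` with every other file in the repository.

```python
from pathlib import Path


def graph_report_is_stale(
    repo: Path, report: str = "graphify-out/GRAPH_REPORT.md"
) -> bool:
    """Return True if the report is missing or older than any repo file."""
    report_path = repo / report
    if not report_path.is_file():
        return True
    report_mtime = report_path.stat().st_mtime
    for path in repo.rglob("*"):
        if not path.is_file():
            continue
        # Skip Git internals and Graphify's own output directory.
        if path.relative_to(repo).parts[0] in {".git", "graphify-out"}:
            continue
        if path.stat().st_mtime > report_mtime:
            return True
    return False
```

If it returns `True`, run `refresh_graph.sh` on the project before asking Codex to read the report. Walking every file is slow on large repositories, so treat this as a starting point rather than a finished tool.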

Measured Context Reduction

We tested the harness idea with a reproducible experiment in `experiments/context_size`. The question was simple: if Codex starts from compact summaries and a repository map, how much less text does it need to read before it knows where to work?

Four independent work streams created small fixture repositories across frontend/3D, backend, ops/HPC, and docs-heavy tasks. For each fixture, the script generates two text transcripts:

| Workflow | What it simulates |
| --- | --- |
| Without harness | A naive first pass: list many files, search broadly, and read full text. |
| With harness | A guided first pass: compact command summaries plus a repository-structure report. |

The experiment counts actual transcript tokens with `tiktoken` using the `o200k_base` tokenizer. It does not use private Codex telemetry and it does not measure billing tokens, accuracy, or elapsed time. It measures the amount of text a Codex-like model would need to read in this controlled first-pass workflow.
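Counting transcript tokens this way takes only a few lines. The sketch below is not the code in `run_experiment.py`; the function names are hypothetical. It keeps the encoder injectable, so the counting logic can be exercised without the third-party `tiktoken` package installed.

```python
from typing import Callable, Sequence


def count_tokens(text: str, encode: Callable[[str], Sequence[int]]) -> int:
    """Count tokens by encoding the text and measuring the result."""
    return len(encode(text))


def o200k_encoder() -> Callable[[str], Sequence[int]]:
    """Load the o200k_base encoder from tiktoken (install tiktoken first)."""
    import tiktoken  # imported lazily so count_tokens works without it

    return tiktoken.get_encoding("o200k_base").encode
```

With `tiktoken` installed, `count_tokens(transcript, o200k_encoder())` returns the token count for one transcript string.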

Across 14 fixtures, the harness-style transcripts totaled 10,254 tokens against 27,541 for the baseline: 62.8% fewer measured transcript tokens overall. In the chart, each family label uses a pooled family reduction: `1 - sum(harness_tokens) / sum(baseline_tokens)` across that family. That is not the plain mean of the replicate percentages; larger fixtures contribute proportionally more. Pooled family reductions range from 26.5% on documentation-heavy fixtures to 74.8% on frontend/3D fixtures, while individual replicate reductions range from 23.6% to 78.5%.
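The difference between a pooled reduction and a plain mean is easy to see in code. In the sketch below, the overall totals come from the paragraph above, while the two-fixture family is invented purely to show how the two statistics diverge; the function names are hypothetical.

```python
def pooled_reduction(pairs: list[tuple[int, int]]) -> float:
    """1 - sum(harness_tokens) / sum(baseline_tokens) over (baseline, harness) pairs."""
    baseline = sum(b for b, _ in pairs)
    harness = sum(h for _, h in pairs)
    return 1 - harness / baseline


def mean_reduction(pairs: list[tuple[int, int]]) -> float:
    """Plain average of each fixture's own reduction."""
    return sum(1 - h / b for b, h in pairs) / len(pairs)


# Overall totals reported above: 27,541 baseline tokens, 10,254 harness tokens.
overall = pooled_reduction([(27_541, 10_254)])
print(f"overall: {overall:.1%}")  # overall: 62.8%

# Hypothetical family: one large fixture and one small fixture.
family = [(1000, 200), (100, 80)]
print(f"pooled: {pooled_reduction(family):.1%}")  # pooled: 74.5%
print(f"mean:   {mean_reduction(family):.1%}")    # mean:   50.0%
```

The large fixture dominates the pooled figure, which is why a family label in the chart can sit well above or below the average of its replicate percentages.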

Experiment design:

- 14 fixtures across five families: frontend/3D, backend/data, ops/HPC, docs-heavy, and original smoke fixtures.
- Frontend, backend, and ops fixtures were generated by separate agents with different task briefs; docs-heavy fixtures were added separately as a lower-code comparison group.
- Each fixture defines a `scenario.json` search pattern, so the search task is explicit and reproducible.
- The baseline transcript simulates broad manual exploration.
- The harness transcript simulates a more disciplined workflow: summarize first, map structure second, open raw files later.

Reproduce it:

cd /home/<user>/codex_harness
python3 -m pip install -r experiments/context_size/requirements.txt
python3 experiments/context_size/run_experiment.py

The chart is generated at `experiments/context_size/results/context_reduction_by_family.svg`, and the full results are in `summary.csv`. Treat this as evidence that the workflow can reduce context size, not as a universal guarantee. Real Codex savings depend on the task, prompt, repo size, and whether the agent still needs raw logs or full files.

Useful Commands

Install Harness

/home/<user>/codex_harness/scripts/install.sh

Activate a Project

/home/<user>/codex_harness/scripts/activate.sh /path/to/project

Build or Refresh a Graph

/home/<user>/codex_harness/scripts/build_graph.sh /path/to/project
/home/<user>/codex_harness/scripts/refresh_graph.sh /path/to/project

Use RTK in a Repo

rtk git status
rtk git diff
rtk grep "TODO|FIXME"
rtk find . -type f
rtk pytest

Deactivate One Project

/home/<user>/codex_harness/scripts/deactivate.sh /path/to/project

Uninstall Everything the Harness Knows About

/home/<user>/codex_harness/scripts/uninstall.sh

Example Subagent Prompt

Use 4 subagents to review this repo. Do not edit files yet.

Agent 1: review documentation onboarding.
Agent 2: review installation and rollback safety.
Agent 3: review test and command ergonomics.
Agent 4: review token usage and Graphify/RTK guidance.

Each agent should return top findings, files involved, why it matters, proposed fix, and priority.
After all agents finish, merge duplicates and propose a ranked plan.

Official References

Use these when documenting current Codex behavior:

When official docs and this repo disagree, treat official docs as authoritative and update this repo.
