Commit 883198a

Add advanced steering docs and seed policies
1 parent 766e0c1 commit 883198a

68 files changed

Lines changed: 2235 additions & 1067 deletions

RELEASE.md

Lines changed: 5 additions & 5 deletions
@@ -15,7 +15,7 @@ A release should include:
 
 Current release notes:
 
-- [RELEASE_NOTES_v0.1.0.md](E:\Projects\StableSteering\RELEASE_NOTES_v0.1.0.md)
+- [RELEASE_NOTES_v0.1.0.md](RELEASE_NOTES_v0.1.0.md)
 
 ## Release Checklist
 
@@ -31,10 +31,10 @@ Current release notes:
 6. Rebuild the documentation site:
    `python scripts/build_pages_site.py`
 7. Review:
-   - [INSTALL.md](E:\Projects\StableSteering\INSTALL.md)
-   - [README.md](E:\Projects\StableSteering\README.md)
-   - [docs/student_tutorial.md](E:\Projects\StableSteering\docs\student_tutorial.md)
-   - [RELEASE_NOTES_v0.1.0.md](E:\Projects\StableSteering\RELEASE_NOTES_v0.1.0.md)
+   - [INSTALL.md](INSTALL.md)
+   - [README.md](README.md)
+   - [docs/student_tutorial.md](docs/student_tutorial.md)
+   - [RELEASE_NOTES_v0.1.0.md](RELEASE_NOTES_v0.1.0.md)
 8. Build a source zip if needed:
    `powershell -ExecutionPolicy Bypass -File scripts/build_release_zip.ps1 -Version v0.1.0`
 9. Create the Git tag.
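
Note on the link fix above: the removed targets are absolute Windows paths that only resolve on the author's machine. A minimal sketch of how such links could be rewritten in bulk, assuming the Markdown file sits at the repository root and that `E:\Projects\StableSteering` is that root (inferred from the removed lines). `relativize_links` is a hypothetical helper, not something this commit contains.

```python
import re
from pathlib import PureWindowsPath

# Assumption: the repo root seen in the removed links of this diff.
REPO_ROOT = PureWindowsPath(r"E:\Projects\StableSteering")

# Matches the "](target)" part of an inline Markdown link.
LINK_RE = re.compile(r"\]\(([^)]+)\)")


def relativize_links(markdown: str, root: PureWindowsPath = REPO_ROOT) -> str:
    """Rewrite link targets located under `root` as POSIX-style relative paths."""

    def _rewrite(match: re.Match) -> str:
        target = PureWindowsPath(match.group(1))
        try:
            relative = target.relative_to(root)
        except ValueError:
            # Target is not under the repo root (URL, already relative): keep it.
            return match.group(0)
        return f"]({relative.as_posix()})"

    return LINK_RE.sub(_rewrite, markdown)


line = r"- [student_tutorial.md](E:\Projects\StableSteering\docs\student_tutorial.md)"
print(relativize_links(line))
```

Links in files below the root (for example under `docs/`) would need a path relative to that file's own directory, which this sketch does not handle.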

RELEASE_NOTES_v0.1.0.md

Lines changed: 7 additions & 7 deletions
@@ -22,13 +22,13 @@ Published HTML documentation:
 
 ## Included Documentation
 
-- [README.md](E:\Projects\StableSteering\README.md)
-- [INSTALL.md](E:\Projects\StableSteering\INSTALL.md)
-- [RELEASE.md](E:\Projects\StableSteering\RELEASE.md)
-- [student_tutorial.md](E:\Projects\StableSteering\docs\student_tutorial.md)
-- [quick_start.md](E:\Projects\StableSteering\docs\quick_start.md)
-- [developer_guide.md](E:\Projects\StableSteering\docs\developer_guide.md)
-- [user_guide.md](E:\Projects\StableSteering\docs\user_guide.md)
+- [README.md](README.md)
+- [INSTALL.md](INSTALL.md)
+- [RELEASE.md](RELEASE.md)
+- [student_tutorial.md](docs/student_tutorial.md)
+- [quick_start.md](docs/quick_start.md)
+- [developer_guide.md](docs/developer_guide.md)
+- [user_guide.md](docs/user_guide.md)
 
 ## Validation Snapshot

RELEASE_NOTES_v0.1.1.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+# Release Notes v0.1.1
+
+## Summary
+
+This release expands the research MVP into a more configurable and inspectable system.
+
+Major themes in this release:
+
+- richer sampling and preference-mode support
+- stronger per-session YAML configuration
+- better runtime trace reporting and portable HTML artifacts
+- improved docs-site linking and roadmap detail
+- real seed-policy behavior in the orchestration layer
+
+## Highlights
+
+### Sampling and preference updates
+
+- added `axis_sweep` and `incumbent_mix` samplers
+- added `winner_only` and `approve_reject` feedback modes
+- documented the current sampler and preference-model behavior more clearly
+
+### Session and orchestration updates
+
+- first round always includes the unmodified prompt baseline
+- later rounds carry forward the previous winner as the incumbent
+- default candidate count is now `5`
+- implemented seed policies:
+  - `fixed-per-round`
+  - `fixed-per-candidate`
+  - `fixed-per-candidate-role`
+
+### Reporting and examples
+
+- improved HTML session trace reporting
+- ensured initial prompt visibility in generated HTML reports
+- regenerated the real end-to-end sample bundle
+- added a configuration-matrix sample generator
+
+### Documentation and publishing
+
+- fixed broken Markdown links that pointed at machine-local paths
+- regenerated the GitHub Pages site with corrected document and code links
+- expanded the roadmap docs with:
+  - why each item matters
+  - implementation notes
+  - success signals
+
+## Verification
+
+Validated before release with:
+
+- `python -m pytest -q`
+- `npm run test:e2e:chrome`
+- `python scripts/build_pages_site.py`
+
+## Known limitations
+
+- `multi-seed averaging` is still specified in docs but not yet implemented
+- mode-specific frontend controls are still incomplete for some preference modes
+- the real-backend Playwright smoke remains opt-in

app/core/schema.py

Lines changed: 9 additions & 1 deletion
@@ -49,6 +49,14 @@ class FeedbackType(str, Enum):
     scalar_rating = "scalar_rating"
     pairwise = "pairwise"
     top_k = "top_k"
+    winner_only = "winner_only"
+    approve_reject = "approve_reject"
+
+
+class SeedPolicy(str, Enum):
+    fixed_per_round = "fixed-per-round"
+    fixed_per_candidate = "fixed-per-candidate"
+    fixed_per_candidate_role = "fixed-per-candidate-role"
 
 
 class StrategyConfig(BaseModel):
@@ -57,7 +65,7 @@ class StrategyConfig(BaseModel):
     sampler: str = "random_local"
     updater: str = "winner_average"
     feedback_mode: FeedbackType = FeedbackType.scalar_rating
-    seed_policy: str = "fixed-per-round"
+    seed_policy: SeedPolicy = SeedPolicy.fixed_per_round
     steering_mode: str = "low_dimensional"
     candidate_count: int = 5
     image_size: str = "512x512"
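
Note on the `seed_policy` type change: as a plain `str`, the field accepted any value, including typos. With a `str`-backed `Enum`, lookup by value fails for unknown strings, so a validating model would be expected to reject them at load time. A standalone sketch of that lookup using only the standard library; `parse_seed_policy` is a hypothetical helper added for illustration, and the hyphenated values are copied from the diff.

```python
from enum import Enum


class SeedPolicy(str, Enum):
    fixed_per_round = "fixed-per-round"
    fixed_per_candidate = "fixed-per-candidate"
    fixed_per_candidate_role = "fixed-per-candidate-role"


def parse_seed_policy(raw: str) -> SeedPolicy:
    """Hypothetical helper: map a config string to a SeedPolicy member."""
    try:
        return SeedPolicy(raw)  # Lookup is by value, so hyphens are required.
    except ValueError:
        allowed = ", ".join(member.value for member in SeedPolicy)
        raise ValueError(f"Unknown seed policy {raw!r}; expected one of: {allowed}") from None


policy = parse_seed_policy("fixed-per-round")
# The str mixin keeps members comparable to the raw config string.
print(policy is SeedPolicy.fixed_per_round, policy == "fixed-per-round")
```

Existing configs that already use the hyphenated strings should keep working, since the member values are unchanged from the old default.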

app/core/tracing.py

Lines changed: 1 addition & 0 deletions
@@ -201,6 +201,7 @@ def _render_session_report(
             ' <details class="card section" open>',
             " <summary>Run Summary</summary>",
             ' <div class="card-body">',
+            f" <p><strong>Initial prompt:</strong> {self._escape(session.get('prompt') or '(none)')}</p>",
             f" <p><strong>Negative prompt:</strong> {self._escape(session.get('negative_prompt') or '(none)')}</p>",
             f" <p><strong>Model:</strong> <code>{self._escape(session.get('model_name', 'unknown'))}</code></p>",
             f" <p><strong>Feedback mode:</strong> <code>{self._escape(str(session.get('config', {}).get('feedback_mode', 'unknown')))}</code></p>",
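
Note on the new report line: the prompt is user-supplied text, so it has to be escaped before it is written into the HTML report. The body of `self._escape` is not part of this diff; the sketch below assumes it behaves roughly like the standard library's `html.escape`. `render_prompt_line` is a hypothetical stand-in for the added f-string.

```python
from html import escape


def render_prompt_line(session: dict) -> str:
    """Render the initial-prompt row, falling back to '(none)' when it is missing or empty."""
    prompt = session.get("prompt") or "(none)"
    # Assumption: html.escape stands in for the project's _escape helper.
    return f"<p><strong>Initial prompt:</strong> {escape(prompt)}</p>"


print(render_prompt_line({"prompt": "<b>cat</b> & dog"}))
print(render_prompt_line({}))
```

The `or '(none)'` fallback covers both a missing key and an empty string, which matches how the negative-prompt row is already rendered.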

app/engine/orchestrator.py

Lines changed: 58 additions & 3 deletions
@@ -1,6 +1,7 @@
 from __future__ import annotations
 
 from copy import deepcopy
+import hashlib
 import math
 
 from app.core.config import settings
@@ -14,6 +15,7 @@
     RenderStatus,
     Round,
     RoundResponse,
+    SeedPolicy,
     Session,
     SessionCreate,
     SessionStatus,
@@ -22,8 +24,10 @@
 from app.core.logging import logger
 from app.core.tracing import TraceRecorder
 from app.feedback.normalization import normalize_feedback
+from app.samplers.axis_sweep import AxisSweepSampler
 from app.samplers.base import clamp_vector
 from app.samplers.exploit_orthogonal import ExploitOrthogonalSampler
+from app.samplers.incumbent_mix import IncumbentMixSampler
 from app.samplers.random_local import RandomLocalSampler
 from app.samplers.uncertainty import UncertaintyGuidedSampler
 from app.storage.repository import JsonRepository
@@ -49,6 +53,8 @@ def __init__(
             "random_local": RandomLocalSampler(),
             "exploit_orthogonal": ExploitOrthogonalSampler(),
             "uncertainty_guided": UncertaintyGuidedSampler(),
+            "axis_sweep": AxisSweepSampler(),
+            "incumbent_mix": IncumbentMixSampler(),
         }
         self.updaters = {
             "winner_copy": WinnerCopyUpdater(),
@@ -123,7 +129,6 @@ def generate_round(self, session_id: str) -> RoundResponse:
         session = self._require_session(session_id)
         if session.status == SessionStatus.awaiting_feedback:
             raise RuntimeError("Cannot generate a new round while feedback for the current round is still pending")
-        seed = 1000 + session.current_round
         sampler = self.samplers[session.config.sampler]
         round_index = session.current_round + 1
         round_obj = Round(
@@ -140,21 +145,22 @@ def generate_round(self, session_id: str) -> RoundResponse:
         )
         carried_forward = self._build_carried_forward_candidate(session)
         baseline_candidate = self._build_baseline_prompt_candidate(session)
-        proposed_candidates = sampler.propose(session, seed)
+        sampler_seed = self._seed_token(session.id, round_index, "sampler")
+        proposed_candidates = sampler.propose(session, sampler_seed)
         proposed_candidates = self._widen_first_round_candidates(session, proposed_candidates)
         candidates = self._compose_round_candidates(
             pinned_candidate=carried_forward or baseline_candidate,
             proposed_candidates=proposed_candidates,
             candidate_count=session.config.candidate_count,
         )
+        self._assign_candidate_seeds(session, round_index, candidates)
         # Render each candidate independently so future versions can tolerate
         # partial round failures without changing the orchestration contract.
         for candidate in candidates:
             candidate.round_id = round_obj.id
             if candidate.generation_params.get("carried_forward") and candidate.image_path:
                 candidate.render_status = RenderStatus.succeeded
                 continue
-            candidate.seed = seed
             candidate = self.generator.render_candidate(session, candidate)
             candidate.render_status = RenderStatus.succeeded
         round_obj.candidates = candidates
@@ -281,6 +287,16 @@ def _validate_feedback_against_round(self, round_obj: Round, feedback) -> None:
         if unknown_ranked:
             raise ValueError(f"Feedback ranking references unknown candidates: {', '.join(unknown_ranked)}")
 
+        approved = feedback.normalized_payload.get("approved_candidate_ids", [])
+        unknown_approved = [candidate_id for candidate_id in approved if candidate_id not in candidate_ids]
+        if unknown_approved:
+            raise ValueError(f"Feedback approvals reference unknown candidates: {', '.join(unknown_approved)}")
+
+        rejected = feedback.normalized_payload.get("rejected_candidate_ids", [])
+        unknown_rejected = [candidate_id for candidate_id in rejected if candidate_id not in candidate_ids]
+        if unknown_rejected:
+            raise ValueError(f"Feedback rejections reference unknown candidates: {', '.join(unknown_rejected)}")
+
     @staticmethod
     def _candidate_trace_payload(candidate) -> dict:
         """Return a compact trace payload for one proposed image candidate."""
@@ -294,6 +310,8 @@ def _candidate_trace_payload(candidate) -> dict:
             "z": candidate.z,
             "predicted_score": candidate.predicted_score,
             "predicted_uncertainty": candidate.predicted_uncertainty,
+            "seed_policy": candidate.generation_params.get("seed_policy"),
+            "seed_group": candidate.generation_params.get("seed_group"),
         }
 
     def _build_carried_forward_candidate(self, session: Session) -> Candidate | None:
@@ -409,3 +427,40 @@ def _compose_round_candidates(
         for index, candidate in enumerate(selected):
             candidate.candidate_index = index
         return selected
+
+    def _assign_candidate_seeds(self, session: Session, round_index: int, candidates: list[Candidate]) -> None:
+        """Assign deterministic candidate seeds according to the configured policy."""
+
+        policy = session.config.seed_policy
+        round_seed = self._seed_token(session.id, round_index, "round")
+        for candidate in candidates:
+            if candidate.generation_params.get("carried_forward"):
+                candidate.generation_params["seed_policy"] = policy.value
+                candidate.generation_params["seed_group"] = "carried_forward"
+                candidate.generation_params["seed_preserved"] = True
+                continue
+
+            if policy == SeedPolicy.fixed_per_round:
+                candidate.seed = round_seed
+                seed_group = "round_shared"
+            elif policy == SeedPolicy.fixed_per_candidate:
+                candidate.seed = self._seed_token(session.id, round_index, "candidate", str(candidate.candidate_index))
+                seed_group = f"candidate:{candidate.candidate_index}"
+            elif policy == SeedPolicy.fixed_per_candidate_role:
+                role = candidate.sampler_role or "candidate"
+                candidate.seed = self._seed_token(session.id, round_index, "role", role)
+                seed_group = f"role:{role}"
+            else:
+                raise ValueError(f"Unsupported seed policy: {policy}")
+
+            candidate.generation_params["seed_policy"] = policy.value
+            candidate.generation_params["seed_group"] = seed_group
+            candidate.generation_params["round_seed"] = round_seed
+
+    @staticmethod
+    def _seed_token(*parts: object) -> int:
+        """Create one stable positive seed from arbitrary deterministic inputs."""
+
+        joined = "|".join(str(part) for part in parts)
+        digest = hashlib.blake2b(joined.encode("utf-8"), digest_size=4).digest()
+        return int.from_bytes(digest, byteorder="big", signed=False)
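
Note on the seed derivation: the old `seed = 1000 + session.current_round` gave every session the same seed sequence, while the new scheme hashes the session id, round index, and a scope label into an unsigned 32-bit integer. The sketch below reproduces that hashing outside the class and shows how the three policies group candidates. `seed_token` mirrors `_seed_token` from the diff; `seeds_for_round` is a hypothetical simplification that takes plain strings and leaves out the carried-forward branch, which keeps a candidate's earlier seed.

```python
import hashlib


def seed_token(*parts: object) -> int:
    """Hash the parts into a stable unsigned 32-bit integer (mirrors `_seed_token`)."""
    joined = "|".join(str(part) for part in parts)
    digest = hashlib.blake2b(joined.encode("utf-8"), digest_size=4).digest()
    return int.from_bytes(digest, byteorder="big", signed=False)


def seeds_for_round(policy: str, session_id: str, round_index: int, roles: list[str]) -> list[int]:
    """Hypothetical helper: one seed per candidate, in the order given by `roles`."""
    if policy == "fixed-per-round":
        # Every candidate in the round shares one seed.
        return [seed_token(session_id, round_index, "round") for _ in roles]
    if policy == "fixed-per-candidate":
        # The seed depends on the candidate's position in the round.
        return [seed_token(session_id, round_index, "candidate", str(i)) for i in range(len(roles))]
    if policy == "fixed-per-candidate-role":
        # Candidates with the same sampler role share a seed.
        return [seed_token(session_id, round_index, "role", role) for role in roles]
    raise ValueError(f"Unsupported seed policy: {policy}")


roles = ["axis_positive", "axis_negative", "axis_positive"]
for policy in ("fixed-per-round", "fixed-per-candidate", "fixed-per-candidate-role"):
    print(policy, seeds_for_round(policy, "session-1", 2, roles))
```

A 4-byte digest keeps the value in the range 0 to 2**32 - 1, so a seed of exactly 0 is possible even though the docstring in the diff says "positive".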

app/feedback/normalization.py

Lines changed: 19 additions & 0 deletions
@@ -25,6 +25,25 @@ def normalize_feedback(round_id: str, request: FeedbackRequest) -> FeedbackEvent
             "winner_candidate_id": payload["winner_candidate_id"],
             "loser_candidate_id": payload.get("loser_candidate_id"),
         }
+    elif request.feedback_type == FeedbackType.winner_only:
+        winner_candidate_id = payload.get("winner_candidate_id")
+        if not winner_candidate_id:
+            raise ValueError("winner_only feedback requires winner_candidate_id")
+        normalized = {"winner_candidate_id": winner_candidate_id}
+    elif request.feedback_type == FeedbackType.approve_reject:
+        approvals = payload.get("approvals", {})
+        if not approvals:
+            raise ValueError("approve_reject feedback requires at least one approval decision")
+        approved_candidate_ids = [candidate_id for candidate_id, approved in approvals.items() if approved]
+        if not approved_candidate_ids:
+            raise ValueError("approve_reject feedback requires at least one approved candidate")
+        winner_candidate_id = payload.get("winner_candidate_id") or approved_candidate_ids[0]
+        normalized = {
+            "winner_candidate_id": winner_candidate_id,
+            "approved_candidate_ids": approved_candidate_ids,
+            "rejected_candidate_ids": [candidate_id for candidate_id, approved in approvals.items() if not approved],
+            "approvals": approvals,
+        }
     else:
         ranking = payload.get("ranking", [])
         if not ranking:
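
Note on the `approve_reject` branch: it splits the `approvals` mapping into approved and rejected id lists and falls back to the first approved id when the client sends no explicit winner. A standalone sketch of that branch operating on a plain dict; `normalize_approve_reject` is a hypothetical extraction of the logic, without the `FeedbackRequest` and `FeedbackEvent` types the real function uses.

```python
def normalize_approve_reject(payload: dict) -> dict:
    """Mirror of the approve_reject branch, operating on a plain payload dict."""
    approvals = payload.get("approvals", {})
    if not approvals:
        raise ValueError("approve_reject feedback requires at least one approval decision")
    approved = [cid for cid, ok in approvals.items() if ok]
    if not approved:
        raise ValueError("approve_reject feedback requires at least one approved candidate")
    return {
        # Fallback winner: the first approved id in the mapping's insertion order.
        "winner_candidate_id": payload.get("winner_candidate_id") or approved[0],
        "approved_candidate_ids": approved,
        "rejected_candidate_ids": [cid for cid, ok in approvals.items() if not ok],
        "approvals": approvals,
    }


result = normalize_approve_reject({"approvals": {"a": False, "b": True, "c": True}})
print(result["winner_candidate_id"], result["approved_candidate_ids"], result["rejected_candidate_ids"])
```

Because the fallback depends on key order, a client that cares which approved candidate wins should send `winner_candidate_id` explicitly. The branch does not check that an explicit winner is also among the approved ids.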

app/frontend/static/app.js

Lines changed: 26 additions & 0 deletions
@@ -146,6 +146,32 @@ function buildFeedbackPayload(feedbackMode, ratingEntries) {
     };
   }
 
+  if (feedbackMode === "winner_only") {
+    return {
+      feedback_type: "winner_only",
+      payload: {
+        winner_candidate_id: sorted[0].candidateId,
+      },
+    };
+  }
+
+  if (feedbackMode === "approve_reject") {
+    const approvals = Object.fromEntries(
+      ratingEntries.map((entry) => [entry.candidateId, entry.rating >= 4])
+    );
+    const approvedEntries = sorted.filter((entry) => approvals[entry.candidateId]);
+    if (!approvedEntries.length) {
+      throw new Error("Approve/reject feedback requires at least one candidate rated 4 or 5.");
+    }
+    return {
+      feedback_type: "approve_reject",
+      payload: {
+        winner_candidate_id: approvedEntries[0].candidateId,
+        approvals,
+      },
+    };
+  }
+
   if (feedbackMode === "top_k") {
     return {
       feedback_type: "top_k",
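
Note on the client-side mapping: the frontend has no dedicated approve/reject controls yet, so it derives approvals from the existing 1 to 5 ratings with a threshold of 4. The Python sketch below mirrors that payload shape, for example to build request bodies in a backend test. `build_approve_reject_payload` is a hypothetical helper, and it assumes that `sorted` in `app.js` orders entries by rating, highest first, which this hunk does not show.

```python
def build_approve_reject_payload(ratings: dict[str, int], threshold: int = 4) -> dict:
    """Derive an approve_reject request body from per-candidate ratings."""
    approvals = {cid: rating >= threshold for cid, rating in ratings.items()}
    # Assumption: highest-rated first, like the `sorted` array used in app.js.
    ranked = sorted(ratings, key=lambda cid: ratings[cid], reverse=True)
    approved = [cid for cid in ranked if approvals[cid]]
    if not approved:
        raise ValueError(f"approve_reject needs at least one candidate rated {threshold} or higher")
    return {
        "feedback_type": "approve_reject",
        "payload": {"winner_candidate_id": approved[0], "approvals": approvals},
    }


body = build_approve_reject_payload({"a": 3, "b": 5, "c": 4})
print(body["payload"]["winner_candidate_id"], body["payload"]["approvals"])
```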

app/samplers/axis_sweep.py

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+from __future__ import annotations
+
+from app.core.schema import Candidate, Session
+from app.samplers.base import clamp_vector, make_rng
+
+
+class AxisSweepSampler:
+    """Sampler that probes positive and negative movement along steering axes."""
+
+    name = "axis_sweep"
+
+    def propose(self, session: Session, seed: int) -> list[Candidate]:
+        """Generate a batch that systematically sweeps the steering basis directions."""
+
+        rng = make_rng(seed + 211)
+        base = session.current_z
+        dimensions = max(1, len(base))
+        candidates: list[Candidate] = []
+        for index in range(session.config.candidate_count):
+            axis = index % dimensions
+            direction = 1.0 if (index // dimensions) % 2 == 0 else -1.0
+            magnitude = 0.18 + (0.04 * (index // (dimensions * 2)))
+            offset = [0.0 for _ in base]
+            offset[axis] = direction * magnitude
+            jitter = [rng.uniform(-0.025, 0.025) for _ in base]
+            z = clamp_vector(
+                [current + delta + noise for current, delta, noise in zip(base, offset, jitter, strict=False)],
+                session.config.trust_radius,
+            )
+            role = "axis_positive" if direction > 0 else "axis_negative"
+            candidates.append(
+                Candidate(
+                    round_id="",
+                    candidate_index=index,
+                    z=z,
+                    sampler_role=role,
+                    predicted_score=sum(z),
+                    predicted_uncertainty=0.1 + (0.02 * index),
+                    seed=seed,
+                    generation_params={"image_size": session.config.image_size, "axis_index": axis},
+                )
+            )
+        return candidates
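
Note on the sweep order: the index arithmetic in `propose` visits every axis in the positive direction first, then every axis in the negative direction, and raises the step size by 0.04 after each full positive/negative pass. The sketch below isolates that schedule without jitter or clamping so the pattern is easy to inspect. `axis_schedule` is a hypothetical helper; the constants 0.18 and 0.04 are copied from the diff.

```python
def axis_schedule(candidate_count: int, dimensions: int) -> list[tuple[int, float, float]]:
    """Return (axis, direction, magnitude) per candidate index, without jitter."""
    dimensions = max(1, dimensions)
    schedule = []
    for index in range(candidate_count):
        axis = index % dimensions
        # Even passes over the axes move in the positive direction, odd passes in the negative.
        direction = 1.0 if (index // dimensions) % 2 == 0 else -1.0
        # The step grows once both directions have been tried on every axis.
        magnitude = 0.18 + 0.04 * (index // (dimensions * 2))
        schedule.append((axis, direction, magnitude))
    return schedule


for row in axis_schedule(candidate_count=5, dimensions=2):
    print(row)
```

With two dimensions and five candidates, the first four entries cover both directions on both axes, and the fifth starts a second pass with the larger step. When the steering vector has at least as many dimensions as `candidate_count`, every candidate in the round is a positive probe, so negative directions would not be sampled at all in that round. Separately, an empty `current_z` would make `offset[axis]` in the sampler raise an `IndexError`, since `dimensions` is forced to 1 while `offset` stays empty.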
