Collecting real cases: silent drift, memory provenance, judgment flattening

I am publishing this repo early as a position paper, not as a finished framework.

The core claim is:

> self-evolving agents, long-term memory, and AI translation are starting to share the same failure mode: capability evolves faster than human judgment can be protected, traced, and re-anchored.

I am looking for concrete cases from people building or using long-horizon agents.

Useful examples:

- an agent improved or rewrote its own instructions, but the objective silently drifted
- long-term memory became useful but also stale, bloated, or hard to audit
- a coding agent kept producing output while path quality degraded
- AI translated a user's raw judgment into polished language but erased the actual decision signal
- a multi-agent or multi-session workflow lost provenance: no one could answer "why does the system believe this?"

The goal is to collect real failure cases and map them into a small runtime kernel vocabulary:

- source-tagged memory updates
- core vs. edge schema separation
- path-quality detection
- human-ratified calibration
- faithful translation as a view, not a replacement

If you have a concrete case, please comment with:

1. What system or workflow was involved?
2. What drifted or got flattened?
3. How did you notice?
4. What would have made the failure auditable earlier?

This issue is meant to become a public case library for provenance-aware agent design.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Collecting real cases: silent drift, memory provenance, judgment flattening #1

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Collecting real cases: silent drift, memory provenance, judgment flattening #1

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions