Independent AI Safety Researcher | Red Team Operator | Systems Builder
I work at the intersection of systems design, security analysis, and real-world deployment risk.
For over a decade, I built and operated production systems in adversarial environments: safety-critical infrastructure, large-scale digital platforms, and blockchain systems handling real money and real users. I learned how incentive structures bend behavior, how abstraction hides harm, and how systems fail long before metrics reflect it.
Then I had kids.
And I watched AI systems begin to shape how meaning itself is resolved before it is fully formed.
Now my work focuses on identifying structural failure modes in language-based AI systems, especially those that emerge during real interaction rather than scripted evaluation. I document risks, reproduce them carefully, and surface boundaries that matter before they become normalized through deployment.
This is not academic alignment theory.
It is applied discovery work, upstream of benchmarks.
I do not start with benchmarks or predefined hypotheses.
I start by interacting with systems directly, allowing ambiguity, partial expression, and natural conversational flow. When something feels wrong, I follow that signal carefully until it can be reproduced, simplified, and explained.
My work is exploratory first, disciplined second.
This approach comes from years of operating systems where failures were not obvious until it was too late, and where human judgment was often the only early warning signal.
Anomaly-First Discovery and Reproduction
This section describes the actual method I use when evaluating AI systems.
It is not benchmark-driven.
It does not begin with a predefined hypothesis.
It is designed to surface failure modes that automated evaluations routinely miss.
Modern language models do not expose internal state. They present a smooth conversational surface even when internal resolution is unstable.
Many important failures are not violations. They are mis-resolutions:
- Intent inferred too early
- Ambiguity collapsed prematurely
- Emotional or contextual signals overwritten by confident interpretation
These failures are difficult to detect through scripted prompts alone. They are most visible during free interaction, where the system is allowed to choose how it resolves meaning.
I begin with unstructured interaction.
There is no fixed test case. I interact as a human would, allowing:
- Ambiguity
- Partial thoughts
- Correction
- Emotional nuance
At this stage I watch for:
- Shifts in confidence that feel unearned
- Moments where the system resolves something I did not finish expressing
- Responses that feel smooth but wrong
No notes are taken yet. This phase is about sensing instability, not proving it.
At some point, the interaction produces a response that feels off.
This is not a vague emotional reaction. It is a mismatch between:
- What I expected the system to do
- What the system actually did
This moment is the signal.
Experienced operators recognize this pattern across domains: aviation, security, incident response, and safety engineering. Anomaly sensing precedes formal diagnosis.
Once an anomaly is detected, I rewind the interaction.
I identify:
- The last point where behavior was expected
- The smallest change that preceded divergence
I recreate the interaction from that point, adjusting only one variable at a time:
- Wording
- Timing
- Ambiguity
- Emotional framing
The goal here is localization, not reproduction.
After isolating the boundary, I attempt deliberate reproduction.
I re-run simplified versions of the interaction:
- Across fresh sessions
- Across related prompts
- Sometimes across models
A behavior is considered real only if it can be reproduced intentionally.
If it cannot be reproduced, it is discarded.
Once reproduction is confirmed, I strip the interaction down further.
I remove:
- Narrative scaffolding
- Emotional color
- Unnecessary context
What remains is the minimal trigger: the smallest input that reliably produces the failure.
This is the point where intuition gives way to structure.
Finally, I describe the issue in structural terms.
Not:
The model said X
But:
Under these conditions, the system resolves Y prematurely
The goal is not to quote outputs, but to name the failure mode in a way others can test.
Live exploit details are intentionally omitted.
Observed Behavior
Model resolves user intent before articulation is complete.
Trigger Conditions
- Partial emotional expression
- Ambiguous self-reference
- Early reassurance framing
Failure Mode
- Premature intent inference
- Confidence laundering through fluent language
Why Automated Evals Miss This
- No ground truth for pre-articulation states
- Scripted prompts eliminate ambiguity
- Metrics reward resolution, not restraint
This method:
- Identifies failure modes
- Does not measure prevalence
- Does not claim statistical coverage
Once a failure is characterized, it can be handed off to formal evaluation or mitigation pipelines.
This document describes discovery, not the full lifecycle.
The same anomaly-first method I use for AI systems is how I approach security reviews more broadly.
In audits and red team contexts, this means:
- Identifying where system behavior diverges from operator expectations
- Isolating minimal trigger conditions
- Reproducing the behavior reliably
- Characterizing the failure mode in a way engineers can test and mitigate
I focus on failure discovery and clear handoff, not exploit theatrics.
The following public work was discovered and developed using the anomaly-first exploratory method described above:
| Repository | Description |
|---|---|
| They Lie, and We Lie About the Lying | Policy analysis (35 references): Why "hallucination" is a euphemism. Translates existing research on mathematical inevitability, RLHF sycophancy, self-correction failure, and model collapse into plain language. Recommendations for policymakers, researchers, educators, and AI companies. |
| On the Inability of Language Models to Stub Their Toe | Position paper on referential collapse and semantic inflation. Demonstrates the disconnect between linguistic competence and grounded understanding. |
| Repository | Description |
|---|---|
| Pre-Articulation Observability Boundary | Position paper identifying structural limits of language-based systems when meaning has not yet been articulated. |
| UIVP Disclosure Summary | Cross-model vulnerability involving premature intent resolution, reproduced and disclosed responsibly. |
| Probabilistic Identity Infrastructure | Security analysis of emergent identity systems inferred through behavioral continuity rather than explicit identifiers. |
| RevShare Forensic Case Study | Independent forensic analysis of a custodial architecture failure on Solana, examining technical, economic, and incentive-level collapse. |
All outputs are publicly available, scoped, and follow responsible disclosure practices. They are intended to surface failure classes early, before formal benchmarks or mitigations exist.
Independent Systems Builder & Researcher
I have designed and operated systems across safety-critical infrastructure, Web2, and Web3 environments, with full lifecycle ownership from architecture to incident response.
Production Experience
- Led operational teams in high-stakes environments
- Built systems handling millions of daily users
- Designed custody models, incentive systems, and abuse-resistant architectures
- Performed post-incident forensic analysis when systems failed
This background informs how I evaluate AI systems: not as abstract models, but as components embedded in real human workflows.
Red team, security analysis, or AI safety research roles where:
- Discovery matters as much as measurement
- Documentation and reproducibility are valued
- Systems are evaluated as deployed artifacts, not just models
- The work protects people, not just metrics
I work independently. I document clearly. I take responsibility for what I surface.
- Website: cpk.solutions
- Email: christopher@cpk.solutions
- GitHub: ChristopherPatrickKuntz
I am open to responsible disclosure conversations, research collaboration, and short-term advisory or consulting work aligned with safety, integrity, and system design.
All of my work begins the same way.
Something does not feel right.
The discipline is not in trusting that feeling blindly, but in following it carefully until it can be reproduced, simplified, and explained.
That is the work.