Refactor: Delegate GRPO and Evasion capabilities to OSS Alternatives (OpenRLHF, browserforge)

**Title:** Refactor: Delegate GRPO and Evasion capabilities to OSS Alternatives (OpenRLHF, browserforge) #187

**Body:**
### Background
In line with our **"Borrow, Do Not Build"** engineering doctrine, we have identified two major subsystems in the CoReason platform whose custom, proprietary math and logic can be handled natively by maintained OSS libraries:
1. **GRPO & PRMs:** We currently track RLHF/GRPO advantage scores and Process Reward Model (PRM) evaluations manually via `EpistemicRewardGradientPolicy`, `CognitiveRewardEvaluationReceipt`, and `ProcessRewardContract`. Computing PPO/GRPO policy gradients and managing KL-divergence penalties in-house across distributed GPUs is unstable at scale.
2. **Browser Evasion:** We built `AdversarialEmulationProfile`, `KinematicNoiseProfile`, and `EnvironmentalSpoofingProfile` for 1/f (pink) noise mouse movements, JA3 TLS fingerprint spoofing, and WebGL canvas hash spoofing. Evading CDN bot detection is a cat-and-mouse game that requires daily updates, which makes an in-house implementation fragile.
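For reviewers who have not worked with GRPO, the per-group arithmetic being handed off in item 1 is small. The instability comes from doing it at scale across distributed GPUs. The sketch below uses only the standard library and shows the group-relative advantage and the per-token KL estimator from the DeepSeekMath GRPO formulation. It is illustrative only: the function names are hypothetical, and it is neither the removed CoReason code nor the OpenRLHF/TRL API. It also assumes population standard deviation, and real trainers differ on that detail.

```python
import math
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Hypothetical helper: normalize each reward against its own sampling group.

    GRPO samples several completions for one prompt and scores each one
    relative to the group mean, so no learned value function is needed.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std (assumption; some trainers use sample std)
    return [(r - mean) / (std + eps) for r in rewards]


def per_token_kl(logp_policy: float, logp_ref: float) -> float:
    """Hypothetical helper: unbiased, non-negative KL estimate for one token.

    Implements pi_ref/pi_theta - log(pi_ref/pi_theta) - 1, computed from log-probs.
    """
    log_ratio = logp_ref - logp_policy
    return math.exp(log_ratio) - log_ratio - 1.0
```

For a group with rewards `[1.0, 0.0, 0.0, 1.0]`, the correct completions get an advantage of roughly +1 and the incorrect ones roughly -1. A group where every reward is identical contributes zero advantage.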

### Proposed Solution
Rip out the internal logic and delegate to OSS primitives:
- **OpenRLHF / HuggingFace TRL:** We will use `coreason-manifest` to label data (creating `EpistemicGroundedTaskManifest`), but offload actual backpropagation and step-level PRM verification to OpenRLHF.
- **browserforge / curl-impersonate:** Delete the custom TLS and canvas spoofing math and delegate browser instantiation to maintained OSS libraries. This keeps CoReason logic strictly focused on deterministic navigation/clicking, not environment spoofing.

### Tasks
- [x] Remove `EpistemicRewardGradientPolicy`, `CognitiveRewardEvaluationReceipt`, and `ProcessRewardContract` from `coreason-manifest`.
- [x] Remove `AdversarialEmulationProfile`, `KinematicNoiseProfile`, and `EnvironmentalSpoofingProfile` from `coreason-manifest`.
- [x] Run `universal_ontology_compiler.py` to regenerate JSON and language bindings.
- [x] Rip out active inference GRPO evaluation logic from `coreason-runtime`.
- [x] Validate runtime and CI/CD tests.

