Refactor: Delegate GRPO and Evasion capabilities to OSS Alternatives (OpenRLHF, browserforge) #201

@gowthamrao

Title: Refactor: Delegate GRPO and Evasion capabilities to OSS Alternatives (OpenRLHF, browserforge) #187

Body:

Background

In alignment with our "Borrow, Do Not Build" engineering doctrine, we have identified two major sub-systems in the CoReason platform that feature custom proprietary math and logic which can be natively handled by maintained OSS libraries:

  1. GRPO & PRMs: We currently track RLHF/GRPO advantage scores and Process Reward Model evaluations manually via EpistemicRewardGradientPolicy, CognitiveRewardEvaluationReceipt, and ProcessRewardContract. Calculating PPO/GRPO policy gradients and managing KL-divergence penalties internally across distributed GPUs is unstable at scale.
  2. Browser Evasion: We built AdversarialEmulationProfile, KinematicNoiseProfile, and EnvironmentalSpoofingProfile for 1/f pink noise mouse movements, JA3 TLS fingerprint spoofing, and WebGL canvas hashing. Staying ahead of CDN bot detection is a cat-and-mouse game that requires daily updates, which makes an internal implementation fragile.
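
For reviewers less familiar with GRPO, here is a minimal sketch of the group-relative advantage step that item 1 refers to. It follows the commonly published formulation (each sampled completion's reward is normalized against the mean and standard deviation of its own group), not the logic inside EpistemicRewardGradientPolicy. The function name, the `eps` term, and the choice of population standard deviation are assumptions made for illustration; trainer libraries differ on these details.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its own sampling group (illustrative only).

    rewards: scalar rewards for the completions sampled from ONE prompt.
    eps:     small constant (an assumption) that avoids division by zero
             when every completion in the group received the same reward.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt, two rewarded and two not.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Even this small step has to be kept consistent with the KL penalty, clipping, and cross-GPU reduction that surround it in a real trainer, which is the part this issue proposes to stop maintaining in-house.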

Proposed Solution

Rip out the internal logic and delegate to OSS primitives:

  • OpenRLHF / HuggingFace TRL: We will use coreason-manifest to label data (creating EpistemicGroundedTaskManifest), but offload actual backpropagation and step-level PRM verification to OpenRLHF.
  • browserforge / curl-impersonate: Delete custom TLS and Canvas spoofing math and delegate browser instantiation to maintained OSS libraries. This keeps CoReason logic strictly focused on deterministic navigation/clicking, not environment spoofing.
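
One possible shape for the browser-side delegation boundary is sketched below. The runtime asks an injected factory for request headers and never computes fingerprint material itself. `HeaderFactory`, `browserforge_headers`, and `session_headers` are hypothetical names, not existing CoReason code. The `HeaderGenerator().generate()` call reflects browserforge's documented usage as far as I know; please verify it against the pinned version before relying on it.

```python
from typing import Callable, Dict

# Hypothetical type alias: anything that returns a header mapping.
HeaderFactory = Callable[[], Dict[str, str]]


def browserforge_headers() -> Dict[str, str]:
    """Default factory that delegates to browserforge (assumed to be installed)."""
    # Imported lazily so this module still imports when browserforge is absent,
    # for example in unit tests that inject a fake factory.
    from browserforge.headers import HeaderGenerator

    return dict(HeaderGenerator().generate())


def session_headers(factory: HeaderFactory = browserforge_headers) -> Dict[str, str]:
    """Return headers for a new session without any in-house spoofing math."""
    headers = factory()
    # Case-insensitive check, since header casing can vary by HTTP version.
    if not any(name.lower() == "user-agent" for name in headers):
        raise ValueError("header factory returned no User-Agent")
    return headers
```

Injecting the factory keeps navigation and clicking code testable with a stub and makes it straightforward to swap browserforge for curl-impersonate later.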

Tasks

  • Remove EpistemicRewardGradientPolicy, CognitiveRewardEvaluationReceipt, and ProcessRewardContract from coreason-manifest.
  • Remove AdversarialEmulationProfile, KinematicNoiseProfile, and EnvironmentalSpoofingProfile from coreason-manifest.
  • Run universal_ontology_compiler.py to regenerate the JSON and language bindings.
  • Rip out active inference GRPO evaluation logic from coreason-runtime.
  • Validate runtime and CI/CD tests.
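
To help with the validation task, a small stdlib-only scan like the following could confirm that none of the six removed classes are still referenced after the bindings are regenerated. This is a sketch under assumptions: the function name and the file suffixes are illustrative, and the scan is a plain substring match, so it will also flag mentions in comments or changelogs.

```python
from pathlib import Path
from typing import Dict, List, Sequence

# The six classes this issue proposes to remove from coreason-manifest.
REMOVED_SYMBOLS = (
    "EpistemicRewardGradientPolicy",
    "CognitiveRewardEvaluationReceipt",
    "ProcessRewardContract",
    "AdversarialEmulationProfile",
    "KinematicNoiseProfile",
    "EnvironmentalSpoofingProfile",
)


def find_stale_references(
    root: Path, suffixes: Sequence[str] = (".py", ".json")
) -> Dict[str, List[str]]:
    """Map each removed symbol to the files under root that still mention it."""
    hits: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for symbol in REMOVED_SYMBOLS:
            if symbol in text:
                hits.setdefault(symbol, []).append(str(path.relative_to(root)))
    return hits


if __name__ == "__main__":
    stale = find_stale_references(Path("."))
    for symbol, files in stale.items():
        print(f"{symbol}: {', '.join(files)}")
    # A non-zero exit code lets CI fail the build while references remain.
    raise SystemExit(1 if stale else 0)
```

Run from the repository root of coreason-manifest or coreason-runtime, it exits non-zero while any reference remains, so it can serve as a temporary CI gate for this refactor.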
