
Proofs

This file gathers formal proofs under explicit hypotheses. The proofs below do not claim to establish the theory in full generality; they prove precise results inside minimal models that capture the central mechanism of the concept.

Theorem 1 - Sufficient condition for a wrong game in the bounded model

Statement

Consider two stationary policies in the same game:

  • pi_exploit: always chooses exploit;
  • pi_preserve: always chooses preserve.

Assume:

  • r_exploit > r_preserve;
  • o_t in [0,1];
  • o_{t+1}^{exploit} = kappa_E o_t, with 0 <= kappa_E < 1;
  • o_{t+1}^{preserve} = 1 - kappa_P (1 - o_t), with 0 <= kappa_P < 1;
  • U_t = r_t + lambda o_t - eta c_t;
  • under pi_preserve, c_0 = chi >= 0 and c_t = 0 for t >= 1; under pi_exploit, c_t = 0 for all t;
  • 0 < delta < 1.

Define:

Gamma = 1/(1-delta) - (1-o_0)/(1-delta kappa_P) - o_0/(1-delta kappa_E).

Then, provided that Gamma > 0, pi_exploit is structurally worse than pi_preserve if:

Delta r + eta chi (1-delta) < lambda (1-delta) Gamma,

where Delta r = r_exploit - r_preserve.

Proof

Under pi_exploit, optionality evolves as:

o_t^E = o_0 kappa_E^t.

Under pi_preserve, it evolves as:

o_t^P = 1 - (1-o_0) kappa_P^t.

Hence:

J^E = sum_{t=0}^\infty delta^t [r_exploit + lambda o_0 kappa_E^t]

and

J^P = - eta chi + sum_{t=0}^\infty delta^t [r_preserve + lambda (1 - (1-o_0) kappa_P^t)].

Separating terms and summing the geometric series sum_{t=0}^\infty (delta kappa)^t = 1/(1 - delta kappa):

J^E = r_exploit/(1-delta) + lambda o_0/(1-delta kappa_E)

and

J^P = - eta chi + r_preserve/(1-delta) + lambda [1/(1-delta) - (1-o_0)/(1-delta kappa_P)].

Subtracting:

J^E - J^P = Delta r/(1-delta) + eta chi - lambda Gamma.

Therefore J^E < J^P if and only if

Delta r/(1-delta) + eta chi < lambda Gamma.

Multiplying by (1-delta) > 0, we obtain:

Delta r + eta chi (1-delta) < lambda (1-delta) Gamma.

QED.
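
The closed forms above are easy to check numerically. The following minimal Python sketch compares truncated series against the closed forms and confirms that the structural condition agrees with the direct comparison of J^E and J^P; all parameter values are illustrative assumptions, not part of the theorem.

```python
# Numerical check of Theorem 1 (all parameter values are illustrative).
delta, o0 = 0.95, 0.5
kappa_E, kappa_P = 0.6, 0.6
r_exploit, r_preserve = 1.0, 0.8
lam, eta, chi = 2.0, 1.0, 0.1

# Closed forms derived in the proof.
J_E = r_exploit / (1 - delta) + lam * o0 / (1 - delta * kappa_E)
J_P = (-eta * chi + r_preserve / (1 - delta)
       + lam * (1 / (1 - delta) - (1 - o0) / (1 - delta * kappa_P)))

# Truncated series as an independent check.
T = 10_000
J_E_sum = sum(delta**t * (r_exploit + lam * o0 * kappa_E**t) for t in range(T))
J_P_sum = -eta * chi + sum(
    delta**t * (r_preserve + lam * (1 - (1 - o0) * kappa_P**t)) for t in range(T)
)
assert abs(J_E - J_E_sum) < 1e-6 and abs(J_P - J_P_sum) < 1e-6

# Structural condition of Theorem 1 versus the direct comparison.
Gamma = (1 / (1 - delta) - (1 - o0) / (1 - delta * kappa_P)
         - o0 / (1 - delta * kappa_E))
dr = r_exploit - r_preserve
condition = dr + eta * chi * (1 - delta) < lam * (1 - delta) * Gamma
assert condition == (J_E < J_P)
```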

Proposition 2 - Critical threshold of structural weight

Statement

Under the hypotheses of Theorem 1, and assuming Gamma > 0, there exists

lambda_* = [Delta r + eta chi (1-delta)] / [(1-delta) Gamma]

such that J^P > J^E for every lambda > lambda_*.

Proof

From Theorem 1:

J^E - J^P = Delta r/(1-delta) + eta chi - lambda Gamma.

Thus J^P > J^E is equivalent to

lambda Gamma > Delta r/(1-delta) + eta chi.

Since Gamma > 0, divide both sides by Gamma and obtain:

lambda > [Delta r + eta chi (1-delta)] / [(1-delta) Gamma] = lambda_*.

QED.
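
A quick sanity check of the threshold in the same style: under the illustrative parameters above, Delta J changes sign exactly at lambda_*.

```python
# Threshold check for Proposition 2 (illustrative parameters).
delta, o0, kappa_E, kappa_P = 0.95, 0.5, 0.6, 0.6
dr, eta, chi = 0.2, 1.0, 0.1

Gamma = (1 / (1 - delta) - (1 - o0) / (1 - delta * kappa_P)
         - o0 / (1 - delta * kappa_E))
assert Gamma > 0

lam_star = (dr + eta * chi * (1 - delta)) / ((1 - delta) * Gamma)

def delta_J(lam):
    # Delta J = J^P - J^E, as derived in Theorem 1.
    return lam * Gamma - dr / (1 - delta) - eta * chi

# Preserve wins just above the threshold and loses just below it.
assert delta_J(lam_star + 1e-6) > 0
assert delta_J(lam_star - 1e-6) < 0
```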

Proposition 3 - Monotonicity in lambda

Statement

In the model of Theorem 1, defining

Delta J(lambda) = J^P - J^E,

we have d Delta J / d lambda = Gamma.

In particular, if Gamma > 0, then Delta J is strictly increasing in lambda.

Proof

From the expression obtained in Theorem 1:

J^P - J^E = lambda Gamma - Delta r/(1-delta) - eta chi.

Differentiating with respect to lambda,

d Delta J / d lambda = Gamma.

If Gamma > 0, it follows immediately that Delta J is strictly increasing in lambda.

QED.

Proposition 4 - Effect of explicit exit cost

Statement

In the same model,

d Delta J / d chi = - eta.

Therefore, for eta > 0, the structural advantage of preserve is strictly decreasing in chi.

Proof

From:

Delta J = lambda Gamma - Delta r/(1-delta) - eta chi,

differentiating with respect to chi,

d Delta J / d chi = - eta.

If eta > 0, the sign is strictly negative.

QED.
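
Propositions 3 and 4 fix the two partial slopes of Delta J. A finite-difference sketch, with the same illustrative parameters as above, confirms both at once:

```python
# Finite-difference check of Propositions 3 and 4 (illustrative parameters).
delta, o0, kappa_E, kappa_P = 0.95, 0.5, 0.6, 0.6
dr, eta = 0.2, 1.0

Gamma = (1 / (1 - delta) - (1 - o0) / (1 - delta * kappa_P)
         - o0 / (1 - delta * kappa_E))

def delta_J(lam, chi):
    # Delta J = J^P - J^E from Theorem 1.
    return lam * Gamma - dr / (1 - delta) - eta * chi

h = 1e-6
slope_lam = (delta_J(2.0 + h, 0.1) - delta_J(2.0, 0.1)) / h  # Proposition 3: Gamma
slope_chi = (delta_J(2.0, 0.1 + h) - delta_J(2.0, 0.1)) / h  # Proposition 4: -eta
assert abs(slope_lam - Gamma) < 1e-4
assert abs(slope_chi + eta) < 1e-4
```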

Proposition 5 - Invariance of local ordering under positive affine transformation

Statement

Let r'(x,a,g) = m r(x,a,g) + b, with m > 0. Then:

argmax_a r'(x,a,g) = argmax_a r(x,a,g).

Proof

For any actions a_1, a_2,

r'(x,a_1,g) >= r'(x,a_2,g)

if and only if

m r(x,a_1,g) + b >= m r(x,a_2,g) + b.

Subtract b and divide by m > 0, obtaining:

r(x,a_1,g) >= r(x,a_2,g).

Therefore the order induced by r' coincides with the order induced by r, and the maximizer sets are identical.

QED.
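
A toy illustration with an arbitrary reward table and an arbitrary positive affine map (all numbers are assumptions for the example):

```python
# Proposition 5: the argmax is invariant under r' = m*r + b with m > 0.
rewards = {"a1": 0.3, "a2": 0.9, "a3": 0.5}  # r(x, a, g) for fixed x and g
m, b = 2.5, -1.0                             # arbitrary positive affine map

best = max(rewards, key=rewards.get)
best_transformed = max(rewards, key=lambda act: m * rewards[act] + b)
assert best == best_transformed == "a2"
```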

Proposition 6 - Invariance of structural ordering under positive rescaling of the functional

Statement

If U'_t = a U_t + b, with a > 0, then

J'^pi = a J^pi + b/(1-delta).

Hence, for any policies pi_1, pi_2,

J^{pi_1} > J^{pi_2} if and only if J'^{pi_1} > J'^{pi_2}.

Proof

By definition:

J'^pi = sum_{t=0}^\infty delta^t (a U_t + b).

Separating terms:

J'^pi = a sum_{t=0}^\infty delta^t U_t + b sum_{t=0}^\infty delta^t.

Thus:

J'^pi = a J^pi + b/(1-delta).

Since a > 0, the transformation is strictly increasing in J^pi. Therefore it preserves ordering across policies.

QED.
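
A numerical sketch of the identity, using an arbitrary bounded utility stream and illustrative values of a, b, and delta:

```python
# Proposition 6: J'^pi = a*J^pi + b/(1-delta), checked on truncated series.
delta, a, b, T = 0.9, 3.0, -0.5, 10_000
U = [((-1) ** t) * 0.7 for t in range(T)]  # any bounded stream U_t works

J = sum(delta**t * u for t, u in enumerate(U))
J_prime = sum(delta**t * (a * u + b) for t, u in enumerate(U))
assert abs(J_prime - (a * J + b / (1 - delta))) < 1e-6
```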

Proposition 7 - Victory increases the probability of staying

Statement

Let:

p_stay(r) = sigma(alpha + beta r + gamma e),

with sigma(z) = 1/(1+e^{-z}) and beta > 0. Then p_stay is strictly increasing in r.

Proof

The derivative of the logistic function is:

sigma'(z) = sigma(z)(1-sigma(z)).

By the chain rule, with z = alpha + beta r + gamma e:

dp_stay/dr = beta sigma(z)(1-sigma(z)).

Since beta > 0 and 0 < sigma(z) < 1, it follows that:

dp_stay/dr > 0.

Therefore higher local reward implies a higher probability of staying in the game.

QED.
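
A direct numerical check with illustrative coefficients; the covariate e is held fixed and written e_cov in the code to avoid confusion with Euler's number:

```python
import math

# Proposition 7: p_stay is strictly increasing in r (illustrative coefficients).
alpha, beta, gamma, e_cov = -1.0, 2.0, 0.5, 0.3

def p_stay(r):
    z = alpha + beta * r + gamma * e_cov
    return 1.0 / (1.0 + math.exp(-z))

rs = [i / 10 for i in range(-20, 21)]
ps = [p_stay(r) for r in rs]
assert all(p1 < p2 for p1, p2 in zip(ps, ps[1:]))  # strictly increasing
```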

Proposition 8 - Local-competence proxy increases expected time in the bad game

Statement

Assume the competence proxy q in [0,1] determines expected reward in the bad game:

E[r | q] = q r_H + (1-q) r_L,

with r_H > r_L. Assume further that:

  • p_stay is increasing in r;
  • the time spent in the bad game, tau_bad, follows a geometric distribution with parameter 1-p_stay.

Then expected time in the bad game,

E[tau_bad | q] = 1 / (1 - p_stay(q)),

is increasing in q.

Proof

Since r_H > r_L, we have:

dE[r|q]/dq = r_H - r_L > 0.

Because p_stay is increasing in r, the composition p_stay(q) := p_stay(E[r | q]) is increasing in q.

Now consider the function:

f(p) = 1/(1-p), for 0 <= p < 1.

We have:

f'(p) = 1/(1-p)^2 > 0.

Therefore f is increasing in p. Since p_stay(q) increases with q, then:

E[tau_bad | q] = f(p_stay(q))

also increases with q.

QED.
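
The full composition can be checked in a few lines, again with illustrative coefficients; p_stay is evaluated at the expected reward E[r | q]:

```python
import math

# Proposition 8: E[tau_bad | q] = 1 / (1 - p_stay(q)) is increasing in q.
alpha, beta, gamma, e_cov = -1.0, 2.0, 0.5, 0.3  # illustrative coefficients
r_H, r_L = 1.0, 0.0

def p_stay_of_q(q):
    r = q * r_H + (1 - q) * r_L          # E[r | q]
    z = alpha + beta * r + gamma * e_cov
    return 1.0 / (1.0 + math.exp(-z))

qs = [i / 100 for i in range(101)]
taus = [1.0 / (1.0 - p_stay_of_q(q)) for q in qs]
assert all(t1 < t2 for t1, t2 in zip(taus, taus[1:]))  # increasing in q
```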

Methodological observation

The proofs above do not complete the theory. They do something more important at this stage: they show that the central thesis already has reproducible formal instances, with clear and checkable conditions, without depending on unbounded optionality.