
feat: fee-adjusted edge, fractional-Kelly sizing, and risk-assessment endpoints #7

Open
JonathanW666 wants to merge 14 commits into MusashiBot:main from JonathanW666:feat/trading-edge-and-sizing

JonathanW666 commented Apr 20, 2026

Summary

This PR extends the signal pipeline so the API's trading primitives are
fee-aware, return Kelly-sized stakes, and ship with reproducible harnesses
for matcher quality and end-to-end back-test performance. All changes are
additive; no existing response field or query parameter was renamed or
removed. Version bumps from 2.0.0 to 2.1.0.

Fourteen commits, organised by concern:

  • Analysis math (commits 1–3): shared fee, Kelly, and risk modules; a
    rewritten sentiment analyzer; a signed-edge correction in the signal
    generator.
  • Arbitrage sizing (commit 4): maxStake, expectedDollarProfit, and
    annualisedReturn added to the covered-bundle opportunities introduced
    in PR #2 ("Fix legacy arbitrage detector with covered-bundle pricing and
    strict contract matching"). The underlying detector is unchanged.
  • Endpoints (commits 5–6): analyze-text now surfaces EV and Kelly
    fields; markets/arbitrage accepts three sizing filters; two new POST
    endpoints are added (/api/position-sizing, /api/risk-assessment).
  • Matcher quality (commit 7): a post-match quality gate and a
    reproducible evaluation harness.
  • Infrastructure (commit 8): stale-while-revalidate and in-flight request
    deduplication on the market cache; a per-IP sliding-window rate limiter.
  • Validation (commits 9–11): a Monte Carlo back-test with calibration
    sensitivity, a signal-generator fix that routes neutral-sentiment tweets
    to HOLD, and a metrics fix that computes Sharpe and win rate over
    active trades.
  • Review pass (commits 12–13): six correctness and documentation fixes
    identified during a full re-read of the PR. Detail in the "Review pass"
    section below.
  • Polish (commit 14): three follow-up corrections to the public-facing
    response shapes (SCALE_DOWN reachability, HOLD-override reasoning,
    alternative_sizing semantics). Detail in the "Polish" section below.

Matcher evaluation

Evaluation on the 30-tweet × 1,857-market fixture
(scripts/matcher-eval/run-eval.ts):

                               before   after     delta
  total matches surfaced           99      86       −13
  junk rate (any rule)          63.6%   40.7%  −22.9 pp
  thin-market (<$5k volume)      4.0%    0.0%   −4.0 pp
  extreme-price (<2% or >98%)   46.5%   25.6%  −20.9 pp
  cross-domain                  29.3%   20.9%   −8.4 pp
  weak-signal                    4.0%    0.0%   −4.0 pp

Back-test

Monte Carlo over 500 replications at calibration = 1.0
(scripts/backtest/run-backtest.ts):

  strategy  active/slots  total return  Sharpe (active)  max DD  win rate (active)  Brier
  KELLY              1/1        +17.4%            0.923    3.5%              71.0%  0.206
  FLAT               1/1         +1.9%            0.980    0.3%              71.0%  0.206
  RANDOM             1/1         +0.6%            0.331    0.5%              50.0%  0.209

Kelly sensitivity to calibration:

  calibration  total return  Sharpe (active)  max DD  win rate (active)
  0.00                −1.2%           −0.067    9.0%              26.2%
  0.25                +3.7%            0.183    7.5%              38.0%
  0.50                +8.3%            0.398    6.2%              49.0%
  0.75               +13.5%            0.668    4.7%              61.6%
  1.00               +18.4%            0.995    3.3%              73.2%

Performance degrades monotonically with calibration error. At
calibration = 0 the EV filter routes zero-edge signals to HOLD, which
limits the observed total-return loss to −1.2%.

Back-test interpretation

The single active signal on the current corpus is Ethereum at $2,600,
yesPrice = 0.24, implied trueProb = 0.691. KELLY and FLAT both take
the YES side on that signal, so the observed differences reflect stake
sizing rather than directional divergence.

  • Total return scales roughly linearly with stake size. KELLY stakes
    10% of bankroll and FLAT stakes 1%; the observed 17.4% / 1.9% ratio
    (about 9.2×) is close to the 10× stake ratio.
  • Max drawdown scales similarly with stake. KELLY's 3.5% versus FLAT's
    0.3% reflects position size, not additional per-dollar risk.
  • Per-trade Sharpe on a fixed Bernoulli payoff is approximately
    invariant to stake size. The 0.923 / 0.980 gap between KELLY and FLAT
    is attributable to the non-linear fee and slippage terms, not to
    strategy divergence.
  • Win rate is identical (71.0%) because both strategies take the same
    side on the same signal.
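The stake-invariance argument in the third bullet can be checked with a few lines of code. The sketch below is illustrative only: `perTradeSharpe` and `binaryPnl` are hypothetical helpers, Sharpe is assumed to be the mean over the sample standard deviation with no annualisation, and fees are left out. Multiplying every trade's PnL by ten leaves the ratio unchanged.

```typescript
// Hypothetical helper: per-trade Sharpe as mean / sample standard deviation,
// with no risk-free rate and no annualisation (assumed, not taken from the harness).
function perTradeSharpe(pnl: number[]): number {
  const n = pnl.length;
  if (n < 2) return 0;
  const mean = pnl.reduce((a, b) => a + b, 0) / n;
  const variance = pnl.reduce((acc, x) => acc + (x - mean) ** 2, 0) / (n - 1);
  const std = Math.sqrt(variance);
  return std === 0 ? 0 : mean / std;
}

// Fee-free binary payoff for buying YES at `yesPrice` with `stake` dollars:
// a win pays stake * (1 - yesPrice) / yesPrice, a loss costs the stake.
function binaryPnl(stake: number, yesPrice: number, won: boolean): number {
  return won ? (stake * (1 - yesPrice)) / yesPrice : -stake;
}

const outcomes = [true, true, false, true, false, true, true];
const flatPnl = outcomes.map((won) => binaryPnl(100, 0.24, won));
const kellyPnl = outcomes.map((won) => binaryPnl(1_000, 0.24, won));

// A 10x stake scales both the mean and the standard deviation by 10x.
console.log(perTradeSharpe(flatPnl).toFixed(6), perTradeSharpe(kellyPnl).toFixed(6));
```

Because a pure rescaling cannot move this ratio, any Sharpe gap between KELLY and FLAT has to come from terms that are not proportional to stake, which is the role the bullet assigns to fees and slippage.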

RANDOM serves as a control: it chooses YES or NO uniformly and
therefore converges on a 50% win rate, with a Sharpe (0.331) well below
that of KELLY and FLAT.

New surfaces

  • POST /api/position-sizing returns Kelly-optimal stake plus
    alternatives, given true_prob, yes_price, bankroll, and
    volume_24h.
  • POST /api/risk-assessment returns a TAKE / SCALE_DOWN / AVOID
    recommendation together with EV, variance, Sharpe, prob_profit, and
    a Kelly suggestion for a proposed trade.
  • analyze-text adds ev_per_dollar, kelly_fraction, and
    breakeven_prob under data.suggested_action, and
    metadata.implied_true_prob for debugging.
  • markets/arbitrage adds maxStake, expectedDollarProfit, and
    annualisedReturn to data.opportunities[], and accepts
    minExpectedProfit, minAnnualisedReturn, and minMaxStake as query
    parameters.
  • Every response carries X-RateLimit-Limit, X-RateLimit-Remaining,
    and X-RateLimit-Reset headers; 429 responses additionally carry
    Retry-After.

Testing

  • npm run typecheck passes for both tsconfigs.
  • npm run test:wallet passes 5 of 5.
  • npx tsx scripts/matcher-eval/run-eval.ts reproduces the matcher
    evaluation table above.
  • npx tsx scripts/backtest/run-backtest.ts reproduces the back-test
    tables above.
  • The rate limiter is exercised by direct import; see the body of the
    commit "infra: stale-while-revalidate cache and per-IP rate limiting"
    for the burst-pattern and per-IP isolation checks.

Backwards compatibility

  • Existing response fields and query parameters retain their shapes.
  • Additions to ArbitrageOpportunity, Market, and TradingSignal
    are optional properties.
  • analyzeSentiment(text) retains its legacy signature.
  • KeywordMatcher accepts an optional fourth constructor argument to
    disable the quality gate; the evaluation harness uses this to produce
    the "before" baseline row.
  • The neutral-sentiment change in commit 10 is a behaviour correction:
    on neutral sentiment the endpoint now returns
    direction: 'HOLD'. The prior behaviour issued a directional
    recommendation derived from the 50/50 prior against the market price,
    which is not a supported use of the signal.

Review pass

A full re-read of the PR identified six correctness and documentation
issues. They are addressed in commits 12 and 13 ("review: correct
arbitrage profit formula, sentiment polarity match, and worst-case loss"
and "review: tighten arbitrage cache invalidation and rate-limit
algorithm").

1. Arbitrage profit formula

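estimateExecutableSizing computed
expectedDollarProfit = refinedEdge × maxStake. refinedEdge is profit
per $1 of bundle payout, while maxStake is dollars outlaid. Each $1 of
payout costs (1 − refinedEdge), so maxStake buys
maxStake / (1 − refinedEdge) of payout and the correct expression is
refinedEdge × maxStake / (1 − refinedEdge). The corrected value exceeds
the old one by a relative refinedEdge / (1 − refinedEdge): approximately
1–5% at typical edges and ~43% at a 30% edge, rising further above
that. File: src/api/arbitrage-detector.ts.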
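A short TypeScript sketch makes the correction concrete. The function names mirror the prose but are hypothetical stand-ins for whatever estimateExecutableSizing does internally, and the guard against refinedEdge >= 1 is an added assumption.

```typescript
// Corrected formula: maxStake dollars buy maxStake / (1 - refinedEdge)
// dollars of bundle payout, and each payout dollar earns refinedEdge.
function expectedDollarProfit(refinedEdge: number, maxStake: number): number {
  // Hypothetical guard: an edge of 1 or more would imply a free bundle.
  if (refinedEdge >= 1) throw new RangeError("refinedEdge must be < 1");
  return (refinedEdge * maxStake) / (1 - refinedEdge);
}

// Previous expression, kept only for comparison.
function legacyExpectedDollarProfit(refinedEdge: number, maxStake: number): number {
  return refinedEdge * maxStake;
}

for (const edge of [0.01, 0.05, 0.3]) {
  const fixed = expectedDollarProfit(edge, 1_000);
  const legacy = legacyExpectedDollarProfit(edge, 1_000);
  // Relative understatement equals edge / (1 - edge).
  const understatement = (fixed / legacy - 1) * 100;
  console.log(`edge=${edge} fixed=${fixed.toFixed(2)} legacy=${legacy.toFixed(2)} +${understatement.toFixed(1)}%`);
}
```

Run as written, the loop reports understatements of about 1.0%, 5.3% and 42.9% for edges of 1%, 5% and 30%, in line with the ranges quoted above.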

2. Arbitrage cache invalidation

cachedArbitrage previously had an independent TTL. When the market
cache refreshed mid-window, a subsequent request could receive fresh
market prices alongside arbitrage opportunities computed against the
previous snapshot. The arbitrage cache now records the cacheTimestamp
under which it was computed and invalidates whenever that timestamp
advances. File: api/lib/market-cache.ts.

3. Empty-result arbitrage sentinel

The "is the arbitrage cache populated?" check used
cachedArbitrage.length === 0 as the uninitialised sentinel, which
caused a legitimate no-arbitrage result to trigger a full O(n·m) rescan
on every subsequent request. An explicit timestamp sentinel now
distinguishes "not yet computed" from "computed and empty". File:
api/lib/market-cache.ts (same commit as item 2).

4. Rate-limit algorithm and documentation

The previous rate limiter was a fixed-window counter, not the
sliding-window implementation its docstring described; the docstring's
INCR + EXPIRE claim was also inaccurate, as the Vercel KV path performs a
non-atomic read-modify-write. The limiter has been replaced with a
two-bucket weighted sliding window, and the docstring updated to
describe its behaviour (non-atomic KV, fail-open on KV failure,
in-process counter as the authoritative limiter within a warm
instance). The source field on the result now distinguishes
kv-with-local from local-only. File: api/lib/rate-limit.ts.

5. Sentiment polarity matching

analyzeSentimentForMarket matched BEARISH_LEXICON keys with
title.toLowerCase().includes(key), which produced false flips whenever
a key appeared as a substring of a longer, unrelated word. Replaced
with a word-boundary regex that preserves multi-word key support.
File: src/analysis/sentiment-analyzer.ts.
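One way to write such a matcher is sketched below, assuming keys are plain words or space-separated phrases. BEARISH_KEYS is a hypothetical stand-in for BEARISH_LEXICON, and the helper names are illustrative rather than the names used in sentiment-analyzer.ts.

```typescript
// Hypothetical stand-in for BEARISH_LEXICON keys; the real lexicon differs.
const BEARISH_KEYS = ["fall", "crash", "rate cut"];

// Escape regex metacharacters so keys are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// \b on both ends stops a key matching inside a longer word; the words of a
// multi-word key may be separated by any run of whitespace.
function keyPattern(key: string): RegExp {
  const body = key.trim().split(/\s+/).map(escapeRegExp).join("\\s+");
  return new RegExp(`\\b${body}\\b`, "i");
}

function matchedBearishKeys(title: string): string[] {
  return BEARISH_KEYS.filter((key) => keyPattern(key).test(title));
}

console.log(matchedBearishKeys("Will ETH fall below $2,000?"));
console.log(matchedBearishKeys("Fallout season 2 renewed"));
console.log(matchedBearishKeys("Fed rate  cut in June?"));
```

One design caveat: `\b` is defined over ASCII word characters, so a key that begins or ends with punctuation or an emoji would need a different boundary rule.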

6. Worst-case loss bound

EdgeResult.worstCaseLoss was 1 + cost, which propagated to the
/api/position-sizing and /api/risk-assessment responses as a loss
exceeding the stake. Binary-option longs on Polymarket and Kalshi
cannot lose more than the stake, and fees are already accounted for in
evPerDollar. worstCaseLoss is now bounded at 1; bestCaseGain is
floored at 0 for symmetry. File: src/analysis/edge.ts.

Polish

Three follow-up corrections to the public-facing response shapes,
delivered in the final commit on the branch.

1. SCALE_DOWN reachability without an explicit bankroll

/api/risk-assessment previously inferred bankroll = stake / maxFrac
when the caller omitted bankroll, which made stake / bankroll
identically equal to maxFrac and rendered the bankroll-fraction
branch of SCALE_DOWN structurally unreachable. The module now tracks
whether bankroll was supplied. When it is, both SCALE_DOWN branches
fire as documented. When it is not, the bankroll-fraction check is
skipped and a warning is appended instructing the caller to supply
bankroll for a complete assessment. The Kelly-vs-stake branch of
SCALE_DOWN continues to operate in both cases. The dollar worst-case
in assessRisk is also bounded at -stake (previously -stake * (1 + cost)),
consistent with the worstCaseLoss correction in the review pass. File:
src/analysis/risk.ts.
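The branch structure described above can be sketched as follows. Everything here is hypothetical: the field names, the AVOID-on-non-positive-EV rule, and the strict comparisons are assumptions chosen to show why skipping the bankroll-fraction check (instead of inferring bankroll = stake / maxFrac, which makes stake / bankroll equal maxFrac) keeps SCALE_DOWN reachable.

```typescript
type Recommendation = "TAKE" | "SCALE_DOWN" | "AVOID";

interface ProposedTrade {
  stake: number;       // dollars
  evPerDollar: number; // net of fees and slippage
  kellyStake: number;  // Kelly-suggested stake in dollars
  maxFrac: number;     // maximum bankroll fraction per trade
  bankroll?: number;   // optional; required for the bankroll-fraction check
}

// Hypothetical decision rule mirroring the prose above; thresholds are assumptions.
function recommend(t: ProposedTrade): { recommendation: Recommendation; warnings: string[] } {
  const warnings: string[] = [];
  if (t.evPerDollar <= 0) return { recommendation: "AVOID", warnings };

  // Kelly-vs-stake branch: applies whether or not bankroll was supplied.
  let scaleDown = t.stake > t.kellyStake;

  if (t.bankroll !== undefined && t.bankroll > 0) {
    // Bankroll-fraction branch: only evaluated against a caller-supplied bankroll.
    if (t.stake / t.bankroll > t.maxFrac) scaleDown = true;
  } else {
    warnings.push("bankroll not supplied; pass bankroll for a complete assessment");
  }
  return { recommendation: scaleDown ? "SCALE_DOWN" : "TAKE", warnings };
}

// Dollar outcome bounds for a binary long: the loss never exceeds the stake,
// and the best case is floored at 0.
function dollarBounds(stake: number, bestCaseGainPerDollar: number): { worstCase: number; bestCase: number } {
  return { worstCase: -stake, bestCase: Math.max(0, stake * bestCaseGainPerDollar) };
}
```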

2. HOLD-override reasoning consistency

When computeEdge identifies raw edge on a side (YES or NO) but fees
and slippage push evPerDollar non-positive, buildSuggestedAction
overrode the direction to HOLD while leaving reasoning as the
original "YES underpriced at X%" string produced by computeEdge. The
override path now rewrites the reasoning to explain that raw edge
favoured the side but net EV is non-positive, so the returned payload
is internally consistent. File: src/analysis/signal-generator.ts.

3. alternative_sizing semantics

alternative_sizing.half_kelly and alternative_sizing.quarter_kelly
were computed as recommendedStake / 2 and recommendedStake / 4.
Because recommendedStake is already capped at min(kelly_cap, max_bankroll_fraction) of bankroll, those values did not correspond
to one-half and one-quarter of the full Kelly fraction. Both fields
are now derived from the uncapped full Kelly fraction (via
kellyFraction(shrunk_prob, yes_price)), with the same outer risk
cap applied so the "safer" sizings never exceed the recommended stake
or the caller's hard risk limit. A new full_kelly_fraction field is
exposed for clients that need the uncapped value. File:
api/position-sizing.ts.
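A minimal sketch of the new derivation, assuming the fee-free Kelly formula for a YES position, f* = (p − q) / (1 − q). Names other than full_kelly_fraction, half_kelly, and quarter_kelly are hypothetical, and the exact order of the caps in api/position-sizing.ts may differ.

```typescript
// Full Kelly fraction for buying YES at price q with win probability p,
// ignoring fees: f* = (p - q) / (1 - q), floored at 0.
function kellyFraction(p: number, q: number): number {
  if (q <= 0 || q >= 1) return 0;
  return Math.max(0, (p - q) / (1 - q));
}

interface SizingInput {
  shrunkProb: number;
  yesPrice: number;
  bankroll: number;
  kellyMultiplier: number;     // fraction of full Kelly used for the recommendation
  maxBankrollFraction: number; // caller's hard risk limit
}

// Hypothetical sketch: alternatives are derived from the uncapped full Kelly
// fraction, then share the outer cap (recommended stake and hard limit).
function sizeWithAlternatives(input: SizingInput) {
  const full = kellyFraction(input.shrunkProb, input.yesPrice);
  const hardLimit = input.maxBankrollFraction * input.bankroll;
  const recommended = Math.min(full * input.kellyMultiplier * input.bankroll, hardLimit);
  const capped = (fraction: number) => Math.min(fraction * input.bankroll, recommended, hardLimit);
  return {
    full_kelly_fraction: full,
    recommended_stake: recommended,
    alternative_sizing: { half_kelly: capped(full / 2), quarter_kelly: capped(full / 4) },
  };
}

// Full Kelly (multiplier 1) with a 10% hard cap on a $10,000 bankroll.
const sized = sizeWithAlternatives({
  shrunkProb: 0.6, yesPrice: 0.4, bankroll: 10_000, kellyMultiplier: 1, maxBankrollFraction: 0.1,
});
console.log(sized);
```

With this input the previous rule would have returned $500 and $250. The sketch returns about $833 for quarter_kelly (exactly a quarter of full Kelly) and clips half_kelly's uncapped $1,667 to the $1,000 cap.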

Future improvements

The items below were identified during the review pass and are out of
scope for this PR. They are listed as a prioritised backlog for
subsequent iterations.

Correctness

  • Back-test RNG drift across strategies. KELLY can skip signals
    (no rng() call), FLAT calls rng() once per signal, and RANDOM
    calls it twice. On a multi-signal corpus the three strategies would
    not observe the same coin flips on the same underlying bet.
    Innocuous with the present single-signal corpus; should be resolved
    before expanding the fixture set, either by pre-materialising
    outcomes per signal or by assigning a dedicated RNG stream per
    strategy. scripts/backtest/run-backtest.ts.
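The first remediation (pre-materialising outcomes per signal) could look like the sketch below. It uses mulberry32, a small and widely used seeded PRNG, purely for illustration; BacktestSignal, outcomeProb, and the function names are hypothetical, and how the harness maps calibration onto the outcome probability is not modelled.

```typescript
// mulberry32: a small 32-bit seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface BacktestSignal {
  id: string;
  outcomeProb: number; // probability the YES side resolves true in the simulation
}

// Draw each signal's resolution once per replication, before any strategy runs,
// so every strategy settles the same bet against the same coin flip.
function materialiseOutcomes(signals: BacktestSignal[], replications: number, seed: number): boolean[][] {
  const rng = mulberry32(seed);
  const outcomes: boolean[][] = [];
  for (let r = 0; r < replications; r++) {
    outcomes.push(signals.map((s) => rng() < s.outcomeProb));
  }
  return outcomes;
}

// Strategy-level randomness (for example RANDOM's side choice) uses its own
// stream, so it cannot shift the shared outcomes.
function randomSide(strategyRng: () => number): "YES" | "NO" {
  return strategyRng() < 0.5 ? "YES" : "NO";
}
```

Because outcomes are drawn before any strategy runs, a strategy that skips a signal or consumes extra random numbers for its side choice can no longer change which coin flip another strategy sees.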

  • breakevenProb semantics by side. The field is interpreted as
    "minimum trueProb for positive EV" on the YES side and "maximum
    trueProb for positive EV" on the NO side; the docstring states
    only "min". Proposed resolution: split into breakevenMin /
    breakevenMax, or add a side-aware docstring.
    src/analysis/edge.ts.

  • Additive fee accounting. EV subtracts cost from the win-leg
    and adds it to the loss-leg, treating fees as paid on top of the
    stake. This slightly overstates EV under Polymarket's
    slippage-dominated model and slightly understates it under Kalshi's
    profit-based taker fees; net impact on typical markets is
    approximately 2%. A physical-model rewrite would shift the
    back-test headline numbers by 1–3 pp and belongs in a dedicated PR
    that also updates the back-test fixtures.

Observability and consistency

  • Rate-limit KV TTL is windowSeconds × 2. windowSeconds is
    sufficient; the doubled TTL consumes additional KV memory without
    affecting correctness. api/lib/rate-limit.ts.

  • RateLimitResult.source union includes an unused 'disabled'
    variant. Either remove it from the union type or introduce a
    disabled-by-config code path.

  • applyQualityGate.dropped telemetry is discarded. The gate
    produces per-reason drop counts (lowVolume, extremePrice,
    weakSignal), but KeywordMatcher.match consumes only .kept.
    Either emit drops to logs or metrics, or remove the field if unused.
    src/analysis/match-quality.ts.

  • Divergent NaN handling. passesVolume drops non-finite input
    while passesExtremePrice passes it through. A single rule should
    apply. src/analysis/match-quality.ts.

  • Divergent validation between position-sizing and
    risk-assessment. true_prob is required by the former and optional
    by the latter; yes_price bounds are aligned. Document the
    intentional difference or unify.

  • generateEventId uses a 32-bit hash. Birthday collisions become
    meaningful around 60,000–100,000 distinct tweets; adequate for
    current usage. Migrate to a base36-truncated SHA-256 before
    event_id is relied upon as a primary key.
    src/analysis/signal-generator.ts.
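The collision estimate follows from the birthday approximation, and the proposed migration needs only Node's built-in crypto module. The sketch below assumes Node.js; eventIdFromText is a hypothetical name, and keeping the first 64 bits of the digest is an illustrative choice rather than a decided width.

```typescript
import { createHash } from "node:crypto";

// Birthday approximation: P(any collision among n ids) ~ 1 - exp(-n(n - 1) / 2N),
// where N = 2^bits. expm1 keeps precision when the probability is tiny.
function collisionProbability(n: number, bits: number): number {
  const space = 2 ** bits;
  return -Math.expm1((-n * (n - 1)) / (2 * space));
}

// Hypothetical replacement id: first 64 bits of SHA-256, base36-encoded and
// zero-padded to a fixed 13 characters (36^13 > 2^64).
function eventIdFromText(text: string): string {
  const hex = createHash("sha256").update(text, "utf8").digest("hex");
  return BigInt("0x" + hex.slice(0, 16)).toString(36).padStart(13, "0");
}

for (const n of [60_000, 100_000]) {
  console.log(
    n,
    "tweets:",
    (collisionProbability(n, 32) * 100).toFixed(0) + "% (32-bit)",
    (collisionProbability(n, 64) * 100).toExponential(1) + "% (64-bit)",
  );
}
console.log(eventIdFromText("Real Madrid beat Barcelona 3-1"));
```

For 60,000 and 100,000 distinct tweets the 32-bit estimate comes out at roughly 34% and 69%, which is the range the bullet describes; at 64 bits the same counts are far below one in a million.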

Deployment and operations

  • SWR and in-flight dedup are per-instance. The module-level
    cachedMarkets and inFlightFetch state does not cross function
    boundaries. Under Vercel's multi-instance scaling, concurrent
    requests routed to different instances each trigger an independent
    refresh. Migrating to a shared backend (for example, KV with a short
    TTL) is the appropriate remediation if cross-instance consistency
    becomes a requirement. api/lib/market-cache.ts.

  • markets.snapshot.json (1.8 MB) is checked into the repo.
    Reviewed manually: contents are public market data (titles, prices,
    URLs). No remediation required.

  • scripts/matcher-eval and scripts/backtest are not wired into
    npm run test. Reviewers must invoke them directly. Adding
    test:eval and test:backtest targets would make the evidence
    available as a single command.

  • Narrow back-test corpus. One active signal remains after the
    quality gate and tradability filter. The 500-replication Monte Carlo
    exercises the sizing math but does not robustly estimate production
    PnL. Calibration sensitivity partially compensates; expanding the
    fixture corpus is the next step.

  • KeywordMatcher constructor takes four positional arguments.
    Backwards compatible for 0–3 arguments, but subclasses or mocks that
    match the constructor signature require updating. Consider
    documenting the new argument or migrating to an options object.
    src/analysis/keyword-matcher.ts.


None of the above blocks merge. Recommended sequencing for the next
iteration: the back-test RNG drift fix before any corpus expansion,
and the SWR/dedup deployment note before the next deployment cycle
on a multi-instance Vercel project.

Three pure modules used by the signal and arbitrage code. fees.ts
models per-platform taker fees with a bounded adverse-execution
slippage term. edge.ts returns signed edge, fractional Kelly, and
breakeven probability for a bet at a given price. risk.ts wraps both
to produce EV, variance, and a TAKE/SCALE_DOWN/AVOID call on a
proposed trade.

No I/O in any of them; consumed by the commits that follow.

Replaces the bag-of-words scorer. Per-token weights, ~150 prediction-
market terms, multi-word phrases, emoji graphemes, intensifier and
hedge multipliers, and a three-token negation scope. ALL-CAPS and
trailing "!!" bump magnitude.

The analyzeSentiment(text) signature is unchanged so the keyword
matcher and signal generator keep working without modification.

The previous formula was edge = confidence * |p - price|, which drops
the sign. A bullish tweet on a 95c YES market still reported a large
edge even though the implied probability sat below the price, so
buying YES had no positive expected value. Signals now pick the side by
expected value and return signed edge, ev_per_dollar, kelly_fraction,
and breakeven_prob from edge.ts.
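A sketch of side selection by expected value, assuming a fee-free binary contract plus a flat per-dollar cost standing in for fees.ts. The names are hypothetical, and the Kelly and breakeven fields that the real computeEdge returns are omitted.

```typescript
type Side = "YES" | "NO" | "HOLD";

interface SideChoice {
  side: Side;
  edge: number;        // signed: trueProb - yesPrice (positive favours YES)
  evPerDollar: number; // expected profit per $1 staked on the chosen side
}

// Hypothetical sketch. $1 on YES at price q buys 1/q contracts, so
// EV per dollar = p / q - 1; on NO it is (1 - p) / (1 - q) - 1.
// A flat per-dollar cost stands in for the fee and slippage model in fees.ts.
function pickSide(trueProb: number, yesPrice: number, costPerDollar = 0): SideChoice {
  const edge = trueProb - yesPrice;
  const evYes = trueProb / yesPrice - 1 - costPerDollar;
  const evNo = (1 - trueProb) / (1 - yesPrice) - 1 - costPerDollar;
  if (evYes <= 0 && evNo <= 0) {
    return { side: "HOLD", edge, evPerDollar: Math.max(evYes, evNo) };
  }
  return evYes >= evNo
    ? { side: "YES", edge, evPerDollar: evYes }
    : { side: "NO", edge, evPerDollar: evNo };
}

// A bullish estimate that still sits below a 95c price favours NO, not YES.
console.log(pickSide(0.7, 0.95));
console.log(pickSide(0.691, 0.24));
```

With non-negative costs, evYes can only be positive when trueProb > yesPrice and evNo only when trueProb < yesPrice, so the sign of edge decides which side is eligible and the cost term decides whether it survives or falls through to HOLD.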

Other changes here:
- sentimentToProbability range narrowed from ±0.45 to ±0.25; tweet-
  level evidence rarely justifies a sharper prior.
- The confidence passed to Kelly is capped at 0.6 for lexicon-only
  signals.
- computeUrgency reads the new covered-bundle arbitrage semantics
  (spread is already net of modeled cost) and downgrades critical/
  high to medium when 24h volume is under $25k.

Leaves the covered-bundle detector from PR MusashiBot#2 alone and adds three
optional fields on each opportunity: maxStake, expectedDollarProfit,
and annualisedReturn. Sizing uses the impact-aware slippage model in
fees.ts and clamps the stake to $10 when refined edge is non-positive.
APR uses the earlier of the two endDates or falls back to 30 days.

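The annualisation step can be sketched as below. Simple (non-compounded) annualisation, the one-day floor, and the treatment of past or unparseable end dates are all assumptions; only the earlier-end-date rule and the 30-day fallback come from the commit text.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const FALLBACK_DAYS = 30;

// Hypothetical sketch: simple (non-compounded) annualisation using the earlier
// of the legs' end dates, falling back to 30 days when none is usable.
function annualisedReturn(
  expectedDollarProfit: number,
  maxStake: number,
  endDates: Array<string | undefined>,
  now: Date,
): number {
  if (maxStake <= 0) return 0;
  const future = endDates
    .map((d) => (d === undefined ? Number.NaN : Date.parse(d)))
    .filter((t) => Number.isFinite(t) && t > now.getTime());
  const days = future.length > 0 ? (Math.min(...future) - now.getTime()) / DAY_MS : FALLBACK_DAYS;
  // Floor at one day (an assumption) so near-expiry bundles do not explode.
  return (expectedDollarProfit / maxStake) * (365 / Math.max(days, 1));
}

const now = new Date("2026-01-01T00:00:00Z");
console.log(annualisedReturn(10, 1_000, ["2026-06-01T00:00:00Z", "2026-03-15T00:00:00Z"], now));
console.log(annualisedReturn(10, 1_000, [undefined, "not a date"], now));
```

The first call resolves in 73 days, so a 1% return annualises to 5%; the second has no usable date and uses the 30-day fallback.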
analyze-text now returns ev_per_dollar, kelly_fraction, and
breakeven_prob on data.suggested_action, plus implied_true_prob under
metadata. markets/arbitrage accepts three new optional filters
(minExpectedProfit, minAnnualisedReturn, minMaxStake) that read the
fields added in the previous commit. health bumps the version to
2.1.0 and lists the new endpoints in its catalog.

No existing field or query parameter changed shape.

POST /api/position-sizing returns a Kelly-optimal stake plus half-
and quarter-Kelly alternatives given true_prob, yes_price, bankroll,
and volume_24h. Defaults to quarter-Kelly with a 10%-of-bankroll hard
cap. Runs a second pass so the stake feeds back into the slippage
model.
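A sketch of the two-pass idea, assuming a toy square-root slippage model in place of the one in fees.ts (its coefficients are invented for illustration) and the fee-free Kelly formula f* = (p − q) / (1 − q). Function names are hypothetical.

```typescript
// Hypothetical slippage model: adverse price move grows with the square root of
// stake / 24h volume and is capped at 5 cents. Coefficients are illustrative only.
function slippage(stake: number, volume24h: number): number {
  if (volume24h <= 0) return 0.05;
  return Math.min(0.05, 0.01 * Math.sqrt(stake / volume24h));
}

// Fee-free Kelly fraction for buying YES at price q: (p - q) / (1 - q), floored at 0.
function kellyFraction(p: number, q: number): number {
  if (q <= 0 || q >= 1) return 0;
  return Math.max(0, (p - q) / (1 - q));
}

// Pass 1 sizes at the quoted price; pass 2 re-sizes at the price implied by
// pass 1's slippage, so a larger stake sees a worse fill and a smaller Kelly.
function sizePosition(
  trueProb: number,
  yesPrice: number,
  bankroll: number,
  volume24h: number,
  kellyMultiplier = 0.25, // quarter-Kelly default
  hardCap = 0.1,          // 10%-of-bankroll cap
): { firstPass: number; stake: number } {
  const size = (price: number) =>
    Math.min(kellyFraction(trueProb, price) * kellyMultiplier, hardCap) * bankroll;
  const firstPass = size(yesPrice);
  const effectivePrice = Math.min(0.999, yesPrice + slippage(firstPass, volume24h));
  return { firstPass, stake: size(effectivePrice) };
}

console.log(sizePosition(0.3, 0.24, 10_000, 5_000));
```

The second pass can only shrink the stake here, because slippage raises the effective price and a higher price lowers the Kelly fraction; when the cap still binds after slippage, both passes return the same capped stake.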

POST /api/risk-assessment takes a proposed trade (side, price, stake,
optional bankroll and expiry) and returns EV, variance, Sharpe,
prob_profit, Kelly-suggested stake, and a TAKE/SCALE_DOWN/AVOID call.

Both are wired in vercel.json and the local server. API-REFERENCE is
extended with request/response examples.

vercel Bot commented Apr 20, 2026

@LechenWang is attempting to deploy a commit to the Victor's projects Team on Vercel.

A member of the Team first needs to authorize it.

JonathanW666 force-pushed the feat/trading-edge-and-sizing branch from c30f8e9 to a26a0e9 on April 20, 2026 19:02

The keyword matcher surfaces untradable matches: markets with
essentially no 24h volume, markets pinned near 1c or 99c, and single-
unigram broad hits (the typical case is a Fed-rate-cut tweet matching
an NBA penny market on the word "win"). The gate drops all three
before results are returned:

- volume floor: drop if 24h volume < $5k
- extreme-price: drop if yesPrice < 2% or > 98% unless the match has
  a phrase hit and confidence >= 0.75
- strong-signal: require either a phrase hit or >= 2 matched keywords
  at >= 0.55 confidence
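The three rules can be sketched as a single classifier that also counts drops per reason. The MatchCandidate shape and function names are hypothetical; the drop-reason keys follow the lowVolume / extremePrice / weakSignal names used by applyQualityGate, and reading the strong-signal rule as "phrase hit, or at least two keywords with confidence >= 0.55" is an interpretation.

```typescript
interface MatchCandidate {
  volume24h: number;
  yesPrice: number;   // 0..1
  confidence: number; // 0..1
  hasPhraseHit: boolean;
  matchedKeywords: number;
}

type DropReason = "lowVolume" | "extremePrice" | "weakSignal";

// Hypothetical sketch of the three rules above, checked in order.
function dropReason(m: MatchCandidate): DropReason | null {
  // Non-finite volume is treated as failing the floor (an assumption).
  if (!Number.isFinite(m.volume24h) || m.volume24h < 5_000) return "lowVolume";
  const extreme = m.yesPrice < 0.02 || m.yesPrice > 0.98;
  if (extreme && !(m.hasPhraseHit && m.confidence >= 0.75)) return "extremePrice";
  const strong = m.hasPhraseHit || (m.matchedKeywords >= 2 && m.confidence >= 0.55);
  return strong ? null : "weakSignal";
}

function applyGate(matches: MatchCandidate[]) {
  const dropped: Record<DropReason, number> = { lowVolume: 0, extremePrice: 0, weakSignal: 0 };
  const kept: MatchCandidate[] = [];
  for (const m of matches) {
    const reason = dropReason(m);
    if (reason === null) kept.push(m);
    else dropped[reason] += 1;
  }
  return { kept, dropped };
}

const { kept, dropped } = applyGate([
  { volume24h: 120_000, yesPrice: 0.01, confidence: 0.5, hasPhraseHit: false, matchedKeywords: 1 },
  { volume24h: 1_000, yesPrice: 0.4, confidence: 0.9, hasPhraseHit: true, matchedKeywords: 3 },
  { volume24h: 50_000, yesPrice: 0.4, confidence: 0.6, hasPhraseHit: false, matchedKeywords: 2 },
  { volume24h: 50_000, yesPrice: 0.4, confidence: 0.6, hasPhraseHit: false, matchedKeywords: 1 },
]);
console.log(kept.length, dropped);
```

The first candidate models the Fed-tweet case from the commit message: a single unigram hit on a 1c market, which the extreme-price rule drops.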

The gate is on by default and can be disabled by passing false as the
KeywordMatcher's fourth constructor argument. The eval harness uses
that to produce its baseline row.

scripts/matcher-eval/ ships a reproducible evaluation:

  npx tsx scripts/matcher-eval/snapshot-markets.ts   # regen snapshot
  npx tsx scripts/matcher-eval/run-eval.ts

On the 30-tweet x 1,857-market fixture:

                             before   after    delta
  total matches surfaced         99      86     -13
  junk rate (any rule)         63.6%   40.7%   -22.9 pp
  thin-market (<$5k)            4.0%    0.0%    -4.0 pp
  extreme-price                46.5%   25.6%   -20.9 pp
  cross-domain                 29.3%   20.9%    -8.4 pp
  weak-signal                   4.0%    0.0%    -4.0 pp

Surfaced matches drop 13% (99 to 86) while the overall junk rate drops
36% relative (63.6% to 40.7%).

Changes to api/lib/market-cache.ts:
- in-flight request deduplication so concurrent callers during a
  cache miss share a single fetch promise instead of each making
  their own Polymarket/Kalshi roundtrip
- stale-while-revalidate up to MARKET_CACHE_SWR_SECONDS (default 60s)
  past the 20s TTL; stale is returned immediately while a single
  background refresh runs
- refresh no longer overwrites the cache with two empty arrays on a
  full-platform outage; last-known-good is retained
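The three behaviours combine into a small cache wrapper. The sketch below is hypothetical (SwrCache is not the module's real API), takes the clock as a parameter so it can be tested deterministically, and treats an empty result as an outage signal, which is an assumption about how "two empty arrays" would be detected. The 20 s TTL and 60 s SWR values would be passed in by the caller.

```typescript
interface Snapshot<T> {
  data: T;
  fetchedAt: number; // ms timestamp of the fetch that produced `data`
}

class SwrCache<T> {
  private snapshot: Snapshot<T> | null = null;
  private inFlight: Promise<T> | null = null;
  private readonly fetcher: () => Promise<T>;
  private readonly ttlMs: number;
  private readonly swrMs: number;
  private readonly isEmpty: (data: T) => boolean;

  constructor(fetcher: () => Promise<T>, ttlMs: number, swrMs: number, isEmpty: (data: T) => boolean) {
    this.fetcher = fetcher;
    this.ttlMs = ttlMs;
    this.swrMs = swrMs;
    this.isEmpty = isEmpty;
  }

  // At most one refresh runs at a time; concurrent callers share its promise.
  private refresh(now: number): Promise<T> {
    if (this.inFlight === null) {
      this.inFlight = this.fetcher()
        .then((data) => {
          const previous = this.snapshot;
          // Keep last-known-good rather than caching an empty (outage) result.
          if (previous !== null && this.isEmpty(data)) return previous.data;
          this.snapshot = { data, fetchedAt: now };
          return data;
        })
        .finally(() => {
          this.inFlight = null;
        });
    }
    return this.inFlight;
  }

  get(now: number): Promise<T> {
    const s = this.snapshot;
    const age = s === null ? Infinity : now - s.fetchedAt;
    if (s !== null && age < this.ttlMs) return Promise.resolve(s.data); // fresh
    if (s !== null && age < this.ttlMs + this.swrMs) {
      // Stale but servable: answer immediately, refresh in the background.
      void this.refresh(now).catch(() => undefined);
      return Promise.resolve(s.data);
    }
    return this.refresh(now); // cold or too stale: wait for the shared fetch
  }
}
```

Like the real module, this state lives in process memory, so the guarantees hold per instance only.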

New api/lib/rate-limit.ts: sliding-window per-IP counter backed by
Vercel KV with an in-process Map fallback. Writes X-RateLimit-Limit /
-Remaining / -Reset headers on every response, returns 429 with
Retry-After once the bucket is exhausted, and fails open if KV is
unavailable. Wired into analyze-text, markets/arbitrage,
ground-probability, position-sizing, and risk-assessment with
per-endpoint budgets.

Verified directly: 40 burst requests from one IP at a 30-rpm cap
produced 30 allowed + 10 denied; a different IP got a fresh 30-request
budget in the same window.

scripts/backtest/run-backtest.ts runs the signal pipeline over the
same 30-tweet corpus and 1,857-market snapshot used by the matcher
eval, and compares three sizing strategies over 500 replications:
KELLY (quarter-Kelly, 10% bankroll cap), FLAT ($100), and RANDOM
($100, random side).

Execution realism:
- only trades signals on markets priced in [0.10, 0.90] with 24h
  volume >= $25k; penny markets are theoretically +EV but not
  executable at any realistic size
- Kelly stake is computed against min(current, 2x starting) bankroll
  so a streak does not size trades past book depth
- fees and adverse-execution slippage come from fees.ts, so the
  PnL is apples-to-apples with what the API reports

Results at calibration = 1.0 (pooled-trade Sharpe, 500 reps):

  strategy   return   Sharpe   maxDD   winRate   Brier
  KELLY     +17.4%    0.360    3.5%   17.8%    0.206
  FLAT       +5.6%    0.570    1.7%   54.2%    0.239
  RANDOM     +1.8%    0.228    1.9%   49.1%    0.240

Kelly sensitivity sweep:

  calibration   return    Sharpe
  0.00          -1.2%    -0.033
  0.25          +3.7%     0.090
  0.50          +8.3%     0.188
  0.75         +13.5%     0.289
  1.00         +18.4%     0.377

At calibration 0 (signals are noise) Kelly loses small rather than
blowing up, because the EV check short-circuits to HOLD on negative-
expectation trades. Results persisted to scripts/backtest/fixtures/
result.json and regenerable from scratch.

JonathanW666 reopened this Apr 20, 2026
JonathanW666 force-pushed the feat/trading-edge-and-sizing branch from 874b73d to 2fe2a00 on April 20, 2026 20:34

sentimentToProbability returned 0.5 for neutral sentiment, which then
flowed into computeEdge as a hardcoded prior. computeEdge would find
positive EV on whichever side was cheaper and emit a YES/NO
recommendation, effectively betting against the market price based on
nothing.

Neutral sentiment means we have no directional evidence. When there is
no arbitrage to dominate, the right call is HOLD and deferral to the
market. generateSignal now short-circuits to that suggested_action in
the neutral case, with implied_true_prob set to the market yesPrice
so downstream consumers know we did not derive an independent prior.

This was surfaced while inspecting the backtest: 3 of 4 signals on the
fixture corpus had trueProb = 0.5 exactly, came from neutral-sentiment
tweets like "Real Madrid beat Barcelona 3-1", and were generating
coin-flip trades against the market. They are now filtered at signal
generation, not at the strategy layer.

The original computeMetrics counted skipped-signal slots (stake=0) in
the denominators of both winRate and Sharpe. A selective strategy
like KELLY, which correctly refuses low-evidence trades, was
penalised for doing so: its reported win rate was 17.8% and its
Sharpe 0.36 while FLAT, which took every signal, showed 54.2% and
0.57. Both numbers were an artefact of the zero-pad on the pnl path.

Metrics are now split:
  - totalReturn and maxDrawdown still walk the full pnlPath (zeros
    do not move equity, so they're correct as a full-path metric)
  - sharpe, winRate, meanPnl, stdPnl are computed over active trades
    (entries with pnl != 0) only
  - activeTrades and activeRate are reported alongside the other
    numbers so a reviewer can see how selective each strategy was

After the fix plus the matching signal-generator change that removes
coin-flip trades on neutral-sentiment tweets, KELLY and FLAT
converge to the same 71% win rate (they take the same single
well-calibrated bet per replication) and near-identical Sharpe
(0.92 vs 0.98). The total return gap (17.4% vs 1.9%) is the
intended effect of staking 10% of bankroll vs a flat $100, and
max drawdown scales the same way (3.5% vs 0.3%).
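The split can be sketched as a single function. Field names follow the commit text; the drawdown definition (peak-to-trough relative to the running peak) and the use of the sample standard deviation are assumptions about the harness.

```typescript
interface Metrics {
  totalReturn: number;
  maxDrawdown: number;
  sharpe: number;
  winRate: number;
  activeTrades: number;
  activeRate: number;
}

// Hypothetical sketch: full-path metrics walk every slot; per-trade metrics
// use active trades (pnl !== 0) only.
function computeMetrics(pnlPath: number[], startingBankroll: number): Metrics {
  let equity = startingBankroll;
  let peak = equity;
  let maxDrawdown = 0;
  for (const pnl of pnlPath) {
    equity += pnl;
    peak = Math.max(peak, equity);
    maxDrawdown = Math.max(maxDrawdown, (peak - equity) / peak);
  }

  const active = pnlPath.filter((p) => p !== 0);
  const n = active.length;
  const mean = n > 0 ? active.reduce((a, b) => a + b, 0) / n : 0;
  const std = n > 1 ? Math.sqrt(active.reduce((acc, x) => acc + (x - mean) ** 2, 0) / (n - 1)) : 0;

  return {
    totalReturn: equity / startingBankroll - 1,
    maxDrawdown,
    sharpe: std > 0 ? mean / std : 0,
    winRate: n > 0 ? active.filter((p) => p > 0).length / n : 0,
    activeTrades: n,
    activeRate: pnlPath.length > 0 ? n / pnlPath.length : 0,
  };
}

const m = computeMetrics([0, 100, 0, -50, 0, 100], 1_000);
console.log(m);
```

On the example path the active-trade win rate is 2/3, whereas dividing by all six slots, as the zero-padded version did, would report 1/3.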
…nd worst-case loss

Arbitrage `expectedDollarProfit` in `estimateExecutableSizing` used
`refinedEdge * maxStake`. `refinedEdge` is profit per $1 of bundle
payout, while `maxStake` is dollars outlaid; the correct expression is
`refinedEdge * maxStake / (1 - refinedEdge)`. Under-statement was
approximately 1-5% at typical edges and ~43% at 30%+ edges.

Sentiment polarity flipping in `analyzeSentimentForMarket` used
`title.toLowerCase().includes(key)` against `BEARISH_LEXICON`, which
produced false matches whenever a key appeared as a substring of a
longer, unrelated word. Replaced with a word-boundary regex that also
supports multi-word keys.

`EdgeResult.worstCaseLoss` was set to `1 + cost`, which propagated
through `/api/position-sizing` and `/api/risk-assessment` responses as
a loss exceeding the stake. Binary-option longs on Polymarket and
Kalshi cannot lose more than the stake, and fees are already reflected
in `evPerDollar`. Corrected to `1`; `bestCaseGain` floored at `0` for
symmetry.

Reproducibility unchanged: matcher evaluation junk-rate delta
-22.9 pp; back-test KELLY total return 17.4%, Sharpe (active) 0.923,
win rate (active) 71.0%. Typecheck and wallet tests pass.

`getArbitrage` previously validated its cache by TTL alone. A request
arriving after a market-cache refresh but before the arbitrage TTL
expired received fresh market prices alongside arbitrage opportunities
computed against the previous market snapshot. The arbitrage cache now
records the `cacheTimestamp` under which it was computed and
invalidates whenever that timestamp advances. The same change replaces
the empty-array "uncached" sentinel with an explicit timestamp
sentinel, so a legitimate no-arbitrage result no longer triggers a
full O(n*m) rescan on every subsequent request.

`api/lib/rate-limit.ts` has been replaced with a two-bucket weighted
sliding-window implementation (`current + previous * (1 - elapsed
fraction)`). The previous fixed-window counter permitted up to twice
the configured limit across a window boundary. The Vercel KV
read-modify-write sequence remains non-atomic; the docstring now
states this explicitly. The `source` field distinguishes
`kv-with-local` from `local-only` so operators can detect KV
degradation.
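The weighted estimate can be sketched in-process as below. Class and field names are hypothetical, the Vercel KV read-modify-write path is omitted, and the rule that a non-adjacent previous window counts as zero is an assumption.

```typescript
interface Buckets {
  windowStart: number; // start of the current fixed window (ms)
  current: number;     // requests admitted in the current window
  previous: number;    // requests admitted in the immediately preceding window
}

class SlidingWindowLimiter {
  private readonly state = new Map<string, Buckets>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(key: string, now: number): { allowed: boolean; remaining: number } {
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    let b = this.state.get(key);
    if (b === undefined) {
      b = { windowStart, current: 0, previous: 0 };
    } else if (b.windowStart !== windowStart) {
      // Roll forward. The old count only carries over if its window is adjacent.
      const adjacent = windowStart - b.windowStart === this.windowMs;
      b = { windowStart, current: 0, previous: adjacent ? b.current : 0 };
    }
    // Weight the previous window by how much of it still overlaps the sliding window.
    const elapsedFraction = (now - windowStart) / this.windowMs;
    const estimate = b.current + b.previous * (1 - elapsedFraction);
    const allowed = estimate < this.limit;
    if (allowed) b.current += 1;
    this.state.set(key, b);
    const remaining = Math.max(0, Math.floor(this.limit - estimate - (allowed ? 1 : 0)));
    return { allowed, remaining };
  }
}
```

At the start of a new window right after a full burst, the estimate is still 30 and decays linearly across the window, whereas a fixed-window counter would reset to 0 and admit another 30 immediately.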

Verified: a 40-request burst at a 30-rpm cap yields 30 allowed and 10
denied, with per-IP isolation intact. Typecheck and wallet tests pass.

JonathanW666 force-pushed the feat/trading-edge-and-sizing branch from c81b7ba to 165edaf on April 20, 2026 22:18

…emantics

Three follow-up corrections surfaced by re-reading the public-facing
response shapes after the previous review pass.

`src/analysis/risk.ts` — `SCALE_DOWN` was structurally unreachable via
the bankroll-fraction branch whenever `bankroll` was omitted, because
the inferred bankroll (`stake / maxFrac`) made `stake / bankroll`
exactly equal to `maxFrac`. The module now tracks whether `bankroll`
was supplied: when it is, both SCALE_DOWN branches apply as documented;
when it is not, the bankroll-fraction check is skipped and a warning is
surfaced telling the caller to pass `bankroll` for a complete
assessment. The dollar worst-case is also bounded at `-stake`
(previously `-stake * (1 + cost)`, which propagated an impossible
"loss exceeds stake" figure to clients); `bestCase` is floored at `0`.

`src/analysis/signal-generator.ts` — when `computeEdge` recommends YES
or NO on raw edge but fees and slippage push `evPerDollar` non-positive,
`buildSuggestedAction` overrode the direction to `HOLD` but preserved
the original reasoning string (for example, "YES underpriced at X%").
The payload is now internally consistent: on the override path the
reasoning is rewritten to explain that raw edge favoured the side but
net EV is non-positive.

`api/position-sizing.ts` — `alternative_sizing.half_kelly` and
`quarter_kelly` were previously computed as `recommendedStake / 2` and
`recommendedStake / 4`. Because `recommendedStake` is already capped at
`min(kelly_cap, max_bankroll_fraction)` of bankroll, those values did
not correspond to one-half and one-quarter of the full Kelly fraction.
Both fields are now derived from the uncapped full Kelly fraction
(`kellyFraction(shrunk_prob, yes_price)`), with the same outer risk
cap applied so the "safer" sizings never exceed the recommended stake
or the caller's hard risk limit. A new `full_kelly_fraction` field
exposes the uncapped value for clients that want to reason about it
explicitly.

Verified: typecheck and wallet tests pass; matcher evaluation delta
unchanged at -22.9 pp junk rate; back-test KELLY total return 17.4%,
Sharpe (active) 0.923, win rate (active) 71.0%. Direct smoke tests
confirm all three behavioural corrections.