BondLens AI

Explainable Bond Analysis Agent

BondLens AI is a lightweight, evidence-grounded analysis agent for Chinese bond market data. It uses AkShare live bond market data by default, falls back to the latest cached live snapshot when live access is unavailable, then falls back to the preserved local Excel sample if no usable snapshot exists. Each answer returns a structured trace with an evidence ledger, answer judge, risk profile, guardrail status, and limitations.

Non-investment advice. For learning, research, and portfolio demonstration only.

Project page: https://phoenix0531-sudo.github.io/bondlens-ai/

Screenshots

Agent Workbench
Answer, Tool Trace, and Evidence
Risk Profile and Answer Judge
Replay Dashboard
GitHub Pages Project Page: live-first data, deterministic tools, optional LLM enhancement, guardrails, replay, Docker, and CI in one portfolio-ready project.

Background

This project started as a 2024 undergraduate thesis project: a Flask-based bond data analysis system. The original thesis version is preserved and should not be rewritten:

Original thesis branch: undergraduate-thesis-2024
Current branch: main

The current branch upgrades the thesis project into an AI Agent / LLM Application / AI Engineer portfolio project while keeping the historical origin visible.

Repository Structure

This repository intentionally keeps two long-lived branches:

main: the modern BondLens AI portfolio project
undergraduate-thesis-2024: the original undergraduate thesis version

No release tag is kept because the original thesis branch is the preserved historical version.

Why This Is An Agent, Not A Chatbot

BondLens AI does not ask an LLM to guess financial answers. The agent follows a small deterministic loop:

Data resolver loads AkShare live bond data first, then a cached live snapshot, then data/testdata.xlsx when needed.
Planner classifies user intent and chooses tools.
Tools run local Python analysis over the active data frame.
Evidence is attached to the response as structured data and rendered as reviewer-readable claims.
Report is generated from the evidence, with risks and limitations.
Optional LLM can polish the answer only after the local evidence exists. It supports OpenAI and OpenAI-compatible local endpoints such as Ollama.
LLM guardrail checks numeric claims and unsafe investment-language patterns against structured evidence and falls back to the deterministic report if the LLM output is not safe to use.
Answer judge records whether model output was accepted, rejected by guardrails, or bypassed.
Evidence ledger, risk profile, and replay store make the answer auditable without showing raw JSON in the portfolio UI.
Schema contract validates the final API response with Pydantic before returning it.

If OPENAI_API_KEY is not set, the project still runs and uses deterministic fallback output.
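The guardrail and fallback steps in the loop above can be illustrated with a minimal sketch. It assumes the guardrail compares every number in the LLM text against numbers taken from the structured evidence and scans for a small list of blocked phrases. The function names, the regex, the tolerance, and the phrase list are illustrative assumptions, not the contents of bond_agent/llm_guardrail.py.

```python
import re

# Hypothetical blocked phrases; the project's real list may differ.
BLOCKED_PHRASES = ("guaranteed return", "you should buy", "risk-free")

# Matches unsigned decimals only; signs are ignored in this sketch.
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def extract_numbers(text: str) -> list[float]:
    """Return every unsigned decimal number that appears in the text."""
    return [float(m) for m in NUMBER_RE.findall(text)]


def check_answer(llm_text: str, evidence_numbers: list[float], tol: float = 0.01) -> dict:
    """Flag numbers not backed by evidence and unsafe investment phrases."""
    unsupported = [
        n for n in extract_numbers(llm_text)
        if not any(abs(n - e) <= tol for e in evidence_numbers)
    ]
    lowered = llm_text.lower()
    blocked = [p for p in BLOCKED_PHRASES if p in lowered]
    return {
        "passed": not unsupported and not blocked,
        "unsupported_numbers": unsupported,
        "blocked_phrases": blocked,
    }


def choose_final_answer(
    llm_text: str, deterministic_report: str, evidence_numbers: list[float]
) -> tuple[str, str]:
    """Return (final_answer, final_answer_source); LLM text is used only if it passes."""
    if check_answer(llm_text, evidence_numbers)["passed"]:
        return llm_text, "llm"
    return deterministic_report, "deterministic_fallback"
```

A real check would also need to handle numbers embedded in bond names (for example the digits in 23附息国债26), which this sketch would treat as unsupported numeric claims unless they are added to the evidence list.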

Core Capabilities

Intent planning: market overview, bond search, ranking, outlier detection, full bond report
Tool trace: each planner/tool step is visible in the Web page and API response
Bond search by name, maturity, and yield range
Live data mode: AkShare bond_spot_deal current bond deal data
Security-master reconciliation: because bond_spot_deal does not provide native maturity, matched bonds are enriched from the local static sample and marked with maturity coverage metadata
Cached live snapshot mode: latest successful AkShare fetch is reused when the live endpoint temporarily fails
Local fallback mode: data/testdata.xlsx remains available for offline demos and deterministic tests
Market summary: sample count, yield distribution, volume statistics
Ranking by yield, volume, maturity, or price
Yield outlier detection with z-score
Bond-to-market comparison: yield percentile, volume percentile, maturity percentile, outlier status
Data source profile: requested mode, actual runtime mode, provider, fetch time, fallback reason, and legacy crawler boundary
Retrieval-augmented risk explanations for fixed-income concepts
Evidence quality scoring with confidence and freshness labels
LLM faithfulness guardrail for numeric evidence checks, unsafe investment-language checks, and safe fallback
Evidence ledger: turns tool outputs into claim/evidence/source/confidence records for review
Answer judge: explains why an LLM answer was accepted, rejected, or bypassed
Structured risk profile: data quality, credit context, liquidity, duration, outlier, and model-output risks
Replay dashboard: /replay summarizes recent Agent runs without exposing raw JSON by default
Pydantic response schema with /api/agent/schema
Lightweight /healthz endpoint for containers and deployment platforms
Agent eval and red-team eval suites for repeatable behavior and safety checks
Docker deployment with gunicorn
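
The bond-to-market comparison listed above reports yield, volume, and maturity percentiles. One common definition of such a percentile is the share of the market at or below the bond's value. The sketch below assumes that definition; the function names, the field names, and the tie handling are illustrative and are not taken from bond_agent/tools.py.

```python
def percentile_rank(value: float, population: list[float]) -> float:
    """Percentage of the population that is at or below value (0-100)."""
    if not population:
        raise ValueError("population must not be empty")
    at_or_below = sum(1 for x in population if x <= value)
    return round(100.0 * at_or_below / len(population), 2)


def compare_to_market(
    bond: dict, market: list[dict], fields: tuple[str, ...] = ("yield", "volume", "maturity")
) -> dict:
    """Compute one percentile per field, skipping rows where the field is missing."""
    result = {}
    for field in fields:
        values = [row[field] for row in market if row.get(field) is not None]
        if bond.get(field) is None or not values:
            # Missing data stays visible as None instead of a made-up percentile.
            result[f"{field}_percentile"] = None
        else:
            result[f"{field}_percentile"] = percentile_rank(bond[field], values)
    return result
```

Returning None for a missing field matters here because live rows may have no maturity when the bond name is not matched in the static sample.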

Agent Workflow

flowchart TD
    A[User Question] --> B[Data Source Resolver]
    B --> C[Planner]
    C --> D{Intent}
    D -->|market_overview| E[describe_market]
    D -->|bond_search| F[search_bonds]
    D -->|ranking| G[rank_bonds]
    D -->|outlier_detection| H[detect_yield_outliers]
    D -->|bond_report| I[search_bonds + compare_bond_to_market + market/ranking/outlier tools]
    E --> J[Structured Evidence]
    F --> J
    G --> J
    H --> J
    I --> J
    J --> K[generate_bond_report]
    K --> L[Risk explanation retrieval]
    L --> M[Evidence quality assessment]
    M --> N{OPENAI_API_KEY or OPENAI_BASE_URL}
    N -->|missing| O[Deterministic fallback]
    N -->|set| P[OpenAI or local LLM enhancement]
    P --> Q[LLM numeric and language guardrail]
    Q -->|passed| R[LLM final answer]
    Q -->|numeric or language failure| S[Deterministic fallback answer]
    R --> T[Answer Judge + Evidence Ledger + Risk Profile]
    S --> T
    O --> T
    T --> U[Replay Dashboard]
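
The planner branch of the flowchart can be sketched as a keyword-based classifier plus an intent-to-tools table. The intent and tool names come from the flowchart; the keyword rules, their order, and the function names are assumptions for illustration, not the rules in bond_agent/planner.py.

```python
# Intent and tool names follow the flowchart above.
INTENT_TOOLS = {
    "market_overview": ["describe_market"],
    "bond_search": ["search_bonds"],
    "ranking": ["rank_bonds"],
    "outlier_detection": ["detect_yield_outliers"],
    "bond_report": [
        "search_bonds",
        "compare_bond_to_market",
        "describe_market",
        "rank_bonds",
        "detect_yield_outliers",
    ],
}

# Hypothetical keyword rules, checked in order; the first match wins.
KEYWORD_RULES = [
    ("bond_report", ("分析", "report")),
    ("outlier_detection", ("异常", "outlier")),
    ("ranking", ("最高", "最长", "最活跃", "top")),
    ("bond_search", ("搜索", "筛选", "search")),
]


def classify_intent(question: str) -> str:
    """Map a question to an intent; fall back to a market overview."""
    q = question.lower()
    for intent, keywords in KEYWORD_RULES:
        if any(keyword in q for keyword in keywords):
            return intent
    return "market_overview"


def plan(question: str) -> dict:
    """Return the intent and the ordered list of tools to run."""
    intent = classify_intent(question)
    return {"intent": intent, "tools": INTENT_TOOLS[intent]}
```

Because the rules are plain substring checks, the same question always yields the same plan, which is what makes planner behavior testable without an LLM.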

Tool Trace Example

User question: 搜索23附息国债26并给出收益率分析 (Search for 23附息国债26 and give a yield analysis)
-> data_source(mode=live, source=akshare_bond_spot_deal)
-> planner(intent=bond_report)
-> search_bonds(name=23附息国债26)
-> compare_bond_to_market()
-> describe_market()
-> rank_bonds(by=yield, top_n=5)
-> detect_yield_outliers(method=zscore, threshold=3.0)
-> generate_bond_report()
-> llm_guardrail(skipped: llm_disabled)
-> final answer
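
The detect_yield_outliers(method=zscore, threshold=3.0) step in the trace can be illustrated with a standard z-score check. This is a sketch using only the standard library and the population standard deviation; the function name zscore_outliers, the input shape, and the choice of population rather than sample deviation are assumptions, and the project's tool may differ.

```python
from statistics import mean, pstdev


def zscore_outliers(yields: dict[str, float], threshold: float = 3.0) -> list[dict]:
    """Return bonds whose yield is at least `threshold` standard deviations from the mean."""
    values = list(yields.values())
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all yields identical, so nothing can be an outlier
    outliers = []
    for name, value in yields.items():
        z = (value - mu) / sigma
        if abs(z) >= threshold:
            outliers.append({"bond": name, "yield": value, "zscore": round(z, 2)})
    return outliers
```

With a threshold of 3.0 and the population standard deviation, a single extreme value can only be flagged when the sample has at least ten rows, so very small result sets will report no outliers.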

Tech Stack

Python 3.11
Flask
AkShare
Pandas / NumPy
OpenPyXL
OpenAI Python SDK, optional
Pytest + local agent evals
Docker Compose + gunicorn
GitHub Actions CI

Architecture

.
├── app.py                       # Flask app entry
├── bond_agent/
│   ├── agent.py                 # Agent orchestration and LLM fallback status
│   ├── planner.py               # Rule-based intent planner
│   ├── data_loader.py           # AkShare live loading, snapshot cache, Excel fallback
│   ├── evidence_ledger.py       # Claim/evidence/source/confidence ledger
│   ├── answer_judge.py          # Deterministic judge for LLM acceptance/fallback
│   ├── risk_profile.py          # Structured risk profile cards
│   ├── replay_store.py          # Sanitized local run replay summaries
│   ├── risk_knowledge.py        # Local fixed-income risk explanation retrieval
│   ├── evidence_quality.py      # Evidence scoring, freshness, and confidence labels
│   ├── llm_guardrail.py         # Numeric and risk-language checks for LLM answers
│   ├── schemas.py               # Pydantic API request/response contracts
│   └── tools.py                 # Local bond analysis tools
├── data/testdata.xlsx           # Static bond sample data
├── docs/index.html              # GitHub Pages project page
├── docs/deployment.md           # Docker, health check, and platform deployment notes
├── evals/
│   ├── agent_eval_cases.yml     # Behavior cases
│   ├── red_team_eval_cases.yml  # Safety boundary cases
│   ├── run_agent_evals.py       # Local eval runner
│   └── run_red_team_evals.py    # Red-team eval runner
├── templates/agent.html         # Agent UI
├── templates/replay.html        # Recent run replay dashboard
├── tests/                       # Unit and smoke tests
├── LICENSE
├── Dockerfile
└── docker-compose.yml

Quick Start With Docker

docker compose up --build

Open:

http://localhost:5000/agent

The container runs gunicorn:

gunicorn -b 0.0.0.0:5000 app:app

The Compose service is named bondlens-ai and uses /healthz for lightweight platform and container health checks.

Local Development

python -m pip install -r requirements-dev.txt
python app.py

Open:

http://localhost:5000/agent

Environment Variables

FLASK_ENV=production
SECRET_KEY=change-me-in-production
OPENAI_API_KEY=
OPENAI_MODEL=gpt-5.4-mini
OPENAI_BASE_URL=
OPENAI_API_STYLE=auto
OPENAI_TIMEOUT_SECONDS=20
BOND_DATA_MODE=auto
BOND_LIVE_CACHE_PATH=
BOND_LIVE_CACHE_MAX_AGE_HOURS=24
BOND_REPLAY_ENABLED=true
BOND_REPLAY_DIR=

SECRET_KEY: Flask session secret.
OPENAI_API_KEY: optional. If empty, deterministic fallback is used.
OPENAI_MODEL: configurable model for evidence-constrained answer enhancement.
OPENAI_BASE_URL: optional OpenAI-compatible endpoint. For local Ollama, use http://127.0.0.1:11434/v1.
OPENAI_API_STYLE: auto, responses, or chat. Keep auto for normal use; local endpoints usually use chat completions.
OPENAI_TIMEOUT_SECONDS: optional LLM request timeout. Defaults to 20 so slow local models safely fall back instead of timing out the web server.
BOND_DATA_MODE: auto, live, or static. auto tries AkShare first, then cached live snapshot, then local Excel fallback.
BOND_LIVE_CACHE_PATH: optional path for the AkShare snapshot CSV. Defaults to .tmp/bond_spot_deal_snapshot.csv.
BOND_LIVE_CACHE_MAX_AGE_HOURS: maximum accepted snapshot age before static fallback is used. Defaults to 24.
BOND_REPLAY_ENABLED: set to false to disable local run replay summaries. Defaults to true.
BOND_REPLAY_DIR: optional replay directory. Defaults to .tmp/replays, which is ignored by Git.
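
The defaults described above can be read in one place. The sketch below is a hedged illustration using only the standard library: the Settings class, the load_settings function, and the choice to treat an unknown BOND_DATA_MODE as auto are assumptions, while the variable names and default values come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    data_mode: str
    cache_path: str
    cache_max_age_hours: float
    llm_timeout_seconds: float
    replay_enabled: bool
    llm_configured: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, treating empty values as unset."""
    env = os.environ if env is None else env
    data_mode = (env.get("BOND_DATA_MODE") or "auto").lower()
    if data_mode not in {"auto", "live", "static"}:
        data_mode = "auto"  # assumption: unknown modes fall back to auto
    return Settings(
        data_mode=data_mode,
        cache_path=env.get("BOND_LIVE_CACHE_PATH") or ".tmp/bond_spot_deal_snapshot.csv",
        cache_max_age_hours=float(env.get("BOND_LIVE_CACHE_MAX_AGE_HOURS") or 24),
        llm_timeout_seconds=float(env.get("OPENAI_TIMEOUT_SECONDS") or 20),
        replay_enabled=(env.get("BOND_REPLAY_ENABLED") or "true").lower() != "false",
        # The LLM path is attempted when either an API key or a base URL is set.
        llm_configured=bool(env.get("OPENAI_API_KEY") or env.get("OPENAI_BASE_URL")),
    )
```

Passing a plain dict instead of os.environ makes the defaults easy to check in tests, for example load_settings({}).data_mode evaluates to "auto".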

Local Ollama smoke example:

set OPENAI_BASE_URL=http://127.0.0.1:11434/v1
set OPENAI_MODEL=qwen2.5:1.5b
set OPENAI_API_STYLE=chat
python app.py

OPENAI_API_KEY can stay empty for local OpenAI-compatible endpoints that do not require authentication.

Small local models are useful for verifying that the LLM path runs end to end, but the deterministic evidence fields remain the source of truth for review and debugging.

When using Docker on Windows or macOS, point the container to the host Ollama service:

set OPENAI_BASE_URL=http://host.docker.internal:11434/v1
docker compose up --build

The API response exposes safe LLM state:

{
  "used_llm": false,
  "used_llm_in_final": false,
  "llm_status": "disabled",
  "llm_error": null,
  "llm_guardrail": {
    "status": "not_run",
    "numeric_status": "not_run",
    "language_status": "not_run"
  }
}

Example Questions

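当前样本收益率分布是什么样？ (What does the yield distribution of the current sample look like?)
搜索23附息国债26并给出收益率分析 (Search for 23附息国债26 and give a yield analysis)
按收益率列出最高的前5只债券 (List the top 5 bonds by yield)
按成交量列出最活跃的前5只债券 (List the 5 most actively traded bonds by volume)
按期限列出最长的前5只债券 (List the 5 bonds with the longest maturity)
有没有收益率异常的债券？ (Are there any bonds with abnormal yields?)
筛选收益率大于 3 的债券 (Filter bonds with a yield greater than 3)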

API

POST /api/agent/query
Content-Type: application/json

{
  "question": "搜索23附息国债26并给出收益率分析",
  "data_mode": "auto"
}
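
The same request can be sent from Python with the standard library. This is a minimal sketch that assumes the app is running locally on port 5000 as in the quick start; build_query_request and query_agent are illustrative helper names, not part of the project.

```python
import json
import urllib.request


def build_query_request(
    question: str, data_mode: str = "auto", base_url: str = "http://localhost:5000"
) -> urllib.request.Request:
    """Build the POST request for /api/agent/query without sending it."""
    payload = json.dumps(
        {"question": question, "data_mode": data_mode}, ensure_ascii=False
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/agent/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_agent(question: str, data_mode: str = "auto", timeout: float = 30.0) -> dict:
    """Send the question and return the decoded JSON response."""
    request = build_query_request(question, data_mode)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running BondLens AI server.
    answer = query_agent("搜索23附息国债26并给出收益率分析")
    print(answer["final_answer_source"])
```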

Key response fields:

plan: planner intent, selected tools, ranking/search parameters
tools_used: tools actually used for the answer
tool_trace: human-readable step trace
data_evidence: machine-readable market/search/ranking/outlier/comparison evidence
data_source: active data source profile, including requested mode, runtime mode, provider, fetch time, row counts, and fallback reason
risk_explanations: retrieved fixed-income risk explanations
evidence_quality: score, confidence labels, coverage, freshness, and penalties
evidence_ledger: reviewer-readable claim, evidence, source, tool, and confidence records
answer_judge: final answer acceptance/rejection status for LLM output
risk_profile: structured data quality, credit, liquidity, duration, outlier, and model-risk cards
final_answer: either the LLM answer if it passes guardrails, or the deterministic report
final_answer_source: llm or deterministic_fallback
llm_enhanced_answer: raw LLM answer kept for debugging when available
llm_guardrail: numeric faithfulness status, unsafe risk-language status, score, unsupported numeric claims, and blocked phrases
llm_status: disabled, success, or failed

Additional operational endpoints:

GET /healthz
GET /api/agent/schema
GET /replay

/api/agent/schema returns the Pydantic JSON schemas for the request, response, health check, and error payloads. /replay shows sanitized summaries of recent runs for interview demos and debugging.

Deployment notes are available in docs/deployment.md.

Data Source Boundary

The current Agent path uses a live-first data strategy:

Primary:        AkShare bond_spot_deal
Snapshot:       .tmp/bond_spot_deal_snapshot.csv
Final fallback: data/testdata.xlsx

AkShare documents bond_spot_deal as the ChinaMoney current bond deal market interface. The native fields used by BondLens AI are bond name, clean price, latest yield, BP change, weighted yield, and trading volume. The live endpoint does not provide maturity, so BondLens AI enriches matched bond names from the local static sample and reports maturity_coverage in data_source.
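
The maturity enrichment described above amounts to a lookup by bond name plus a coverage figure. The sketch below uses plain dictionaries to stay self-contained; the function name, the row shape, and the representation of maturity_coverage as a matched/total ratio are assumptions, since the exact structure reported in data_source is not shown here.

```python
def enrich_maturity(
    live_rows: list[dict], static_maturity: dict[str, float]
) -> tuple[list[dict], dict]:
    """Attach maturity from the static sample to live rows matched by bond name."""
    enriched = []
    matched = 0
    for row in live_rows:
        maturity = static_maturity.get(row["name"])
        if maturity is not None:
            matched += 1
        # Unmatched bonds keep maturity=None rather than a guessed value.
        enriched.append({**row, "maturity": maturity})
    total = len(live_rows)
    coverage = matched / total if total else 0.0
    metadata = {
        "maturity_coverage": round(coverage, 4),
        "matched": matched,
        "total": total,
    }
    return enriched, metadata
```

Reporting coverage alongside the data lets a reviewer see how many maturity-based rankings or percentiles rest on enriched rather than native fields.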

The default runtime mode is auto: fetch live data first, write the normalized result to a local CSV snapshot, and use that snapshot if a later live request fails. If the live fetch fails and the snapshot is missing or older than BOND_LIVE_CACHE_MAX_AGE_HOURS, the Agent falls back to the local workbook. The /agent page and API accept three modes:

auto   -> live first, cached snapshot second, local fallback third
live   -> live source requested; fallback reason is shown if it degrades
static -> local Excel only
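
The three modes above reduce to one fallback chain. The following sketch takes the loaders as callables so that it stays self-contained; resolve_data, the loader signatures, and the runtime mode labels are hypothetical and are not the API of bond_agent/data_loader.py. It assumes load_snapshot returns None when the snapshot is missing or too old.

```python
from typing import Callable, Optional

Rows = list[dict]


def resolve_data(
    mode: str,
    fetch_live: Callable[[], Rows],
    load_snapshot: Callable[[], Optional[Rows]],
    load_static: Callable[[], Rows],
) -> tuple[Rows, str, Optional[str]]:
    """Return (rows, runtime_mode, fallback_reason) for the requested mode."""
    if mode == "static":
        return load_static(), "static", None

    # Both auto and live try the live source first.
    try:
        rows = fetch_live()
        if rows:
            return rows, "live", None
        reason = "live fetch returned no rows"
    except Exception as exc:  # network errors, upstream schema changes, etc.
        reason = f"live fetch failed: {exc}"

    snapshot = load_snapshot()
    if snapshot:
        return snapshot, "cached_live_snapshot", reason

    return load_static(), "static", reason + "; no usable snapshot"
```

Returning the fallback reason with the data is what allows the data source profile to show why a live request degraded.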

The local fallback remains:

data/testdata.xlsx

The workbook contains more than 3,000 bond sample rows with fields such as bond name, maturity, clean price, closing yield, weighted yield, and trading volume. It is used for offline demos, deterministic CI, and fallback behavior.

The live snapshot is intentionally stored under .tmp/ by default and is not committed to Git. This keeps the repository clean while still making local demos resilient when the public endpoint is temporarily unavailable.

The legacy crawler is preserved in undergraduate-thesis-2024 as thesis-era historical code only. It targeted old CNSTOCK news pages, depended on MongoDB and thesis-era text-analysis modules, and is not present in the current main runtime. During repository verification on May 26, 2026, the old CNSTOCK HTTP endpoints returned 403 Forbidden to automated requests, so this project does not present them as an active or reliable live data source.

Risk Explanation Layer

BondLens AI includes a local retrieval-augmented explanation layer for fixed-income risk concepts. After the Python tools produce evidence, the Agent retrieves relevant snippets from a curated local knowledge base covering:

yield interpretation
liquidity risk
maturity and duration sensitivity
yield outlier review
credit-context limitations
live/static data boundaries

This keeps explanations grounded and repeatable without requiring an external vector database or live LLM call.

Evidence Quality

Every Agent answer includes an evidence_quality object with:

score: 0-100 evidence quality score for the current answer
level: low, medium, or high for the active evidence set
analysis_confidence: confidence in the descriptive analysis
decision_confidence: intentionally low because issuer rating, credit event, macro curve, and full security master data are not attached
data_freshness: live_fetch, cached_live_snapshot, or static_snapshot
coverage: which evidence blocks were available
penalties: missing context that limits conclusions

Evidence Ledger, Answer Judge, and Replay

The default Web UI avoids raw JSON/code-like diagnostic panels. Instead it presents:

Evidence ledger: claim/evidence/source/confidence records derived from the active tool outputs.
Answer judge: a deterministic acceptance layer showing whether LLM text was accepted, rejected by guardrails, or bypassed.
Risk profile: structured cards for data quality, credit context, liquidity, duration, yield outliers, and model-output risk.
Replay dashboard: /replay stores sanitized run summaries under .tmp/replays by default.

Raw machine-readable contracts remain available through /api/agent/query and /api/agent/schema.

Agent Eval

Run deterministic behavior checks:

python evals/run_agent_evals.py

Run red-team safety checks:

python evals/run_red_team_evals.py

The eval suite checks:

expected planner intent
expected tools
required answer keywords
optional forbidden answer keywords
investment-advice and guaranteed-return boundary cases

It does not call OpenAI.
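
The checks listed above can be expressed as one function that compares a response against a case. This is a sketch only: the case keys (expected_intent, expected_tools, required_keywords, forbidden_keywords) and the plan["intent"] key are assumed names, and the real YAML schema in evals/ may differ. The response fields plan, tools_used, and final_answer are the ones documented in the API section.

```python
def check_case(case: dict, response: dict) -> list[str]:
    """Return a list of human-readable failures; an empty list means the case passed."""
    failures = []

    intent = response["plan"]["intent"]
    if intent != case["expected_intent"]:
        failures.append(f"intent: expected {case['expected_intent']}, got {intent}")

    for tool in case.get("expected_tools", []):
        if tool not in response["tools_used"]:
            failures.append(f"missing tool: {tool}")

    answer = response["final_answer"].lower()
    for keyword in case.get("required_keywords", []):
        if keyword.lower() not in answer:
            failures.append(f"missing keyword: {keyword}")
    for keyword in case.get("forbidden_keywords", []):
        if keyword.lower() in answer:
            failures.append(f"forbidden keyword present: {keyword}")

    return failures
```

Collecting all failures instead of stopping at the first one makes a single eval run show every way a case regressed.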

Tests

python -m pytest -q

Coverage includes:

planner intent classification
intent-aware tool routing
data source metadata
risk explanation retrieval
evidence quality assessment
market statistics
ranking tools
yield outlier detection
bond-to-market comparison
concrete bond report behavior
LLM disabled/success/failed status with mocks
LLM numeric and unsafe risk-language guardrails
evidence ledger, answer judge, risk profile, and replay store
Pydantic Agent response schema
health check and schema endpoints
live snapshot cache fallback
Flask page/API smoke tests
eval case loading

Repository Policy

The public repository is intentionally kept compact: source code, tests, evals, Docker, docs, screenshots, CI, and license. Generic community templates were removed because this is a personal portfolio project rather than an open-source collaboration hub.

The recommended branch policy is to protect main and require the CI workflow to pass before merging. The original thesis branch remains a historical reference and should not receive modern feature work.

Data Boundary

All financial conclusions are computed from the active data source shown in each response:

AkShare bond_spot_deal, the cached AkShare snapshot, or data/testdata.xlsx when static/fallback mode is active

The agent does not invent issuer ratings, credit events, macro views, or investment recommendations. Legacy crawler code is preserved only in the thesis branch; the current main branch uses AkShare live data plus the local Excel fallback.

Modern Project Cleanup

The main branch removes legacy login/database code, obsolete crawler code, old thesis UI pages, IDE metadata, and unreferenced static dumps. This is safe because:

undergraduate-thesis-2024 preserves the original repository state.
Current Flask routes only serve BondLens AI and its API.
Core bond sample data, Agent code, tests, Docker, and README documentation are retained.

Interview Talking Points

Tool calling design: deterministic planner maps user intent to local Python tools.
Live-first source design: AkShare live data is the default, with cached live snapshot and static fallback layers for reliability.
Evidence constraint: final answers are generated from data_evidence, not free-form finance guessing.
Evidence ledger: UI turns data evidence into auditable claims instead of dumping raw JSON.
Local LLM compatibility: OpenAI-compatible endpoints can exercise the LLM path without a paid API key.
LLM guardrail: numeric claims and unsafe investment-language phrases are checked before an LLM answer can become final.
Answer judge and replay: accepted/rejected model output is visible and recent runs can be reviewed.
Fallback design: no API key required; OpenAI/local LLM path is optional and observable.
Risk boundary: output always includes limitations and non-investment-advice language.
Eval method: local behavior evals and red-team evals test intent, tool selection, answer constraints, and safety boundaries.
Dockerization: gunicorn runtime, healthcheck, and reproducible dependency install.
Legacy migration: original thesis version preserved, modern branch cleaned for portfolio use.

Roadmap

Add issuer ratings, bond master data, and curve context around the live market feed
Expand RAG from local snippets to document-backed retrieval
Add PDF/Markdown report export
Add richer evidence-consistency evals across live snapshots and static fallback
Add duration, convexity, credit spread, and liquidity buckets
Add a background security-master refresh job when a stable bond detail source is available

License

MIT. Keep the thesis origin and author context visible when using this project for learning, portfolio review, or interview discussion.

Disclaimer

BondLens AI does not provide investment advice, trading advice, ratings opinions, or return guarantees. Outputs are for learning, research, and engineering demonstration only.
