This project introduces a custom-designed Financial Agent system built to generate, refine, and evaluate stock analysis reports using large language models (LLMs). Our agents handle both Chinese A-share and US stock markets, and are designed to be modular, extensible, and evaluation-ready.
🔗 Built on: LangChain-Chatchat
🚀 Enhanced with: Custom Agents, Optimizers, and LLM Judge
📊 Evaluates with: LLMBox (extended for multi-model financial benchmarking)
At the heart of the project lies our Financial Agent system:
| Agent Type | Description |
|---|---|
| AStock Agent | Analyzes China A-share stocks: fetches historical data, indicators, and charts |
| USStock Agent | Handles U.S. stocks with a similar processing pipeline |
| OPT Agent | Refines raw LLM-generated reports into higher-quality financial insights |
Each Agent follows a structured pipeline:
- 📥 Data Gathering: Crawl price and financial data
- 🧾 Report Generation: Use prompts + LLMs to write market analysis
- ✍️ Report Optimization: OPT Agent improves fluency and structure
- 📈 Evaluation: LLM Judge benchmarks across different LLMs
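The four-step pipeline above can be sketched roughly as follows. This is a minimal illustration only: the class and function names (`StockReport`, `gather_data`, `generate_report`, `optimize_report`) are hypothetical, not the project's actual API.

```python
from dataclasses import dataclass

@dataclass
class StockReport:
    ticker: str
    raw_text: str
    refined_text: str = ""

def gather_data(ticker):
    # Step 1 (Data Gathering): stand-in for the crawler that
    # fetches price and financial data for a ticker
    return {"ticker": ticker, "close": [10.2, 10.5, 10.1]}

def generate_report(data):
    # Step 2 (Report Generation): stand-in for the prompt + LLM call
    text = f"Analysis of {data['ticker']}: last close {data['close'][-1]}"
    return StockReport(data["ticker"], text)

def optimize_report(report):
    # Step 3 (Report Optimization): stand-in for the OPT Agent pass
    report.refined_text = report.raw_text + " (refined for clarity)"
    return report

report = optimize_report(generate_report(gather_data("600519.SS")))
```

Step 4 (Evaluation) is handled separately by the LLM Judge described below.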
All agents are implemented under the Fin_Agent/ directory.
```
.
├── Fin_Agent/            # 💡 Core Financial Agent system
│   ├── AStock.py         # A-share agent
│   ├── USStock.py        # US stock agent
│   ├── finreport_opt.py  # OPT Agent for refining LLM reports
│   └── ...               # Reports, backups, visuals
├── Chatbot/              # 🗨️ LangChain-based dialogue logic
├── Langchain-Chatchat/   # UI & chat backend (third-party)
├── LLMBox/               # 📊 LLM evaluator (extended from open-source)
├── docs/NLP/             # 📐 LLM Judge for report evaluation
└── README.md
```
Located under docs/NLP/, the LLM Judge module enables structured evaluation of reports generated by different models. It can:
- Compare two or more markdown reports
- Use reference data for grounding
- Output evaluation reports (clarity, factual consistency, financial relevance)
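A pairwise judging prompt along these lines could look like the sketch below. This is an assumption about the general shape of such a prompt, not the actual template in docs/NLP; the `build_judge_prompt` helper and the exact wording are illustrative.

```python
# Evaluation dimensions named in this README
DIMENSIONS = ["clarity", "factual consistency", "financial relevance"]

def build_judge_prompt(report_a: str, report_b: str, reference: str) -> str:
    # Ground the judge with reference data, then request scored comparison
    return (
        "You are a strict judge of financial analysis reports.\n"
        f"Reference data:\n{reference}\n\n"
        f"Report A:\n{report_a}\n\n"
        f"Report B:\n{report_b}\n\n"
        "Score each report from 1-10 on: " + ", ".join(DIMENSIONS)
    )

prompt = build_judge_prompt(
    "# A-share report ...",
    "# A-share report v2 ...",
    "2024-01-03 close: 10.5",
)
```

The judge's scored response can then be saved as the evaluation markdown listed under the project outputs.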
The system integrates with LangChain-Chatchat to support:
- Web-based conversational interaction
- Real-time Q&A about financial markets
- Embedding our Financial Agents inside a flexible chatbot UI
We include a customized version of LLMBox to test LLMs across a range of tasks:
- Evaluate models such as GPT-4, DeepSeek, and Gemini
- Benchmark with MT-Bench, QuAC, GPTEval, and custom datasets
- Extended support for additional model APIs
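Conceptually, multi-model benchmarking is a loop over candidate models and benchmark questions. The sketch below shows that loop with a stubbed model call so it runs without API keys; the model names and the `call_model` hook are placeholders, not LLMBox's actual interface.

```python
# Candidate models under comparison (names are placeholders)
MODELS = ["gpt-4", "deepseek-chat", "gemini-pro"]

def run_benchmark(model_name, questions, call_model):
    # Collect one answer per question for a single model
    answers = [call_model(model_name, q) for q in questions]
    return {"model": model_name, "answers": answers}

# Stubbed model call so the sketch runs offline
def fake_call(model, question):
    return f"[{model}] answer to: {question}"

questions = ["Summarize today's market movers."]
results = [run_benchmark(m, questions, fake_call) for m in MODELS]
```

In practice, `fake_call` would be replaced by a real API client for each provider.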
The pipeline produces the following outputs:
- ✅ CSVs with historical prices
- ✅ Visualizations (*.png)
- ✅ Raw reports from different models
- ✅ Refined reports via OPT Agent
- ✅ Evaluation markdown from LLM Judge
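The price-history CSVs can be consumed downstream with the standard library alone, as in this sketch. The column names (`date`, `close`) are assumptions for illustration, not the project's exact schema.

```python
import csv
import io

# Inline sample standing in for one of the agents' price-history CSVs
sample = "date,close\n2024-01-02,10.2\n2024-01-03,10.5\n"

rows = list(csv.DictReader(io.StringIO(sample)))
closes = [float(r["close"]) for r in rows]  # parse the close column
peak = max(closes)                          # e.g. highest close in the window
```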
This work builds upon the foundations of LangChain-Chatchat and LLMBox.
Distributed under the MIT License.
💬 Feel free to open issues, ask questions, or contribute ideas!