River Algorithm — AI Chat History Edition

English

River Algorithm is a personal digital profile weighting algorithm for local AI systems. This project is a special edition focused on batch-importing historical AI conversation data and extracting user profiles through LLM-simulated sleep processing.

Since this project processes past conversation records, the real-time interaction components of the River Algorithm are not included. By running this project, you can see the personal profile extracted from your past AI conversations and how it flows and evolves along the river of time.

Note: No LLM today is specifically trained or fine-tuned for personal profile extraction, so results will vary across models — some hallucinations are inevitable. If you spot anything absurd in your profile, please open an Issue — I'd love to improve it.

Warning: If you use a remote LLM API (OpenAI, Anthropic, etc.) and your conversation history contains large amounts of code, encoded content, or very long messages, processing can consume a surprising number of tokens, and therefore real money. Review your export data before running and remove unnecessary content to keep costs under control. (Local models served through Ollama incur no API charges, so this does not apply to them.)
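To get a feel for how large an export is before sending it to a paid API, a small script along these lines can help. It is a rough sketch and not part of this project: it walks any parsed JSON export, adds up the length of every string value, and divides by four as a crude characters-per-token guess (an assumption; real tokenizers differ by model and language). The function names and the 20,000-character threshold are hypothetical. Because it also counts IDs and metadata strings and ignores the prompts the tool adds, treat the number as an order-of-magnitude hint only.

```python
import json
from pathlib import Path

CHARS_PER_TOKEN = 4  # crude heuristic (assumption); real tokenizers vary by model and language


def iter_strings(node):
    """Yield every string value found anywhere inside a parsed JSON structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def summarize_export(data, long_threshold=20_000):
    """Return total characters, a rough token estimate, and the number of very long strings."""
    total_chars = 0
    long_strings = 0
    for text in iter_strings(data):
        total_chars += len(text)
        if len(text) >= long_threshold:
            long_strings += 1
    return {
        "total_chars": total_chars,
        "approx_tokens": total_chars // CHARS_PER_TOKEN,
        "long_strings": long_strings,
    }


if __name__ == "__main__":
    path = Path("data/ChatGPT/conversations.json")  # path used in the Quick Start
    if path.exists():
        with path.open(encoding="utf-8") as fh:
            print(summarize_export(json.load(fh)))
    else:
        print(f"{path} not found; adjust the path to your export file")
```

A high `long_strings` count usually points at pasted code or encoded blobs, which are the first candidates to trim from the export.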

Features

Import your locally exported ChatGPT / Claude / Gemini conversation history into the database
LLM-powered profile extraction (remote LLM API or local Ollama)
Contradiction detection & timeline tracking
Monthly snapshot viewer
Relationship mapping
Local web viewer (Chinese / English / Japanese)
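The idea behind contradiction detection, timeline tracking, and monthly snapshots can be illustrated with a toy model. The sketch below is not the project's implementation (the real pipeline is LLM-driven and lives in agent/core/sleep.py); the `Fact` class, `build_timeline`, and `snapshot` are hypothetical names invented for this example, and the profile weighting itself is not modelled. It assumes each extracted observation is a (day, attribute, value) tuple, treats a differing value for the same attribute as a contradiction that closes the older fact, and then answers "what did the profile look like on a given day".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    attribute: str
    value: str
    start: date
    end: Optional[date] = None  # None means the fact is still considered current


def build_timeline(observations):
    """Fold (day, attribute, value) observations into facts, oldest first.

    A new value for an attribute that already has a different current value is
    treated as a contradiction: the old fact is closed and a new one is opened.
    """
    timeline = []
    current = {}  # attribute -> currently open Fact
    for day, attribute, value in sorted(observations):
        active = current.get(attribute)
        if active is not None and active.value == value:
            continue  # same claim repeated, nothing changes
        if active is not None:
            active.end = day  # contradiction: close the superseded fact
        fact = Fact(attribute, value, start=day)
        timeline.append(fact)
        current[attribute] = fact
    return timeline


def snapshot(timeline, as_of):
    """Return the attribute -> value view that held on the given day."""
    view = {}
    for fact in timeline:
        if fact.start <= as_of and (fact.end is None or as_of < fact.end):
            view[fact.attribute] = fact.value
    return view
```

Calling `snapshot` with the last day of each month gives a simple stand-in for a monthly snapshot: superseded facts stay in the timeline, so earlier months still show the older value.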

Prerequisites

Python 3.11 or 3.12
PostgreSQL
LLM API Key (e.g. OpenAI, Anthropic) or local Ollama

Quick Start

# 1. Clone the repository
git clone https://github.com/wangjiake/RiverHistory.git
cd RiverHistory

# 2. Install dependencies
pip install -r requirements.txt

# 3. Configure
# Edit config.yaml with your LLM API key and database settings

# 4. Initialize database
python setup_db.py --db myprofile

# 5. Import conversation data
# Place your export files in data/ (see data/README.md for details)
python import_data.py --chatgpt data/ChatGPT/conversations.json
python import_data.py --claude data/Claude/conversations.json
python import_data.py --gemini "data/Gemini/My Activity.html"
# Note: The Gemini export filename varies by language. Adjust the filename accordingly.

# 6. Run profile extraction
#    Format: python run.py <source> <count>
#    source: chatgpt / claude / gemini / all (or demo for the built-in test data)
#    count:  a number = process the N oldest conversations
#            max      = process all conversations
#    Every command processes conversations in chronological order (oldest first)

python run.py chatgpt 50       # ChatGPT only, 50 oldest conversations
python run.py claude max       # Claude only, all conversations
python run.py gemini 100       # Gemini only, 100 oldest conversations
python run.py all max          # All 3 sources merged and sorted by time, everything processed (demo data excluded)

# 7. View results
python web.py --db myprofile
# Open http://localhost:2345 in your browser

Note: Each run of run.py automatically clears all profile tables before writing new data. Source data tables are not affected, so it is safe to re-run at any time.
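The `<source> <count>` semantics described in step 6 can be summarised in a few lines of Python. This is only an illustration of the selection rule, not code from run.py: the `select_conversations` function and the record shape (a dict with a `created_at` datetime) are assumptions made for the example.

```python
from datetime import datetime


def select_conversations(sources, source="all", count="max"):
    """Pick conversations oldest-first, mirroring `run.py <source> <count>`.

    `sources` maps a source name to a list of dicts that each carry a
    `created_at` datetime (a hypothetical record shape used only here).
    """
    if source == "all":
        # "all" merges the three real sources; demo data is excluded
        pool = [c for name in ("chatgpt", "claude", "gemini") for c in sources.get(name, [])]
    else:
        pool = list(sources.get(source, []))
    pool.sort(key=lambda c: c["created_at"])  # chronological order, oldest first
    return pool if count == "max" else pool[: int(count)]
```

With this rule, `all 50` would mean the 50 oldest conversations across the merged sources rather than 50 from each source.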

No Chat Data? Try the Demo

The project includes built-in test data, so you can experience the full workflow without exporting your own AI chat history:

| Dataset | Character | Language | Sessions | Command |
|---|---|---|---|---|
| `--demo` | Lin Yutong | Chinese | 50 | `python import_data.py --demo` |
| `--demo2` | Lin Yutong (extended) | Chinese | 50 | `python import_data.py --demo2` |
| `--demo3` | Jake Morrison | English | 20 | `python import_data.py --demo3` |

--demo2 and --demo3 clear the demo table before importing.

python setup_db.py                  # Create database and tables
python import_data.py --demo        # Import demo test data (or --demo2 / --demo3)
python run.py demo max              # Process all demo conversations
python web.py --db myprofile        # View the extracted profile

Reset Profile Data

Clear all processing and profile tables while keeping imported source data (chatgpt/claude/gemini/demo tables are not affected):

python reset_db.py                  # Clear profile data, keep source data
python reset_db.py --db mydb        # Specify database name

Exporting Conversations

| Platform | Steps |
|---|---|
| ChatGPT | Settings → Data controls → Export data → Extract `conversations.json` |
| Claude | Settings → Account → Export Data → Extract `conversations.json` |
| Gemini | Google Takeout → Select Gemini Apps → Put `Gemini Apps` folder into `data/` |
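Once the exports are unpacked, a quick check like the following can confirm that the files sit where the Quick Start commands expect them. It is a convenience sketch, not a script shipped with the project: `find_exports` is a hypothetical helper, and the layout it assumes (data/ChatGPT, data/Claude, data/Gemini) is taken from the import commands in the Quick Start. Because the Gemini export filename depends on the export language, the sketch accepts any .html file in data/Gemini.

```python
from pathlib import Path


def find_exports(data_dir="data"):
    """Return the export files found under data_dir, keyed by source name."""
    root = Path(data_dir)
    found = {}
    for source, folder in (("chatgpt", "ChatGPT"), ("claude", "Claude")):
        candidate = root / folder / "conversations.json"
        if candidate.is_file():
            found[source] = candidate
    # The Gemini filename varies by language, so match any .html file.
    gemini_dir = root / "Gemini"
    gemini_files = sorted(gemini_dir.glob("*.html")) if gemini_dir.is_dir() else []
    if gemini_files:
        found["gemini"] = gemini_files[0]
    return found


if __name__ == "__main__":
    # Print the import command for each export that was found.
    for source, path in find_exports().items():
        print(f'python import_data.py --{source} "{path}"')
```

Sources that are missing are simply skipped, so the printed commands cover only the exports actually present.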

LLM Configuration

OpenAI API (recommended): Set llm_provider: "openai" in config.yaml and enter your API key.

Local Ollama: Install Ollama, pull a model with ollama pull qwen2.5:14b, and set llm_provider: "local".

Prompt language: Set the language field in config.yaml. Supported values: "zh" (Chinese), "en" (English), "ja" (Japanese). This controls the language of LLM prompts, not the web interface.
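A mistyped value in config.yaml is easy to miss, so a small sanity check can be useful before a long run. The sketch below is an assumption-laden example rather than the project's own loader: it works on an already-parsed mapping, `check_config` is a hypothetical name, and it only knows the `llm_provider` and `language` values mentioned in this README. The project may accept other provider values, so the messages are worded as warnings.

```python
DOCUMENTED_PROVIDERS = {"openai", "local"}   # values named in this README; others may exist
DOCUMENTED_LANGUAGES = {"zh", "en", "ja"}    # prompt languages listed in this README


def check_config(config):
    """Return a list of warnings for values this README does not document."""
    warnings = []
    provider = config.get("llm_provider")
    if provider not in DOCUMENTED_PROVIDERS:
        warnings.append(
            f"llm_provider {provider!r} is not one of {sorted(DOCUMENTED_PROVIDERS)}"
        )
    language = config.get("language")
    if language not in DOCUMENTED_LANGUAGES:
        warnings.append(
            f"language {language!r} is not one of {sorted(DOCUMENTED_LANGUAGES)}"
        )
    return warnings


if __name__ == "__main__":
    # Example mapping standing in for the parsed contents of config.yaml.
    for message in check_config({"llm_provider": "local", "language": "en"}):
        print("warning:", message)
```

An empty list means both fields hold a documented value; the example mapping above therefore prints nothing.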

Project Structure

├── config.yaml          # LLM and database configuration
├── setup_db.py          # Initialize database and tables
├── import_data.py       # Import conversation exports into database
├── run.py               # Run profile extraction (perceive + sleep)
├── web.py               # Local web viewer (Flask, port 2345)
├── reset_db.py          # Clear profile tables, keep source data
├── build_core.py        # Compile core modules to .so/.pyd (Cython)
├── requirements.txt     # Python dependencies
├── data/                # Conversation export files (git-ignored)
│   ├── demo.json        # Demo: Lin Yutong (Chinese, 50 sessions)
│   ├── demo2.json       # Demo: Lin Yutong extended (Chinese, 50 sessions)
│   └── demo3.json       # Demo: Jake Morrison (English, 20 sessions)
├── agent/
│   ├── perceive.py      # Perception module — classify user input
│   ├── config/          # Configuration loader
│   ├── storage/         # Database operations
│   ├── utils/           # LLM client
│   └── core/            # Core profile extraction (compiled for distribution)
│       ├── sleep.py     # Main extraction pipeline
│       └── sleep_prompts.py  # Multilingual prompts (zh/en/ja)
└── templates/
    └── profile.html     # Web viewer template

中文

河流算法（River Algorithm） 是一套关于本地 AI 个人数字画像权重的算法。本项目是河流算法的特别篇，专注于将历史 AI 对话批量导入数据库，并通过 LLM 模拟睡眠处理来提取用户画像。

由于本项目处理的是历史会话记录，河流算法中实时交互相关的部分不包含在内。运行本项目后，你可以查看自己在过去与 AI 对话中留下的个人画像，以及随时间河流流动的变迁轨迹。

注意： 目前没有任何 LLM 是专门为个人画像提取训练或微调的，因此不同模型的提取结果会存在差异，偶尔出现"幻觉"在所难免。如果你发现画像中有离谱的内容，欢迎提交 Issue，我会持续改进。

警告： 如果你使用的是远端 LLM API（OpenAI、Anthropic 等），且对话记录中包含大量代码、编码内容或超长消息，处理过程会消耗大量 token，不加注意可能会烧掉不少钱。强烈建议在运行前先检查导出数据，删除不必要的内容，以免账单失控。（本地模型如 Ollama 不产生费用，无需担心。）

功能

将你从 ChatGPT / Claude / Gemini 导出的本地对话记录导入数据库
LLM 驱动的画像提取（支持远端 LLM API 或本地 Ollama）
矛盾检测与时间线追踪
月度快照查看
人际关系图谱
本地网页查看（中/英/日三语）

前置要求

Python 3.11 或 3.12
PostgreSQL
LLM API Key（如 OpenAI、Anthropic 等）或本地 Ollama

快速开始

# 1. 克隆仓库
git clone https://github.com/wangjiake/RiverHistory.git
cd RiverHistory

# 2. 安装依赖
pip install -r requirements.txt

# 3. 配置
# 编辑 config.yaml，填入你的 LLM API Key 和数据库信息

# 4. 初始化数据库
python setup_db.py --db myprofile

# 5. 导入对话数据
# 将导出文件放到 data/ 目录下（详见 data/README.md）
python import_data.py --chatgpt data/ChatGPT/conversations.json
python import_data.py --claude data/Claude/conversations.json
python import_data.py --gemini "data/Gemini/我的活动记录.html"
# 注意：Gemini 导出文件名因语言而异，请根据实际文件名修改命令

# 6. 运行画像提取
#    格式: python run.py <源> <数量>
#    源:   chatgpt / claude / gemini / all
#    数量: 数字 = 从最早开始处理 N 条, max = 处理全部
#    所有命令都按对话时间从旧到新的顺序处理

python run.py chatgpt 50       # 只处理 ChatGPT，从最早的开始，处理 50 条
python run.py claude max       # 只处理 Claude，全部处理
python run.py gemini 100       # 只处理 Gemini，从最早的开始，处理 100 条
python run.py all max           # 三个源的数据混在一起，按时间从旧到新，全部处理（不含 demo）

# 7. 查看结果
python web.py --db myprofile
# 打开浏览器访问 http://localhost:2345

注意： 每次运行 run.py 会自动清空所有画像表再重新写入，源数据表不受影响。可以放心重复运行。

没有对话数据？用 Demo 快速体验

项目自带测试数据，无需导出自己的 AI 对话即可体验完整流程：

| 数据集 | 人物 | 语言 | 对话数 | 命令 |
|---|---|---|---|---|
| `--demo` | 林雨桐 | 中文 | 50 组 | `python import_data.py --demo` |
| `--demo2` | 林雨桐（扩展） | 中文 | 50 组 | `python import_data.py --demo2` |
| `--demo3` | Jake Morrison | 英文 | 20 组 | `python import_data.py --demo3` |

--demo2 和 --demo3 会先清空 demo 表再导入。

python setup_db.py                  # 建库建表
python import_data.py --demo        # 导入测试数据（或 --demo2 / --demo3）
python run.py demo max              # 处理全部测试对话
python web.py --db myprofile        # 查看画像结果

清空画像数据

清空所有处理和画像表，保留已导入的源数据（chatgpt/claude/gemini/demo 表不受影响）：

python reset_db.py                  # 清空画像，保留源数据
python reset_db.py --db mydb        # 指定数据库

对话导出方式

| 平台 | 步骤 |
|---|---|
| ChatGPT | Settings → Data controls → Export data → 解压得到 `conversations.json` |
| Claude | Settings → Account → Export Data → 解压得到 `conversations.json` |
| Gemini | Google Takeout → 选择 Gemini Apps → 解压，将 `Gemini Apps` 文件夹放入 `data/` |

LLM 配置

OpenAI API（推荐）： 在 config.yaml 中设置 llm_provider: "openai" 并填入 API Key。

本地 Ollama： 安装 Ollama，拉取模型 ollama pull qwen2.5:14b，设置 llm_provider: "local"。

提示词语言： 在 config.yaml 中设置 language 字段，支持 "zh"（中文）、"en"（English）、"ja"（日本語）。该设置控制 LLM 提示词的语言，不影响网页界面。

项目结构

├── config.yaml          # LLM 和数据库配置
├── setup_db.py          # 初始化数据库和表结构
├── import_data.py       # 导入对话导出文件到数据库
├── run.py               # 运行画像提取（感知 + 睡眠整合）
├── web.py               # 本地网页查看（Flask，端口 2345）
├── reset_db.py          # 清空画像表，保留源数据
├── build_core.py        # 编译核心模块为 .so/.pyd（Cython）
├── requirements.txt     # Python 依赖
├── data/                # 对话导出文件（已 git-ignore）
│   ├── demo.json        # 测试数据：林雨桐（中文，50 组）
│   ├── demo2.json       # 测试数据：林雨桐扩展（中文，50 组）
│   └── demo3.json       # 测试数据：Jake Morrison（英文，20 组）
├── agent/
│   ├── perceive.py      # 感知模块 — 分类用户输入
│   ├── config/          # 配置加载
│   ├── storage/         # 数据库操作
│   ├── utils/           # LLM 客户端
│   └── core/            # 核心画像提取（编译后分发）
│       ├── sleep.py     # 主提取流程
│       └── sleep_prompts.py  # 多语言提示词（zh/en/ja）
└── templates/
    └── profile.html     # 网页模板

日本語

River Algorithm（河流アルゴリズム） は、ローカルAIにおける個人デジタルプロフィール重み付けアルゴリズムです。本プロジェクトは、過去のAI会話データを一括インポートし、LLMによる睡眠シミュレーション処理を通じてユーザープロフィールを抽出する特別版です。

本プロジェクトは過去の会話記録を処理するため、River Algorithmのリアルタイムインタラクション部分は含まれていません。本プロジェクトを実行すると、過去のAI会話から抽出された個人プロフィールと、時間の河の流れに沿った変遷の軌跡を確認できます。

ご注意： 現在、個人プロフィール抽出に特化して訓練・微調整された LLM は存在しないため、モデルによって抽出結果に差異が生じ、ハルシネーションも避けられません。おかしな内容を見つけた場合は、Issue を提出してください。継続的に改善してまいります。

警告： リモート LLM API（OpenAI、Anthropic など）を使用している場合、会話履歴に大量のコード、エンコードされたコンテンツ、または非常に長いメッセージが含まれていると、処理で驚くほどのトークン — つまり実際のお金を消費する可能性があります。実行前にエクスポートデータを確認し、不要なコンテンツを削除して費用を抑えてください。（Ollama などのローカルモデルは無料で実行でき、この影響を受けません。）

機能

ChatGPT / Claude / Gemini からローカルにエクスポートした会話履歴をデータベースにインポート
LLM駆動のプロフィール抽出（リモートLLM API またはローカル Ollama）
矛盾検出とタイムライン追跡
月次スナップショットビューア
人間関係マッピング
ローカルWebビューア（中国語 / 英語 / 日本語）

前提条件

Python 3.11 または 3.12
PostgreSQL
LLM API Key（OpenAI、Anthropic など）またはローカル Ollama

クイックスタート

# 1. リポジトリをクローン
git clone https://github.com/wangjiake/RiverHistory.git
cd RiverHistory

# 2. 依存関係をインストール
pip install -r requirements.txt

# 3. 設定
# config.yaml を編集し、LLM API Keyとデータベース情報を入力

# 4. データベースを初期化
python setup_db.py --db myprofile

# 5. 会話データをインポート
# エクスポートファイルを data/ に配置（詳細は data/README.md を参照）
python import_data.py --chatgpt data/ChatGPT/conversations.json
python import_data.py --claude data/Claude/conversations.json
python import_data.py --gemini "data/Gemini/マイ アクティビティ.html"
# 注意：Geminiのエクスポートファイル名は言語によって異なります。実際のファイル名に合わせてコマンドを変更してください

# 6. プロフィール抽出を実行
#    形式: python run.py <ソース> <件数>
#    ソース: chatgpt / claude / gemini / all
#    件数:   数字 = 最も古いものから N 件処理, max = 全件処理
#    すべてのコマンドは会話の時系列順（古い順）に処理されます

python run.py chatgpt 50       # ChatGPTのみ、最も古い50件を処理
python run.py claude max       # Claudeのみ、全件処理
python run.py gemini 100       # Geminiのみ、最も古い100件を処理
python run.py all max           # 全3ソースを時系列順に混合して全件処理（demoは含まない）

# 7. 結果を確認
python web.py --db myprofile
# ブラウザで http://localhost:2345 を開く

注意： run.py を実行するたびに、すべてのプロフィールテーブルが自動的にクリアされてから再書き込みされます。ソースデータテーブルは影響を受けません。何度でも安全に再実行できます。

チャットデータがない場合：デモで体験

プロジェクトにはテストデータが含まれているため、自分のAIチャット履歴をエクスポートしなくても完全なワークフローを体験できます：

| データセット | キャラクター | 言語 | セッション数 | コマンド |
|---|---|---|---|---|
| `--demo` | 林雨桐 | 中国語 | 50 組 | `python import_data.py --demo` |
| `--demo2` | 林雨桐（拡張） | 中国語 | 50 組 | `python import_data.py --demo2` |
| `--demo3` | Jake Morrison | 英語 | 20 組 | `python import_data.py --demo3` |

--demo2 と --demo3 はインポート前にdemoテーブルをクリアします。

python setup_db.py                  # データベースとテーブルを作成
python import_data.py --demo        # デモテストデータをインポート（または --demo2 / --demo3）
python run.py demo max              # 全デモ会話を処理
python web.py --db myprofile        # 抽出されたプロフィールを確認

プロフィールデータのリセット

インポート済みのソースデータを保持したまま、すべての処理・プロフィールテーブルをクリアします（chatgpt/claude/gemini/demo テーブルは影響を受けません）：

python reset_db.py                  # プロフィールデータをクリア、ソースデータは保持
python reset_db.py --db mydb        # データベース名を指定

会話のエクスポート方法

| プラットフォーム | 手順 |
|---|---|
| ChatGPT | Settings → Data controls → Export data → `conversations.json` を解凍 |
| Claude | Settings → Account → Export Data → `conversations.json` を解凍 |
| Gemini | Google Takeout → Gemini Apps を選択 → `Gemini Apps` フォルダを `data/` に配置 |

LLM設定

OpenAI API（推奨）： config.yaml で llm_provider: "openai" を設定し、API Keyを入力してください。

ローカル Ollama： Ollama をインストールし、ollama pull qwen2.5:14b でモデルを取得、llm_provider: "local" に設定してください。

プロンプト言語： config.yaml の language フィールドで設定。"zh"（中国語）、"en"（英語）、"ja"（日本語）に対応。LLMプロンプトの言語を制御します（Webインターフェースには影響しません）。

プロジェクト構成

├── config.yaml          # LLMとデータベースの設定
├── setup_db.py          # データベースとテーブルの初期化
├── import_data.py       # 会話エクスポートファイルをDBにインポート
├── run.py               # プロフィール抽出を実行（知覚 + スリープ統合）
├── web.py               # ローカルWebビューア（Flask、ポート 2345）
├── reset_db.py          # プロフィールテーブルをクリア、ソースデータは保持
├── build_core.py        # コアモジュールを .so/.pyd にコンパイル（Cython）
├── requirements.txt     # Python依存関係
├── data/                # 会話エクスポートファイル（git-ignore済み）
│   ├── demo.json        # デモ：林雨桐（中国語、50組）
│   ├── demo2.json       # デモ：林雨桐拡張（中国語、50組）
│   └── demo3.json       # デモ：Jake Morrison（英語、20組）
├── agent/
│   ├── perceive.py      # 知覚モジュール — ユーザー入力を分類
│   ├── config/          # 設定ローダー
│   ├── storage/         # データベース操作
│   ├── utils/           # LLMクライアント
│   └── core/            # コアプロフィール抽出（配布用にコンパイル）
│       ├── sleep.py     # メイン抽出パイプライン
│       └── sleep_prompts.py  # 多言語プロンプト（zh/en/ja）
└── templates/
    └── profile.html     # Webビューアテンプレート

License

MIT
