HypoAgent

This is the official code repository for HypoAgent: An Agentic Framework for Interactive Abductive Hypothesis Generation over Knowledge Graphs.

1. Environment Setup

conda create -n hypoagent python=3.10
conda activate hypoagent
pip install -r requirements.txt

Note: PyTorch is installed with CUDA 11.8 support. If you use a different CUDA version, install the matching PyTorch build from pytorch.org.

LLM API Configuration

Copy the API key template and fill in your own credentials:

cp akgr/configs/api_keys.yml.example akgr/configs/api_keys.yml
# Edit akgr/configs/api_keys.yml with your API keys

The agent modules use smolagents OpenAIServerModel to call LLMs. You can configure any OpenAI-compatible API provider in akgr/configs/api_keys.yml.

2. Data Sampling

Sample training data from the knowledge graph:

python -m akgr.sampling.sample_parallel -s="full" -a=32 -p=16

This generates sampled data under ./sampled_data/ for all configured datasets.

3. Training the Hypothesis Generation Model

3.1 Unconditional Training

Train a lightweight hypothesis generation model without any condition control:

CUDA_VISIBLE_DEVICES=0 python -m akgr.abduction_model.main \
    --condition='unconditional'
    --modelname='GPT2_6_act_nt' \
    --data_root='./sampled_data/' -d='BioKG' --scale='full' -a=32 \
    --checkpoint_root='checkpoints/' \
    --result_root='./results/' \
    --save_frequency 5 \
    --mode='training'

3.2 Multi-Condition Training

Train with multiple types of control signals (pattern, entity, relation, etc.) simultaneously:

CUDA_VISIBLE_DEVICES=0 python -m akgr.abduction_model.main \
    --condition='multi' \
    --multi_conditions='pattern,entity,entitynumber,relation,relationnumber' \
    --random_multi \
    --seed 42 \
    --modelname GPT2_6_act_nt \
    --accelerate \
    --data_root ./sampled_data/ \
    -d PharmKG8k \
    --scale full \
    -a 32 \
    --checkpoint_root ./results/ \
    --result_root ./results/multi-pharmkg8k/ \
    --save_frequency 5 \
    --mode training

DATA And CKPT

You can directly using our sampled data and ckpt in https://hkustconnect-my.sharepoint.com/:f:/g/personal/ygaodi_connect_ust_hk/IgB-jL7WLo2VRKy1fkTlPvOXAfIVkXtnrcxMQOSWDQkT9Qo?e=KhNuJ3.

4. Running the Agent

All agent scripts require a trained multi-condition checkpoint and an LLM API configured in akgr/configs/api_keys.yml.

4.1 Single-Turn Agent (`loop.py`)

Single-turn hypothesis generation with iterative self-refinement. The agent generates an initial hypothesis, analyzes its quality via hypothesis fragment diagnosis and neighborhood search, then refines it over multiple rounds.

# Run on batch data
CUDA_VISIBLE_DEVICES=0 python -m akgr.agent.loop \
    --mode run \
    --dataname BioKG \
    --checkpoint checkpoints/BioKG-full-32-multi.pth \
    --data_root ./data/ \
    --max_rounds 3 \
    --jaccard_threshold 0.95

# Run on a single demo case
CUDA_VISIBLE_DEVICES=0 python -m akgr.agent.loop --mode case

4.2 Multi-Turn Agent (`multi-turn.py`)

Multi-turn dialogue-based hypothesis generation. The user provides a sequence of follow-up questions across multiple turns, and the agent progressively refines the hypothesis.

# Run on batch data
CUDA_VISIBLE_DEVICES=0 python -m akgr.agent.multi-turn \
    --mode run \
    --dataname BioKG \
    --checkpoint checkpoints/BioKG-full-32-multi.pth \
    --data_root ./data/ \
    --analysis

# Run on a single demo case
CUDA_VISIBLE_DEVICES=0 python -m akgr.agent.multi-turn --mode case --analysis

The --analysis flag enables the RCA-Agent analysis for quality refinement when Jaccard is below threshold.

4.3 Unconditional Agent (`uncondition.py`)

Starts from an unconditional hypothesis (no user conditions), then uses the agent to automatically analyze the KG structure and derive structural/semantic/hybrid conditions for refinement.

# Run on batch data
CUDA_VISIBLE_DEVICES=0 python -m akgr.agent.uncondition \
    --mode run \
    --dataname DBpedia50 \
    --checkpoint checkpoints/DBpedia50-full-32-multi.pth \
    --data_root ./data/

# Run on a single demo case
CUDA_VISIBLE_DEVICES=0 python -m akgr.agent.uncondition --mode case

4.4 Evaluate Agent Results

# Evaluate single-turn results
python -m akgr.agent.judge --dataname BioKG --modelname DeepSeek-V4-Flash

# Evaluate multi-turn results
python -m akgr.agent.judge_multi --dataname BioKG --modelname DeepSeek-V4-Flash --analysis

# Evaluate unconditional results
python -m akgr.agent.judge_uncondition --dataname DBpedia50 --modelname gpt-5.4-mini --analysis

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
akgr		akgr
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HypoAgent

1. Environment Setup

LLM API Configuration

2. Data Sampling

3. Training the Hypothesis Generation Model

3.1 Unconditional Training

3.2 Multi-Condition Training

DATA And CKPT

4. Running the Agent

4.1 Single-Turn Agent (`loop.py`)

4.2 Multi-Turn Agent (`multi-turn.py`)

4.3 Unconditional Agent (`uncondition.py`)

4.4 Evaluate Agent Results

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

HypoAgent

1. Environment Setup

LLM API Configuration

2. Data Sampling

3. Training the Hypothesis Generation Model

3.1 Unconditional Training

3.2 Multi-Condition Training

DATA And CKPT

4. Running the Agent

4.1 Single-Turn Agent (loop.py)

4.2 Multi-Turn Agent (multi-turn.py)

4.3 Unconditional Agent (uncondition.py)

4.4 Evaluate Agent Results

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

4.1 Single-Turn Agent (`loop.py`)

4.2 Multi-Turn Agent (`multi-turn.py`)

4.3 Unconditional Agent (`uncondition.py`)

Packages