Optical Character Recognition for Manchu script using multiple model architectures.
- CPU: Intel Core i9-13900KS (32 cores)
- GPU: NVIDIA RTX 6000 Ada Generation (49GB VRAM)
- RAM: 188GB
```bash
uv sync
```

The scripts/ folder contains the main entry points:
Train all models (the default):

```bash
python scripts/train.py
```

Train a specific model:

```bash
python scripts/train.py --target_model llama-32-11b
```

Train multiple models (space-separated):

```bash
python scripts/train.py --target_model llama-32-11b qwen-25-3b
```
Evaluate all models (the default):

```bash
python scripts/evaluate.py
```

Evaluate a specific model:

```bash
python scripts/evaluate.py --target_model llama-32-11b
```

Evaluate multiple models (space-separated):

```bash
python scripts/evaluate.py --target_model llama-32-11b qwen-25-3b
```
```bash
python scripts/visualize.py
```

Generates 11 publication-ready visualizations and saves them to results/paper/figures/:
- Accuracy bar plot (model comparison)
- Character Error Rate (CER) comparison
- F1 score comparison
- Inference time analysis
- Performance comparison chart (VLM models)
- Checkpoint evaluation trends
- Word length performance analysis
- VLM vs CRNN comparison
- Character error heatmap
- Model training curves
- Training stability (gradient norms)
```bash
# Custom output directory
python scripts/visualize.py --output ./my_figures

# Custom metrics directory
python scripts/visualize.py --metrics ./custom_metrics
```

Run individual visualizations:

```bash
python scripts/figures/checkpoint_trends.py
python scripts/figures/comparison_vlm_vs_crnn.py
```

Output: PNG (300 DPI) and PDF formats with Times New Roman fonts.
All configuration files are located in the configs/ directory and use YAML format with a hierarchical structure:
- `default`: Base settings for all models
- Model-specific sections: Override defaults (e.g., `qwen-25-3b`, `llama-32-11b`, `crnn-base-3m`)
```bash
# Basic usage - uses default config
python scripts/train.py
```

Key parameters:
```yaml
default:
  training:
    num_train_epochs: 5
    learning_rate: 2.0e-4
    per_device_train_batch_size: 4
    warmup_steps: 100
    save_steps: 1000  # Checkpoint frequency
    eval_steps: 1000  # Evaluation frequency
    metric_for_best_model: "manchu_cer"  # Best model selection metric
  peft:  # LoRA settings (VLM only)
    r: 32
    lora_alpha: 64
    lora_dropout: 0.05
```

Override for specific models:
```yaml
llama-32-11b:
  training:
    learning_rate: 1.0e-4  # Lower learning rate for larger model
    num_train_epochs: 5

crnn-base-3m:
  training:
    num_train_epochs: 100
    batch_size: 16
    learning_rate: 1e-3
```

```bash
# Basic usage - uses default config
python scripts/evaluate.py
```

Key parameters:
```yaml
default:
  validation:
    num_samples: 1000
    step_num: best  # Options: "best", "latest", or a step number (e.g., 21000)
  test:
    num_samples: 753
    step_num: best
```

Checkpoint selection:
- `best` - Best checkpoint by `metric_for_best_model`
- `latest` - Most recent checkpoint
- `<number>` - Specific step (e.g., `21000` → `checkpoint-21000`)
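The `manchu_cer` metric used for best-checkpoint selection is a character error rate (CER): the Levenshtein edit distance between prediction and reference, normalized by reference length. A minimal sketch of that standard formula, not necessarily the repo's exact implementation:

```python
def cer(prediction: str, reference: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    # Dynamic-programming edit distance over characters.
    prev = list(range(len(reference) + 1))
    for i, p in enumerate(prediction, start=1):
        curr = [i]
        for j, r in enumerate(reference, start=1):
            cost = 0 if p == r else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / max(len(reference), 1)

print(cer("amban", "amban"))  # 0.0 (exact match)
print(cer("anban", "amban"))  # 0.2 (one substitution over five characters)
```

Lower is better, which is why the best checkpoint is the one that minimizes this metric on the validation split.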
Override for specific checkpoints:
```yaml
qwen-25-3b:
  validation:
    num_samples: 1000
    step_num: 106000  # Evaluate specific checkpoint
  test:
    num_samples: 218
    step_num: 106000
```

Dataset configuration and preprocessing settings:
```yaml
default:
  dataset_name: mic7ch/manchu_sub2_new_12Nov
  train_split: train
  val_split: validation
  test_split: test
  image_key: im
  text_key: [roman, manchu]
  cache: false  # Set to true to cache downloaded data
```

Model definitions and base model mappings:
```yaml
models:
  - name: qwen-25-3b
    base_model: unsloth/Qwen2.5-VL-3B-Instruct
    model_class: VLM
  - name: llama-32-11b
    base_model: unsloth/Llama-3.2-11B-Vision-Instruct
    model_class: VLM
  - name: crnn-base-3m
    base_model: crnn
    model_class: CRNN
```

- qwen-25-3b/7b: Qwen2.5-VL-3B/7B
- llama-32-11b: Llama-3.2-11B
- crnn-base-3m: Convolutional Recurrent Neural Network
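The model definitions above reduce to a simple name → (base model, class) lookup. A sketch assuming pyyaml is available; the config is inlined and abridged here for illustration, and the `registry` structure is not necessarily how the repo resolves models:

```python
import yaml  # pyyaml

MODELS_YAML = """
models:
  - name: qwen-25-3b
    base_model: unsloth/Qwen2.5-VL-3B-Instruct
    model_class: VLM
  - name: crnn-base-3m
    base_model: crnn
    model_class: CRNN
"""

cfg = yaml.safe_load(MODELS_YAML)
# Map each model name to its (base_model, model_class) pair.
registry = {m["name"]: (m["base_model"], m["model_class"]) for m in cfg["models"]}

base, cls = registry["qwen-25-3b"]
print(base, cls)  # unsloth/Qwen2.5-VL-3B-Instruct VLM
```

Training and evaluation code can then branch on `model_class` (VLM vs. CRNN) to pick the right loading path.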
Evaluation results are saved in the results/ directory.