Add jfinqa: Japanese Financial Numerical Reasoning QA #1168

@ajtgjmdjp

Summary

I'd like to add jfinqa — a Japanese financial numerical reasoning QA benchmark — as a new task in lighteval.

About jfinqa

  • 1,000 questions across 3 subtasks:
    • Numerical Reasoning (550): Calculate growth rates, margins, and ratios from financial statements
    • Consistency Checking (200): Verify that reported figures are internally consistent
    • Temporal Reasoning (250): Analyze year-over-year trends
  • 68 companies from EDINET (Japan's securities filing system)
  • Covers J-GAAP, IFRS, and US-GAAP accounting standards
  • HuggingFace Dataset: ajtgjmdjp/jfinqa
  • GitHub: ajtgjmdjp/jfinqa

Metrics

Two metrics per subtask:

  1. Exact Match — with Japanese financial normalisation (fullwidth→halfwidth, △→minus, comma removal, NFKC)
  2. Numerical Match — 1% relative tolerance, handles kanji multipliers (千/百万/億/兆) and unit suffixes (円/ドル/bps)
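To make the two metrics concrete, here is a minimal sketch of how they could be computed. It is an illustration only, not the code in the PR: the function names (`normalize`, `exact_match`, `parse_number`, `numerical_match`), the order of the normalisation steps, and the handling of a zero gold value are all assumptions.

```python
import unicodedata
from typing import Optional

# Kanji multipliers and unit suffixes listed in this issue.
KANJI_MULTIPLIERS = {"千": 1e3, "百万": 1e6, "億": 1e8, "兆": 1e12}
UNIT_SUFFIXES = ("円", "ドル", "bps")


def normalize(text: str) -> str:
    """NFKC (fullwidth -> halfwidth), △ -> minus, comma removal."""
    text = unicodedata.normalize("NFKC", text).strip()
    text = text.replace("△", "-")
    return text.replace(",", "")


def exact_match(pred: str, gold: str) -> bool:
    """Compare the two strings after normalisation."""
    return normalize(pred) == normalize(gold)


def parse_number(text: str) -> Optional[float]:
    """Parse a normalised answer such as '1,234百万円' into a float."""
    s = normalize(text)
    for unit in UNIT_SUFFIXES:  # strip at most one trailing unit
        if s.endswith(unit):
            s = s[: -len(unit)]
            break
    multiplier = 1.0
    # Try longer multipliers first so multi-character ones are not missed.
    for kanji in sorted(KANJI_MULTIPLIERS, key=len, reverse=True):
        if s.endswith(kanji):
            multiplier = KANJI_MULTIPLIERS[kanji]
            s = s[: -len(kanji)]
            break
    try:
        return float(s) * multiplier
    except ValueError:
        return None  # not a parseable number


def numerical_match(pred: str, gold: str, rel_tol: float = 0.01) -> bool:
    """True when pred is within rel_tol (1% by default) of gold."""
    p, g = parse_number(pred), parse_number(gold)
    if p is None or g is None:
        return False
    if g == 0:
        return p == 0  # assumption: a zero gold value requires an exact zero
    return abs(p - g) <= rel_tol * abs(g)


print(exact_match("△１，２３４", "-1234"))
print(numerical_match("12.4億円", "1,234百万円"))
```

In this sketch, Numerical Match is the more lenient metric: an answer given in 億円 can match a gold answer given in 百万円 as long as the values agree to within 1%.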

Prior Art

Baselines (zero-shot, temperature=0)

| Model | Overall | Numerical | Consistency | Temporal |
|---|---|---|---|---|
| GPT-4o | 87.0% | 80.2% | 90.5% | 99.2% |
| Gemini 2.0 Flash | 80.4% | 86.2% | 83.5% | 65.2% |
| GPT-4o-mini | 67.7% | 79.3% | 83.5% | 29.6% |
| Qwen2.5-3B | 39.6% | 46.4% | 51.0% | 15.6% |
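
The Overall figures appear to be the per-subtask scores weighted by question count (550 / 200 / 250). That is an inference from the numbers, not something stated elsewhere in this issue. A quick check, with the variable and function names chosen purely for illustration:

```python
# Subtask sizes from the "About jfinqa" section.
SUBTASK_SIZES = {"numerical": 550, "consistency": 200, "temporal": 250}

# Baseline scores (percent) copied from the table above.
BASELINES = {
    "GPT-4o": {"overall": 87.0, "numerical": 80.2, "consistency": 90.5, "temporal": 99.2},
    "Gemini 2.0 Flash": {"overall": 80.4, "numerical": 86.2, "consistency": 83.5, "temporal": 65.2},
    "GPT-4o-mini": {"overall": 67.7, "numerical": 79.3, "consistency": 83.5, "temporal": 29.6},
    "Qwen2.5-3B": {"overall": 39.6, "numerical": 46.4, "consistency": 51.0, "temporal": 15.6},
}


def weighted_overall(scores: dict) -> float:
    """Question-count-weighted mean of the three subtask scores."""
    total = sum(SUBTASK_SIZES.values())
    return sum(scores[name] * size for name, size in SUBTASK_SIZES.items()) / total


for model, scores in BASELINES.items():
    recomputed = weighted_overall(scores)
    # Each recomputed value should agree with the reported one after rounding.
    print(f"{model}: reported {scores['overall']:.1f}, recomputed {recomputed:.2f}")
```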

I have a PR ready — happy to adjust the implementation based on your feedback (e.g., inspect-ai format if preferred).
