FrontierTextHandler

A utility to read text from Monster Hunter Frontier, edit it, and reinsert it. It is roughly a Python rewrite of FrontierTextTool (from ReFrontier, by mhvuze).

Requirements

  • Python 3.10+ (uses modern type hints such as list[str])
  • No external dependencies (pure standard library)

Install

Download the repository and run commands from the main folder.

git clone https://github.com/Houmgaor/FrontierTextHandler.git
cd FrontierTextHandler

Usage

# This can save lives
python main.py --help

To extract the data:

  1. Place game files (mhfdat.bin, mhfpac.bin, mhfinf.bin) in a data/ folder.
  2. Run main.py.

Note on file formats: Game files are both encrypted (ECD/EXF) and compressed (JKR). This tool handles both layers automatically:

  • Extraction: Auto-decrypts ECD/EXF and auto-decompresses JKR
  • Reimport: Use --compress --encrypt to produce game-ready files

Output data will be in output/*.csv (UTF-8) and output/*.json. Pass --refrontier-tsv if you also want the legacy output/refrontier.csv (Shift-JIS TSV) for ReFrontier interop — it is opt-in since 1.7.0.

Extract all data

To extract all available text sections at once:

python main.py --extract-all

This reads headers.json and extracts every defined section, creating CSV and JSON files in output/. The tool automatically maps xpaths to their corresponding files:

  • dat/* sections → data/mhfdat.bin
  • pac/* sections → data/mhfpac.bin
  • inf/* sections → data/mhfinf.bin
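The prefix-to-file mapping above can be pictured as a small lookup. This is only an illustrative sketch: `XPATH_FILES` and `file_for_xpath` are hypothetical names, not part of the tool's API, and the real implementation may organise this differently.

```python
# Hypothetical lookup mirroring the list above (not the tool's actual code).
XPATH_FILES = {
    "dat": "data/mhfdat.bin",
    "pac": "data/mhfpac.bin",
    "inf": "data/mhfinf.bin",
}


def file_for_xpath(xpath: str) -> str:
    """Return the binary expected to hold an xpath, based on its first segment."""
    prefix = xpath.split("/", 1)[0]
    try:
        return XPATH_FILES[prefix]
    except KeyError:
        raise ValueError(f"Unknown xpath prefix: {prefix!r}") from None


print(file_for_xpath("dat/armors/legs"))
```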

Extract specific data

You can customize which data will be extracted. For instance to extract only the legs armor names from mhfdat.bin:

python main.py --xpath=dat/armors/legs

It will create a file output/dat-armors-legs.csv. A JSON file (output/dat-armors-legs.json) is also produced alongside the CSV.

For a full list of which game binary holds which xpath (weapons, armors, quests, Felyne dialogue, squad text, ...), see docs/game-files.md.

Change the game files

Using a CSV file, you can insert new strings (such as translations) in the original MHFrontier game.

The CSV file should follow this convention:

  1. The first column (location) of the file should be the original datum location (with format [offset]@[original file name]).
  2. The second column (source) is the original string value.
  3. The third column (target) is the new string value.

To update the file, use --csv-to-bin [input CSV] [output BIN file]. JSON files are also accepted as input. It will only add strings if "target" is different from "source". For instance:

python main.py --csv-to-bin output/dat-armors-legs.csv data/mhfdat.bin

The modified file is saved to output/mhfdat-modified.bin.

Compress after import

To automatically compress the modified binary using JKR HFI compression:

python main.py --csv-to-bin output/translations.csv data/mhfdat.bin --compress

This creates output/mhfdat-modified.bin with JKR compression applied. The compression log shows the size reduction achieved.

Tip: To produce a game-ready file in one step, add --encrypt:

python main.py --csv-to-bin output/translations.csv data/mhfdat.bin --compress --encrypt

Accent folding for European languages

MH Frontier's custom bitmap font covers JIS X 0208 + basic ASCII but not the accented Latin characters that European languages need (é, è, à, ô, ç, œ, etc.). Until the in-game font is extended, use --fold-unsupported-chars to fold diacritics down to their nearest ASCII equivalents on import:

python main.py --csv-to-bin output/dat-armors-head.csv data/mhfdat.bin \
    --fold-unsupported-chars --compress --encrypt

The folding is intentionally lossy and only happens on the way to the binary. Keep your source CSV with proper accents — the folding can be removed once the font supports the missing glyphs.
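To give an idea of what such folding involves, here is a minimal standard-library sketch. It is not the tool's actual implementation: `fold_unsupported`, `fold_char`, and the `LIGATURES` table are hypothetical, and the rule "leave anything Shift-JIS can already encode untouched" is an assumption chosen so that Japanese text is never altered.

```python
import unicodedata

# Hypothetical table for letters that have no Unicode decomposition.
LIGATURES = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"}


def fold_char(ch: str) -> str:
    """Fold one character toward ASCII, but only if Shift-JIS cannot encode it."""
    try:
        ch.encode("shift_jis")
        return ch  # already representable (ASCII, kana, kanji, ...)
    except UnicodeEncodeError:
        pass
    if ch in LIGATURES:
        return LIGATURES[ch]
    # Decompose (é -> e + combining acute accent) and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", ch)
    # Characters without a decomposition (emoji, for instance) pass through unchanged.
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def fold_unsupported(text: str) -> str:
    """Apply fold_char to every character of a string."""
    return "".join(fold_char(ch) for ch in text)


print(fold_unsupported("Épée légère, cœur"))
```

Checking encodability first matters: blindly decomposing every character would also strip the voicing marks from kana such as ガ.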

In-place section rebuild

When --csv-to-bin is combined with --xpath, only the target section is rewritten in the binary file. This is useful when you want to update a single section without touching the rest of the file:

python main.py --csv-to-bin output/dat-armors-legs.csv data/mhfdat.bin --xpath=dat/armors/legs

Stable index keys (default since 1.6.0)

Every extracted CSV/JSON file keys each string by its index — the slot number in the section's pointer table. Indexes survive upstream string-length changes that would shift raw byte offsets, so re-extracted files stay easy to merge with existing translations: slot 37 is still slot 37 even if earlier strings grow or shrink. This is the default for every extractor (--extract-all, --xpath=…, --quest, --quest-dir, --scenario, --scenario-dir, --npc, --npc-dir, --ftxt).

# Default 1.6.0 extraction
python main.py --xpath=dat/armors/head data/mhfdat.bin

# Default batch extraction
python main.py --extract-all

The resulting CSV is three columns — no offset, no filename:

index,source,target
0,オリジナル,Traduction
1,未翻訳,
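Because the CSV is plain UTF-8 with a header row, it can be processed with the standard `csv` module. The sketch below lists the indexes that still lack a translation; `untranslated_indexes` is a hypothetical helper, and treating an empty target as "untranslated" is an assumption made for this example.

```python
import csv
import io


def untranslated_indexes(csv_text: str) -> list[int]:
    """Return the index of every row whose target column is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [int(row["index"]) for row in reader if not row["target"]]


sample = "index,source,target\n0,オリジナル,Traduction\n1,未翻訳,\n"
print(untranslated_indexes(sample))  # prints [1]
```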

JSON output uses the same shape and records the source binary, xpath, and a content fingerprint in metadata instead of repeating them on every row:

{
  "metadata": {
    "source_file": "mhfdat.bin",
    "xpath": "dat/armors/head",
    "version": "1.6.0",
    "fingerprint": "a1b2c3d4e5f60718"
  },
  "strings": [
    {"index": 0, "source": "オリジナル", "target": "Traduction"}
  ]
}

The fingerprint is the first 16 hex chars of SHA-256 over the decrypted, decompressed binary. At import time the importer recomputes it on the target file and warns loudly on mismatch — that catches the most dangerous failure mode (applying a translation extracted from one game version to a different version, or to a binary that already has translations applied). The warning does not abort the import; the user can still proceed if they know what they're doing.
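The fingerprint computation described above can be reproduced in a few lines. In this sketch, `fingerprint` is a hypothetical function name, and the caller is assumed to pass bytes that have already been decrypted and decompressed.

```python
import hashlib


def fingerprint(plain_data: bytes) -> str:
    """First 16 hex characters of the SHA-256 digest of the plain binary."""
    return hashlib.sha256(plain_data).hexdigest()[:16]


# Any change to the underlying bytes yields a different fingerprint.
print(fingerprint(b"example bytes"))
print(fingerprint(b"example bytes, modified"))
```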

Note: CSV vs JSON asymmetry. The fingerprint check fires for JSON imports only. CSV files deliberately stay minimal (index,source,target with no metadata) so they render cleanly in spreadsheets and on GitHub. If you want fingerprint protection, import the JSON sidecar that the extractor writes alongside every CSV; if you only need the human-friendly format, the CSV is enough, but you lose the cross-version safety net. The xpath inference (from filename) still works for both.

When importing an index-keyed file against a headers.json-backed section, the importer auto-detects the format and infers the xpath from (1) the JSON metadata.xpath field or (2) the CSV/JSON filename — dat-armors-head.csv resolves to dat/armors/head if that xpath exists in headers.json. So this just works:

python main.py --csv-to-bin output/dat-armors-head.csv data/mhfdat.bin \
    --compress --encrypt
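The filename-based inference can be sketched as follows. This is an assumption-laden illustration, not the importer's code: `infer_xpath` and `xpath_exists` are hypothetical names, and the sketch assumes that hyphens in the filename map one-to-one to xpath separators and that every section object carries a begin_pointer key.

```python
from pathlib import Path
from typing import Optional


def xpath_exists(headers: dict, xpath: str) -> bool:
    """Walk the nested headers dict and check that xpath names a section."""
    node = headers
    for part in xpath.split("/"):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    # Assumption: a section is a dict that defines begin_pointer.
    return isinstance(node, dict) and "begin_pointer" in node


def infer_xpath(filename: str, headers: dict) -> Optional[str]:
    """Turn 'dat-armors-head.csv' into 'dat/armors/head' if that section exists."""
    candidate = Path(filename).stem.replace("-", "/")
    return candidate if xpath_exists(headers, candidate) else None


headers = {"dat": {"armors": {"head": {"begin_pointer": "0x64", "entry_count": 14594}}}}
print(infer_xpath("output/dat-armors-head.csv", headers))
print(infer_xpath("output/translations.csv", headers))
```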

For standalone file formats (FTXT, NPC dialogue, scenario, quest files), the importer re-extracts the source binary with the matching format-specific extractor and aligns index-keyed translations positionally against the live entries — no xpath needed:

# Quest file: extract, edit, re-import, round-trip in index form
python main.py --quest data/quests/quest_001.bin
# …edit output/quest-quest_001.csv…
python main.py --csv-to-bin output/quest-quest_001.csv data/quests/quest_001.bin

Opting back into the legacy offset format

Pass --legacy-offset to emit the pre-1.6.0 location,source,target shape instead:

python main.py --legacy-offset --xpath=dat/armors/head data/mhfdat.bin

Use this only when you need to interoperate with tooling that hasn't yet adopted the index format; the importer accepts both forms either way, so mixing is fine during a migration.

The --with-index flag that was opt-in in 1.5.0 is still accepted as a silent no-op alias, so scripts written against the 1.5.0 behaviour keep working.

The ReFrontier-compatible TSV output (export_for_refrontier) and the refrontier_to_csv helper stay offset-keyed because their inputs carry raw ReFrontier offsets and have no section context to index against. Since 1.7.0 the TSV is opt-in via --refrontier-tsv (or refrontier_tsv=True on the Python API), for two reasons: the modern UTF-8 CSV/JSON covers every round-trip the importer needs, and the strict Shift-JIS re-encode crashed extraction on inputs containing \ufffd characters.

Decrypt files

Decrypt an ECD/EXF-encrypted file manually:

python main.py --decrypt data/mhfdat.bin output/mhfdat-decrypted.bin

Use --save-meta to preserve the encryption header in a .meta file, which allows re-encryption with the original parameters later:

python main.py --decrypt data/mhfdat.bin output/mhfdat-decrypted.bin --save-meta

FTXT files

Extract text from standalone FTXT text files (magic 0x000B0000):

python main.py --ftxt data/some_ftxt_file.bin

Quest files

Extract text from quest .bin files:

# Single quest file
python main.py --quest data/quest_file.bin

# Batch extract all quest files in a directory
python main.py --quest-dir data/quests/

NPC dialogue

Extract and reimport NPC dialogue from stage dialogue binary files:

# Extract from a single file
python main.py --npc data/npc_dialogue.bin

# Batch extract from a directory
python main.py --npc-dir data/npc/

# Import translations back to binary
python main.py --npc-to-bin output/npc_dialogue.csv data/npc_dialogue.bin

Scenario files

Extract and reimport text from story scenario .bin files (Basic quests, Veteran quests, Diva Exchange, Diva Story). These files use a multi-chunk container format with quest names, NPC dialog (@RETURN, @MYNAME, {c05}…{/c} color codes), and JKR-compressed menu text.

# Extract from a single file (outputs CSV + JSON)
python main.py --scenario data/scenarios/0_0_0_0_S17_T2_C0.bin

# Batch extract from a directory
python main.py --scenario-dir data/scenarios/

# Import translations back to binary (accepts CSV or JSON)
python main.py --scenario-to-bin output/scenario-0_0_0_0_S17_T2_C0.csv data/scenarios/0_0_0_0_S17_T2_C0.bin

Validate line lengths

MH Frontier has fixed-width UI elements. Translations that exceed the original Japanese display width may overflow in-game text boxes.

# Measure limits from original JP binaries and store in headers.json
python main.py --measure-line-lengths

# Validate a translation file against stored limits
python main.py --validate-line-lengths output/dat-armors-head.csv

# Strict mode: abort on first violation (for CI pipelines)
python main.py --validate-line-lengths output/dat-armors-head.csv --strict-line-lengths

# Allow 10% expansion over JP max
python main.py --validate-line-lengths output/dat-armors-head.csv --max-expansion 1.1

Display width uses Unicode East Asian Width: CJK / fullwidth characters count as 2 cells, everything else as 1. Inline placeholders ({cNN}, {/c}, {j}, {K…}, {i…}, {u…}) are stripped before measurement. For grouped entries ({j}-separated), each sub-string is measured independently and the sub-string count is checked against the section maximum.
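The measurement rule above can be approximated with the standard library. This is a sketch under stated assumptions, not the validator's code: `display_width` and `sub_string_widths` are hypothetical names, and the placeholder regular expression is a guess at the syntax based only on the forms listed above.

```python
import re
import unicodedata

# Assumed placeholder syntax: {cNN}, {/c}, and {K...}, {i...}, {u...} with any payload.
PLACEHOLDER = re.compile(r"\{(?:c\d+|/c|[Kiu][^{}]*)\}")


def display_width(text: str) -> int:
    """Count wide/fullwidth characters as 2 cells and everything else as 1."""
    stripped = PLACEHOLDER.sub("", text)
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in stripped
    )


def sub_string_widths(entry: str) -> list[int]:
    """Measure each {j}-separated sub-string of a grouped entry independently."""
    return [display_width(part) for part in entry.split("{j}")]


print(display_width("{c05}Iron Helm{/c}"))
print(sub_string_widths("Hunter Basics{j}鉄の兜"))
```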

Validate files

Inspect the structure of a game file (encryption layer, compression layer, format):

python main.py --validate data/mhfdat.bin

Compare files

Compare strings between two files. Works with CSV files and binary files:

# Compare two CSV files
python main.py file_a.csv --diff file_b.csv

# Compare two binary files (requires --xpath, --ftxt, --quest, --npc, or --scenario)
python main.py data/mhfdat.bin --diff data/mhfdat_v2.bin --xpath=dat/armors/head

Merge translations

Carry over translations from an old translated file into a freshly extracted file. Translations are matched by source string — if the source is unchanged, the translation is preserved:

# Merge CSV files (output written to third argument, or auto-named)
python main.py old_translated.csv --merge new_extracted.csv
python main.py old_translated.csv --merge new_extracted.csv output/merged.csv

# Also works with JSON files
python main.py old_translated.json --merge new_extracted.json
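Matching by source string amounts to a dictionary lookup. The sketch below shows the idea on in-memory rows; `merge_translations` is a hypothetical function, and details such as keeping a target that is already present in the new file are assumptions rather than documented behaviour.

```python
def merge_translations(old_rows: list[dict], new_rows: list[dict]) -> list[dict]:
    """Copy targets from old_rows into new_rows wherever the source is unchanged."""
    known = {row["source"]: row["target"] for row in old_rows if row["target"]}
    merged = []
    for row in new_rows:
        # Assumption: a target already present in the new file takes priority.
        target = row["target"] or known.get(row["source"], "")
        merged.append({**row, "target": target})
    return merged


old = [{"index": 0, "source": "オリジナル", "target": "Traduction"}]
new = [
    {"index": 0, "source": "未翻訳", "target": ""},
    {"index": 1, "source": "オリジナル", "target": ""},
]
for row in merge_translations(old, new):
    print(row)
```

Note that the translation follows its source string to index 1 even though it lived at index 0 in the old file.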

Compatibility with ReFrontier

You can also convert a ReFrontier translation file to this tool's CSV format:

python main.py --refrontier-to-csv

See headers.json for all available sections, or run python main.py --extract-all to extract everything at once.

JPK Compression

FrontierTextHandler includes built-in support for JPK/JKR compression, the format used by Monster Hunter Frontier for compressed game files.

Note: Game files (.bin) have two layers: ECD encryption (outer) and JKR compression (inner). This tool handles both layers automatically.

Automatic Decompression

JPK files are automatically detected and decompressed when reading game data. No additional steps needed.

Python API

You can also use the compression functions directly in Python:

from src import compress_jkr_hfi, decompress_jkr, is_jkr_file

# Check if a file is JPK compressed
with open("file.bin", "rb") as f:
    data = f.read()
    if is_jkr_file(data):
        decompressed = decompress_jkr(data)

# Compress data (HFI = Huffman + LZ77, most common)
compressed = compress_jkr_hfi(original_data)

# Decompress
original = decompress_jkr(compressed)

Supported compression types:

  • RW (0): Raw, no compression
  • HFIRW (2): Huffman encoding only
  • LZ (3): LZ77 compression only
  • HFI (4): Huffman + LZ77 (most common, best compression)

Running Tests

python -m unittest discover -s tests -v

ECD/EXF Encryption

FrontierTextHandler includes built-in support for ECD and EXF encryption, the formats used by Monster Hunter Frontier for encrypted game files.

Automatic Decryption

Encrypted files are automatically detected and decrypted when reading game data. No additional steps needed.

Python API

You can also use the encryption functions directly in Python:

from src import decrypt, encrypt, is_encrypted_file

# Check if a file is encrypted and decrypt
with open("file.bin", "rb") as f:
    data = f.read()
    if is_encrypted_file(data):
        decrypted, header = decrypt(data)

# Encrypt data (uses default key index 4)
encrypted = encrypt(data)

# Re-encrypt preserving original format
encrypted = encrypt(data, meta=original_header)

Supported encryption formats:

  • ECD (0x1A646365): Primary format, LCG-based with nibble Feistel cipher
  • EXF (0x1A667865): Alternative format, 16-byte XOR key with position-dependent transform

All known MHF files use key index 4 (the default). Use --key-index to specify a different key (0–5).
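The magic values listed above are enough to tell the two formats apart. The following sketch is independent of the library's own is_encrypted_file; `encryption_format` is a hypothetical helper, and reading the magic as a little-endian 32-bit integer at offset 0 is an assumption.

```python
import struct
from typing import Optional

ECD_MAGIC = 0x1A646365
EXF_MAGIC = 0x1A667865


def encryption_format(data: bytes) -> Optional[str]:
    """Return 'ECD', 'EXF', or None based on the first four bytes."""
    if len(data) < 4:
        return None
    # Assumption: the magic is stored as a little-endian uint32 at offset 0.
    (magic,) = struct.unpack_from("<I", data, 0)
    return {ECD_MAGIC: "ECD", EXF_MAGIC: "EXF"}.get(magic)


print(encryption_format(struct.pack("<I", ECD_MAGIC) + bytes(12)))
print(encryption_format(b"plain data"))
```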

Game version support

Different game versions (Season 6, Forward.5, ZZ, etc.) have different numbers of items, armors, weapons, and other entries. The tool defaults to ZZ but supports other versions through --game-version:

# Extract using ZZ array sizes (default)
python main.py --extract-all

# Extract from a Korean-version binary
python main.py --extract-all --game-version ko

# Import with explicit version
python main.py --csv-to-bin output/dat-armors-head.csv data/mhfdat.bin --game-version ko

The entry counts for each version are stored in headers.json. When only one version is known, the count is a plain integer; when multiple versions are documented, it becomes a map:

"entry_count": {"zz": 14594, "ko": 1290}

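Code that consumes headers.json has to accept both shapes. A minimal sketch, assuming a hypothetical helper named `resolve_entry_count` and the version keys shown above:

```python
from typing import Union


def resolve_entry_count(entry_count: Union[int, dict], game_version: str = "zz") -> int:
    """Return the entry count for a version, whether stored as an int or a map."""
    if isinstance(entry_count, int):
        return entry_count  # single known version
    try:
        return entry_count[game_version]
    except KeyError:
        raise ValueError(f"No entry count documented for {game_version!r}") from None


print(resolve_entry_count(14594))
print(resolve_entry_count({"zz": 14594, "ko": 1290}, "ko"))
```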
Configuration: headers.json

The headers.json file defines where text data is located within each binary file. Understanding this format allows you to add support for new data sections.

Structure Overview

{
  "file_type": {
    "category": {
      "subcategory": {
        "begin_pointer": "0x64",
        "entry_count": 14594
      }
    }
  }
}

Pointer Table Format

Monster Hunter Frontier stores text as pointer tables — arrays of 4-byte offsets that point to null-terminated Shift-JIS strings elsewhere in the file.

Binary file layout:
┌─────────────────────────────────────────────────────────┐
│ ... file header and other data ...                      │
├─────────────────────────────────────────────────────────┤
│ Pointer Table (at begin_pointer offset):                │
│   [0x1000] [0x100D] [0x1017] [0x1022] ...               │
│   (each entry is a 4-byte little-endian offset)         │
├─────────────────────────────────────────────────────────┤
│ String Data (pointed to by the table):                  │
│   0x1000: "Leather Helm\0"                              │
│   0x100D: "Iron Helm\0"                                 │
│   0x1017: "Steel Helm\0"                                │
│   ...                                                   │
└─────────────────────────────────────────────────────────┘

Field Definitions

| Field | Type | Description |
|---|---|---|
| begin_pointer | Hex string | Offset to a pointer that points to the start of the pointer table |
| entry_count | Integer or map | Number of entries in the pointer table. Plain integer for a single version, or {"zz": N, "ko": M} for multi-version |
| pointers_per_entry | Integer | Number of consecutive pointer slots per logical entry (default: 1). Used for sections like weapon descriptions with 4 sub-pointers per weapon |
| null_terminated | Boolean | If true, scan forward until a null pointer instead of using entry_count for the length |
| entry_size | Integer | Byte size of each struct entry (for struct-strided sections where strings are embedded in fixed-size records) |
| field_offset | Integer | Byte offset of the string pointer within each struct entry |

Important: begin_pointer is a pointer to a pointer. The value at begin_pointer contains the actual address of the pointer table start.

Adding New Sections

To add support for a new text section:

  1. Find the pointer table using a hex editor or ImHex with MHF patterns

  2. Identify the start and count:

    • Find where the file stores the table's start address (begin_pointer)
    • Count the number of entries in the pointer table
  3. Add the entry to headers.json:

    "monsters": {
      "names": {
        "begin_pointer": "0x200",
        "entry_count": 142
      }
    }
  4. Test extraction:

    python main.py --xpath=dat/monsters/names -v
  5. Verify output: Check that strings are decoded correctly and no garbage data appears

Example: Armor Section

The armor head names section in mhfdat.bin:

"armors": {
  "head": {
    "begin_pointer": "0x64",
    "entry_count": 14594
  }
}

This means:

  • Read the 4-byte value at offset 0x64 → this gives the pointer table start
  • Read 14594 × 4 bytes of pointer entries
  • Each 4-byte entry is a pointer to a null-terminated armor name
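The three steps above translate almost directly into `struct` calls. This is a simplified sketch on a synthetic buffer, not the extractor itself: `read_section` is a hypothetical function, and it ignores null pointers, pointers_per_entry, and struct-strided sections.

```python
import struct


def read_section(data: bytes, begin_pointer: int, entry_count: int) -> list[str]:
    """Follow begin_pointer to the pointer table, then decode each string."""
    # Step 1: the value at begin_pointer is the address of the pointer table.
    (table_start,) = struct.unpack_from("<I", data, begin_pointer)
    # Step 2: read entry_count little-endian 4-byte pointers.
    pointers = struct.unpack_from(f"<{entry_count}I", data, table_start)
    # Step 3: each pointer leads to a null-terminated Shift-JIS string.
    strings = []
    for ptr in pointers:
        end = data.index(b"\x00", ptr)
        strings.append(data[ptr:end].decode("shift_jis"))
    return strings


# Synthetic file: [pointer to table][table of 2 pointers][string data]
names = [b"Leather Helm\x00", b"Iron Helm\x00"]
table_start = 4
first_string = table_start + 4 * len(names)
offsets = [first_string, first_string + len(names[0])]
data = struct.pack("<I", table_start) + struct.pack("<2I", *offsets) + b"".join(names)

print(read_section(data, begin_pointer=0, entry_count=2))
```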

Multiline Strings

Some sections (like weapon descriptions) use null pointer separators (0x00000000) to indicate line breaks within a single logical entry. The tool joins these into a single CSV/JSON row separated by {j} markers — for example, "Hunter Basics{j}Deliver 2 Raw Meat{j}None{j}…". The importer re-derives the per-sub pointer offsets from the live pointer table at import time, so translations can freely rearrange or re-word the sub-strings as long as the number of {j}-separated parts stays the same. Pre-1.6.0 CSVs used the offset-bearing <join at="N"> tag form and remain accepted by the importer for backward compatibility.
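Since the number of {j}-separated parts must stay the same, a translation can be checked before import with a simple split. `same_part_count` is a hypothetical helper written for this example, not a function exported by the tool.

```python
def same_part_count(source: str, target: str) -> bool:
    """True if both strings contain the same number of {j}-separated parts."""
    return len(source.split("{j}")) == len(target.split("{j}"))


source = "Hunter Basics{j}Deliver 2 Raw Meat{j}None"
print(same_part_count(source, "Bases du chasseur{j}Livrer 2 viandes crues{j}Aucun"))
print(same_part_count(source, "Bases du chasseur{j}Livrer 2 viandes crues"))
```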

Troubleshooting

Common Errors

FileNotFoundError: 'data/mhfdat.bin' does not exist

The input file was not found. Make sure to:

  1. Create a data/ folder in the project directory
  2. Place your game files (mhfdat.bin, mhfpac.bin, etc.) in it; encrypted and compressed files are handled automatically
  3. Verify the file path matches your command

InterruptedError: file.csv has less than one line!

The CSV file is empty or has no data rows. Ensure your CSV file has:

  1. A header row — index,source,target (the 1.6.0 default) or location,source,target (legacy, opt-in via --legacy-offset)
  2. At least one data row

EncodingError: Failed to encode string to Shift-JIS

Your translation contains characters not supported by the game's encoding (Shift-JIS). Common causes:

  • Emoji characters
  • Special Unicode symbols
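  • Characters from scripts other than Japanese or basic ASCII (for example, accented Latin letters)

Solution: Replace unsupported characters with ASCII or Japanese equivalents, or pass --fold-unsupported-chars on import to fold accented Latin characters to their nearest ASCII equivalents.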

CSVParseError: Invalid location format

The CSV location column is malformed. Expected format: 0x1234@filename.bin

Check that:

  1. The hex offset starts with 0x
  2. There's an @ separator between offset and filename
  3. No extra spaces or characters
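The three checks above can be expressed as one regular expression. This sketch is not the tool's parser: `parse_location` is a hypothetical function, and it raises a plain ValueError rather than the CSVParseError shown in the heading.

```python
import re

# 0x + hex digits, an @ separator, then the filename; nothing else allowed.
LOCATION = re.compile(r"0x([0-9A-Fa-f]+)@(\S+)")


def parse_location(location: str) -> tuple[int, str]:
    """Split '0x1234@filename.bin' into (offset, filename)."""
    match = LOCATION.fullmatch(location)
    if match is None:
        raise ValueError(f"Invalid location format: {location!r}")
    return int(match.group(1), 16), match.group(2)


print(parse_location("0x1234@mhfdat.bin"))
```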

ValueError: Cannot find any readable data in 'file.bin' with xpath 'path'

The xpath doesn't match the file type or the file structure is different. Solutions:

  1. Verify you're using the correct xpath for your file (e.g., dat/ for mhfdat.bin, pac/ for mhfpac.bin)
  2. Check available xpaths in headers.json
  3. Ensure the file is a valid game file (ECD-encrypted files are auto-decrypted)

JKRError: Invalid JKR magic bytes or JKRError: Data too short

The file is not a valid JPK/JKR compressed file or is corrupted:

  1. Verify the file is actually JPK-compressed (not all game files are)
  2. Re-extract the file from the game data
  3. Check if the file was partially downloaded or truncated

InvalidPointerError: Pointer offset 0x... is outside file bounds

A pointer in the file points to an invalid location. This usually means:

  1. The file is corrupted or truncated
  2. The wrong xpath is being used for this file type
  3. The headers.json configuration has incorrect offsets for this game version

Try using -v to see which pointer is causing the issue.

Debug Mode

Use the -v or --verbose flag to see detailed debug output:

python main.py -v data/mhfdat.bin

This shows:

  • Number of translations found
  • Pointer assignments during import
  • File creation messages

Credits

This software was made with the support of @ezemania2 from the MezeLounge Discord community and of Mogapédia, the French Monster Hunter wiki.

See also