Reuse pairs in tbl_trans_map with multiple languages #278

@scossu

Description

Currently, every transliteration table in Scriptshifter generates a database entry for each token pair in its S2R and R2S sections, both for the table itself and for all of its parents. As a result, every table that inherits from one or more other tables duplicates all of its parents' entries.

This was not a problem while only a few Cyrillic languages used table inheritance, over a few hundred entries. But now we have dozens of nearly identical Cyrillic tables and, more concerning, several Indic languages use the Devanagari base table, which is over 8K lines long. The result is an unnecessarily large and slow database.

This ticket is to restructure the DB tables so that each token pair is no longer bound to a single language and script, but is instead linked to them through a many-to-many relationship via a new join table. This would greatly reduce the number of entries and possibly improve performance as the number of supported languages and scripts grows.
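
To make the proposal concrete, here is a minimal sketch of what a many-to-many layout could look like, using Python's built-in `sqlite3` module. Every name in it (`tbl_language`, `tbl_token_pair`, `tbl_lang_pair`, the column names, the `S2R`/`R2S` integer codes, and the sample languages and mappings) is hypothetical and chosen only for illustration; it does not reflect the current Scriptshifter schema or a decided design.

```python
import sqlite3

# Hypothetical direction codes; the real encoding may differ.
S2R, R2S = 1, 2

# Assumed schema: each token pair is stored once, and a join table
# links it to every language table that uses it.
SCHEMA = """
CREATE TABLE tbl_language (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE tbl_token_pair (
    id INTEGER PRIMARY KEY,
    dir INTEGER NOT NULL,
    src TEXT NOT NULL,
    dest TEXT NOT NULL,
    UNIQUE (dir, src, dest)
);
CREATE TABLE tbl_lang_pair (
    lang_id INTEGER NOT NULL REFERENCES tbl_language(id),
    pair_id INTEGER NOT NULL REFERENCES tbl_token_pair(id),
    PRIMARY KEY (lang_id, pair_id)
);
"""


def init_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_table(conn, lang, pairs):
    """Register a language and link it to its pairs, reusing existing rows."""
    conn.execute(
        "INSERT OR IGNORE INTO tbl_language (name) VALUES (?)", (lang,)
    )
    lang_id = conn.execute(
        "SELECT id FROM tbl_language WHERE name = ?", (lang,)
    ).fetchone()[0]
    for direction, src, dest in pairs:
        # The UNIQUE constraint makes this a no-op if the pair already exists.
        conn.execute(
            "INSERT OR IGNORE INTO tbl_token_pair (dir, src, dest) "
            "VALUES (?, ?, ?)",
            (direction, src, dest),
        )
        pair_id = conn.execute(
            "SELECT id FROM tbl_token_pair "
            "WHERE dir = ? AND src = ? AND dest = ?",
            (direction, src, dest),
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO tbl_lang_pair (lang_id, pair_id) "
            "VALUES (?, ?)",
            (lang_id, pair_id),
        )


def pairs_for(conn, lang, direction):
    """Return the (src, dest) pairs linked to one language and direction."""
    rows = conn.execute(
        "SELECT p.src, p.dest FROM tbl_token_pair AS p "
        "JOIN tbl_lang_pair AS lp ON lp.pair_id = p.id "
        "JOIN tbl_language AS l ON l.id = lp.lang_id "
        "WHERE l.name = ? AND p.dir = ? ORDER BY p.src",
        (lang, direction),
    )
    return rows.fetchall()


def demo():
    """Load two tables that share a base and count the stored rows."""
    conn = init_db()
    # Illustrative shared base plus one language-specific pair each.
    base = [(S2R, "а", "a"), (S2R, "б", "b"), (S2R, "в", "v")]
    add_table(conn, "lang_a", base + [(S2R, "ґ", "g")])
    add_table(conn, "lang_b", base + [(S2R, "ў", "ŭ")])
    n_pairs = conn.execute("SELECT COUNT(*) FROM tbl_token_pair").fetchone()[0]
    n_links = conn.execute("SELECT COUNT(*) FROM tbl_lang_pair").fetchone()[0]
    return n_pairs, n_links


if __name__ == "__main__":
    print(demo())
```

In this sketch, the three shared pairs are stored once and linked twice, so two tables of four pairs each produce five pair rows and eight narrow link rows instead of eight full pair rows. A child table that overrides a parent's mapping would simply link to a different pair row. Anything that is specific to one table rather than to the pair itself, such as an ordering if one is needed, would probably have to live on the join table; that is an open design question, not something this sketch settles.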
