The following string is Unicode, but the detector classifies it as Zawgyi. It contains many occurrences of U+FE00 (VARIATION SELECTOR-1). If we add that code point to the model, this text might be detected correctly as Unicode, even without much additional training data.
ꩬ︀ံꩭုဝ︀်ꩬ︀ိပ︀်တ︀ိꩫ︀်ၸ︀ႝꩫ︀ိုဝ︀်ꩫ︀ိꩫ︀်မ︀ေ︀ပꩫ︀ႃ ။ ၸ︀ၞ်ꩭူၺꩫ︀်တ︀ႝꩡ︀ွ်မ︀ႃꩭေ︀ႃကꩭၞ်ꩫ︀ႝမ︀ွက︀်လ︀ွ်ꩡ︀ွ်
တ︀ႃ ။ ꩬ︀ိပ︀်တ︀ိꩫ︀်ꩬ︀ံꩭုဝ︀်ၸ︀ႝꩫ︀ိꩫ︀ၵ︀ံမ︀ေ︀ပꩫ︀ႃ ။ ၸ︀ၞ်ꩭူမ︀ႃꩭေ︀ႃၺꩫ︀်ၸ︀ြႃကꩭၞ်ꩫ︀ႝမ︀ွက︀်လ︀ွ် ꩡ︀ွ်တ︀ႃ ။ ꩬ︀ုတ︀်ယ︀ွ် ။
ဝ︀ွႃꩭင︀်ထ︀ႝꩫ︀ႃ ။
CC @sven-oly
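
To gauge how much of a string like the one above is made up of U+FE00, a quick count can help. This is a minimal sketch, not part of the detector: `code_point_histogram`, `vs1_share`, and the short `sample` are hypothetical names and data made up for illustration.

```python
from collections import Counter

VS1 = "\uFE00"  # U+FE00 VARIATION SELECTOR-1


def code_point_histogram(text: str) -> Counter:
    """Count each non-whitespace code point in the text."""
    return Counter(ch for ch in text if not ch.isspace())


def vs1_share(text: str) -> float:
    """Return the fraction of non-whitespace code points that are U+FE00."""
    counts = code_point_histogram(text)
    total = sum(counts.values())
    return counts[VS1] / total if total else 0.0


# Hypothetical short sample: two Myanmar letters, each followed by U+FE00,
# plus two combining marks. Paste the real string here to measure it instead.
sample = "\u1010\uFE00\u102D\u1015\uFE00\u103A"

print(f"U+FE00 share: {vs1_share(sample):.2%}")
for ch, n in code_point_histogram(sample).most_common():
    print(f"U+{ord(ch):04X}: {n}")
```

A high share would suggest that code points the model has not seen account for a large part of the input, which fits the idea that adding U+FE00 to the model could change the result.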