tace16-utf8-converter

Tamil Unicode (From Wikipedia)

Tamil is a Unicode block containing characters for the Tamil, Badaga, and Saurashtra languages of Tamil Nadu, India; Sri Lanka; Singapore; and Malaysia. In its original incarnation, the code points U+0B82..U+0BCD were a direct copy of the Tamil characters A2-ED from the 1988 ISCII standard. The Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Telugu, Kannada, and Malayalam blocks were similarly all based on their ISCII encodings.
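As a small illustration of what this means for UTF-8 input, the sketch below checks whether a character falls inside the Tamil block (U+0B80..U+0BFF) and prints its UTF-8 bytes. The helper names `in_tamil_block` and `utf8_hex` are made up for this example and are not part of tace16.py.

```python
# Tamil Unicode block: U+0B80..U+0BFF (range end is exclusive).
TAMIL_BLOCK = range(0x0B80, 0x0C00)


def in_tamil_block(ch: str) -> bool:
    """Return True if the single character ch lies in the Tamil block."""
    return ord(ch) in TAMIL_BLOCK


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of ch as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))


if __name__ == "__main__":
    ka = "\u0b95"  # TAMIL LETTER KA
    print(in_tamil_block(ka))   # True
    print(in_tamil_block("a"))  # False
    print(utf8_hex(ka))         # e0 ae 95
```

Every code point in the Tamil block takes three bytes in UTF-8, which is why a UTF-8 dataset has to be decoded to code points before any per-letter processing.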

TACE16 (From Wikipedia)

Tamil All Character Encoding (TACE16) is a 16-bit Unicode-based character encoding scheme for the Tamil language.

TACE16 is better suited to Tamil grammar and differs slightly from UTF-16 encoding. I couldn't find any codec helpers for TACE16, so I had been planning to write one for a while. Luckily, I was experimenting with Tamil name generation using neural networks last week, and the two projects gave me a symbiotic motivation to complete this one. This is still a work in progress and does not yet include converters for other Unicode encodings such as UTF-32 and UTF-16. I chose to do UTF-8 first because my dataset is in that form.
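To make the grammar point concrete: in Unicode, a consonant-vowel letter is stored as a consonant followed by a combining vowel sign (or the pulli), whereas TACE16 assigns one code to each such letter. A converter therefore has to group code points into letter units before it can look anything up. The sketch below shows one possible way to do that grouping. It is not the code in tace16.py; `split_units` and `encode_units` are hypothetical names, and the lookup table is deliberately left to the caller because the actual TACE16 code values must come from the TACE16 chart.

```python
import unicodedata


def split_units(text: str) -> list[str]:
    """Group a string into letter units: a base character plus any
    combining marks (vowel signs, pulli) that follow it."""
    units: list[str] = []
    # NFC so that vowel signs typed as two parts are stored consistently.
    for ch in unicodedata.normalize("NFC", text):
        # Unicode categories Mn/Mc/Me are combining marks.
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch
        else:
            units.append(ch)
    return units


def encode_units(units: list[str], table: dict[str, int]) -> list[int]:
    """Map each unit to a 16-bit code using a caller-supplied table.

    ASSUMPTION: `table` is a hypothetical dict from letter unit to
    TACE16 code value; it is not defined here. Raises KeyError for
    units the table does not cover.
    """
    return [table[u] for u in units]


if __name__ == "__main__":
    word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # "தமிழ்", 5 code points
    print(split_units(word))  # ['த', 'மி', 'ழ்']
```

The five code points of the word collapse into three letter units, which is the granularity at which a TACE16 table would be indexed.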

