92 commits
78f8ef0
Added HTML output support
May 24, 2016
a1837cd
Clear variable every time .to_html is called
May 24, 2016
a32f3c0
Added my name
May 24, 2016
7a8fe34
Added examples
May 24, 2016
05fbd79
Fixed 404 on image
May 24, 2016
d82027a
Update api.rst
May 24, 2016
e58208d
Added tests for dragonmapper.html
May 24, 2016
cb9c23a
Tweeked style
May 24, 2016
6f8144a
Added 'html' as a tag
May 24, 2016
795ed9c
Fixed formatting errors
May 24, 2016
c34c05a
Merge branch 'develop' of https://github.com/TTWNO/dragonmapper into …
May 24, 2016
a3c1616
Added description of image
May 24, 2016
3a80b8e
Tweaked contribution name
May 24, 2016
1b69123
Added bullet point
May 24, 2016
07895ec
Added file containing correct Unit testing strings
May 24, 2016
2d94722
More compliant with PEP8
May 24, 2016
59011f4
Added link to HTML test data file
May 24, 2016
b35122b
Fixed v3 compatibility issue
May 24, 2016
af20557
Removed debugging print statements from Unit Testing
May 24, 2016
025389b
Added internal functions for striping grammar from phonetic strings
May 25, 2016
bffcb98
Changed var name to reflect use
May 25, 2016
f3c2f1a
Changed var name to reflect use, again
May 25, 2016
3c8707c
Changed var name to reflect use, again
May 25, 2016
344ce5b
Make _stackify() XML compliant
May 25, 2016
339e527
Added special style for punctuation marks
May 26, 2016
2a05e10
Now can pass hanzi.to_zhuyin() directly into .to_html()
May 26, 2016
5ae6718
CHanged to reflact changes
May 26, 2016
a123bdd
Add methods to deal with punctuation
May 26, 2016
c640d20
Removed .replace(' ', ' ')
May 26, 2016
01fc2c2
Added keep_puct argument. Allowing for pronunciation to be preserved
May 26, 2016
5ab3663
Added style for tone-mark
May 26, 2016
bbefcc7
split_punct() added. Adds spacing so HTML formatting will work properly
May 27, 2016
8fd7ae9
Moved split_punct() to html file, as it is a special function
May 27, 2016
ab6e93f
Updated tests to include internal functions
May 27, 2016
ffe8e24
<br> -> <br /> (according to XHML standard)
May 27, 2016
4454095
Now can pass hanzi.to_zhuyin(s) directly into .to_html()
May 27, 2016
31a8372
Fixed too-long line
May 27, 2016
29b9311
More compliant with PEP8
May 27, 2016
bac424e
Fixed false-nagative tests
May 27, 2016
f44ed68
Added ? to punctuation list
May 27, 2016
3a83f10
Added simple remove_grammar() function
May 27, 2016
a1bb926
Added test for remove_punct()
May 27, 2016
7381c63
Changed zh_s to s: usage
May 27, 2016
b8e6072
More PEP8 compliant
May 27, 2016
d3b9ba6
Fixed incorrect variable name in ._split_punct()
May 27, 2016
da64207
Fixed test for trans.remove_punct()
May 27, 2016
eb3ae95
Updated README with new features
May 27, 2016
a1d9ef4
Update CHANGES.rst
May 28, 2016
e572d9e
Forced Pinyin compatible font
Jun 2, 2016
790b3c1
Merge branch 'develop' of https://github.com/TTWNO/dragonmapper into …
Jun 2, 2016
69e7d59
Removed _split_phons(): Obsolete
Jun 3, 2016
eee9ade
Made _split_punct() output cleaner. Updated tests to reflect so.
Jun 3, 2016
d2ec1a4
PEP8 Compliency
Jun 3, 2016
4bd47e4
Commented api.rst for testing
Jun 3, 2016
dd54c8e
Removed uncesseary lines
Jun 3, 2016
58f1f7e
Updated zhon version
Jun 3, 2016
4d8f9f5
Update README.rst
Jun 3, 2016
5a491f9
Update README.rst
Jun 3, 2016
c34e19c
Update README.rst
Jun 3, 2016
0da9824
Update README.rst
Jun 3, 2016
d3f0649
Make _stackify() simpler + removed unneeded <br />
Jun 10, 2016
334c392
<tobdy> -> <tbody>
Jun 10, 2016
bba2667
<tobdy> -> <tbody> in test cases
Jun 10, 2016
5234bce
Cleaned up enumerate() lines
Jun 10, 2016
6fc461c
More exaustive list of punctuation
Jun 10, 2016
b7fbb65
Merge branch 'develop' of https://github.com/TTWNO/dragonmapper into …
Jun 10, 2016
55bed18
removed uneeded [1,2,3,4,5]
Jun 10, 2016
71c4c3a
Removed split_punct argument. Added tests
Jun 10, 2016
e9f1845
Removed long sections of code, replaced with functions
Jun 13, 2016
714f45c
Add test for new functions
Jun 13, 2016
98c4639
Changes variable name for clarity
Jun 13, 2016
a5c4fcd
Removed unneeded variables
Jun 13, 2016
225af7e
Added proper attribution
Jun 14, 2016
3c03a6d
moved attribution from here
Jun 14, 2016
9d57d92
Added class='xyz character'
Jun 20, 2016
9e93c67
PEP8
Jun 20, 2016
dbd2df2
Added CSS classes
Jul 12, 2016
c7af733
Updated tests
Jul 12, 2016
61ef624
PEP8
Jul 12, 2016
2c7787a
Added more simple ruby function
Jul 30, 2016
e39bf68
Fixed some short sound like yo and o causeing errors. Added tests
Jul 30, 2016
3d0a4b0
Merge branch 'test2' into test1
Jul 30, 2016
6278c8a
Changed to more common pronunciation 誰(shui)->shei
Aug 3, 2016
165f2f0
Overhauled.. Using <ruby> insead of <table>
Aug 3, 2016
1cfa75c
Added tests
Aug 3, 2016
f49a8ed
PEP8
Aug 3, 2016
3d3d89d
Updated README with more accurate description of program
Aug 3, 2016
7fdc46b
Add slightly better to read code
Nov 30, 2016
87a146b
Fix flake8 tests
Nov 30, 2016
03028f4
Fix python2.x errors
Nov 30, 2016
b0afd6f
Fix error with '嗲' causing ValueError
Dec 1, 2016
f459638
Make examples in docs more readable
Dec 1, 2016
10 changes: 9 additions & 1 deletion AUTHORS.rst
@@ -10,4 +10,12 @@ Author and Maintainer
Contributors
------------

None yet. Why not be the first?
* Tait Hoyem <https://github.com/TTWNO> — HTML Formatting

Why not be the second? :-)

Attribution
------------

* Sun Jianai - FZKai-Extended font [used in pictures]
* Google - `Source Sans Pro, Normal 400 <https://www.google.com/fonts#QuickUsePlace:quickUse/Family:Source+Sans+Pro>`_ [used for Pinyin font]
5 changes: 5 additions & 0 deletions CHANGES.rst
@@ -3,6 +3,11 @@
Change Log
----------

0.3.0 (2016-05-27)
++++++++++++++++++

* Added HTML Formatting.

0.2.6 (2016-05-23)
++++++++++++++++++

35 changes: 28 additions & 7 deletions README.rst
@@ -4,7 +4,7 @@ Dragon Mapper

.. image:: https://badge.fury.io/py/dragonmapper.png
:target: http://badge.fury.io/py/dragonmapper

.. image:: https://travis-ci.org/tsroten/dragonmapper.png?branch=develop
:target: https://travis-ci.org/tsroten/dragonmapper

@@ -22,27 +22,48 @@ Features
Phonetic Alphabet.
* Identify a string as Traditional or Simplified Chinese, Pinyin, Zhuyin, or
the International Phonetic Alphabet.
* Output HTML of characters with Pinyin attached to them.

.. code:: python

>>> from dragonmapper import hanzi
>>> s = '我是一个美国人。'
>>> dragonmapper.hanzi.is_simplified(s)
>>> hanzi.is_simplified(s)
True
>>> dragonmapper.hanzi.to_pinyin(s)
>>> hanzi.to_pinyin(s)
'wǒshìyīgèměiguórén。'
>>> dragonmapper.hanzi.to_pinyin(s, all_readings=True)
>>> hanzi.to_pinyin(s, all_readings=True)
'[wǒ][shì/shi/tí][yī][gè/ge/gě/gàn][měi][guó][rén/ren]。'

.. code:: python

>>> from dragonmapper import transcriptions as trans
>>> s = 'Wǒ shì yīgè měiguórén.'
>>> dragonmapper.transcriptions.is_pinyin(s)
>>> trans.is_pinyin(s)
True
>>> dragonmapper.transcriptions.pinyin_to_zhuyin(s)
>>> trans.pinyin_to_zhuyin(s)
'ㄨㄛˇ ㄕˋ ㄧ ㄍㄜˋ ㄇㄟˇ ㄍㄨㄛˊ ㄖㄣˊ.'
>>> dragonmapper.transcriptions.pinyin_to_ipa(s)
>>> trans.pinyin_to_ipa(s)
'wɔ˧˩˧ ʂɨ˥˩ i˥ kɤ˥˩ meɪ˧˩˧ kwɔ˧˥ ʐən˧˥.'

.. code:: python

>>> from dragonmapper import transcriptions as trans
>>> from dragonmapper import hanzi
>>> from dragonmapper import html
>>> s = "我是加拿大人"
>>> zh = hanzi.to_zhuyin(s)
>>> pi = trans.zhuyin_to_pinyin(zh).split(' ')
>>> pi
['wǒ', 'shì', 'jiā', 'ná', 'dà', 'rén']
>>> h = html.to_html(s, top=pi)
>>> print(h)

* The detour through Zhuyin is only for spacing: it yields one space-separated syllable per character, which ``split(' ')`` turns into a list. You can also space out the readings yourself.
* Note: only ``top`` is available right now, because browsers do not currently support placing the annotation elsewhere.

.. image:: https://s25.postimg.org/4s44wylcv/Screenshot_from_2016_08_03_15_59_03.png
:target: https://postimg.org/image/o9yscwiaj/

Getting Started
---------------
* `Install Dragon Mapper <http://dragonmapper.readthedocs.org/en/latest/installation.html>`_
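
The README example above pairs each character with a reading placed on top, and the commit log mentions a switch from `<table>` to `<ruby>`. The sketch below shows roughly what that pairing could look like. It is not the PR's `to_html()`; the helper name `ruby_annotate` and its exact output format are assumptions made for illustration.

```python
from html import escape


def ruby_annotate(characters, readings):
    """Pair each character with its reading using <ruby>/<rt> markup.

    Hypothetical helper for illustration only; dragonmapper's own
    html.to_html() may produce different markup.
    """
    if len(characters) != len(readings):
        raise ValueError("need exactly one reading per character")
    parts = []
    for char, reading in zip(characters, readings):
        # <rt> holds the annotation that browsers render above the base text.
        parts.append(
            "<ruby>{0}<rt>{1}</rt></ruby>".format(escape(char), escape(reading))
        )
    return "".join(parts)


print(ruby_annotate("我是加拿大人", ["wǒ", "shì", "jiā", "ná", "dà", "rén"]))
```

Requiring one reading per character is a deliberate simplification: it makes a mismatch between the text and the split transcription fail loudly instead of silently misaligning the annotations.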
6 changes: 6 additions & 0 deletions docs/api.rst
@@ -234,3 +234,9 @@ lines of code.
.. autofunction:: to_zhuyin

.. autofunction:: to_ipa

HTML Conversion:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Creates HTML from Chinese characters and the transcriptions you supply for them.

#.. autofunction:: to_html
14 changes: 8 additions & 6 deletions docs/index.rst
@@ -15,22 +15,24 @@ functions for Chinese text processing:

.. code:: python

>>> from dragonmapper import hanzi
>>> s = '我是一个美国人。'
>>> dragonmapper.hanzi.is_simplified(s)
>>> hanzi.is_simplified(s)
True
>>> dragonmapper.hanzi.to_pinyin(s)
>>> hanzi.to_pinyin(s)
'wǒshìyīgèměiguórén。'
>>> dragonmapper.hanzi.to_pinyin(s, all_readings=True)
>>> hanzi.to_pinyin(s, all_readings=True)
'[wǒ][shì/shi/tí][yī][gè/ge/gě/gàn][měi][guó][rén/ren]。'

.. code:: python

>>> from dragonmapper import transcriptions as trans
>>> s = 'Wǒ shì yīgè měiguórén.'
>>> dragonmapper.transcriptions.is_pinyin(s)
>>> trans.is_pinyin(s)
True
>>> dragonmapper.transcriptions.pinyin_to_zhuyin(s)
>>> trans.pinyin_to_zhuyin(s)
'ㄨㄛˇ ㄕˋ ㄧ ㄍㄜˋ ㄇㄟˇ ㄍㄨㄛˊ ㄖㄣˊ.'
>>> dragonmapper.transcriptions.pinyin_to_ipa(s)
>>> trans.pinyin_to_ipa(s)
'wɔ˧˩˧ ʂɨ˥˩ i˥ kɤ˥˩ meɪ˧˩˧ kwɔ˧˥ ʐən˧˥.'

If this is your first time using Dragon Mapper, check out the :doc:`installation`.
40 changes: 40 additions & 0 deletions dragonmapper/data/default-style.css
@@ -0,0 +1,40 @@
/**
Default CSS style for dragonmapper
**/

@import url(https://fonts.googleapis.com/css?family=Source+Sans+Pro);

.hanzi{
font-size: 2em;
line-height: 1em;
text-align: center;
vertical-align: middle;
}
.punct{
font-size: 1.5em;
line-height: 1em;
text-align: center;
vertical-align: middle;
}
.zhuyin{
font-size: 0.6em;
line-height: 1em;
text-align: center;
vertical-align: middle;
}
.pinyin{
font-size: 1em;
line-height: 1em;
/** Some fonts have excess space on accented Pinyin characters;
setting the font fixes this problem. **/
font-family: 'Source Sans Pro', sans-serif;
text-align: center;
vertical-align: middle;
}
.tone-mark{
font-size: 1em;
text-align: center;
}
.unknown{
visibility: collapse;
}
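
The stylesheet above defines one class per kind of token (`hanzi`, `punct`, `zhuyin`, `pinyin`, `tone-mark`, `unknown`). As a rough illustration of how a single character might be mapped to one of those class names, here is a minimal sketch. The function `css_class_for` and its Unicode-range rules are assumptions, not code from this PR, and it only covers four of the classes.

```python
import unicodedata


def css_class_for(token):
    """Guess a stylesheet class for a one-character token.

    Illustrative only: the ranges below are a simplification and are not
    taken from dragonmapper.
    """
    code = ord(token)
    if 0x4E00 <= code <= 0x9FFF:  # CJK Unified Ideographs
        return "hanzi"
    if 0x3100 <= code <= 0x312F:  # Bopomofo (Zhuyin) block
        return "zhuyin"
    if unicodedata.category(token).startswith("P"):  # any punctuation
        return "punct"
    return "unknown"


for ch in "我ㄨ。":
    print(ch, css_class_for(ch))
```

Anything that falls through to `unknown` would be hidden by the `.unknown` rule, so a real classifier would need to recognise Pinyin and tone marks before reaching that fallback.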
4 changes: 2 additions & 2 deletions dragonmapper/data/hanzi_pinyin_characters.tsv
@@ -24969,8 +24969,8 @@
䳠 shuì/zhù
脽 shuí
𧀣 shuí
shuí
shuí/shéi
shéi
誰 shéi/shuí
鎙 shuò
硕 shuò
𠲿 shuò
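
The hunk above reorders the readings for 誰 so that shéi comes first. A minimal sketch of why order could matter, assuming each data line is `HANZI`, whitespace, then `/`-separated readings, and assuming the first listed reading is treated as the default (the helper name `parse_reading_line` is hypothetical):

```python
def parse_reading_line(line):
    """Split one data line into (hanzi, [readings]).

    Assumes the hanzi and its readings are separated by whitespace and
    that alternative readings are separated by '/'.
    """
    hanzi, readings = line.split(None, 1)
    return hanzi, readings.strip().split("/")


hanzi, readings = parse_reading_line("誰 shéi/shuí")
default_reading = readings[0]  # the reading a single-result lookup would pick
print(hanzi, default_reading, readings)
```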
3 changes: 3 additions & 0 deletions dragonmapper/data/transcriptions.csv
@@ -65,6 +65,7 @@ dei,ㄉㄟ,teɪ
den,ㄉㄣ,tən
deng,ㄉㄥ,tɤŋ
di,ㄉㄧ,ti
dia,ㄉㄧㄚ,tjɑ
dian,ㄉㄧㄢ,tjɛn
diang,ㄉㄧㄤ,tjɑŋ
diao,ㄉㄧㄠ,tjɑʊ
@@ -234,6 +235,7 @@ nun,ㄋㄨㄣ,nwən
nuo,ㄋㄨㄛ,nwɔ
nü,ㄋㄩ,ny
nüe,ㄋㄩㄝ,nɥœ
o,ㄛ,wɔ
ou,ㄡ,oʊ
pa,ㄆㄚ,pʰa
pai,ㄆㄞ,pʰaɪ
@@ -368,6 +370,7 @@ ye,ㄧㄝ,jɛ
yi,ㄧ,i
yin,ㄧㄣ,in
ying,ㄧㄥ,iŋ
yo,ㄧㄛ,jʊ
yong,ㄩㄥ,yʊŋ
you,ㄧㄡ,yoʊ
yu,ㄩ,y
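
The three rows added above (`dia`, `o`, `yo`) extend a table whose columns appear to be Pinyin, Zhuyin, IPA. The sketch below shows how such rows could be loaded into a lookup keyed by Pinyin. The column order is inferred from the rows shown, and `load_transcriptions` is a hypothetical name, not dragonmapper's loader.

```python
import csv
import io

# The three rows this PR adds, copied from the hunks above.
SAMPLE = "dia,ㄉㄧㄚ,tjɑ\no,ㄛ,wɔ\nyo,ㄧㄛ,jʊ\n"


def load_transcriptions(text):
    """Build {pinyin: {'zhuyin': ..., 'ipa': ...}} from CSV text.

    Assumes exactly three columns per row: pinyin, zhuyin, ipa.
    """
    table = {}
    for pinyin, zhuyin, ipa in csv.reader(io.StringIO(text)):
        table[pinyin] = {"zhuyin": zhuyin, "ipa": ipa}
    return table


table = load_transcriptions(SAMPLE)
print(table["dia"]["zhuyin"], table["yo"]["ipa"])
```

A syllable missing from this kind of table has nothing to map to, which is consistent with the commit notes about `yo`, `o` and 嗲 causing errors before these rows were added.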
51 changes: 43 additions & 8 deletions dragonmapper/hanzi.py
@@ -34,7 +34,8 @@


def _load_data():
"""Load the word and character mapping data into a dictionary.
r"""
Load the word and character mapping data into a dictionary.

In the data files, each line is formatted like this:
HANZI PINYIN_READING/PINYIN_READING
@@ -58,8 +59,9 @@ def _load_data():
_WORDS = _HANZI_PINYIN_MAP['words']


def _hanzi_to_pinyin(hanzi):
"""Return the Pinyin reading for a Chinese word.
def _hanzi_to_pinyin(hanzi, DICT=None):
"""
Return the Pinyin reading for a Chinese word.

If the given string *hanzi* matches a CC-CEDICT word, the return value is
formatted like this: [WORD_READING1, WORD_READING2, ...]
@@ -71,10 +73,18 @@ def _hanzi_to_pinyin(hanzi):
original character is returned, e.g. [[CHAR_READING1, ...], CHAR, ...]

"""
if DICT is None:
DICT = _HANZI_PINYIN_MAP

try:
return _HANZI_PINYIN_MAP['words'][hanzi]
return DICT['words'][hanzi]
except KeyError:
return [_CHARACTERS.get(character, character) for character in hanzi]
return [
DICT['characters'].get(
character,
character)
for character in hanzi
]


def _enclose_readings(container, readings):
@@ -88,7 +98,8 @@ def _enclose_readings(container, readings):

def to_pinyin(s, delimiter=' ', all_readings=False, container='[]',
accented=True):
"""Convert a string's Chinese characters to Pinyin readings.
"""
Convert a string's Chinese characters to Pinyin readings.

*s* is a string containing Chinese characters. *accented* is a
boolean value indicating whether to return accented or numbered Pinyin
@@ -169,7 +180,8 @@ def to_pinyin(s, delimiter=' ', all_readings=False, container='[]',


def to_zhuyin(s, delimiter=' ', all_readings=False, container='[]'):
"""Convert a string's Chinese characters to Zhuyin readings.
"""
Convert a string's Chinese characters to Zhuyin readings.

*s* is a string containing Chinese characters.

@@ -192,7 +204,8 @@ def to_zhuyin(s, delimiter=' ', all_readings=False, container='[]'):


def to_ipa(s, delimiter=' ', all_readings=False, container='[]'):
"""Convert a string's Chinese characters to IPA.
"""
Convert a string's Chinese characters to IPA.

*s* is a string containing Chinese characters.

@@ -212,3 +225,25 @@ def to_ipa(s, delimiter=' ', all_readings=False, container='[]'):
numbered_pinyin = to_pinyin(s, delimiter, all_readings, container, False)
ipa = pinyin_to_ipa(numbered_pinyin)
return ipa


def to_jyutping(s, delimiter=' ', all_readings=False, container='[]'):
"""
Convert a string's Chinese characters to Jyutping.

*s* is a string containing Chinese characters.

*delimiter* is the character used to indicate word boundaries in *s*.
This is used to differentiate between words and characters so that a more
accurate reading can be returned.

*all_readings* is a boolean value indicating whether or not to return all
possible readings in the case of words/characters that have multiple
readings. *container* is a two character string that is used to
enclose words/characters if *all_readings* is ``True``. The default
``'[]'`` is used like this: ``'[READING1/READING2]'``.

Characters not recognized as Chinese are left untouched.

"""
    raise NotImplementedError('Jyutping conversion is not implemented yet.')
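
The `_hanzi_to_pinyin(hanzi, DICT=None)` change earlier in this file makes the mapping injectable while keeping the module-level data as the default. A minimal standalone sketch of that lookup pattern follows: try the whole string as a word, then fall back to per-character readings, leaving unknown characters untouched. The toy data and the names `_DEFAULT_MAP` and `lookup_readings` are made up for illustration.

```python
# Toy stand-in for the real word/character data; the values are illustrative.
_DEFAULT_MAP = {
    "words": {"美国": ["Měiguó"]},
    "characters": {"美": ["měi"], "国": ["guó"]},
}


def lookup_readings(hanzi, mapping=None):
    """Return word readings, or per-character readings as a fallback."""
    if mapping is None:
        # Resolve the default at call time rather than in the signature.
        mapping = _DEFAULT_MAP
    try:
        return mapping["words"][hanzi]
    except KeyError:
        # Unknown characters are passed through unchanged.
        return [mapping["characters"].get(ch, ch) for ch in hanzi]


print(lookup_readings("美国"))
print(lookup_readings("国x"))
```

Using `None` as the sentinel means a caller can pass a different table (for example a future Jyutping map) without touching the function body.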