Open
47 commits
d0a0db7
Ordinals class done
shreeshd-tn Oct 9, 2025
2fceb3d
Merge pull request #1 from shreeshd-tn/ordinals
shreeshd-tn Oct 9, 2025
36866b2
Future Implementations for classes - Measure, Money, and Date (#258)
ngachchi Apr 22, 2025
da87d4c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 22, 2025
1a2c48e
Potential fix for code scanning alert no. 821: Unused local variable
mgrafu Apr 23, 2025
513db90
Hindi TN Future Implementations 2.0. - Fraction, Measure and Time (#310)
ngachchi Aug 27, 2025
0d3f383
Hindi TN 2.0 - Telephone class integration from staging branch (#320)
shreeshd-tn Sep 9, 2025
d8009c8
template files for new address class
shreeshd-tn Sep 23, 2025
4d2c52d
Basic address structure
shreeshd-tn Sep 24, 2025
d5f2726
basic functionality
shreeshd-tn Sep 26, 2025
dd85e25
slight improvement - 4 to 19 percent
shreeshd-tn Sep 26, 2025
6b0543e
Suffix based approach
shreeshd-tn Sep 26, 2025
f2bb68a
Address is working at the cost of cardinal
shreeshd-tn Sep 26, 2025
e0c0a61
telephone_copy
shreeshd-tn Sep 30, 2025
d74c975
Good checkpoint, but sparrowhawk is failing
shreeshd-tn Sep 30, 2025
9b48ae9
Context + Complex numbers matching
shreeshd-tn Sep 30, 2025
a081e85
Context based and improved format matching
shreeshd-tn Sep 30, 2025
39b0368
formatting changes and new context words
shreeshd-tn Oct 8, 2025
157927c
Fully tested ordinals
shreeshd-tn Oct 13, 2025
10f24c0
address is good, but telephone was affected
shreeshd-tn Oct 14, 2025
7adf096
simplified context + weight changes
shreeshd-tn Oct 15, 2025
b484d14
Working greedy address + regressions
shreeshd-tn Oct 28, 2025
fcebf16
Merge staging Hindi TN v2 to main (#346)
mgrafu Oct 31, 2025
62d60c1
Merged with latest main
shreeshd-tn Nov 3, 2025
30a6123
Context + window
shreeshd-tn Nov 4, 2025
2d2a70e
Messy working code
shreeshd-tn Nov 11, 2025
844212d
Cleaned code ready for PR
shreeshd-tn Nov 12, 2025
54f23d1
Merge branch 'address_context' into staging_hi_tn
shreeshd-tn Nov 12, 2025
a090507
Minor fixes
shreeshd-tn Nov 12, 2025
828b8ef
Slight changes
shreeshd-tn Nov 12, 2025
8e0bec7
Slight fixes
shreeshd-tn Nov 12, 2025
ab3a31b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2025
c05ac5c
Jenkins date change
shreeshd-tn Nov 12, 2025
e295a78
Missed in merge
shreeshd-tn Nov 13, 2025
46e7908
Sparrowhawk punctuation fix
shreeshd-tn Nov 17, 2025
de26996
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 18, 2025
c23bf3a
Review changes + sparrowhawk fix
shreeshd-tn Nov 20, 2025
dd01128
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 20, 2025
bec1466
Removed unecessary changes
shreeshd-tn Nov 21, 2025
53315e3
Reduced context to improve compilation time
shreeshd-tn Dec 8, 2025
c29bb09
Combined approaches
shreeshd-tn Dec 9, 2025
6c4d2ce
Review changes
shreeshd-tn Dec 9, 2025
379d994
Almost done
shreeshd-tn Dec 12, 2025
6cf74e8
Ready to merge
shreeshd-tn Dec 12, 2025
ec57c44
Merge branch 'combined' into staging_hi_tn
shreeshd-tn Dec 12, 2025
6f18486
Corrected data files
shreeshd-tn Dec 12, 2025
567c0e7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 12, 2025
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -26,7 +26,7 @@ pipeline {
HY_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/03-12-24-0'
MR_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/03-12-24-1'
JA_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/10-17-24-1'
-HI_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/10-31-25-0'
+HI_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/12-09-25-0'
DEFAULT_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/06-08-23-0'
}
stages {
@@ -0,0 +1,13 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
36 changes: 36 additions & 0 deletions nemo_text_processing/text_normalization/hi/data/address/cities.tsv
@@ -0,0 +1,36 @@
अमरावती
ईटानगर
दिसपुर
पटना
रायपुर
पणजी
गांधीनगर
चंडीगढ़
शिमला
रांची
बेंगलुरु
तिरुवनंतपुरम
भोपाल
मुंबई
इम्फाल
शिलांग
आइजोल
कोहिमा
भुवनेश्वर
जयपुर
गंगटोक
चेन्नई
हैदराबाद
अगरतला
लखनऊ
देहरादून
कोलकाता
पोर्ट ब्लेयर
दमन
नई दिल्ली
श्रीनगर
जम्मू
लेह
कारगिल
कवरत्ती
पुडुचेरी
@@ -0,0 +1,49 @@
हाउस
प्लॉट
बूथ
अपार्टमेंट
फ्लैट
यूनिट
टावर
कॉम्प्लेक्स
मंजिल
फ्लोर
ब्लॉक
सेक्टर
फेज
रोड
सड़क
मार्ग
स्ट्रीट
गली
राजमार्ग
ड्राइव
डिस्ट्रिक्ट
बाईपास
हाइवे
पार्कवे
कॉलोनी
नगर
पार्क
एस्टेट
क्षेत्र
बोलवार्ड
मार्केट
सेंटर
पिन
गांव
पास
ब्रिगेड
नियर
स्क्वेर
मॉल
टॉवर
इंस्टीट्यूट
पिलर
मेट्रो
एवेन्यू
वेस्ट
सामने
पीछे
वीया
आर डी
@@ -0,0 +1,2 @@
street स्ट्रीट
southern सदर्न
@@ -0,0 +1,26 @@
A ए
B बी
C सी
D डी
E ई
F एफ
G जी
H एच
I आई
J जे
K के
L एल
M एम
N एन
O ओ
P पी
Q क्यू
R आर
S एस
T टी
U यू
V वी
W डब्ल्यू
X एक्स
Y वाई
Z ज़ेड
@@ -0,0 +1,2 @@
- हाइफ़न
/ बटा
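The letter table and the two-row symbol table above each pair a written character with a Hindi spoken form. As a rough illustration of what such a lookup does, here is a minimal Python sketch. The names `LETTERS`, `SYMBOLS`, and `spell_out` are hypothetical, the dictionaries copy only a few rows, and how the PR's grammar actually consumes these files is not shown in this excerpt.

```python
# Hypothetical illustration, not code from this PR.
# A few rows copied from the letter and symbol tables above.
LETTERS = {"A": "ए", "B": "बी", "C": "सी"}
SYMBOLS = {"-": "हाइफ़न", "/": "बटा"}


def spell_out(token: str) -> str:
    """Spell a token character by character; unknown characters pass through."""
    table = {**LETTERS, **SYMBOLS}
    return " ".join(table.get(ch, ch) for ch in token)


print(spell_out("A-B/C"))
```

A plain dictionary lookup like this ignores context; the commit messages above suggest the PR's address handling relies on context words, which this sketch does not attempt.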
36 changes: 36 additions & 0 deletions nemo_text_processing/text_normalization/hi/data/address/states.tsv
@@ -0,0 +1,36 @@
आंध्र प्रदेश
अरुणाचल प्रदेश
असम
बिहार
छत्तीसगढ़
गोवा
गुजरात
हरियाणा
हिमाचल प्रदेश
झारखंड
कर्नाटक
केरल
मध्य प्रदेश
महाराष्ट्र
मणिपुर
मेघालय
मिज़ोरम
नागालैंड
ओडिशा
पंजाब
राजस्थान
सिक्किम
तमिलनाडु
तेलंगाना
त्रिपुरा
उत्तर प्रदेश
उत्तराखंड
पश्चिम बंगाल
अंडमान और निकोबार द्वीप समूह
चंडीगढ़
दादरा और नगर हवेली और दमन और दीव
दिल्ली
जम्मू और कश्मीर
लद्दाख
लक्षद्वीप
पुडुचेरी
@@ -8,4 +8,3 @@ hp हॉर्सपॉवर
d दिन
month महीना
months महीने

@@ -0,0 +1,10 @@
0 ०
1 १
2 २
3 ३
4 ४
5 ५
6 ६
7 ७
8 ८
9 ९
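The ten rows above map each ASCII digit to its Devanagari counterpart. The same mapping can be sketched with the Python standard library. The names `ASCII_TO_DEVANAGARI` and `to_devanagari_digits` are hypothetical and not part of this PR, which presumably feeds the table to a grammar rather than to a string translation.

```python
# Hypothetical illustration, not code from this PR.
# Translation table built from the ten digit pairs listed above.
ASCII_TO_DEVANAGARI = str.maketrans("0123456789", "०१२३४५६७८९")


def to_devanagari_digits(text: str) -> str:
    """Replace every ASCII digit in text with its Devanagari digit."""
    return text.translate(ASCII_TO_DEVANAGARI)


print(to_devanagari_digits("110001"))
```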
@@ -6,7 +6,20 @@
३री तीसरी
४था चौथा
४थी चौथी
५वां पाँचवां
५वीं पाँचवीं
६ठा छठा
६ठी छठी
१st फ़र्स्ट
२nd सेकंड
३rd थर्ड
४th फ़ोर्थ
५th फ़िफ्थ
६th सिक्स्थ
७th सेवंथ
८th एटथ
९th नाइंथ
१०th टेंथ
११th इलेवंथ
१२th ट्वेल्फ्थ
१३th थर्टींथ
१४th फोर्टींथ
१५th फिफ्टींथ
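The data files in this diff appear to be two-column tables of written form and spoken form. Assuming the columns are tab-separated, as the `.tsv` extension suggests, a table like the ordinal one above could be read into a dictionary as sketched below. `load_pairs` is a hypothetical helper, and how the PR itself loads these files is not shown in this excerpt.

```python
# Hypothetical illustration, not code from this PR.
import csv
import io


def load_pairs(tsv_text: str) -> dict[str, str]:
    """Parse two-column, tab-separated text into a written -> spoken dict."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    # Rows that do not have exactly two columns are skipped.
    return {row[0]: row[1] for row in reader if len(row) == 2}


# Two rows taken from the ordinal table above, joined with an assumed tab.
sample = "४था\tचौथा\n४थी\tचौथी\n"
ordinals = load_pairs(sample)
print(ordinals)
```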
7 changes: 7 additions & 0 deletions nemo_text_processing/text_normalization/hi/graph_utils.py
@@ -37,6 +37,13 @@
HI_SADHE = "साढ़े" # half more (X.5)
HI_PAUNE = "पौने" # quarter less (0.75)

# Hindi decimal representations
HI_POINT_FIVE = ".५" # .5
HI_ONE_POINT_FIVE = "१.५" # 1.5
HI_TWO_POINT_FIVE = "२.५" # 2.5
HI_DECIMAL_25 = ".२५" # .25
HI_DECIMAL_75 = ".७५" # .75

NEMO_LOWER = pynini.union(*string.ascii_lowercase).optimize()
NEMO_UPPER = pynini.union(*string.ascii_uppercase).optimize()
NEMO_ALPHA = pynini.union(NEMO_LOWER, NEMO_UPPER).optimize()
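Each new decimal constant in this hunk is annotated with its ASCII value. That pairing can be checked mechanically with the standard library, as in the sketch below. The constant values are copied from the hunk, while `DEVANAGARI_TO_ASCII` and `to_ascii` are hypothetical names used only for this check.

```python
# Hypothetical check, not code from this PR.
# Constant values copied from the graph_utils.py hunk above.
HI_POINT_FIVE = ".५"
HI_ONE_POINT_FIVE = "१.५"
HI_TWO_POINT_FIVE = "२.५"
HI_DECIMAL_25 = ".२५"
HI_DECIMAL_75 = ".७५"

DEVANAGARI_TO_ASCII = str.maketrans("०१२३४५६७८९", "0123456789")


def to_ascii(text: str) -> str:
    """Replace Devanagari digits with ASCII digits; other characters pass through."""
    return text.translate(DEVANAGARI_TO_ASCII)


for name, value in [
    ("HI_POINT_FIVE", HI_POINT_FIVE),
    ("HI_ONE_POINT_FIVE", HI_ONE_POINT_FIVE),
    ("HI_TWO_POINT_FIVE", HI_TWO_POINT_FIVE),
    ("HI_DECIMAL_25", HI_DECIMAL_25),
    ("HI_DECIMAL_75", HI_DECIMAL_75),
]:
    print(name, to_ascii(value))
```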