Hindi ITN: Telephone, Quarterly Measures, Fraction Exceptions, Changes to Date #301
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
ई.पू. ईसा पूर्व | ||
ई. ईस्वी | ||
ई. ईसवी | ||
वर्ष पूर्व वर्ष पूर्व | ||
शताब्दी शताब्दी |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,231 @@ | ||
१ दो | ||
२ तीन | ||
३ चार | ||
४ पाँच | ||
४ पांच | ||
५ छः | ||
५ छह | ||
५ छे | ||
६ सात | ||
७ आठ | ||
८ नौ | ||
९ दस | ||
१० ग्यारह | ||
११ बारह | ||
१२ तेरह | ||
१३ चौदह | ||
१४ पन्द्रह | ||
१४ पंद्रह | ||
१५ सोलह | ||
१६ सत्रह | ||
१६ सतरह | ||
१७ अठारह | ||
१७ अट्ठारह | ||
१८ उन्नीस | ||
१८ उनीस | ||
१९ बीस | ||
२० इक्कीस | ||
२० इकीस | ||
२० ईकीस | ||
२१ बाईस | ||
२१ बाइस | ||
२२ तेईस | ||
२२ तेइस | ||
२३ चौबीस | ||
२४ पच्चीस | ||
२४ पचीस | ||
२५ छब्बीस | ||
२५ छबीस | ||
२६ सत्ताईस | ||
२६ सत्ताइस | ||
२६ सताईस | ||
२६ सताइस | ||
२७ अट्ठाईस | ||
२७ अट्ठाइस | ||
२७ अठाईस | ||
२७ अठाइस | ||
२८ उनतीस | ||
२८ उन्तीस | ||
२९ तीस | ||
३० इकतीस | ||
३० इकतिस | ||
३० इकत्तीस | ||
३० इकत्तिस | ||
३१ बत्तीस | ||
३१ बत्तिस | ||
३१ बतीस | ||
३१ बतिस | ||
३२ तैंतीस | ||
३२ तैंतिस | ||
३२ तैंत्तीस | ||
३२ तैंत्तिस | ||
३२ तेतीस | ||
३२ तेंतीस | ||
३३ चौंतीस | ||
३३ चौंतिस | ||
३३ चौंत्तीस | ||
३३ चौंत्तिस | ||
३४ पैंतीस | ||
३४ पैंतिस | ||
३४ पैंत्तीस | ||
३४ पैंत्तिस | ||
३५ छत्तीस | ||
३५ छत्तिस | ||
३५ छतीस | ||
३५ छतिस | ||
३६ सैंतीस | ||
३६ सैंतिस | ||
३६ सैंत्तीस | ||
३६ सैंत्तिस | ||
३७ अड़तीस | ||
३७ अड़तिस | ||
३७ अड़त्तीस | ||
३७ अड़त्तिस | ||
३८ उनतालीस | ||
३८ उनतालिस | ||
३८ उनत्तालीस | ||
३८ उनत्तालिस | ||
३८ उन्तालीस | ||
३८ उन्तालिस | ||
३९ चालीस | ||
४० इकतालीस | ||
४० इकतालिस | ||
४० इक्तालीस | ||
४१ बयालीस | ||
४१ बयालिस | ||
४१ ब्यालीस | ||
४२ तैंतालीस | ||
४२ तैंतालिस | ||
४३ चौवालीस | ||
४३ चौवालिस | ||
४३ चवालीस | ||
४३ चवालिस | ||
४३ चौंतालीस | ||
४४ पैंतालीस | ||
४४ पैंतालिस | ||
४५ छियालीस | ||
४५ छियालिस | ||
४५ छयालीस | ||
४६ सैंतालीस | ||
४६ सैंतालिस | ||
४६ सैतालिस | ||
४७ अड़तालीस | ||
४७ अड़तालिस | ||
४८ उनचास | ||
४९ पचास | ||
५० इक्यावन | ||
५० इकयावन | ||
५१ बावन | ||
५२ तिरपन | ||
५२ तिरेपन | ||
५३ चौवन | ||
५४ पचपन | ||
५५ छप्पन | ||
५५ छपन | ||
५६ सत्तावन | ||
५६ सतावन | ||
५७ अट्ठावन | ||
५७ अठावन | ||
५८ उनसठ | ||
५८ उनसठ | ||
५९ साठ | ||
६० इकसठ | ||
६१ बासठ | ||
६१ बासट | ||
६२ तिरसठ | ||
६२ तिरेसठ | ||
६३ चौंसठ | ||
६४ पैंसठ | ||
६५ छियासठ | ||
६५ छयासठ | ||
६६ सड़सठ | ||
६७ अड़सठ | ||
६८ उनहत्तर | ||
६८ उनहतर | ||
६९ सत्तर | ||
६९ सतर | ||
७० इकहत्तर | ||
७० इकहतर | ||
७० इक्हत्तर | ||
७० इकत्तर | ||
७१ बहत्तर | ||
७१ बहतर | ||
७२ तिहत्तर | ||
७२ तिहतर | ||
७३ चौहत्तर | ||
७३ चौहतर | ||
७४ पचहत्तर | ||
७४ पचहतर | ||
७४ पिछत्तर | ||
७४ पिछतर | ||
७५ छिहत्तर | ||
७५ छिहतर | ||
७५ छियत्तर | ||
७६ सतहत्तर | ||
७६ सतहतर | ||
७६ सतत्तर | ||
७७ अठहत्तर | ||
७७ अठहतर | ||
७८ उन्यासी | ||
७८ उन्यासि | ||
७८ उनासी | ||
७८ उनासि | ||
७९ अस्सी | ||
७९ अस्सि | ||
८० इक्यासी | ||
८० इक्यासि | ||
८१ बयासी | ||
८१ बयासि | ||
८१ ब्यासी | ||
८१ ब्यासि | ||
८१ बिरासी | ||
८२ तिरासी | ||
८२ तिरासि | ||
८२ तेरासी | ||
८३ चौरासी | ||
८३ चौरासि | ||
८४ पचासी | ||
८४ पचासि | ||
८४ पिचयासी | ||
८४ पिचयासि | ||
८४ पिचासी | ||
८५ छियासी | ||
८५ छियासि | ||
८६ सत्तासी | ||
८६ सत्तासि | ||
८६ सतासी | ||
८६ सतासि | ||
८७ अट्ठासी | ||
८७ अट्ठासि | ||
८७ अठासी | ||
८७ अठासि | ||
८८ नवासी | ||
८८ नवासि | ||
८९ नब्बे | ||
९० इक्यानबे | ||
९० इक्यानवे | ||
९१ बानबे | ||
९१ बानवे | ||
९२ तिरानबे | ||
९२ तिरानवे | ||
९३ चौरानबे | ||
९३ चौरानवे | ||
९४ पंचानबे | ||
९४ पंचानवे | ||
९४ पचानवे | ||
९४ पिचयानवे | ||
९४ पिचयानबे | ||
९४ पिच्यानवे | ||
९४ पिच्यानबे | ||
९५ छियानबे | ||
९५ छियानवे | ||
९६ सत्तानबे | ||
९६ सत्तानवे | ||
९७ अट्ठानबे | ||
९७ अट्ठानवे | ||
९७ अठानवे | ||
९७ अठानबे | ||
९८ निन्यान्बे | ||
९८ निन्यानबे | ||
९८ निन्यानवे | ||
९८ निन्यान्वे | ||
let's also either leverage cardinal graph or optimize with rules
We can't use the cardinal graph here because the number mapping is completely different for this particular TSV file. Also, numbers 0-99 have unique words in Hindi that cannot be represented by grammars.
are you saying that the "9" in 93 is a different word than the "9" in 94? what about the "4" in 34 vs the "4" in 74?
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,6 +9,7 @@ | |
१७ सत्रह | ||
१७ सतरह | ||
let's also optimize with rules
Numbers 0-99 have unique words in Hindi that cannot be represented by grammars.
||
१८ अठारह | ||
१८ अठाहर | ||
१८ अट्ठारह | ||
१९ उन्नीस | ||
१९ उनीस | ||
|
@@ -216,4 +217,4 @@ | |
९९ निन्यान्बे | ||
९९ निन्यानबे | ||
९९ निन्यानवे | ||
९९ निन्यान्वे | ||
९९ निन्यान्वे |
do we have the same term multiple times in this tsv? is this necessary?
also, are these mappings any different than cardinals?
The same number appears several times in the TSV file because each row carries a different spelling or character variant of the spoken form. I kept all the variants on purpose: inverse text normalization allows many-to-one mapping, so every variant can resolve to the same written number, and covering more spellings means more inputs are recognized correctly.
I added the numbers used for dates in a separate file because the date semiotic class only needs numbers from 1 to 31. For cardinal numbers, we already have two separate files: one for single digits and another called teens and ties for numbers from 10 to 99. So it was easier and cleaner to create a new TSV file just for dates instead of using the existing cardinal number files.
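A minimal sketch of the many-to-one idea described above, in plain Python rather than the project's actual grammar code (which is not shown in this diff). The sample rows are copied from the hunk above; the names `SAMPLE_TSV` and `load_spoken_to_written` are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical excerpt in the same "written<TAB>spoken" layout as the diff above.
SAMPLE_TSV = (
    "१७\tसत्रह\n"
    "१७\tसतरह\n"
    "१८\tअठारह\n"
    "१८\tअट्ठारह\n"
    "१९\tउन्नीस\n"
    "१९\tउनीस\n"
)


def load_spoken_to_written(tsv_text):
    """Return {spoken_form: written_form}.

    Several spoken spellings may share one written form (many-to-one).
    """
    mapping = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        written, spoken = row
        mapping[spoken] = written
    return mapping


SPOKEN_TO_WRITTEN = load_spoken_to_written(SAMPLE_TSV)
```

Each spelling variant is its own key, so adding a variant never changes how the existing ones resolve; the cost is one extra row per variant, which is the file-length concern raised in this thread.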
is there no way to optimize (1) with rules instead of one long tsv file?
let's use the cardinal graph and restrict inputs to 1-31 for (2), that will be cleaner and easier to maintain in the future
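The restriction suggested in (2) can be sketched as follows, again in plain Python and under stated assumptions: `CARDINALS` is a hypothetical stand-in for the existing cardinal mapping, and `restrict_to_day_range` is an illustrative helper, not a function from the project. In an FST-based grammar the equivalent step would presumably be a composition with an acceptor for 1-31, but that code is not part of this diff.

```python
# Hypothetical stand-in for a full cardinal mapping (spoken form -> written numeral).
CARDINALS = {
    "एक": "१",
    "सत्रह": "१७",
    "इकतीस": "३१",
    "बत्तीस": "३२",
    "निन्यानवे": "९९",
}


def restrict_to_day_range(cardinals, low=1, high=31):
    """Keep only the entries whose written value lies in [low, high].

    int() accepts Devanagari decimal digits, so no manual digit
    conversion is needed.
    """
    return {
        spoken: written
        for spoken, written in cardinals.items()
        if low <= int(written) <= high
    }


DAY_NUMBERS = restrict_to_day_range(CARDINALS)
```

Deriving the date numbers from the cardinal data this way means a spelling variant added for cardinals is picked up for dates as well, instead of being maintained in two files.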