Commit 40af628

Merge pull request #40 from zuazo-forks/gl-segmenter
Add Galician support to the Segmenter
2 parents 4880834 + 8f190f9 commit 40af628

7 files changed, +400 -6 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+/build
+/commonvoice_utils.egg-info

MANIFEST.in

Lines changed: 2 additions & 0 deletions
@@ -27,6 +27,8 @@ include cvutils/data/ckt/phon.tsv
 include cvutils/data/gl
 include cvutils/data/gl/alphabet.txt
 include cvutils/data/gl/validate.tsv
+include cvutils/data/gl/punct.tsv
+include cvutils/data/gl/abbr.tsv
 include cvutils/data/gl/phon.tsv
 include cvutils/data/gl/vocab.tsv
 include cvutils/data/rm-vallader

README.md

Lines changed: 1 addition & 1 deletion
@@ -247,7 +247,7 @@ A-hend-all e vez gounezet arc'hant dre chaseal ha pesketa.
 | Frisian | Frysk |`fry` | `fy-NL` |`fy`| ||||
 | Igbo | Ásụ̀sụ́ Ìgbò |`ibo` | `ig` |`ig`|||| |
 | Irish | Gaeilge |`gle` | `ga-IE` |`ga`| ||| |
-| Galician | Galego |`glg` | `gl` |`gl`|||| |
+| Galician | Galego |`glg` | `gl` |`gl`|||| |
 | Guaraní | Avañeʼẽ |`gug` | `gn` |`gn`|||| |
 | Hindi | हिन्दी |`hin` | `hi` | `hi` ||||
 | Hausa | Harshen Hausa |`hau` | `ha` |`ha` |||| |

cvutils/data/gl/abbr.tsv

Lines changed: 371 additions & 0 deletions
@@ -0,0 +1,371 @@
+1 a.
+1 AA.
+1 ab.
+1 a.C.
+1 acad.
+1 acadca.
+1 acadco.
+1 acep.
+1 adm.
+1 admdor.
+1 admdora.
+1 admtva.
+1 admtvo.
+1 adv.
+1 adx.
+1 ag.
+1 agr.
+1 agrón.
+1 alc.
+1 alm.
+1 alt.
+1 a.m.
+1 ampl.
+1 and.
+1 ant.
+1 ap.
+1 apdo.
+1 aprox.
+1 apto.
+1 arq.
+1 arquit.
+1 art.
+1 asdo.
+1 asoc.
+1 át.
+1 aum.
+1 aus.
+1 aut.
+1 aux.
+1 avda.
+1 axud.
+1 bibl.
+1 bibliog.
+1 bl.
+1 b.o.
+1 bol.
+1 c.
+1 ca.
+1 cant.
+1 cap.
+1 carr.
+1 cast.
+1 cat.
+1 cát.
+1 catedr.
+1 célt.
+1 cént.
+1 cert.
+1 ch.
+1 cit.
+1 cl.
+1 clás.
+1 cód.
+1 coed.
+1 col.
+1 colab.
+1 com.
+1 comp.
+1 conc.
+1 constr.
+1 cont.
+1 convoc.
+1 coord.
+1 corp.
+1 corrix.
+1 cp.
+1 cta.
+1 cto.
+1 d.
+1 d.C.
+1 dec.
+1 del.
+1 dem.
+1 dep.
+1 desp.
+1 det.
+1 dic.
+1 dipl.
+1 dir.
+1 dir.ª
+1 disp.
+1 distr.
+1 d.l.
+1 doc.
+1 dpto.
+1 Dr.
+1 Dra.
+1 dta.
+1 dto.
+1 dupl.
+1 d/v.
+1 d.v.
+1 d.x.
+1 econ.
+1 ed.
+1 edit.
+1 ef.
+1 Em.
+1 entr.
+1 enx.
+1 e.p.d.
+1 epíl.
+1 escr.
+1 esp.
+1 esq.
+1 esqda.
+1 esqdo.
+1 est.
+1 estat.
+1 estr.
+1 etc.
+1 e.t.s.
+1 e.u.
+1 eusc.
+1 éusc.
+1 ex.
+1 exc.
+1 exped.
+1 ext.
+1 f.
+1 fábr.
+1 fac.
+1 facs.
+1 fact.
+1 fasc.
+1 feb.
+1 fem.
+1 fest.
+1 fig.
+1 fotogr.
+1 fr.
+1 fund.
+1 fut.
+1 gal.
+1 gar.
+1 gl.
+1 gob.
+1 gr.
+1 gram.
+1 h.
+1 hab.
+1 habit.
+1 íb.
+1 íd.
+1 igr.
+1 il.
+1 ilustr.
+1 imp.
+1 imper.
+1 imperf.
+1 impers.
+1 impr.
+1 inc.
+1 incl.
+1 incompl.
+1 ind.
+1 índ.
+1 indet.
+1 inf.
+1 infin.
+1 info.
+1 inform.
+1 ing.
+1 ins.
+1 insep.
+1 inst.
+1 int.
+1 inter.
+1 interr.
+1 interx.
+1 intr.
+1 introd.
+1 invent.
+1 irr.
+1 it.
+1 l.
+1 lab.
+1 lám.
+1 lat.
+1 lca.
+1 lco.
+1 ldo.lda.
+1 lic.
+1 licda.
+1 licdo.
+1 lit.
+1 loc.
+1 lonx.
+1 ltda.
+1 ltdo.
+1 m.
+1 maiúsc.
+1 masc.
+1 mat.
+1 máx.
+1 mc.
+1 mecan.
+1 med.
+1 merc.
+1 mercad.
+1 min.
+1 mín.
+1 minist.
+1 mod.
+1 ms.
+1 mt.
+1 mun.
+1 mús.
+1 mz.
+1 n.
+1 nac.
+1 n.do
+1 n.doed.
+1 neg.
+1 nom.
+1 not.
+1 nov.
+1 n.p.
+1 ntva.
+1 ntvo.
+1 núm.
+1 o.
+1 obs.
+1 of.
+1 o.p.
+1 op.
+1 op.cit.
+1 opús.
+1 orix.
+1 out.
+1 p.
+1 pal.
+1 par.
+1 parr.
+1 part.
+1 pat.
+1 pav.
+1 páx.
+1 p.b.
+1 P.D.
+1 pdo.
+1 pen.
+1 per.
+1 pers.
+1 pl.
+1 plu.
+1 p.m.
+1 p.m.a.
+1 p.n.
+1 pob.
+1 pol.
+1 port.
+1 pos.
+1 pr.
+1 pral.
+1 pref.
+1 prelim.
+1 prep.
+1 pres.
+1 prínc.
+1 priv.
+1 prnl.
+1 proc.
+1 prof.
+1 pról.
+1 pron.
+1 prov.
+1 próx.
+1 P.S.
+1 pta.
+1 pte.
+1 publ.
+1 públ.
+1 pza.
+1 r.
+1 rec.
+1 red.
+1 reed.
+1 ref.
+1 reg.
+1 rel.
+1 rev.
+1 rex.
+1 R.I.P.
+1 r.p.m.
+1 rte.
+1 s.
+1 S.A.
+1 sáb.
+1 s.d.
+1 sec.
+1 séc.
+1 secr.
+1 seg.
+1 sent.
+1 s.e.o.o.
+1 serv.
+1 set.
+1 símb
+1 símb.
+1 sing.
+1 s.l.
+1 S.L.
+1 s.l.s.a.
+1 s.n.
+1 sobr.
+1 soc.
+1 Sr.
+1 Sra.
+1 st.
+1 Sta.
+1 Sto.
+1 subs.
+1 subx.
+1 sum.
+1 sup.
+1 supl.
+1 suplem.
+1 sus.
+1 t.
+1 téc.
+1 tel.
+1 teléf.
+1 telegr.
+1 test.
+1 tfno.
+1 tip.
+1 tít.
+1 tón.
+1 trad.
+1 trans.
+1 trat.
+1 trav.
+1 trib.
+1 tripl.
+1 tv.
+1 u.
+1 ú.
+1 últ.
+1 univ.
+1 urb.
+1 v.
+1 v.
+1 Vde.
+1 Vde/s.
+1 ven.
+1 venc.
+1 vers.
+1 v.gr.
+1 vid.
+1 vol.
+1 VV.
+1 x.
+1 xan.
+1 xer.
+1 xll.
+1 x.p.
+1 xud.
+1 xur.
+1 xust.
+1 xv.

cvutils/data/gl/alphabet.txt

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-aábcdeéfghiílmnñoópqurstuúvxz
+aábcdeéfghiílmnñoópqrstuúüvxz
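
The alphabet change above appears to do two things: drop a stray `u` that sat between `q` and `r`, and add `ü`. A minimal sketch of how such a slip could be caught is shown below. The helper names `nfc` and `duplicated_chars` are hypothetical and not part of cvutils; the two strings are copied from the diff.

```python
import unicodedata
from collections import Counter


def nfc(text):
    """Normalize to NFC so accented letters count as single characters."""
    return unicodedata.normalize("NFC", text)


def duplicated_chars(alphabet):
    """Return the characters that occur more than once, sorted."""
    counts = Counter(nfc(alphabet))
    return sorted(ch for ch, n in counts.items() if n > 1)


# Both strings are taken from the alphabet.txt diff.
old_alphabet = nfc("aábcdeéfghiílmnñoópqurstuúvxz")
new_alphabet = nfc("aábcdeéfghiílmnñoópqrstuúüvxz")

old_duplicates = duplicated_chars(old_alphabet)  # ['u']
new_duplicates = duplicated_chars(new_alphabet)  # []
added = set(new_alphabet) - set(old_alphabet)    # {'ü'}
```

A check like this could run in a test over every `alphabet.txt`, assuming each file is meant to list every character exactly once.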

cvutils/data/gl/punct.tsv

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+EOS !
+EOS ?
+EOS .
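
To illustrate how the two new data files might work together, here is a minimal sketch of an abbreviation-aware sentence splitter. It is not the cvutils `Segmenter` implementation. It assumes the last field of each `punct.tsv` row is an end-of-sentence mark and the last field of each `abbr.tsv` row is an abbreviation that should not end a sentence. The diff does not say what the leading `1` column in `abbr.tsv` means, so the sketch ignores it. The function names and the sample sentence are hypothetical.

```python
def load_last_column(rows):
    """Return the last whitespace-separated field of each non-empty row."""
    return [row.split()[-1] for row in rows if row.strip()]


def segment(text, eos_marks, abbreviations):
    """Split text into sentences at EOS marks, skipping known abbreviations."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        # Close the sentence only when the token ends in an EOS mark
        # and is not itself a listed abbreviation such as "Dr.".
        if token[-1] in eos_marks and token not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


# Rows copied from the diffs above (a small subset of abbr.tsv).
punct_rows = ["EOS !", "EOS ?", "EOS ."]
abbr_rows = ["1 Dr.", "1 Sra.", "1 etc."]

eos_marks = set(load_last_column(punct_rows))
abbreviations = set(load_last_column(abbr_rows))

result = segment("O Dr. Castro chegou tarde. Quen o viu?", eos_marks, abbreviations)
# result == ["O Dr. Castro chegou tarde.", "Quen o viu?"]
```

Without the abbreviation list, the period in `Dr.` would be taken as a sentence boundary, which is presumably why the commit adds `abbr.tsv` alongside `punct.tsv`.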
