Collaborator

@KristjanESPERANTO KristjanESPERANTO commented Aug 6, 2025

This PR replaces the non-functional concept script (gen_word_error_correction.js) with a working one (gen_word_error_correction.mjs) that is integrated into the build process 🥳

It retrieves all month and weekday names from the Intl API and merges them with the manual definitions (in word_error_correction_manual.yaml) into word_error_correction.yaml.

With this, we support over 240 languages, compared to only a handful previously 🤯 And we don't even have to maintain the strings, as they are always queried dynamically during the build 😁
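To illustrate the idea, here is a minimal sketch of how weekday and month names for one locale could be read from the built-in Intl API and mapped to the English abbreviations. The function name `getNamesForLocale` and the shape of the returned object are assumptions for illustration only, not the actual code in gen_word_error_correction.mjs.

```javascript
// Hypothetical helper (not the PR's actual code): collect the weekday and
// month names of one locale and map them to the English abbreviations.
function getNamesForLocale(locale, width = 'long') {
  // Format in UTC so the result does not depend on the build machine's time zone.
  const weekdayFormat = new Intl.DateTimeFormat(locale, { weekday: width, timeZone: 'UTC' });
  const monthFormat = new Intl.DateTimeFormat(locale, { month: width, timeZone: 'UTC' });

  const englishWeekdays = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'Sa', 'Su'];
  const englishMonths = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
    'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

  // 2024-01-01 was a Monday, so January 1..7, 2024 runs from Monday to Sunday.
  const weekdays = {};
  englishWeekdays.forEach((abbr, i) => {
    const name = weekdayFormat.format(new Date(Date.UTC(2024, 0, 1 + i)));
    weekdays[name.toLowerCase()] = abbr;
  });

  // The first day of each month of 2024 yields the twelve month names.
  const months = {};
  englishMonths.forEach((abbr, i) => {
    const name = monthFormat.format(new Date(Date.UTC(2024, i, 1)));
    months[name.toLowerCase()] = abbr;
  });

  return { weekdays, months };
}

// Usage: which localized names are available depends on the ICU data
// bundled with the Node.js build that runs the script.
console.log(getNamesForLocale('ja').weekdays);
```

Passing `'short'` as the second argument would produce the abbreviated names instead of the full ones.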

Example

Before, the string 月曜日-金曜日 09:00-17:00 was not usable.

Before

Prettified: not possible

Warnings:

月 <--- (Unexpected token: "月" This means that the syntax is not valid at that point or it is currently not supported.) 
(Use `node --trace-uncaught ...` to show where the exception was thrown)

After

Prettified: Mo-Fr 09:00-17:00

Warnings:

月曜日 <--- (Please use the English abbreviation "Mo" for "月曜日".)
月曜日-金曜日 <--- (Please use the English abbreviation "Fr" for "金曜日".)

Short names

With my last commit, I also added short names. However, I had to filter out ambiguous names because there were too many of them. Without the filter, we would get a large number of warnings like these:

  "sat": "Word \"sat\" is ambiguous: Sa (English) or Sep (Hausa) or Sa (Igbo). Please specify language context or use English weekday name."
  "sun": "Word \"sun\" is ambiguous: Su (English) or Su (Faroese) or Jun (Tongan). Please specify language context or use English weekday name."
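The filtering described above could look roughly like the following sketch. It assumes a word counts as ambiguous when it maps to more than one distinct English abbreviation across languages; the actual rule used by the script is not shown in this conversation. The function name `splitAmbiguous` and the entry shape are hypothetical.

```javascript
// Hypothetical sketch (not the PR's actual code): separate words that map
// to exactly one English abbreviation from words that map to several.
function splitAmbiguous(entries) {
  // Group all candidate abbreviations by the lower-cased word.
  const byWord = new Map();
  for (const { word, abbr, language } of entries) {
    const key = word.toLowerCase();
    if (!byWord.has(key)) byWord.set(key, []);
    byWord.get(key).push({ abbr, language });
  }

  const unambiguous = {};
  const ambiguous = {};
  for (const [word, candidates] of byWord) {
    const distinctAbbrs = new Set(candidates.map((c) => c.abbr));
    if (distinctAbbrs.size === 1) {
      // All languages agree on the abbreviation, so the word is safe to keep.
      unambiguous[word] = candidates[0].abbr;
    } else {
      // Conflicting abbreviations: record the candidates instead of a correction.
      ambiguous[word] = candidates.map((c) => `${c.abbr} (${c.language})`).join(' or ');
    }
  }
  return { unambiguous, ambiguous };
}

// Usage with the "sat" case from the warning above:
const result = splitAmbiguous([
  { word: 'sat', abbr: 'Sa', language: 'English' },
  { word: 'sat', abbr: 'Sep', language: 'Hausa' },
  { word: 'sat', abbr: 'Sa', language: 'Igbo' },
  { word: '月曜日', abbr: 'Mo', language: 'Japanese' },
]);
console.log(result.ambiguous.sat); // "Sa (English) or Sep (Hausa) or Sa (Igbo)"
console.log(result.unambiguous['月曜日']); // "Mo"
```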

I'm really glad to be creating this PR now. It took me a lot of time 😴. Since the tests look good, I'd actually like to merge it right away. But since it's quite a significant change, I'll wait a little for your feedback @ypid 🙂

@HolgerJeromin HolgerJeromin left a comment

Nice!
I was not aware the Intl API is available in node.

@KristjanESPERANTO
Collaborator Author

> Nice! I was not aware the Intl API is available in node.

Yes, that's cool. So we don't need any external dependencies for that. It just feels a bit wrong to use brute force to find out which languages are supported by Intl.

@HolgerJeromin

> It just feels a bit wrong to use brute force to find out which languages are supported by Intl.

Yeah, but it only runs locally in the JS engine, and only at build time. So 🫣
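For readers wondering what such a brute-force discovery might look like, here is a minimal sketch. It assumes the candidates are simply all two-letter language codes; the actual script may generate its candidates differently. The function names `discoverLocales` and `languageName` are hypothetical.

```javascript
// Hypothetical sketch (not the PR's actual code): try every two-letter
// language code and keep the ones the runtime's Intl data supports.
function discoverLocales() {
  const letters = 'abcdefghijklmnopqrstuvwxyz';
  const candidates = [];
  for (const first of letters) {
    for (const second of letters) {
      candidates.push(first + second);
    }
  }
  // supportedLocalesOf returns the subset of the given tags that
  // Intl.DateTimeFormat can handle without falling back to the default locale.
  return Intl.DateTimeFormat.supportedLocalesOf(candidates);
}

// Resolve an English display name for a language tag, e.g. for warning texts.
function languageName(tag) {
  const displayNames = new Intl.DisplayNames(['en'], { type: 'language' });
  return displayNames.of(tag);
}

// Usage: the number of locales found depends on the ICU data of the Node.js build.
const locales = discoverLocales();
console.log(locales.length, 'locales found');
console.log(languageName('en')); // "English"
```

Since this only runs once at build time, checking a few hundred candidate tags is cheap.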

…endencies

- Replacement for removed non-functional concept script `gen_word_error_correction.js` with production-ready implementation
- Without external dependencies
- Use native `Intl.DateTimeFormat` for date/time formatting
- Use native `Intl.DisplayNames` for dynamic language name resolution
- Add dynamic locale discovery covering 140+ languages
- Implement ambiguous word detection with warning system
These entries are now automatically generated.
- Increase supported languages from 146 to 244
- Maintain low conflict rate with only one additional ambiguous word detected
@KristjanESPERANTO KristjanESPERANTO force-pushed the feat/gen_word_error_correction branch from fe9b4c9 to 95b6166 Compare August 18, 2025 12:24