Skip to content

Conversation

@tukkek
Copy link
Contributor

@tukkek tukkek commented Jun 30, 2023

After #425, there are 9 files in humans/ relating to names, which is a substantial chunk of the folder. I've taken the liberty of reorganizing them in this PR rather than creating an issue first to discuss the changes, with a names/ sub-folder.

Pros:

  1. Proper structure (such as with data/games/rpg/ and data/words/literature/).
  2. Better discoverability (may be easy for users to not realize there are more name-related files when browsing humans/).
  3. More elegant file-names (see below).

Cons:

  1. Will break backwards compatibility (if someone is using the whole project as git-submodule, for example).

As per 3, I've taken the liberty to rename the files for a more elegant and cohesive naming scheme. for example:

  • firstNanes.json and lastNames.json to usa.json and usaSurnames.json
  • norwayFirstNamesBoys.json to norwayMale.json

I hope the benefits out-weight the compatibility issue, especially as it's safe to expect more name-related files to be added again in the future but I understand this might be a concern for some projects.

Here's a simpler view of the folder changes: https://github.com/tukkek/corpora/tree/names-folder/data/humans/names

@tukkek
Copy link
Contributor Author

tukkek commented Jun 30, 2023

Here's a simpler view of the folder changes: https://github.com/tukkek/corpora/tree/names-folder/data/humans/names

@hugovk
Copy link
Collaborator

hugovk commented Jun 30, 2023

Corpora doesn't make any compatibility promises, so I think this is okay?

Let's ask @dariusk :)

@tukkek
Copy link
Contributor Author

tukkek commented Jun 30, 2023

Corpora doesn't make any compatibility promises

A few more thoughts on this matter:

  1. This isn't an API change which requires logic updates and re-testing. Updating a JSON file's address should be trivial.
  2. It''s unlikely that anyone is doing git-submodule for a single file or two, which should be most use-cases for corpora. Even if they are, git sub-modules don't update automatically and end-users should be aware of issues when doing so.
  3. People hot-linking (such as this) could and should just as easily be doing so in a stable manner instead (this).

This is probably overkill argumentation just to get this issue merged but I'm hoping it may inform the project as a whole (as a data-focused project in comparison to a programming library API or similar). I do appreciate the discussion though, as it may ultimately affect end-users! Cheers to all.

@tukkek
Copy link
Contributor Author

tukkek commented Jul 6, 2023

Corpora doesn't make any compatibility promises

Since it's been a week with no opinions against (and considering previous discussion), do you think it'd be alright to merge?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants