Closes #63 | Create dataloader for MongabayConservation#538
Closes #63 | Create dataloader for MongabayConservation#538megasiska86 wants to merge 3 commits intoSEACrowd:masterfrom
Conversation
jensan-1
left a comment
There was a problem hiding this comment.
Hello @megasiska86, thank you for implementing the dataloader for MongabayConservation.
Tested and works fine! Just some minor fix to the nits, which are provided in the suggestion.
Also, it's okay to remove the README.md as the instruction to call the data will be provided in the SEACrowd catalogue later.
| (positive, neutral, negative) based on related topics. | ||
| """ | ||
|
|
||
| _HOMEPAGE = "" |
There was a problem hiding this comment.
| _HOMEPAGE = "" | |
| _HOMEPAGE = "https://huggingface.co/datasets/Datasaur/mongabay-experiment" |
|
|
||
| _HOMEPAGE = "" | ||
|
|
||
| _LICENSE = "The Unlicense (unlicense)" |
There was a problem hiding this comment.
| _LICENSE = "The Unlicense (unlicense)" | |
| _LICENSE = Licenses.UNLICENSE.value |
|
|
||
| from seacrowd.utils import schemas | ||
| from seacrowd.utils.configs import SEACrowdConfig | ||
| from seacrowd.utils.constants import Tasks |
There was a problem hiding this comment.
| from seacrowd.utils.constants import Tasks | |
| from seacrowd.utils.constants import Tasks, Licenses |
|
Here is my general review of the dataloader:
Here are my comments of the dataset:
tag-classification test resultsentiment-classification test result |
| ), | ||
| ] | ||
|
|
||
| DEFAULT_CONFIG_NAME = f"{_DATASETNAME}_source" |
There was a problem hiding this comment.
This should be one of the config name defined previously.
| _HOMEPAGE = "" | ||
|
|
||
| _LICENSE = "The Unlicense (unlicense)" | ||
|
|
There was a problem hiding this comment.
| _LOCAL = False | |
In that vein, I also have a question. As the dataset owner and the paper author, is there a reason why you used separate URLs for the subsets (e.g., Mongabay-tags-classification, Mongabay-sentiment-classification, mongabay-experiment) rather than using this unified dataset URL, @megasiska86? cc: @akhdanfadh @jen-santoso |
Thank you for the review.
cc: @akhdanfadh |
Nice concern @holylovenia |
I see. Could you use mongabay_collectionas the URL for this dataloader for simplicity? |
|
A friendly reminder for @megasiska86 to address the suggestions. 🙏 |
|
Hi @megasiska86, is there anything we can help you with for the dataloader? |
Okay, will do it in the end of this week. thank you |
Thanks @megasiska86! Feel free to ask @akhdanfadh and @jen-santoso if you have any questions or concerns! |
|
Hi @megasiska86, I would like to let you know that we plan to finalize the calculation of the open contributions (e.g., dataloader implementations) by 30 May, so it'd be great if we could wrap up the reviewing and merge this PR before then. |
|
Hi @megasiska86, I would like to let you know that we plan to finalize the calculation of the open contributions (e.g., dataloader implementations) in 31 hours, so it'd be great if we could wrap up the reviewing and merge this PR before then. |
|
Hi @megasiska86, thank you for contributing to SEACrowd! I would like to let you know that we are still looking forward to completing this PR (and dataloader issues) and maintaining SEACrowd Data Hub. We hope to enable access to as many standardized dataloaders as possible for SEA datasets. Feel free to continue the PR whenever you're available, and if you would like to re-assign this dataloader to someone else, just let us know and we can help. 💪 Thanks again! cc: @akhdanfadh @jen-santoso |
Please name your PR title and the first line of PR message after the issue it will close. You can use the following examples:
Title: Closes #63| Add/Update Dataloader Mongabay
First line PR Message: Closes #63
where you replace the {ISSUE_NUMBER} with the one corresponding to your dataset.
Checkbox
seacrowd/sea_datasets/{my_dataset}/{my_dataset}.py(please use only lowercase and underscore for dataset folder naming, as mentioned in dataset issue) and its__init__.pywithin{my_dataset}folder._CITATION,_DATASETNAME,_DESCRIPTION,_HOMEPAGE,_LICENSE,_LOCAL,_URLs,_SUPPORTED_TASKS,_SOURCE_VERSION, and_SEACROWD_VERSIONvariables._info(),_split_generators()and_generate_examples()in dataloader script.BUILDER_CONFIGSclass attribute is a list with at least oneSEACrowdConfigfor the source schema and one for a seacrowd schema.datasets.load_datasetfunction.python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.pyorpython -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py --subset_id {subset_name_without_source_or_seacrowd_suffix}.