feat: direct download csv from datasets api #342
hmgomes wants to merge 1 commit into adaptive-machine-learning:main
I am not adding a test for this because it would require downloading all available CSV files every time it runs.
Outcome of the validation test
Those that failed below should be fine because they just don't have CSV equivalent files.
From _source_list.py:
=== Testing Sensor (csv) ===
=== Testing Hyper100k (csv) ===
=== Testing CovtFD (csv) ===
=== Testing Covtype (csv) ===
=== Testing CovtypeTiny (csv) ===
=== Testing CovtypeNorm (csv) ===
=== Testing RBFm_100k (csv) ===
=== Testing RTG_2abrupt (csv) ===
=== Testing ElectricityTiny (csv) ===
=== Testing Electricity (csv) ===
=== Testing Fried (csv) ===
=== Testing FriedTiny (csv) ===
=== Testing Bike (csv) ===
Add CSV Support to Built-in Datasets API
This PR adds CSV support to the built-in Datasets API while preserving the current default behaviour.
After this change, built-in datasets still download ARFF by default, but can now also download and open CSV files when available:
Why
CapyMOA already stores CSV source URLs for several datasets in _source_list.py, but the public Datasets API only used the ARFF sources. This made it harder to:
This PR exposes that capability in a simple way without changing the existing ARFF default.
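As a rough illustration of the idea (not the code in this PR), the sketch below shows one way a download helper could pick a source URL by file type while keeping ARFF as the default. The `Source` tuple, the `SOURCES` table, the `resolve_url` helper, and the `example.invalid` URLs are all hypothetical names invented for this sketch; the real entries live in `_source_list.py` and may be structured differently.

```python
from typing import NamedTuple, Optional


class Source(NamedTuple):
    """Hypothetical record of where a dataset can be downloaded from."""

    arff: str
    csv: Optional[str] = None  # not every dataset has a CSV equivalent


# Hypothetical source table with placeholder URLs (not CapyMOA's real list).
SOURCES = {
    "ElectricityTiny": Source(
        arff="https://example.invalid/electricity_tiny.arff",
        csv="https://example.invalid/electricity_tiny.csv",
    ),
    "ArffOnly": Source(arff="https://example.invalid/arff_only.arff"),
}


def resolve_url(name: str, file_type: str = "arff") -> str:
    """Return the download URL for a dataset, defaulting to ARFF."""
    if file_type not in ("arff", "csv"):
        raise ValueError(f"Unsupported file_type: {file_type!r}")
    url = getattr(SOURCES[name], file_type)
    if url is None:
        # Mirrors the validation note: some datasets simply have no CSV file.
        raise ValueError(f"{name} has no {file_type} source")
    return url
```

Because `file_type` defaults to `"arff"`, existing callers that never pass the argument would keep getting the ARFF URL, which is the compatibility property the PR description emphasises.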
What Changed
Added file_type="arff" | "csv" support to the shared dataset download logic
Behaviour
Default behaviour is unchanged:
CSV can now be requested explicitly:
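A minimal sketch of the two calls described above, assuming this PR's branch of CapyMOA is installed. The `file_type` argument is taken from this PR's description; the wrapper function names are hypothetical, and the imports sit inside the functions only so that pasting the snippet does not trigger a download by itself.

```python
def open_default_stream():
    """Open ElectricityTiny the way existing code does (ARFF by default)."""
    from capymoa.datasets import ElectricityTiny

    return ElectricityTiny()


def open_csv_stream():
    """Open the same dataset from its CSV source, as proposed in this PR."""
    from capymoa.datasets import ElectricityTiny

    # file_type="csv" is the argument introduced by this PR; it is assumed
    # to fail for datasets that have no CSV entry in _source_list.py.
    return ElectricityTiny(file_type="csv")
```

Calling either function downloads the corresponding file on first use, so neither is invoked here.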
Notes
optimise=False may be required because these streams are not MOA-backed
Validation
Small test of the CSV-backed built-in datasets with representative classification and regression cases.
Examples tested
ElectricityTiny(file_type="csv")
FriedTiny(file_type="csv")
Hyper100k(file_type="csv")
Also ran a small end-to-end evaluation on:
with PassiveAggressiveClassifier using:
Example Test
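The evaluation described above could look roughly like the sketch below. It is not the script used for this PR: the choice of ElectricityTiny, the `max_instances` cap, and the `evaluate_csv_stream` function name are assumptions made for illustration, and it requires this PR's branch of CapyMOA for the `file_type` argument.

```python
def evaluate_csv_stream(max_instances: int = 1000) -> float:
    """Run a short prequential evaluation on a CSV-backed built-in dataset."""
    from capymoa.classifier import PassiveAggressiveClassifier
    from capymoa.datasets import ElectricityTiny
    from capymoa.evaluation import prequential_evaluation

    # Assumption: ElectricityTiny stands in for whichever dataset was used.
    stream = ElectricityTiny(file_type="csv")
    learner = PassiveAggressiveClassifier(schema=stream.get_schema())

    # optimise=False because, per the Notes, CSV streams are not MOA-backed.
    results = prequential_evaluation(
        stream=stream,
        learner=learner,
        max_instances=max_instances,
        optimise=False,
    )
    return results.cumulative.accuracy()
```

The function returns the cumulative accuracy so that a caller can print or compare it; no expected value is given here because it depends on the data and the library version.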