For Parquet-file workloads we can determine the number of records from the Parquet files themselves; it is not necessary to specify it manually, as we currently do:
    class Mnist(MnistBase):
        def __init__(self, name: str, cache_dir: str):
            super().__init__(name, "mnist", cache_dir=cache_dir)

        @property
        def record_count(self) -> int:
            return 60000
Similarly, for any workload that limits the count, such as the *-test variants, we already know the limit:
    class MnistTest(MnistBase):
        """Reduced, "test" variant of mnist; with 1% of the full dataset (600
        passages and 20 queries)."""

        def __init__(self, name: str, cache_dir: str):
            super().__init__(name, "mnist", cache_dir=cache_dir, limit=600, query_limit=20)

        @property
        def record_count(self) -> int:
            return 600
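
A minimal sketch of how both counts might be derived instead of hard-coded, assuming pyarrow is installed and the workload can list its Parquet files. The names `parquet_num_rows`, `derived_record_count` and the `count_rows` parameter are hypothetical and not part of the existing code; only `pyarrow.parquet.read_metadata` is a real library call.

```python
from typing import Callable, Iterable, Optional


def parquet_num_rows(path: str) -> int:
    """Row count of one Parquet file, read from its footer metadata only."""
    # Imported lazily so the rest of the sketch works without pyarrow.
    import pyarrow.parquet as pq

    return pq.read_metadata(path).num_rows


def derived_record_count(
    paths: Iterable[str],
    limit: Optional[int] = None,
    count_rows: Callable[[str], int] = parquet_num_rows,
) -> int:
    """Total records across all files, capped at `limit` when one is set.

    `count_rows` is injectable (an assumption of this sketch) so the
    logic can be exercised without real Parquet files.
    """
    total = sum(count_rows(p) for p in paths)
    return total if limit is None else min(total, limit)
```

With something like this, `record_count` in the base class could return `derived_record_count(files, limit=self.limit)` (assuming the base class stores the file list and the limit), and the per-workload overrides above would no longer be needed.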