Description
Proposal Summary
Make Open Images V6 dataset import faster
Motivation
Simply loading a pre-existing dataset from disk into the API should take seconds, not minutes.
Details
I'm new to FiftyOne. I started by loading a 5K subset of Open Images V6:
import fiftyone as fo
import fiftyone.zoo as foz

# image_ids: the list of Open Images IDs in my 5K subset (defined elsewhere)
dataset = foz.load_zoo_dataset(
    "open-images-v6",
    split="validation",
    image_ids=image_ids,
)
So far so good. Downloading takes a few minutes, of course, but when I do this:
dataset_from_dir = fo.Dataset.from_dir(
    dataset_dir="/home/sampsa/fiftyone/open-images-v6/validation",
    dataset_type=fo.types.OpenImagesV6Dataset,
)
it takes [6.6m elapsed, 0s remaining, 14.8 samples/s]
just to import the data from disk into the API (!).
That's only for a very small dataset. I'd hate to imagine what would happen for 10k-100k image sets.
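For a rough sense of scale, here is a back-of-the-envelope extrapolation from the progress-bar figures above. It assumes import time grows linearly with the number of samples and that 14.8 samples/s is a representative average; neither assumption has been verified, and `estimated_minutes` is just an illustrative helper.

```python
# Rough extrapolation from the reported progress bar.
# Assumption: import time scales linearly with sample count (not verified).
RATE = 14.8        # samples/s, as reported by the progress bar
ELAPSED_MIN = 6.6  # minutes, as reported by the progress bar


def estimated_minutes(num_samples, rate=RATE):
    """Estimated import time in minutes at a constant samples/s rate."""
    return num_samples / rate / 60.0


# Sample count implied by the reported rate and elapsed time
imported = RATE * ELAPSED_MIN * 60

print(f"implied samples imported: ~{imported:.0f}")
for n in (10_000, 100_000):
    print(f"{n:>7} samples: ~{estimated_minutes(n):.0f} min")
```

Under those assumptions, 10k images would take about 11 minutes and 100k images close to two hours.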
What happens under the hood? Why is this so slow? For comparison, the "classic" pycocotools COCO API takes just a few seconds to load a COCO dataset.
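One way to see where the time goes would be to run the import under Python's built-in profiler. The sketch below uses only the standard library (`cProfile`, `pstats`); `profile_call` is a hypothetical helper name, not part of FiftyOne, and I have not yet run it against the import above.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=15, **kwargs):
    """Run fn(*args, **kwargs) under cProfile.

    Returns (result, report), where report lists the `top` entries
    sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


# Intended usage with the import from this issue (requires fiftyone):
#
#   import fiftyone as fo
#   dataset, report = profile_call(
#       fo.Dataset.from_dir,
#       dataset_dir="/home/sampsa/fiftyone/open-images-v6/validation",
#       dataset_type=fo.types.OpenImagesV6Dataset,
#   )
#   print(report)
```

The functions with the largest cumulative time in the report should indicate whether the bottleneck is label parsing, image metadata reads, or database writes.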
Willingness to contribute
The FiftyOne Community encourages new feature contributions. Would you or
another member of your organization be willing to contribute an implementation
of this feature?
- [ ] Yes. I can contribute this feature independently.
- [X] Yes. I would be willing to contribute this feature with guidance from
the FiftyOne community.
- [ ] No. I cannot contribute this feature at this time.