
Use streaming request & parser API in validate_manifest_uris #38

@mtratsiuk

Description


validate_manifest_uris validates the format of the files pointed to by the groundtruth_uri and taskdata_uri fields in the manifest. Currently it downloads the full file first and only then applies validation.

These files can be quite large, so we can improve validation performance and memory consumption by using a streaming request and passing the chunks into a streaming JSON parser. Here is a potential solution using the ijson library: https://github.com/hCaptcha/hmt-basemodels/blob/30-add-gt-models/basemodels/streaming_json.py
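For illustration, here is a minimal, dependency-free sketch of the same idea. It is not the linked ijson solution. Instead, it uses incremental decoding with the standard library's `json.JSONDecoder.raw_decode` over a rolling text buffer. It assumes the file behind the URI is a top-level JSON array, and the real groundtruth/taskdata formats may differ. The names `iter_json_array`, `validate_uri_streaming` and the per-element `validate_item` callback are hypothetical and are not part of `validate_manifest_uris` or hmt-basemodels. Only roughly one chunk plus the element currently being parsed is held in memory, rather than the whole file.

```python
import codecs
import json
import urllib.request
from typing import Any, Callable, Iterable, Iterator, Optional


def iter_json_array(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Yield elements of a top-level JSON array as raw byte chunks arrive."""
    decoder = json.JSONDecoder()
    utf8 = codecs.getincrementaldecoder("utf-8")()  # handles split multi-byte chars
    buf = ""
    # Parser states: open -> first -> (sep <-> value) -> done
    state = "open"

    def with_eof() -> Iterator[Optional[bytes]]:
        yield from chunks
        yield None  # sentinel marking end of stream

    for chunk in with_eof():
        final = chunk is None
        buf += utf8.decode(b"" if final else chunk, final=final)
        while True:
            buf = buf.lstrip()
            if not buf:
                break  # need more bytes
            head = buf[0]
            if state == "open":
                if head != "[":
                    raise ValueError("expected a top-level JSON array")
                buf, state = buf[1:], "first"
            elif state == "done":
                raise ValueError("trailing data after the JSON array")
            elif head == "]":
                if state == "value":
                    raise ValueError("trailing comma before ']'")
                buf, state = buf[1:], "done"
            elif state == "sep":
                if head != ",":
                    raise ValueError(f"expected ',' or ']', got {head!r}")
                buf, state = buf[1:], "value"
            else:  # state is "first" or "value": decode one element
                try:
                    item, end = decoder.raw_decode(buf)
                except json.JSONDecodeError:
                    if final:
                        raise
                    break  # element is incomplete; wait for the next chunk
                rest = buf[end:].lstrip()
                # A number such as "12" may continue in the next chunk, so only
                # accept the element once its ',' or ']' terminator is visible.
                if not final and (not rest or rest[0] not in ",]"):
                    break
                yield item
                buf, state = rest, "sep"
    if state != "done":
        raise ValueError("truncated JSON array")


def validate_uri_streaming(
    uri: str, validate_item: Callable[[Any], None], chunk_size: int = 64 * 1024
) -> None:
    """Fetch `uri` in chunks and validate each array element as it is parsed."""
    with urllib.request.urlopen(uri) as resp:
        chunks = iter(lambda: resp.read(chunk_size), b"")
        for item in iter_json_array(chunks):
            validate_item(item)  # hypothetical per-element check; raise to reject
```

One trade-off of this simple design: a syntax error inside an element is only reported at end of stream, because an undecodable buffer looks the same as an incomplete one. With ijson itself, the loop would instead iterate over something like `ijson.items(resp, "item")`, where `"item"` is ijson's prefix for the elements of a top-level array.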

@gaieges

Metadata


Assignees

Labels

enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
