validate_manifest_uris validates the format of the files referenced by the groundtruth_uri and taskdata_uri fields of a manifest. Currently it fetches each file in full and only then applies validation.
These files can be quite large, so we could improve validation performance and memory consumption by making a streaming request and passing the chunks into a streaming JSON parser. Here is a potential solution using the ijson library: https://github.com/hCaptcha/hmt-basemodels/blob/30-add-gt-models/basemodels/streaming_json.py
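For discussion, here is a rough sketch of the idea. It is not the code from the linked branch. It assumes the file is a top-level JSON array of objects, and the names `validate_entries`, `iter_json_array`, `validate_uri` and `check_taskdata_entry`, as well as the `datapoint_uri` check, are hypothetical placeholders for the real validation rules.

```python
from typing import Any, BinaryIO, Callable, Iterable, Iterator


def validate_entries(entries: Iterable[Any],
                     validate_entry: Callable[[Any], None]) -> int:
    """Validate entries one at a time and return how many were checked.

    Only the current entry is held in memory, and the first invalid
    entry aborts validation (validate_entry is expected to raise).
    """
    count = 0
    for entry in entries:
        validate_entry(entry)
        count += 1
    return count


def iter_json_array(stream: BinaryIO) -> Iterator[Any]:
    """Lazily yield the elements of a top-level JSON array (assumed layout)."""
    # Imported here so the validation logic above has no third-party dependency.
    import ijson

    # The "item" prefix selects each element of the top-level array.
    return ijson.items(stream, "item")


def validate_uri(uri: str, validate_entry: Callable[[Any], None]) -> int:
    """Stream the file at `uri` and validate it without loading it in full."""
    import requests

    with requests.get(uri, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        # Let urllib3 transparently decode gzip/deflate bodies while reading.
        resp.raw.decode_content = True
        return validate_entries(iter_json_array(resp.raw), validate_entry)


def check_taskdata_entry(entry: Any) -> None:
    """Hypothetical per-entry rule, for illustration only."""
    if not isinstance(entry, dict) or "datapoint_uri" not in entry:
        raise ValueError(f"invalid entry: {entry!r}")
```

With this shape, the existing per-entry checks could be passed in as `validate_entry`, and a file whose top level is an object rather than an array could be handled with `ijson.kvitems(stream, "")` instead.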
@gaieges