EO4EU/workflow-pre-pro


Funded by the EU.

Pre-Processor

This component is responsible for fetching data from datasource scripts and insitu files and uploading them to S3. The container runs pre_pro.app:__main__, which calls pre_pro.execution:PreProcessorExecution.__call__ with incoming messages. The logic is as follows:

  • Read the eo4eu/datasources-config configmap entry, which is a base64-encoded JSON list of Python scripts, each of which is itself base64-encoded.

  • Read the eo4eu/metadata configmap entry, which is a base64-encoded JSON list of dataset metainfo objects following the general EO4EU metainfo spec. This entry is sometimes missing. In that case, the Pre-Processor creates basic metainfo from the downloaded files and assigns the default dataset names dataset-000, dataset-001, etc.

  • Read the eo4eu/inSituData configmap entry, which is a path to an S3 object in the eo4eu-insitu bucket.

  • Read the eo4eu/inSituMeta configmap entry, which is a base64-encoded JSON list containing a single dataset metainfo object. This entry is usually missing, in which case the Pre-Processor creates basic metainfo with the dataset name INSITU.

  • Create a list of pre_pro.requests.Request objects, each representing one datasource or insitu dataset. The code that fetches the data lives in the request's .driver field, which holds a pre_pro.drivers.DSDriver. The driver in turn holds one of the following three fetchers:

    • pre_pro.drivers.ScriptFetcher: Runs a datasource script as a less privileged user and detects the new files it creates in the working directory.
    • pre_pro.drivers.InsituFetcher: Downloads the insitu S3 object, which is typically an archive file (.zip), and unpacks its contents.
    • pre_pro.drivers.InsituV2Fetcher: Downloads the insitu S3 objects specified in the insitu V2 metainfo.

  • Each request is passed to pre_pro.execution.PreProcessorExecution._execute_request, which first calls pre_pro.drivers.DSDriver.ls on the request's driver; this is where the file downloads happen. The metainfo is then compared with the files actually downloaded, and a matching algorithm tries to pair each metainfo entry with a downloaded file.

  • All files are uploaded to s3://<s3-bucket-name>/source/.

  • The per-file metainfo objects are combined into complete dataset metainfo objects.

  • Finally, the dataset metainfo objects are merged and included in the Kafka message sent to the next component. They are also written to the S3 bucket.
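
The configmap entries above are nested base64/JSON encodings, so decoding them can be sketched with the standard library alone. This is a minimal sketch, not the component's code: the function names are hypothetical, and the {"name": ...} shape of the fallback entries is an assumed stand-in for the "basic metainfo" the Pre-Processor actually builds from the downloaded files.

```python
from __future__ import annotations

import base64
import json


def decode_json_list(entry: str) -> list:
    """Decode a configmap value stored as a base64-encoded JSON list."""
    return json.loads(base64.b64decode(entry))


def decode_datasource_scripts(entry: str) -> list[str]:
    """Decode eo4eu/datasources-config: each list item is itself a base64-encoded script."""
    return [base64.b64decode(item).decode("utf-8") for item in decode_json_list(entry)]


def dataset_metainfo(entry: str | None, count: int) -> list[dict]:
    """Decode eo4eu/metadata, or fall back to default names when it is missing.

    Hypothetical: the placeholders only carry a name (dataset-000, dataset-001, ...).
    """
    if entry:
        return decode_json_list(entry)
    return [{"name": f"dataset-{i:03d}"} for i in range(count)]


def insitu_metainfo(entry: str | None) -> list[dict]:
    """Decode eo4eu/inSituMeta, or fall back to a single dataset named INSITU."""
    if entry:
        return decode_json_list(entry)
    return [{"name": "INSITU"}]
```

Here count is an illustrative parameter (for example, the number of datasets found); the README does not say how the real component decides how many default names to generate.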
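
One way to picture ScriptFetcher's "detect the new files" step is a before/after snapshot of the working directory. The sketch below assumes that detection works this way (the README does not say) and leaves out the privilege drop. On POSIX, Python 3.9+ subprocess.run accepts user= and group= arguments, which is one possible way to run the script as another user when the parent process is allowed to switch users. The function names are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def snapshot(workdir: Path) -> set[Path]:
    """All regular files currently under workdir, as paths relative to it."""
    return {p.relative_to(workdir) for p in workdir.rglob("*") if p.is_file()}


def run_script_and_detect(script: str, workdir: Path, timeout: float = 600.0) -> list[Path]:
    """Run a datasource script inside workdir and return the files it created."""
    before = snapshot(workdir)
    # Runs with the current user's privileges; the real fetcher uses a less privileged user.
    subprocess.run(
        [sys.executable, "-c", script],
        cwd=workdir,
        check=True,  # raise CalledProcessError if the script fails
        timeout=timeout,
    )
    # Files present after the run but not before are treated as the script's output.
    return sorted(snapshot(workdir) - before)
```

Note that a snapshot diff only sees newly created paths; a script that overwrites an existing file in place would not be detected by this approach.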
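
For InsituFetcher, the unpacking step can be sketched with the standard zipfile module. This assumes the object has already been downloaded from the eo4eu-insitu bucket to a local path (for example with boto3's download_file) and that the archive is a plain .zip; unpack_insitu_archive is a hypothetical name.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def unpack_insitu_archive(archive: Path, dest: Path) -> list[Path]:
    """Extract an insitu .zip into dest and return its file members, relative to dest."""
    with zipfile.ZipFile(archive) as zf:
        # extractall creates dest if needed and strips absolute paths and ".."
        # components from member names, so entries cannot escape dest.
        zf.extractall(dest)
        # Directory entries are skipped; for well-formed archives these names
        # are the extracted paths relative to dest.
        return sorted(Path(info.filename) for info in zf.infolist() if not info.is_dir())
```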
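
The README does not describe the algorithm that matches metainfo entries to downloaded files. Purely as an illustration of the problem, the sketch below swaps in a simple heuristic of its own: it scores every (entry name, file) pair with difflib's SequenceMatcher and accepts pairs greedily, best first, so each entry and each file is used at most once. The function name, the threshold, and the use of a name per metainfo entry are all assumptions.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from pathlib import PurePath


def match_metainfo(names: list[str], files: list[str], threshold: float = 0.6) -> dict[str, str | None]:
    """Hypothetical greedy matcher: pair entry names with files by name similarity.

    Names are assumed to be unique. Comparison is case-insensitive and uses the
    file stem (file name without its extension).
    """
    scored = sorted(
        (
            (SequenceMatcher(None, name.lower(), PurePath(f).stem.lower()).ratio(), name, f)
            for name in names
            for f in files
        ),
        reverse=True,  # highest similarity first
    )
    matched: dict[str, str] = {}
    used: set[str] = set()
    for score, name, f in scored:
        if score < threshold:
            break  # every remaining pair scores lower still
        if name in matched or f in used:
            continue  # keep the pairing one-to-one
        matched[name] = f
        used.add(f)
    # Entries with no sufficiently similar file map to None.
    return {name: matched.get(name) for name in names}
```

Greedy best-first assignment is simple and deterministic, but it is not guaranteed to maximize total similarity; an optimal assignment would need something like the Hungarian algorithm.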
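
The upload to s3://<s3-bucket-name>/source/ can be sketched against any client exposing boto3's S3 upload_file(Filename, Bucket, Key) method, such as the client returned by boto3.client("s3"). The bucket name stays a parameter because the README leaves it as a placeholder. Keeping each file's subdirectory structure under source/ is an assumption, and the helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path, PurePosixPath
from typing import Iterable, Protocol


class Uploader(Protocol):
    """Anything with the upload_file(Filename, Bucket, Key) method of a boto3 S3 client."""

    def upload_file(self, Filename: str, Bucket: str, Key: str) -> None: ...


def source_key(path: Path, workdir: Path) -> str:
    """S3 key under source/, preserving the path relative to workdir (assumed layout)."""
    return str(PurePosixPath("source", *path.relative_to(workdir).parts))


def upload_sources(client: Uploader, bucket: str, files: Iterable[Path], workdir: Path) -> list[str]:
    """Upload each file to s3://<bucket>/source/... and return the keys that were used."""
    keys = []
    for f in files:
        key = source_key(f, workdir)
        client.upload_file(str(f), bucket, key)
        keys.append(key)
    return keys
```

Accepting the client as a parameter keeps the key layout testable without network access. A stub object with an upload_file method is enough to check which keys would be written.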
