Description
Currently, the contents of the data directory are uploaded to S3 and retrieved exactly as they are. Since data files generally compress well, substantial storage, bandwidth, and time savings may be possible for larger projects if something like gzip were applied to the data directory. This raises one problem, idiosyncratic to the AP's use case: we have publicly viewable HTML files in a subdirectory of data that are meant to be browsed directly.
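As a rough sketch of the general idea, a directory can be packed into a single gzip-compressed tarball before upload (the `compress_dir` helper name is hypothetical, not an existing datakit function):

```python
import tarfile
from pathlib import Path

def compress_dir(src: Path, dest: Path) -> None:
    """Pack a directory into a single gzip-compressed tarball."""
    with tarfile.open(dest, "w:gz") as tar:
        # arcname keeps the folder name as the archive root
        tar.add(src, arcname=src.name)
```

For typical text-heavy data files (CSV, JSON), the resulting archive is usually a small fraction of the original size.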
I have identified three possible approaches:
Option 1
Compress all folders under data except for reports (or some similarly named subfolder), which is explicitly not compressed before being uploaded to S3. Anything put in that subfolder remains accessible directly via its S3 path.
On S3, the data files would look like this after compression:

```
data/manual.gz
data/processed.gz
data/source.gz
data/reports/my_report.html
data/reports/some_image.png
```
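A minimal sketch of Option 1, assuming a hard-coded exempt folder name (`reports`) and a hypothetical `prepare_upload` helper; it produces `.tar.gz` archives rather than the bare `.gz` shown above, since the entries being compressed are directories:

```python
import tarfile
from pathlib import Path

EXEMPT = {"reports"}  # hypothetical name for the uncompressed subfolder

def prepare_upload(data_dir: Path) -> list[Path]:
    """Return the local paths that would be uploaded: one tarball per
    top-level subfolder of data/, plus the exempt folder's files as-is."""
    uploads = []
    for entry in sorted(data_dir.iterdir()):
        if entry.is_dir() and entry.name not in EXEMPT:
            archive = entry.with_suffix(".tar.gz")
            with tarfile.open(archive, "w:gz") as tar:
                tar.add(entry, arcname=entry.name)
            uploads.append(archive)
        elif entry.is_dir():
            # Exempt folder: upload its contents uncompressed so they
            # stay reachable by direct S3 path.
            uploads.extend(p for p in sorted(entry.rglob("*")) if p.is_file())
        else:
            uploads.append(entry)
    return uploads
```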
Option 2
Support a 'protect' dotfile that marks a directory and all of its subfolders as compression-exempt. For example, data/reports would now contain a marker file, data/reports/.nocompress, which would stop it from being compressed before being uploaded to S3. This would produce the same overall S3 folder structure as Option 1.
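The exemption check for Option 2 might look like this (the `.nocompress` name comes from the proposal above; the `is_exempt` helper is hypothetical):

```python
from pathlib import Path

MARKER = ".nocompress"  # dotfile name from the proposal

def is_exempt(folder: Path, data_dir: Path) -> bool:
    """A folder is exempt if it, or any ancestor up to data/,
    contains the marker dotfile."""
    current = folder
    while True:
        if (current / MARKER).exists():
            return True
        if current == data_dir:
            return False
        current = current.parent
```

Checking ancestors means the marker propagates to subfolders, so data/reports/images would also be uploaded uncompressed.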
Option 3
The datakit-data.json config file expands to include a whitelist of folders that should not be compressed, with the default set to data/reports (or just reports).
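One possible shape for Option 3; the `compression_whitelist` key and the `load_whitelist` helper are assumptions, not the existing datakit-data.json schema:

```python
import json
from pathlib import Path

# Hypothetical config shape:
#   {"compression_whitelist": ["reports"]}
DEFAULT_WHITELIST = ["reports"]

def load_whitelist(config_path: Path) -> set[str]:
    """Read the no-compress whitelist from datakit-data.json,
    falling back to the default when absent."""
    if config_path.exists():
        config = json.loads(config_path.read_text())
    else:
        config = {}
    return set(config.get("compression_whitelist", DEFAULT_WHITELIST))
```

This keeps the default behavior identical to Option 1 while letting projects exempt additional folders without code changes.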