This is an application that encrypts incoming data and uploads it to the given minio bucket. The data is encrypted using AES-256 with GCM, following an approach inspired by gocryptfs.
Developed and tested on Windows 10 and Docker Engine v24.0.7.
The configuration uses TOML format in the file config.toml. The file format is:
- `endpoint` (string): endpoint for minio
- `accessKeyID` (string): minio access key
- `secretAccessKey` (string): minio secret access key
- `useSSL` (bool): whether to use SSL
- `bucketName` (string): bucket name
- `encryptionKey` (string): encryption key, written either as 64 hex characters or as a 32-byte raw string
- `chunking` (bool): whether files will be uploaded in chunks
The repository's config.toml contains an example configuration with example keys. NEVER UPLOAD THE REAL KEYS.
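For illustration, a filled-in config.toml might look like the following; every value here is a placeholder, not a real credential, and the repository's example may differ:

```toml
endpoint = "localhost:9000"
accessKeyID = "minioadmin"        # placeholder
secretAccessKey = "minioadmin"    # placeholder
useSSL = false
bucketName = "uploads"            # placeholder bucket name
# 64 hex characters = 32 bytes, as required for AES-256 (placeholder key)
encryptionKey = "0000000000000000000000000000000000000000000000000000000000000000"
chunking = true
```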
Restart the application after configuration changes.
After completing config.toml and before starting the main application, start the minio container from the main folder:

```
docker-compose up -d
```

Finally, start the full application:

```
go run .
```

Files can be uploaded to the /upload/file endpoint. An example command:
```
curl --location 'endpoint:8080/upload/file' \
--form 'upload=@"PATH_TO_FILE"'
```

To upload a file with a chunk size of 1 MB you could run:

```
curl --location 'localhost:8080/upload/file' \
--form 'upload=@"taurus-minio/uploads/big.txt"' \
--form 'chunk-size="1MB"'
```

File download can be done via the /file/:filename endpoint.
Files can be downloaded with the following command:

```
curl <endpoint>:8080/file/<filename> -O -J
```

Example for downloading the file big.txt:

```
curl localhost:8080/file/big.txt -O -J
```

This task required the application to support files of arbitrary size.

This meant that a malicious user could upload a very large file and the application should be able to handle it. However, the application requires files to be encrypted, and saving big files to disk, even temporarily, would hurt performance.
To combat this issue, the application streams data through io.Reader values and never stores the full file in memory. Thus, processing of a file starts as soon as its first block is received.
This is accomplished using Go's io.Pipe, which creates a reader and a writer that communicate with each other. The writer encrypts/decrypts and writes the data for as long as it can receive raw data from the file, and the reader passes the data on to its end destination.
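A minimal sketch of this pattern, assuming a hypothetical encryptBlock helper in place of the real per-block encryption:

```go
package main

import "io"

// streamEncrypt returns a reader that yields encrypted data while the
// source is still being read, so the full file never sits in memory.
func streamEncrypt(src io.Reader, encryptBlock func([]byte) []byte) io.Reader {
	pr, pw := io.Pipe()
	go func() {
		buf := make([]byte, 64*1024) // illustrative fixed block size
		for {
			n, err := src.Read(buf)
			if n > 0 {
				if _, werr := pw.Write(encryptBlock(buf[:n])); werr != nil {
					return // reader side closed; stop producing
				}
			}
			if err != nil {
				pw.CloseWithError(err) // io.EOF closes the pipe cleanly
				return
			}
		}
	}()
	return pr
}
```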
To simplify access to the API, the gin web framework is used. File downloads are done with a GET request and uploads with a POST request to the gin router endpoints.
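Roughly, the routing looks like this; a sketch with the handler bodies omitted, not the repository's exact code:

```go
package main

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

func main() {
	r := gin.Default()
	// Upload: the file arrives as multipart form data under the "upload" key.
	r.POST("/upload/file", func(c *gin.Context) {
		// ... stream, encrypt and upload to minio (omitted)
		c.Status(http.StatusOK)
	})
	// Download: the object is looked up by file name.
	r.GET("/file/:filename", func(c *gin.Context) {
		// ... fetch from minio, decrypt and stream to the client (omitted)
		c.Status(http.StatusOK)
	})
	r.Run(":8080") // same port as the curl examples above
}
```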
All the encryption is performed in the encryption package. It uses AES-256 with GCM. The main idea of the file storage layout is based on the diagram from gocryptfs.

The encryption process is as follows:
- At the start of each file being uploaded, we generate a 16-byte `File ID` and write it at the start of the file.
- For each block of a fixed size we generate a 12-byte `IV`, which is written to the encrypted file and also used for encryption.
- The `File ID` and the `block number` are used as additional authenticated data, or `AAD`.
- The encrypted data is then written to the storage at the very end.

This gives us per-file encryption.
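A sketch of sealing one block under these rules; the helper name sealBlock and the exact AAD layout (File ID followed by a big-endian block number) are illustrative assumptions, not necessarily this repository's code:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/binary"
)

// sealBlock encrypts one plaintext block: a fresh 12-byte IV is
// prepended, and the file ID plus block number form the additional
// authenticated data (AAD), binding the block to its file and position.
func sealBlock(key, fileID, plaintext []byte, blockNum uint64) ([]byte, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	iv := make([]byte, gcm.NonceSize()) // 12 bytes by default
	if _, err := rand.Read(iv); err != nil {
		return nil, err
	}
	aad := make([]byte, len(fileID)+8)
	copy(aad, fileID)
	binary.BigEndian.PutUint64(aad[len(fileID):], blockNum)
	// Output layout: IV || ciphertext || GCM tag (Seal appends the tag).
	return gcm.Seal(iv, iv, plaintext, aad), nil
}
```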
The decryption is performed as follows:
- Once the file has begun downloading, fetch the first 16 bytes of data and store them as the `File ID`.
- Keep track of the number of blocks read so far as the `block number` and use it for decryption.
- Read block by block. For each block, first read the 12-byte `IV` stored before it.
- Use the `IV`, `block number`, and `File ID` to decrypt; the `block number` and `File ID` serve as additional data (`AAD`) for `AES-256 GCM` decryption.
- Serve the decrypted data to the user.
Integrity is preserved by the additional data (`block number` and `File ID`), which AES-256 GCM authenticates during encryption and verifies during decryption.
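The matching decryption, continuing the sketch above: a wrong block number or File ID (for example, a reordered block) makes Open return an error, which is exactly the integrity check.

```go
// openBlock reverses sealBlock: it splits off the 12-byte IV, rebuilds
// the same AAD, then authenticates and decrypts in one step.
func openBlock(key, fileID, sealed []byte, blockNum uint64) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	iv, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	aad := make([]byte, len(fileID)+8)
	copy(aad, fileID)
	binary.BigEndian.PutUint64(aad[len(fileID):], blockNum)
	return gcm.Open(nil, iv, ciphertext, aad) // fails on any tampering
}
```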
The design behind file chunking is not an elegant solution. It relies on a simple naming scheme, and the current approach is susceptible to chunk renaming, or in other words file content reordering.
The name is derived by simply appending the string _chunk{#id} to the end of the file name. For example, if image.png is split into 3 chunks, it is stored as image.png_chunk0, image.png_chunk1, image.png_chunk2.
First of all, whenever a file is uploaded, the chunk-size parameter is passed in the form data as a string. It takes an integer for the number part and one of B, KB, MB, GB, TB, PB for the unit part.
Then the file is read until the chunk size is reached. Once a chunk is full, we begin creating a new chunk. NOTE: a chunk is never read in full; it is uploaded like any single file, using a data stream.
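A sketch of how such a chunk-size string could be parsed, assuming binary units (1 KB = 1024 B); the repository's actual parsing may differ:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseChunkSize turns strings like "1MB" or "512KB" into a byte count.
func parseChunkSize(s string) (int64, error) {
	units := []struct {
		suffix string
		factor int64
	}{
		// Longer suffixes first so "KB" is not matched as plain "B".
		{"PB", 1 << 50}, {"TB", 1 << 40}, {"GB", 1 << 30},
		{"MB", 1 << 20}, {"KB", 1 << 10}, {"B", 1},
	}
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(s, u.suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * u.factor, nil
		}
	}
	return 0, fmt.Errorf("unrecognized chunk size %q", s)
}
```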
Since we have per file encryption, encrypting each chunk file is simple.
Each chunk carries a small encryption overhead of up to 28 bytes: 28 B = 12 B (IV) + 16 B (GCM tag). In the rare case where the chunk size is smaller than 44 B = 12 B (IV) + 16 B (GCM tag) + 16 B (File ID), the per-chunk overhead is up to 44 bytes. These overheads are relatively low. Moreover, using such small chunks for big files is not recommended, as it is inefficient.
When downloading, chunks are fetched by file name with the _chunk{#id} suffix appended. The list of all available chunks is returned by minio; however, it can only return up to 1000 object names, and that is the current limit on the number of chunks this application supports.
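For illustration, counting the chunks with minio-go could look like this; a sketch assuming minio-go v7 and a hypothetical countChunks helper:

```go
package main

import (
	"context"

	"github.com/minio/minio-go/v7"
)

// countChunks lists every object whose name starts with "<name>_chunk"
// and returns how many chunks the file was split into.
func countChunks(ctx context.Context, mc *minio.Client, bucket, name string) (int, error) {
	n := 0
	for obj := range mc.ListObjects(ctx, bucket, minio.ListObjectsOptions{
		Prefix: name + "_chunk",
	}) {
		if obj.Err != nil {
			return 0, obj.Err
		}
		n++
	}
	return n, nil
}
```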
Once the chunk count is established, a number of goroutines is started to download chunks in parallel. The chunks can be downloaded in parallel; however, they must be delivered in order.
So the routines split up which chunk files they download, and a routine does not start downloading a new chunk until its currently downloaded chunk has been processed. This is accomplished with Go channels.
For example, with 7 chunks and 3 routines, the work distribution is:
routine1 is responsible for chunks: 1, 4, 7
routine2 is responsible for chunks: 2, 5
routine3 is responsible for chunks: 3, 6
The download proceeds as follows:

```
# All routines have downloaded and block until chunk {x} is delivered
deliver chunk {1}
routine {1} downloads chunk {4}
deliver chunk {2}
routine {2} downloads chunk {5}
deliver chunk {3}
routine {3} downloads chunk {6}
deliver chunk {4}
routine {1} downloads chunk {7}
deliver chunk {5}
deliver chunk {6}
deliver chunk {7}
```
Once a routine delivers its chunk, it can start downloading its next one while an earlier routine is delivering. Currently routines only download 1 chunk at a time; this could theoretically be increased, but is yet to be tested.
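A runnable sketch of this delivery scheme, where downloadChunk stands in for the real fetch-and-decrypt step:

```go
package main

import "fmt"

// downloadChunk is a stand-in for fetching and decrypting one chunk.
func downloadChunk(id int) []byte { return []byte(fmt.Sprintf("chunk %d", id)) }

// deliverInOrder fans chunk IDs out to workers round-robin, but reads
// the per-worker channels in chunk order, so delivery stays sequential
// even though downloads overlap.
func deliverInOrder(total, workers int) {
	chans := make([]chan []byte, workers)
	for w := 0; w < workers; w++ {
		chans[w] = make(chan []byte) // unbuffered: a worker blocks until its chunk is consumed
		go func(w int) {
			for id := w; id < total; id += workers {
				chans[w] <- downloadChunk(id)
			}
			close(chans[w])
		}(w)
	}
	for id := 0; id < total; id++ {
		data := <-chans[id%workers] // receive chunk id in order
		fmt.Printf("delivered %q\n", data)
	}
}

func main() { deliverInOrder(7, 3) }
```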
The current application handles most errors by crashing if something goes wrong, such as integrity errors or invalid options. This is yet to be fixed before expanding the application later.
These are the files and their sizes the app was tested with:
| File | Size | Upload Time | Download Time |
|---|---|---|---|
| Small | 4B | 0.17s | 0.001s |
| Big | 4KB | 0.06s | 0.001s |
| Image | 3MB | 0.2s | 0.02s |
| Exe | 130MB | 3.38s | 0.83s |
| Zip | 5.1GB | 136s | 37s |
The tests could be expanded further; however, all of these yielded a correct round trip, i.e. the downloaded files contained the same content as the uploaded ones.
I don't think the chunk naming scheme I chose is good; therefore, it should be reworked.
If a long file name is given and chunking is used, the naming method would not work.
The upload of files is currently slow, as each chunk is parsed in a single function call. This could be parallelized further in future work.
This is an encryption error/weakness I noticed while finishing this document: in the current implementation, if an attacker gains access to the bucket and renames the chunk files, the algorithm would not notice, as block IDs are currently per chunk and not per whole file.
Iterating the block number over the whole file rather than per chunk would be the fix.
Handle errors in a way that does not terminate the application.
Currently, unit tests exist only for the encryption package, as it required some testing while developing the app to ensure nothing breaks.
Benchmark the application to understand upload/download speeds and the number of concurrent users it can handle. This would also help improve the app's performance later.