This project contains the command line interface (CLI) and the corresponding Docker image of the Data Privacy Toolkit (DPT). Refer to the documentation for details on the offered capabilities.
The DPT library is a Java project, and it is currently tested against the following JDK releases:
- AdoptOpenJDK Hotspot 11
- AdoptOpenJDK Hotspot 17
- Microsoft Build of OpenJDK 11
- Microsoft Build of OpenJDK 17
- Amazon Corretto Build of OpenJDK 11
- Amazon Corretto Build of OpenJDK 17
It currently builds with Gradle 8.0.2, and the Gradle wrapper (gradlew) is included for convenience.
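A quick way to check whether the active JDK matches one of these releases is to parse the output of java -version. The shell sketch below only compares the major version (it does not check the vendor), and java_major_version and check_tested_jdk are hypothetical helpers, not part of DPT.

```shell
#!/usr/bin/env bash
# Sketch: check that the active JDK is one of the tested major releases.
# `java -version` prints to stderr; its first line looks like
#   openjdk version "17.0.8" 2023-07-18
# Older JDKs using the 1.x scheme (e.g. "1.8.0_381") report major "1".
java_major_version() {
  local line="${1:-$(java -version 2>&1 | head -n 1)}"
  # Keep the digits between 'version "' and the first '.' or '"'.
  printf '%s\n' "$line" | sed -E 's/.*version "([0-9]+).*/\1/'
}

check_tested_jdk() {
  local major
  major="$(java_major_version "$@")"
  case "$major" in
    11 | 17) echo "JDK $major is one of the tested releases" ;;
    *) echo "warning: JDK $major is not among the tested releases" >&2; return 1 ;;
  esac
}
```

For example, check_tested_jdk && ./gradlew build only starts the build when a tested major version is active.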
Note that the following instructions need to be executed within the /docker subfolder of the repository.
The CLI can be tested by running:
./gradlew build
This Gradle task compiles the project, executes all the tests in the /src/test/java subfolder, and creates the final jar file in the /build/libs subfolder.
The following command makes Gradle create the uberjar, i.e. a jar file that also bundles all the dependencies required to run DPT as a standalone application:
./gradlew shadowJar
After the uberjar has been created, it can be executed as follows:
java -jar build/libs/data-privacy-toolkit-cli-${VERSION}-all.jar
where VERSION is the current version of the project; at the time of writing, it is set to 6.0.0-SNAPSHOT.
Refer to the version value in build.gradle for the up-to-date value.
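Instead of substituting VERSION by hand, the jar can be located with a glob over build/libs. The sketch below assumes that exactly one uberjar matching the naming pattern above is present there; find_dpt_uberjar is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print the path of the uberjar built by ./gradlew shadowJar,
# without hard-coding VERSION. Assumes a single matching jar in build/libs.
find_dpt_uberjar() {
  local libs="${1:-build/libs}"
  # Expand the pattern into the function's positional parameters.
  set -- "$libs"/data-privacy-toolkit-cli-*-all.jar
  # An unmatched pattern is left as-is, so test that the file exists.
  if [ ! -f "$1" ]; then
    echo "no uberjar found in $libs; run ./gradlew shadowJar first" >&2
    return 1
  fi
  if [ "$#" -gt 1 ]; then
    echo "several uberjars found in $libs; remove stale ones first" >&2
    return 1
  fi
  printf '%s\n' "$1"
}
```

Usage: jar="$(find_dpt_uberjar)" && java -jar "$jar"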
The uberjar does not include the models that the free-text processing capabilities of the toolkit may require. These models are generally released under their own licences and must therefore be added to the jar file separately, or made available to the Java VM via the classpath.
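When making models available via the classpath, note that java -jar ignores both -cp and the CLASSPATH variable, so the main class has to be launched explicitly. The sketch below reads it from the jar manifest; the models folder and both function names are hypothetical, and the folder layout must match whatever resource paths the toolkit expects.

```shell
#!/usr/bin/env bash
# Sketch: run the uberjar with an additional classpath entry for models.

# Print the Main-Class entry of a manifest read from stdin.
# Assumes the entry fits on one line (manifests wrap lines at 72 bytes).
main_class_from_manifest() {
  sed -n 's/^Main-Class: *//p' | tr -d '\r'
}

# Hypothetical usage: run_dpt_with_models <uberjar> <models-folder> [args...]
run_dpt_with_models() {
  local jar="$1" models="$2" main
  shift 2
  main="$(unzip -p "$jar" META-INF/MANIFEST.MF | main_class_from_manifest)"
  if [ -z "$main" ]; then
    echo "no Main-Class entry found in $jar" >&2
    return 1
  fi
  # ':' separates classpath entries on Linux/macOS (use ';' on Windows).
  java -cp "$jar:$models" "$main" "$@"
}
```

If the models should be packed into the jar instead, the JDK jar tool can update an existing archive: jar uf <uberjar> -C models . adds the contents of a models folder at the root of the jar.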
Once the uberjar is built, a Docker image can be created with the following command:
docker build -t data-privacy-toolkit:local .
where data-privacy-toolkit is the name assigned to the resulting image.
Note that the tag is set to local by convention.
Any other tag can be used, as long as the resulting name and tag do not overwrite existing images that are still needed.
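Because the image is meant to be built after the uberjar, one option is to chain both steps and validate the tag first. The sketch below applies Docker's documented tag format (up to 128 characters from letters, digits, underscores, periods, and hyphens, not starting with a period or hyphen); build_dpt_image and valid_image_tag are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: rebuild the uberjar, then build the image under a chosen tag.
# Run from the /docker subfolder, like the commands above.

# Succeeds if $1 follows Docker's tag format: 1-128 characters from
# [A-Za-z0-9_.-], not starting with '.' or '-'.
valid_image_tag() {
  case "$1" in
    '' | [.-]* | *[!A-Za-z0-9_.-]*) return 1 ;;
  esac
  [ "${#1}" -le 128 ]
}

build_dpt_image() {
  local tag="${1:-local}"
  if ! valid_image_tag "$tag"; then
    echo "invalid image tag: $tag" >&2
    return 1
  fi
  ./gradlew shadowJar && docker build -t "data-privacy-toolkit:$tag" .
}
```

Calling build_dpt_image without arguments reproduces the local convention; build_dpt_image "$(git rev-parse --short HEAD)" would instead tag the image with the current commit.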
After that, the DPT docker image can be executed as follows:
docker run --rm -it data-privacy-toolkit:local
New versions of the DPT Docker image are automatically deployed to the https://quay.io image registry by the CI/CD pipeline.
In particular, every PR to the main branch results in a new Docker image being published to the registry.
More precisely, the deployment steps of the CI/CD pipeline (see the corresponding workflow) push two tags for the image.
First, the image is pushed with its tag set to the current Git commit hash; then the same image is also tagged as the new latest.
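Because each image is also pushed under its commit hash, a deployment can be pinned to a specific build rather than the moving latest tag. The sketch below only composes the image reference; dpt_image_ref is a hypothetical helper, and whether the workflow uses the full or the abbreviated hash is an assumption to verify in the workflow file.

```shell
#!/usr/bin/env bash
# Sketch: build the registry reference for the DPT CLI image.
# With no argument it points at 'latest'; otherwise at the given tag,
# e.g. a Git commit hash pushed by the CI/CD pipeline.
DPT_REPOSITORY="quay.io/data_privacy_toolkit/cli"

dpt_image_ref() {
  printf '%s:%s\n' "$DPT_REPOSITORY" "${1:-latest}"
}
```

Usage: docker pull "$(dpt_image_ref "$(git rev-parse HEAD)")", replacing HEAD with git rev-parse --short HEAD if the workflow pushes abbreviated hashes.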
The following command retrieves the image currently tagged as latest:
docker pull quay.io/data_privacy_toolkit/cli
To run this image, execute the following command:
docker run --rm -it quay.io/data_privacy_toolkit/cli
The DPT Docker image expects its inputs to be provided through mounted volumes (see the specifications in the Dockerfile). Namely, the image expects the following:
- The input dataset must be available within a folder mounted as /input;
- The result of the task execution is written to a folder mounted as /output. Note that mounting /input and /output on the same folder is generally not recommended;
- The configuration must be written in a file named config.json, made available within a folder mounted as /config;
- Optionally, the file structures created by the persistency/consistency features of DPT can be saved by mounting a folder as /consistency.
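Before running the container, the host side of this layout can be prepared as in the sketch below. The folder names mirror the mount targets; prepare_dpt_folders is a hypothetical helper, and it deliberately does not write config.json, whose content depends on the task.

```shell
#!/usr/bin/env bash
# Sketch: create the host folders backing the /input, /output and /config
# mounts. The optional consistency folder is left to the user.
# Returns 2 if config/config.json is still missing.
prepare_dpt_folders() {
  local base="${1:-.}" dir
  for dir in input output config; do
    mkdir -p "$base/$dir" || return 1
  done
  if [ -z "$(ls -A "$base/input")" ]; then
    echo "warning: $base/input is empty" >&2
  fi
  if [ ! -f "$base/config/config.json" ]; then
    echo "error: $base/config/config.json is missing" >&2
    return 2
  fi
}
```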
For example, the following docker run command uses all of the mounts listed above, assuming that the folders input, output, config, and consistency exist in the working directory:
docker run --rm -it \
--mount type=bind,source=$PWD/input,target=/input \
--mount type=bind,source=$PWD/output,target=/output \
--mount type=bind,source=$PWD/config,target=/config \
--mount type=bind,source=$PWD/consistency,target=/consistency \
quay.io/data_privacy_toolkit/cli
Please refer to the existing end-to-end tests and the documentation for examples of configurations for the various tasks.
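Docker rejects a --mount bind whose source folder does not exist, so a small wrapper can check the folders first and mount /consistency only when that folder is present. The sketch below is a hypothetical helper (run_dpt) reproducing the command above; it requests -it only when attached to a terminal, since -t fails when stdin is not a TTY (e.g. in CI), and DPT_IMAGE is an assumed override variable for using a local image such as data-privacy-toolkit:local.

```shell
#!/usr/bin/env bash
# Sketch: run the DPT image with the mounts described above.
# Usage: run_dpt [base-folder]  (defaults to the current directory)
run_dpt() {
  local base dir
  # Bind-mount sources must be absolute paths.
  base="$(CDPATH= cd -- "${1:-.}" && pwd)" || return 1
  for dir in input output config; do
    if [ ! -d "$base/$dir" ]; then
      echo "missing folder: $base/$dir" >&2
      return 1
    fi
  done
  set -- run --rm
  # Request an interactive TTY only when stdin is a terminal.
  if [ -t 0 ]; then
    set -- "$@" -it
  fi
  for dir in input output config consistency; do
    # /consistency is optional: mount it only if the folder exists.
    if [ -d "$base/$dir" ]; then
      set -- "$@" --mount "type=bind,source=$base/$dir,target=/$dir"
    fi
  done
  docker "$@" "${DPT_IMAGE:-quay.io/data_privacy_toolkit/cli}"
}
```

For example, DPT_IMAGE=data-privacy-toolkit:local run_dpt runs the locally built image against the folders in the working directory.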