Conversation

@kkmattil (Contributor)

Proposed changes

Briefly describe the changes you've made here. Remember to add a link to the preview page of your branch.

Checklist before requesting a review

@kkmattil requested a review from @galfthan on October 10, 2025 at 12:38
@galfthan (Collaborator) left a comment:

Additionally, the new document should be spell-checked; I did not do a thorough check for typos etc.

* [Importing data](./sd-desktop-access.md)
* [Exporting data via user interface](./sd-desktop-export.md)
* [Export data programmatically](./sd-desktop-export-commandline.md)
* [Using HPC resources for sensitive data](./tutorials/sdsi.md)
@galfthan (Collaborator):

This addition is missing from https://csc-guide-preview.2.rahtiapp.fi/origin/kkmattil-patch-1/data/sensitive-data/sd-services-toc/ where the other bullets are repeated

# Submitting jobs from SD Desktop to the HPC environment of CSC

The limited computing capacity of SD Desktop virtual machines can prevent running heavy analysis tasks
for sensitive data. This document describes how heavy computing tasks can be submitted form SD Desktop
@galfthan (Collaborator):

form=>from


Please note the following details that limit the usage of this procedure:

* The service is not yet in full production. Access will be provided by a request for projects that have computing tasks that are compatible with the current status of the service. Contact [email protected] to enable this service.
@galfthan (Collaborator):

Access can be requested for projects that have computing tasks that are compatible with the current status of the service.

* The *sdsi-client* job submission tool, described in this document, will work only for approved projects.
* Each job always reserves one, and only one, full Puhti node for your task. Try to construct your batch job so that it effectively uses all 40 computing cores of one Puhti node (see the sketch after this list).
* The input files that the job uses must be uploaded to SD Connect before the job submission. Even though the job is submitted from SD Desktop, you can't utilize any files from the SD Desktop VM in the batch job.
* The jobs submitted from SD Desktop to Puhti have higher security level that normal Puhti jobs but lower than that of SD Desktop.
@galfthan (Collaborator):

that=> than
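To make the full-node reservation above concrete, here is a minimal sketch of a Puhti-style batch script that claims all 40 cores of one node. The account name, time limit, and the `my_tool` command are hypothetical placeholders, and the exact directives that *sdsi-client* expects or injects itself may differ:

```bash
#!/bin/bash
#SBATCH --job-name=full-node-demo
#SBATCH --account=project_2008749   # hypothetical example project
#SBATCH --nodes=1                   # one full Puhti node...
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=40          # ...and all 40 of its cores
#SBATCH --time=00:30:00

# my_tool is a placeholder for your own analysis command; point its
# thread count at the reserved cores so none of them sit idle.
my_tool --threads "$SLURM_CPUS_PER_TASK" input_data
```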

The increased security of the computing tasks that are submitted with the sdsi-client is based on two main features:

1. Both input and output data are stored and transported in encrypted format. Encryption is done automatically using the SD Connect methodology.
2. For the actual analysis the data is temporarily decrypted to the local disk area of a compute node that is fully reserved for only this one job. Thus there can't be other users in the node during the execution of the job, which effectively eliminates the possibility that other users could access the disk areas, memory or process list of the node during the processing. All the data will be removed from local disk of the node when the job ends.
@galfthan (Collaborator):

Thus there are no other users in the node during the execution of the job, and other users cannot access the disk areas, memory or process list of the node during the processing. The technical and operational measures that implement this key restriction are described in the Puhti TOMs (LINK, still missing from research??)
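To illustrate point 2 above, here is a hand-written sketch of what the decrypt-compute-cleanup flow could look like, assuming Puhti's node-local NVMe scratch (requested with `--gres=nvme`, exposed as `$LOCAL_SCRATCH`) and the reference crypt4gh command-line tool. The *sdsi-client* presumably automates these steps, and the key file name is a placeholder:

```bash
#!/bin/bash
#SBATCH --exclusive                 # whole node: no other users' processes
#SBATCH --gres=nvme:100             # 100 GB of node-local NVMe scratch

# Decrypt the input onto the node-local disk only, never onto shared storage.
crypt4gh decrypt --sk my_key.sec \
    < data_1000.tar.c4gh > "$LOCAL_SCRATCH/data_1000.tar"

# ... run the analysis against files under $LOCAL_SCRATCH ...

# Local scratch is wiped when the job ends; an explicit cleanup is still
# a reasonable belt-and-braces step.
rm -rf "${LOCAL_SCRATCH:?}"/*
```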

or few computing cores, you can use tools like _gnuparallel_, _nextflow_ or _snakemake_ to submit several
computing tasks to be executed at the same time.

In the examples below we have a tar-arcvive file that has been stored to SD Connect: `2008749-sdsi-input/data_1000.tar.c4gh`. The tar file contains 1000 text files (_.txt_) for which we want to compute md5sums. Below we have three alternative ways to run the tasks so that all 40 cores are effectively used.
@galfthan (Collaborator):

tar-archive
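As one possible shape for such an example, here is a minimal GNU Parallel sketch that hashes the 1000 extracted text files with 40 concurrent workers. The file names follow the example above; whether GNU Parallel must first be loaded as a module on Puhti is an assumption:

```bash
#!/bin/bash
# module load parallel   # if GNU Parallel is provided as a module on Puhti

# Unpack the already-decrypted archive, then hash every .txt file,
# keeping 40 md5sum processes running until the file list is exhausted.
tar xf data_1000.tar
find . -name '*.txt' | parallel -j 40 md5sum {} > md5sums.txt
```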


Please note the following details that limit the usage of this procedure:

* The service is not yet in full production. Access will be provided by a request for projects that have computing tasks that are compatible with the current status of the service. Contact [email protected] to enable this service.
@galfthan (Collaborator):

We should link to the TOMs here and also say that you should use them to evaluate whether the service is sufficient?

@kkmattil (Contributor, Author):

We have a link to the TOMs on the page.

Link to Puhti TOMs added.