broadinstitute · jaspreetishar · Oct 9, 2025 · Oct 23, 2025 · Oct 23, 2025 · Oct 23, 2025
diff --git a/docs/cloud/Manifold/Tutorial_for_generating_LandscapeFiles.md b/docs/cloud/Manifold/Tutorial_for_generating_LandscapeFiles.md
@@ -0,0 +1,164 @@
+# Tutorial: Generating Celldega LandscapeFiles on Manifold
+
+This guide explains how to submit workflows for generating **Celldega LandscapeFiles** on **Manifold**, using a Jupyter Notebook environment.
+
+---
+
+## 1. Overview
+
+The `run_landscape_wdl.sh` script automates launching the **LandscapeFiles WDL workflow** via **AWS HealthOmics**.
+It performs the following steps:
+
+1. Validates inputs
+2. Clones the [official workflow repository](https://github.com/broadinstitute/stp_celldega_landscape_files)
+3. Submits the WDL pipeline to AWS HealthOmics using a user-provided `.json` input file
+
+The input file (e.g., `celldega_inputs.json`) defines workflow parameters such as dataset location, output bucket, and the technology.
+
+---
+
+## 2. Environment and File Setup
+
+### AWS HealthOmics Environment
+
+Create an environment using the **“WDL on AWS HealthOmics”** base image.
+This image includes the dependencies required to submit WDL workflows to AWS HealthOmics directly.
+
+![](new_compute_env.png)
+
+---
+
+### The Shell Script
+
+`run_landscape_wdl.sh` handles workflow setup and execution automatically.
+You can use relative or absolute paths to reference the script in your notebook.
+
+> **Note:** The `run_landscape_wdl.sh` script is available in the [official workflow repository](https://github.com/broadinstitute/stp_celldega_landscape_files).
+> Once deployed, you can reference it directly from your environment or the shared Manifold workbench.
+
+---
+
+### Example Input JSON
+
+You can either create the input JSON manually or generate it programmatically.
+Here’s a quick example that writes `celldega_inputs.json` from Python:
+
+```python
+import json
+
+inputs = {
+    "LandscapeFiles.sample": "dataset-name",
+    "LandscapeFiles.data_dir": "s3://project/data/",
+    "LandscapeFiles.bucket_path_landscape_files": "s3://project/landscape_files/",
+    "LandscapeFiles.technology": "technology-name",
+    "LandscapeFiles.bin_size": 2,
+    "LandscapeFiles.tile_size": 500,
+    "LandscapeFiles.use_dummy_clusters": False
+}
+
+with open("celldega_inputs.json", "w") as f:
+    json.dump(inputs, f, indent=2)
+```
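Because section 5 submits several input files, one way to produce them is to loop over samples instead of editing each file by hand. The sketch below is only an illustration: the `write_inputs` helper, the layout of the `samples` mapping, and the `celldega_inputs_<sample>.json` naming are assumptions, not part of the workflow repository. The keys and default values are copied from the example above.

```python
import json
from pathlib import Path

# Shared optional settings copied from the tutorial example; adjust as needed.
BASE_INPUTS = {
    "LandscapeFiles.bin_size": 2,
    "LandscapeFiles.tile_size": 500,
    "LandscapeFiles.use_dummy_clusters": False,
}


def write_inputs(samples, out_dir="."):
    """Write one celldega_inputs_<sample>.json per sample and return the paths.

    `samples` maps a sample name to a dict with "data_dir", "output" and
    "technology" keys (a hypothetical layout chosen for this sketch).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, cfg in samples.items():
        inputs = {
            "LandscapeFiles.sample": name,
            "LandscapeFiles.data_dir": cfg["data_dir"],
            "LandscapeFiles.bucket_path_landscape_files": cfg["output"],
            "LandscapeFiles.technology": cfg["technology"],
            **BASE_INPUTS,
        }
        path = out_dir / f"celldega_inputs_{name}.json"
        path.write_text(json.dumps(inputs, indent=2))
        paths.append(path)
    return paths
```

Each returned path can then be passed to `run_landscape_wdl.sh --input`.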
+
+This file specifies:
+
+* **Which sample** to process
+* **Where to find** the instrument data
+* **Where to save** the output LandscapeFiles
+* **Which spatial technology** is used (e.g., **Xenium**, **Visium-HD**, **MERSCOPE**)
+
+---
+
+### Input Parameter Glossary
+
+| Parameter                            | Type    | Description                                                                                                                           |
+| ------------------------------------ | ------- | ------------------------------------------------------------------------------------------------------------------------------------- |
+| `bucket_path_landscape_files`        | `str`   | AWS S3 path to the **output directory** where generated LandscapeFiles will be saved (e.g., `s3://my-output-bucket/landscapefiles/`). |
+| `data_dir`                           | `str`   | AWS S3 path to the **input data directory** containing the files needed for processing (e.g., `s3://my-input-bucket/data/`).          |
+| `sample`                             | `str`   | The **sample name** or identifier for the dataset being processed.                                                                    |
+| `technology`                         | `str`   | The **spatial technology** used to generate the data (e.g., `"Visium-HD"`, `"MERSCOPE"`, `"Xenium"`).                        |
+| `bin_size` *(optional)*              | `int`   | For sST technologies like Visium-HD: the **spatial binning size** (in microns). Default: `2`.                |
+| `celldega_docker_image` *(optional)* | `str`   | The **Docker image URI** for running Celldega. Override this to use a specific image version. Default: latest maintained version.     |
+| `image_file_path` *(optional)*       | `str`   | For sST technologies like Visium-HD: Path to an **associated image file** (e.g., histology image) for visualization.                                                       |
+| `image_scale` *(optional)*           | `float` | Scaling factor applied to the input image when image and coordinate resolutions differ.                                               |
+| `jitter` *(optional)*                | `int`   | For sST technologies like Visium-HD: Adds small **random spatial offsets** to prevent overplotting of bins.                                                    |
+| `tile_size` *(optional)*             | `int`   | The **tile dimension (in pixels)** used to subdivide large data for processing.                                                       |
+| `use_dummy_clusters`                 | `bool`  | Whether to use **dummy clusters** (a single cluster labeled `0`) instead of real clustering results.                                  |
+
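A quick local check of the input dictionary can catch a missing key before anything is submitted. This is a minimal sketch under stated assumptions: `check_inputs` is a hypothetical helper, and treating the four parameters not marked *(optional)* (other than `use_dummy_clusters`) as required is a reading of the glossary above, not something the workflow itself is known to enforce in this way.

```python
# Keys the glossary does not mark as optional (assumed to be required here).
REQUIRED_KEYS = (
    "LandscapeFiles.sample",
    "LandscapeFiles.data_dir",
    "LandscapeFiles.bucket_path_landscape_files",
    "LandscapeFiles.technology",
)
PATH_KEYS = (
    "LandscapeFiles.data_dir",
    "LandscapeFiles.bucket_path_landscape_files",
)
INT_KEYS = ("LandscapeFiles.bin_size", "LandscapeFiles.tile_size")


def check_inputs(inputs, scheme="s3://"):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for key in REQUIRED_KEYS:
        if not inputs.get(key):
            problems.append(f"missing or empty: {key}")
    for key in PATH_KEYS:
        value = inputs.get(key)
        if value and not str(value).startswith(scheme):
            problems.append(f"{key} should start with {scheme}: {value}")
    for key in INT_KEYS:
        value = inputs.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
            problems.append(f"{key} should be an integer: {value!r}")
    return problems
```

Calling `check_inputs(inputs)` on the dictionary from the example should return an empty list. The `scheme` argument is there so the same check could be reused with a different storage prefix.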
+
+---
+
+## 3. Grant Execution Permissions
+
+In your Jupyter notebook, make the script executable:
+
+```bash
+!chmod +x run_landscape_wdl.sh
+```
+
+---
+
+## 4. View Script Usage (Optional)
+
+To display the script’s usage instructions:
+
+```bash
+!./run_landscape_wdl.sh --help
+```
+
+Output:
+
+```
+Usage: ./run_landscape_wdl.sh --input <input_json>
+Example: ./run_landscape_wdl.sh --input celldega_inputs.json
+```
+
+---
+
+## 5. Submit Independent Workflow Runs
+
+You can submit one or more input files independently.
+Each submission is self-contained and processes a separate dataset.
+
+Example:
+
+```bash
+!./run_landscape_wdl.sh --input celldega_inputs_technology-1.json
+!./run_landscape_wdl.sh --input celldega_inputs_technology-2.json
+!./run_landscape_wdl.sh --input celldega_inputs_technology-3.json
+```
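When there are many input files, the same submissions could be driven from a Python cell. The sketch below assumes the script sits in the working directory and that it exits with a non-zero code on failure, which the tutorial does not state. `build_command` and `submit_all` are hypothetical helper names; only the `--input` flag comes from the usage text above.

```python
import subprocess

SCRIPT = "./run_landscape_wdl.sh"


def build_command(input_json, script=SCRIPT):
    """Return the argument list for one submission."""
    return [script, "--input", str(input_json)]


def submit_all(input_files, dry_run=True):
    """Submit each input file in turn and return (path, return code) pairs.

    With dry_run=True nothing is executed; the command is printed and the
    return code is recorded as None.
    """
    results = []
    for path in input_files:
        cmd = build_command(path)
        if dry_run:
            print("would run:", " ".join(cmd))
            results.append((str(path), None))
            continue
        # check=False so one failed submission does not stop the rest.
        completed = subprocess.run(cmd, check=False)
        results.append((str(path), completed.returncode))
    return results
```

Running with the default `dry_run=True` first shows what would be submitted. Passing `dry_run=False` then launches the runs one after another.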
+
+---
+
+## 6. Monitoring Workflows
+
+Monitor workflow progress and job status under the **Pipelines** tab in Manifold.
+
+![](pipelines_tab.png)
+
+---
+
+## 7. Retrieve Results
+
+After a run completes successfully, workflow outputs are written to the S3 path defined in your JSON file (the `bucket_path_landscape_files` value):
+
+```
+s3://output-data-dir/
+```
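To confirm that outputs landed where expected, the output path can be split into bucket and prefix and then listed. This sketch assumes the AWS CLI (`aws`) is installed and has credentials in the environment, which the tutorial does not state. `split_s3_uri` and `list_command` are hypothetical helpers that only build the command and do not run it.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/prefix/' into ('bucket', 'prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_command(uri):
    """Return an `aws s3 ls` argument list for everything under the URI."""
    split_s3_uri(uri)  # raises ValueError for anything that is not s3://...
    return ["aws", "s3", "ls", uri, "--recursive"]
```

In a notebook, the resulting list can be passed to `subprocess.run`, or the equivalent command can be typed after `!`.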
+
+---
+
+## 8. Troubleshooting
+
+Each workflow run creates a dedicated folder under:
+`s3://output-data-dir/workflow_logs/`
+
+This makes it easier to debug or track submissions.
+
+Common issues:
+
+| Error                                                       | Cause                 | Solution                                                                         |
+| ----------------------------------------------------------- | --------------------- | -------------------------------------------------------------------------------- |
+| `omics: command not found`                                  | AWS Omics CLI missing | Ensure the environment uses the **WDL on AWS HealthOmics** image.                |
+| `Could not read LandscapeFiles.bucket_path_landscape_files` | Missing JSON key      | Check that your input JSON includes this field.                                  |
+| `Permission denied`                                         | Credential issue      | Contact the Manifold and STP teams.                                              |
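The first row of the table, and a script that was never made executable, can both be checked locally before submitting. The sketch below is an assumption-laden illustration: `preflight` is a hypothetical helper, and the CLI name `omics` is taken only from the error message in the table. Credential problems cannot be detected this way.

```python
import os
import shutil


def preflight(script="run_landscape_wdl.sh", cli="omics"):
    """Return a list of local setup problems; an empty list means none found."""
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(cli) is None:
        problems.append(f"{cli}: command not found on PATH")
    if not os.path.isfile(script):
        problems.append(f"{script}: file not found")
    elif not os.access(script, os.X_OK):
        problems.append(f"{script}: not executable (run chmod +x)")
    return problems
```

Printing the result of `preflight()` in the first notebook cell gives a short checklist to work through before contacting support.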
diff --git a/docs/cloud/Manifold/new_compute_env.png b/docs/cloud/Manifold/new_compute_env.png
diff --git a/docs/cloud/Manifold/pipelines_tab.png b/docs/cloud/Manifold/pipelines_tab.png
diff --git a/docs/cloud_env/Terra-segment.md → ...cloud/Terra/Cell_Segmentation_on_Terra.md b/docs/cloud_env/Terra-segment.md → ...cloud/Terra/Cell_Segmentation_on_Terra.md
diff --git a/docs/cloud/Terra/Tutorial_for_generating_LandscapeFiles.md b/docs/cloud/Terra/Tutorial_for_generating_LandscapeFiles.md
@@ -0,0 +1,108 @@
+# Tutorial: Generating Celldega LandscapeFiles on Terra
+
+This guide walks you through submitting and running workflows to generate **Celldega LandscapeFiles** using the **Terra** platform.
+
+---
+
+## 1. Overview
+
+The **Celldega LandscapeFiles** workflow (`stp_celldega_landscape_files`) can be launched directly from Terra via **Dockstore**.
+You’ll need a Terra workspace before proceeding.
+
+---
+
+## 2. Step-by-Step Instructions
+
+### Step 1: Create or Select a Workspace
+
+1. Log in to [Terra](https://app.terra.bio/).
+2. Create a new workspace or open an existing one where you want to run the workflow.
+
+![](workspace.png)
+
+---
+
+### Step 2: Locate the Workflow
+
+1. In your workspace, navigate to the **Workflows** tab.
+   ![](workflow.png)
+
+2. Click **Find a Workflow**.
+
+3. Select **Dockstore.org** as the source.
+   ![](find_a_workflow.png)
+
+4. Search for **stp_celldega_landscape_files**.
+
+5. Click **Launch with Terra** to import the workflow.
+
+---
+
+### Step 3: Import the Workflow
+
+1. In the **Launch with Terra** dialog, select your workspace from the **Destination Workspace** dropdown menu.
+2. Click **Import** to add the workflow to your workspace.
+3. Once imported, open the workflow by clicking on its name.
+
+![](import_a_workflow.png)
+
+---
+
+### Step 4: Configure Inputs
+
+1. Review the input fields listed for the workflow.
+2. Provide all **required inputs** and any **optional parameters** relevant to your technology (e.g., **Visium-HD**, **Xenium**, **MERSCOPE**).
+3. *(Optional but recommended)* Set a **cost threshold** to limit compute expenses and prevent unintended overruns.
+
+![](wdl_inputs.png)
+
+---
+
+### Input Parameter Glossary
+
+| Parameter                            | Type    | Description                                                                                                                           |
+| ------------------------------------ | ------- | ------------------------------------------------------------------------------------------------------------------------------------- |
+| `bucket_path_landscape_files`        | `str`   | The Google Cloud Storage path to the **output directory** where generated LandscapeFiles will be saved (e.g., `gs://my-output-bucket/landscapefiles/`). |
+| `data_dir`                           | `str`   | The Google Cloud Storage path to the **input data directory** containing the files needed for processing (e.g., `gs://my-input-bucket/data/`).          |
+| `sample`                             | `str`   | The **sample name** or identifier for the dataset being processed.                                                                    |
+| `technology`                         | `str`   | The **spatial technology** used to generate the data (e.g., `"Visium-HD"`, `"MERSCOPE"`, `"Xenium"`).                        |
+| `bin_size` *(optional)*              | `int`   | For sST technologies like Visium-HD: the **spatial binning size** (in microns). Default: `2`.                |
+| `celldega_docker_image` *(optional)* | `str`   | The **Docker image URI** for running Celldega. Override this to use a specific image version. Default: latest maintained version.     |
+| `image_file_path` *(optional)*       | `str`   | For sST technologies like Visium-HD: Path to an **associated image file** (e.g., histology image) for visualization.                                                       |
+| `image_scale` *(optional)*           | `float` | Scaling factor applied to the input image when image and coordinate resolutions differ.                                               |
+| `jitter` *(optional)*                | `int`   | For sST technologies like Visium-HD: Adds small **random spatial offsets** to prevent overplotting of bins.                                                    |
+| `tile_size` *(optional)*             | `int`   | The **tile dimension (in pixels)** used to subdivide large data for processing.                                                       |
+| `use_dummy_clusters`                 | `bool`  | Whether to use **dummy clusters** (a single cluster labeled `0`) instead of real clustering results.                                  |
+
+
+---
+
+### Step 5: Launch the Workflow
+
+1. After saving your input configuration, click **Launch** to start the workflow.
+2. The workflow begins executing on Terra, and you can monitor its progress while it runs.
+
+---
+
+### Step 6: Monitor the Workflow
+
+1. Open the **Submission History** tab to monitor workflow progress.
+2. You can view logs, task statuses, and cost information in real time.
+3. Once the run is complete, the generated **Celldega LandscapeFiles** will appear in the output directory you specified.
+
+![](submission_history.png)
+
+---
+
+## 3. Outputs
+
+Upon successful completion, the workflow will produce a set of **Celldega LandscapeFiles** in your specified output bucket.
+These files can be used for downstream visualization and data exploration.
+
+---
+
+## 4. Tips and Best Practices
+
+* Verify that all input file paths are correct before launching a run.
+* Use Terra’s built-in cost monitoring tools to stay within your compute budget.
+* If a workflow fails, review the **Job Manager Logs** under the **Submission History** tab for detailed error messages.
diff --git a/docs/cloud/Terra/find_a_workflow.png b/docs/cloud/Terra/find_a_workflow.png
diff --git a/docs/cloud/Terra/import_a_workflow.png b/docs/cloud/Terra/import_a_workflow.png
diff --git a/docs/cloud/Terra/submission_history.png b/docs/cloud/Terra/submission_history.png
diff --git a/docs/cloud/Terra/wdl_inputs.png b/docs/cloud/Terra/wdl_inputs.png
diff --git a/docs/cloud/Terra/workflow.png b/docs/cloud/Terra/workflow.png
diff --git a/docs/cloud/Terra/workspace.png b/docs/cloud/Terra/workspace.png
diff --git a/docs/cloud/index.md b/docs/cloud/index.md
@@ -0,0 +1,4 @@
+# Tutorials for Generating LandscapeFiles
+[Manifold](Manifold/Tutorial_for_generating_LandscapeFiles.md)
+
+[Terra](Terra/Tutorial_for_generating_LandscapeFiles.md)
diff --git a/mkdocs.yaml b/mkdocs.yaml
@@ -34,9 +34,13 @@ nav:
       - API: javascript/api.md
   - Technologies:
       - technologies/index.md
-  - Cloud Environments:
-    - Terra.bio:
-        - cloud_env/Terra-segment.md
+  - Cloud:
+    - cloud/index.md
+    - Manifold:
+        - Tutorial for generating LandscapeFiles: cloud/Manifold/Tutorial_for_generating_LandscapeFiles.md
+    - Terra:
+        - Tutorial for generating LandscapeFiles: cloud/Terra/Tutorial_for_generating_LandscapeFiles.md
+        - Cell Segmentation on Terra: cloud/Terra/Cell_Segmentation_on_Terra.md
   - Gallery:
       - gallery/index.md
       - Xenium:

diff --git a/tests/unit/test_pre/test_sbg_tile.py b/tests/unit/test_pre/test_sbg_tile.py
@@ -1,13 +1,14 @@
 import importlib.util
+from pathlib import Path
 import sys
 import types
-from pathlib import Path
 
 import numpy as np
 import pandas as pd
 import pytest
 from scipy.sparse import csr_matrix
 
+
 ROOT_DIR = Path(__file__).resolve().parents[3]
 PRE_ROOT = ROOT_DIR / "src" / "celldega" / "pre"