164 changes: 164 additions & 0 deletions docs/cloud/Manifold/Tutorial_for_generating_LandscapeFiles.md
@@ -0,0 +1,164 @@
# Tutorial: Generating Celldega LandscapeFiles on Manifold

This guide explains how to submit workflows that generate **Celldega LandscapeFiles** on **Manifold** from a Jupyter Notebook environment.

---

## 1. Overview

The `run_landscape_wdl.sh` script automates launching the **LandscapeFiles WDL workflow** via **AWS HealthOmics**.
It performs the following steps:

1. Validates inputs
2. Clones the [official workflow repository](https://github.com/broadinstitute/stp_celldega_landscape_files)
3. Submits the WDL pipeline to AWS HealthOmics using a user-provided `.json` input file

The input file (e.g., `celldega_inputs.json`) defines workflow parameters such as the dataset location, the output bucket, and the spatial technology.

---

## 2. Environment and File Setup

### AWS HealthOmics Environment

Create an environment using the **“WDL on AWS HealthOmics”** base image.
This image includes the dependencies required to submit WDL workflows to AWS HealthOmics directly.

![](new_compute_env.png)

---

### The Shell Script

`run_landscape_wdl.sh` handles workflow setup and execution automatically.
You can use relative or absolute paths to reference the script in your notebook.

> **Note:** The `run_landscape_wdl.sh` script is available in the [official workflow repository](https://github.com/broadinstitute/stp_celldega_landscape_files).
> Once deployed, you can reference it directly from your environment or the shared Manifold workbench.

---

### Example Input JSON

You can create the input JSON by hand or generate it programmatically.
The following Python snippet writes an example `celldega_inputs.json`:

```python
import json

# Placeholder values: replace with your own sample, S3 paths, and technology.
inputs = {
    "LandscapeFiles.sample": "dataset-name",
    "LandscapeFiles.data_dir": "s3://project/data/",
    "LandscapeFiles.bucket_path_landscape_files": "s3://project/landscape_files/",
    "LandscapeFiles.technology": "technology-name",
    "LandscapeFiles.bin_size": 2,
    "LandscapeFiles.tile_size": 500,
    "LandscapeFiles.use_dummy_clusters": False,
}

# Write the inputs as pretty-printed JSON next to the notebook.
with open("celldega_inputs.json", "w") as f:
    json.dump(inputs, f, indent=2)
```

This file specifies:

* **Which sample** to process
* **Where to find** the instrument data
* **Where to save** the output LandscapeFiles
* **Which spatial technology** is used (e.g., **Xenium**, **Visium-HD**, **MERSCOPE**)
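
If you have several datasets, a small loop can write one input file per dataset. The sketch below is only an illustration: the dataset names, S3 paths, and the `write_input_files` helper are hypothetical, and the `celldega_inputs_<name>.json` naming simply mirrors the file names used in the submission example later in this guide. Optional parameters such as `bin_size` and `tile_size` are left out and can be added per dataset.

```python
import json
from pathlib import Path

# Hypothetical datasets: replace names, S3 paths, and technologies with your own.
DATASETS = [
    {"name": "xenium-sample", "technology": "Xenium",
     "data_dir": "s3://project/xenium/",
     "out": "s3://project/landscape_files/xenium/"},
    {"name": "visiumhd-sample", "technology": "Visium-HD",
     "data_dir": "s3://project/visiumhd/",
     "out": "s3://project/landscape_files/visiumhd/"},
]


def write_input_files(datasets, out_dir="."):
    """Write one celldega_inputs_<name>.json file per dataset and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for ds in datasets:
        # Keys use the "LandscapeFiles." prefix shown in the example above.
        inputs = {
            "LandscapeFiles.sample": ds["name"],
            "LandscapeFiles.data_dir": ds["data_dir"],
            "LandscapeFiles.bucket_path_landscape_files": ds["out"],
            "LandscapeFiles.technology": ds["technology"],
            "LandscapeFiles.use_dummy_clusters": False,
        }
        path = out_dir / f"celldega_inputs_{ds['name']}.json"
        path.write_text(json.dumps(inputs, indent=2))
        paths.append(path)
    return paths
```

Calling `write_input_files(DATASETS)` in a notebook cell writes the files to the current directory and returns their paths.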

---

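### Input Parameter Glossary

In the input JSON, each parameter name is prefixed with the workflow name, as in `LandscapeFiles.sample`.
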
| Parameter | Type | Description |
| ------------------------------------ | ------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| `bucket_path_landscape_files` | `str` | AWS S3 path to the **output directory** where generated LandscapeFiles will be saved (e.g., `s3://my-output-bucket/landscapefiles/`). |
| `data_dir` | `str` | AWS S3 path to the **input data directory** containing the files needed for processing (e.g., `s3://my-input-bucket/data/`). |
| `sample` | `str` | The **sample name** or identifier for the dataset being processed. |
| `technology` | `str` | The **spatial technology** used to generate the data (e.g., `"Visium-HD"`, `"MERSCOPE"`, `"Xenium"`). |
| `bin_size` *(optional)*              | `int`   | For sequencing-based spatial transcriptomics (sST) technologies such as Visium-HD: the **spatial binning size** (in microns). Default: `2`. |
| `celldega_docker_image` *(optional)* | `str`   | The **Docker image URI** for running Celldega. Override this to use a specific image version. Default: latest maintained version.     |
| `image_file_path` *(optional)*       | `str`   | For sST technologies such as Visium-HD: the path to an **associated image file** (e.g., a histology image) for visualization.         |
| `image_scale` *(optional)*           | `float` | Scaling factor applied to the input image when image and coordinate resolutions differ.                                               |
| `jitter` *(optional)*                | `int`   | For sST technologies such as Visium-HD: adds small **random spatial offsets** to prevent overplotting of bins.                        |
| `tile_size` *(optional)* | `int` | The **tile dimension (in pixels)** used to subdivide large data for processing. |
| `use_dummy_clusters` | `bool` | Whether to use **dummy clusters** (a single cluster labeled `0`) instead of real clustering results. |
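
Before submitting, it can help to sanity-check an input file locally. The following is a hypothetical pre-flight check, not part of `run_landscape_wdl.sh`: it assumes that every parameter not marked optional in the table above must be present, and that both path parameters must be `s3://` URIs. A missing `LandscapeFiles.bucket_path_landscape_files` key, for instance, corresponds to one of the errors listed under Troubleshooting.

```python
import json

PREFIX = "LandscapeFiles."
# Assumption: parameters not marked optional in the glossary are required.
REQUIRED_KEYS = ["sample", "data_dir", "bucket_path_landscape_files",
                 "technology", "use_dummy_clusters"]
# Assumption: on Manifold both path parameters point to S3.
S3_KEYS = ["data_dir", "bucket_path_landscape_files"]


def check_inputs(inputs):
    """Return a list of problems found in a LandscapeFiles input dictionary."""
    problems = []
    for key in REQUIRED_KEYS:
        if PREFIX + key not in inputs:
            problems.append(f"missing {PREFIX}{key}")
    for key in S3_KEYS:
        value = inputs.get(PREFIX + key)
        if value is not None and not str(value).startswith("s3://"):
            problems.append(f"{PREFIX}{key} should be an s3:// URI, got {value!r}")
    return problems


def check_input_file(path):
    """Load an input JSON file and return the problems found in it."""
    with open(path) as f:
        return check_inputs(json.load(f))
```

An empty list from `check_input_file("celldega_inputs.json")` means the file passed these basic checks; it does not guarantee that the S3 paths exist or that the workflow will succeed.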


---

## 3. Grant Execution Permissions

In your Jupyter notebook, make the script executable:

```bash
!chmod +x run_landscape_wdl.sh
```

---

## 4. View Script Usage (Optional)

To display the script’s usage instructions:

```bash
!./run_landscape_wdl.sh --help
```

Output:

```
Usage: ./run_landscape_wdl.sh --input <input_json>
Example: ./run_landscape_wdl.sh --input celldega_inputs.json
```

---

## 5. Submit Independent Workflow Runs

You can submit one or more input files.
Each submission is an independent, self-contained run that processes a separate dataset.

Example:

```bash
!./run_landscape_wdl.sh --input celldega_inputs_technology-1.json
!./run_landscape_wdl.sh --input celldega_inputs_technology-2.json
!./run_landscape_wdl.sh --input celldega_inputs_technology-3.json
```
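
To submit every input file in a directory from a single notebook cell, you can loop over the files in Python. This is a sketch, assuming `run_landscape_wdl.sh` is executable in the current directory and that input files follow the `celldega_inputs_*.json` pattern; `submit_all` is a hypothetical helper. With `dry_run=True` it only returns the commands it would run.

```python
import subprocess
from pathlib import Path


def submit_all(input_dir=".", pattern="celldega_inputs_*.json",
               script="./run_landscape_wdl.sh", dry_run=False):
    """Submit one workflow run per matching input file and return the commands."""
    commands = []
    # Sort so submissions happen in a predictable order.
    for json_path in sorted(Path(input_dir).glob(pattern)):
        cmd = [script, "--input", str(json_path)]
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError if the script exits non-zero.
            subprocess.run(cmd, check=True)
    return commands
```

Running `submit_all(dry_run=True)` first lets you confirm which files will be submitted before calling `submit_all()`. Because of `check=True`, a failing submission raises `subprocess.CalledProcessError` and stops the remaining submissions.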

---

## 6. Monitoring Workflows

Monitor workflow progress and job status under the **Pipelines** tab in Manifold.

![](pipelines_tab.png)

---

## 7. Retrieve Results

After a run completes successfully, its outputs are written to the S3 path set by `LandscapeFiles.bucket_path_landscape_files` in your input JSON, for example:

```
s3://output-data-dir/
```
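
To see what a run produced without leaving the notebook, you can list the objects under the output prefix. The sketch below assumes `boto3` is installed in the environment and that your AWS credentials can read the output bucket; `split_s3_uri` and `list_outputs` are hypothetical helpers built on the standard S3 `list_objects_v2` paginator. From a notebook cell, `!aws s3 ls s3://output-data-dir/ --recursive` gives a similar listing with the AWS CLI.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/prefix/' into ('bucket', 'prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_outputs(uri):
    """Yield (key, size_in_bytes) for every object under an S3 prefix."""
    import boto3  # imported here so split_s3_uri works without boto3 installed

    bucket, prefix = split_s3_uri(uri)
    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    # The paginator follows continuation tokens for prefixes with many objects.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]
```

For example, `for key, size in list_outputs("s3://project/landscape_files/"): print(key, size)` prints each output object with its size.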

---

## 8. Troubleshooting

Each workflow run creates a dedicated folder under `s3://output-data-dir/workflow_logs/`, which makes it easier to debug or track individual submissions.

Common issues:

| Error | Cause | Solution |
| ----------------------------------------------------------- | --------------------- | -------------------------------------------------------------------------------- |
| `omics: command not found` | AWS Omics CLI missing | Ensure the environment uses the **WDL on AWS HealthOmics** image. |
| `Could not read LandscapeFiles.bucket_path_landscape_files` | Missing JSON key | Check that your input JSON includes this field. |
| `Permission denied` | Credential issue | Contact the Manifold and STP team. |
Binary file added docs/cloud/Manifold/new_compute_env.png
Binary file added docs/cloud/Manifold/pipelines_tab.png
108 changes: 108 additions & 0 deletions docs/cloud/Terra/Tutorial_for_generating_LandscapeFiles.md
@@ -0,0 +1,108 @@
# Tutorial: Generating Celldega LandscapeFiles on Terra

This guide walks you through submitting and running workflows to generate **Celldega LandscapeFiles** using the **Terra** platform.

---

## 1. Overview

The **Celldega LandscapeFiles** workflow (`stp_celldega_landscape_files`) can be launched directly from Terra via **Dockstore**.
You’ll need a Terra workspace before proceeding.

---

## 2. Step-by-Step Instructions

### Step 1: Create or Select a Workspace

1. Log in to [Terra](https://app.terra.bio/).
2. Create a new workspace or open an existing one where you want to run the workflow.

![](workspace.png)

---

### Step 2: Locate the Workflow

1. In your workspace, navigate to the **Workflows** tab.
![](workflow.png)

2. Click **Find a Workflow**.

3. Select **Dockstore.org** as the source.
![](find_a_workflow.png)

4. Search for **stp_celldega_landscape_files**.

5. Click **Launch with Terra** to import the workflow.

---

### Step 3: Import the Workflow

1. In the **Launch with Terra** dialog, select your workspace from the **Destination Workspace** dropdown menu.
2. Click **Import** to add the workflow to your workspace.
3. Once imported, open the workflow by clicking on its name.

![](import_a_workflow.png)

---

### Step 4: Configure Inputs

1. Review the input fields listed for the workflow.
2. Provide all **required inputs** and any **optional parameters** relevant to your technology (e.g., **Visium**, **Xenium**, **MERSCOPE**).
3. *(Optional but recommended)* Set a **cost threshold** to limit compute expenses and prevent unintended overruns.

![](wdl_inputs.png)

---

### Input Parameter Glossary

| Parameter | Type | Description |
| ------------------------------------ | ------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| `bucket_path_landscape_files`        | `str`   | Google Cloud Storage path to the **output directory** where generated LandscapeFiles will be saved (e.g., `gs://my-output-bucket/landscapefiles/`). |
| `data_dir`                           | `str`   | Google Cloud Storage path to the **input data directory** containing the files needed for processing (e.g., `gs://my-input-bucket/data/`).         |
| `sample` | `str` | The **sample name** or identifier for the dataset being processed. |
| `technology` | `str` | The **spatial technology** used to generate the data (e.g., `"Visium-HD"`, `"MERSCOPE"`, `"Xenium"`). |
| `bin_size` *(optional)*              | `int`   | For sequencing-based spatial transcriptomics (sST) technologies such as Visium-HD: the **spatial binning size** (in microns). Default: `2`. |
| `celldega_docker_image` *(optional)* | `str`   | The **Docker image URI** for running Celldega. Override this to use a specific image version. Default: latest maintained version.     |
| `image_file_path` *(optional)*       | `str`   | For sST technologies such as Visium-HD: the path to an **associated image file** (e.g., a histology image) for visualization.         |
| `image_scale` *(optional)*           | `float` | Scaling factor applied to the input image when image and coordinate resolutions differ.                                               |
| `jitter` *(optional)*                | `int`   | For sST technologies such as Visium-HD: adds small **random spatial offsets** to prevent overplotting of bins.                        |
| `tile_size` *(optional)* | `int` | The **tile dimension (in pixels)** used to subdivide large data for processing. |
| `use_dummy_clusters` | `bool` | Whether to use **dummy clusters** (a single cluster labeled `0`) instead of real clustering results. |
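
Because Terra runs on Google Cloud, both path parameters are expected to be Google Cloud Storage (`gs://`) URIs. A quick check such as the hypothetical `check_gcs_inputs` helper below, run in any Python session, can catch an `s3://` or local path before you launch. The parameter names are those in the table above, written without any workflow-name prefix; adapt the keys if your inputs use a different naming.

```python
from urllib.parse import urlparse


def check_gcs_inputs(inputs, path_keys=("data_dir", "bucket_path_landscape_files")):
    """Return problems for path inputs that are missing or not gs://bucket/... URIs."""
    problems = []
    for key in path_keys:
        value = inputs.get(key)
        if value is None:
            problems.append(f"missing {key}")
            continue
        parsed = urlparse(value)
        # A valid GCS URI has the "gs" scheme and a non-empty bucket name.
        if parsed.scheme != "gs" or not parsed.netloc:
            problems.append(f"{key} should look like gs://bucket/path/, got {value!r}")
    return problems
```

For example, passing `{"data_dir": "s3://my-input-bucket/data/"}` reports both the wrong scheme for `data_dir` and the missing `bucket_path_landscape_files`.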


---

### Step 5: Launch the Workflow

1. After saving your input configuration, click **Launch** to start the workflow.
2. The workflow then starts running on Terra; track its progress as described in Step 6.

---

### Step 6: Monitor the Workflow

1. Open the **Submission History** tab to monitor workflow progress.
2. You can view logs, task statuses, and cost information in real time.
3. Once the run is complete, the generated **Celldega LandscapeFiles** will appear in the output directory you specified.

![](submission_history.png)

---

## 3. Outputs

Upon successful completion, the workflow will produce a set of **Celldega LandscapeFiles** in your specified output bucket.
These files can be used for downstream visualization and data exploration.

---

## 4. Tips and Best Practices

* Verify that all input file paths are correct before launching a run.
* Use Terra’s built-in cost monitoring tools to stay within your compute budget.
* If a workflow fails, review the **Job Manager Logs** under the **Submission History** tab for detailed error messages.
Binary file added docs/cloud/Terra/find_a_workflow.png
Binary file added docs/cloud/Terra/import_a_workflow.png
Binary file added docs/cloud/Terra/submission_history.png
Binary file added docs/cloud/Terra/wdl_inputs.png
Binary file added docs/cloud/Terra/workflow.png
Binary file added docs/cloud/Terra/workspace.png
4 changes: 4 additions & 0 deletions docs/cloud/index.md
@@ -0,0 +1,4 @@
# Tutorials for Generating LandscapeFiles
[Manifold](Manifold/Tutorial_for_generating_LandscapeFiles.md)

[Terra](Terra/Tutorial_for_generating_LandscapeFiles.md)
10 changes: 7 additions & 3 deletions mkdocs.yaml
@@ -34,9 +34,13 @@ nav:
- API: javascript/api.md
- Technologies:
- technologies/index.md
-- Cloud Environments:
-- Terra.bio:
-- cloud_env/Terra-segment.md
+- Cloud:
+- cloud/index.md
+- Manifold:
+- Tutorial for generating LandscapeFiles: cloud/Manifold/Tutorial_for_generating_LandscapeFiles.md
+- Terra:
+- Tutorial for generating LandscapeFiles: cloud/Terra/Tutorial_for_generating_LandscapeFiles.md
+- Cell Segmentation on Terra: cloud/Terra/Cell_Segmentation_on_Terra.md
- Gallery:
- gallery/index.md
- Xenium:
3 changes: 2 additions & 1 deletion tests/unit/test_pre/test_sbg_tile.py
@@ -1,13 +1,14 @@
import importlib.util
from pathlib import Path
import sys
import types
from pathlib import Path

import numpy as np
import pandas as pd
import pytest
from scipy.sparse import csr_matrix


ROOT_DIR = Path(__file__).resolve().parents[3]
PRE_ROOT = ROOT_DIR / "src" / "celldega" / "pre"
