Commit 6729bf6

lazappi and rcannood authored
Update process results workflow for new website (#919)
* Add combine_output reporting component
* Update process_task_results to combine outputs
* Update get_task_info to new schema
* Update get_method_info to new schema
* Update get_metric_info to new schema
* Update get_dataset_info to new schema
* Update get_results to new schema
  Need to add metric execution info
* Add metric resources to results component/schema
* Update generate_qc to match new schema
  Add some new QC checks
* Update combine_output to new schema
* Update viash version and reference
* Update process_results workflow to new components
* Add render_report component
* Add render_report to process results workflow
* Handle missing values in generate_qc()
  This can happen when a method is disabled/skipped so there are no results to check
* Handle missing controls in results report
  Skip scaling/results when there are no controls or give a warning when some controls failed
* update common submodule
* Strip quotes from descriptions/summaries
* Add roles to author details
* Add QC check for number of successful controls
* Handle missing exit codes in report
* Add schema validation to process_results workflow
* Fix combine_output image version
* Handle alternative field names in get_dataset_info
* Handle v1 slots in get_method_info
* Handle null author fields in report
* Add missing information in control QC checks
* Handle old doc URL location in get_method_info
* Prefix component additional info in get_metric_info
* Cleanup removed argument in get_results
* Fix test script for generate_qc
* Add authors to datasets, methods, metrics
* schemas were moved to the common_resources repo
* fix schema paths
* set common submodule to different branch for testing
* Fix resource
* fix schema paths in the script
* authors and references were moved into core
* add a params placeholder for ease of use
* show number of passed checks as well
* fix result schema path
* Add bibliography file
* Add shared util functions
* Use shared functions for authors and references
* update submodule (#934)
* Add scripts/create_resources/task_results_v4
* Update main reference
  Co-authored-by: Robrecht Cannoodt <[email protected]>
* Use temporary directory in render-report
* Style reporting R scripts
* add auto wf
* add script to reprocess task results
* Handle missing scaled scores in generate_qc
* Set unknown error in get_results
* fix script
* Handle missing fields in old task info
* Handle missing additional info in authors field
* Fix typo in get_references_list()
* Handle missing summary/label in get_task_info
* Handle minimal dataset info in old results
* Handle missing file size in get_dataset_info
* Handle empty string in get_references_list()
* Handle method info stored in functionality field
* Move get_additional_info() to shared functions
* Handle missing maximize in get_metric_info
* Handle missing metric values in get_results
* Properly handle workflow component in get_results
* Handle metrics stored in functionality field
* Give better error when dataset IDs don't map
* Remove duplicate datasets in get_dataset_info
* Handle infinite values in generate_qc
* Add check that any valid scores are found
* Adjust dataset process mapping
* Handle source_urls in render_report
* Fix additional info in get_method_info
* Handle missing file size/date in report
* Use regex to match DOI references
* Fix empty scores check in report
* Fix DOI regex
* Handle DOIs without text citations
* Warn about missing values for succeeded metrics
* Improve controls check in report
* update submodule
* Add results filtering (#935)
* introduce new component
* implement component
* format code
* Update src/reporting/process_task_results/main.nf
* Apply suggestions from code review
  Co-authored-by: Luke Zappia <[email protected]>
* add back previously removed arguments

---------

Co-authored-by: Luke Zappia <[email protected]>

---------

Co-authored-by: Robrecht Cannoodt <[email protected]>
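Several bullets above ("Use regex to match DOI references", "Fix DOI regex", "Handle DOIs without text citations") revolve around pulling DOIs out of free-text reference fields. A minimal sketch of the idea in Python; the pattern below is an illustrative assumption, not the regex the component actually uses:

```python
import re

# Illustrative DOI pattern (an assumption, not the component's exact regex):
# a DOI starts with "10.", a numeric registrant code, a slash, and a suffix.
DOI_RE = re.compile(r'10\.\d{4,9}/[^\s"<>]+')

def extract_dois(text):
    """Return all DOI-like substrings found in free text."""
    return DOI_RE.findall(text)

print(extract_dois(
    "Nat Biotechnol 43 (2025), https://doi.org/10.1038/s41587-025-02694-w"
))
# → ['10.1038/s41587-025-02694-w']
```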
1 parent 7687b7b commit 6729bf6

File tree

30 files changed (+7269, -1021 lines)


_viash.yaml

Lines changed: 3 additions & 3 deletions
```diff
@@ -11,9 +11,9 @@ keywords: [openproblems, benchmarking, single-cell omics]
 references:
   doi:
     # Malte Luecken, Scott Gigante, Daniel Burkhardt, Robrecht Cannoodt, et al.
-    # Defining and benchmarking open problems in single-cell analysis,
-    # 03 April 2024, PREPRINT (Version 1) available at Research Square [https://doi.org/10.21203/rs.3.rs-4181617/v1]
-    - 10.21203/rs.3.rs-4181617/v1
+    # Defining and benchmarking open problems in single-cell analysis.
+    # Nat Biotechnol 43, 1035–1040 (2025).
+    - 10.1038/s41587-025-02694-w

 links:
   issue_tracker: https://github.com/openproblems-bio/openproblems/issues
```
Lines changed: 83 additions & 0 deletions
```bash
#!/bin/bash

# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)

# ensure that the command below is run from the root of the repository
cd "$REPO_ROOT"

set -e

OUT_DIR="resources"

echo ">>> Fetching raw results..."
aws s3 sync --profile op \
  s3://openproblems-data/resources/ \
  "$OUT_DIR/" \
  --exclude "*" \
  --include "**/results/run_*/*" \
  --delete

echo ">>> Patch state.yaml files..."
# fix state.yaml id and output_trace
python <<HERE
import os
import re
import glob

def update_state_file(file_path, new_id):
    with open(file_path, 'r') as file:
        content = file.read()

    # if output_trace is missing, add it
    if 'output_trace:' not in content:
        content += "\noutput_trace: !file trace.txt\n"

    # replace the id with the value of the glob ** pattern
    content = re.sub(r'id: .+', f'id: {new_id}/processed', content)

    with open(file_path, 'w') as file:
        file.write(content)

# find all state.yaml files
state_files = glob.glob('resources/**/state.yaml', recursive=True)
for state_file in state_files:
    # extract the id from the path
    match = re.search(r'resources/(.+?)/state\.yaml', state_file)
    if match:
        new_id = match.group(1)
        update_state_file(state_file, new_id)
        print(f"Updated {state_file} with id: {new_id}")
    else:
        print(f"Could not extract id from {state_file}, skipping.")
HERE

echo ">>> Creating params.yaml..."
cat > /tmp/params.yaml << HERE
input_states: resources/*/results/run_*/state.yaml
rename_keys: 'input_task_info:output_task_info;input_dataset_info:output_dataset_info;input_method_configs:output_method_configs;input_metric_configs:output_metric_configs;input_scores:output_scores;input_trace:output_trace'
output_state: '\$id/state.yaml'
settings: '{"output_combined": "\$id/output_combined.json", "output_report": "\$id/output_report.html", "output_task_info": "\$id/output_task_info.json", "output_dataset_info": "\$id/output_dataset_info.json", "output_method_info": "\$id/output_method_info.json", "output_metric_info": "\$id/output_metric_info.json", "output_results": "\$id/output_results.json", "output_scores": "\$id/output_quality_control.json"}'
publish_dir: "$OUT_DIR"
HERE

echo ">>> Processing results..."
nextflow run target/nextflow/reporting/process_task_results/main.nf \
  -profile docker \
  -params-file /tmp/params.yaml \
  -c common/nextflow_helpers/labels_ci.config \
  -entry auto \
  -resume

# find all files in $OUT_DIR with the pattern output_report.html
echo ">>> List reports..."
find "$OUT_DIR" -name "output_report.html"

# echo ">>> Uploading processed results to S3..."
# aws s3 sync --profile op \
#   "resources_test/openproblems/task_results_v4/" \
#   "s3://openproblems-data/resources_test/openproblems/task_results_v4/" \
#   --delete --dryrun

# echo
# echo ">>> Done!"
```
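The inline Python in the script above patches each state.yaml in place. Its two string-level rules (append a missing output_trace entry, rewrite the id to point at the processed output) can be exercised in isolation as a standalone function; this is a sketch mirroring the heredoc, not a replacement for it:

```python
import re

def patch_state(content, new_id):
    """Apply the same two fixes as the inline script: add a missing
    output_trace entry and point the id at the processed output."""
    # if output_trace is missing, add it
    if "output_trace:" not in content:
        content += "\noutput_trace: !file trace.txt\n"
    # rewrite the id to <id>/processed
    return re.sub(r"id: .+", f"id: {new_id}/processed", content)

patched = patch_state("id: task_foo/results/run_1\n", "task_foo/results/run_1")
print(patched)
```

Running `patch_state` twice is safe: the output_trace line is only appended when absent, and the id substitution is idempotent only if re-applied with the same `new_id`, which is why the real script derives `new_id` from the file path each time.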
Lines changed: 40 additions & 0 deletions
```bash
#!/bin/bash

# get the root of the directory
REPO_ROOT=$(git rev-parse --show-toplevel)

# ensure that the command below is run from the root of the repository
cd "$REPO_ROOT"

set -e

OUT_DIR="resources_test/openproblems/task_results_v4"

echo ">>> Fetching raw results..."
aws s3 sync --profile op \
  s3://openproblems-data/resources/task_batch_integration/results/run_2025-01-23_18-03-16/ \
  "$OUT_DIR/raw/" \
  --delete

echo
echo ">>> Processing results..."
if [ -d "$OUT_DIR/processed" ]; then rm -Rf $OUT_DIR/processed; fi
nextflow run target/nextflow/reporting/process_task_results/main.nf \
  -profile docker \
  --input_task_info $OUT_DIR/raw/task_info.yaml \
  --input_dataset_info $OUT_DIR/raw/dataset_uns.yaml \
  --input_method_configs $OUT_DIR/raw/method_configs.yaml \
  --input_metric_configs $OUT_DIR/raw/metric_configs.yaml \
  --input_scores $OUT_DIR/raw/score_uns.yaml \
  --input_trace $OUT_DIR/raw/trace.txt \
  --output_state state.yaml \
  --publishDir $OUT_DIR/processed

echo ">>> Uploading processed results to S3..."
aws s3 sync --profile op \
  "resources_test/openproblems/task_results_v4/" \
  "s3://openproblems-data/resources_test/openproblems/task_results_v4/" \
  --delete --dryrun

echo
echo ">>> Done!"
```
Lines changed: 102 additions & 0 deletions
```yaml
name: combine_output
namespace: reporting
description: Combine task outputs into a single JSON

argument_groups:
  - name: Inputs
    arguments:
      - name: --input_task_info
        type: file
        description: Task info file
        info:
          format:
            type: json
            schema: /common/schemas/results_v4/task_info.json
        required: true
        example: resources_test/openproblems/task_results_v4/processed/task_info.json
      - name: --input_dataset_info
        type: file
        description: Dataset info file
        info:
          format:
            type: json
            schema: /common/schemas/results_v4/dataset_info.json
        required: true
        example: resources_test/openproblems/task_results_v4/processed/dataset_info.json
      - name: --input_method_info
        type: file
        description: Method info file
        info:
          format:
            type: json
            schema: /common/schemas/results_v4/method_info.json
        required: true
        example: resources_test/openproblems/task_results_v4/processed/method_info.json
      - name: --input_metric_info
        type: file
        description: Metric info file
        info:
          format:
            type: json
            schema: /common/schemas/results_v4/metric_info.json
        required: true
        example: resources_test/openproblems/task_results_v4/processed/metric_info.json
      - name: --input_results
        type: file
        description: Results file
        info:
          format:
            type: json
            schema: /common/schemas/results_v4/results.json
        required: true
        example: resources_test/openproblems/task_results_v4/processed/results.json
      - name: --input_quality_control
        type: file
        description: Quality control file
        info:
          format:
            type: json
            schema: /common/schemas/results_v4/quality_control.json
        required: true
        example: resources_test/openproblems/task_results_v4/processed/quality_control.json

  - name: Outputs
    arguments:
      - name: --output
        type: file
        direction: output
        description: Combined output JSON
        default: combined_output.json
        info:
          format:
            type: json
            schema: /common/schemas/results_v4/combined_output.json

resources:
  - type: r_script
    path: script.R
  - path: /common/schemas
    dest: schemas

test_resources:
  - type: python_script
    path: /common/component_tests/run_and_check_output.py
  - path: /resources_test/openproblems/task_results_v4
    dest: resources_test/openproblems/task_results_v4

engines:
  - type: docker
    image: openproblems/base_r:1
    setup:
      - type: apt
        packages:
          - nodejs
          - npm
      - type: docker
        run: npm install -g ajv-cli

runners:
  - type: executable
  - type: nextflow
    directives:
      label: [lowmem, lowtime, lowcpu]
```
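The Docker setup above installs ajv-cli, which the component's script uses to validate the combined JSON against the results_v4 schemas. As a sketch, the argument list that validation boils down to can be assembled like this (`resources_dir` is a placeholder for the component's resolved resources directory, which viash provides at runtime):

```python
import os

# Placeholder for the component's resources directory (assumption for
# illustration; the real value comes from the viash runtime).
resources_dir = "."
schema_dir = os.path.join(resources_dir, "schemas", "results_v4")

# The combined_output schema references these schemas, so each must be
# registered with -r for ajv to resolve them.
refs = ["task_info.json", "dataset_info.json", "method_info.json",
        "metric_info.json", "results.json", "quality_control.json", "core.json"]

args = ["ajv", "validate", "--spec", "draft2020",
        "-s", os.path.join(schema_dir, "combined_output.json")]
for ref in refs:
    args += ["-r", os.path.join(schema_dir, ref)]
args += ["-d", "combined_output.json"]

print(" ".join(args))
```

A non-zero exit status from this command is what makes the component fail, so schema violations stop the workflow rather than propagating malformed JSON downstream.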
Lines changed: 105 additions & 0 deletions
```r
## VIASH START
processed_dir <- "resources_test/openproblems/task_results_v4/processed"

par <- list(
  # Inputs
  input_task_info = paste0(processed_dir, "/task_info.json"),
  input_quality_control = paste0(processed_dir, "/quality_control.json"),
  input_metric_info = paste0(processed_dir, "/metric_info.json"),
  input_method_info = paste0(processed_dir, "/method_info.json"),
  input_dataset_info = paste0(processed_dir, "/dataset_info.json"),
  input_results = paste0(processed_dir, "/results.json"),
  # Outputs
  output = "task_results.json"
)
## VIASH END

################################################################################
# MAIN SCRIPT
################################################################################

cat("====== Combine output ======\n")

cat("\n>>> Reading input files...\n")
cat("Reading task info from '", par$input_task_info, "'...\n", sep = "")
task_info <- jsonlite::read_json(par$input_task_info)

cat(
  "Reading quality control from '",
  par$input_quality_control,
  "'...\n",
  sep = ""
)
quality_control <- jsonlite::read_json(par$input_quality_control)

cat("Reading metric info from '", par$input_metric_info, "'...\n", sep = "")
metric_info <- jsonlite::read_json(par$input_metric_info)

cat("Reading method info from '", par$input_method_info, "'...\n", sep = "")
method_info <- jsonlite::read_json(par$input_method_info)

cat("Reading dataset info from '", par$input_dataset_info, "'...\n", sep = "")
dataset_info <- jsonlite::read_json(par$input_dataset_info)

cat("Reading results from '", par$input_results, "'...\n", sep = "")
results <- jsonlite::read_json(par$input_results)

cat("\n>>> Combining outputs...\n")
# Create combined output according to task_results.json
combined_output <- list(
  task_info = task_info,
  dataset_info = dataset_info,
  method_info = method_info,
  metric_info = metric_info,
  results = results,
  quality_control = quality_control
)

cat("\n>>> Writing output file...\n")
cat("Writing combined output to '", par$output, "'...\n", sep = "")
jsonlite::write_json(
  combined_output,
  par$output,
  pretty = TRUE,
  null = "null",
  na = "null",
  auto_unbox = TRUE
)

cat("\n>>> Validating output against schema...\n")
results_schemas <- file.path(meta$resources_dir, "schemas", "results_v4")
ajv_args <- paste(
  "validate",
  "--spec draft2020",
  "-s",
  file.path(results_schemas, "combined_output.json"),
  "-r",
  file.path(results_schemas, "task_info.json"),
  "-r",
  file.path(results_schemas, "dataset_info.json"),
  "-r",
  file.path(results_schemas, "method_info.json"),
  "-r",
  file.path(results_schemas, "metric_info.json"),
  "-r",
  file.path(results_schemas, "results.json"),
  "-r",
  file.path(results_schemas, "quality_control.json"),
  "-r",
  file.path(results_schemas, "core.json"),
  "-d",
  par$output
)

cat("Running validation command:", "ajv", ajv_args, "\n")
cat("Output:\n")
validation_result <- system2("ajv", ajv_args)

if (validation_result == 0) {
  cat("JSON validation passed successfully!\n")
} else {
  cat("JSON validation failed!\n")
  stop("Output JSON does not conform to schema")
}

cat("\n>>> Done!\n")
```
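Stripped of I/O and logging, the combine step in the script above is plain aggregation: the six per-stage outputs are nested under fixed keys. A sketch of the equivalent in Python, with small in-memory dicts standing in for the six JSON files:

```python
import json

def combine_output(task_info, dataset_info, method_info,
                   metric_info, results, quality_control):
    """Nest the six per-stage outputs under fixed keys, matching the
    structure the R script writes to its combined output JSON."""
    return {
        "task_info": task_info,
        "dataset_info": dataset_info,
        "method_info": method_info,
        "metric_info": metric_info,
        "results": results,
        "quality_control": quality_control,
    }

combined = combine_output({"task_id": "demo"}, [], [], [], [], [])
print(json.dumps(combined, indent=2))
```

Keeping the keys fixed and flat is what lets a single combined_output schema reference the per-stage schemas one level down.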
