Flowcept can generate summarized reports from provenance records.
Current report implementations:

- `report_type="provenance_card"` with `format="markdown"` (default)
- `report_type="provenance_report"` with `format="pdf"` (executive PDF with plots)
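Conceptually, the two supported combinations can be thought of as keys into a small dispatch table. The sketch below is illustrative only; the generator names are hypothetical and do not reflect Flowcept's internals:

```python
# Hypothetical dispatch table mapping (report_type, format) to a generator.
# The generator labels are illustrative, not Flowcept's actual internals.
GENERATORS = {
    ("provenance_card", "markdown"): "markdown card generator",
    ("provenance_report", "pdf"): "executive PDF generator",
}

def resolve_generator(report_type="provenance_card", format="markdown"):
    """Return the generator for a supported combination, else raise."""
    try:
        return GENERATORS[(report_type, format)]
    except KeyError:
        raise ValueError(f"Unsupported report: {report_type!r}/{format!r}")
```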
Use:

```python
from flowcept import Flowcept

# Default path: markdown provenance card
Flowcept.generate_report(
    report_type="provenance_card",
    format="markdown",
    output_path="PROVENANCE_CARD.md",
    records=my_records,  # or input_jsonl_path=..., or workflow_id/campaign_id
)
```

Markdown provenance cards are the default reporting mode.
```python
from flowcept import Flowcept

# 1) Generate from workflow_id (DB-backed mode)
Flowcept.generate_report(
    report_type="provenance_card",
    format="markdown",
    workflow_id="20c5939f-f3ee-4031-9303-a9e68a5a8092",
    output_path="PROVENANCE_CARD.md",
)

# 2) Generate from in-memory records
Flowcept.generate_report(
    report_type="provenance_card",
    format="markdown",
    records=my_records,
    output_path="PROVENANCE_CARD_FROM_RECORDS.md",
)

# 3) Generate from a Flowcept JSONL buffer
Flowcept.generate_report(
    report_type="provenance_card",
    format="markdown",
    input_jsonl_path="/tmp/flowcept_buffer.jsonl",
    output_path="PROVENANCE_CARD_FROM_JSONL.md",
)
```

You can optionally print the generated markdown report in a rich terminal:
```python
from flowcept import Flowcept

Flowcept.generate_report(
    report_type="provenance_card",
    format="markdown",
    records=my_records,
    output_path="PROVENANCE_CARD.md",
    print_markdown=True,
)
```

If Rich is not installed and `print_markdown=True`, Flowcept raises an error.
Install Rich via:

```shell
pip install flowcept["extras"]
```

Exactly one input mode must be provided:
- `input_jsonl_path`: read from a Flowcept JSONL buffer file.
- `records`: list of dictionaries already loaded in memory.
- `workflow_id` or `campaign_id`: query workflow, task, and object documents from the DB.
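The "exactly one input mode" rule can be sketched as a simple validation, where `workflow_id` and `campaign_id` jointly select the DB-backed mode. This is an illustrative check under those assumptions, not Flowcept's actual implementation:

```python
def validate_input_mode(input_jsonl_path=None, records=None,
                        workflow_id=None, campaign_id=None):
    """Ensure exactly one input mode is selected (sketch of the rule above).

    workflow_id and campaign_id both select the DB-backed mode, so they
    count as a single mode here.
    """
    modes = [
        input_jsonl_path is not None,
        records is not None,
        workflow_id is not None or campaign_id is not None,
    ]
    if sum(modes) != 1:
        raise ValueError("Provide exactly one of: input_jsonl_path, "
                         "records, or workflow_id/campaign_id")
```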
The provenance card is a summary, not a raw dump of records.
- Grouping key: `activity_id`.
- Per-group summary includes:
  - number of task records aggregated (`n_tasks`)
  - status counts
  - timing aggregates (median/summary fields)

The aggregation method is documented in the generated output under *Aggregation Method*.
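The grouping described above can be sketched in plain Python. The field names (`activity_id`, `status`, `elapsed`) are assumptions for illustration; Flowcept's record schema may differ:

```python
from collections import Counter
from statistics import median

def summarize_by_activity(tasks):
    """Group task records by activity_id and compute the per-group
    summary described above: n_tasks, status counts, median elapsed."""
    groups = {}
    for t in tasks:
        groups.setdefault(t["activity_id"], []).append(t)
    return {
        act: {
            "n_tasks": len(ts),
            "status_counts": dict(Counter(t["status"] for t in ts)),
            "median_elapsed_s": median(t["elapsed"] for t in ts),
        }
        for act, ts in groups.items()
    }
```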
When objects are present, reports include metadata-only summaries:

- counts by type
- counts by storage mode (`in_object` vs `gridfs`)
- linkage counts (task-linked and workflow-linked)
- object version and size summaries

Blob payload bytes are excluded from report rendering.
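A metadata-only object summary of this kind can be sketched as follows; the field names (`type`, `storage`, `size`, `version`, `task_id`, `workflow_id`) are assumptions for illustration, and no payload bytes are ever read:

```python
from collections import Counter

def summarize_objects(objects):
    """Metadata-only summary: counts by type and storage, linkage counts,
    and version/size aggregates. Payload bytes are never touched,
    matching the rule that blobs are excluded from report rendering."""
    sizes = [o.get("size", 0) for o in objects]
    return {
        "total": len(objects),
        "by_type": dict(Counter(o["type"] for o in objects)),
        "by_storage": dict(Counter(o["storage"] for o in objects)),
        "task_linked": sum(1 for o in objects if o.get("task_id")),
        "workflow_linked": sum(1 for o in objects if o.get("workflow_id")),
        "max_version": max(o.get("version", 0) for o in objects),
        "total_size": sum(sizes),
        "max_size": max(sizes),
    }
```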
Below is a real example, equivalent to the generated markdown content for "Workflow Provenance Card: Perceptron GridSearch":

- Workflow Name: `Perceptron GridSearch`
- Workflow ID: `20c5939f-f3ee-4031-9303-a9e68a5a8092`
- Campaign ID: `661344de-ddf4-497d-a5ba-0d01c67cfb79`
- Execution Start (UTC): `2026-02-19 05:05:10`
- Execution End (UTC): `2026-02-19 05:05:12`
- Total Elapsed (s): `1.501`
- User: `rsr`
- System Name: `Darwin`
- Environment ID: `laptop`
- Workflow Subtype: `ml_workflow`
- Code Repository: `branch=skills, short_sha=f3df676, dirty=dirty`
- Git Remote: `git@github.com:ORNL/flowcept.git`
- Workflow args: `python_random_seeded: True`, `seed: 42`, `torch_cuda_manual_seeded: False`, `torch_cudnn_benchmark: False`, `torch_cudnn_deterministic: True`, `torch_deterministic_algorithms: True`, `torch_manual_seeded: True`
- Total Activities: `3`
- Status Counts: `{'FINISHED': 7}`
- Total Elapsed Workflow Time (s): `1.501`
  - `train_and_validate`: 0.088 s
  - `get_dataset`: 0.056 s
  - `select_best_model`: 0.041 s
- Resource Totals:
  - Memory Used: 7.78 MB
  - Average CPU (%): 54.1%
  - IO: Read 38.49 MB, Write 55.11 MB, Read Ops 1,454, Write Ops 155
- Key Observations:
  - Slowest Activity: `train_and_validate` at 0.088 s
  - Largest IO Activity: `train_and_validate` with Read 31.74 MB and Write 52.10 MB
```text
input data
    │
    ▼
get_dataset
    │
    ▼
train_and_validate
    │
    ▼
select_best_model
    │
    ▼
output data
```
Rows are sorted by First Started At (ascending).
| Activity | Status Counts | First Started At | Last Ended At | Median Elapsed (s) |
|---|---|---|---|---|
| get_dataset | {'FINISHED': 1} | 2026-02-19 05:05:10 | 2026-02-19 05:05:10 | 0.056 |
| train_and_validate | {'FINISHED': 5} | 2026-02-19 05:05:10 | 2026-02-19 05:05:12 | 0.088 |
| select_best_model | {'FINISHED': 1} | 2026-02-19 05:05:12 | 2026-02-19 05:05:12 | 0.041 |
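Rendering the table above from aggregated rows can be sketched as below; the row keys (`activity`, `status_counts`, `first_started`, `last_ended`, `median_elapsed`) are assumptions for this sketch:

```python
def render_activity_table(rows):
    """Render per-activity summaries as a markdown table, sorted by
    'First Started At' ascending, as the generated card does."""
    header = ("| Activity | Status Counts | First Started At | "
              "Last Ended At | Median Elapsed (s) |")
    lines = [header, "|---|---|---|---|---|"]
    for r in sorted(rows, key=lambda r: r["first_started"]):
        lines.append("| {activity} | {status_counts} | {first_started} | "
                     "{last_ended} | {median_elapsed:.3f} |".format(**r))
    return "\n".join(lines)
```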
- `get_dataset` (subtype=`dataprep`)
  - Used: `n_samples: 120`, `split_ratio: 0.8`
  - Generated: `dataset_id: f1e918cc-a3eb-4dd8-8036-5f6e4fc140d1`, `x_train_shape: [96, 2]`, `x_val_shape: [24, 2]`, `y_train_shape: [96, 1]`, `y_val_shape: [24, 1]`
- `train_and_validate` (n=5, subtype=`learning`)
  - Used (aggregated): includes `epochs`, `learning_rate`, `n_input_neurons`, `config_id`, and other fields.
  - Generated (aggregated): includes `best_val_loss`, `val_loss`, `val_accuracy`, and model object ids.
- `select_best_model` (subtype=`model_selection`)
  - Generated: `selected_config_id: cfg_5`, `selected_loss: 0.0490574836730957`, `selected_model_object_id: ae18a739-1ffe-45a8-ae64-827a079579a6`
| Metric | Value |
|---|---|
| Telemetry Samples (task start/end pairs) | 7 |
| CPU User Time Delta | 7.380 |
| CPU System Time Delta | 1.940 |
| Average CPU (%) Delta | 54.1% |
| Average CPU Frequency | 3,228 |
| Memory Used Delta | 7.78 MB |
| Average Memory (%) | 73.7% |
| Average Swap (%) | 90.0% |
| Disk Read Time Delta (ms) | 224.000 |
| Disk Write Time Delta (ms) | 14.000 |
| Disk Busy Time Delta (ms) | 0.000 |
| Metric | Value |
|---|---|
| Total Objects | 6 |
| By Type | {'dataset': 1, 'ml_model': 5} |
| By Storage | {'in_object': 1, 'gridfs': 5} |
| Task-linked Objects | 6 |
| Workflow-linked Objects | 6 |
| Max Version | 7 |
| Total Size | 13.66 KB |
| Average Size | 2.28 KB |
| Max Size | 4.10 KB |
- Datasets
  - `f1e918cc-a3eb-4dd8-8036-5f6e4fc140d1`
    - version: `0`
    - storage: `in_object`
    - size: `4.10 KB`
    - task_id: `1771477510.9383209`
    - workflow_id: `20c5939f-f3ee-4031-9303-a9e68a5a8092`
    - timestamp: `2026-02-19 05:05:10`
    - sha256: `7d7b4be35ea11f66e9a785d1b39cfb8fc31f8fd23020bc74918872ab5855253c`
- Models
  - `ae18a739-1ffe-45a8-ae64-827a079579a6`
    - version: `7`
    - storage: `gridfs`
    - size: `1.91 KB`
    - tags: `best`
    - custom_metadata includes `checkpoint_epoch`, `class`, `config_id`, `learning_rate`, `loss`, and `model_profile`.
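Because the card records a sha256 per object but never embeds payload bytes, integrity can be checked separately by recomputing the digest. A minimal sketch (the helper function is hypothetical, not a Flowcept API):

```python
import hashlib

def verify_payload(payload: bytes, recorded_sha256: str) -> bool:
    """Recompute the payload digest and compare to the recorded sha256."""
    return hashlib.sha256(payload).hexdigest() == recorded_sha256
```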
- Grouping key: `activity_id`.
- Each grouped row may aggregate multiple task records (`n_tasks`).
- Aggregated metrics currently include count, status, and timing.
Generator footer example:
- Provenance card generated by Flowcept | GitHub | Version: 0.9.14 on Feb 19, 2026 at 12:05 AM EST
PDF reports are intended for executive-friendly rendering and include plots.
Install the PDF extra:

```shell
pip install flowcept[report_pdf]
```

```python
from flowcept import Flowcept

# 1) Generate PDF from workflow_id (DB-backed mode)
stats = Flowcept.generate_report(
    report_type="provenance_report",
    format="pdf",
    workflow_id="5def1173-d417-420b-a7ed-61ada01772cd",
    output_path="PROVENANCE_REPORT.pdf",
)
print(stats["output"])

# 2) Generate PDF from in-memory records
Flowcept.generate_report(
    report_type="provenance_report",
    format="pdf",
    records=my_records,
    output_path="PROVENANCE_REPORT_FROM_RECORDS.pdf",
)

# 3) Generate PDF from a Flowcept JSONL file
Flowcept.generate_report(
    report_type="provenance_report",
    format="pdf",
    input_jsonl_path="/tmp/flowcept_buffer.jsonl",
    output_path="PROVENANCE_REPORT_FROM_JSONL.pdf",
)
```

PDF report plots include:
- Top slowest activities
- Top fastest activities
- Most resource-demanding activities (IO)
- Telemetry-aware charts when telemetry fields are available
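Selecting the top slowest or fastest activities for such plots reduces to a ranking over the aggregated timings. A minimal sketch, assuming an `{activity: median_elapsed_s}` mapping as input (not Flowcept's actual plotting code):

```python
def top_activities_by_elapsed(summary, n=3, slowest=True):
    """Rank activities by elapsed time and return the top n as
    (activity, elapsed) pairs; slowest=False yields the fastest."""
    ranked = sorted(summary.items(), key=lambda kv: kv[1], reverse=slowest)
    return ranked[:n]
```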