This repo contains a collection of utility scripts for installing and deploying HTCondor and Pegasus on systems running LLNL's Flux resource manager.
> [!WARNING]
> This repo is NOT meant to be used for system-wide or multi-user deployments of HTCondor or Pegasus. This repo is only meant for single-user deployments in which all HTCondor daemons are launched on and managed from a single node. If you are trying to deploy HTCondor and Pegasus across an entire Flux-scheduled system, refer to the official HTCondor and Pegasus documentation. However, you can still refer to the section on Running Pegasus Workflows Under Flux for information about how to configure Pegasus workflows to use Flux for job scheduling and management.
To install HTCondor and Pegasus with Flux support, users can run the install.sh script. This script
will download HTCondor and Pegasus and install both of them in the same directory. The table below summarizes
all the options that can be provided to install.sh:
| Flag | Requires Value? | Flag Required? | Default Value | Description |
|---|---|---|---|---|
| `-a` | Yes | No | `x86_64` | Sets the architecture for which to download Pegasus |
| `-o` | Yes | No | `rhel` | Sets the OS for which to download Pegasus |
| `-v` | Yes | No | `8` | Sets the OS version for which to download Pegasus |
| `-p` | Yes | No | `$PWD/pegasus_install` | Sets the installation prefix for HTCondor and Pegasus |
| `-j` | Yes | No | `1` | Sets the number of parallel builders to pass to `make -j` |
| `-w` | Yes | No | None | Sets extra flags to pass to `wget` when downloading tarballs |
| `-c` | Yes | No | None | Sets extra flags to pass to `cmake` when building HTCondor from source |
| `-d` | No | No | N/A | If provided, allows `install.sh` to delete existing directories |
| `-s` | No | No | N/A | If provided, clones Git repos with SSH instead of HTTPS |
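For example, a typical single-user installation might look like the following (the prefix path and `-j` value are illustrative, not the defaults):

```shell
# Install HTCondor and Pegasus for RHEL 8 on x86_64 (the defaults),
# using 8 parallel make jobs and a prefix under $HOME
./install.sh -p "$HOME/pegasus_install" -j 8

# Re-run later with -d to let install.sh delete the existing directories
./install.sh -p "$HOME/pegasus_install" -j 8 -d
```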
After running install.sh, a new script called pegasus_prefix.sh will be created in the same directory as install.sh. The other scripts in this repo depend on this script to perform their operations.
> [!NOTE]
> Valid values for the `-a`, `-o`, and `-v` options can be found by looking at the tarballs at
> https://download.pegasus.isi.edu/pegasus/5.1.2.dev.0.
> [!IMPORTANT]
> Flux support is still in progress for HTCondor under this PR, and
> Flux support in Pegasus has been added as of the 5.1.2 release. As a result, install.sh currently builds HTCondor
> from source using this branch, and it installs Pegasus using pre-release tarballs found here.
> If you'd rather build a different version of HTCondor or Pegasus, you can edit the variables at the top of
> install.sh, but keep in mind that doing so may break Flux support.
To simplify the process of starting and stopping HTCondor and Pegasus for a single user, this repo provides
the start_pegasus.sh and stop_pegasus.sh scripts. Before running either script, users should first source
the setup_pegasus_env.sh script. This script sets several environment variables necessary for the other scripts
and HTCondor to work.
After sourcing setup_pegasus_env.sh, users can start HTCondor and configure it for Pegasus by running
start_pegasus.sh. This script starts all of HTCondor's daemons on the current node by running condor_master,
and it configures HTCondor's GLite/BLAHP component for Pegasus using the pegasus-configure-glite command.
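Putting these steps together, a typical startup session looks like this (run from the directory containing the scripts):

```shell
# Export the environment variables the scripts and HTCondor need
source ./setup_pegasus_env.sh

# Launch condor_master and configure GLite/BLAHP for Pegasus
./start_pegasus.sh
```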
After starting HTCondor, users can check the status of the deployment by running check_pegasus.sh. This
script will print out the following information which can be used to verify that HTCondor is running
correctly:
- The running HTCondor daemons
- The version of Pegasus
- The status of HTCondor
- The status of any running HTCondor jobs
Finally, users can shut down HTCondor by running stop_pegasus.sh. This script simply identifies
the PID of condor_master and kills that process.
> [!NOTE]
> All of these scripts depend on the pegasus_prefix.sh script created by install.sh.
> Users should not delete this script unless they are uninstalling HTCondor and Pegasus,
> which can be done with the uninstall.sh script.
Running Pegasus workflows under any batch scheduler (e.g., Slurm, Flux, LSF) requires configuring Pegasus to use HTCondor's BLAHP (formerly glite) component. BLAHP converts HTCondor jobs into batch scripts for a specified batch scheduler, and it provides an interface for other scheduling-related tasks (e.g., checking job status).
To configure Pegasus to use BLAHP's Flux support, users need to add two profiles to their site definitions.
First, the pegasus.style profile should be set to glite. This tells Pegasus to use BLAHP. Second,
the condor.grid_resource profile should be set to batch flux. This tells BLAHP to generate batch scripts
and commands for Flux. Below are some examples of how to specify this.
**YAML Config**

```yaml
sites:
  - name: local-flux
    directories:
      # The following is a shared directory amongst all the nodes in the cluster
      - type: sharedScratch
        path: /lfs/local-flux/glite-sharedfs-example/shared-scratch
        fileServers:
          - url: file:///lfs/local-flux/glite-sharedfs-example/shared-scratch
            operation: all
    profiles:
      pegasus:
        style: glite
      condor:
        grid_resource: batch flux
      # This last part isn't necessary for Flux support, but it's a good
      # idea to include it
      env:
        PEGASUS_HOME: /lfs/software/pegasus
```

**Python API**
```python
from Pegasus.api import Directory, FileServer, Operation, Site, SiteCatalog, Workflow

wflow = Workflow("example_workflow")
sites = SiteCatalog()

shared_scratch_dir = "/lfs/local-flux/glite-sharedfs-example/shared-scratch"
flux_site = Site("local-flux").add_directories(
    Directory(Directory.SHARED_SCRATCH, shared_scratch_dir).add_file_server(
        FileServer("file://" + shared_scratch_dir, Operation.ALL)
    )
)
flux_site.add_pegasus_profile(style="glite")
flux_site.add_condor_profile(grid_resource="batch flux")
sites.add_sites(flux_site)
wflow.add_site_catalog(sites)
```

After telling Pegasus to use Flux for a site, users can set additional Pegasus profiles to define resource and other job requirements. The Pegasus documentation has a table that explains each profile here.
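For example, `pegasus` profiles such as `cores` and `runtime` can be set alongside the Flux settings to request resources for jobs. The values below are illustrative; consult the Pegasus profile table for the full set of profiles and their meanings:

```yaml
profiles:
  pegasus:
    style: glite
    cores: 4        # request 4 cores per job
    runtime: 3600   # request a walltime of 3600 seconds
  condor:
    grid_resource: batch flux
```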
Finally, users need to tell Pegasus to use the site configured for Flux. This can be done in one of two ways.
If you want your entire workflow to run on the Flux site, simply add -s <site_name> to your pegasus-plan
invocation. If you only want specific jobs to run on the Flux site, you can set the selector.execution_site
profile for those jobs to the name of the Flux site.
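For instance, with the Python API, a single job can be pinned to the Flux site through the selector profile mixin (the job variable and site name here are illustrative):

```python
# Run only this job on the Flux-configured site; other jobs use the
# default site(s) passed to pegasus-plan
flux_job.add_selector_profile(execution_site="local-flux")
```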
Pegasus and HTCondor are not designed to support Flux's hierarchical scheduling capabilities, so neither tool will try to perform any hierarchical scheduling on its own. However, there are two ways in which users can take advantage of hierarchical scheduling.
First, users can utilize Flux's hierarchical scheduling capabilities to isolate their Pegasus workflows from other users. This allows workflows to run through Flux without incurring overheads from having to contend with other users' jobs in the system-wide scheduler. In the past, the only way to avoid this overhead was to use pegasus-mpi-cluster (PMC), which orchestrates workflow DAGs through a single MPI program. PMC avoids the overheads of system-wide scheduling, but it also eliminates all overheads from batch scheduling logic, which can produce unrealistically fast workflow makespans.
To isolate your Pegasus workflow from other users, simply do the following:
- Get a Flux allocation with `flux alloc` or `flux batch`
- Within the allocation, start HTCondor and Pegasus using the instructions above
- Within the allocation, plan and run your workflow with Pegasus
The second way to take advantage of hierarchical scheduling is to use a script that invokes Flux
as the transformation for your Pegasus job(s). By putting your job's logic in a shell script,
you can use Flux commands like flux batch and flux alloc within that script to perform hierarchical
scheduling within a single job.
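As a sketch, such a transformation script might look like the following (the script name, resource counts, and inner commands are all hypothetical):

```shell
#!/bin/bash
# my_transformation.sh: runs as a single Pegasus job, but fans its work
# out to nested Flux instances for hierarchical scheduling
flux batch -N 2 --wrap ./inner_step_a.sh
flux batch -N 2 --wrap ./inner_step_b.sh
flux queue drain   # wait for the nested jobs to complete
```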
To allow users to play around with Pegasus on a Flux system, we provide a Dockerfile
that provides a Flux instance; an installation of Pegasus and HTCondor; and the
setup_pegasus_env.sh, start_pegasus.sh, stop_pegasus.sh, and check_pegasus.sh
scripts.
> [!CAUTION]
> This Dockerfile is still under development. There is no guarantee that it will work yet. This notice will be removed once the Dockerfile is complete and ready for use.
Copyright 2026 Global Computing Lab.
The code in this repository is distributed under the terms of the Apache License, Version 2.0 with LLVM Exceptions.
See LICENSE and COPYRIGHT for more details.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
This material is based upon work supported by the US National Science Foundation under Grant Nos. 2530461, 2513101, 2331152, 2223704, 2138811, and 2103845.
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 and was supported by the LLNL-LDRD Program under Project No. 24-SI-005.