
729/additional slurm configuration options #730

Closed
jkrue wants to merge 79 commits into master from 729/additional_slurm_configuration_options

@jkrue (Member) commented Mar 31, 2026

solves #729

XaverStiensmeier and others added 30 commits May 13, 2025 07:51
* added write to remote

* first attempt at remote writing

* changed instances of write_yaml that happen for creation to direct remote writes.

* added TODO

* changed os_versions in cloud_node_requirements.yaml

* changed version number. Went down to 3 to align with the GitHub repository

* fixed a global variable that was not static, which caused parallel create runs to affect each other

* pleasing linter

* pleasing linter
* Attempt using TRES_CORE_MEMORY

* //4 + 1000 sounds more reasonable than //32 capped at 2000

* fixed equation

* log warning if ram < 4096

* added unit

* line too long
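The memory commits above can be illustrated with a small sketch. It assumes RAM is given in MiB (the "added unit" commit does not say which unit), that the result is the memory reserved for the system rather than for jobs, and that the 4000 MiB ceiling from the later MemSpecLimit commit in this PR applies to the new formula. The function names are hypothetical; this is not BiBiGrid's actual code.

```python
import logging

logger = logging.getLogger("memory_sketch")  # hypothetical logger name


def reserved_memory_old(ram_mib: int) -> int:
    """Previous rule as described in the commit: RAM // 32, capped at 2000 MiB."""
    return min(ram_mib // 32, 2000)


def reserved_memory_new(ram_mib: int, cap_mib: int = 4000) -> int:
    """New rule from the commit: RAM // 4 + 1000 MiB.

    ASSUMPTION: the cap_mib ceiling is borrowed from the later MemSpecLimit
    commit (maximum 4000 MiB); it is not stated next to the formula itself.
    """
    if ram_mib < 4096:
        # Mirrors the "log warning if ram < 4096" commit
        logger.warning("Node has only %d MiB of RAM (less than 4096 MiB)", ram_mib)
    return min(ram_mib // 4 + 1000, cap_mib)


for ram in (2048, 8192, 64000):
    print(ram, reserved_memory_old(ram), reserved_memory_new(ram))
```

Under these assumptions a 64000 MiB node reserves 2000 MiB with the old rule and 4000 MiB (capped) with the new one, while a 2048 MiB node triggers the low-RAM warning.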
* fixed volume name or id

* added new volume key "id" to schema

* added new volume key "id" to rest model

* moved rest models to models/ folder

* fixed tests, improved function naming

* improved readability. Ignored pylint multiple branches for volume creation

* duplicate ignore

* added disable duplicate code

* removed code duplicate

* fixed name-not-set bug, renamed path of integration_test bibigrid.yaml to bibigrid_test.yaml

* added info in configuration.md

* updated bibigrid.yaml
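The volume commits describe a configuration entry that can refer to an existing volume either by "name" or by the new "id" key. A minimal sketch, assuming that the id takes precedence when both are set (the commits do not state a precedence) and assuming a plain list of dicts in place of real OpenStack volume objects. Volume here is a dataclass stand-in, not BiBiGrid's Pydantic model, and find_existing_volume is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    """Hypothetical stand-in for one volume entry in the configuration."""
    name: Optional[str] = None
    id: Optional[str] = None  # the new key; it needs a default so it stays optional


def find_existing_volume(volume: Volume, existing: list) -> Optional[dict]:
    """Return the matching existing volume, preferring the id over the name.

    `existing` is a hypothetical list of {"id": ..., "name": ...} dicts,
    e.g. the result of listing the project's volumes.
    """
    if volume.id is not None:
        return next((v for v in existing if v["id"] == volume.id), None)
    if volume.name is not None:
        # Names are not guaranteed to be unique; the first match wins here
        return next((v for v in existing if v["name"] == volume.name), None)
    return None


volumes = [{"id": "a1", "name": "data"}, {"id": "b2", "name": "data"}]
print(find_existing_volume(Volume(id="b2"), volumes))
```

Preferring the id is one way to avoid ambiguity, since OpenStack does not require volume names to be unique.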
* Add build system

- use uv for dependency management
- add bibigrid entrypoint
- use click for command line parsing

* Adapt pyproject.toml

* adapt pyproject.toml

* Restructure package

- add pyproject.toml and use uv as build system
- move ansible resources into package and adapt paths

* remove auto-generated files

* remove auto-generated file again

* updated resources paths to bibigrid/resources

* Updated the click CLI. Changed it to match the structure and arguments

* dirty fix

* minor updates to usability and documentation

* fixed path for integration_test

* fixed startup to new run_action structure

* pleased pylint

* updated documentation from 'bibigrid -c' to 'bibigrid create'

* rebuild uv.lock

* updated version in pyproject.toml to match version change to align with future bibigrid releases

* changed cli to main to be more explicit

* added a single line to explain how to install BiBiGrid as a package. Can be improved upon in the future.

---------

Co-authored-by: Xaver Stiensmeier <xaverstiensmeier@gmx.de>
# Conflicts:
#	bibigrid/resources/playbook/roles/bibigrid/tasks/001-apt.yaml
XaverStiensmeier and others added 28 commits October 14, 2025 11:53
Fixed duplicate definition of Volume. Fixed id not having a default value
…te configuration not following Pydantic closely. Configuration structure needs to be reviewed to improve code readability in that regard for the future.
…To Better Align With REST API Using Pydantic (#689)

* added option to add server groups

* replaced the schema validation with the REST Pydantic models

* switched completely to Pydantic. Added a hack to satisfy Pydantic despite the configuration not following Pydantic closely. The configuration structure needs to be reviewed to improve code readability in that regard in the future.

* pleasing linter. Using inheritance to streamline configs

* trying to please linter

* updated documentation

* updated tests

* tried to please linter
# Conflicts:
#	bibigrid/models/configuration.py
* updated packages and fixed openstack changes

* fixed tests and updated versions correctly in all files

* beginning of rework to get_free_resources

* please linter. Improve action selection process. Refactor UPPER_CASE partials to snake_case

* moved provider to a context manager

* fixed line too long

* moved j2 template to template folder

* fixed bcrypt 5 not being compatible with the latest passlib version by pinning bcrypt to 4.3

* pin passlib version, too
* fixed gres and included installation of NVIDIA drivers

* Fixed gres, zabbix repository and logging

Untested. Includes installation for nvidia

* removed pausing line

* support new rest schema

* updated gres and volume documentation

* removed unnecessary log line

removed a log line with little semantic information

#705
Added Uvicorn logging configuration to enhance logging format.

Old:
INFO:     172.18.0.6:59830 - "GET /bibigrid/state/kkzd9om3e6otye3 HTTP/1.1" 200 OK

New:
2026-03-03 10:49:49,447 [INFO] Waiting for application startup.
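The "New" line consists of a timestamp, the level in brackets, and the message. A sketch using only the standard library's logging.config.dictConfig, assuming the format string "%(asctime)s [%(levelname)s] %(message)s" (inferred from the example line, not taken from the PR). The logger names uvicorn, uvicorn.error and uvicorn.access are the ones uvicorn logs to, and a dict of this shape can be handed to uvicorn.run through its log_config argument; the handler and formatter names are hypothetical.

```python
import logging
import logging.config

# ASSUMPTION: format inferred from the "New" example line above
LOG_FORMAT = "%(asctime)s [%(levelname)s] %(message)s"

LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "timestamped": {"format": LOG_FORMAT},
    },
    "handlers": {
        "default": {
            "class": "logging.StreamHandler",
            "formatter": "timestamped",
            "stream": "ext://sys.stderr",
        },
    },
    "loggers": {
        # uvicorn.error has no handler of its own and propagates to "uvicorn"
        "uvicorn": {"handlers": ["default"], "level": "INFO", "propagate": False},
        "uvicorn.error": {"level": "INFO"},
        # Access log lines get the same timestamped format
        "uvicorn.access": {"handlers": ["default"], "level": "INFO", "propagate": False},
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
logging.getLogger("uvicorn.error").info("Waiting for application startup.")
```

Keeping uvicorn.error handler-free and letting it propagate avoids printing each server message twice.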
* Improves NVIDIA and CUDA out-of-the-box setup

In addition to the NVIDIA drivers, BiBiGrid now also installs the CUDA toolkit and the container toolkit, and configures the container toolkit to work with Docker. This can be tested on a GPU node with "sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi".

#719

* Fixed capitalization

#719

* Pleases linter

#719

* Pleases linter

#719

* Updated documentation.

#719
* fixed gres and included installation of NVIDIA drivers

* Fixed gres, zabbix repository and logging

Untested. Includes installation for nvidia

* removed pausing line

* support new rest schema

* updated gres and volume documentation

* allow selection of floating ip by id

Now a specific floating IP can be assigned to the master or VPN instance by setting the floatingIpId key in the configuration.

#709

* prevent preexisting fip deletion

Floating IPs are no longer released on clouds where they are defined as pre-existing.

#709

* check floating ip existence

added _check_floating_ip to verify the existence of the floating ip

#705

* update provider tests

updated provider tests to cover new OpenStack API calls that had not been used before

#709

* fixed tests

some tests failed due to unexpected new parameters from the new floating_ip check on terminate

#709

* fix remaining floating ip

Apparently, when a floating IP is created attached to a server and that server is deleted quickly, the floating IP remains.

#709

* fix terminate for REST

REST's terminate did not pass the floating IP list; now it does.

#709

* pleases linter

#709

* adds documentation for floatingIpId

#709

---------

Co-authored-by: Jan Krüger <jkrue@users.noreply.github.com>
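The floating IP commits combine three ideas: use a user-chosen floatingIpId if set, check that it exists, and on terminate release only addresses that were created for the cluster. A minimal sketch of that bookkeeping, assuming plain string ids and a caller-supplied allocate function in place of real OpenStack calls; FloatingIpBookkeeping, select_floating_ip and ips_to_release are hypothetical names, and only the floatingIpId key and the idea behind _check_floating_ip come from the commits.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FloatingIpBookkeeping:
    """Hypothetical record of the floating IPs one cluster uses."""
    preexisting: set = field(default_factory=set)  # chosen via floatingIpId, owned by the user
    created: set = field(default_factory=set)      # allocated during create, owned by the cluster


def select_floating_ip(config: dict, existing_ids: set,
                       allocate: Callable[[], str],
                       book: FloatingIpBookkeeping) -> str:
    """Return the floating IP id to attach to the master or VPN instance."""
    fip_id: Optional[str] = config.get("floatingIpId")
    if fip_id is not None:
        # Existence check, in the spirit of _check_floating_ip
        if fip_id not in existing_ids:
            raise ValueError(f"floating IP {fip_id!r} does not exist")
        book.preexisting.add(fip_id)
        return fip_id
    new_id = allocate()
    book.created.add(new_id)
    return new_id


def ips_to_release(book: FloatingIpBookkeeping) -> set:
    """On terminate, release only IPs created for the cluster, never pre-existing ones."""
    return book.created - book.preexisting


book = FloatingIpBookkeeping()
select_floating_ip({"floatingIpId": "fip-1"}, {"fip-1"}, lambda: "fip-new", book)
select_floating_ip({}, {"fip-1"}, lambda: "fip-new", book)
print(ips_to_release(book))
```

Tracking created addresses explicitly, rather than releasing whatever is attached, is what lets terminate clean up leftovers without touching user-owned addresses.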
feat(Logging): added timestamps to uvicorn logging
* adds slurm_conf default_partition

#721

* Fixes GRES on-demand scheduling

Adding a full gres.conf to each node fixes on-demand scheduling. Also fixes bibilog and enables DebugFlags=Gres and TaskPlugin=task/cgroup

#726
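The two Slurm commits above, a configurable default partition and one complete gres.conf shipped to every node, can be pictured with a small rendering sketch. It assumes partitions are given as a name-to-node-list mapping and GPU counts as a node-to-count mapping; both functions and their inputs are hypothetical, and this is not the template BiBiGrid uses. PartitionName, Nodes, Default, NodeName, Name and Count are standard slurm.conf and gres.conf keywords.

```python
def render_partition_lines(partitions: dict, default_partition: str) -> list:
    """Render slurm.conf PartitionName lines, marking exactly one as Default=YES."""
    if default_partition not in partitions:
        raise ValueError(f"unknown default partition {default_partition!r}")
    lines = []
    for name, nodes in partitions.items():
        flag = "YES" if name == default_partition else "NO"
        lines.append(f"PartitionName={name} Nodes={','.join(nodes)} Default={flag}")
    return lines


def render_gres_conf(gpus_per_node: dict) -> str:
    """Render one gres.conf listing every GPU node; the same file goes to each node."""
    lines = [f"NodeName={node} Name=gpu Count={count}"
             for node, count in sorted(gpus_per_node.items()) if count > 0]
    return "\n".join(lines) + "\n"


print("\n".join(render_partition_lines({"cpu": ["w1", "w2"], "gpu": ["w3"]}, "cpu")))
print(render_gres_conf({"w3": 2, "w1": 0}), end="")
```

Rendering the full GRES list once and distributing it unchanged keeps every node's view identical, which is the property the commit credits for fixing on-demand scheduling.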
… ubuntu user on mount points (#723)

* Aligned with SimpleVM

Lowered the MemSpecLimit maximum from 8000 to 4000. Changed the owner of mounted volumes to ubuntu

#721

* Aligned with SimpleVM

Lowered the MemSpecLimit maximum from 8000 to 4000. Changed the owner of mounted volumes to ubuntu

#721
@jkrue self-assigned this on Mar 31, 2026
@jkrue closed this on Apr 1, 2026
