Eager error checking for 'RUNFILES_DIR' in the run template by stagnation · Pull Request #900 · aspect-build/rules_py

stagnation · 2026-03-30T14:42:59Z

I had a weird error occur during a --sandbox_debug session where instead of running the code it just crashed with

Error:   × Unable to run command:
  ├─▶ Unable to create base venv directory
  ╰─▶ Permission denied (os error 13)

4062937 stat("/.mutator.venv", 0x7fff2b5c0ed0) = -1 ENOENT (No such file or directory)
4062937 mkdir("/.mutator.venv", 0777)   = -1 EACCES (Permission denied)
4062937 stat("/.mutator.venv", 0x7fff2b5c0c20) = -1 ENOENT (No such file or directory)
4062937 write(2, "\33[31m  \342\224\234\342\224\200\342\226\266 \33[0mUnable to create base venv directory", 57) = 57

This came from the RUNFILES_DIR environment variable not being set up correctly. It happened with a remote build (I did not try to reproduce it with a local build) using --build_runfile_links=false, so the bash runfiles snippet could not find the directories and set the variables to empty values. set -u does not catch this, because the variables are in fact set.

Instead, the script proceeds with an empty variable expansion and runs into the error above. With this change we get a more actionable error message and avoid running code that targets "<empty>/<something>", which is known to be a bad idea. Thankfully it just crashes, but with hard-to-understand messages, and the sandbox debug behavior does not reflect what actually happens in the action when run by Bazel.
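To illustrate the difference between "set" and "non-empty", here is a minimal sketch of an eager check. It is not the diff from this PR: the helper name `require_nonempty` is hypothetical, and building the path as `${RUNFILES_DIR}/.mutator.venv` is an assumption inferred from the strace output above.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical helper (not from the PR): fail unless the given value is non-empty.
# $1 is the variable name used in the message, $2 is its current value.
require_nonempty() {
  if [ -z "$2" ]; then
    echo "ERROR: $1 must be set to a non-empty value" >&2
    return 1
  fi
}

# Set but empty, which is the state described above.
RUNFILES_DIR=""

# 'set -u' does not object: the variable is set, so the expansion succeeds
# and the resulting path is rooted at '/'.
venv_path="${RUNFILES_DIR}/.mutator.venv"
echo "unchecked path: ${venv_path}"

# The ':-' default keeps 'set -u' quiet even if the variable were unset.
if ! require_nonempty RUNFILES_DIR "${RUNFILES_DIR:-}"; then
  echo "eager check rejected the empty value"
fi
```

Bash's built-in `: "${RUNFILES_DIR:?RUNFILES_DIR must be set}"` covers both the unset and the empty case in one line, at the cost of exiting a non-interactive shell immediately.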

Changes are visible to end-users: yes/no

Improves error reporting in rare cases. Avoids the risk of follow-on errors from empty shell variables.

Test plan

Manual testing; please provide instructions so we can reproduce.

I think the key here is to build remotely without runfiles links, then follow Bazel's sandbox debug functionality.


CLAassistant · 2026-03-30T14:43:06Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

aspect-workflows · 2026-03-30T14:43:18Z

Bazel 8 (Test)

12 test targets passed

Targets

//examples/multi_version:py_version_default_test [k8-fastbuild-ST-32abbbcff250]         1s
//examples/multi_version:py_version_test [k8-fastbuild-ST-32abbbcff250]                 1s
//examples/pytest:absolute_main_test [k8-fastbuild-ST-32abbbcff250]                     458ms
//examples/pytest:main_with_colon_test [k8-fastbuild-ST-32abbbcff250]                   412ms
//examples/pytest:pytest_test [k8-fastbuild-ST-32abbbcff250]                            1s
//examples/pytest:sharded/test [k8-fastbuild-ST-32abbbcff250]                           2s
//examples/pytest:test_naming_723 [k8-fastbuild-ST-32abbbcff250]                        1s
//examples/virtual_deps:pytest_test [k8-fastbuild-ST-32abbbcff250]                      1s
//py/tests/external-deps:test_can_import_runfiles_helper [k8-fastbuild-ST-32abbbcff250] 633ms
//py/tests/internal-deps:assert [k8-fastbuild-ST-32abbbcff250]                          495ms
//py/tests/py-test:test_env_vars [k8-fastbuild-ST-32abbbcff250]                         418ms
//py/tests/repo_relative_imports/test:test [k8-fastbuild-ST-32abbbcff250]               525ms

Total test execution time was 11s. 95 tests (88.8%) were fully cached saving 37s.

Bazel 9 (Test)

12 test targets passed

Targets

//examples/multi_version:py_version_default_test [k8-fastbuild-ST-32abbbcff250]         2s
//examples/multi_version:py_version_test [k8-fastbuild-ST-32abbbcff250]                 1s
//examples/pytest:absolute_main_test [k8-fastbuild-ST-32abbbcff250]                     731ms
//examples/pytest:main_with_colon_test [k8-fastbuild-ST-32abbbcff250]                   567ms
//examples/pytest:pytest_test [k8-fastbuild-ST-32abbbcff250]                            2s
//examples/pytest:sharded/test [k8-fastbuild-ST-32abbbcff250]                           2s
//examples/pytest:test_naming_723 [k8-fastbuild-ST-32abbbcff250]                        1s
//examples/virtual_deps:pytest_test [k8-fastbuild-ST-32abbbcff250]                      1s
//py/tests/external-deps:test_can_import_runfiles_helper [k8-fastbuild-ST-32abbbcff250] 676ms
//py/tests/internal-deps:assert [k8-fastbuild-ST-32abbbcff250]                          639ms
//py/tests/py-test:test_env_vars [k8-fastbuild-ST-32abbbcff250]                         517ms
//py/tests/repo_relative_imports/test:test [k8-fastbuild-ST-32abbbcff250]               567ms

Total test execution time was 13s. 95 tests (88.8%) were fully cached saving 1m 5s.

Bazel 8 (Test)

e2e

13 test targets passed

Targets

//cases/interpreter-version-541:test_3_10_classic [k8-fastbuild-ST-3746b67a8e25]        360ms
//cases/interpreter-version-541:test_3_11_classic [k8-fastbuild-ST-1a8602e72ef0]        418ms
//cases/interpreter-version-541:test_3_9_classic [k8-fastbuild-ST-cc09ec9be975]         990ms
//cases/pth-namespace-547:test_legacy [k8-fastbuild-ST-144eaf95864e]                    978ms
//cases/pytest-main-867:test_collection_check [k8-fastbuild-ST-849c15114b99]            2s
//cases/pytest-main-867:test_direct_pytest_main [k8-fastbuild-ST-849c15114b99]          1s
//cases/pytest-main-867:test_example [k8-fastbuild-ST-849c15114b99]                     2s
//cases/pytest-main-867:test_multi_src [k8-fastbuild-ST-849c15114b99]                   1s
//cases/pytest-main-867:test_naming_regression [k8-fastbuild-ST-849c15114b99]           1s
//cases/pytest-mock-530:test_mock_py_test [k8-fastbuild-ST-ead18db1a9e5]                780ms
//cases/repository-rule-deps-299:all_direct [k8-fastbuild-ST-1a8602e72ef0]              369ms
//cases/repository-rule-deps-299:test [k8-fastbuild-ST-1a8602e72ef0]                    478ms
//cases/root-dir-paths-538:test_root_dir [k8-fastbuild]                                 42ms

Total test execution time was 12s. 33 tests (71.7%) were fully cached saving 34s.

Bazel 9 (Test)

e2e

13 test targets passed

Targets

//cases/interpreter-version-541:test_3_10_classic [k8-fastbuild-ST-3746b67a8e25]        1s
//cases/interpreter-version-541:test_3_11_classic [k8-fastbuild-ST-1a8602e72ef0]        409ms
//cases/interpreter-version-541:test_3_9_classic [k8-fastbuild-ST-cc09ec9be975]         1s
//cases/pth-namespace-547:test_legacy [k8-fastbuild-ST-144eaf95864e]                    919ms
//cases/pytest-main-867:test_collection_check [k8-fastbuild-ST-849c15114b99]            3s
//cases/pytest-main-867:test_direct_pytest_main [k8-fastbuild-ST-849c15114b99]          1s
//cases/pytest-main-867:test_example [k8-fastbuild-ST-849c15114b99]                     2s
//cases/pytest-main-867:test_multi_src [k8-fastbuild-ST-849c15114b99]                   972ms
//cases/pytest-main-867:test_naming_regression [k8-fastbuild-ST-849c15114b99]           1s
//cases/pytest-mock-530:test_mock_py_test [k8-fastbuild-ST-ead18db1a9e5]                1s
//cases/repository-rule-deps-299:all_direct [k8-fastbuild-ST-1a8602e72ef0]              433ms
//cases/repository-rule-deps-299:test [k8-fastbuild-ST-1a8602e72ef0]                    508ms
//cases/root-dir-paths-538:test_root_dir [k8-fastbuild]                                 44ms

Total test execution time was 13s. 33 tests (71.7%) were fully cached saving 34s.

Bazel 8 (Test)

examples/uv_pip_compile

All tests were cache hits

1 test (100.0%) was fully cached saving 444ms.

stagnation · 2026-03-30T14:52:34Z

For reference, here is the redacted output of --sandbox_debug and --subcommands, which is unexpectedly brief and unhelpful.

(cd /home/nwirekli/.cache/bazel/_bazel_nwirekli/1367cfbaeda75c6c72875bb6bf917b00/execroot/_main && \
  exec env - \
  bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<external repo>+/path/to/mutator --flags goes here)
# Configuration: 218a25cecc8aef27f19c1f20c75b3a8b236074adf6b6ea99ea087fcb0371930a
# Execution platform: @@<repo>+//platform/buildbarn:bare_x86_64_linux
# Runner: remote

I should note that we do not see the hermetic-sandbox "open /bin/sh -i" here after the command.

stagnation · 2026-03-30T14:53:50Z

And I don't get all the RUNFILES variables through aquery either. I think they are usually hidden. It's been a while since I debugged a Python process in the sandbox, so I'm a bit rusty on the state of affairs, but I usually grab the environment from --sandbox_debug. Hmm.

stagnation · 2026-03-31T08:35:11Z

Doing some more tinkering in the sandbox I realize that the variable is removed by runfiles_export_envvars.

echo "RUNFILES_DIR 1: $RUNFILES_DIR"


# --- begin runfiles.bash initialization v3 ---
# Copy-pasted from the Bazel Bash runfiles library v3.
# https://github.com/bazelbuild/bazel/blob/master/tools/bash/runfiles/runfiles.bash
set -uo pipefail; set +e; f=bazel_tools/tools/bash/runfiles/runfiles.bash
source "${RUNFILES_DIR:-/dev/null}/$f" 2>/dev/null || \
  source "$(grep -sm1 "^$f " "${RUNFILES_MANIFEST_FILE:-/dev/null}" | cut -f2- -d' ')" 2>/dev/null || \
  source "$0.runfiles/$f" 2>/dev/null || \
  source "$(grep -sm1 "^$f " "$0.runfiles_manifest" | cut -f2- -d' ')" 2>/dev/null || \
  source "$(grep -sm1 "^$f " "$0.exe.runfiles_manifest" | cut -f2- -d' ')" 2>/dev/null || \
  { echo>&2 "ERROR: runfiles.bash initializer cannot find $f. An executable rule may have forgotten to expose it in the runfiles, or the binary may require RUNFILES_DIR to be set."; exit 1; }; f=; set -e
# --- end runfiles.bash initialization v3 ---

runfiles_export_envvars
echo "RUNFILES_DIR 2: $RUNFILES_DIR"

Gives the (redacted) output:

RUNFILES 1: bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles
RUNFILES 2:
bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator: line 53: RUNFILES_DIR: RUNFILES_DIR must be set

stagnation · 2026-03-31T08:44:18Z

The runfiles issue was caused by an optimization: --build_runfile_links=false.
Now, with a lot of tinkering, I can run within the sandbox, but the error handling in the runfiles layer was not so fun.

Here is the trace of the bash runfiles library execution:

RUNFILES 1: bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles
+ runfiles_export_envvars
+ [[ ! -f bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles_manifest ]]
+ [[ ! -f bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles_manifest ]]
+ [[ ! -d bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles ]]
+ [[ bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles_manifest == */MANIFEST ]]
+ [[ bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles_manifest == *_manifest ]]
+ [[ -d bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles ]]
+ export RUNFILES_DIR=
+ RUNFILES_DIR=
+ set +x
RUNFILES 2:

The trace diagnoses it well: the runfiles directory does not exist.

$ ls bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles
ls: cannot access 'bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles': No such file or directory

With --remote_download_all and --build_runfile_links we do get the file and execution can proceed.
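The behavior seen in the trace can be reproduced in isolation. The sketch below is a simplified model written from the trace alone, not the upstream runfiles.bash source: the function `model_export_envvars` and the temporary file layout are hypothetical stand-ins for a manifest that exists next to a runfiles directory that does not.

```shell
#!/usr/bin/env bash
set -u

# Simplified model of the branch visible in the trace; NOT the upstream code.
# If RUNFILES_DIR is not a directory, try to recover it from the manifest path,
# and export an empty value when that sibling directory is missing too.
model_export_envvars() {
  if [ -d "${RUNFILES_DIR:-/dev/null}" ]; then
    return 0
  fi
  # Strip a trailing "_manifest"; the result is unchanged if the suffix is absent.
  candidate="${RUNFILES_MANIFEST_FILE%_manifest}"
  if [ "${candidate}" != "${RUNFILES_MANIFEST_FILE}" ] && [ -d "${candidate}" ]; then
    export RUNFILES_DIR="${candidate}"
  else
    export RUNFILES_DIR=
  fi
}

# Hypothetical layout: the manifest file exists, the runfiles directory does not.
workdir="$(mktemp -d)"
touch "${workdir}/mutator.runfiles_manifest"
export RUNFILES_MANIFEST_FILE="${workdir}/mutator.runfiles_manifest"
export RUNFILES_DIR="${workdir}/mutator.runfiles"

echo "before: ${RUNFILES_DIR}"
model_export_envvars
echo "after:  ${RUNFILES_DIR}"
```

In this model the "after" value is empty but still set, so a later `set -u` script keeps running, which is the situation the eager check is meant to catch.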

stagnation · 2026-04-07T10:59:50Z

I updated the PR message :)

Do you think I should also file an issue with the upstream runfiles snippet, suggesting that it would be better if it explicitly unset the variables?
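For context on that question, here is a small sketch of how the two states behave under set -u. The probe function names are hypothetical and the `/bin` suffix is only there to make the expansion look like a path. Each probe runs in a subshell so that an abort cannot end the calling script.

```shell
#!/usr/bin/env bash

# Variable exported with an empty value: 'set -u' treats it as set.
probe_empty() ( set -u; export RUNFILES_DIR=; : "${RUNFILES_DIR}/bin" )

# Variable explicitly unset: 'set -u' aborts the subshell on expansion.
probe_unset() ( set -u; unset RUNFILES_DIR; : "${RUNFILES_DIR}/bin" )

if probe_empty 2>/dev/null; then
  echo "empty: expansion succeeded, 'set -u' did not trigger"
fi
if ! probe_unset 2>/dev/null; then
  echo "unset: 'set -u' aborted the subshell"
fi
```

If the snippet unset the variables instead of exporting them empty, scripts that already rely on set -u would fail at the first expansion rather than continue with a path rooted at "/".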

stagnation wants to merge 1 commit into aspect-build:main from stagnation:patch-1
