
Eager error checking for 'RUNFILES_DIR' in the run template#900

Open
stagnation wants to merge 1 commit into aspect-build:main from stagnation:patch-1

@stagnation stagnation commented Mar 30, 2026

I had a weird error occur during a `--sandbox_debug` session where, instead of running the code, it just crashed with:

```
Error:   × Unable to run command:
  ├─▶ Unable to create base venv directory
  ╰─▶ Permission denied (os error 13)
```

```
4062937 stat("/.mutator.venv", 0x7fff2b5c0ed0) = -1 ENOENT (No such file or directory)
4062937 mkdir("/.mutator.venv", 0777)   = -1 EACCES (Permission denied)
4062937 stat("/.mutator.venv", 0x7fff2b5c0c20) = -1 ENOENT (No such file or directory)
4062937 write(2, "\33[31m  \342\224\234\342\224\200\342\226\266 \33[0mUnable to create base venv directory", 57) = 57
```

This came from the `RUNFILES_DIR` environment variable not being set up correctly. This was with a remote build (I did not try to reproduce it with a local build) using `--build_runfile_links=false`, so the bash runfiles snippet could not find the directories and set the variables to empty values. This is not caught by `set -u`, because the variables are in fact set.

Instead, the script proceeds with an empty variable expansion and runs into the error above. With this change we get a more actionable error message and avoid running code that targets "<empty>/<something>", which is known to be a bad idea. Thankfully it just crashes, but with hard-to-understand messages, and the sandbox debug behavior does not reflect what actually happens in the action when it is run by Bazel.
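
To illustrate the difference between `set -u` and an eager check, here is a minimal sketch. It is not the exact line from this PR: the function name `check_runfiles_dir` and the message text are assumptions made for illustration. It relies on the standard `${VAR:?message}` expansion, which fails when the variable is unset or empty, whereas `set -u` only fails when it is unset.

```shell
set -u

# Hypothetical guard, named here for illustration only.
check_runfiles_dir() {
  # Run the expansion in a subshell so that a failure is reported as a
  # non-zero status instead of terminating the calling script.
  ( : "${RUNFILES_DIR:?RUNFILES_DIR must be set to a non-empty path}" )
}

RUNFILES_DIR=""   # set but empty, as left behind by the runfiles snippet

# `set -u` does not object, because the variable is set.
echo "set -u accepts: '${RUNFILES_DIR}/some/path'"

# The eager check rejects the empty value before it is used.
if ! check_runfiles_dir; then
  echo "guard rejected the empty RUNFILES_DIR"
fi
```

In a real launcher script the expansion would typically be used without the subshell, so that the script stops immediately with the message.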

Changes are visible to end-users: yes/no

Improves error reporting in rare cases. Avoids the risk of follow-on errors caused by empty shell variables.

Test plan

  • Manual testing; please provide instructions so we can reproduce.

I think the key here is to build remotely without runfiles links (`--build_runfile_links=false`), then follow Bazel's `--sandbox_debug` workflow.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@aspect-workflows

aspect-workflows bot commented Mar 30, 2026

Bazel 8 (Test)

12 test targets passed

Targets
//examples/multi_version:py_version_default_test [k8-fastbuild-ST-32abbbcff250]         1s
//examples/multi_version:py_version_test [k8-fastbuild-ST-32abbbcff250]                 1s
//examples/pytest:absolute_main_test [k8-fastbuild-ST-32abbbcff250]                     458ms
//examples/pytest:main_with_colon_test [k8-fastbuild-ST-32abbbcff250]                   412ms
//examples/pytest:pytest_test [k8-fastbuild-ST-32abbbcff250]                            1s
//examples/pytest:sharded/test [k8-fastbuild-ST-32abbbcff250]                           2s
//examples/pytest:test_naming_723 [k8-fastbuild-ST-32abbbcff250]                        1s
//examples/virtual_deps:pytest_test [k8-fastbuild-ST-32abbbcff250]                      1s
//py/tests/external-deps:test_can_import_runfiles_helper [k8-fastbuild-ST-32abbbcff250] 633ms
//py/tests/internal-deps:assert [k8-fastbuild-ST-32abbbcff250]                          495ms
//py/tests/py-test:test_env_vars [k8-fastbuild-ST-32abbbcff250]                         418ms
//py/tests/repo_relative_imports/test:test [k8-fastbuild-ST-32abbbcff250]               525ms

Total test execution time was 11s. 95 tests (88.8%) were fully cached saving 37s.


Bazel 9 (Test)

12 test targets passed

Targets
//examples/multi_version:py_version_default_test [k8-fastbuild-ST-32abbbcff250]         2s
//examples/multi_version:py_version_test [k8-fastbuild-ST-32abbbcff250]                 1s
//examples/pytest:absolute_main_test [k8-fastbuild-ST-32abbbcff250]                     731ms
//examples/pytest:main_with_colon_test [k8-fastbuild-ST-32abbbcff250]                   567ms
//examples/pytest:pytest_test [k8-fastbuild-ST-32abbbcff250]                            2s
//examples/pytest:sharded/test [k8-fastbuild-ST-32abbbcff250]                           2s
//examples/pytest:test_naming_723 [k8-fastbuild-ST-32abbbcff250]                        1s
//examples/virtual_deps:pytest_test [k8-fastbuild-ST-32abbbcff250]                      1s
//py/tests/external-deps:test_can_import_runfiles_helper [k8-fastbuild-ST-32abbbcff250] 676ms
//py/tests/internal-deps:assert [k8-fastbuild-ST-32abbbcff250]                          639ms
//py/tests/py-test:test_env_vars [k8-fastbuild-ST-32abbbcff250]                         517ms
//py/tests/repo_relative_imports/test:test [k8-fastbuild-ST-32abbbcff250]               567ms

Total test execution time was 13s. 95 tests (88.8%) were fully cached saving 1m 5s.


Bazel 8 (Test)

e2e

13 test targets passed

Targets
//cases/interpreter-version-541:test_3_10_classic [k8-fastbuild-ST-3746b67a8e25]        360ms
//cases/interpreter-version-541:test_3_11_classic [k8-fastbuild-ST-1a8602e72ef0]        418ms
//cases/interpreter-version-541:test_3_9_classic [k8-fastbuild-ST-cc09ec9be975]         990ms
//cases/pth-namespace-547:test_legacy [k8-fastbuild-ST-144eaf95864e]                    978ms
//cases/pytest-main-867:test_collection_check [k8-fastbuild-ST-849c15114b99]            2s
//cases/pytest-main-867:test_direct_pytest_main [k8-fastbuild-ST-849c15114b99]          1s
//cases/pytest-main-867:test_example [k8-fastbuild-ST-849c15114b99]                     2s
//cases/pytest-main-867:test_multi_src [k8-fastbuild-ST-849c15114b99]                   1s
//cases/pytest-main-867:test_naming_regression [k8-fastbuild-ST-849c15114b99]           1s
//cases/pytest-mock-530:test_mock_py_test [k8-fastbuild-ST-ead18db1a9e5]                780ms
//cases/repository-rule-deps-299:all_direct [k8-fastbuild-ST-1a8602e72ef0]              369ms
//cases/repository-rule-deps-299:test [k8-fastbuild-ST-1a8602e72ef0]                    478ms
//cases/root-dir-paths-538:test_root_dir [k8-fastbuild]                                 42ms

Total test execution time was 12s. 33 tests (71.7%) were fully cached saving 34s.


Bazel 9 (Test)

e2e

13 test targets passed

Targets
//cases/interpreter-version-541:test_3_10_classic [k8-fastbuild-ST-3746b67a8e25]        1s
//cases/interpreter-version-541:test_3_11_classic [k8-fastbuild-ST-1a8602e72ef0]        409ms
//cases/interpreter-version-541:test_3_9_classic [k8-fastbuild-ST-cc09ec9be975]         1s
//cases/pth-namespace-547:test_legacy [k8-fastbuild-ST-144eaf95864e]                    919ms
//cases/pytest-main-867:test_collection_check [k8-fastbuild-ST-849c15114b99]            3s
//cases/pytest-main-867:test_direct_pytest_main [k8-fastbuild-ST-849c15114b99]          1s
//cases/pytest-main-867:test_example [k8-fastbuild-ST-849c15114b99]                     2s
//cases/pytest-main-867:test_multi_src [k8-fastbuild-ST-849c15114b99]                   972ms
//cases/pytest-main-867:test_naming_regression [k8-fastbuild-ST-849c15114b99]           1s
//cases/pytest-mock-530:test_mock_py_test [k8-fastbuild-ST-ead18db1a9e5]                1s
//cases/repository-rule-deps-299:all_direct [k8-fastbuild-ST-1a8602e72ef0]              433ms
//cases/repository-rule-deps-299:test [k8-fastbuild-ST-1a8602e72ef0]                    508ms
//cases/root-dir-paths-538:test_root_dir [k8-fastbuild]                                 44ms

Total test execution time was 13s. 33 tests (71.7%) were fully cached saving 34s.


Bazel 8 (Test)

examples/uv_pip_compile

All tests were cache hits

1 test (100.0%) was fully cached saving 444ms.

@stagnation
Author

For reference, here is the redacted output of `--sandbox_debug` and `--subcommands`, which is unexpectedly brief and unhelpful.

```
(cd /home/nwirekli/.cache/bazel/_bazel_nwirekli/1367cfbaeda75c6c72875bb6bf917b00/execroot/_main && \
  exec env - \
  bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<external repo>+/path/to/mutator --flags goes here)
# Configuration: 218a25cecc8aef27f19c1f20c75b3a8b236074adf6b6ea99ea087fcb0371930a
# Execution platform: @@<repo>+//platform/buildbarn:bare_x86_64_linux
# Runner: remote
```

I should note that we do not see the hermetic-sandbox "open /bin/sh -i" here after the command.

@stagnation
Author

And I don't get all the RUNFILES variables through aquery either. I think they are usually hidden. It's been a while since I debugged a Python process in the sandbox, so I'm a bit rusty on the state of affairs, but I usually grab the environment from `--sandbox_debug`. Hmm.

@stagnation
Author

After some more tinkering in the sandbox, I realize that the variable is cleared (set to an empty value) by `runfiles_export_envvars`.

```
echo "RUNFILES_DIR 1: $RUNFILES_DIR"

# --- begin runfiles.bash initialization v3 ---
# Copy-pasted from the Bazel Bash runfiles library v3.
# https://github.com/bazelbuild/bazel/blob/master/tools/bash/runfiles/runfiles.bash
set -uo pipefail; set +e; f=bazel_tools/tools/bash/runfiles/runfiles.bash
source "${RUNFILES_DIR:-/dev/null}/$f" 2>/dev/null || \
  source "$(grep -sm1 "^$f " "${RUNFILES_MANIFEST_FILE:-/dev/null}" | cut -f2- -d' ')" 2>/dev/null || \
  source "$0.runfiles/$f" 2>/dev/null || \
  source "$(grep -sm1 "^$f " "$0.runfiles_manifest" | cut -f2- -d' ')" 2>/dev/null || \
  source "$(grep -sm1 "^$f " "$0.exe.runfiles_manifest" | cut -f2- -d' ')" 2>/dev/null || \
  { echo>&2 "ERROR: runfiles.bash initializer cannot find $f. An executable rule may have forgotten to expose it in the runfiles, or the binary may require RUNFILES_DIR to be set."; exit 1; }; f=; set -e
# --- end runfiles.bash initialization v3 ---

runfiles_export_envvars
echo "RUNFILES_DIR 2: $RUNFILES_DIR"
```

Gives the (redacted) output:

```
RUNFILES 1: bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles
RUNFILES 2:
bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator: line 53: RUNFILES_DIR: RUNFILES_DIR must be set
```

@stagnation
Author

The runfiles issue was caused by an optimization flag, `--build_runfile_links=false`.
Now, after a lot of tinkering, I can run within the sandbox, but the error handling in the runfiles layer was not so fun.

Here is the trace of the bash runfiles library execution:

```
RUNFILES 1: bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles
+ runfiles_export_envvars
+ [[ ! -f bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles_manifest ]]
+ [[ ! -f bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles_manifest ]]
+ [[ ! -d bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles ]]
+ [[ bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles_manifest == */MANIFEST ]]
+ [[ bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles_manifest == *_manifest ]]
+ [[ -d bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles ]]
+ export RUNFILES_DIR=
+ RUNFILES_DIR=
+ set +x
RUNFILES 2:
```

The diagnosis checks out, the runfiles directory does not exist:

```
$ ls bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles
ls: cannot access 'bazel-out/k8-opt-exec-ST-cfbc71dc7dfe/bin/external/<repo>+/path/to/mutator.runfiles': No such file or directory
```

With `--remote_download_all` and `--build_runfile_links` we do get the runfiles directory and execution can proceed.
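
The outcome in the trace can be reproduced with a simplified stand-in. This is not the code of `runfiles.bash`; the function `export_runfiles_dir_sketch` and its parameter are hypothetical and only mimic the final branch seen above, where a missing directory leads to `export RUNFILES_DIR=`.

```shell
# Simplified stand-in for the behaviour seen in the trace. This is NOT the
# upstream implementation, only an illustration of the same outcome.
export_runfiles_dir_sketch() {
  # $1: candidate runfiles directory (hypothetical parameter)
  if [ -d "$1" ]; then
    export RUNFILES_DIR="$1"
  else
    # Directory missing (for example, runfiles links were not built):
    # the variable is exported with an empty value rather than unset.
    export RUNFILES_DIR=""
  fi
}

tmp="$(mktemp -d)"
mkdir "$tmp/mutator.runfiles"

export_runfiles_dir_sketch "$tmp/mutator.runfiles"
echo "directory present: RUNFILES_DIR='$RUNFILES_DIR'"

rmdir "$tmp/mutator.runfiles"
export_runfiles_dir_sketch "$tmp/mutator.runfiles"
echo "directory missing: RUNFILES_DIR='$RUNFILES_DIR'"

rmdir "$tmp"
```

The second call leaves `RUNFILES_DIR` set but empty, which is the state the launcher script then runs with.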

@stagnation
Author

I updated the PR message :)

Do you think I should also file an issue against the upstream runfiles snippet, suggesting that it would be better if it explicitly unset the variables?
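
As a rough sketch of why unsetting would help: under `set -u`, expanding an unset variable is an error, while expanding an empty one is not. The two helper functions below are hypothetical and exist only for this comparison; each runs in a subshell so a failure does not stop the demo.

```shell
# Hypothetical helpers for comparison only; each runs in a subshell.
expand_when_empty() {
  ( set -u; RUNFILES_DIR=""; : "$RUNFILES_DIR/some/path" ) 2>/dev/null
}

expand_when_unset() {
  ( set -u; unset RUNFILES_DIR; : "$RUNFILES_DIR/some/path" ) 2>/dev/null
}

if expand_when_empty; then
  echo "empty value: expansion accepted under set -u"
fi

if ! expand_when_unset; then
  echo "unset variable: expansion rejected under set -u"
fi
```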
