Skip to content

Conversation

@jackmarsh
Copy link

The cd $PKG_DIR pattern breaks when tool paths are relative, which happens in remote execution. After the cd, the relative tool path is no longer valid.

This removes the cd by using named srcs and passing $SRCS_GO directly instead of globbing with *.go. Adds -I$PKG_DIR so cgo can still find headers.

jackmarsh and others added 3 commits December 23, 2025 16:22
The cgo_library rule was using unnamed tools syntax (tools = [CONFIG.GO.GO_TOOL])
with $TOOL variable. This causes remote build execution to fail because Please
doesn't include tool outputs in the remote action's input tree when using this
syntax.

This change switches to named tools syntax (tools = {"go": CONFIG.GO.GO_TOOL})
with $TOOLS_GO variable, which is consistent with how _go_library_cmds handles
tools and enables proper RBE support for cgo builds.
The cgo_library rule changes directory with 'cd $PKG_DIR/' before running
the go tool. This is necessary because:
1. The *.go glob needs the correct working directory
2. Header files must be relative to cgo's working directory for #include

In remote execution, tool paths are relative to $TMP_DIR. After the cd,
these relative paths become invalid.

Fix: Convert tool path to absolute before the cd using a case statement.
Uses $TMP_DIR/$TOOLS_GO for relative paths (remote), keeps absolute
paths unchanged (local).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The cd $PKG_DIR pattern breaks when tool paths are relative, which
happens in remote execution. This removes the cd and instead uses
named srcs with $SRCS_GO directly, adding -I$PKG_DIR for headers.
@toastwaffle
Copy link
Contributor

Do you have a minimal reproduction example for the bug you're trying to fix here? What remote execution system are you using?

We're using cgo_library with remote execution without any trouble, so I'm curious what is going wrong for you

@jackmarsh
Copy link
Author

Thanks for the quick reply Sam,.

Environment Details

Here's our setup for reproducibility:

Versions:

  • Please: 17.27.0
  • go-rules: v1.28.2

.plzconfig (relevant sections):

[please]
version = 17.27.0

[Plugin "go"]
Target = //plugins:go
gotool = //third_party/go:gotool|go
defaultstatic = true
cgoenabled = true

.plzconfig.rbe:

[remote]
url = <mettle-api>:9090
casurl = <elan-cas>:9090
asseturl = <zeal-asset>:9090
instance = <instance-name>
numexecutors = 32

Setup

We run our own custom RBE system, heavily inspired by thought-machine/please-servers. We use:

  • Mettle (execution API)
  • Elan (CAS)
  • Zeal (remote asset API)

The issue manifested when building go-ethereum as a subrepo dependency. go-ethereum has several CGO libraries (secp256k1, blst, etc.) that we use across our codebase.

When building remotely, we got:

$ plz build //third_party/go:go-ethereum --profile rbe
Build stopped after 2.7s. 3 targets failed:
    ///third_party/go/github.com_DataDog_zstd//:_zstd#cgo
Remotely executed command exited with 127
Stderr:
bash: line 1: third_party/go/gotool/bin/go: No such file or directory

    ///third_party/go/github.com_cockroachdb_pebble//internal/manual:_manual#cgo
Remotely executed command exited with 127
Stderr:
bash: line 1: third_party/go/gotool/bin/go: No such file or directory

    ///third_party/go/gopkg.in_olebedev_go-duktape.v3//:_go-duktape.v3#cgo
Remotely executed command exited with 127
Stderr:
bash: line 1: third_party/go/gotool/bin/go: No such file or directory

The TOOL environment variable was set to a relative path (third_party/go/gotool/bin/go), but when the CGO compilation did a cd into a subdirectory, that relative path became invalid.

My initial hypothesis was that the cd in cgo.build_defs was the problem. If we could avoid changing directories, the relative path would remain valid. That's why I submitted the PR to:

  1. Use named sources instead of glob
  2. Use -I include paths instead of cd

After your comment about CGO working fine with your RBE, I went back and compared our Mettle worker implementation against upstream please-servers.

I discovered we were missing this "crappy little hack" from worker.go line 740:

// This is a crappy little hack; tool paths that are made relative don't always work
// (notably for "go build" which needs an absolute path for -toolexec). For now, we
// fix up here, but ideally we shouldn't need to know the detail of this.
if strings.HasPrefix(v.Name, "TOOL") && !path.IsAbs(v.Value) && strings.ContainsRune(v.Value, '/') {
    v.Value = w.sandboxDir(v.Value)
} else if v.Name == "PATH" {
    v.Value = strings.ReplaceAll(v.Value, "~", w.home)
} else if v.Name == "TEST" {
    v.Value = w.sandboxDir(v.Value)
}

Where sandboxDir() converts relative paths to absolute:

func (w *worker) sandboxDir(v string) string {
    if w.sandbox != "" && w.altSandbox != "" {
        return path.Join("/tmp/plz_sandbox", v)
    }
    return path.Join(w.dir, v)  // Makes relative path absolute
}

Our worker was passing TOOL=third_party/go/gotool/bin/go (relative) directly to the command, rather than converting it to TOOL=/tmp/mettle_xxx/third_party/go/gotool/bin/go (absolute).

The Fix

I added the same transformation to our worker and after deploying this fix, all CGO builds (including go-ethereum) work perfectly via RBE:

$ plz build //third_party/go:go-ethereum --profile rbe
Build finished; total time 16.64s, incrementality 100.0%. Outputs:
//third_party/go:go-ethereum:
  plz-out/gen/third_party/go/pkg/linux_amd64/github.com/ethereum/go-ethereum

I'm happy to close this PR since I've switched back to the official go-rules, instead of my fork with this patch and everything now works. The issue was entirely in our RBE worker implementation.

But, I'm curious if there are any issues with the PR's approach? I feel like using named sources instead of the glob, and using -I instead of cd, would make the build more robust for any RBE system that might not convert relative paths to absolute. It would be more defensive, even if please-servers already handles it.

Merry Christmas! 🎄

@toastwaffle
Copy link
Contributor

oh. That is a horrible hack.

I think we should move forward with this PR, yes. Do you think the test case in test/cgo/BUILD is sufficient to validate that cgo_library targets will still build (locally), or do you thing it's worth doing some testing with one of the other libraries you mentioned that will more realistically evaluate the cgo rules?

We don't have anything set up to test remote execution on Github, so I'm going to test this PR on our internal repository as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants