Skip to content

Conversation

@rMaxiQp
Copy link
Contributor

@rMaxiQp rMaxiQp commented Oct 8, 2025

What?

Expose listeners that getFreeBindAddr() creates

Why?

Currently, getFreeBindAddr() finds a free-to-use address by try-binding. After it finds the valid port, it returns the API address, and closes the listener. In between the old listener closing and new listener opening, the address port can be stolen. One fix here would be exposing the old listener without closing.

One caveat, now the GlobalState has a net.Listener field hanging mainly because it's needed for testing... Maybe there can be a better approach here.

Checklist

  • I have performed a self-review of my code.
  • I have commented on my code, particularly in hard-to-understand areas.
  • I have added tests for my changes.
  • I have run linter and tests locally (make check) and all pass.

Checklist: Documentation (only for k6 maintainers and if relevant)

Please do not merge this PR until the following items are filled out.

  • I have added the correct milestone and labels to the PR.
  • I have updated the release notes: link
  • I have updated or added an issue to the k6-documentation: grafana/k6-docs#NUMBER if applicable
  • I have updated or added an issue to the TypeScript definitions: grafana/k6-DefinitelyTyped#NUMBER if applicable

Related PR(s)/Issue(s)

Relates to #3846

@rMaxiQp rMaxiQp changed the title Debug tests Fix API Address Collisions: expose listener for serving Oct 8, 2025
@rMaxiQp rMaxiQp marked this pull request as ready for review October 8, 2025 20:36
@rMaxiQp rMaxiQp requested a review from a team as a code owner October 8, 2025 20:36
@rMaxiQp rMaxiQp requested review from AgnesToulet and ankur22 and removed request for a team October 8, 2025 20:36
@rMaxiQp rMaxiQp temporarily deployed to azure-trusted-signing October 9, 2025 08:09 — with GitHub Actions Inactive
@rMaxiQp rMaxiQp temporarily deployed to azure-trusted-signing October 9, 2025 08:11 — with GitHub Actions Inactive
rMaxiQp and others added 6 commits October 9, 2025 12:57
* Update release issue template after 1.3.0

* update doc release note instructions

* remove extra line

* Update .github/ISSUE_TEMPLATE/release.md

Co-authored-by: Heitor Tashiro Sergent <[email protected]>

---------

Co-authored-by: Heitor Tashiro Sergent <[email protected]>
@rMaxiQp rMaxiQp temporarily deployed to azure-trusted-signing October 13, 2025 13:50 — with GitHub Actions Inactive
@rMaxiQp rMaxiQp temporarily deployed to azure-trusted-signing October 13, 2025 13:51 — with GitHub Actions Inactive
@AgnesToulet
Copy link
Contributor

Hi! Thanks for your contribution, the approach seems OK to me. An alternative would be to move the code responsible for setting up the server in the api package (along the GetServer function) and mock it in the tests but it also seems a big refactor "just" for testing.

To make the code more readable, I would move theses lines in a separate function though and I would completely separate the code used in tests (where c.gs.ServerListener != nil) and the code that existed before with comments to explain all this. Something like:

// ServerListener is set up in tests 
if c.gs.ServerListener != nil {
	if aerr := srv.Serve(c.gs.ServerListener); aerr != nil {
		logger.WithError(aerr).Error("Error from API server")
		c.gs.OSExit(int(exitcodes.CannotStartRESTAPI))
	}
} else {
	if aerr := srv.ListenAndServe(); aerr != nil && !errors.Is(aerr, http.ErrServerClosed) {
		// Only exit k6 if the user has explicitly set the REST API address
		if cmd.Flags().Lookup("address").Changed {
			logger.WithError(aerr).Error("Error from API server")
			c.gs.OSExit(int(exitcodes.CannotStartRESTAPI))
		} else {
			logger.WithError(aerr).Warn("Error from API server")
		}
	}
}

This ensures the behavior at runtime doesn't change (as in your changes, where the error handling is a bit different than it was before).

Also, the xk6 tests are failing on Windows and MacOS with your changes. I'm not sure why but I can reproduce on my Windows machine so I'll look into it deeper today.

@rMaxiQp
Copy link
Contributor Author

rMaxiQp commented Oct 16, 2025

Hi @AgnesToulet , thank you for your input! I tried to refactor the code a bit to address your comment. Failing on Mac and Windows is strange... I was trying to reproduce the error on my Mac Air, and it seems to struggle with other issues atm (many browser instances are created). I'll try to reproduce on Mac a bit more later but also create a Windows setup.

@rMaxiQp rMaxiQp temporarily deployed to azure-trusted-signing October 17, 2025 07:56 — with GitHub Actions Inactive
@rMaxiQp rMaxiQp temporarily deployed to azure-trusted-signing October 17, 2025 07:58 — with GitHub Actions Inactive
Copy link
Contributor

@AgnesToulet AgnesToulet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for refactoring the code. It fixed the xk6 issue as well as you were changing the existing behavior a bit before (returning an error not only when the address was changed).

I still find it a bit hard to read as part of the code related to the server is still in the run function. What about moving more code in the serve function, something like:

func (c *cmdRun) serve(globalCtx context.Context, cmd *cobra.Command, srv *http.Server) func() {
	var logger logrus.FieldLogger = c.gs.Logger
	srvCtx, srvCancel := context.WithCancel(globalCtx)

	// We cannot use backgroundProcesses here, since we need the REST API to
	// be down before we can close the samples channel above and finish the
	// processing the metrics pipeline.
	apiWG := &sync.WaitGroup{}
	apiWG.Add(2)

	go func() {
		defer apiWG.Done()
		logger.Debugf("Starting the REST API server on %s", c.gs.Flags.Address)
		if c.gs.Flags.ProfilingEnabled {
			logger.Debugf("Profiling exposed on http://%s/debug/pprof/", c.gs.Flags.Address)
		}

		// ServerListener is set up in tests
		if c.gs.ServerListener != nil {
			if err := srv.Serve(c.gs.ServerListener); err != nil && !errors.Is(err, http.ErrServerClosed) {
				logger.WithError(err).Error("Error from API server")
				c.gs.OSExit(int(exitcodes.CannotStartRESTAPI))
			}

			return
		}

		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			// Only exit k6 if the user has explicitly set the REST API address
			if cmd.Flags().Lookup("address").Changed {
				logger.WithError(err).Error("Error from API server")
				c.gs.OSExit(int(exitcodes.CannotStartRESTAPI))
			} else {
				logger.WithError(err).Warn("Error from API server")
			}
		}
	}()

	go func() {
		defer apiWG.Done()
		<-srvCtx.Done()
		shutdCtx, shutdCancel := context.WithTimeout(globalCtx, 1*time.Second)
		defer shutdCancel()
		if aerr := srv.Shutdown(shutdCtx); aerr != nil {
			logger.WithError(aerr).Debug("REST API server did not shut down correctly")
		}
	}()

	return func() {
		srvCancel()
		apiWG.Wait()
	}
}

// the server has fully stopped.
//
// nolint:nestif
func (c *cmdRun) serve(cmd *cobra.Command, srv *http.Server, shutdown func()) func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem necessary to pass the full Command object to the function, what about just having an addressFlagChanged bool param? (name could probably be improved 😅)

Co-authored-by: Agnès Toulet <[email protected]>
restore default behavior on error

ensure same behavior
@rMaxiQp
Copy link
Contributor Author

rMaxiQp commented Oct 19, 2025

Hi @AgnesToulet , thank you for the reviews!

I didn't realize that my change altered the default behavior earlier 🙈. Thank you for pointing it out. Though it's a bit verbose, now we should have same behavior between actual and test besides the net.Listener change.

@rMaxiQp rMaxiQp temporarily deployed to azure-trusted-signing October 28, 2025 09:31 — with GitHub Actions Inactive
@rMaxiQp rMaxiQp temporarily deployed to azure-trusted-signing October 28, 2025 09:33 — with GitHub Actions Inactive
AgnesToulet
AgnesToulet previously approved these changes Oct 30, 2025
Copy link
Contributor

@AgnesToulet AgnesToulet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, looks good to me now 👍

@andrewslotin andrewslotin requested review from codebien and removed request for ankur22 October 30, 2025 14:10
Copy link
Contributor

@codebien codebien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @rMaxiQp, thanks for your contribution. I've left a few comments.

// serve starts the REST API server and ensures it is properly shut down when
// the srvCtx is done. It returns a function that can be waited on to ensure
// the server has fully stopped.
func (c *cmdRun) serve(globalCtx context.Context, srv *http.Server, addrSetByUser bool) func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we doing this refactor here? It doesn't seem to be directly useful to the getFreeBindAddr issue. We should divide this in a different pull request.

metricsEngine,
execScheduler,
)
go func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It usually better to have the concurrency control on the caller instead of moving nested in a function.

We should do

go c.serve(
// or

go func () {
  c.serve(
}()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying out with #5370 to address your comment on the refactoring, and please let me know if I misunderstood your suggestion 🙏

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is what I meant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! We can get the refactor #5370 reviewed & merged first then clean up the mess in this PR then. It should have less diff afterwards.

Comment on lines -137 to -140
defer func() {
assert.NoError(tb, listener.Close())
}()
return addr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not closing anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original issue was that the port is tested out then released. In between the port release and reclaim, this port can be taken by other processes, leading to flaky address collision.

The current proposal would be returning the opened connection instead. From a quick search, getFreeBindAddr can be exposed to gRPC usage or http server usage. The original gRPC usage creates the listener using returning address and no explicit Close(). For HTTP server, the original implementation is using ListenAndServe which create the listener then Serve. Here, we can use Serve instead which would also handle Listener Closing.

FallbackLogger: testutils.NewLogger(tb).WithField("fallback", true),
Usage: usage.New(),
TestStatus: lib.NewTestStatus(),
ServerListener: listener,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ServerListener is only used in tests then we should just deprecate it and use a local variable for the testing package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate more on this topic?

ServerListener is used in /internal/cmd/run.go where we want to span up a http server using specific address. Since the current address is still claimed by the listener, calling ListenAndServe would fail but Serve would work. If we use a local variable for the testing package, is there an alternative to alter how we spin up the server in /internal/cmd/run.go ?

@mstoykov mstoykov added this to the v1.5.0 milestone Nov 5, 2025
@rMaxiQp
Copy link
Contributor Author

rMaxiQp commented Nov 5, 2025

Hi @codebien, thank you for the comments! I can start working on migrating the refactor on top a separate PR. For other comments, I shared my current understanding and hopefully it helps to complete the picture. If there are some context that I'm missing out, please let me know!

Copy link
Contributor

@codebien codebien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rMaxiQp you can probably already rebase this pull request and remove the changes migrated to the other pull request. I don't think you need to wait for the other pull request, the changes seems to be independent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants