-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Fix API Address Collisions: expose listener for serving #5227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
* Update release issue template after 1.3.0 * update doc release note instructions * remove extra line * Update .github/ISSUE_TEMPLATE/release.md Co-authored-by: Heitor Tashiro Sergent <[email protected]> --------- Co-authored-by: Heitor Tashiro Sergent <[email protected]>
…3616) Moved the validation function to the ramping vus
|
Hi! Thanks for your contribution, the approach seems OK to me. An alternative would be to move the code responsible for setting up the server in the To make the code more readable, I would move theses lines in a separate function though and I would completely separate the code used in tests (where This ensures the behavior at runtime doesn't change (as in your changes, where the error handling is a bit different than it was before). Also, the |
|
Hi @AgnesToulet , thank you for your input! I tried to refactor the code a bit to address your comment. Failing on Mac and Windows is strange... I was trying to reproduce the error on my Mac Air, and it seems to struggle with other issues atm (many browser instances are created). I'll try to reproduce on Mac a bit more later but also create a Windows setup. |
AgnesToulet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for refactoring the code. It fixed the xk6 issue as well as you were changing the existing behavior a bit before (returning an error not only when the address was changed).
I still find it a bit hard to read as part of the code related to the server is still in the run function. What about moving more code in the serve function, something like:
func (c *cmdRun) serve(globalCtx context.Context, cmd *cobra.Command, srv *http.Server) func() {
var logger logrus.FieldLogger = c.gs.Logger
srvCtx, srvCancel := context.WithCancel(globalCtx)
// We cannot use backgroundProcesses here, since we need the REST API to
// be down before we can close the samples channel above and finish the
// processing the metrics pipeline.
apiWG := &sync.WaitGroup{}
apiWG.Add(2)
go func() {
defer apiWG.Done()
logger.Debugf("Starting the REST API server on %s", c.gs.Flags.Address)
if c.gs.Flags.ProfilingEnabled {
logger.Debugf("Profiling exposed on http://%s/debug/pprof/", c.gs.Flags.Address)
}
// ServerListener is set up in tests
if c.gs.ServerListener != nil {
if err := srv.Serve(c.gs.ServerListener); err != nil && !errors.Is(err, http.ErrServerClosed) {
logger.WithError(err).Error("Error from API server")
c.gs.OSExit(int(exitcodes.CannotStartRESTAPI))
}
return
}
if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
// Only exit k6 if the user has explicitly set the REST API address
if cmd.Flags().Lookup("address").Changed {
logger.WithError(err).Error("Error from API server")
c.gs.OSExit(int(exitcodes.CannotStartRESTAPI))
} else {
logger.WithError(err).Warn("Error from API server")
}
}
}()
go func() {
defer apiWG.Done()
<-srvCtx.Done()
shutdCtx, shutdCancel := context.WithTimeout(globalCtx, 1*time.Second)
defer shutdCancel()
if aerr := srv.Shutdown(shutdCtx); aerr != nil {
logger.WithError(aerr).Debug("REST API server did not shut down correctly")
}
}()
return func() {
srvCancel()
apiWG.Wait()
}
}
internal/cmd/run.go
Outdated
| // the server has fully stopped. | ||
| // | ||
| // nolint:nestif | ||
| func (c *cmdRun) serve(cmd *cobra.Command, srv *http.Server, shutdown func()) func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem necessary to pass the full Command object to the function, what about just having an addressFlagChanged bool param? (name could probably be improved 😅)
Co-authored-by: Agnès Toulet <[email protected]>
restore default behavior on error ensure same behavior
|
Hi @AgnesToulet , thank you for the reviews! I didn't realize that my change altered the default behavior earlier 🙈. Thank you for pointing it out. Though it's a bit verbose, now we should have same behavior between actual and test besides the |
AgnesToulet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates, looks good to me now 👍
codebien
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @rMaxiQp, thanks for your contribution. I've left a few comments.
internal/cmd/run.go
Outdated
| // serve starts the REST API server and ensures it is properly shut down when | ||
| // the srvCtx is done. It returns a function that can be waited on to ensure | ||
| // the server has fully stopped. | ||
| func (c *cmdRun) serve(globalCtx context.Context, srv *http.Server, addrSetByUser bool) func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we doing this refactor here? It doesn't seem to be directly useful to the getFreeBindAddr issue. We should divide this in a different pull request.
| metricsEngine, | ||
| execScheduler, | ||
| ) | ||
| go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It usually better to have the concurrency control on the caller instead of moving nested in a function.
We should do
go c.serve(
// or
go func () {
c.serve(
}()There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying out with #5370 to address your comment on the refactoring, and please let me know if I misunderstood your suggestion 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is what I meant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! We can get the refactor #5370 reviewed & merged first then clean up the mess in this PR then. It should have less diff afterwards.
| defer func() { | ||
| assert.NoError(tb, listener.Close()) | ||
| }() | ||
| return addr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we not closing anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original issue was that the port is tested out then released. In between the port release and reclaim, this port can be taken by other processes, leading to flaky address collision.
The current proposal would be returning the opened connection instead. From a quick search, getFreeBindAddr can be exposed to gRPC usage or http server usage. The original gRPC usage creates the listener using returning address and no explicit Close(). For HTTP server, the original implementation is using ListenAndServe which create the listener then Serve. Here, we can use Serve instead which would also handle Listener Closing.
| FallbackLogger: testutils.NewLogger(tb).WithField("fallback", true), | ||
| Usage: usage.New(), | ||
| TestStatus: lib.NewTestStatus(), | ||
| ServerListener: listener, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ServerListener is only used in tests then we should just deprecate it and use a local variable for the testing package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate more on this topic?
ServerListener is used in /internal/cmd/run.go where we want to span up a http server using specific address. Since the current address is still claimed by the listener, calling ListenAndServe would fail but Serve would work. If we use a local variable for the testing package, is there an alternative to alter how we spin up the server in /internal/cmd/run.go ?
|
Hi @codebien, thank you for the comments! I can start working on migrating the refactor on top a separate PR. For other comments, I shared my current understanding and hopefully it helps to complete the picture. If there are some context that I'm missing out, please let me know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rMaxiQp you can probably already rebase this pull request and remove the changes migrated to the other pull request. I don't think you need to wait for the other pull request, the changes seems to be independent.
What?
Expose listeners that
getFreeBindAddr()createsWhy?
Currently,
getFreeBindAddr()finds a free-to-use address by try-binding. After it finds the valid port, it returns the API address, and closes the listener. In between the old listener closing and new listener opening, the address port can be stolen. One fix here would be exposing the old listener without closing.One caveat, now the
GlobalStatehas anet.Listenerfield hanging mainly because it's needed for testing... Maybe there can be a better approach here.Checklist
make check) and all pass.Checklist: Documentation (only for k6 maintainers and if relevant)
Please do not merge this PR until the following items are filled out.
Related PR(s)/Issue(s)
Relates to #3846