-
Notifications
You must be signed in to change notification settings - Fork 836
Add retries for netlink calls that may return a EINTR #1154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,321 @@ | ||
| // Package netlinksafe wraps vishvandanda/netlink functions that may return EINTR. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if you forking the code it is better to indicate the source , open source licenses and copyright should be respected License is apache so it is ok, but I noticed they have copyright https://github.com/moby/moby/blob/6de8ba3bc52377ddb06d80c76b0c8d302cb81261/LICENSE#L179 // Copyright Docker, Inc. Different projects follow different rules, but this seems to be correct to the best of my knowledge
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, a copyright attribution is the right thing to do here. Something like Given that the upstream files do not have a copyright header, we're on our own to Do The Right Thing.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've made #1159 as a follow up to get this added |
||
| // | ||
| // A Handle instantiated using [NewHandle] or [NewHandleAt] can be used in place | ||
| // of a netlink.Handle, it's a wrapper that replaces methods that need to be | ||
| // wrapped. Functions that use the package handle need to be called as "netlinksafe.X" | ||
| // instead of "netlink.X". | ||
| // | ||
| // The wrapped functions currently return EINTR when NLM_F_DUMP_INTR flagged | ||
| // in a netlink response, meaning something changed during the dump so results | ||
| // may be incomplete or inconsistent. | ||
| // | ||
| // At present, the possibly incomplete/inconsistent results are not returned | ||
| // by netlink functions along with the EINTR. So, it's not possible to do | ||
| // anything but retry. After maxAttempts the EINTR will be returned to the | ||
| // caller. | ||
| package netlinksafe | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably bike-shedding, but I'm slightly wondering if Also similar comments as on the containerd PR;
Slightly wondering if (now that more projects start to fork this code) it would perhaps be a better idea to upstream the package from moby to the github.com/vishvananda/netlink repository (as a utility package), so that we don't end up with a gazillion forks of the same code. To my understanding, it was not correct to do this by default, but having it as an optional utility package could still work? Any thoughts, @robmry?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @thaJeztah want to discuss this on slack? I sent you a dm on the cncf slack.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Agreed, when I picked the name it felt wrong. Happy to adjust it when a name is figured out
Agreed on this too.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @MikeZappa87 about to jump into another call (sorry! etoomanycalls), but will have a peek later
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Alternative is to downgrade the netlink pkg version until this is resolved as well?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm ok with
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Honestly, |
||
|
|
||
| import ( | ||
| "log" | ||
|
|
||
| "github.com/pkg/errors" | ||
| "github.com/vishvananda/netlink" | ||
| "github.com/vishvananda/netlink/nl" | ||
| "github.com/vishvananda/netns" | ||
| ) | ||
|
|
||
| // Arbitrary limit on max attempts at netlink calls if they are repeatedly interrupted. | ||
| const maxAttempts = 5 | ||
|
|
||
| type Handle struct { | ||
| *netlink.Handle | ||
| } | ||
|
|
||
| func NewHandle(nlFamilies ...int) (Handle, error) { | ||
| nlh, err := netlink.NewHandle(nlFamilies...) | ||
| if err != nil { | ||
| return Handle{}, err | ||
| } | ||
| return Handle{nlh}, nil | ||
| } | ||
|
|
||
| func NewHandleAt(ns netns.NsHandle, nlFamilies ...int) (Handle, error) { | ||
| nlh, err := netlink.NewHandleAt(ns, nlFamilies...) | ||
| if err != nil { | ||
| return Handle{}, err | ||
| } | ||
| return Handle{nlh}, nil | ||
| } | ||
|
|
||
| func (h Handle) Close() { | ||
| if h.Handle != nil { | ||
| h.Handle.Close() | ||
| } | ||
| } | ||
|
|
||
| func retryOnIntr(f func() error) { | ||
| for attempt := 0; attempt < maxAttempts; attempt++ { | ||
| if err := f(); !errors.Is(err, netlink.ErrDumpInterrupted) { | ||
| return | ||
| } | ||
| } | ||
| log.Printf("netlink call interrupted after %d attempts", maxAttempts) | ||
| } | ||
|
|
||
| func discardErrDumpInterrupted(err error) error { | ||
| if errors.Is(err, netlink.ErrDumpInterrupted) { | ||
| // The netlink function has returned possibly-inconsistent data along with the | ||
| // error. Discard the error and return the data. This restores the behaviour of | ||
| // the netlink package prior to v1.2.1, in which NLM_F_DUMP_INTR was ignored in | ||
| // the netlink response. | ||
| log.Printf("discarding ErrDumpInterrupted: %+v", errors.WithStack(err)) | ||
| return nil | ||
| } | ||
| return err | ||
| } | ||
|
|
||
| // AddrList calls netlink.AddrList, retrying if necessary. | ||
| func AddrList(link netlink.Link, family int) ([]netlink.Addr, error) { | ||
| var addrs []netlink.Addr | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| addrs, err = netlink.AddrList(link, family) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return addrs, discardErrDumpInterrupted(err) | ||
| } | ||
|
|
||
| // LinkByName calls h.Handle.LinkByName, retrying if necessary. The netlink function | ||
| // doesn't normally ask the kernel for a dump of links. But, on an old kernel, it | ||
| // will do as a fallback and that dump may get inconsistent results. | ||
| func (h Handle) LinkByName(name string) (netlink.Link, error) { | ||
| var link netlink.Link | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| link, err = h.Handle.LinkByName(name) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return link, discardErrDumpInterrupted(err) | ||
| } | ||
|
|
||
| // LinkByName calls netlink.LinkByName, retrying if necessary. The netlink | ||
| // function doesn't normally ask the kernel for a dump of links. But, on an old | ||
| // kernel, it will do as a fallback and that dump may get inconsistent results. | ||
| func LinkByName(name string) (netlink.Link, error) { | ||
| var link netlink.Link | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| link, err = netlink.LinkByName(name) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return link, discardErrDumpInterrupted(err) | ||
| } | ||
|
|
||
| // LinkList calls h.Handle.LinkList, retrying if necessary. | ||
| func (h Handle) LinkList() ([]netlink.Link, error) { | ||
| var links []netlink.Link | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| links, err = h.Handle.LinkList() //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return links, discardErrDumpInterrupted(err) | ||
| } | ||
|
|
||
| // LinkList calls netlink.Handle.LinkList, retrying if necessary. | ||
| func LinkList() ([]netlink.Link, error) { | ||
| var links []netlink.Link | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| links, err = netlink.LinkList() //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return links, discardErrDumpInterrupted(err) | ||
| } | ||
|
|
||
| // RouteList calls h.Handle.RouteList, retrying if necessary. | ||
| func (h Handle) RouteList(link netlink.Link, family int) ([]netlink.Route, error) { | ||
| var routes []netlink.Route | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| routes, err = h.Handle.RouteList(link, family) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return routes, err | ||
| } | ||
|
|
||
| // RouteList calls netlink.RouteList, retrying if necessary. | ||
| func RouteList(link netlink.Link, family int) ([]netlink.Route, error) { | ||
| var route []netlink.Route | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| route, err = netlink.RouteList(link, family) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return route, discardErrDumpInterrupted(err) | ||
| } | ||
|
|
||
| // BridgeVlanList calls netlink.BridgeVlanList, retrying if necessary. | ||
| func BridgeVlanList() (map[int32][]*nl.BridgeVlanInfo, error) { | ||
| var err error | ||
| var info map[int32][]*nl.BridgeVlanInfo | ||
| retryOnIntr(func() error { | ||
| info, err = netlink.BridgeVlanList() //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return info, discardErrDumpInterrupted(err) | ||
| } | ||
|
|
||
| // RouteListFiltered calls h.Handle.RouteListFiltered, retrying if necessary. | ||
| func (h Handle) RouteListFiltered(family int, filter *netlink.Route, filterMask uint64) ([]netlink.Route, error) { | ||
| var routes []netlink.Route | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| routes, err = h.Handle.RouteListFiltered(family, filter, filterMask) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return routes, err | ||
| } | ||
|
|
||
| // RouteListFiltered calls netlink.RouteListFiltered, retrying if necessary. | ||
| func RouteListFiltered(family int, filter *netlink.Route, filterMask uint64) ([]netlink.Route, error) { | ||
| var route []netlink.Route | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| route, err = netlink.RouteListFiltered(family, filter, filterMask) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return route, discardErrDumpInterrupted(err) | ||
| } | ||
|
|
||
| // QdiscList calls netlink.QdiscList, retrying if necessary. | ||
| func QdiscList(link netlink.Link) ([]netlink.Qdisc, error) { | ||
| var qdisc []netlink.Qdisc | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| qdisc, err = netlink.QdiscList(link) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return qdisc, discardErrDumpInterrupted(err) | ||
| } | ||
|
|
||
| // QdiscList calls h.Handle.QdiscList, retrying if necessary. | ||
| func (h *Handle) QdiscList(link netlink.Link) ([]netlink.Qdisc, error) { | ||
| var qdisc []netlink.Qdisc | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| qdisc, err = h.Handle.QdiscList(link) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return qdisc, err | ||
| } | ||
|
|
||
| // LinkGetProtinfo calls netlink.LinkGetProtinfo, retrying if necessary. | ||
| func LinkGetProtinfo(link netlink.Link) (netlink.Protinfo, error) { | ||
| var protinfo netlink.Protinfo | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| protinfo, err = netlink.LinkGetProtinfo(link) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return protinfo, discardErrDumpInterrupted(err) | ||
| } | ||
|
|
||
| // LinkGetProtinfo calls h.Handle.LinkGetProtinfo, retrying if necessary. | ||
| func (h *Handle) LinkGetProtinfo(link netlink.Link) (netlink.Protinfo, error) { | ||
| var protinfo netlink.Protinfo | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| protinfo, err = h.Handle.LinkGetProtinfo(link) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return protinfo, err | ||
| } | ||
|
|
||
| // RuleListFiltered calls netlink.RuleListFiltered, retrying if necessary. | ||
| func RuleListFiltered(family int, filter *netlink.Rule, filterMask uint64) ([]netlink.Rule, error) { | ||
| var rules []netlink.Rule | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| rules, err = netlink.RuleListFiltered(family, filter, filterMask) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return rules, discardErrDumpInterrupted(err) | ||
| } | ||
|
|
||
| // RuleListFiltered calls h.Handle.RuleListFiltered, retrying if necessary. | ||
| func (h *Handle) RuleListFiltered(family int, filter *netlink.Rule, filterMask uint64) ([]netlink.Rule, error) { | ||
| var rules []netlink.Rule | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| rules, err = h.Handle.RuleListFiltered(family, filter, filterMask) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return rules, err | ||
| } | ||
|
|
||
| // FilterList calls netlink.FilterList, retrying if necessary. | ||
| func FilterList(link netlink.Link, parent uint32) ([]netlink.Filter, error) { | ||
| var filters []netlink.Filter | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| filters, err = netlink.FilterList(link, parent) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return filters, discardErrDumpInterrupted(err) | ||
| } | ||
|
|
||
| // FilterList calls h.Handle.FilterList, retrying if necessary. | ||
| func (h *Handle) FilterList(link netlink.Link, parent uint32) ([]netlink.Filter, error) { | ||
| var filters []netlink.Filter | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| filters, err = h.Handle.FilterList(link, parent) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return filters, err | ||
| } | ||
|
|
||
| // RuleList calls netlink.RuleList, retrying if necessary. | ||
| func RuleList(family int) ([]netlink.Rule, error) { | ||
| var rules []netlink.Rule | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| rules, err = netlink.RuleList(family) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return rules, discardErrDumpInterrupted(err) | ||
| } | ||
|
|
||
| // RuleList calls h.Handle.RuleList, retrying if necessary. | ||
| func (h *Handle) RuleList(family int) ([]netlink.Rule, error) { | ||
| var rules []netlink.Rule | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| rules, err = h.Handle.RuleList(family) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return rules, err | ||
| } | ||
|
|
||
| // ConntrackDeleteFilters calls netlink.ConntrackDeleteFilters, retrying if necessary. | ||
| func ConntrackDeleteFilters(table netlink.ConntrackTableType, family netlink.InetFamily, filters ...netlink.CustomConntrackFilter) (uint, error) { | ||
| var deleted uint | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| deleted, err = netlink.ConntrackDeleteFilters(table, family, filters...) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return deleted, discardErrDumpInterrupted(err) | ||
| } | ||
|
|
||
| // ConntrackDeleteFilters calls h.Handle.ConntrackDeleteFilters, retrying if necessary. | ||
| func (h *Handle) ConntrackDeleteFilters(table netlink.ConntrackTableType, family netlink.InetFamily, filters ...netlink.CustomConntrackFilter) (uint, error) { | ||
| var deleted uint | ||
| var err error | ||
| retryOnIntr(func() error { | ||
| deleted, err = h.Handle.ConntrackDeleteFilters(table, family, filters...) //nolint:forbidigo | ||
| return err | ||
| }) | ||
| return deleted, err | ||
| } | ||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this contains the docker patch in the library that will alliviate the problem considerably, as it will not return early
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aha, interesting -- do you have a link to the commit / patch?