Skip to content

Commit fd1f6fb

Browse files
committed
dpif-netlink: Fix probing for broken meters on Linux v5.10+.
If someone creates a lot of meters (>1017) in the kernel datapath and then re-starts ovs-vswitchd, OVS fails to probe meter support and refuses to install any meters afterwards: dpif_netlink|INFO|The kernel module has a broken meter implementation. The reason is that probing for broken meters relies on creating two meters with high meter IDs. Inside the kernel, however, the meter table is not a real hash table since v5.10 and commit: c7c4c44c9a95 ("net: openvswitch: expand the meters supported number") Instead, it's just an array with meter IDs mapped to indexes with a simple modulo operation. This array can expand, but only when it is full. There is no handling of collisions, so if the meter at ID % size is not the right one, then the lookup just fails. This is fine as long as userspace creates meters with densely packed IDs, which is the case for ovs-vswitchd... except for probing. While probing, we attempt to create meters 54545401 and 54545402. Without expanding the table size, these map onto 1017 and 1018. So, creation of these meters fails if there are already meters with IDs 1017 or 1018. At this point OVS declares meter implementation broken. Ideally, we should make the "hash" table in the kernel handle collisions and otherwise be a more or less proper hash table. But we can also improve probing in userspace and avoid this issue by choosing lower numbered meter IDs and trying to get them from the kernel first before trying to create. Choosing high values at the top of the 0-1023 range, so they are guaranteed to fit into the minimal size table in the kernel (1024). If one of these already exists and has a proper ID, then meters are likely working fine and we don't need to install new ones for probing. If these meters are not in the kernel, then we can try to create and check. This logic should work fine for older or future kernels with a proper hash table as well. There is no Fixes tag here as the check was correct at the moment it was introduced. It's the kernel change that broke it. Reported-at: openvswitch/ovs-issues#337 Acked-by: Aaron Conole <[email protected]> Signed-off-by: Ilya Maximets <[email protected]>
1 parent 62348e8 commit fd1f6fb

File tree

2 files changed

+66
-9
lines changed

2 files changed

+66
-9
lines changed

lib/dpif-netlink.c

Lines changed: 38 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -4277,15 +4277,23 @@ dpif_netlink_meter_get_stats(const struct dpif *dpif_,
42774277
ARRAY_SIZE(ovs_meter_stats_policy));
42784278
if (error) {
42794279
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
4280-
VLOG_INFO_RL(&rl, "dpif_netlink_meter_transact %s failed",
4281-
command == OVS_METER_CMD_GET ? "get" : "del");
4280+
VLOG_RL(&rl, error == ENOENT ? VLL_DBG : VLL_WARN,
4281+
"dpif_netlink_meter_transact %s failed: %s",
4282+
command == OVS_METER_CMD_GET ? "get" : "del",
4283+
ovs_strerror(error));
42824284
return error;
42834285
}
42844286

4285-
if (stats
4286-
&& a[OVS_METER_ATTR_ID]
4287-
&& a[OVS_METER_ATTR_STATS]
4288-
&& nl_attr_get_u32(a[OVS_METER_ATTR_ID]) == meter_id.uint32) {
4287+
if (a[OVS_METER_ATTR_ID]
4288+
&& nl_attr_get_u32(a[OVS_METER_ATTR_ID]) != meter_id.uint32) {
4289+
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
4290+
VLOG_INFO_RL(&rl,
4291+
"Kernel returned a different meter id than requested");
4292+
ofpbuf_delete(msg);
4293+
return EINVAL;
4294+
}
4295+
4296+
if (stats && a[OVS_METER_ATTR_STATS]) {
42894297
/* return stats */
42904298
const struct ovs_flow_stats *stat;
42914299
const struct nlattr *nla;
@@ -4363,13 +4371,34 @@ probe_broken_meters__(struct dpif *dpif)
43634371
{
43644372
/* This test is destructive if a probe occurs while ovs-vswitchd is
43654373
* running (e.g., an ovs-dpctl meter command is called), so choose a
4366-
* random high meter id to make this less likely to occur. */
4367-
ofproto_meter_id id1 = { 54545401 };
4368-
ofproto_meter_id id2 = { 54545402 };
4374+
* high meter id to make this less likely to occur.
4375+
*
4376+
* In Linux kernel v5.10+ meters are stored in a table that is not
4377+
* a real hash table. It's just an array with 'meter_id % size' used
4378+
* as an index. The numbers are chosen to fit into the minimal table
4379+
* size (1024) without wrapping, so these IDs are guaranteed to be
4380+
* found under normal conditions in the meter table, if such meters
4381+
* exist. It's possible to break this check by creating some meters
4382+
* in the kernel manually with different IDs that map onto the same
4383+
* indexes, but that should not be a big problem since ovs-vswitchd
4384+
* always allocates densely packed meter IDs with an id-pool.
4385+
*
4386+
* These IDs will also work in cases where the table in the kernel is
4387+
* a proper hash table. */
4388+
ofproto_meter_id id1 = { 1021 };
4389+
ofproto_meter_id id2 = { 1022 };
43694390
struct ofputil_meter_band band = {OFPMBT13_DROP, 0, 1, 0};
43704391
struct ofputil_meter_config config1 = { 1, OFPMF13_KBPS, 1, &band};
43714392
struct ofputil_meter_config config2 = { 2, OFPMF13_KBPS, 1, &band};
43724393

4394+
/* First check if these meters are already in the kernel. If we get
4395+
* a proper response from the kernel with all the good meter IDs, then
4396+
* meters are likley supported correctly. */
4397+
if (!dpif_netlink_meter_get(dpif, id1, NULL, 0)
4398+
|| !dpif_netlink_meter_get(dpif, id2, NULL, 0)) {
4399+
return false;
4400+
}
4401+
43734402
/* Try adding two meters and make sure that they both come back with
43744403
* the proper meter id. Use the "__" version so that we don't cause
43754404
* a recurve deadlock. */

tests/system-traffic.at

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2813,6 +2813,34 @@ OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5])
28132813
OVS_TRAFFIC_VSWITCHD_STOP
28142814
AT_CLEANUP
28152815

2816+
AT_BANNER([Meters])
2817+
2818+
AT_SETUP([meters - datapath probe])
2819+
dnl Linux kernel 4.18+ should properly support meters.
2820+
OVS_CHECK_MIN_KERNEL(4, 18)
2821+
OVS_TRAFFIC_VSWITCHD_START()
2822+
2823+
dnl Create a lot of meters.
2824+
for i in $(seq 1023); do
2825+
AT_CHECK([ovs-ofctl -OOpenFlow13 add-meter br0 "meter=$i kbps band=type=drop rate=100"])
2826+
done
2827+
2828+
dnl Kill the process to avoid any potential datapath cleanups.
2829+
pid=$(cat ovs-vswitchd.pid)
2830+
AT_CHECK([kill $pid])
2831+
OVS_WAIT_WHILE([kill -0 $pid])
2832+
2833+
dnl Start it back.
2834+
AT_CHECK([ovs-vswitchd --detach --no-chdir --pidfile --log-file], [0], [], [stderr])
2835+
on_exit "kill_ovs_vswitchd"
2836+
2837+
dnl It should still be possible to create meters.
2838+
AT_CHECK([ovs-ofctl -OOpenFlow13 add-meter br0 "meter=1 kbps band=type=drop rate=100"])
2839+
AT_CHECK([grep -q "broken meter implementation" ovs-vswitchd.log], [1])
2840+
2841+
OVS_TRAFFIC_VSWITCHD_STOP(['/terminating with signal/d'])
2842+
AT_CLEANUP
2843+
28162844
AT_BANNER([MPLS])
28172845

28182846
AT_SETUP([mpls - encap header dp-support])

0 commit comments

Comments
 (0)